Re: [gentoo-dev] [PATCH v2 1/1] esed.eclass: new eclass

2022-06-04 Thread Ionen Wolkens
On Fri, Jun 03, 2022 at 07:15:17PM -0500, Oskari Pirhonen wrote:
[snip]
> Testing for (in)equality between pre- and post-sed contents is
> reasonable enough in most cases. This time, though, it would fail to
> detect anything has changed since the pre-sed contents have their NULL's
> unintentionally stripped, whereas the post-sed contents have them
> intentionally stripped.
>
> While I personally don't think that running sed on binary files is a
> good idea in the first place, it's still relevant since the end result
> would be an incorrect answer to the question of "Did sed actually do
> anything?".

Yeah, one of the primary motivations for silencing this was elsewhere. I
don't think silencing matters for esed, and if binary files are being
modified one may as well use sed(1) directly instead (this is something
that'd need to be verified more intensively on bumps than with esed
anyway).

Use is also very uncommon, although sed is still "handy" when changing
just one instruction and wanting to avoid an extra dependency on a proper
binary patching method.
 
> On the other hand, saving a set of pre- and post-sed hashes like Ulrich
> suggested would give the expected result.

If we really wanted to solve this, yes, although it may make sense to say
this eclass is not for binary files. The talk of adding a bash-only "erepl"
makes it rather difficult to preserve nulls (mapfile silently strips \0
without even a warning). mapfile -d '' could allow restoring them,
but ideally we want to iterate on lines to do per-line pattern matches.
Is it possible to hack something together to preserve them? Probably, but
it's going to make this messy and I'm not sure it's worth it.
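
As a rough illustration of the mapfile -d '' idea (a sketch only, assuming bash 4.4 or newer; join_nul and the sample data are hypothetical and not part of the eclass), splitting on NUL keeps every byte but yields chunks rather than lines, which is why per-line matching gets awkward:

```shell
#!/usr/bin/env bash
# Sketch only: join_nul is a hypothetical helper, not part of esed.eclass.

# Print all arguments separated (not terminated) by NUL bytes.
join_nul() {
	local first=1 chunk
	for chunk in "$@"; do
		(( first )) || printf '\0'
		printf '%s' "${chunk}"
		first=0
	done
}

tmp=$(mktemp)
printf 'ab\0cd\nef\0gh' > "${tmp}"

# -d '' splits on NUL instead of newline, so no data byte is dropped
mapfile -d '' chunks < "${tmp}"
printf 'chunks: %d\n' "${#chunks[@]}"

# The second chunk still contains a newline: chunks are not lines
[[ ${chunks[1]} == *$'\n'* ]] && echo "chunk 1 spans two lines"

# Re-joining with NULs reproduces the original bytes for this input
join_nul "${chunks[@]}" | cmp -s - "${tmp}" && echo "round trip ok"

rm -f "${tmp}"
```

Note that the array alone does not record whether the file ended with a NUL, so a faithful round trip in general would need extra bookkeeping.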

erepl is also worse because the nulls would be lost in the output and not
just during comparison.

-- 
ionen




Re: [gentoo-dev] [PATCH v2 1/1] esed.eclass: new eclass

2022-06-03 Thread Oskari Pirhonen
On Fri, Jun 03, 2022 at 07:36:46AM -0400, Ionen Wolkens wrote:
> ... snip ...
>
> +   # Roughly attempt to find files in arguments by checking if it's a
> +   # readable file (aka s/// is not a file) and does not start with -
> +   # (unless after --), then store contents for comparing after sed.
> +   local contents=() endopts files=()
> +   for ((i=1; i<=${#}; i++)); do
> +       if [[ ${!i} == -- && ! -v endopts ]]; then
> +           endopts=1
> +       elif [[ ${!i} =~ ^(-i|--in-place)$ && ! -v endopts ]]; then
> +           # detect rushed sed -i -> esed -i, -i also silently breaks enewsed
> +           die "passing ${!i} to ${FUNCNAME[0]} is invalid"
> +       elif [[ ${!i} =~ ^(-f|--file)$ && ! -v endopts ]]; then
> +           i+=1 # ignore script files
> +       elif [[ ( ${!i} != -* || -v endopts ) && -f ${!i} && -r ${!i} ]]; then
> +           files+=( "${!i}" )
> +
> +           # 2>/dev/null to silence null byte warnings if sed binary files
> +           { contents+=( "$(<"${!i}")" ); } 2>/dev/null \
> +               || die "failed to read: ${!i}"
> +       fi
> +   done
> +   (( ${#files[@]} )) || die "no readable files found from '${*}' arguments"
> +
> +   local verbose
> +   [[ ${ESED_VERBOSE} ]] && type diff &>/dev/null && verbose=1
> +
> +   local changed newcontents
> +   if [[ -v _esed_output ]]; then
> +       [[ -v verbose ]] &&
> +           einfo "${FUNCNAME[0]}: sed ${*} > ${_esed_output} ..."
> +
> +       sed "${@}" > "${_esed_output}" \
> +           || die "failed to run: sed ${*} > ${_esed_output}"
> +
> +       { newcontents=$(<"${_esed_output}"); } 2>/dev/null \
> +           || die "failed to read: ${_esed_output}"
> +
> +       local IFS=$'\n' # sed concats with newline even if none at EOF
> +       contents=${contents[*]}
> +       unset IFS
> +
> +       if [[ ${contents} != "${newcontents}" ]]; then
> +           changed=1
> +
> +           [[ -v verbose ]] &&
> +               diff -u --color --label="${files[*]}" --label="${_esed_output}" \
> +                   <(echo "${contents}") <(echo "${newcontents}")
> +       fi
>
> ... snip ...

I'm not 100% convinced that it will give you anything meaningful. The
warning about ignoring NULL is not so much noise as it is bash warning
you that you're probably not doing something correctly. In this case,
you're not pulling _all_ the contents of the file:

[ /tmp ]
oskari@dj3ntoo λ printf "ab\0cd" >test.dat
[ /tmp ]
oskari@dj3ntoo λ hd test.dat
00000000  61 62 00 63 64                                    |ab.cd|
00000005
[ /tmp ]
oskari@dj3ntoo λ var=$(< test.dat)
bash: warning: command substitution: ignored null byte in input
[ /tmp ]
oskari@dj3ntoo λ printf "$var" | hd
00000000  61 62 63 64                                       |abcd|
00000004

If it's a binary file, there's a decent chance the NULL's are
significant. Now, consider the following hypothetical example where we
want to remove the NULL's:

[ /tmp ]
oskari@dj3ntoo λ printf "ab\0cd" | sed -e 's/\x00//' | hd
  61 62 63 64   |abcd|
0004

Testing for (in)equality between pre- and post-sed contents is
reasonable enough in most cases. This time, though, it would fail to
detect anything has changed since the pre-sed contents have their NULL's
unintentionally stripped, whereas the post-sed contents have them
intentionally stripped.
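
To make the false negative concrete, here is a minimal sketch (not part of the patch) that compares the two snapshots the same way the quoted code does. It assumes bash and GNU sed (for \x00, as in the example above); the helper name same_as_strings and the temporary file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: same_as_strings is a hypothetical helper, not part of esed.eclass.

# Returns 0 if the $(<file) snapshots of both files are identical strings.
# bash drops NUL bytes while reading, which is the root of the problem.
same_as_strings() {
	local a b
	{ a=$(<"$1"); b=$(<"$2"); } 2>/dev/null
	[[ ${a} == "${b}" ]]
}

tmpdir=$(mktemp -d)
printf 'ab\0cd' > "${tmpdir}/before"

# GNU sed: \x00 matches a NUL byte, so this really does modify the data
sed -e 's/\x00//' "${tmpdir}/before" > "${tmpdir}/after"

if same_as_strings "${tmpdir}/before" "${tmpdir}/after"; then
	echo "string comparison: unchanged (false negative)"
fi
if ! cmp -s "${tmpdir}/before" "${tmpdir}/after"; then
	echo "byte comparison: changed"
fi

rm -r "${tmpdir}"
```

Both snapshots lose the NUL when read into a variable, so the string comparison cannot see the edit, while cmp(1) works on the actual bytes.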

While I personally don't think that running sed on binary files is a
good idea in the first place, it's still relevant since the end result
would be an incorrect answer to the question of "Did sed actually do
anything?".

On the other hand, saving a set of pre- and post-sed hashes like Ulrich
suggested would give the expected result.
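
A hash-based check along those lines could look roughly like the following. This is only a sketch under assumptions: sed_changed_file is a hypothetical helper name, sha256sum from coreutils is assumed to be available, and GNU sed is assumed for -i and \x00. It is not how the eclass is implemented.

```shell
#!/usr/bin/env bash
# Sketch only: sed_changed_file is a hypothetical helper, not part of esed.eclass.

# Usage: sed_changed_file FILE SED_ARGS...
# Runs sed -i on FILE and returns 0 if its checksum changed, 1 if it did
# not, and 2 if hashing or sed itself failed.
sed_changed_file() {
	local file=$1 before after
	shift
	before=$(sha256sum < "${file}") || return 2
	sed -i "$@" "${file}" || return 2
	after=$(sha256sum < "${file}") || return 2
	[[ ${before} != "${after}" ]]
}

tmp=$(mktemp)
printf 'ab\0cd' > "${tmp}"
if sed_changed_file "${tmp}" -e 's/\x00//'; then
	echo "sed changed the file"
fi
rm -f "${tmp}"
```

Because the hashes are computed over the raw bytes and only the short digests pass through shell variables, NUL bytes in the file no longer affect the comparison.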

- Oskari




[gentoo-dev] [PATCH v2 1/1] esed.eclass: new eclass

2022-06-03 Thread Ionen Wolkens
Signed-off-by: Ionen Wolkens 
---
 eclass/esed.eclass | 201 +
 1 file changed, 201 insertions(+)
 create mode 100644 eclass/esed.eclass

diff --git a/eclass/esed.eclass b/eclass/esed.eclass
new file mode 100644
index 00000000000..f327c3bbdf4
--- /dev/null
+++ b/eclass/esed.eclass
@@ -0,0 +1,201 @@
+# Copyright 2022 Gentoo Authors
+# Distributed under the terms of the GNU General Public License v2
+
+# @ECLASS: esed.eclass
+# @MAINTAINER:
+# Ionen Wolkens 
+# @AUTHOR:
+# Ionen Wolkens 
+# @SUPPORTED_EAPIS: 8
+# @BLURB: sed(1) wrappers that die if expressions did not modify any files
+# @EXAMPLE:
+#
+# @CODE
+# esed 's/a/b/' src/file.c # -i is default, dies if 'a' does not become 'b'
+#
+# enewsed 's/a/b/' project.pc.in "${T}"/project.pc # stdin/out not supported
+#
+# esedfind . -type f -name '*.c' -esed 's/a/b/' # dies if zero files changed
+#
+# local esedexps=(
+# # dies if /any/ of these did nothing, -e 's/a/b/' -e 's/c/d/' would not
+# 's/a/b/'
+# 's/c/d/' # bug 00
+# # use quotes around "$(use..)" to avoid word splitting/globs, won't run
+# # sed(1) for empty elements (i.e. if USE is disabled)
+# "$(usev fnord "s/foo bar/${baz}/")"
+# )
+# esed Makefile lib/Makefile # unsets esedexps so it's not re-used
+#
+# use prefix && esed "s|^prefix=|&${EPREFIX}|" project.pc # deterministic
+# @CODE
+#
+# Migration note: be wary of non-deterministic esed() involving variables,
+# e.g. s|lib|$(get_libdir)|, s|-O3|${CFLAGS}|, and the above ${EPREFIX} one.
+# esed() dies if these do nothing, like libdir being 'lib' on x86.  Either
+# verify, keep sed(1), or ensure a change (extra space, @placeholders@).
+
+case ${EAPI} in
+   8) ;;
+   *) die "${ECLASS}: EAPI ${EAPI:-0} not supported" ;;
+esac
+
+if [[ ! -v _ESED_ECLASS ]]; then
+_ESED_ECLASS=1
+
+# @ECLASS_VARIABLE: ESED_VERBOSE
+# @DEFAULT_UNSET
+# @USER_VARIABLE
+# @DESCRIPTION:
+# If set to a non-empty value, esed() and its wrappers will use diff(1)
+# if available to display file differences.
+
+# @VARIABLE: esedexps
+# @DEFAULT_UNSET
+# @DESCRIPTION:
+# Bash array that can optionally contain sed expressions to use sequentially
+# on separate sed calls when using esed() and its wrappers.  Allows inspection
+# of modifications per expression.  Unset after use so it's not used in
+# subsequent calls.  Will not run sed(1) for empty array elements.
+
+# @FUNCTION: esed
+# @USAGE: ...
+# @DESCRIPTION:
+# sed(1) wrapper that dies if the expression(s) did not modify any files.
+# sed's -i/--in-place is forced, and so stdin/out cannot be used.
+esed() {
+   local -i i
+
+   if [[ ${esedexps@a} =~ a ]]; then
+       # expression must be before -- but after the rest for e.g. -E to work
+       local -i pos
+       for ((pos=1; pos<=${#}; pos++)); do
+           [[ ${!pos} == -- ]] && break
+       done
+
+       for ((i=0; i<${#esedexps[@]}; i++)); do
+           [[ ${esedexps[i]} ]] &&
+               esedexps= esed "${@:1:pos-1}" -e "${esedexps[i]}" "${@:pos}"
+       done
+
+       unset esedexps
+       return 0
+   fi
+
+   # Roughly attempt to find files in arguments by checking if it's a
+   # readable file (aka s/// is not a file) and does not start with -
+   # (unless after --), then store contents for comparing after sed.
+   local contents=() endopts files=()
+   for ((i=1; i<=${#}; i++)); do
+       if [[ ${!i} == -- && ! -v endopts ]]; then
+           endopts=1
+       elif [[ ${!i} =~ ^(-i|--in-place)$ && ! -v endopts ]]; then
+           # detect rushed sed -i -> esed -i, -i also silently breaks enewsed
+           die "passing ${!i} to ${FUNCNAME[0]} is invalid"
+       elif [[ ${!i} =~ ^(-f|--file)$ && ! -v endopts ]]; then
+           i+=1 # ignore script files
+       elif [[ ( ${!i} != -* || -v endopts ) && -f ${!i} && -r ${!i} ]]; then
+           files+=( "${!i}" )
+
+           # 2>/dev/null to silence null byte warnings if sed binary files
+           { contents+=( "$(<"${!i}")" ); } 2>/dev/null \
+               || die "failed to read: ${!i}"
+       fi
+   done
+   (( ${#files[@]} )) || die "no readable files found from '${*}' arguments"
+
+   local verbose
+   [[ ${ESED_VERBOSE} ]] && type diff &>/dev/null && verbose=1
+
+   local changed newcontents
+   if [[ -v _esed_output ]]; then
+       [[ -v verbose ]] &&
+           einfo "${FUNCNAME[0]}: sed ${*} > ${_esed_output} ..."
+
+       sed "${@}" > "${_esed_output}" \
+           || die "failed to run: sed ${*} > ${_esed_output}"
+
+       { newcontents=$(<"${_esed_output}"); } 2>/dev/null \
+           || die "failed to read: ${_esed_output}"
+
+