Re: [gentoo-dev] [PATCH v2 1/1] esed.eclass: new eclass
On Fri, Jun 03, 2022 at 07:15:17PM -0500, Oskari Pirhonen wrote:
[snip]
> Testing for (in)equality between pre- and post-sed contents is
> reasonable enough in most cases. This time, though, it would fail to
> detect anything has changed since the pre-sed contents have their NULL's
> unintentionally stripped, whereas the post-sed contents have them
> intentionally stripped.
>
> While I personally don't think that running sed on binary files is a
> good idea in the first place, it's still relevant since the end result
> would be an incorrect answer to the question of "Did sed actually do
> anything?".

Yeah, one of the primary motivations to silence this was elsewhere. I
don't think silencing matters for esed, and if binary files are being
modified one may as well use sed(1) directly instead (this is something
that would need to be verified more intensively on bumps than with esed
anyway).

Such use is also very uncommon, although sed is still "handy" when
changing just one instruction and wanting to avoid an extra dependency
on a proper binary patching method.

> On the other hand, saving a set of pre- and post-sed hashes like Ulrich
> suggested would give the expected result.

If we really wanted to solve this, yes, although it may make sense to
say this eclass is not for binary files.

The talk of adding a bash-only "erepl" makes it rather difficult to
preserve nulls (mapfile silently strips \0 without even a warning).
mapfile -d '' could allow restoring them, but ideally we want to iterate
on lines to do per-line pattern matches.

Is it possible to hack away something to preserve them? Probably.. but
it's going to make this messy and I'm not sure it's worth it. erepl is
also worse because they'll be lost in the output and not just during
comparison.

-- 
ionen
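To illustrate the mapfile -d '' remark above, here is a minimal sketch of splitting on NUL and re-joining afterwards. It is not taken from any erepl proposal; the variable names (chunks, restored, roundtrip) are made up for the example, and it assumes bash 4.4 or newer for mapfile -d.

```shell
#!/usr/bin/env bash
# Sketch only: keep track of NUL boundaries by splitting on them.
tmp=$(mktemp)
restored=$(mktemp)
printf 'ab\0cd' > "${tmp}"

# -d '' makes NUL the delimiter, -t drops the delimiter from each element.
# Each array element is the text between two NUL bytes.
mapfile -t -d '' chunks < "${tmp}"

# Re-join the chunks with a NUL between them. Caveat: a trailing NUL in the
# input cannot be told apart from no trailing NUL with this approach.
{
    printf '%s' "${chunks[0]}"
    if (( ${#chunks[@]} > 1 )); then
        printf '\0%s' "${chunks[@]:1}"
    fi
} > "${restored}"

if cmp -s "${tmp}" "${restored}"; then
    roundtrip=ok
else
    roundtrip=differs
fi
echo "chunks: ${#chunks[@]}, roundtrip: ${roundtrip}"

rm -f "${tmp}" "${restored}"
```

Note that the chunks are NUL-separated rather than newline-separated, which is exactly why this sits awkwardly with per-line pattern matching: each chunk would still need to be split on newlines before matching.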
Re: [gentoo-dev] [PATCH v2 1/1] esed.eclass: new eclass
On Fri, Jun 03, 2022 at 07:36:46AM -0400, Ionen Wolkens wrote:
> ... snip ...
>
> +	# Roughly attempt to find files in arguments by checking if it's a
> +	# readable file (aka s/// is not a file) and does not start with -
> +	# (unless after --), then store contents for comparing after sed.
> +	local contents=() endopts files=()
> +	for ((i=1; i<=${#}; i++)); do
> +		if [[ ${!i} == -- && ! -v endopts ]]; then
> +			endopts=1
> +		elif [[ ${!i} =~ ^(-i|--in-place)$ && ! -v endopts ]]; then
> +			# detect rushed sed -i -> esed -i, -i also silently breaks enewsed
> +			die "passing ${!i} to ${FUNCNAME[0]} is invalid"
> +		elif [[ ${!i} =~ ^(-f|--file)$ && ! -v endopts ]]; then
> +			i+=1 # ignore script files
> +		elif [[ ( ${!i} != -* || -v endopts ) && -f ${!i} && -r ${!i} ]]; then
> +			files+=( "${!i}" )
> +
> +			# 2>/dev/null to silence null byte warnings if sed binary files
> +			{ contents+=( "$(<"${!i}")" ); } 2>/dev/null \
> +				|| die "failed to read: ${!i}"
> +		fi
> +	done
> +	(( ${#files[@]} )) || die "no readable files found from '${*}' arguments"
> +
> +	local verbose
> +	[[ ${ESED_VERBOSE} ]] && type diff &>/dev/null && verbose=1
> +
> +	local changed newcontents
> +	if [[ -v _esed_output ]]; then
> +		[[ -v verbose ]] &&
> +			einfo "${FUNCNAME[0]}: sed ${*} > ${_esed_output} ..."
> +
> +		sed "${@}" > "${_esed_output}" \
> +			|| die "failed to run: sed ${*} > ${_esed_output}"
> +
> +		{ newcontents=$(<"${_esed_output}"); } 2>/dev/null \
> +			|| die "failed to read: ${_esed_output}"
> +
> +		local IFS=$'\n' # sed concats with newline even if none at EOF
> +		contents=${contents[*]}
> +		unset IFS
> +
> +		if [[ ${contents} != "${newcontents}" ]]; then
> +			changed=1
> +
> +			[[ -v verbose ]] &&
> +				diff -u --color --label="${files[*]}" --label="${_esed_output}" \
> +					<(echo "${contents}") <(echo "${newcontents}")
> +		fi
>
> ... snip ...

I'm not 100% convinced that it will give you anything meaningful.
The warning about ignoring NULL is not so much noise as it is bash
warning you that you're probably not doing something correctly. In this
case, you're not pulling _all_ the contents of the file:

[ /tmp ] oskari@dj3ntoo λ printf "ab\0cd" >test.dat
[ /tmp ] oskari@dj3ntoo λ hd test.dat
61 62 00 63 64 |ab.cd|
0005
[ /tmp ] oskari@dj3ntoo λ var=$(< test.dat)
bash: warning: command substitution: ignored null byte in input
[ /tmp ] oskari@dj3ntoo λ printf "$var" | hd
61 62 63 64 |abcd|
0004

If it's a binary file, there's a decent chance the NULL's are
significant.

Now, consider the following hypothetical example where we want to remove
the NULL's:

[ /tmp ] oskari@dj3ntoo λ printf "ab\0cd" | sed -e 's/\x00//' | hd
61 62 63 64 |abcd|
0004

Testing for (in)equality between pre- and post-sed contents is
reasonable enough in most cases. This time, though, it would fail to
detect anything has changed since the pre-sed contents have their NULL's
unintentionally stripped, whereas the post-sed contents have them
intentionally stripped.

While I personally don't think that running sed on binary files is a
good idea in the first place, it's still relevant since the end result
would be an incorrect answer to the question of "Did sed actually do
anything?".

On the other hand, saving a set of pre- and post-sed hashes like Ulrich
suggested would give the expected result.

- Oskari
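The hash-based check mentioned above could look roughly like the following. This is a minimal sketch and not code from the patch: sed_changed_file is a hypothetical helper name, the last argument is assumed to be the only file, and GNU sed plus sha256sum from coreutils are assumed to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of esed.eclass): run sed -i on one file and
# report whether the bytes on disk changed. Checksums are compared instead
# of shell variables, so NUL bytes are never dropped before the comparison.
# Returns 0 if the file changed, 1 if it did not, 2 on errors.
sed_changed_file() {
    local file=${!#} # assumption: the last argument is the file
    local before after

    before=$(sha256sum < "${file}") || return 2
    sed -i "${@}" || return 2
    after=$(sha256sum < "${file}") || return 2

    [[ ${before} != "${after}" ]]
}

tmp=$(mktemp)
printf 'ab\0cd' > "${tmp}"

# Same expression as in the example above: strip the NUL byte.
if sed_changed_file -e 's/\x00//' "${tmp}"; then
    result=changed
else
    result=unchanged
fi
echo "${result}"

rm -f "${tmp}"
```

Unlike comparing the output of $(<file) before and after, the checksum sees the NUL byte in the original file, so removing it registers as a change.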
[gentoo-dev] [PATCH v2 1/1] esed.eclass: new eclass
Signed-off-by: Ionen Wolkens
---
 eclass/esed.eclass | 201 +
 1 file changed, 201 insertions(+)
 create mode 100644 eclass/esed.eclass

diff --git a/eclass/esed.eclass b/eclass/esed.eclass
new file mode 100644
index 000..f327c3bbdf4
--- /dev/null
+++ b/eclass/esed.eclass
@@ -0,0 +1,201 @@
+# Copyright 2022 Gentoo Authors
+# Distributed under the terms of the GNU General Public License v2
+
+# @ECLASS: esed.eclass
+# @MAINTAINER:
+# Ionen Wolkens
+# @AUTHOR:
+# Ionen Wolkens
+# @SUPPORTED_EAPIS: 8
+# @BLURB: sed(1) wrappers that die if expressions did not modify any files
+# @EXAMPLE:
+#
+# @CODE
+# esed 's/a/b/' src/file.c # -i is default, dies if 'a' does not become 'b'
+#
+# enewsed 's/a/b/' project.pc.in "${T}"/project.pc # stdin/out not supported
+#
+# esedfind . -type f -name '*.c' -esed 's/a/b/' # dies if zero files changed
+#
+# local esedexps=(
+#	# dies if /any/ of these did nothing, -e 's/a/b/' -e 's/c/d/' would not
+#	's/a/b/'
+#	's/c/d/' # bug 00
+#	# use quotes around "$(use..)" to avoid word splitting/globs, won't run
+#	# sed(1) for empty elements (i.e. if USE is disabled)
+#	"$(usev fnord "s/foo bar/${baz}/")"
+# )
+# esed Makefile lib/Makefile # unsets esedexps so it's not re-used
+#
+# use prefix && esed "s|^prefix=|&${EPREFIX}|" project.pc # deterministic
+# @CODE
+#
+# Migration note: be wary of non-deterministic esed() involving variables,
+# e.g. s|lib|$(get_libdir)|, s|-O3|${CFLAGS}|, and the above ${EPREFIX} one.
+# esed() dies if these do nothing, like libdir being 'lib' on x86. Either
+# verify, keep sed(1), or ensure a change (extra space, @placeholders@).
+
+case ${EAPI} in
+	8) ;;
+	*) die "${ECLASS}: EAPI ${EAPI:-0} not supported" ;;
+esac
+
+if [[ ! -v _ESED_ECLASS ]]; then
+_ESED_ECLASS=1
+
+# @ECLASS_VARIABLE: ESED_VERBOSE
+# @DEFAULT_UNSET
+# @USER_VARIABLE
+# @DESCRIPTION:
+# If set to a non-empty value, esed() and its wrappers will use diff(1)
+# if available to display file differences.
+
+# @VARIABLE: esedexps
+# @DEFAULT_UNSET
+# @DESCRIPTION:
+# Bash array that can optionally contain sed expressions to use sequentially
+# on separate sed calls when using esed() and its wrappers. Allows inspection
+# of modifications per-expressions. Unset after use so it's not used in
+# subsequent calls. Will not run sed(1) for empty array elements.
+
+# @FUNCTION: esed
+# @USAGE: ...
+# @DESCRIPTION:
+# sed(1) wrapper that dies if the expression(s) did not modify any files.
+# sed's -i/--in-place is forced, and so stdin/out cannot be used.
+esed() {
+	local -i i
+
+	if [[ ${esedexps@a} =~ a ]]; then
+		# expression must be before -- but after the rest for e.g. -E to work
+		local -i pos
+		for ((pos=1; pos<=${#}; pos++)); do
+			[[ ${!pos} == -- ]] && break
+		done
+
+		for ((i=0; i<${#esedexps[@]}; i++)); do
+			[[ ${esedexps[i]} ]] &&
+				esedexps= esed "${@:1:pos-1}" -e "${esedexps[i]}" "${@:pos}"
+		done
+
+		unset esedexps
+		return 0
+	fi
+
+	# Roughly attempt to find files in arguments by checking if it's a
+	# readable file (aka s/// is not a file) and does not start with -
+	# (unless after --), then store contents for comparing after sed.
+	local contents=() endopts files=()
+	for ((i=1; i<=${#}; i++)); do
+		if [[ ${!i} == -- && ! -v endopts ]]; then
+			endopts=1
+		elif [[ ${!i} =~ ^(-i|--in-place)$ && ! -v endopts ]]; then
+			# detect rushed sed -i -> esed -i, -i also silently breaks enewsed
+			die "passing ${!i} to ${FUNCNAME[0]} is invalid"
+		elif [[ ${!i} =~ ^(-f|--file)$ && ! -v endopts ]]; then
+			i+=1 # ignore script files
+		elif [[ ( ${!i} != -* || -v endopts ) && -f ${!i} && -r ${!i} ]]; then
+			files+=( "${!i}" )
+
+			# 2>/dev/null to silence null byte warnings if sed binary files
+			{ contents+=( "$(<"${!i}")" ); } 2>/dev/null \
+				|| die "failed to read: ${!i}"
+		fi
+	done
+	(( ${#files[@]} )) || die "no readable files found from '${*}' arguments"
+
+	local verbose
+	[[ ${ESED_VERBOSE} ]] && type diff &>/dev/null && verbose=1
+
+	local changed newcontents
+	if [[ -v _esed_output ]]; then
+		[[ -v verbose ]] &&
+			einfo "${FUNCNAME[0]}: sed ${*} > ${_esed_output} ..."
+
+		sed "${@}" > "${_esed_output}" \
+			|| die "failed to run: sed ${*} > ${_esed_output}"
+
+		{ newcontents=$(<"${_esed_output}"); } 2>/dev/null \
+			|| die "failed to read: ${_esed_output}"
+