Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
On Tue, 7 Nov 2023 16:14:58 + patrick.a...@orange.com wrote:
> So it means that the kind of code below in anyone's script will have an opposite result depending on whether we use dash of the Bookworm version or dash of the Bullseye version and that this same command interpreted by Bash or dash will always have an opposite result ? Not at all sure it's serious
>
> - Dash Bullseye and bash
> # case "A" in [^A]) echo "character not accepted" ;;esac
>
> - Dash bookworm
> # case "A" in [^A]) echo "character not accepted" ;;esac
> character not accepted
>
> thanks for reconsideration,
> Patrick

+1000!

The fact that one year has elapsed with that terrible regression in sid without any complaint does NOT mean that it is "okay"; it only means that a huge population of Debian sysadmins only ever stick to stable.

In this huge population, how many might be using #! /bin/sh as a shebang ? And among them, how many use caret-negation in "case..esac" ? And within this (still hefty IMO) subset, how many are operating stuff like nuclear facilities, planes or brain surgery tools ?

Does this perspective make it sound reasonable to break decade-old semantics in the most central piece of modern software after the Linux kernel ?

TL;DR: please please please NO, don't freakin' break the Shell !

-Alex
Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Hi,

On Thu, 13 Apr 2023 11:48:10 +0200 Paul Gevers wrote:
> Control: clone -1 -2
> Control: reassign -2 release-notes
>
> On 12-04-2023 16:57, Santiago Ruano Rincón wrote:
> > If the current behaviour
> > would be part of bookworm, a NEWS entry would be great.
>
> And a release note would be worth it too I guess.
>
> Paul

So it means that the kind of code below in anyone's script will have an opposite result depending on whether we use dash of the Bookworm version or dash of the Bullseye version and that this same command interpreted by Bash or dash will always have an opposite result ? Not at all sure it's serious

- Dash Bullseye and bash
# case "A" in [^A]) echo "character not accepted" ;;esac

- Dash bookworm
# case "A" in [^A]) echo "character not accepted" ;;esac
character not accepted

thanks for reconsideration,
Patrick
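A small probe can tell which of the two behaviours a given /bin/sh has. This is only a sketch built on the case statement from the message above; the function name caret_mode is made up for illustration and is not part of dash or of the bug report.

```shell
#!/bin/sh
# caret_mode (hypothetical helper): print how the running shell reads
# a leading ^ inside a bracket expression in a pattern.
caret_mode() {
    case A in
        [^A]) echo literal ;;   # "A" matched, so the set was {^, A}
        *)    echo negation ;;  # "A" did not match, so [^A] meant "not A"
    esac
}

caret_mode
```

Run under each shell of interest (for example with sh, dash, or bash as the interpreter) and compare the printed word; POSIX leaves the result unspecified, so either answer is conforming.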
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
On 29/05/2023 17:59, Paul Gevers wrote:
> On 29-05-2023 12:51, Max Nikulin wrote:
> > I am unaware of another dash implementation. Do you mean ash from which dash was forked?
>
> No, I understood from Andrej that dash *internally* has two ways to do the matching. One embedded implementation, and one using system library calls.

Thank you for the clarification. I did not realize that you were writing about the glob/fnmatch implementation in glibc, which supports [^c] negation, while the internal alternative treats it as a literal. Other libc variants are out of scope for the Debian package.
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Hi,

On 29-05-2023 12:51, Max Nikulin wrote:
> I am unaware of another dash implementation. Do you mean ash from which dash was forked?

No, I understood from Andrej that dash *internally* has two ways to do the matching. One embedded implementation, and one using system library calls. Which one is used depends on the configure options during the build. Both code paths are now made consistent (with the way dash maintainers always meant it to be).

Paul
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
On 29/05/2023 17:30, Paul Gevers wrote:
> On 29-05-2023 12:02, Max Nikulin wrote:
> > Strictly speaking, behavior of circumflex is *unspecified* in POSIX:
> >   ... A bracket expression starting with an unquoted <circumflex> character produces unspecified results.
>
> Right. Maybe better to say it now matches the other implementation (dash has two implementations and they were behaving differently).

I am unaware of another dash implementation. Do you mean ash, from which dash was forked? I have checked https://en.wikipedia.org/wiki/Debian_Almquist_shell and noticed that the busybox ash implementation was derived from dash, but a similar issue is still open in their tracker.

I would recommend that users check their scripts with the "shellcheck" static analyzer, but I am unsure whether such a suggestion is suitable for the release notes or for Debian news in the dash package.
https://www.shellcheck.net/wiki/SC3026
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Hi,

On 29-05-2023 12:02, Max Nikulin wrote:
> Strictly speaking, behavior of circumflex is *unspecified* in POSIX:
>   ... A bracket expression starting with an unquoted <circumflex> character produces unspecified results.

Right. Maybe better to say it now matches the other implementation (dash has two implementations and they were behaving differently).

Paul
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
On 29/05/2023 02:53, Paul Gevers wrote:
> Our (crafted with Andrej) proposal is here:
> https://salsa.debian.org/ddp-team/release-notes/-/merge_requests/181

from the diff:
  ... as a literal character, as was always the intended POSIX-compliant behavior.

Strictly speaking, behavior of circumflex is *unspecified* in POSIX:
  ... A bracket expression starting with an unquoted <circumflex> character produces unspecified results.

Moreover, it is intentionally left unspecified:
https://www.austingroupbugs.net/view.php?id=1558
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Control: tags -1 pending patch

Hi,

On Thu, 13 Apr 2023 11:48:10 +0200 Paul Gevers wrote:
> On 12-04-2023 16:57, Santiago Ruano Rincón wrote:
> > If the current behaviour
> > would be part of bookworm, a NEWS entry would be great.
>
> And a release note would be worth it too I guess.

Our (crafted with Andrej) proposal is here:
https://salsa.debian.org/ddp-team/release-notes/-/merge_requests/181

Paul
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
On Fri, 06 Jan 2023 10:52:31 +0100 "Andrej Shadura" wrote:
> On Thu, 5 Jan 2023, at 21:32, наб wrote:
> > Please for the love of god add this to the NEWS.
> > I /guarantee/ people are using '[^0-9]' to mean "not 0-9",
> > and similar constructs, even if they are well-versed in the shell language.
>
> I’m actually considering reverting that patch, as it seems a bit too late in
> the release cycle to introduce such a breaking change.

Hi - what is the status of these bugs about globbing in dash: is there a change in dash and a need to add to release-notes or not?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028002 against dash, asking for NEWS, is still open;
https://salsa.debian.org/debian/dash/-/blob/debian/unstable/debian/dash.NEWS has not been updated since 2009.

And the message above says the change might be reverted.

So should https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1034344, which asks to document this in release-notes, be closed?
Bug#1034344: Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
On Thu, 13 Apr 2023 11:48:10 +0200 Paul Gevers wrote:
> On 12-04-2023 16:57, Santiago Ruano Rincón wrote:
> > If the current behaviour
> > would be part of bookworm, a NEWS entry would be great.
>
> And a release note would be worth it too I guess.

The shellcheck static analyzer detects the issue with [^c] in pattern matching. I think it may be recommended for installation
https://packages.debian.org/bookworm/shellcheck
or as an online tool
https://www.shellcheck.net/

The warning concerning globs points to the following page:
https://www.shellcheck.net/wiki/SC3026

SC3026 In POSIX sh, ^ in place of ! in glob bracket expressions is undefined.
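As a rough sketch of what such a check could look like on the command line: the snippet writes a throwaway script containing a caret-negated pattern and runs shellcheck on it in POSIX sh mode. It assumes mktemp is available (it is on Debian but is not required by POSIX) and that shellcheck may or may not be installed; the sample script content is invented for illustration.

```shell
#!/bin/sh
# Create a throwaway script that uses [^0-9] in a case pattern.
tmp=$(mktemp)
printf '%s\n' \
    '#!/bin/sh' \
    'case "$1" in [^0-9]*) echo "not a number" ;; esac' > "$tmp"

if command -v shellcheck >/dev/null 2>&1; then
    # --shell=sh asks for POSIX sh rules; per the message above this is
    # where SC3026 is expected. shellcheck exits non-zero when it finds
    # issues, so the status is ignored here.
    shellcheck --shell=sh "$tmp" || true
else
    echo "shellcheck is not installed; skipping the check" >&2
fi

rm -f "$tmp"
```

Replacing the temporary file with the path of a real script gives the same kind of report for that script.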
Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Control: clone -1 -2
Control: reassign -2 release-notes

On 12-04-2023 16:57, Santiago Ruano Rincón wrote:
> If the current behaviour would be part of bookworm, a NEWS entry would be great.

And a release note would be worth it too I guess.

Paul
Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Control: severity -1 important

Hi!

On Fri, 6 Jan 2023 12:31:47 +0100 наб wrote:
> Hi!
>
> On Fri, Jan 06, 2023 at 10:52:31AM +0100, Andrej Shadura wrote:
> > On Thu, 5 Jan 2023, at 21:32, наб wrote:
> > > Bisecting over the upstream git, I got
> > >   commit 8f9cca055bc661c4c690a5f5e1ca71370d129bc3 (HEAD, refs/bisect/bad)
> > >   Author: Herbert Xu
> > >   Date:   Wed Jan 19 16:37:54 2022 +1100
> > >
> > >       expand: Always quote caret when using fnmatch
> > >
> > > as the first bad commit with default configuration (HAVE_FNMATCH=1).
> > >
> > > I /cannot/ find a set-up where configuring like Debian
> > > (--disable-fnmatch --disable-lineno --disable-glob)
> > > isn't broken.
> >
> > I’m not sure why this also affects configurations with --disable-fnmatch;
> > from the description of it, it shouldn’t?
>
> Well, dash's built-in globs Just Don't Support ^. Never have.
> (Defined as "current code doesn't and it blames to start-of-git".)
> They're strictly POSIX, and ^ is a regular character for them.
>
> 8f9cca0 fixes the fact that glibc fnmatch() has a special meaning for ^
> by unconditionally escaping it (if configured for libc fnmatch) ‒
> it normalises [^0-9] to always mean [0-9^],
> regardless of --with-fnmatch/--disable-fnmatch.
>
> > > Y'know what, I bisected the Salsa git, too, but then I consulted POSIX.
> > > Apparently, this is fine.
> >
> > > Please for the love of god add this to the NEWS.
> > > I /guarantee/ people are using '[^0-9]' to mean "not 0-9",
> > > and similar constructs, even if they are well-versed in the shell
> > > language.
> > >
> > > This is a breaking change going from bullseye, and quite an insidious one.
> > > I assume my reaction is gonna mirror others' quite well.
> > >
> > > /Please/ add this to the NEWS.
> >
> > I’m actually considering reverting that patch, as it seems a bit too late
> > in the release cycle to introduce such a breaking change.
>
> I've bisected across snapshot.d.o, and the first Debian version
> that exhibits this behaviour is 0.5.11+git20210903+057cd650a4ed-4:
>   http://snapshot.debian.org/package/dash/0.5.11%2Bgit20210903%2B057cd650a4ed-4/
>
> Which, if I understand it right, has landed in sid on 2022-03-04.
> Since march of last year, sid and testing have been using this;
> quoth tracker.d.o:
>   [2022-03-07] dash 0.5.11+git20210903+057cd650a4ed-7 MIGRATED to testing (Debian testing watch)
>
> So it's been a good part of a year and no-one's complained
> (maybe I'm the idiot what doesn't know globs are negated with !s),
> from the point of view of "system compatibility",
> I think this has passed the test.
>
> From the point of user code, a NEWS entry I'd consider sufficient,
> as usual for breaking-for-compat user-observable changes.
>
> Reverting this now would probably have the opposite effect

I am taking the liberty to increase the severity of this bug. I'd say it is serious, but I'd let the maintainer or the release team decide on that. I am aware of at least one user hit by this.

If the current behaviour would be part of bookworm, a NEWS entry would be great.

Thanks,

 -- Santiago
Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Hi!

On Fri, Jan 06, 2023 at 10:52:31AM +0100, Andrej Shadura wrote:
> On Thu, 5 Jan 2023, at 21:32, наб wrote:
> > Bisecting over the upstream git, I got
> >   commit 8f9cca055bc661c4c690a5f5e1ca71370d129bc3 (HEAD, refs/bisect/bad)
> >   Author: Herbert Xu
> >   Date:   Wed Jan 19 16:37:54 2022 +1100
> >
> >       expand: Always quote caret when using fnmatch
> >
> > as the first bad commit with default configuration (HAVE_FNMATCH=1).
> >
> > I /cannot/ find a set-up where configuring like Debian
> > (--disable-fnmatch --disable-lineno --disable-glob)
> > isn't broken.
>
> I’m not sure why this also affects configurations with --disable-fnmatch;
> from the description of it, it shouldn’t?

Well, dash's built-in globs Just Don't Support ^. Never have.
(Defined as "current code doesn't and it blames to start-of-git".)
They're strictly POSIX, and ^ is a regular character for them.

8f9cca0 fixes the fact that glibc fnmatch() has a special meaning for ^
by unconditionally escaping it (if configured for libc fnmatch) ‒
it normalises [^0-9] to always mean [0-9^],
regardless of --with-fnmatch/--disable-fnmatch.

> > Y'know what, I bisected the Salsa git, too, but then I consulted POSIX.
> > Apparently, this is fine.
>
> > Please for the love of god add this to the NEWS.
> > I /guarantee/ people are using '[^0-9]' to mean "not 0-9",
> > and similar constructs, even if they are well-versed in the shell language.
> >
> > This is a breaking change going from bullseye, and quite an insidious one.
> > I assume my reaction is gonna mirror others' quite well.
> >
> > /Please/ add this to the NEWS.
>
> I’m actually considering reverting that patch, as it seems a bit too late in
> the release cycle to introduce such a breaking change.

I've bisected across snapshot.d.o, and the first Debian version
that exhibits this behaviour is 0.5.11+git20210903+057cd650a4ed-4:
  http://snapshot.debian.org/package/dash/0.5.11%2Bgit20210903%2B057cd650a4ed-4/

Which, if I understand it right, has landed in sid on 2022-03-04.
Since march of last year, sid and testing have been using this;
quoth tracker.d.o:
  [2022-03-07] dash 0.5.11+git20210903+057cd650a4ed-7 MIGRATED to testing (Debian testing watch)

So it's been a good part of a year and no-one's complained
(maybe I'm the idiot what doesn't know globs are negated with !s),
from the point of view of "system compatibility",
I think this has passed the test.

From the point of user code, a NEWS entry I'd consider sufficient,
as usual for breaking-for-compat user-observable changes.

Reverting this now would probably have the opposite effect
(breaking (and in this case this /is/ breaking,
 since the new behaviour is correct) people's globs
 late in the release cycle).

But what do I know,
наб
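To make the "[^0-9] now means [0-9^]" point concrete, here is a minimal sketch of that set written in a way every POSIX shell reads the same. The caret is deliberately not first inside the brackets, so it is always a literal; the function name matches_digit_or_caret is invented for this illustration.

```shell
#!/bin/sh
# matches_digit_or_caret (hypothetical helper): succeed if $1 is a single
# character from the set {0..9, ^}. Per the message above, this is the
# set that newer dash also assigns to the pattern [^0-9].
matches_digit_or_caret() {
    case "$1" in
        [0-9^]) return 0 ;;
        *)      return 1 ;;
    esac
}

matches_digit_or_caret 7   && echo "7: in set"
matches_digit_or_caret '^' && echo "^: in set"
matches_digit_or_caret x   || echo "x: not in set"
```

A script that really wants "anything except a digit" should spell it [!0-9], which is the form POSIX defines.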
Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Hi,

On Thu, 5 Jan 2023, at 21:32, наб wrote:
> (I built 0.5.12-2 from the .dsc,
>  the binary packages don't appear to have propagated yet.
>  I also originally wrote this without knowing that glob
>  classes are negated by !, not ^.
>  s/correct/compatible/ and s/broken/incompatible/, i guess)

Thanks for the report.

> Ruh-roh!!! That's /horrific/.
> In my original reproducer that test is for checking the input
> is an integer, this is a common pattern.
>
> This smells an awful lot like it'd affect all globs, right?
> Yeah.

<...>

> Bisecting over the upstream git, I got
>   commit 8f9cca055bc661c4c690a5f5e1ca71370d129bc3 (HEAD, refs/bisect/bad)
>   Author: Herbert Xu
>   Date:   Wed Jan 19 16:37:54 2022 +1100
>
>       expand: Always quote caret when using fnmatch
>
> as the first bad commit with default configuration (HAVE_FNMATCH=1).
>
> I /cannot/ find a set-up where configuring like Debian
> (--disable-fnmatch --disable-lineno --disable-glob)
> isn't broken.

I’m not sure why this also affects configurations with --disable-fnmatch; from the description of it, it shouldn’t?

> Y'know what, I bisected the Salsa git, too, but then I consulted POSIX.
> Apparently, this is fine.

> Please for the love of god add this to the NEWS.
> I /guarantee/ people are using '[^0-9]' to mean "not 0-9",
> and similar constructs, even if they are well-versed in the shell language.
>
> This is a breaking change going from bullseye, and quite an insidious one.
> I assume my reaction is gonna mirror others' quite well.
>
> /Please/ add this to the NEWS.

I’m actually considering reverting that patch, as it seems a bit too late in the release cycle to introduce such a breaking change.

-- 
Cheers,
  Andrej
Bug#1028002: dash: sid dash globs no longer allow [^...] to negate a class; upcoming breaking change from bullseye
Package: dash
Version: 0.5.12-2
Version: 0.5.11+git20210903+057cd650a4ed-9
Severity: wishlist

Dear Maintainer,

(I built 0.5.12-2 from the .dsc,
 the binary packages don't appear to have propagated yet.
 I also originally wrote this without knowing that glob
 classes are negated by !, not ^.
 s/correct/compatible/ and s/broken/incompatible/, i guess)

Original reproducer:
  sh -xc 'rerat_secs=7200; [ "${rerat_secs%[^0-9]*}" != "$rerat_secs" ]; echo $?'
reduced for testing:
  sh -c 'i=10; echo "${i%[^0-9]*}"'

The /correct/ output, given by 0.5.11+git20200708+dd9ef66-5 (bullseye)
(and bash, and any other shell), is, naturally "10":
we're removing, from the end, a nondigit, then anything.
There are no nondigits, so nothing is removed.

Let's observe:
  bullseye$ sh -c 'i=10; echo "${i%[^0-9]*}"'
  10
  sid$ sh -c 'i=10; echo "${i%[^0-9]*}"'
  1
  0.5.12-2$ sh -c 'i=10; echo "${i%[^0-9]*}"'
  1
  trunk$ sh -c 'i=10; echo "${i%[^0-9]*}"'
  1

Ruh-roh!!! That's /horrific/.
In my original reproducer that test is for checking the input
is an integer, this is a common pattern.

This smells an awful lot like it'd affect all globs, right?
Yeah.
  $ ls
  1  10  2  3  4  5  6  7  8  9  bin  DEBIAN  usr
  $ echo [^0-9]*           # bash, bullseye dash
  bin DEBIAN usr
  $ sh -c 'echo [^0-9]*'   # sid dash, dash 0.5.12+ trunk
  1 10 2 3 4 5 6 7 8 9

Terrifying.

Bisecting over the upstream git, I got
  commit 8f9cca055bc661c4c690a5f5e1ca71370d129bc3 (HEAD, refs/bisect/bad)
  Author: Herbert Xu
  Date:   Wed Jan 19 16:37:54 2022 +1100

      expand: Always quote caret when using fnmatch

      This patch forces ^ to be a literal when we use fnmatch.

      In order to allow for the extra space to quote the caret, the
      function _rmescapes will allocate up to twice the memory if the
      flag RMESCAPE_GLOB is set.

      Fixes: 7638476c18f2 ("shell: Enable fnmatch/glob by default")
      Reported-by: Christoph Anton Mitterer
      Suggested-by: Harald van Dijk
      Signed-off-by: Herbert Xu
as the first bad commit with default configuration (HAVE_FNMATCH=1).

I /cannot/ find a set-up where configuring like Debian
(--disable-fnmatch --disable-lineno --disable-glob)
isn't broken.

Y'know what, I bisected the Salsa git, too, but then I consulted POSIX.
Apparently, this is fine.

Apparently, XCU, 2.13.1 Patterns Matching a Single Character:
  When unquoted and outside a bracket expression, the following three
  characters shall have special meaning in the specification of patterns:

  [  If an open bracket introduces a bracket expression as in XBD RE
     Bracket Expression, except that the <exclamation-mark> character
     ( '!' ) shall replace the <circumflex> character ( '^' ) in its role
     in a non-matching list in the regular expression notation, it shall
     introduce a pattern bracket expression. A bracket expression starting
     with an unquoted <circumflex> character produces unspecified results.
     Otherwise, '[' shall match the character itself.

Please for the love of god add this to the NEWS.
I /guarantee/ people are using '[^0-9]' to mean "not 0-9",
and similar constructs, even if they are well-versed in the shell language.

This is a breaking change going from bullseye, and quite an insidious one.
I assume my reaction is gonna mirror others' quite well.

/Please/ add this to the NEWS.

Thanks,
наб

-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: x32 (x86_64)
Foreign Architectures: amd64, i386

Kernel: Linux 6.0.0-6-amd64 (SMP w/2 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages dash depends on:
ii  debianutils  5.7-0.4
ii  dpkg         1.21.15
ii  libc6        2.36-7

dash recommends no packages.

dash suggests no packages.

-- debconf information excluded
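For the integer check that the reproducer was doing, a version that does not depend on how the shell reads [^...] could look like the sketch below. It uses [!0-9], the negation form POSIX specifies; the function name is_uint is made up for illustration and is not taken from the report.

```shell
#!/bin/sh
# is_uint (hypothetical helper): succeed if $1 is a non-empty string
# made only of decimal digits. [!0-9] is the POSIX spelling of a negated
# class, so bullseye dash, newer dash and bash all read it the same way.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

# The reduced test case from the report, with ! in place of ^:
i=10
echo "${i%[!0-9]*}"   # "10" holds no non-digit, so nothing is stripped
```

The same substitution applies to pathname globs: echo [!0-9]* lists the names that do not start with a digit.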