On Thu, Nov 10, 2022 at 7:44 AM Christos Zoulas <chris...@astron.com> wrote:
> In article <cajgzzorydwzwyur9wggdplocxebjnxepmkbiouyxsxdu-jk...@mail.gmail.com>,
> enh <e...@google.com> wrote:
> >
> >i see (having synced the current NetBSD lib/libc/regex to Android) that
> >regcomp() no longer allows unescaped `{` and `}`. this is technically
> >correct (since POSIX explicitly calls this undefined behavior), but it's a
> >change from historical NetBSD behavior.
> >
> >specifically (since this was the existing Android test that failed) this is
> >now rejected:
> >
> >{\n 1 : 2,\n 3 : -7,\n -1 : 1,\n -2 : {(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}},\n -3 : {(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}},\n }
> >
> >while this is of course fine:
> >
> >\\{\n 1 : 2,\n 3 : -7,\n -1 : 1,\n -2 : \\{(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}\\},\n -3 : \\{(0x[0-9a-f]{2}, ){31}0x[0-9a-f]{2}\\},\n \\}
> >
> >but the former used to be interpreted as the latter :-)
> >
> >macOS (which also has a BSD-based libc) seems to still allow the former
> >(but they might just be lagging behind, like Android was?).
> >
> >glibc does not allow it.
> >
> >i don't yet have any data on app compat failures, just this one unit test
> >for the OS itself so far, but i'm curious --- was this a _deliberate_
> >behavior change, or is this a surprise?
> >
> >i don't have a plan for Android yet, and i'll probably not have one
> >until/unless we do see more than one test hit this in practice, but i'm
> >trying to think ahead about what my options might be. i'd be interested to
> >know whether -- if it came to it -- you'd consider a patch to restore the
> >old behavior. or whether you consider this change in NetBSD's behavior to
> >be a bug in its own right. alternatively, i can always have a "what version
> >of the OS were you expecting to run on?" check and offer both behaviors for
> >a few years before retiring the old behavior (because the Play Store
> >requires that you move forward with your OS version support eventually).
>
> This was done as part of syncing the NetBSD regex code with FreeBSD's to
> get utf8 support. With it came support for GNU regex extensions (\b\s\w etc),
> which are easier to implement if escaped ordinary characters are expected
> to behave the same way as unescaped ones:
>
> https://github.com/freebsd/freebsd-src/commit/adeebf4cd47c3e85155d92f386bda5e519b75ab2

ah, thanks for that link. stupidly, although i'd seen that the NetBSD
changes were syncing with FreeBSD, i didn't go to look at the original
FreeBSD commits.

cool, that sounds like i have (a) a clear "why" argument should anyone ask,
and (b) a ready-made `PFLAG_LEGACY_ESC` escape hatch for backwards
compatibility.

i'll be interested to see what iOS/macOS does here (because ideally Android
and iOS would do the _same_ thing so there's less for mobile developers to
worry about!). oh... looks like they use TRE instead?

https://opensource.apple.com/source/Libc/Libc-1439.141.1/regex/

interesting; i thought it was only musl that used TRE behind the scenes.

anyway, thanks!

> Best,
>
> christos