bug#51231: increase performance and usability of binary search with -P

2021-10-15 Thread Carlo Arenas
The following patch increases the performance of grep when looking at binary data, without any side effects: Summary 'cd grep; ./src/grep -Pc foo /Users/carlo/Downloads/FreeBSD-13.0-BETA2-amd64.vhd' ran 1.77 ± 0.02 times faster than 'cd grep.base; ./src/grep -Pc foo

bug#51231: disregard patch

2021-10-16 Thread Carlo Arenas
And of course it has side effects (as shown by the test suite), and would only help (if fixed) when the needle is a fixed string, which is 3x slower than doing -F, -G or -E. Apologies for the distraction. Carlo

bug#51235: resolve old FIXME in PCRE implementation to allow more than 1 expression

2021-10-16 Thread Carlo Arenas
With this patch, multiple expressions (from -e or -f) are now acceptable with -P for easier side by side comparison with the other supported engines. Alternatively, multiple expressions could be compiled and run sequentially for matching, but I suspect the added compilation time is likely higher,
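For comparison, the other matchers already accept several patterns given with repeated -e options. A minimal sketch using the default matcher; the sample input is made up for illustration and is not taken from the patch:

```shell
# Count the lines that match either pattern; each -e adds one pattern.
# The three input lines are hypothetical sample data.
count=$(printf 'foo\nbar\nbaz\n' | grep -c -e foo -e bar)
echo "$count"
```

The two patterns are treated as alternatives, so a line is selected when it matches any one of them.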

bug#51235: resolve old FIXME in PCRE implementation to allow more than 1 expression

2021-10-16 Thread Carlo Arenas
On Sat, Oct 16, 2021 at 12:50 AM Paul Eggert wrote: > > On 10/16/21 12:00 AM, Carlo Arenas wrote: > > With this patch, multiple expressions (from -e or -f) are now > > acceptable with -P for easier side by side comparison with the other > > supported engines. > >

bug#51727: add an optional flag to -P to disable JIT

2021-11-10 Thread Carlo Arenas
On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert wrote: > > On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote: > > Severity: wishlist > > > > There are times, when the expression is too simple or will not be used too > > often to justify the extra time in -P that is required for JIT compilation. > >

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 2:45 PM Jeffrey Walton wrote: > > On Sun, Nov 14, 2021 at 5:26 PM Carlo Arenas wrote: > > On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote: > > > ... > > using idx_t instead of size_t should be fine (if only halves the max > > size

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote: > > On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote: > > Sadly, hadn't been able to generate a release, > > Does this mean you're having trouble running 'make dist'? If so, what's > the trouble? I seem to be unlucky; getting certificate

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 3:18 PM Carlo Arenas wrote: > On Sun, Nov 14, 2021 at 2:45 PM Jeffrey Walton wrote: > > On Sun, Nov 14, 2021 at 5:26 PM Carlo Arenas wrote: > > > On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote: > > > > ... > > > using idx_t in

bug#47264: [PATCH v2] pcre: migrate to pcre2

2021-11-14 Thread Carlo Arenas
On Sun, Nov 14, 2021 at 7:18 PM Paul Eggert wrote: > On 11/14/21 14:25, Carlo Arenas wrote: > > using idx_t instead of size_t should be fine (if only halves the max > > size of the objects managed), but I am concerned that assuming > > PCRE2_SIZE_MAX is always equivalent

bug#47264: [PATCH] pcre: migrate to pcre2

2021-11-08 Thread Carlo Arenas
On Mon, Nov 8, 2021 at 11:53 AM Paul Eggert wrote: > > On 11/8/21 01:47, Carlo Arenas wrote: > > On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert wrote: > > > Let me know how to help otherwise. > > The main thing from my point of view is that I'd like to know what those

bug#51710: [PATCH] pcre: avoid overflow in PCRE JIT stack resizing

2021-11-09 Thread Carlo Arenas
No, PCRE2 uses size_t, and it is the same (or a similar) unsigned type when passed to sljit, so there is no undefined behaviour or overflow. We might keep the limit in PCRE2 though, as it should IMHO be far smaller anyway. Carlo Car On Tue, Nov 9, 2021 at 10:28 AM Paul Eggert wrote: > > Thanks for

bug#47264: [PATCH] pcre: migrate to pcre2

2021-11-08 Thread Carlo Arenas
On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert wrote: > > On 11/7/21 11:26, Carlo Marcelo Arenas Belón wrote: > > Mostly a bug by bug translation of the original code to the PCRE2 API. > > but includes a couple of fixes as well that might be worth doing in > > independent patches, if a straight

bug#66251: make [[:digit:]] consistent with \d when UCP mode is enabled in -P

2023-09-28 Thread Carlo Arenas
Enable the PCRE2 flag that will be released with 10.43 to keep [[:digit:]] ASCII just like it was done already for `\d`. Carlo 0001-pcre-make-d-and-digit-consistent-in-UCP-mode.patch Description: Binary data

bug#65416: Feature request: include first line of file in output

2023-08-23 Thread Carlo Arenas
> Daniel Green wrote: > > > I've never looked at the grep source code > > before, but could be tempted to try implementing it myself if there was any > > chance of the patch being accepted. A slightly more complicated perl script would be my first choice if coding is the solution, but grep

bug#56888: 'echo message | grep []' is affected by files in local directory when using bracket

2022-08-02 Thread Carlo Arenas
This behaviour is expected and described in the manual (although it might be a good candidate for a FAQ): https://www.gnu.org/software/grep/manual/grep.html#Usage Even before grep gets to see the expression, the shell would try to match it and expand it as needed, which is obviously not what
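The expansion described above can be reproduced in a scratch directory. This is only a sketch: the directory, the file name "a" and the pattern [ab] are made up for illustration and are not the ones from the report.

```shell
# Hypothetical setup: a scratch directory holding one file named "a".
dir=$(mktemp -d)
touch "$dir/a"

# Unquoted: the shell expands [ab] to the file name "a" before grep runs,
# so grep searches for the literal pattern "a" and the input "b" does not match.
unquoted=$(cd "$dir" && echo b | grep -c [ab] || true)

# Quoted: grep receives the bracket expression itself, which matches "b".
quoted=$(cd "$dir" && echo b | grep -c '[ab]' || true)

echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=0 quoted=1
rm -rf "$dir"
```

Quoting the pattern keeps the result independent of whatever files happen to be in the current directory.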

bug#60618: unicode characters are not identified as such for \w and \b with -P

2023-01-06 Thread Carlo Arenas
Reported to PCRE[1] with mention of GNU grep being also affected. [1] https://github.com/PCRE2Project/pcre2/issues/185 From c2d4a43b5b15df7c8853d591bf6ae872c602ed14 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= Date: Fri, 6 Jan 2023 19:34:56 -0800 Subject:

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-10 Thread Carlo Arenas
Noticed while testing the previous patch, and which resulted in tests being skipped for the wrong reason. Carlo 0001-pcre-only-use-UTF-when-available-in-the-library.patch Description: Binary data

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-12 Thread Carlo Arenas
On Thu, Jan 12, 2023 at 7:38 PM Paul Eggert wrote: > > Without the attached patch, in a > UTF-8 locale "grep -P '[[:alpha:]]'" won't report matching alphabetic > characters, if they're multibyte. Silent misbehavior is quite bad, and > it's better for grep to issue a diagnostic and exit than to

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-11 Thread Carlo Arenas
pcre2_config does a static check (defined at compile time) and therefore is unlikely to fail, and might even be optimized out under the right circumstances. You are correct that setting the original value was meant to protect from that function failing and will ensure the original path was still

bug#60708: pcre: improve support for linking with a library without unicode

2023-01-11 Thread Carlo Arenas
On Wed, Jan 11, 2023 at 6:29 PM Paul Eggert wrote: > > Oh, I think see your point, but doesn't this mean that even my code was > too trusting? It should be something like this: > >if (localeinfo.multibyte) > { >uint32_t unicode; >if (! (localeinfo.using_utf8 >

bug#62983: workaround PCRE2 bug affecting at least \D and \W

2023-04-29 Thread Carlo Arenas
Just some nitpicking, but could we use single quotes around the '턞' character in pcre-utf8-bug224 instead of double quotes? Carlo

bug#60690: -P '\d' in GNU and git grep

2023-04-04 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 2:38 PM Paul Eggert wrote: > > In researching this a bit further, I found that on March 23 Git disabled > the use of PCRE2_UCP in PCRE2 10.34 or earlier[6], due to a PCRE2 bug > that can cause a crash when PCRE2_UCP is used[7]. A bug fix[8] should > appear in the next PCRE2

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-04 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 11:23 PM Paul Eggert wrote: > > On 2023-04-03 23:17, Carlo Arenas wrote: > > On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote: > >> > >> * Disable PCRE2_UCP unless PCRE2 10.35 or higher. > > > > this is because of a bug i

bug#62657: PCRE2-related workarounds that GNU grep might need

2023-04-04 Thread Carlo Arenas
On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote: > >* Disable PCRE2_UCP unless PCRE2 10.35 or higher. this is because of a bug in JIT, alternatively JIT could be disabled >* If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then > enable PCRE2_NO_START_OPTIMIZE unless PCRE2

bug#60690: -P '\d' in GNU and git grep

2023-04-07 Thread Carlo Arenas
On Fri, Apr 7, 2023 at 12:00 PM Paul Eggert wrote: > > On 2023-04-06 06:39, demerphq wrote: > > > Unicode specifies that \d match any digit > > in any script that it supports. > > "Specifies" is too strong. The Unicode Regular Expressions technical > standard (UTS#18) mentions \d only in Annex

bug#62769: pcre: correct overpessimistic error checking of pcre2_jit_compile()

2023-04-11 Thread Carlo Arenas
The original code was done in a way that would be useful during porting, but that would hinder future work unnecessarily. Carlo 0001-pcre-correct-overpessimistic-error-checking-of-pcre2.patch Description: Binary data

bug#62769: pcre: correct overpessimistic error checking of pcre2_jit_compile()

2023-04-12 Thread Carlo Arenas
On Tue, Apr 11, 2023 at 3:11 PM Paul Eggert wrote: > > On 4/10/23 23:47, Carlo Arenas wrote: > > The original code was done in a way that would be useful during > > porting, but that would hinder future work unnecessarily. > > Thanks, but wouldn't the attached patch be b

bug#62745: Color only capture group

2023-04-12 Thread Carlo Arenas
You can do that already with PCRE2 and a lookbehind: echo abcedc|ggrep --color -P '(?=b)c'

bug#62745: Color only capture group

2023-04-12 Thread Carlo Arenas
On Tue, Apr 11, 2023 at 11:51 PM Carlo Arenas wrote: > > echo abcedc|ggrep --color -P '(?=b)c' Typo; that should be: echo abcedc|ggrep --color -P '(?<=b)c' (`ggrep` would be called grep in your environment)
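The difference between the two spellings can be checked with -o, which prints only the matched text. A small sketch, assuming a grep built with PCRE support and installed under the name grep:

```shell
# Lookbehind: match a "c" that is preceded by "b"; only the first "c" qualifies.
behind=$(echo abcedc | grep -oP '(?<=b)c' || true)

# Lookahead: (?=b)c needs the next character to be both "b" and "c",
# which is impossible, so nothing matches.
ahead=$(echo abcedc | grep -oP '(?=b)c' || true)

echo "behind=[$behind] ahead=[$ahead]"
```

With --color instead of -o, the lookbehind form would highlight only the "c" that follows "b", since the lookbehind itself is not part of the match.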

bug#60690: -P '\d' in GNU and git grep

2023-04-05 Thread Carlo Arenas
On Wed, Apr 5, 2023 at 12:40 PM Jim Meyering wrote: > > Changing grep -P's \d to match multibyte digits by default would break > an important contract. While I tend to agree[1] (and indeed that is why PCRE2_EXTRA_ASCII_BSD was invented), it would be also important to note that it goes against

bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate

2023-04-02 Thread Carlo Arenas
On Sun, Apr 2, 2023 at 11:30 AM Paul Eggert wrote: > > Also, GNU grep -w passes the following more-complicated regexp to dfaparse: but AFAIK `-w` is not necessary to trigger it, as the following also infloops in Fedora Rawhide $ echo a | grep -E '((()|a)|())+' interestingly; the loop is

bug#63965: grep-3.11: 'make check' fails with glibc-2.37.9000

2023-06-09 Thread Carlo Arenas
On Fri, Jun 9, 2023 at 12:06 AM Jaroslav Škarvada wrote: > diff: in: Value too large for defined data type This has nothing to do with the new glibc, but with the fact that your diff is affected by bug#63492. Upgrading to diffutils 3.10 should address that. Carlo

bug#63484: FAIL: y2038-vs-32-bit

2023-05-13 Thread Carlo Arenas
On Sat, May 13, 2023 at 7:48 AM Andreas Schwab wrote: > > On Mai 13 2023, Carlo Marcelo Arenas Belón wrote: > > > on linux m68k. > > ??? Well; the report didn't provide much information, so I made an educated guess. Would you provide a more accurate description? Also, out of curiosity; does

bug#63533: test-mbrlen5.sh failure

2023-05-16 Thread Carlo Arenas
That is a test for a bug that your system image has but that is not relevant to grep (mbrlen doesn't correctly handle a call with a len of 0). Carlo

bug#63533: test-mbrlen5.sh failure

2023-05-19 Thread Carlo Arenas
On Fri, May 19, 2023 at 12:43 PM Carlo Marcelo Arenas Belón wrote: > > On Thu, May 18, 2023 at 10:09:38PM +0200, Jim Meyering wrote: > > On Thu, May 18, 2023 at 2:44 PM Carlo Marcelo Arenas Belón > > wrote: > > > On Wed, May 17, 2023 at 09:09:02PM -0400, Caleb Zulawski wrote: > > > > > > > >