The following patch increases performance of grep when looking at
binary data, without any side effects:
Summary
'cd grep; ./src/grep -Pc foo
/Users/carlo/Downloads/FreeBSD-13.0-BETA2-amd64.vhd' ran
1.77 ± 0.02 times faster than 'cd grep.base; ./src/grep -Pc foo
And of course it has side effects (as shown by the test suite), and
would only help (if fixed) when the needle is a fixed string, which is
3x slower than doing -F, -G or -E.
Apologies for the distraction.
Carlo
With this patch, multiple expressions (from -e or -f) are now
acceptable with -P for easier side by side comparison with the other
supported engines.
Alternatively, multiple expressions could be compiled and run
sequentially for matching, but I suspect the added compilation time is
likely higher,
On Sat, Oct 16, 2021 at 12:50 AM Paul Eggert wrote:
>
> On 10/16/21 12:00 AM, Carlo Arenas wrote:
> > With this patch, multiple expressions (from -e or -f) are now
> > acceptable with -P for easier side by side comparison with the other
> > supported engines.
>
>
On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert wrote:
>
> On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote:
> > Severity: wishlist
> >
> > There are times, when the expression is too simple or will not be used too
> > often to justify the extra time in -P that is required for JIT compilation.
>
>
On Sun, Nov 14, 2021 at 2:45 PM Jeffrey Walton wrote:
>
> On Sun, Nov 14, 2021 at 5:26 PM Carlo Arenas wrote:
> > On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote:
> > > ...
> > using idx_t instead of size_t should be fine (it only halves the max
> > size
On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote:
>
> On 11/9/21 02:58, Carlo Marcelo Arenas Belón wrote:
> > Sadly, hadn't been able to generate a release,
>
> Does this mean you're having trouble running 'make dist'? If so, what's
> the trouble?
I seem to be unlucky; getting certificate
On Sun, Nov 14, 2021 at 3:18 PM Carlo Arenas wrote:
> On Sun, Nov 14, 2021 at 2:45 PM Jeffrey Walton wrote:
> > On Sun, Nov 14, 2021 at 5:26 PM Carlo Arenas wrote:
> > > On Sun, Nov 14, 2021 at 12:45 PM Paul Eggert wrote:
> > > > ...
> > > using idx_t in
On Sun, Nov 14, 2021 at 7:18 PM Paul Eggert wrote:
> On 11/14/21 14:25, Carlo Arenas wrote:
> > using idx_t instead of size_t should be fine (it only halves the max
> > size of the objects managed), but I am concerned that assuming
> > PCRE2_SIZE_MAX is always equivalent
On Mon, Nov 8, 2021 at 11:53 AM Paul Eggert wrote:
>
> On 11/8/21 01:47, Carlo Arenas wrote:
> > On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert wrote:
>
> > Let me know how to help otherwise.
>
> The main thing from my point of view is that I'd like to know what those
>
No
PCRE2 uses size_t, and it is the same (or a similar) unsigned type when
passed to sljit, so there is no undefined behaviour or overflow.
We might keep the limit in PCRE2 though, as it should be IMHO far
smaller anyway.
Carlo
Car
On Tue, Nov 9, 2021 at 10:28 AM Paul Eggert wrote:
>
> Thanks for
On Sun, Nov 7, 2021 at 4:30 PM Paul Eggert wrote:
>
> On 11/7/21 11:26, Carlo Marcelo Arenas Belón wrote:
> > Mostly a bug by bug translation of the original code to the PCRE2 API,
> > but includes a couple of fixes as well that might be worth doing in
> > independent patches, if a straight
Enable the PCRE2 flag that will be released with 10.43 to keep
[[:digit:]] ASCII just like it was done already for `\d`.
Carlo
0001-pcre-make-d-and-digit-consistent-in-UCP-mode.patch
Description: Binary data
> Daniel Green wrote:
>
> > I've never looked at the grep source code
> > before, but could be tempted to try implementing it myself if there was any
> > chance of the patch being accepted.
A slightly more complicated perl script would be my first choice if
coding is the solution, but grep
This behaviour is expected and described in the manual (though it
might be a good candidate for a FAQ):
https://www.gnu.org/software/grep/manual/grep.html#Usage
Even before grep gets to see the expression, the shell would try to
match it and expand it as needed, which is obviously not what
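
A small illustration of the shell expanding a pattern before the command ever sees it. This is only a sketch: the file names are made up, and `printf` stands in for grep so that the arguments received are visible.

```shell
#!/bin/sh
# Scratch directory with two files for the glob to match (hypothetical names).
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a.txt b.txt

# Unquoted: the shell replaces the pattern with the matching file names
# before the command runs.
unquoted=$(printf '[%s]' *.txt)

# Quoted: the command receives the pattern literally, which is what a
# regular expression passed to grep needs.
quoted=$(printf '[%s]' '*.txt')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Single quotes around the expression keep the shell from touching it, so grep gets to interpret it.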
Reported to PCRE[1] with mention of GNU grep being also affected.
[1] https://github.com/PCRE2Project/pcre2/issues/185
From c2d4a43b5b15df7c8853d591bf6ae872c602ed14 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?=
Date: Fri, 6 Jan 2023 19:34:56 -0800
Subject:
Noticed while testing the previous patch, and which resulted in tests
being skipped for the wrong reason.
Carlo
0001-pcre-only-use-UTF-when-available-in-the-library.patch
Description: Binary data
On Thu, Jan 12, 2023 at 7:38 PM Paul Eggert wrote:
>
> Without the attached patch, in a
> UTF-8 locale "grep -P '[[:alpha:]]'" won't report matching alphabetic
> characters, if they're multibyte. Silent misbehavior is quite bad, and
> it's better for grep to issue a diagnostic and exit than to
pcre2_config does a static check (defined at compile time), so it is
unlikely to fail and might even be optimized out under the right
circumstances.
You are correct that setting the original value was meant to protect
from that function failing and will ensure the original path was still
On Wed, Jan 11, 2023 at 6:29 PM Paul Eggert wrote:
>
> Oh, I think I see your point, but doesn't this mean that even my code was
> too trusting? It should be something like this:
>
>   if (localeinfo.multibyte)
>     {
>       uint32_t unicode;
>       if (! (localeinfo.using_utf8
>
Just some nitpicking, but could we use single quotes around the '턞'
character in pcre-utf8-bug224 instead of double quotes?
Carlo
On Mon, Apr 3, 2023 at 2:38 PM Paul Eggert wrote:
>
> In researching this a bit further, I found that on March 23 Git disabled
> the use of PCRE2_UCP in PCRE2 10.34 or earlier[6], due to a PCRE2 bug
> that can cause a crash when PCRE2_UCP is used[7]. A bug fix[8] should
> appear in the next PCRE2
On Mon, Apr 3, 2023 at 11:23 PM Paul Eggert wrote:
>
> On 2023-04-03 23:17, Carlo Arenas wrote:
> > On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote:
> >>
> >> * Disable PCRE2_UCP unless PCRE2 10.35 or higher.
> >
> > this is because of a bug i
On Mon, Apr 3, 2023 at 2:50 PM Paul Eggert wrote:
>
> * Disable PCRE2_UCP unless PCRE2 10.35 or higher.
This is because of a bug in JIT; alternatively, JIT could be disabled.
> * If ignoring case and PCRE2_MATCH_INVALID_UTF is defined, then
>   enable PCRE2_NO_START_OPTIMIZE unless PCRE2
On Fri, Apr 7, 2023 at 12:00 PM Paul Eggert wrote:
>
> On 2023-04-06 06:39, demerphq wrote:
>
> > Unicode specifies that \d match any digit
> > in any script that it supports.
>
> "Specifies" is too strong. The Unicode Regular Expressions technical
> standard (UTS#18) mentions \d only in Annex
The original code was done in a way that would be useful during
porting, but that would hinder future work unnecessarily.
Carlo
0001-pcre-correct-overpessimistic-error-checking-of-pcre2.patch
Description: Binary data
On Tue, Apr 11, 2023 at 3:11 PM Paul Eggert wrote:
>
> On 4/10/23 23:47, Carlo Arenas wrote:
> > The original code was done in a way that would be useful during
> > porting, but that would hinder future work unnecessarily.
>
> Thanks, but wouldn't the attached patch be b
You can do that already with PCRE2 and a lookbehind:
echo abcedc|ggrep --color -P '(?=b)c'
On Tue, Apr 11, 2023 at 11:51 PM Carlo Arenas wrote:
>
> echo abcedc|ggrep --color -P '(?=b)c'
typo:
echo abcedc|ggrep --color -P '(?<=b)c'
`ggrep` would be called `grep` in your environment.
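
The difference between the two expressions can be checked with a short sketch. It assumes a GNU grep built with PCRE2 support (so that `-P` works) and uses `-o` in place of `--color` so that the matched text can be captured.

```shell
#!/bin/sh
# Lookahead: (?=b) requires the next character to be 'b', but the pattern
# then needs that same character to be 'c', so nothing can match.
ahead=$(echo abcedc | grep -oP '(?=b)c' || true)

# Lookbehind: (?<=b) requires the previous character to be 'b', so only
# the 'c' that follows 'b' matches; the final 'c' follows 'd' and is skipped.
behind=$(echo abcedc | grep -oP '(?<=b)c' || true)

echo "lookahead matched:  '$ahead'"
echo "lookbehind matched: '$behind'"
```

The `|| true` only keeps the script going when grep exits with status 1 for "no match".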
On Wed, Apr 5, 2023 at 12:40 PM Jim Meyering wrote:
>
> Changing grep -P's \d to match multibyte digits by default would break
> an important contract.
While I tend to agree[1] (and indeed that is why PCRE2_EXTRA_ASCII_BSD
was invented), it would also be important to note that it goes against
On Sun, Apr 2, 2023 at 11:30 AM Paul Eggert wrote:
>
> Also, GNU grep -w passes the following more-complicated regexp to dfaparse:
But AFAIK `-w` is not necessary to trigger it, as the following also
loops forever on Fedora Rawhide:
$ echo a | grep -E '((()|a)|())+'
interestingly; the loop is
On Fri, Jun 9, 2023 at 12:06 AM Jaroslav Škarvada wrote:
> diff: in: Value too large for defined data type
This has nothing to do with the new glibc, but with the fact that your
diff is affected by bug#63492.
Upgrading to diffutils 3.10 should address that.
Carlo
On Sat, May 13, 2023 at 7:48 AM Andreas Schwab wrote:
>
> On Mai 13 2023, Carlo Marcelo Arenas Belón wrote:
>
> > on linux m68k.
>
> ???
Well, the report didn't provide much information, so I made an educated guess.
Would you provide a more accurate description?
Also, out of curiosity; does
That is a test for a bug that your system image has but that is not
relevant to grep (mbrlen doesn't correctly handle a call with a len of
0).
Carlo
On Fri, May 19, 2023 at 12:43 PM Carlo Marcelo Arenas Belón
wrote:
>
> On Thu, May 18, 2023 at 10:09:38PM +0200, Jim Meyering wrote:
> > On Thu, May 18, 2023 at 2:44 PM Carlo Marcelo Arenas Belón
> > wrote:
> > > On Wed, May 17, 2023 at 09:09:02PM -0400, Caleb Zulawski wrote:
> > > >
> > > >