On Thu, Sep 11, 2014 at 12:10 PM, Paul Eggert egg...@cs.ucla.edu wrote:
On 09/11/2014 11:37 AM, Jim Meyering wrote:
Would you mind adding a test to trigger that one?
Ordinarily I would have done that already but this -P stuff is so buggy and
slow that I got discouraged. (If we keep having
On 2014-09-10 13:22:36 +0200, Santiago wrote:
Thanks! I'm including this fix in the current debian package.
Unfortunately, it is very slow, with a large slowdown factor.
I've just reported a new Debian bug concerning the performance problem.
--
Vincent Lefèvre vinc...@vinc17.net - Web:
Vincent Lefevre wrote:
I've just reported a new Debian bug concerning the performance problem.
It's not clear from http://bugs.debian.org/761157 that the performance
problem occurs only with -P, but I assume that's what is meant.
Since this is a performance bug with PCRE, I suggest moving the
On Thu, Sep 11, 2014 at 10:07 AM, Paul Eggert egg...@cs.ucla.edu wrote:
Vincent Lefevre wrote:
I've just reported a new Debian bug concerning the performance problem.
It's not clear from http://bugs.debian.org/761157 that the performance
problem occurs only with -P, but I assume that's what is
On 09/11/2014 11:37 AM, Jim Meyering wrote:
Would you mind adding a test to trigger that one?
Ordinarily I would have done that already but this -P stuff is so buggy
and slow that I got discouraged. (If we keep having trouble with -P I
may start lobbying to remove it.) Anyway, I gave it
On 2014-09-11 10:07:49 -0700, Paul Eggert wrote:
Vincent Lefevre wrote:
I've just reported a new Debian bug concerning the performance problem.
It's not clear from http://bugs.debian.org/761157 that the performance
problem occurs only with -P, but I assume that's what is meant.
It's specific to
Paul Eggert wrote:
perhaps there's a PCRE version dependency here?
I found a PCRE-version-dependent problem that may be relevant, and
installed the attached further patch to fix it.
From dc7d532d16dec740d11b6817c9b558543aca0136 Mon Sep 17 00:00:00 2001
From: Paul Eggert egg...@cs.ucla.edu
On 10/09/14 at 00:08, Paul Eggert wrote:
Paul Eggert wrote:
perhaps there's a PCRE version dependency here?
I found a PCRE-version-dependent problem that may be relevant, and installed
the attached further patch to fix it.
Thanks! I'm including this fix in the current debian package.
Thanks. I have confirmed that the new version produces the expected
output, as follows.
$ env LC_ALL=en_US.utf8 src/grep -P '.?b' in
ab
--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
I'm worried that re-running the match after invalid UTF-8 will slow down
searches over a large number of binary files.
Norihiro Tanaka wrote:
I'm worried that re-running the match after invalid UTF-8 will slow down
searches over a large number of binary files.
Yes, that could be a problem, but even so it's better for grep to report
matches than to give up and fail. Perhaps someone could optimize this
better later,
I see that the new version produces no output for the following test,
which was used previously.
printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b'
Norihiro Tanaka wrote:
I see that the new version produces no output for the following test,
which was used previously.
printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b'
Thanks for reporting that. The test case works for me (Fedora 20
x86-64, GCC 4.9.1):
$ printf '\x80ab\n' | env
On 08/28/2014 11:47 PM, Santiago wrote:
On 16/08/14 at 11:36, Paul Eggert wrote:
Santiago wrote:
Another solution would be to skip checking whether binary files are valid
(passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
avoid security holes
It wouldn't. (We
On 14/08/14 at 14:33, Paul Eggert wrote:
Vincent Lefevre wrote:
On input, using null bytes may be better if one wants to be able to
match real replacement characters without false positives.
Maybe, though this is no place to get fancy. It's simple to tell users an
invalid byte acts
On 2014-08-16 16:01:27 +0200, Santiago wrote:
Workaround attached. It's too slow against binary files, but I haven't
found a simpler solution.
To avoid the slowness, I think that it would be better to detect
(directly, not via PCRE) invalid UTF-8 sequences and replace them
by null bytes
On 16/08/14 at 18:26, Vincent Lefevre wrote:
On 2014-08-16 16:01:27 +0200, Santiago wrote:
Workaround attached. It's too slow against binary files, but I haven't
found a simpler solution.
To avoid the slowness, I think that it would be better to detect
(directly, not via PCRE)
Santiago wrote:
Another solution would be to skip checking whether binary files are valid
(passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd
avoid security holes
It wouldn't. (We already tried it.)
Santiago wrote:
Please, revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373
That commit was necessary to avoid undefined behavior in libpcre. We
can't simply undo the commit (unless you want to reintroduce security
holes into grep :-). The current behavior is the best we can do, unless
On 2014-08-14 09:15:58 -0700, Paul Eggert wrote:
That commit was necessary to avoid undefined behavior in libpcre. We can't
simply undo the commit (unless you want to reintroduce security holes into
grep :-). The current behavior is the best we can do, unless someone fixes
libpcre (which
Vincent Lefevre wrote:
it would be better to replace invalid UTF-8 sequences by
zero bytes before passing them to libpcre. Is it allowed to do
that in Pexecute()?
Sorry, I don't know. I was hoping that the volunteer (whoever it is)
could figure all this stuff out.
grep should work
Vincent Lefevre wrote:
The problem with this solution is that it would change the length
of the text, while replacing invalid bytes by zero bytes could be
done in place (if allowed), with very little change of the code,
I think.
True. Though it might be more user-friendly to use '?' as the
On 2014-08-14 11:19:28 -0700, Paul Eggert wrote:
grep should work correctly even if the input contains NUL bytes, so perhaps
it would be better to replace an invalid byte by the UTF-8 sequence for
U+FFFD REPLACEMENT CHARACTER, as that's one standard way to deal with this
problem. Or perhaps
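
For illustration only, here is a minimal Python sketch of the U+FFFD idea mentioned above. It is not grep's code: the helper name `replace_with_u_fffd` is hypothetical, and it leans on Python's built-in `errors="replace"` decoding rather than anything in libpcre. It mainly shows why this substitution changes the length of the buffer.

```python
def replace_with_u_fffd(data: bytes) -> bytes:
    """Return a copy of data with invalid UTF-8 replaced by U+FFFD.

    Hypothetical helper for illustration; Python's decoder decides
    how many replacement characters each invalid run produces.
    """
    return data.decode("utf-8", errors="replace").encode("utf-8")


# Same input bytes as the printf test case quoted in this thread.
sample = b"\x80ab\n"
fixed = replace_with_u_fffd(sample)

# U+FFFD is three bytes in UTF-8 (EF BF BD), so the one invalid byte
# grows to three and every later offset in the buffer shifts.
print(len(sample), len(fixed))
```

Because the output is longer than the input, any match offsets computed on the cleaned buffer would need mapping back to the original bytes.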
On 2014-08-14 13:13:45 -0700, Paul Eggert wrote:
Vincent Lefevre wrote:
The problem with this solution is that it would change the length
of the text, while replacing invalid bytes by zero bytes could be
done in place (if allowed), with very little change of the code,
I think.
True. Though
Vincent Lefevre wrote:
On input, using null bytes may be better if one wants to be able to
match real replacement characters without false positives.
Maybe, though this is no place to get fancy. It's simple to tell users
an invalid byte acts like '?'. Simple is good.
Anyway, this is a
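
As a rough sketch of the length-preserving alternative discussed in these messages (overwriting each invalid byte with a single substitute such as '?' or NUL), the following Python shows the idea. The function name `blank_invalid_utf8` is hypothetical and this is not the change that was made in grep's Pexecute(); it uses Python's own UTF-8 decoder to locate the invalid bytes, and it re-copies the tail of the buffer after each error, so it is meant to show the behavior, not to be fast.

```python
def blank_invalid_utf8(buf: bytearray, substitute: int = ord("?")) -> int:
    """Overwrite invalid UTF-8 bytes in buf in place; return how many.

    Hypothetical helper: every invalid byte becomes one substitute
    byte, so the buffer length and all offsets stay the same.
    """
    replaced = 0
    pos = 0
    while pos < len(buf):
        try:
            # Validate the remainder; success means nothing is left to fix.
            bytes(buf[pos:]).decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice we decoded.
            start, end = pos + err.start, pos + err.end
            for i in range(start, end):
                buf[i] = substitute
            replaced += end - start
            pos = end  # resume scanning after the bytes just overwritten
    return replaced


buf = bytearray(b"\x80ab\n")   # input from the test case in this thread
count = blank_invalid_utf8(buf)
print(count, bytes(buf))
```

Passing `substitute=0` gives the null-byte variant instead; either way the buffer keeps its original length, which is the property the in-place approach relies on.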