Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-12 Thread Jim Meyering
On Thu, Sep 11, 2014 at 12:10 PM, Paul Eggert egg...@cs.ucla.edu wrote: On 09/11/2014 11:37 AM, Jim Meyering wrote: Would you mind adding a test to trigger that one? Ordinarily I would have done that already but this -P stuff is so buggy and slow that I got discouraged. (If we keep having

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Vincent Lefevre
On 2014-09-10 13:22:36 +0200, Santiago wrote: Thanks! I'm including this fix in the current debian package. Unfortunately, it is very slow, with a large slowdown factor. I've just reported a new Debian bug concerning the performance problem. -- Vincent Lefèvre vinc...@vinc17.net - Web:

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Paul Eggert
Vincent Lefevre wrote: I've just reported a new Debian bug concerning the performance problem. It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is meant. Since this is a performance bug with PCRE, I suggest moving the

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Jim Meyering
On Thu, Sep 11, 2014 at 10:07 AM, Paul Eggert egg...@cs.ucla.edu wrote: Vincent Lefevre wrote: I've just reported a new Debian bug concerning the performance problem. It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Paul Eggert
On 09/11/2014 11:37 AM, Jim Meyering wrote: Would you mind adding a test to trigger that one? Ordinarily I would have done that already but this -P stuff is so buggy and slow that I got discouraged. (If we keep having trouble with -P I may start lobbying to remove it) Anyway, I gave it

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-11 Thread Vincent Lefevre
On 2014-09-11 10:07:49 -0700, Paul Eggert wrote: Vincent Lefevre wrote: I've just reported a new Debian bug concerning the performance problem. It's not clear from http://bugs.debian.org/761157 that the performance problem occurs only with -P, but I assume that's what is meant. It's specific to

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Paul Eggert
Paul Eggert wrote: perhaps there's a PCRE version dependency here? I found a PCRE-version-dependent problem that may be relevant, and installed the attached further patch to fix it. From dc7d532d16dec740d11b6817c9b558543aca0136 Mon Sep 17 00:00:00 2001 From: Paul Eggert egg...@cs.ucla.edu

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Santiago
El 10/09/14 a las 00:08, Paul Eggert escribió: Paul Eggert wrote: perhaps there's a PCRE version dependency here? I found a PCRE-version-dependent problem that may be relevant, and installed the attached further patch to fix it. Thanks! I'm including this fix in the current debian package.

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-10 Thread Norihiro Tanaka
Thanks. I have confirmed that new version has expected response as following. $ env LC_ALL=en_US.utf8 src/grep -P '.?b' in ab -- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Norihiro Tanaka
I'm worried that to re-run for invalid UTF-8 makes slowness for searching of the large number of binary files.

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Paul Eggert
Norihiro Tanaka wrote: I'm worried that to re-run for invalid UTF-8 makes slowness for searching of the large number of binary files. Yes, that could be a problem, but even so it's better for grep to report matches than to give up and fail. Perhaps someone could optimize this better later,

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Norihiro Tanaka
I see that new version has no response for following test which was used previously. printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b'

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-09-09 Thread Paul Eggert
Norihiro Tanaka wrote: I see that new version has no response for following test which was used previously. printf '\x80ab\n' | env LC_ALL=en_US.utf8 src/grep -P '.?b' Thanks for reporting that. The test case works for me (Fedora 20 x86-64, GCC 4.9.1): $ printf '\x80ab\n' | env
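For readers unfamiliar with why this input is a useful test case: the byte 0x80 is a UTF-8 continuation byte, so a line that starts with it is not valid UTF-8. The following is only an illustrative sketch (it uses Python's own UTF-8 decoder, not grep or PCRE) showing that the test line is rejected at offset 0, assuming the shell's printf expands `\x80` to that single byte.

```python
# The bytes that `printf '\x80ab\n'` is assumed to write to the pipe.
sample = b"\x80ab\n"

bad_offset = None
try:
    sample.decode("utf-8")
    is_valid = True
except UnicodeDecodeError as err:
    is_valid = False
    bad_offset = err.start  # index of the first offending byte

print(is_valid, bad_offset)
```

The remaining bytes, `ab` and the newline, are plain ASCII, which is why the pattern '.?b' is still expected to find a match on this line.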

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-29 Thread Eric Blake
On 08/28/2014 11:47 PM, Santiago wrote: El 16/08/14 a las 11:36, Paul Eggert escribió: Santiago wrote: Another solution would be to don't check if binary files are valid (passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd avoid security holes It wouldn't. (We

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Santiago
El 14/08/14 a las 14:33, Paul Eggert escribió: Vincent Lefevre wrote: On input, using null bytes may be better if one wants to be able to match real replacement characters without false positives. Maybe, though this is no place to get fancy. It's simple to tell users an invalid byte acts

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Vincent Lefevre
On 2014-08-16 16:01:27 +0200, Santiago wrote: Workaround attached. It's too slow against binary files, but I haven't found a simpler solution. To avoid the slowness, I think that it would be better to detect (directly, not via PCRE) invalid UTF-8 sequences and replace them by null bytes

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Santiago
El 16/08/14 a las 18:26, Vincent Lefevre escribió: On 2014-08-16 16:01:27 +0200, Santiago wrote: Workaround attached. It's too slow against binary files, but I haven't found a simpler solution. To avoid the slowness, I think that it would be better to detect (directly, not via PCRE)

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-16 Thread Paul Eggert
Santiago wrote: Another solution would be to don't check if binary files are valid (passing PCRE_NO_UTF8_CHECK to pcre_exec), but I don't know if that'd avoid security holes It wouldn't. (We already tried it.)

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Santiago wrote: Please, revert ca7868cc27db3d9deafaa2e0ac5a2bb0aa8ef373 That commit was necessary to avoid undefined behavior in libpcre. We can't simply undo the commit (unless you want to reintroduce security holes into grep :-). The current behavior is the best we can do, unless

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Vincent Lefevre
On 2014-08-14 09:15:58 -0700, Paul Eggert wrote: That commit was necessary to avoid undefined behavior in libpcre. We can't simply undo the commit (unless you want to reintroduce security holes into grep :-). The current behavior is the best we can do, unless someone fixes libpcre (which

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Vincent Lefevre wrote: it would be better to replace invalid UTF-8 sequences by zero bytes before passing them to libpcre. Is it allowed to do that in Pexecute()? Sorry, I don't know. I was hoping that the volunteer (whoever it is) could figure all this stuff out. grep should work

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Vincent Lefevre wrote: The problem with this solution is that it would change the length of the text, while replacing invalid bytes by zero bytes could be done in place (if allowed), with very little change of the code, I think. True. Though it might be more user-friendly to use '?' as the

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Vincent Lefevre
On 2014-08-14 11:19:28 -0700, Paul Eggert wrote: grep should work correctly even if the input contains NUL bytes, so perhaps it would be better to replace an invalid byte by the UTF-8 sequence for U+FFFD REPLACEMENT CHARACTER, as that's one standard way to deal with this problem. Or perhaps
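The U+FFFD alternative, and the length change it implies, can be illustrated with a small sketch. It relies on Python's built-in `errors="replace"` handler rather than anything in grep or PCRE, and the helper name `substitute_replacement_char` is made up for this example. Note that Python substitutes one U+FFFD per invalid subsequence, which may differ from a strict byte-by-byte policy.

```python
def substitute_replacement_char(data):
    """Replace invalid UTF-8 in data with U+FFFD and re-encode as UTF-8."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


sample = b"\x80ab\n"
fixed = substitute_replacement_char(sample)

# U+FFFD encodes as the three bytes EF BF BD, so the buffer grows.
print(len(sample), len(fixed))
```

Here a single invalid byte becomes three bytes, so any offsets computed against the rewritten buffer no longer line up with the original input. That is the cost weighed against the in-place, same-length substitution.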

Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Vincent Lefevre
On 2014-08-14 13:13:45 -0700, Paul Eggert wrote: Vincent Lefevre wrote: The problem with this solution is that it would change the length of the text, while replacing invalid bytes by zero bytes could be done in place (if allowed), with very little change of the code, I think. True. Though

Bug#758105: bug#18266: Bug#758105: bug#18266: grep -P and invalid exits with error

2014-08-14 Thread Paul Eggert
Vincent Lefevre wrote: On input, using null bytes may be better if one wants to be able to match real replacement characters without false positives. Maybe, though this is no place to get fancy. It's simple to tell users an invalid byte acts like '?'. Simple is good. Anyway, this is a