Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-20 Thread Norihiro Tanaka
On Mon, 19 Dec 2016 15:38:12 -0800 Paul Eggert wrote: > but the old 'replace' called 'delete' up to N times, Yes, but constraint == 0 does not happen mostly, so in delete() in "while" does not pass normally. > Anyway, I verified that the change improved performance on the

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-19 Thread Paul Eggert
On 12/19/2016 02:38 PM, Norihiro Tanaka wrote: BTW, What case do you the worst? One more I think previous 'replace' is not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) . Well, perhaps I misunderstood it, but the old 'replace' called 'delete' up to N times, and 'delete' took O(log N) to

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-19 Thread Norihiro Tanaka
On Sun, 18 Dec 2016 23:48:10 -0800 Paul Eggert wrote: > >> 'delete' is > >> O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). > >> 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). > >> I haven't looked into how likely the worst-case

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-18 Thread Paul Eggert
'delete' is O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). I haven't looked into how likely the worst-case performance is, though. Yes. It is same both before and after with my proposed patch, but It

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-17 Thread Norihiro Tanaka
On Wed, 14 Dec 2016 17:19:27 -0800 Paul Eggert wrote: > I was referring to code with his proposed patch installed. 'delete' is > O(N); 'replace' calls 'delete' in a loop and is therefore O(N**2). > 'epsclosure' calls 'replace' in a loop and so I suppose it is O(N**3). > I

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-14 Thread Paul Eggert
On 12/12/2016 03:49 AM, Bruno Haible wrote: Part of the problem appears to be that position-set merging, even with his latest proposed changes, is O(N**2) where N is the pattern size I'm confused. Which code are you talking about? I was referring to code with his proposed patch

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Trevor Cordes
On 2016-12-12 Paul Eggert wrote: > grep should just work. It is > not working now and we need to fix that By that rationale, the commit that causes the problems for me should be backed out as a first step, and then a proper solution should be found. If you look at the commit in isolation, it: -

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Paul Eggert
On 12/12/2016 02:36 AM, Trevor Cordes wrote: If a user bothered to specify fgrep or -F then that user knows what they are doing! Here I have to disagree, and to agree with Bruno. The user should not have to know what grep's algorithm is. grep should just work. It is not working now and we

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Trevor Cordes
On 2016-12-12 Norihiro Tanaka wrote: > > On my box the above runs for >2m (never completes before I ^C) on > > the version **AFTER** the commits (v2.22). On the test build just > > *BEFORE* the commits (2.21.73-8058), it runs in <2s. So for me, I > > had a working command (-F -w -f) that used to

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Norihiro Tanaka
On Sun, 11 Dec 2016 18:55:32 +0100 Bruno Haible wrote: > Norihiro Tanaka wrote: > > dfa matcher is not always slower than kws matcher. ... > > It's a trade-off. Can you have any idea to select the better > > matcher for both two cases? > > There are at least two approaches. >

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-12 Thread Bruno Haible
Hi Paul, > Before installing anything like that, I'd first like to look into improving > the > DFA performance, along the lines suggested by Norhiro Tanaka. Part of the > problem appears to be that position-set merging, even with his latest > proposed > changes, is O(N**2) where N is the

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Paul Eggert
arn...@skeeve.com wrote: I'm sure that Paul and Jim would welcome patches. I wrote a patch that prefers -F when the input looks like a troublesome case. It merely uses heuristics though, nothing scientific like what Bruno Haible suggested. Before installing anything like that, I'd first

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread arnold
Bruno Haible wrote: > Finally, code this formula into the 'grep' program. I'm sure that Paul and Jim would welcome patches. Arnold

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Bruno Haible
Trevor Cordes wrote: > I've read in numerous places (O'Reilly books mostly) that grep or pcre > is often/sometimes faster than fgrep, so I think it is (somewhat) common > knowledge and I wouldn't worry too much about that perception. It is wrong to put the burden of the algorithm choice on the

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Norihiro Tanaka
On Sun, 11 Dec 2016 05:28:56 -0600 Trevor Cordes wrote: > On my box the above runs for >2m (never completes before I ^C) on the > version **AFTER** the commits (v2.22). On the test build just *BEFORE* > the commits (2.21.73-8058), it runs in <2s. So for me, I had a

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-11 Thread Trevor Cordes
On 2016-12-11 Norihiro Tanaka wrote: > The changes switch used algorithm. They convert grep -w -F to grep > -w. Hi, thanks for helping! Sorry, yes, I forgot I was using --fixed-strings (-F), so yes, my example should have used -F. > Try following test case and before and after the changes,

Re: bug#22357: grep -f not only huge memory usage, but also huge time cost

2016-12-10 Thread Norihiro Tanaka
On Fri, 9 Dec 2016 01:24:19 -0600 Trevor Cordes wrote: > I bisected this bug to commits: > 662b19f2d0edc0bf07f1a2c1421080252df4a37c > 468d5217ed6ec1679512dec208c7f30fb8612957 > (can't narrow it down because the latter doesn't compile for me) The changes switch used