Re: Official request: Please make GNU grep the default

2010-08-23 Thread Gabor Kovesdan

 Em 2010.08.19. 23:42, b. f. escreveu:

Gabor:

One more thing to look into, in addition to the context problems,
ndisgen breakage, and problems on certain file systems:

At r211506, 'grep -wq' does not seem to work properly (in the very
least, it is not the same as with GNU grep), and has broken the
'check-categories' target (and hence builds) of all ports with 'lisp'
in CATEGORIES.


Thanks, I added to my TODO list.


P.S. I hope that you have recovered from your influenza, and are
feeling better now.
Oh, thanks, I'm fine now but I'm moving soon to another country, so will 
be busy for some time. But I've changed back the default to GNU grep, so 
now the fixes aren't so urgent.


Gabor
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-23 Thread Dag-Erling Smørgrav
b. f. bf1...@googlemail.com writes:
 Dag-Erling Smørgrav d...@des.no writes:
  Does not seem to work properly is not a very useful statement.  The
  least you could do is provide an example.
 I did provide an example, later in the same sentence that you quoted.

I forgot to answer this part.  By example, I mean an actual grep command
line and sample input that demonstrates the problem, the smaller the
better:

% echo elisp lisp | grep -w lisp  echo good || echo bad
elisp lisp
good
% echo elisp lisp | grep -wq lisp  echo good || echo bad
bad

No idea what causes it, but a quick grep (hah!) for qflag turns up the
following horror:

/* Find out the correct return value according to the
   results and the command line option. */
exit(c ? (notfound ? (qflag ? 0 : 2) : 0) : (notfound ? 2 : 1));

which shows that -q *does* affect the exit code, but my brain refuses to
try to understand that code.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-23 Thread Dag-Erling Smørgrav
Dag-Erling Smørgrav d...@des.no writes:
 No idea what causes it, but a quick grep (hah!) for qflag turns up the
 following horror:

 /* Find out the correct return value according to the
results and the command line option. */
 exit(c ? (notfound ? (qflag ? 0 : 2) : 0) : (notfound ? 2 : 1));

 which shows that -q *does* affect the exit code, but my brain refuses to
 try to understand that code.

My brain is in need of a break from $REALJOB.  POSIX says

EXIT STATUS

The following exit values shall be returned:

 0
One or more lines were selected.
 1
No lines were selected.
1
An error occurred.

CONSEQUENCES OF ERRORS

If the -q option is specified, the exit status shall be zero if an
input line is selected, even if an error was detected. Otherwise,
default actions shall be performed.

I suppose c is supposed to indicate, in some manner, whether an error
occurred, but it's hard to tell; the code seems almost deliberately
obfuscated.  The name gives no clue whatsoever as to its meaning.  It is
incremented like a counter, but tested like a boolean.  Its value is
derived from the value returned by procfile(), but that value is also
named c, and is derived from values returned by other functions which
I could not be bothered to track down.  In any case -

c   notfoundqflag   result
truetruetrue0
truetruefalse   2
truefalse   true0
truefalse   false   0
false   truetrue2
false   truefalse   2
false   false   true1
false   false   false   1

By this point, my brain is tied into the shape of a pretzel, but it
looks like c might actually be a count of matching lines and notfound
might be an error flag.  I give it -10 for calling the count c instead
of count or matches (I'm being generous because c is the first
letter of count), another -10 for testing it as a boolean instead of
comparing it to 0, -1,000 for calling the error flag notfound, and
-1,000,000 for writing code so convoluted that, even with the source
code in front of me, I had to reverse-engineer it to figure out what it
does.  I think that adds up to -1,001,020.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-23 Thread Dimitry Andric
On 2010-08-20 23:07, b. f. wrote:
 The lisp category is the only category that causes a problem with the
 new bsdgrep, and I didn't take the time to analyze why ( which is why
 I was not more specific in my original message). 'lisp' is preceded by
 'elisp', which would normally be a match for the 'lisp' in a port
 Makefile, were it not for the -w flag.  'x11' succeeds, but it
 precedes all of the x11-* categories.  I suspect that there is an
 error in the logic of either the -w or the -q flag implementation in
 bsdgrep, which causes problems when the two options are used together.
 The target succeeds as expected with GNU grep.

Can you please try the following patch?  I think this solves more than
one problem in bsdgrep's logic, but it needs to be reviewed by Gabor and
others.

In usr.bin/grep/util.c, in the function procline(), there is the
following fragment:

290 /* Loop to process the whole line */
291 while (st = l-len) {
[...]
295 /* Loop to compare with all the patterns */
296 for (i = 0; i  patterns; i++) {
[... sets r to 0 if a match was found ...]
336 if (r == 0) {
[...]
341 /* matches - skip further patterns 
*/
342 if ((color != NULL  !oflag) || 
qflag || lflag)
343 break;
344 }
345 }
[...]
351 /* One pass if we are not recording matches */
352 if ((color != NULL  !oflag) || qflag || lflag)
353 break;
[...]
357 }

If during the first iteration of the loop to process the whole line
no match was found (for example, if doing a word search for lisp and
the string elisp is found instead), AND the -q option was used, the
test in line 352 aborts the whole loop, without searching any further!
Thus it will miss the lisp string later in the line.

It looks like line 352 should only be evaluated if the for() loop just
above it resulted in one or or more matches, so it is probably easiest
to just replace line 343 with a goto that jumps out of the two enclosing
loops (it is still a pity C does not have a break 2 statement), and
delete lines 351..353 entirely.

However, if there are unintended side effects, for example with weird
combinations of the --color, -o and/or -l flags, please let me know. :)
diff --git a/usr.bin/grep/util.c b/usr.bin/grep/util.c
index c65d8f2..97a14f3 100644
--- a/usr.bin/grep/util.c
+++ b/usr.bin/grep/util.c
@@ -340,7 +340,7 @@ procline(struct str *l, int nottext)
matches[m++] = pmatch;
/* matches - skip further patterns */
if ((color != NULL  !oflag) || qflag 
|| lflag)
-   break;
+   goto skip;
}
}
 
@@ -348,9 +348,6 @@ procline(struct str *l, int nottext)
c = !c;
break;
}
-   /* One pass if we are not recording matches */
-   if ((color != NULL  !oflag) || qflag || lflag)
-   break;
 
if (st == (size_t)pmatch.rm_so)
break;  /* No matches */
@@ -358,6 +355,7 @@ procline(struct str *l, int nottext)
} else
c = !vflag;
 
+skip:
if (c  binbehave == BINFILE_BIN  nottext)
return (c); /* Binary file */
 
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Official request: Please make GNU grep the default

2010-08-23 Thread Alan Cox

Dag-Erling Smørgrav wrote:

Alan Cox a...@cs.rice.edu writes:
  

Here is what actually puzzles me about these results.  With
traditional I/O, even after the optimizations to bsdgrep, the system
time for gnugrep is still less than half that of the optimized
bsdgrep.  I haven't looked at the changes, but I would have thought
the system time for gnugrep and bsdgrep would be almost the same.



Two reasons:

1) BSD grep does tons of unnecessary memory-to-memory copy operations in
   grep_fgetln().

2) GNU grep has its own highly optimized regex code.

  


Umm, not really.  Notice that I said system time not user time.  
Even after the recent changes to optimize the I/O in bsdgrep, Dimitry's 
results show that bsdgrep is spending more than twice as much time in 
the kernel as gnugrep.  That said, in the end, you may be right in the 
sense that the user space inefficiencies may indirectly result in more 
cache misses in the kernel because the additional user space memory used 
by bsdgrep displaces more kernel data from the cache between system 
calls.  However, I would not jump to that conclusion.  The explanation 
for the difference in system time may be more straightforward and easy 
to fix.


It would be nice to see a comparison of bsdgrep and gnugrep using 
pmcstat to profile L2 cache misses.  That might be enlightening.


Alan

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Adrian Chadd
2010/8/19 Dag-Erling Smørgrav d...@des.no:

  time   seconds   seconds    calls  ms/call  ms/call  name
  38.8       0.03     0.03    12717     0.00     0.00  memchr [5]
  35.6       0.07     0.03      395     0.08     0.08  _read [6]
  16.4       0.08     0.01        0  100.00%           _mcount [7]

memchr()? Really?

I've just looked at grep_fgetln(). Surely memchr() isn't required there.



Adrian
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Dag-Erling Smørgrav
Doug Barton do...@freebsd.org writes:
 There are 2 questions, did I do the right thing, and how should people
 report problems in general. As for myself, while I have some facility
 in C it's not my strong suit. Yes, I could have produced a profiling
 version of grep, but it would have taken me a lot more than 10 minutes
 because I don't even build the profiled libs on a regular basis.

That's fair.  I didn't use to do that because it increases build time
considerably, but now I do.

If you have profiling libraries installed, you can build a profiling
version of grep (or any program) like so:

% cd /usr/src/usr.bin/grep
% make clean
% make DEBUG_FLAGS=-pg -g -DNO_SHARED

Do *not make install, because the result will be dog slow and you don't
want to use it in production.  Every time you run it, it will leave a
file named grep.gmon in your current directory, so:

% /usr/obj/usr/src/usr.bin/grep/grep -q ^xfce4-wm /usr/ports/INDEX-9
% gprof /usr/obj/usr/src/usr.bin/grep/grep grep.gmon | less

 More generally however I think that we need to be realistic with what
 we expect people to do about reporting problems.

This is not about what we expect people to do but about what I expect
*you*, an experienced FreeBSD committer, to do :)

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


[SPAM] Re: Official request: Please make GNU grep the default

2010-08-20 Thread Niclas Zeising

On 2010-08-20 11:10, Dag-Erling Smørgrav wrote:


If you have profiling libraries installed, you can build a profiling
version of grep (or any program) like so:

% cd /usr/src/usr.bin/grep
% make clean
% make DEBUG_FLAGS=-pg -g -DNO_SHARED

Do *not make install, because the result will be dog slow and you don't
want to use it in production.  Every time you run it, it will leave a
file named grep.gmon in your current directory, so:



As a side note, this maybe should be posted in the handbook and/or on 
the wiki for reference, so people who want to profile slow applications 
know how to do it.


Regards!
//Niclas
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Dag-Erling Smørgrav
Adrian Chadd adr...@freebsd.org writes:
 I've just looked at grep_fgetln(). Surely memchr() isn't required there.

Of course it is, how else are you going to locate the '\n'?  OTOH, I'm
not sure grep_fgetln() is needed at all.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Adrian Chadd
2010/8/20 Dag-Erling Smørgrav d...@des.no:
 Adrian Chadd adr...@freebsd.org writes:
 I've just looked at grep_fgetln(). Surely memchr() isn't required there.

 Of course it is, how else are you going to locate the '\n'?  OTOH, I'm
 not sure grep_fgetln() is needed at all.

It seems a bit strange that memchr(), which should be hitting data
that's in the cache (as it was recently read, right?) is showing up so
high in the profiling results. memchr() in libc/string/memchr.c looks
like how I'd inline it, so, hm.

Have you tried this in pmc?


Adrian
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Dag-Erling Smørgrav
Adrian Chadd adr...@freebsd.org writes:
 Have you tried this in pmc?

No.  I can't figure out how to use pmcstat, but I did find a bug in it:
if you specify an output file with -o, but the command line is otherwise
incomplete or incorrect, it will print usage information to the output
file instead of stderr.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Dag-Erling Smørgrav
Alan Cox a...@cs.rice.edu writes:
 Here is what actually puzzles me about these results.  With
 traditional I/O, even after the optimizations to bsdgrep, the system
 time for gnugrep is still less than half that of the optimized
 bsdgrep.  I haven't looked at the changes, but I would have thought
 the system time for gnugrep and bsdgrep would be almost the same.

Two reasons:

1) BSD grep does tons of unnecessary memory-to-memory copy operations in
   grep_fgetln().

2) GNU grep has its own highly optimized regex code.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread Dag-Erling Smørgrav
b. f. bf1...@googlemail.com writes:
 At r211506, 'grep -wq' does not seem to work properly (in the very
 least, it is not the same as with GNU grep),

Does not seem to work properly is not a very useful statement.  The
least you could do is provide an example.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-20 Thread b. f.
On 8/20/10, Dag-Erling Smørgrav d...@des.no wrote:
 b. f. bf1...@googlemail.com writes:
 At r211506, 'grep -wq' does not seem to work properly (in the very
 least, it is not the same as with GNU grep),

 Does not seem to work properly is not a very useful statement.  The
 least you could do is provide an example.

I did provide an example, later in the same sentence that you quoted.
Using a current ports tree, go to a port with 'lisp' in CATEGORIES,
and run any ports target that requires 'check-categories', e.g.:

make -C /usr/ports/math/maxima check-categories
maxima-5.22.1: Makefile error: category lisp not in list of valid categories.
*** Error code 1

Stop in /mnt/disk2/usr/ports/math/maxima.

From bsd.port.mk:

   2941 VALID_CATEGORIES+= accessibility afterstep arabic archivers
astro audio \
   2942 benchmarks biology cad chinese comms converters databases \
   2943 deskutils devel docs dns editors elisp emulators
finance french ftp \
   2944 games geography german gnome gnustep graphics hamradio
haskell hebrew hungarian \
   2945 ipv6 irc japanese java kde kld korean lang linux lisp \
   2946 mail math mbone misc multimedia net net-im net-mgmt
net-p2p news \
   2947 palm parallel pear perl5 plan9 polish portuguese ports-mgmt \
   2948 print python ruby rubygems russian \
   2949 scheme science security shells spanish sysutils \
   2950 tcl textproc tk \
   2951 ukrainian vietnamese windowmaker www \
   2952 x11 x11-clocks x11-drivers x11-fm x11-fonts
x11-servers x11-themes \
   2953 x11-toolkits x11-wm xfce zope
   2954
   2955 check-categories:
   2956 .for cat in ${CATEGORIES}
   2957 @if ${ECHO_CMD} ${VALID_CATEGORIES} | ${GREP} -wq ${cat}; then \
   2958 ${TRUE}; \
   2959 else \
   2960 ${ECHO_MSG} ${PKGNAME}: Makefile error:
category ${cat} not in list of valid categories.   2960 ; \
   2961 ${FALSE}; \
   2962 fi
   2963 .endfor

A closer look at VALID_CATEGORIES, using vis -oltw:

VALID_CATEGORIES+=\040accessibility\040afterstep\040arabic\040archivers\040astro\040audio\040\\\$
\011benchmarks\040biology\040cad\040chinese\040comms\040converters\040databases\040\\\$
\011deskutils\040devel\040docs\040dns\040editors\040elisp\040emulators\040finance\040french\040ftp\040\\\$
\011games\040geography\040german\040gnome\040gnustep\040graphics\040hamradio\040haskell\040hebrew\040hungarian\040\\\$
\011ipv6\040irc\040japanese\040java\040kde\040kld\040korean\040lang\040linux\040lisp\040\\\$
\011mail\040math\040mbone\040misc\040multimedia\040net\040net-im\040net-mgmt\040net-p2p\040news\040\\\$
\011palm\040parallel\040pear\040perl5\040plan9\040polish\040portuguese\040ports-mgmt\040\\\$
\011print\040python\040ruby\040rubygems\040russian\040\\\$
\011scheme\040science\040security\040shells\040spanish\040sysutils\040\\\$
\011tcl\040textproc\040tk\040\\\$
\011ukrainian\040vietnamese\040windowmaker\040www\040\\\$
\011x11\040x11-clocks\040x11-drivers\040x11-fm\040x11-fonts\040x11-servers\040x11-themes\040\\\$
\011x11-toolkits\040x11-wm\040xfce\040zope\$

The lisp category is the only category that causes a problem with the
new bsdgrep, and I didn't take the time to analyze why ( which is why
I was not more specific in my original message). 'lisp' is preceded by
'elisp', which would normally be a match for the 'lisp' in a port
Makefile, were it not for the -w flag.  'x11' succeeds, but it
precedes all of the x11-* categories.  I suspect that there is an
error in the logic of either the -w or the -q flag implementation in
bsdgrep, which causes problems when the two options are used together.
The target succeeds as expected with GNU grep.

b.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread David Xu

Gabor Kovesdan wrote:

Yes, I'm sorry for my slow reaction, I got a flu some time ago and that 
prevented me from fixing the bugs earlier. I have several fixes in my 
working copy, which are being discussed with my mentor. Probably, today 
or tomorrow they will be committed.


Gabor



When will the grep -H print file name for me ?  it is rather painful 
that the feature is missing. :-(

So I can not use it with find:

find . -exec grep -H {} world \;
I don't know which file contains the word world.

Regards,
David Xu

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Dag-Erling Smørgrav
M. Warner Losh i...@bsdimp.com writes:
 So making it default turned out well in the end.  Sure, there was pain
 involved (but this is current), but making it default exposed the pain
 that would otherwise have gone unnoticed.  The big hue and cry, while
 excessive at times, did result in people actually running the
 profiling tools which pointed to the buffering as the number one thing
 to fix.

There is a lesson here: people who are unsatisfied with the performance
of ${TOOL} should profile it before they start a flamefest on -current.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Stein Morten Sandbech
Hi,

GNU grep is OK.  However standard BSD grep also work:

find . -exec grep -i world {} /dev/null \;

or even:

find . -exec grep -in world {} /dev/null \;

if you want linenumbers ...

hth

Stein Morten



On Aug 19, 2010, at 11:29, freebsd-current-requ...@freebsd.org wrote:

 Date: Thu, 19 Aug 2010 16:42:26 +
 From: David Xu davi...@freebsd.org
 Subject: Re: Official request: Please make GNU grep the default
 To: Gabor Kovesdan ga...@freebsd.org
 Cc: delp...@freebsd.org, Andrey Chernov a...@nagual.pp.ru,  Doug
   Barton do...@freebsd.org, c...@freebsd.org, curr...@freebsd.org
 Message-ID: 4c6d5ef2.2040...@freebsd.org
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed
 
 Gabor Kovesdan wrote:
 
 Yes, I'm sorry for my slow reaction, I got a flu some time ago and that 
 prevented me from fixing the bugs earlier. I have several fixes in my 
 working copy, which are being discussed with my mentor. Probably, today 
 or tomorrow they will be committed.
 
 Gabor
 
 
 When will the grep -H print file name for me ?  it is rather painful 
 that the feature is missing. :-(
 So I can not use it with find:
 
 find . -exec grep -H {} world \;
 I don't know which file contains the word world.
 
 Regards,
 David Xu

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread V. T. Mueller, Continum


Dag-Erling Smørgrav wrote:

There is a lesson here: people who are unsatisfied with the performance
of ${TOOL} should profile it before they start a flamefest on -current.


If you're alluding to Dougs original email, I will strictly disagree.
He found a performance issue which noone had seen or brought up before 
and gave feedback to Gabor in a constructive and distinctively polite 
manner.


I would also second, that changing back the default immediately then 
would have been the better choice.


That others supported Gabor in doing profiling and suggesting 
improvements is a fine thing, and contributes to what makes this 
community stand out from others.


In other words: as long as there are unresolved issues, the default 
should be set to GNU grep. This doesn't stop anyone from improving the 
BSD grep we're all waiting for. It only does good to those who rely on 
using grep - expecting correctness and speed.


My $0.02
vt

--
Volker T. Mueller
Continum AG
Bismarckallee 7d
79098 Freiburg i. Br.
Tel. +49 761 21711171
Fax. +49 761 21711198
http://www.continum.net

Sitz der Gesellschaft: Freiburg im Breisgau
Registergericht: Amtsgericht Freiburg, HRB 6866
Vorstand: Rolf Mathis, Volker T. Mueller
Vorsitzender d. Aufsichtsrats: Prof. Dr. Karl-F. Fischbach

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Dag-Erling Smørgrav
V. T. Mueller, Continum v.t.muel...@continum.net writes:
 If you're alluding to Dougs original email, I will strictly disagree.
 He found a performance issue which noone had seen or brought up before
 and gave feedback to Gabor in a constructive and distinctively polite
 manner.

It would have been far more constructive and distinctively polite to
take ten minutes to build and run a profiling version of grep, and
include the results in the OP.

 I would also second, that changing back the default immediately then
 would have been the better choice.

No, it would only have ensured that nobody except Gabor used it.

 In other words: as long as there are unresolved issues, the default
 should be set to GNU grep. This doesn't stop anyone from improving the
 BSD grep we're all waiting for. It only does good to those who rely on
 using grep - expecting correctness and speed.

Based on my 12 years of experience in this project, you are very, very
wrong.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread V. T. Mueller, Continum

Dag-Erling Smørgrav wrote:

In other words: as long as there are unresolved issues, the default
should be set to GNU grep. This doesn't stop anyone from improving the
BSD grep we're all waiting for. It only does good to those who rely on
using grep - expecting correctness and speed.


Based on my 12 years of experience in this project, you are very, very
wrong.


An 'argumentation' like the above is simply a killer phrase that ends 
every discussion.
OTOH, absence of valid argumentation doesn't necessarily mean that your 
statement has to be wrong.
I'm always willing to learn. Since you've also tried to correct me about 
constructive and polite behaviour - might I suggest that you add a few 
words about what is right in your opinion, when telling folks they are 
wrong? This would add to both, I guess.


Cheers
vt

--
Volker T. Mueller
Continum AG
Bismarckallee 7d
79098 Freiburg i. Br.
Tel. +49 761 21711171
Fax. +49 761 21711198
http://www.continum.net

Sitz der Gesellschaft: Freiburg im Breisgau
Registergericht: Amtsgericht Freiburg, HRB 6866
Vorstand: Rolf Mathis, Volker T. Mueller
Vorsitzender d. Aufsichtsrats: Prof. Dr. Karl-F. Fischbach

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Dag-Erling Smørgrav
V. T. Mueller, Continum v.t.muel...@continum.net writes:
 Dag-Erling Smørgrav d...@des.no writes:
  Based on my 12 years of experience in this project, you are very,
  very wrong.
 An 'argumentation' like the above is simply a killer phrase that ends
 every discussion.

An 'argumentation' like the above is simply a killer phrase that ends
every discussion, especially when it's based on selective quoting.
Here's what you left out:

   I would also second, that changing back the default immediately then
   would have been the better choice.
  No, it would only have ensured that nobody except Gabor used it.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Andrew Milton
+---[ V. T. Mueller, Continum ]--
| 
| In other words: as long as there are unresolved issues, the default 
| should be set to GNU grep. This doesn't stop anyone from improving the 
| BSD grep we're all waiting for. It only does good to those who rely on 
| using grep - expecting correctness and speed.

When did -current become -stable?

-- 
Andrew Milton
a...@theinternet.com.au
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread John Baldwin
On Wednesday, August 18, 2010 5:54:41 pm Dimitry Andric wrote:
 On 2010-08-18 23:12, Dimitry Andric wrote:
  And one trial is not statistically valid - especially given the small
  differences.  How about multiple multiple trials with ministat.
  
  The result were averages of three trials
 
 Actually, since I kept using Doug's original grep-time-trial.sh, each of
 the three 'trials' consisted of running grep 100 times, and the listed
 time was the total elapsed time for those 100 runs.  So I assume that
 will reasonably average out the differences between each individual run?

You need the distribution, not just the averages so you can detect outliers 
and determine the standard deviation and confidence intervals.  You could use 
ministat on a file that contained all 100 runtimes perhaps.  I would use at 
least 10 trials though, 3 is a bit small.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Dag-Erling Smørgrav
Gabor Kovesdan ga...@freebsd.org writes:
 I've just committed a patch with the kind help of Dimitry Andric,
 which gives BSD grep a huge performance boost. The performance is now
 almost comparable to GNU grep.

Not quite, as Doug pointed out.  I don't know what benchmark you're
using, but I'm using a greatly simplified variant of Doug's:

% time sh -c 'for n in $(jot 1000) ; do /usr/obj/usr/src/usr.bin/grep/grep -q 
^xfce4-wm /usr/ports/INDEX-9 ; done' 
sh -c   13.57s user 7.06s system 99% cpu 20.783 total
% time sh -c 'for n in $(jot 1000) ; do /usr/obj/usr/src/gnu/usr.bin/grep/grep 
-q ^xfce4-wm /usr/ports/INDEX-9 ; done'
sh -c   7.98s user 7.47s system 100% cpu 15.424 total

The bottleneck is now in quite an unexpected location:

  %   cumulative   self  self total   
 time   seconds   secondscalls  ms/call  ms/call  name
 38.8   0.03 0.0312717 0.00 0.00  memchr [5]
 35.6   0.07 0.03  395 0.08 0.08  _read [6]
 16.4   0.08 0.010  100.00%   _mcount [7]
  1.7   0.08 0.0012362 0.00 0.00  memset [9]
  1.5   0.08 0.000  100.00%   .mcount (110)
  1.5   0.08 0.000   43.41%   re_search_internal [8]
  0.8   0.08 0.00  820 0.00 0.00  memcpy [12]
  0.6   0.09 0.0037045 0.00 0.00  free [13]
  0.6   0.09 0.0012332 0.00 0.01  grep_fgetln [4]
  0.6   0.09 0.001 0.4966.27  procfile [3]
  0.4   0.09 0.000  100.00%   
re_string_construct_common [26]
  0.3   0.09 0.001 0.25 0.34  _Read_RuneMagi [27]
  0.1   0.09 0.00  261 0.00 0.00  arena_avail_comp [39]
  0.1   0.09 0.00  155 0.00 0.00  arena_malloc [24]
  0.1   0.09 0.00  153 0.00 0.00  arena_bin_malloc_easy [40]
  0.1   0.09 0.00   54 0.00 0.00  arena_avail_tree_insert 
[35]
  0.1   0.09 0.005 0.02 0.02  arena_purge [37]
  0.1   0.09 0.003 0.04 0.44  setlocale [10]
  0.1   0.09 0.001 0.12 0.46  __wrap_setrunelocale [21]
  0.1   0.09 0.000   21.76%   re_string_destruct [14]
  0.1   0.09 0.000  100.00%   regexec [38]

The culprit seems to be the first memchr() in grep_fgetln().  For some
reason, even with -O2, it is not inlined:

% echo disassemble grep_fgetln | gdb -q -batch -x /dev/stdin 
/usr/obj/usr/src/usr.bin/grep/grep | grep memchr
0x0040291e grep_fgetln+244:   callq  0x40176c memchr
0x004029fa grep_fgetln+464:   callq  0x40176c memchr

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Dimitry Andric
On 2010-08-17 23:24, Alan Cox wrote:
 So normal mmap is ~3% slower, and prefault mmap does not seem to make
 any measurable difference.  I guess the added complexity is not really
 worth it, for now.
 
 Do you know what fraction of this time is being spent in the kernel?

I ran 100 trials again, but now using time -a -o logfile, so I could
run ministat over the accumulated results.  This gives:

x gnugrep
+ bsdgrep-r210927 (the initial version that started this thread)
* bsdgrep-r211490 (current version)
% bsdgrep-r211490-mmap-plain
# bsdgrep-r211490-mmap-prefault

Real time:
N   Min   MaxMedian   AvgStddev
x 100  1.15  1.98  1.181.21220.11159613
+ 100  8.57 14.26  8.799.1823 1.0496126
* 100  2.81  6.57  2.913.0189 0.4304259
% 100  2.34  4.03  2.993.00220.12635992
# 100  2.85  3.49  2.882.8981   0.075232904

User time:
N   Min   MaxMedian   AvgStddev
x 100 0  0.07  0.030.0239   0.015627934
+ 100   1.6  3.33   1.9 1.9760.30264824
* 100  0.29 1  0.390.40040.08696824
% 100   1.8  3.56  2.732.72740.13260117
# 100  2.78  3.04  2.812.82380.04039652

System time:
N   Min   MaxMedian   AvgStddev
x 100  1.08  1.91  1.151.18090.10953617
+ 100  6.55  10.9  6.947.19050.77911809
* 100  2.38   5.5  2.532.60610.35068445
% 100  0.18  0.53  0.250.2645   0.053586049
# 100  0.03  0.54  0.060.0668   0.052259647

E.g. it looks like bsdgrep with 'plain' mmap performs almost the same
as the regular bsdgrep (both around 3.0s average), but with mmap much
more of the time is spent in user mode.

And it seems prefaulting does help now!  I guess it also makes sense to
add madvise(..., MADV_SEQUENTIAL)?


 Does
 the value of sysctl vm.pmap.pde.mappings increase as a result of your
 test?  If not, there is still room for improvement in the results with
 mmap().

It always stays at 0, I have never seen any other value.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Tijl Coosemans
On Thursday 19 August 2010 15:38:54 Dag-Erling Smørgrav wrote:
 Gabor Kovesdan ga...@freebsd.org writes:
 I've just committed a patch with the kind help of Dimitry Andric,
 which gives BSD grep a huge performance boost. The performance is
 now almost comparable to GNU grep.
 
 Not quite, as Doug pointed out.  I don't know what benchmark you're
 using, but I'm using a greatly simplified variant of Doug's:
 
 % time sh -c 'for n in $(jot 1000) ; do /usr/obj/usr/src/usr.bin/grep/grep -q 
 ^xfce4-wm /usr/ports/INDEX-9 ; done'
 sh -c   13.57s user 7.06s system 99% cpu 20.783 total
 % time sh -c 'for n in $(jot 1000) ; do 
 /usr/obj/usr/src/gnu/usr.bin/grep/grep -q ^xfce4-wm /usr/ports/INDEX-9 ; 
 done'
 sh -c   7.98s user 7.47s system 100% cpu 15.424 total
 
 The bottleneck is now in quite an unexpected location:
 
   %   cumulative   self  self total   
  time   seconds   secondscalls  ms/call  ms/call  name
  38.8   0.03 0.0312717 0.00 0.00  memchr [5]
  35.6   0.07 0.03  395 0.08 0.08  _read [6]
  16.4   0.08 0.010  100.00%   _mcount [7]
   1.7   0.08 0.0012362 0.00 0.00  memset [9]
   1.5   0.08 0.000  100.00%   .mcount (110)
   1.5   0.08 0.000   43.41%   re_search_internal [8]
   0.8   0.08 0.00  820 0.00 0.00  memcpy [12]
   0.6   0.09 0.0037045 0.00 0.00  free [13]
   0.6   0.09 0.0012332 0.00 0.01  grep_fgetln [4]
   0.6   0.09 0.001 0.4966.27  procfile [3]
   0.4   0.09 0.000  100.00%   
 re_string_construct_common [26]
   0.3   0.09 0.001 0.25 0.34  _Read_RuneMagi [27]
   0.1   0.09 0.00  261 0.00 0.00  arena_avail_comp [39]
   0.1   0.09 0.00  155 0.00 0.00  arena_malloc [24]
   0.1   0.09 0.00  153 0.00 0.00  arena_bin_malloc_easy 
 [40]
   0.1   0.09 0.00   54 0.00 0.00  arena_avail_tree_insert 
 [35]
   0.1   0.09 0.005 0.02 0.02  arena_purge [37]
   0.1   0.09 0.003 0.04 0.44  setlocale [10]
   0.1   0.09 0.001 0.12 0.46  __wrap_setrunelocale 
 [21]
   0.1   0.09 0.000   21.76%   re_string_destruct [14]
   0.1   0.09 0.000  100.00%   regexec [38]
 
 The culprit seems to be the first memchr() in grep_fgetln().  For some
 reason, even with -O2, it is not inlined:
 
 % echo disassemble grep_fgetln | gdb -q -batch -x /dev/stdin 
 /usr/obj/usr/src/usr.bin/grep/grep | grep memchr
 0x0040291e grep_fgetln+244: callq  0x40176c memchr
 0x004029fa grep_fgetln+464: callq  0x40176c memchr

The base system gcc doesn't have a built-in version of memchr to inline.


signature.asc
Description: This is a digitally signed message part.


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Alan Cox

Dimitry Andric wrote:

On 2010-08-17 23:24, Alan Cox wrote:
  

So normal mmap is ~3% slower, and prefault mmap does not seem to make
any measurable difference.  I guess the added complexity is not really
worth it, for now.
  

Do you know what fraction of this time is being spent in the kernel?



I ran 100 trials again, but now using time -a -o logfile, so I could
run ministat over the accumulated results.  This gives:

x gnugrep
+ bsdgrep-r210927 (the initial version that started this thread)
* bsdgrep-r211490 (current version)
% bsdgrep-r211490-mmap-plain
# bsdgrep-r211490-mmap-prefault

Real time:
N   Min   MaxMedian   AvgStddev
x 100  1.15  1.98  1.181.21220.11159613
+ 100  8.57 14.26  8.799.1823 1.0496126
* 100  2.81  6.57  2.913.0189 0.4304259
% 100  2.34  4.03  2.993.00220.12635992
# 100  2.85  3.49  2.882.8981   0.075232904

User time:
N   Min   MaxMedian   AvgStddev
x 100 0  0.07  0.030.0239   0.015627934
+ 100   1.6  3.33   1.9 1.9760.30264824
* 100  0.29 1  0.390.40040.08696824
% 100   1.8  3.56  2.732.72740.13260117
# 100  2.78  3.04  2.812.82380.04039652

System time:
N   Min   MaxMedian   AvgStddev
x 100  1.08  1.91  1.151.18090.10953617
+ 100  6.55  10.9  6.947.19050.77911809
* 100  2.38   5.5  2.532.60610.35068445
% 100  0.18  0.53  0.250.2645   0.053586049
# 100  0.03  0.54  0.060.0668   0.052259647

E.g. it looks like bsdgrep with 'plain' mmap performs almost the same
as the regular bsdgrep (both around 3.0s average), but with mmap much
more of the time is spent in user mode.

  


That makes sense to me.  With traditional I/O, such as read(2), the 
copyout to user space fills the  processor's data cache with the data to 
be processed.  Grep's core algorithm in user space shouldn't be 
experiencing cache misses to obtain the data.  By and large, the cache 
misses will have occurred in the kernel.  However, once you switch to 
mmap(2), the kernel never touches the data, and all cache misses occur 
in user space.  You ought to be able to confirm this with pmcstat's 
sampling mode set to sample L2 cache misses.


Here is what actually puzzles me about these results.  With traditional 
I/O, even after the optimizations to bsdgrep, the system time for 
gnugrep is still less than half that of the optimized bsdgrep.  I 
haven't looked at the changes, but I would have thought the system time 
for gnugrep and bsdgrep would be almost the same.



And it seems prefaulting does help now!  I guess it also makes sense to
add madvise(..., MADV_SEQUENTIAL)?

  


This won't matter as long as you are working with memory resident 
files.  With a memory resident file, it would only be a waste of cycles.


  

Does
the value of sysctl vm.pmap.pde.mappings increase as a result of your
test?  If not, there is still room for improvement in the results with
mmap().



It always stays at 0, I have never seen any other value.
  


Addressing this issue would mostly affect the system time, which is 
already tiny for mmap-prefault, so I wouldn't be concerned about this (yet).


Did you ever describe your test machine?  If so, I'm sorry, but I missed 
that.  Is it running an amd64 or i386 kernel?  Can you briefly describe 
what kind of processor and memory it has?


Regards,
Alan

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Ulrich Spörlein
On Thu, 19.08.2010 at 16:42:26 +, David Xu wrote:
 Gabor Kovesdan wrote:
 
  Yes, I'm sorry for my slow reaction, I got a flu some time ago and that 
  prevented me from fixing the bugs earlier. I have several fixes in my 
  working copy, which are being discussed with my mentor. Probably, today 
  or tomorrow they will be committed.
  
  Gabor
  
 
 When will the grep -H print file name for me ?  it is rather painful 
 that the feature is missing. :-(
 So I can not use it with find:
 
 find . -exec grep -H {} world \;
 I don't know which file contains the word world.

Workaround:

find . -exec grep word {} +

(yeah, not what you asked for ...)

Uli
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Doug Barton

On 08/19/2010 04:13, Dag-Erling Smørgrav wrote:

It would have been far more constructive and distinctively polite to
take ten minutes to build and run a profiling version of grep, and
include the results in the OP.


Meta-comment first. des and I are both people of strong opinions, and we 
agree on more than we disagree on. I have no problem with him stating 
his opinion here, and I don't care if he agrees with me after I state 
mine. :)


There are 2 questions, did I do the right thing, and how should people 
report problems in general. As for myself, while I have some facility in 
C it's not my strong suit. Yes, I could have produced a profiling 
version of grep, but it would have taken me a lot more than 10 minutes 
because I don't even build the profiled libs on a regular basis. In this 
specific case I also didn't think it was my job to do so. Gabor is the 
one developing BSD grep, as far as I'm concerned it's up to him to get 
its performance up to par. I certainly have no objection to others 
helping him, and I'm glad that raising the issue of performance has 
resulted in more attention and assistance being directed at the problem. 
But I feel that I did my part by providing simple to reproduce test 
cases that Gabor could use.


More generally however I think that we need to be realistic with what we 
expect people to do about reporting problems. We WANT more regular 
users to use -current early on in the development cycle, and if they 
see problems to report them. Chastising people for not doing profiling 
runs of things that they are reporting problems with is not going to 
accomplish that goal.



Doug

--

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Dimitry Andric
On 2010-08-19 18:42, David Xu wrote:
 When will the grep -H print file name for me ?  it is rather painful 
 that the feature is missing. :-(
 So I can not use it with find:
 
 find . -exec grep -H {} world \;
 I don't know which file contains the word world.

I think you mean:

  find . -exec grep -H world {} \;

instead?  In any case, the fix is trivial, please try the attachment.
diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c
index 3cb277c..cc710ef 100644
--- a/usr.bin/grep/grep.c
+++ b/usr.bin/grep/grep.c
@@ -682,8 +682,6 @@ main(int argc, char *argv[])
if (dirbehave == DIR_RECURSE)
c = grep_tree(aargv);
else {
-   if (aargc == 1)
-   hflag = true;
for (c = 0; aargc--; ++aargv) {
if ((finclude || fexclude)  !file_matching(*aargv))
continue;
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Official request: Please make GNU grep the default

2010-08-19 Thread b. f.
Gabor:

One more thing to look into, in addition to the context problems,
ndisgen breakage, and problems on certain file systems:

At r211506, 'grep -wq' does not seem to work properly (in the very
least, it is not the same as with GNU grep), and has broken the
'check-categories' target (and hence builds) of all ports with 'lisp'
in CATEGORIES.

Regards,
   b.

P.S. I hope that you have recovered from your influenza, and are
feeling better now.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread John Baldwin
On Thursday, August 19, 2010 10:12:11 am Dimitry Andric wrote:
 On 2010-08-17 23:24, Alan Cox wrote:
  So normal mmap is ~3% slower, and prefault mmap does not seem to make
  any measurable difference.  I guess the added complexity is not really
  worth it, for now.
  
  Do you know what fraction of this time is being spent in the kernel?
 
 I ran 100 trials again, but now using time -a -o logfile, so I could
 run ministat over the accumulated results.  This gives:
 
 x gnugrep
 + bsdgrep-r210927 (the initial version that started this thread)
 * bsdgrep-r211490 (current version)
 % bsdgrep-r211490-mmap-plain
 # bsdgrep-r211490-mmap-prefault
 
 Real time:
 N   Min   MaxMedian   AvgStddev
 x 100  1.15  1.98  1.181.21220.11159613
 + 100  8.57 14.26  8.799.1823 1.0496126
 * 100  2.81  6.57  2.913.0189 0.4304259
 % 100  2.34  4.03  2.993.00220.12635992
 # 100  2.85  3.49  2.882.8981   0.075232904
 
 User time:
 N   Min   MaxMedian   AvgStddev
 x 100 0  0.07  0.030.0239   0.015627934
 + 100   1.6  3.33   1.9 1.9760.30264824
 * 100  0.29 1  0.390.40040.08696824
 % 100   1.8  3.56  2.732.72740.13260117
 # 100  2.78  3.04  2.812.82380.04039652
 
 System time:
 N   Min   MaxMedian   AvgStddev
 x 100  1.08  1.91  1.151.18090.10953617
 + 100  6.55  10.9  6.947.19050.77911809
 * 100  2.38   5.5  2.532.60610.35068445
 % 100  0.18  0.53  0.250.2645   0.053586049
 # 100  0.03  0.54  0.060.0668   0.052259647
 
 E.g. it looks like bsdgrep with 'plain' mmap performs almost the same
 as the regular bsdgrep (both around 3.0s average), but with mmap much
 more of the time is spent in user mode.

I would add user and system time together and compare the total time.  Given 
that statclock only fires at 128 hz, and we use those counts to subdivide 
rux_runtime, I don't put much faith in user vs system time for benchmarks, 
only the total runtime in rux_runtime (which is user + system) is truly 
accurate.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread David Xu

But I think BSD grep should be compatible with GNU grep,
because almost all scripts are written for GNU grep before
BSD grep appears, it is not practical to rewrite all existing
scripts. Anyway, thanks for your help.

David Xu

Stein Morten Sandbech wrote:

Hi,

GNU grep is OK.  However standard BSD grep also work:

find . -exec grep -i world {} /dev/null \;

or even:

find . -exec grep -in world {} /dev/null \;

if you want linenumbers ...

hth

Stein Morten



On Aug 19, 2010, at 11:29, freebsd-current-requ...@freebsd.org wrote:


Date: Thu, 19 Aug 2010 16:42:26 +
From: David Xu davi...@freebsd.org
Subject: Re: Official request: Please make GNU grep the default
To: Gabor Kovesdan ga...@freebsd.org
Cc: delp...@freebsd.org, Andrey Chernov a...@nagual.pp.ru,  Doug
Barton do...@freebsd.org, c...@freebsd.org, curr...@freebsd.org
Message-ID: 4c6d5ef2.2040...@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Gabor Kovesdan wrote:

Yes, I'm sorry for my slow reaction, I got a flu some time ago and that 
prevented me from fixing the bugs earlier. I have several fixes in my 
working copy, which are being discussed with my mentor. Probably, today 
or tomorrow they will be committed.


Gabor

When will the grep -H print file name for me ?  it is rather painful 
that the feature is missing. :-(

So I can not use it with find:

find . -exec grep -H {} world \;
I don't know which file contains the word world.

Regards,
David Xu





___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Lev Serebryakov
Hello, Gabor.
You wrote 14 августа 2010 г., 20:10:56:


 2, GNU grep uses internal optimizations to get that performance. I think
 it's a wrong approach because the regex library itself should be 
 optimized instead to keep BSD grep clean and simple and to provide the
 same efficiency for all utilities that are linked to the regex library.
 Our libc-regex is definitely need to be replaced at some point in the 
 future but that's a more complex item. See the following references:
 http://wiki.freebsd.org/BSDgrep
 http://wiki.freebsd.org/Regex
  You  don't  have  these  links  on Wiki page, so I post them here. I
  hope,  you've  read  these  articles,  but it is better to duplicate
  links, than miss them.

  http://swtch.com/~rsc/regexp/regexp1.html
  http://swtch.com/~rsc/regexp/


  And  it  iw very strange to see TRE s slow, because it seems, it
  is  based  on fast linear approcach, when gnu-regexp is old, slow,
  one...

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-19 Thread Szilveszter Adam
On Thu, Aug 19, 2010 at 09:42:01PM +, b. f. wrote:
 Gabor:
 
 One more thing to look into, in addition to the context problems,
 ndisgen breakage, and problems on certain file systems:
 
 At r211506, 'grep -wq' does not seem to work properly (in the very
 least, it is not the same as with GNU grep), and has broken the
 'check-categories' target (and hence builds) of all ports with 'lisp'
 in CATEGORIES.

Seconded. This also breaks the ports using bsd.apache.mk, and what's
worse, it does so silently. I have been bitten by this myself with
www/apache22, several core modules have not been built resulting in a
useless apache installation.

So, I believe there is more to do here than just performance
optimisation.

-- 
Regards:

Szilveszter ADAM
Budapest
Hungary
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-18 Thread Gabor Kovesdan

 Em 2010.08.13. 10:43, Doug Barton escreveu:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Gabor,

I hope at this point it goes without saying that I have a lot of respect
for the work you've done on BSD grep, and I've already told you that I
think you're very courageous for taking the project on. I've been
testing and evaluating it for some time now, and I think I've given it a
fair trial. You've done a fairly good job of responding to bug reports,
and I understand that the exposure BSD grep has received as the default
in HEAD has been very valuable in exposing additional areas that need
work. However, with all that in mind I am officially asking you to
please change the default in HEAD to GNU grep. (Note, I am _not_ asking
you to remove BSD grep from the tree, just to change the default.)

My reason is simple, performance. [...]


I've just committed a patch with the kind help of Dimitry Andric, which 
gives BSD grep a huge performance boost. The performance is now almost 
comparable to GNU grep. I think with this, BSD grep may remain default 
if no other serious issues come up. Please report if you notice 
something weird.


I know about some minor issues, which aren't fixed yet. I'll be out for 
4 days as of tomorrow but when I come back I'll take care of these:

- Infinite loop when reading directory on ZFS/NFS filesystem
- Problems with context grepping

When reply, please remove core@ from CC, let's not bother them with 
this, I just wanted to let them know that I'm not neglecting this issue 
but if still demanded for a good reason, I'll switch back to default GNU 
grep.


Gabor


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-18 Thread M. Warner Losh
In message: 4c6c1cfe.6060...@freebsd.org
Gabor Kovesdan ga...@freebsd.org writes:
: When reply, please remove core@ from CC, let's not bother them with
: this, I just wanted to let them know that I'm not neglecting this
: issue but if still demanded for a good reason, I'll switch back to
: default GNU grep.

So making it default turned out well in the end.  Sure, there was pain
involved (but this is current), but making it default exposed the pain
that would otherwise have gone unnoticed.  The big hue and cry, while
excessive at times, did result in people actually running the
profiling tools which pointed to the buffering as the number one thing
to fix.  That being fixed now, it looks like we can go to the next
stage: benchmarking again.

Thanks, Gabor and everybody else that contributed, for seeing this
through.

Warner
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-18 Thread Wilko Bulte


Op 18 aug. 2010 om 18:48 heeft Gabor Kovesdan ga...@freebsd.org het volgende 
geschreven:

 Em 2010.08.13. 10:43, Doug Barton escreveu:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256
 
 Gabor,
 
 I hope at this point it goes without saying that I have a lot of respect
 for the work you've done on BSD grep, and I've already told you that I
 think you're very courageous for taking the project on. I've been
 testing and evaluating it for some time now, and I think I've given it a
 fair trial. You've done a fairly good job of responding to bug reports,
 and I understand that the exposure BSD grep has received as the default
 in HEAD has been very valuable in exposing additional areas that need
 work. However, with all that in mind I am officially asking you to
 please change the default in HEAD to GNU grep. (Note, I am _not_ asking
 you to remove BSD grep from the tree, just to change the default.)
 
 My reason is simple, performance. [...]
 
 I've just committed a patch with the kind help of Dimitry Andric, which gives 
 BSD grep a huge performance boost. The performance is now almost comparable 
 to GNU grep. I think with this, BSD grep may remain default if no other 
 serious issues come up. Please report if you notice something weird.
 
 I know about some minor issues, which aren't fixed yet. I'll be out for 4 
 days as of tomorrow but when I come back I'll take care of these:
 - Infinite loop when reading directory on ZFS/NFS filesystem
 - Problems with context grepping
 
 When reply, please remove core@ from CC, let's not bother them with this, I 
 just wanted to let them know that I'm not neglecting this issue but if still 
 demanded for a good reason,

No worries there Gabor!

Wilko

 I'll switch back to default GNU grep.
 
 Gabor
 
 
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-18 Thread Peter Jeremy
On 2010-Aug-17 22:29:46 +0200, Dimitry Andric dimi...@andric.com wrote:
On 2010-08-17 18:29, Alan Cox wrote:
 Try it again on a memory resident file with the MAP_PREFAULT_READ option
 that is provided by this patch:
 
 http://www.cs.rice.edu/~alc/MAP_PREFAULT_READ.patch

A time trial gives:

  grep with normal mmap()   1396s
  grep with prefault mmap() 1354s
  grep with regular read()  1354s

Is this with uncached (ie remount the filesystem on each test) or cached
data?  Which filesystem (and does it change for different filesystems
(particularly between UFS and ZFS))?

And one trial is not statistically valid - especially given the small
differences.  How about multiple multiple trials with ministat.

-- 
Peter Jeremy


pgpPvNHrEZWZQ.pgp
Description: PGP signature


Re: Official request: Please make GNU grep the default

2010-08-18 Thread Dimitry Andric
On 2010-08-18 22:52, Peter Jeremy wrote:
  grep with normal mmap()   1396s
  grep with prefault mmap() 1354s
  grep with regular read()  1354s
 
 Is this with uncached (ie remount the filesystem on each test) or cached
 data?

This is all on the same filesystem, and the test file is ~370MB, so
eventually all data will be in RAM, most likely.  E.g. normal mmap()
seems to add a bit of overhead that explains the slower result.


 Which filesystem (and does it change for different filesystems
 (particularly between UFS and ZFS))?

I only checked on UFS2.


 And one trial is not statistically valid - especially given the small
 differences.  How about multiple multiple trials with ministat.

The result were averages of three trials; they were fairly close to each
other, but I didn't calculate the standard deviation.  I was not aware
of ministat, which looks like a real handy program. :)
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-18 Thread Dimitry Andric
On 2010-08-18 23:12, Dimitry Andric wrote:
 And one trial is not statistically valid - especially given the small
 differences.  How about multiple multiple trials with ministat.
 
 The result were averages of three trials

Actually, since I kept using Doug's original grep-time-trial.sh, each of
the three 'trials' consisted of running grep 100 times, and the listed
time was the total elapsed time for those 100 runs.  So I assume that
will reasonably average out the differences between each individual run?

Also, I'm not sure if the actual disk/fs reading performance will differ
much between GNU grep and any other grep, since they will all basically
read through the whole test file sequentially.  For instance, when I
profiled GNU grep with gprof, the top time results were:

  %   cumulative   self  self total
 time   seconds   secondscalls  ms/call  ms/call  name
 99.1   0.59 0.5911497 0.05 0.05  read [5]
  0.6   0.59 0.0011497 0.00 0.00  kwsexec [8]
  0.1   0.59 0.000  100.00%   .mcount (130)
  0.1   0.59 0.001 0.62   594.77  grepfile [3]
  0.1   0.60 0.0011496 0.00 0.00  memmove [9]
  0.0   0.60 0.004 0.03 0.03  memchr [10]
  0.0   0.60 0.0012541 0.00 0.00  memset [16]
  0.0   0.60 0.0011497 0.00 0.00  EGexecute [7]
  0.0   0.60 0.0011497 0.00 0.05  fillbuf [4]
  0.0   0.60 0.0011497 0.00 0.00  grepbuf [6]

E.g. it looks like most of the time is spent in the read system call.
If mmap'ing can help improve that, it would be nice, but I suspect the
gains would be marginal.

The actual performance difference is much more likely to be related to
how efficiently grep parses out lines, and searches for regexps in
there.  BSD grep still has quite some room for improvement in that
department.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-17 Thread Dimitry Andric
On 2010-08-16 10:55, Dag-Erling Smørgrav wrote:
 Dimitry Andric dimi...@andric.com writes:
 - Uses plain file descriptors instead of struct FILE, since the
   buffering is done manually anyway, and it makes it easier to support
   gzip and bzip2.
 It might be worth a shot adding mmap(2) support as well, i.e. when
 processing an uncompressed regular file, try to mmap(2) it first, and if
 that fails, fall back to the plain buffered read(2) method.

I added a simple mmap to grep, and time-trialed it, but the mmap version
was somewhat slower than the regular version.  I understood from Kostik
Belousov that readahead does not work properly with mmap, and it should
not be used for one-time reads.

I also experimented with different buffer sizes on the same big test
file, and this gives the following results (times in s):

buffer size test1   test2   test3   average
=== === === === ===
512 467 484 465 472
  1,024 391 415 392 399
  2,048 361 356 365 361
  4,096 353 353 356 354
  8,192 348 345 357 350
 16,384 341 373 350 354
 32,768 339 348 346 344
 65,536 336 359 371 355
262,144 334 352 350 345
  1,048,576 334 350 351 345
  2,097,152 339 342 369 350
373,293,056 544 547 559 550

E.g. the 32k buffer size that I borrowed from GNU grep seems to be
reasonable enough.  There is no profit in wasting huge amounts of memory
to speed things up.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-17 Thread Kostik Belousov
[Cc: list sanitized]

On Tue, Aug 17, 2010 at 05:28:08PM +0200, Dimitry Andric wrote:
 On 2010-08-16 10:55, Dag-Erling Sm??rgrav wrote:
  Dimitry Andric dimi...@andric.com writes:
  - Uses plain file descriptors instead of struct FILE, since the
buffering is done manually anyway, and it makes it easier to support
gzip and bzip2.
  It might be worth a shot adding mmap(2) support as well, i.e. when
  processing an uncompressed regular file, try to mmap(2) it first, and if
  that fails, fall back to the plain buffered read(2) method.
 
 I added a simple mmap to grep, and time-trialed it, but the mmap version
 was somewhat slower than the regular version.  I understood from Kostik
 Belousov that readahead does not work properly with mmap, and it should
 not be used for one-time reads.
This is not exactly what I said. I argue that read-ahead implemented
by vm_faul() is much less efficient that buffer clustering. Also,
the cost of setting user mapping for the one time read is also non-trivial.
The conclusion is right, it is better to use read(2) for one-time read.


pgpNutWnngklZ.pgp
Description: PGP signature


Re: Official request: Please make GNU grep the default

2010-08-17 Thread Alan Cox
2010/8/17 Dimitry Andric dimi...@andric.com

 On 2010-08-16 10:55, Dag-Erling Smørgrav wrote:
  Dimitry Andric dimi...@andric.com writes:
  - Uses plain file descriptors instead of struct FILE, since the
buffering is done manually anyway, and it makes it easier to support
gzip and bzip2.
  It might be worth a shot adding mmap(2) support as well, i.e. when
  processing an uncompressed regular file, try to mmap(2) it first, and if
  that fails, fall back to the plain buffered read(2) method.

 I added a simple mmap to grep, and time-trialed it, but the mmap version
 was somewhat slower than the regular version.  I understood from Kostik
 Belousov that readahead does not work properly with mmap, and it should
 not be used for one-time reads.



Try it again on a memory resident file with the MAP_PREFAULT_READ option
that is provided by this patch:

http://www.cs.rice.edu/~alc/MAP_PREFAULT_READ.patch

Regards,
Alan
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-17 Thread Alan Cox
On Tue, Aug 17, 2010 at 10:45 AM, Kostik Belousov kostik...@gmail.comwrote:

 [Cc: list sanitized]

 On Tue, Aug 17, 2010 at 05:28:08PM +0200, Dimitry Andric wrote:
  On 2010-08-16 10:55, Dag-Erling Sm??rgrav wrote:
   Dimitry Andric dimi...@andric.com writes:
   - Uses plain file descriptors instead of struct FILE, since the
 buffering is done manually anyway, and it makes it easier to support
 gzip and bzip2.
   It might be worth a shot adding mmap(2) support as well, i.e. when
   processing an uncompressed regular file, try to mmap(2) it first, and
 if
   that fails, fall back to the plain buffered read(2) method.
 
  I added a simple mmap to grep, and time-trialed it, but the mmap version
  was somewhat slower than the regular version.  I understood from Kostik
  Belousov that readahead does not work properly with mmap, and it should
  not be used for one-time reads.
 This is not exactly what I said. I argue that read-ahead implemented
 by vm_faul() is much less efficient that buffer clustering. Also,
 the cost of setting user mapping for the one time read is also non-trivial.
 The conclusion is right, it is better to use read(2) for one-time read.


The mapping (and unmapping) costs should be relatively small if the contents
of the file can be prefaulted using 2/4MB pages.  In such cases, we still
touch every struct vm_page in the 2/4MB region, but we only create and
destroy one PTE and PV entry, and perform a single INVLPG.

Alan
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-17 Thread Dimitry Andric
On 2010-08-17 18:29, Alan Cox wrote:
 Try it again on a memory resident file with the MAP_PREFAULT_READ option
 that is provided by this patch:
 
 http://www.cs.rice.edu/~alc/MAP_PREFAULT_READ.patch

A time trial gives:

  grep with normal mmap()   1396s
  grep with prefault mmap() 1354s
  grep with regular read()  1354s

So normal mmap is ~3% slower, and prefault mmap does not seem to make
any measurable difference.  I guess the added complexity is not really
worth it, for now.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-17 Thread Alan Cox
On Tue, Aug 17, 2010 at 3:29 PM, Dimitry Andric dimi...@andric.com wrote:

 On 2010-08-17 18:29, Alan Cox wrote:
  Try it again on a memory resident file with the MAP_PREFAULT_READ option
  that is provided by this patch:
 
  http://www.cs.rice.edu/~alc/MAP_PREFAULT_READ.patchhttp://www.cs.rice.edu/%7Ealc/MAP_PREFAULT_READ.patch

 A time trial gives:

  grep with normal mmap()   1396s
  grep with prefault mmap() 1354s
  grep with regular read()  1354s

 So normal mmap is ~3% slower, and prefault mmap does not seem to make
 any measurable difference.  I guess the added complexity is not really
 worth it, for now.


Do you know what fraction of this time is being spent in the kernel?  Does
the value of sysctl vm.pmap.pde.mappings increase as a result of your
test?  If not, there is still room for improvement in the results with
mmap().

Alan
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-16 Thread Dag-Erling Smørgrav
Dimitry Andric dimi...@andric.com writes:
 - Uses plain file descriptors instead of struct FILE, since the
   buffering is done manually anyway, and it makes it easier to support
   gzip and bzip2.

It might be worth a shot adding mmap(2) support as well, i.e. when
processing an uncompressed regular file, try to mmap(2) it first, and if
that fails, fall back to the plain buffered read(2) method.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Scripting language in base [was Re: Official request: Please make GNU grep the default]

2010-08-16 Thread Peter Jeremy
Since this is now well off the original topic.

On 2010-Aug-15 12:57:23 +0200, Ivan Voras ivo...@freebsd.org wrote:
This is my long-term point - it really would be beneficial to have an
alternative, richer language in base which would fall between the
categories of a good system language but far too complex for simple
string-parsing stuff which is C and a good glue language for system
utilities but lacking more evolved concepts which is shell.

I'd say the bigger problem with C is that its native string processing
capabilities are basically non-existent.  Compared to awk, perl, ruby,
(insert your favourite scripting language here) or even sh, the
equivalent code in C is going to be much longer and less clear.

Note that the system does have a language intermediate between sh
and C - awk.  The main downside of awk is its lack of pipe support.

default. My current favorite is lua because it's very small and easily
embeddable and extendable by C code, but there are others - some
JavaScript engines probably fit the description.

You have previously suggested importing lua.  I'd suggest that all
you need is the killer app written in lua to justify importing it.

Another option might be ficl - the code is already in the base system,
it would just need to be compiled.  Of course, the downside is that
FORTH is a very niche language.

-- 
Peter Jeremy


pgpjJGFhyLDo6.pgp
Description: PGP signature


Re: Official request: Please make GNU grep the default

2010-08-16 Thread Dimitry Andric
On 2010-08-15 21:49, Dimitry Andric wrote:
 ...I
 have attached a more complete patch that:
 
 - Replaces the horrendously inefficient grep_fgetln() with mostly the
   same implementation as the libc fgetln() function.
 - Uses plain file descriptors instead of struct FILE, since the
   buffering is done manually anyway, and it makes it easier to support
   gzip and bzip2.
 - Let the bzip2 reader just read the file as plain data, when the
   initial magic number doesn't match, mimicking the behaviour of GNU
   grep.

Here is a new patch, updated against Gabor's changes in r211364.
diff --git a/usr.bin/grep/fastgrep.c b/usr.bin/grep/fastgrep.c
index c66f0a7..30a2dc5 100644
--- a/usr.bin/grep/fastgrep.c
+++ b/usr.bin/grep/fastgrep.c
@@ -198,7 +198,7 @@ fastcomp(fastgrep_t *fg, const char *pat)
 }
 
 int
-grep_search(fastgrep_t *fg, unsigned char *data, size_t len, regmatch_t 
*pmatch)
+grep_search(fastgrep_t *fg, const unsigned char *data, size_t len, regmatch_t 
*pmatch)
 {
unsigned int j;
int ret = REG_NOMATCH;
diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c
index 1872d0e..185ab1d 100644
--- a/usr.bin/grep/file.c
+++ b/usr.bin/grep/file.c
@@ -37,7 +37,7 @@ __FBSDID($FreeBSD$);
 #include bzlib.h
 #include err.h
 #include errno.h
-#include stdio.h
+#include fcntl.h
 #include stdlib.h
 #include string.h
 #include unistd.h
@@ -47,138 +47,160 @@ __FBSDID($FreeBSD$);
 
 #include grep.h
 
-static char fname[MAXPATHLEN]; /* file name */
+#defineMAXBUFSIZ   (32 * 1024)
+#defineLNBUFBUMP   80
 
-#define MAXBUFSIZ  (16 * 1024)
-#define PREREAD_M  0.2
+static gzFile gzbufdesc;
+static BZFILE* bzbufdesc;
 
-/* Some global variables for the buffering and reading. */
-static char*lnbuf;
-static size_t   lnbuflen;
-static unsigned char *binbuf;
-static int  binbufsiz;
-unsigned char  *binbufptr;
-static int  bzerr;
+static unsigned char buffer[MAXBUFSIZ];
+static unsigned char *bufpos;
+static size_t bufrem;
 
-#define iswbinary(ch)  (!iswspace((ch))  iswcntrl((ch))  \
-   (ch != L'\b')  (ch != L'\0'))
+static unsigned char *lnbuf;
+static size_t lnbuflen;
 
-/*
- * Returns a single character according to the file type.
- * Returns -1 on failure.
- */
-static inline int
-grep_fgetc(struct file *f)
+static int
+grep_refill(struct file *f)
 {
-   unsigned char c;
+   ssize_t nr;
+   int bzerr;
 
-   switch (filebehave) {
-   case FILE_STDIO:
-   return (getc_unlocked(f-f));
-   case FILE_GZIP:
-   return (gzgetc(f-gzf));
-   case FILE_BZIP:
-   BZ2_bzRead(bzerr, f-bzf, c, 1);
-   if (bzerr == BZ_STREAM_END)
-   return (-1);
-   else if (bzerr != BZ_SEQUENCE_ERROR  bzerr != BZ_OK)
-   errx(2, %s, getstr(2));
-   return (c);
+   bufpos = buffer;
+   bufrem = 0;
+
+   if (filebehave == FILE_GZIP) {
+   nr = gzread(gzbufdesc, buffer, MAXBUFSIZ);
+   } else if (filebehave == FILE_BZIP  bzbufdesc != NULL) {
+   nr = BZ2_bzRead(bzerr, bzbufdesc, buffer, MAXBUFSIZ);
+   switch (bzerr) {
+   case BZ_OK:
+   case BZ_STREAM_END:
+   /* No problem, nr will be okay */
+   break;
+   case BZ_DATA_ERROR_MAGIC:
+   /*
+* As opposed to gzread(), which simply returns the
+* plain file data, if it is not in the correct
+* compressed format, BZ2_bzRead() instead aborts.
+*
+* So, just restart at the beginning of the file again,
+* and use plain reads from now on.
+*/
+   BZ2_bzReadClose(bzerr, bzbufdesc);
+   bzbufdesc = NULL;
+   if (lseek(f-fd, 0, SEEK_SET) == -1)
+   return (EOF);
+   nr = read(f-fd, buffer, MAXBUFSIZ);
+   break;
+   default:
+   /* Make sure we exit with an error */
+   nr = -1;
+   }
+   } else {
+   nr = read(f-fd, buffer, MAXBUFSIZ);
}
-   return (-1);
+
+   if (nr = 0)
+   return (EOF);
+
+   bufrem = nr;
+   return (0);
 }
 
-/*
- * Returns true if the file position is a EOF, returns false
- * otherwise.
- */
-static inline int
-grep_feof(struct file *f)
+static int
+grep_lnbufgrow(size_t newlen)
 {
+   unsigned char *p;
 
-   switch (filebehave) {
-   case FILE_STDIO:
-   return (feof_unlocked(f-f));
-   case FILE_GZIP:
-   return (gzeof(f-gzf));
-   case FILE_BZIP:
-   return (bzerr == BZ_STREAM_END);
-   }
-   return (1);
+   if 

Re: Official request: Please make GNU grep the default

2010-08-16 Thread Peter Jeremy
On 2010-Aug-16 10:55:18 +0200, Dag-Erling Smørgrav d...@des.no wrote:
It might be worth a shot adding mmap(2) support as well, i.e. when
processing an uncompressed regular file, try to mmap(2) it first, and if
that fails, fall back to the plain buffered read(2) method.

Note that ZFS and mmap() don't get on especially well together so this
isn't a definite win.  You also need to allow for files that are too
large to be mapped in one go.

-- 
Peter Jeremy


pgphMfssk1asB.pgp
Description: PGP signature


Re: Official request: Please make GNU grep the default

2010-08-16 Thread Sean C. Farley

On Sun, 15 Aug 2010, Dag-Erling Smørgrav wrote:


Sean C. Farley s...@freebsd.org writes:

This should trim some time off BSD grep.


Did you actually test your patch?  It makes absolutely no measurable 
difference.


Yes, I saw a reduction, using the first test script Doug provided, from 
36 to 27 seconds.  I only sent the patch after profiling confirmed a 
reduction.  The script ran non-profiled grep 100 times.  Trim did not 
imply it would reduce the time from 36 to 2 seconds.


Here are profiles of one execution of bsdgrep using the parameters from 
the script.


Non-patched grep (cumulative time 4.17 seconds):
  %   cumulative   self  self total
 time   seconds   secondscalls  ms/call  ms/call  name
 79.5   3.32 3.320  100.00%   _mcount [1]
  7.9   3.65 0.330  100.00%   .mcount (106)
  4.5   3.83 0.19 21971711 0.00 0.00  fgetc [7]
  3.5   3.98 0.15 21993762 0.00 0.00  grep_feof [8]
  2.5   4.09 0.10 21971711 0.00 0.00  grep_fgetc [6]
  1.5   4.15 0.0622051 0.00 0.02  grep_fgetln [5]
  0.5   4.17 0.02 1352 0.02 0.02  read [9]
  0.0   4.17 0.00   67 0.02 0.02  memset [14]
  0.0   4.17 0.00   61 0.01 0.01  arena_run_split [17]
  0.0   4.17 0.001 0.50   522.57  procfile [4]
...

Patched grep (cumulative time 2.82 seconds):
  %   cumulative   self  self total
 time   seconds   secondscalls  ms/call  ms/call  name
 77.4   2.19 2.190  100.00%   _mcount [1]
  8.4   2.42 0.240  100.00%   .mcount (109)
  6.1   2.60 0.17 21971711 0.00 0.00  grep_fgetc [6]
  4.9   2.74 0.14 21993762 0.00 0.00  grep_feof [7]
  2.7   2.81 0.0822051 0.00 0.02  grep_fgetln [5]
  0.3   2.82 0.01 1352 0.01 0.01  _read [10]
  0.1   2.82 0.00   67 0.03 0.03  memset [13]
  0.1   2.82 0.00   10 0.15 0.15  free [16]
  0.0   2.82 0.001 1.00   398.11  procfile [4]
...

getc_unlocked() is merely a macro around __sgetr() as opposed to calling 
into getc().  My patch was to show it could be reduced by removing some 
of stdio's overhead (one function call and unneeded locking).  I did not 
have time to do a complete removal of stdio as I was unsure of any 
dependencies this version of bsdgrep had on stdio.


For some reason, I thought there was mention of making bsdgrep into an 
application and library.  If threading became involved, losing fgetc() 
would help more.  Without threading, the test for locking would still be 
performed in getc().


Sean
--
s...@freebsd.org___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Official request: Please make GNU grep the default

2010-08-16 Thread Dag-Erling Smørgrav
Sean C. Farley s...@freebsd.org writes:
 Dag-Erling Smørgrav d...@des.no writes:
  Did you actually test your patch?  It makes absolutely no measurable
  difference.
 Yes, I saw a reduction,

I didn't...

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-16 Thread Gabor Kovesdan

Em 2010.08.15. 21:49, Dimitry Andric escreveu:

   GNU grep
   Elapsed time: 57 seconds

   BSD grep (original)
   Elapsed time: 820 seconds  (~14.4x slower than GNU grep)

   BSD grep (quickfixed)
   Elapsed time: 115 seconds  (~2.0x slower than GNU grep)

It proves that getting rid of the fgetc's is certainly worthwhile, and I
have attached a more complete patch that:

- Replaces the horrendously inefficient grep_fgetln() with mostly the
   same implementation as the libc fgetln() function.
- Uses plain file descriptors instead of struct FILE, since the
   buffering is done manually anyway, and it makes it easier to support
   gzip and bzip2.
- Let the bzip2 reader just read the file as plain data, when the
   initial magic number doesn't match, mimicking the behaviour of GNU
   grep.

There is probably more room for optimization, but let's see if this
survives a bunch of tests first. :)
   

Thanks Dmitry,
I've also started to work on a similar solution but you were extremely 
fast. :) I'm checking your patch just now and will tell my experiences 
if I see any regression.


Gabor
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-16 Thread Gabor Kovesdan

Em 2010.08.16. 16:11, Dag-Erling Smørgrav escreveu:

Sean C. Farleys...@freebsd.org  writes:
   

Dag-Erling Smørgravd...@des.no  writes:
 

Did you actually test your patch?  It makes absolutely no measurable
difference.
   

Yes, I saw a reduction,
 

I didn't...
   

I also saw a reduction by 8-30% depending on the particular case.

Gabor

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-16 Thread Doug Barton

On 08/16/2010 03:42, Dimitry Andric wrote:

On 2010-08-15 21:49, Dimitry Andric wrote:

...I
have attached a more complete patch that:

- Replaces the horrendously inefficient grep_fgetln() with mostly the
   same implementation as the libc fgetln() function.
- Uses plain file descriptors instead of struct FILE, since the
   buffering is done manually anyway, and it makes it easier to support
   gzip and bzip2.
- Let the bzip2 reader just read the file as plain data, when the
   initial magic number doesn't match, mimicking the behaviour of GNU
   grep.


Here is a new patch, updated against Gabor's changes in r211364.


Huge improvement!

r211364:With your patch:

./grep-time-trial
GNU grep
Elapsed time: 2 seconds

BSD grepBSD grep
Elapsed time: 33 secondsElapsed time: 16 seconds

./grep-time-trial-2
GNU grep
Elapsed time: 3 seconds

BSD grepBSD grep
Elapsed time: 18 secondsElapsed time: 11 seconds


--

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Bjoern A. Zeeb

On Sat, 14 Aug 2010, Doug Barton wrote:

...

http://people.freebsd.org/~dougb/grep-time-trial-2.sh.txt

Typical times for me, with 489 ports:

GNU grep
Elapsed time: 3 seconds

BSD grep
Elapsed time: 17 seconds


Which version of GNU grep is this that you have in /usr/local?

/bz

--
Bjoern A. Zeeb   This signature is about you not me.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Alban Hertroys
On 15 Aug 2010, at 3:12, Doug Barton wrote:

 (And before anyone bothers to reply saying Use pkg_info -O for that
 I'll save you the trouble. My version is from 10-20% faster. Not sure
 why, don't really care.) :)


Congrats for beating the performance of a(nother) utility in base, but - 
regardless of whether you'd use it in that case - doesn't that just indicate 
that pkg_info could use some performance improvements as well?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:930,4c67abaf967636193329187!


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Ivan Voras
On 15 August 2010 02:45, Doug Barton do...@freebsd.org wrote:
 Ivan,

 I know that you mean this at least semi-humorously, however I'm going to
 provide a dead-serious reply below.

Thank you for your level-headed response - it's actually better than
continuing less seriously or explosively :) Also, sorry for
redirecting your thread but it provides me context.

 Again, partial agreement. One of the reasons I resisted INDEX support
 for so long was that my original idea of it was to do exactly what you
 suggest here, parse it once then look up the data internally. However
 even though I _can_ do this in shell it actually makes the performance
 worse since now I've got his huge memory footprint to pass around every
 time portmaster calls itself recursively (which for those who don't know
 is portmaster's entire model of operation).

This is my long-term point - it really would be beneficial to have an
alternative, richer language in base which would fall between the
categories of a good system language but far too complex for simple
string-parsing stuff which is C and a good glue language for system
utilities but lacking more evolved concepts which is shell.

[skip the following section, I was going deep into wishful thinking territory]

That said, I know it's useless to simply import something in the hope
it will be useful in the future. My best bet is that I (or someone
else) would write something useful enough to be imported in base in
such a language, which would warrant importing the language itself. I
also know that perl was there and was removed because of maintainance
problems and clashing between user expecting it to be from ports and
having an old version in base, so this potential new language will
have to not clash with ports and not be used by installed ports by
default. My current favorite is lua because it's very small and easily
embeddable and extendable by C code, but there are others - some
JavaScript engines probably fit the description.

 BUT, none of that is germane to my actual argument. I was very careful
 to NOT say, BSD grep is slow, which screws up portmaster, so the
 default has to change. What I said was, BSD grep is anywhere from 6 to
 15 TIMES slower than GNU grep in all cases, so the default needs to
 change.

Yes, and I agree - having new grep which is about an order of
magnitude slower then the old one is a bad situation.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Steven Hartland
- Original Message - 
From: Steve Kargl s...@troutmask.apl.washington.edu

Whereas switching the default back to GNU grep *guarantees*
neither unsophisticated nor sophosticated user will test
BSD grep.

It seems that you are letting a poor design decision with
respect to portmaster impair others contribution to FreeBSD.
I suspect that you could have added a USE_GREP knob to
the port infrastructure and updated your port to use
ports/textproc/gnugrep in the time that you have used to
post and reply here.


It may be ideal to move to a bsd licensed grep implementation
but anything more than a fractional slowdown is an unacceptable
penalty.

Doug is not seeing 1.1 times slower or even 1.5 times slower we
are talking 6 - 15 times slower by his measurements and that
is not something as a users we would want to use.

The fact its a script and there may be more efficient ways of
implementing it doesn't matter one jot. The fact that the new
default grep is so much slower than the one its trying to
replace is what matters, so until its brought up to a comparable
speed then my vote would be switch back to GNU grep as the default.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Steven Hartland
- Original Message - 
From: Andrew Thompson thom...@freebsd.org

On 15 August 2010 13:55, Doug Barton do...@freebsd.org wrote:

Our default grep should be significantly slower than the old grep because:

I think that new grep which is  times slower than the old grep is still
in the acceptable range.



I think that new grep which is 1000 times slower than the old grep is
still in the acceptable range.

Can we drop this now.


I assume your joking right? Either that or your never use grep day to day
so don't really care :(

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Astrodog
On Sun, Aug 15, 2010 at 12:10 PM, Steven Hartland
kill...@multiplay.co.uk wrote:
 - Original Message - From: Andrew Thompson thom...@freebsd.org

 On 15 August 2010 13:55, Doug Barton do...@freebsd.org wrote:

 Our default grep should be significantly slower than the old grep
 because:

 I think that new grep which is  times slower than the old grep is
 still
 in the acceptable range.


 I think that new grep which is 1000 times slower than the old grep is
 still in the acceptable range.

 Can we drop this now.

 I assume your joking right? Either that or your never use grep day to day
 so don't really care :(


I'd like my grep to be orange, please, not this kiwi lime tripe the
rest of you are pushing.

The person who committed it has already said they'll back it out, but
leave it in the tree for experimenters, until some of these concerns
are addressed. At that point, we can all discuss if the performance
penalty (assuming there is one, at that point) is worth having a BSD
licensed version.
Until then, can this thread die?

--- Harrison
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Tim Kientzle

On Aug 15, 2010, at 1:56 AM, Alban Hertroys wrote:

 On 15 Aug 2010, at 3:12, Doug Barton wrote:
 
 (And before anyone bothers to reply saying Use pkg_info -O for that
 I'll save you the trouble. My version is from 10-20% faster. Not sure
 why, don't really care.) :)
 
 
 Congrats for beating the performance of a(nother) utility in base, but - 
 regardless of whether you'd use it in that case - doesn't that just indicate 
 that pkg_info could use some performance improvements as well?

http://libpkg.googlecode.com/

This is David Forsyth's GSoC project to build a standard library
for managing the on-disk package database.

His pkg_info implementation built on this library is in fact
a lot faster than what's currently in the tree.  He's starting
to prototype pkg_delete and pkg_add on the new library
but there's still a lot of work to do.

Tim

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Dag-Erling Smørgrav
Sean C. Farley s...@freebsd.org writes:
 This should trim some time off BSD grep.

Did you actually test your patch?  It makes absolutely no measurable
difference.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Dimitry Andric
On 2010-08-15 21:07, Dag-Erling Smørgrav wrote:
 I built a profiling version of BSD grep and ran it with a regexp that
 matches only the very last line in (my copy of) INDEX-9.  The results
 are pretty clear:

[I also did precisely that, and I just read your mail when I was
composing the following... :) ]

I had a look at Doug Barton's time trial script for GNU and BSD grep,
and decided to profile both greps a bit.  This gave some interesting
results.

I started with GNU grep, doing approximately the same search as in
Doug's trial, e.g. grep -m1 |/usr/ports/x11-wm/xfce4-wm|, but I made
the testfile 373491 lines instead of the original 21971, with the
only matching line as the last one.

The GNU grep top called functions are (please ignore the .mcount entry,
which is a gprof side effect):

  %   cumulative   self  self total   
 time   seconds   secondscalls  ms/call  ms/call  name
 99.1   0.59 0.5911497 0.05 0.05  read [5]
  0.6   0.59 0.0011497 0.00 0.00  kwsexec [8]
  0.1   0.59 0.000  100.00%   .mcount (130)
  0.1   0.59 0.001 0.62   594.77  grepfile [3]
  0.1   0.60 0.0011496 0.00 0.00  memmove [9]
  0.0   0.60 0.004 0.03 0.03  memchr [10]
  0.0   0.60 0.0012541 0.00 0.00  memset [16]
  0.0   0.60 0.0011497 0.00 0.00  EGexecute [7]
  0.0   0.60 0.0011497 0.00 0.05  fillbuf [4]
  0.0   0.60 0.0011497 0.00 0.00  grepbuf [6]

The same exercise with BSD grep indeed runs *much* slower, and its top
called functions are:

  %   cumulative   self  self total   
 time   seconds   secondscalls  ms/call  ms/call  name
 67.3   8.60 8.600  100.00%   .mcount (105)
 22.2  11.44 2.8422778 0.12 0.12  __sys_read [9]
  3.6  11.91 0.46 373553797 0.00 0.00  grep_feof [11]
  3.5  12.35 0.45 373180306 0.00 0.00  fgetc [6]
  2.1  12.63 0.27 373180306 0.00 0.00  grep_fgetc [5]
  1.2  12.78 0.15   373491 0.00 0.01  grep_fgetln [4]
  0.0  12.78 0.01  464 0.01 0.01  memset [12]
  0.0  12.79 0.001 2.83  4184.69  procfile [3]
  0.0  12.79 0.005 0.42 0.69  free [14]
  0.0  12.79 0.0022778 0.00 0.00  __sread [607]

The testfile has 373180306 bytes, so you can see there is a tremendous
overhead in calling grep_fgetc, grep_feof and fgetc, which is done
separately for EACH byte in the file!

So my first quick fix attempt was to replace the home-grown grep_fgetln
with fgetln(3), which is in libc.  This does not support gzip and bzip2
files, but just to prove the point, it is enough.  It gave the following
profiling result:

  %   cumulative   self  self total   
 time   seconds   secondscalls  ms/call  ms/call  name
 80.0   0.68 0.68 5695 0.12 0.12  read [5]
 15.5   0.81 0.13   379178 0.00 0.00  memchr [8]
  3.2   0.83 0.030  100.00%   .mcount (102)
  0.6   0.84 0.00  451 0.01 0.01  memset [9]
  0.4   0.84 0.005 0.59 0.85  free [10]
  0.1   0.84 0.001 1.23   814.47  procfile [3]
  0.1   0.84 0.00   373491 0.00 0.00  fgetln [4]
  0.0   0.84 0.00   373491 0.00 0.00  grep_fgetln [29]
  0.0   0.84 0.0011379 0.00 0.00  memcpy [30]
  0.0   0.84 0.00   50 0.00 0.00  arena_run_alloc [36]

You can see there is much less overhead, and it is the regular read()
function that gets the most calls, though not yet as much as with GNU
grep.  However, this already gave MUCH better runtimes with Doug's trial
script:

  GNU grep
  Elapsed time: 57 seconds

  BSD grep (original)
  Elapsed time: 820 seconds  (~14.4x slower than GNU grep)

  BSD grep (quickfixed)
  Elapsed time: 115 seconds  (~2.0x slower than GNU grep)

It proves that getting rid of the fgetc's is certainly worthwhile, and I
have attached a more complete patch that:

- Replaces the horrendously inefficient grep_fgetln() with mostly the
  same implementation as the libc fgetln() function.
- Uses plain file descriptors instead of struct FILE, since the
  buffering is done manually anyway, and it makes it easier to support
  gzip and bzip2.
- Let the bzip2 reader just read the file as plain data, when the
  initial magic number doesn't match, mimicking the behaviour of GNU
  grep.

There is probably more room for optimization, but let's see if this
survives a bunch of tests first. :)
diff --git a/usr.bin/grep/fastgrep.c b/usr.bin/grep/fastgrep.c
index 5239a0e..2405ed8 100644
--- a/usr.bin/grep/fastgrep.c
+++ b/usr.bin/grep/fastgrep.c
@@ -198,7 +198,7 @@ fastcomp(fastgrep_t *fg, const char *pat)
 }
 
 int

Re: Official request: Please make GNU grep the default

2010-08-15 Thread Gabor Kovesdan

Em 2010.08.15. 21:07, Dag-Erling Smørgrav escreveu:

Ignore the first two lines (that's the profiling code itself).  Note
that the top five lines are all in stdio, and nothing else even shows up
on the radar.  I only included enough output to show where the regexp
code ranks; the complete output is attached.

I hate to suggest reinventing the wheel, but IMHO, this is clearly a
case where it would pay to use hand-rolled buffered input routines
instead of stdio.
   
Thank you very much for the valuable tests, I've already started to 
refactor this part but it will take some time and still it doesn't 
garantize that the performance will be the same as GNU's with this 
change. It may need more investigation, so I'm also already testing my 
patch to change the default and we'll see how the performance 
improvement progresses.


Gabor
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread M. Warner Losh
In message: 894c8953-7f2f-486f-8701-a3dff65d7...@kientzle.com
Tim Kientzle t...@kientzle.com writes:
: 
: On Aug 15, 2010, at 1:56 AM, Alban Hertroys wrote:
: 
:  On 15 Aug 2010, at 3:12, Doug Barton wrote:
:  
:  (And before anyone bothers to reply saying Use pkg_info -O for that
:  I'll save you the trouble. My version is from 10-20% faster. Not sure
:  why, don't really care.) :)
:  
:  
:  Congrats for beating the performance of a(nother) utility in base, but - 
regardless of whether you'd use it in that case - doesn't that just indicate 
that pkg_info could use some performance improvements as well?
: 
: http://libpkg.googlecode.com/
: 
: This is David Forsyth's GSoC project to build a standard library
: for managing the on-disk package database.
: 
: His pkg_info implementation built on this library is in fact
: a lot faster than what's currently in the tree.  He's starting
: to prototype pkg_delete and pkg_add on the new library
: but there's still a lot of work to do.

heh, maybe he needs to do work on an upgrader too. :)

Warner
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-15 Thread Tim Kientzle
On Aug 15, 2010, at 12:49 PM, Dimitry Andric wrote:
 So my first quick fix attempt was to replace the home-grown grep_fgetln
 with fgetln(3), which is in libc.  This does not support gzip and bzip2
 files, but just to prove the point, it is enough.  It gave the following
 profiling result:

FYI:  libarchive has some pretty heavily-optimized
bulk I/O routines and handles automatic decompression
(including gzip, bzip2, lzma, xz, lzip, compress,
and soon uuencode).

There's a trick supported in libarchive now that
will let you just use it's automatic decompression
features on non-archive files (via format_raw).
Unfortunately, it provides binary blocks of data;
there's no nice line-reader interface.

There's an effort afoot to refactor libarchive
so that the stream I/O and compression/decompression
support is actually a separate library that should
be very useful for this sort of usage.  As part
of that, we plan to add some line-oriented
I/O features that should be noticeably more
efficient than stdio.

Cheers,

Tim

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Daniel Braniss
   This message is in MIME format.  The first part should be readable text,
   while the remaining parts are likely unreadable without MIME-aware tools.
 
 --56599777-398594934-1281714095=:35204
 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
 
 On Fri, 13 Aug 2010, Gabor Kovesdan wrote:
 
  Em 2010.08.13. 10:43, Doug Barton escreveu:
  My reason is simple, performance. While doing some portmaster work 
  recently I was regression testing some changes I made to the --index* 
  options and noticed that things were dramatically slower than the 
  last time I tested those features. Thinking that I had made a 
  programming mistake I dug into my code, and while the regexps that I 
  was using could be tuned for slightly better performance the problem 
  was not in my code.  I then installed textproc/gnugrep to compare, 
  and the differences were very dramatic using a highly pessimized test 
  case (finding a match on the last line of INDEX). The script I used 
  to test is at http://people.freebsd.org/~dougb/grep-time-trial.sh.txt 
  and a typical result was:
  
  GNU grep
  Elapsed time: 2 seconds
  
  BSD grep
  Elapsed time: 47 seconds
  
  Ok, I'll take care of this soon, and make GNU grep default, again with 
  a knob to build BSD grep. I agree with you that we cannot allow such a 
  big performance drawback but I my measures only showed significant 
  differences for very big searches and I didn't imagine that it could 
  add up to such a big diference. I'm sorry for the bad decision I took 
  making it default.
 
 This should trim some time off BSD grep.  It removes the lock/unlock for 
 each fgetc() by locking/unlocking the file once.  stdio can be slow.
 
 You probably want to replace flockfile() with ftrylockfile() if threads 
 will be involved at some point (threading or making a libgrep that may 
 be used in a threaded process).

why would you want to lock a file for reading anyways?
BTW, back in the jurasic age, ATT/Bell had this poster: 
Reach out and GREP someone!
danny



___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Sam Fourman Jr.
 BSD grep
 Elapsed time: 47 seconds

 what about optimizing BSD grep instead?

I think this is reasonable, leave BSD grep default for a few more weeks, and
work on performance enhancements. I agree that changing the default back
for a RELEASE is probably a good idea, but the exposure to wider testing
while focusing on performance, can't hurt much can it?

-- 

Sam Fourman Jr.
Fourman Networks
http://www.fourmannetworks.com
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Julian H. Stacey
 why would you want to lock a file for reading anyways?

Does current bsdgrep read lock by default ?
If so, it would be better off by default, enabled by an option.
8.0-RELEASE man grep (gnu) does not mention locking.

Cheers,
Julian
-- 
Julian Stacey: BSD Unix Linux C Sys Eng Consultants Munich http://berklix.com
 Mail plain text,  Not HTML, quoted-printable  base 64 dumped with spam.
Avoid top posting, It cripples itemised cumulative responses.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Joel Dahl
On 14-08-2010  4:35, Sam Fourman Jr. wrote:
  BSD grep
  Elapsed time: 47 seconds
 
  what about optimizing BSD grep instead?
 
 I think this is reasonable, leave BSD grep default for a few more weeks, and
 work on performance enhancements. I agree that changing the default back
 for a RELEASE is probably a good idea, but the exposure to wider testing
 while focusing on performance, can't hurt much can it?

I agree, keep bsdgrep as default for a while and focus on the performance
problems.  This is CURRENT after all, and 9.0 isn't anywhere near release
yet (afaik).

--
Joel
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Sean C. Farley

On Sat, 14 Aug 2010, Julian H. Stacey wrote:


why would you want to lock a file for reading anyways?


Does current bsdgrep read lock by default ?
If so, it would be better off by default, enabled by an option.
8.0-RELEASE man grep (gnu) does not mention locking.


bsdgrep in -current does lock via the call to fgetc().  That is why I 
suggested using flockfile/getchar_unlocked+/funlockfile instead.  Other 
unlocked functions would also be useful, i.e., feof_unlocked(). 
Avoiding fgetc() does not completely solve the speed issue, yet it 
certainly helps.


Just for reference:  older bsdgrep used fgetln().

Sean
--
s...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Daniel Braniss
 On Sat, 14 Aug 2010, Julian H. Stacey wrote:
 
  why would you want to lock a file for reading anyways?
 
  Does current bsdgrep read lock by default ?
  If so, it would be better off by default, enabled by an option.
  8.0-RELEASE man grep (gnu) does not mention locking.
 
 bsdgrep in -current does lock via the call to fgetc().  That is why I 
 suggested using flockfile/getchar_unlocked+/funlockfile instead.  Other 
 unlocked functions would also be useful, i.e., feof_unlocked(). 
 Avoiding fgetc() does not completely solve the speed issue, yet it 
 certainly helps.
 
 Just for reference:  older bsdgrep used fgetln().

let me rephrase the question:
why would you want to lock a file for reading anyways??

there is no real benefit that I can see in locking a file for searching
a pattern. On a single file the overhead is irrelevant, but for 'grep -r?'


cheers,
danny


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Kostik Belousov
On Sat, Aug 14, 2010 at 06:23:55PM +0300, Daniel Braniss wrote:
  On Sat, 14 Aug 2010, Julian H. Stacey wrote:
  
   why would you want to lock a file for reading anyways?
  
   Does current bsdgrep read lock by default ?
   If so, it would be better off by default, enabled by an option.
   8.0-RELEASE man grep (gnu) does not mention locking.
  
  bsdgrep in -current does lock via the call to fgetc().  That is why I 
  suggested using flockfile/getchar_unlocked+/funlockfile instead.  Other 
  unlocked functions would also be useful, i.e., feof_unlocked(). 
  Avoiding fgetc() does not completely solve the speed issue, yet it 
  certainly helps.
  
  Just for reference:  older bsdgrep used fgetln().
 
 let me rephrase the question:
   why would you want to lock a file for reading anyways??
 
 there is no real benefit that I can see in locking a file for searching
 a pattern. On a single file the overhead is irrelevant, but for 'grep -r?'
Locked is not a file, but FILE. It is a measure to establish consistency
in the stdio structures in the multithreaded environment, and not a
file-system level lock of any kind. See the description of the flockfile()
in the SUSv4.

On the other hand, the overhead of locking in !mt process for FILE in
our libc should be unmeasurable.


pgp4rUts486ny.pgp
Description: PGP signature


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Andrey Chernov
On Fri, Aug 13, 2010 at 01:43:16AM -0700, Doug Barton wrote:
 Gabor,
 
 I hope at this point it goes without saying that I have a lot of respect

I am Nth on this. Although I do a lot of l10n work in the beautefull less 
bureocracy FreeBSD time and very appreciate what Gabor did, the problem is 
in other area: BSD grep simple not ready for wide testers circle and needs 
to be polished further. I talk not about it speed alone but about all its 
bugs revealed right after import. Let few ethusiast who have spare time 
for experiments run at at now. What we need is rock stable grep, and 
changing regex library for speed don't add anything here. From my point of 
view importing of the latest GNU grep instead would have more benefits. 
And we need to pend BSD grep with all its can of bugs for a while, until 
faster regex will be imported (disadvantages must be balanced with 
somewhat sweet). It is not my final verdict and we'll can return to it 
after all serious bugs will be eliminated. Right now we'll have f.e. 
numerous report agaist wrong greeping dirs which are unacceptable for any 
development system including even non-production one like -current.

-- 
http://ache.pp.ru/
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Gabor Kovesdan

Em 2010.08.13. 10:52, Roman Divacky escreveu:

what about optimizing BSD grep instead?
   

[... picking one mail from the many that suggest this ...]

The problem is that optimization is not that trivial. I think the 
bottleneck is the regex library because:
1, BSD grep is so simple. There may be optimization opportunities and 
they may help but not that much. But if someone can check the code and 
make some suggestions, of course, I'll track those down and try to get 
more of it.
2, GNU grep uses internal optimizations to get that performance. I think 
it's a wrong approach because the regex library itself should be 
optimized instead to keep BSD grep clean and simple and to provide the 
same efficiency for all utilities that are linked to the regex library. 
Our libc-regex is definitely need to be replaced at some point in the 
future but that's a more complex item. See the following references:

http://wiki.freebsd.org/BSDgrep
http://wiki.freebsd.org/Regex

If you can make suggestions to make BSD grep faster without touching the 
regex library please do it and if we can get a performance that is 
acceptable, we can reconsider leaving it the default if nobody objects. 
I'll check Sean's suggestions and make some measures how much does that 
help.


Gabor
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Gabor Kovesdan

Em 2010.08.14. 17:53, Andrey Chernov escreveu:

On Fri, Aug 13, 2010 at 01:43:16AM -0700, Doug Barton wrote:
   

Gabor,

I hope at this point it goes without saying that I have a lot of respect
 

I am Nth on this. Although I do a lot of l10n work in the beautefull less
bureocracy FreeBSD time and very appreciate what Gabor did, the problem is
in other area: BSD grep simple not ready for wide testers circle and needs
to be polished further. I talk not about it speed alone but about all its
bugs revealed right after import. Let few ethusiast who have spare time
for experiments run at at now. What we need is rock stable grep, and
changing regex library for speed don't add anything here. From my point of
view importing of the latest GNU grep instead would have more benefits.
And we need to pend BSD grep with all its can of bugs for a while, until
faster regex will be imported (disadvantages must be balanced with
somewhat sweet). It is not my final verdict and we'll can return to it
after all serious bugs will be eliminated. Right now we'll have f.e.
numerous report agaist wrong greeping dirs which are unacceptable for any
development system including even non-production one like -current.
   
Yes, I'm sorry for my slow reaction, I got a flu some time ago and that 
prevented me from fixing the bugs earlier. I have several fixes in my 
working copy, which are being discussed with my mentor. Probably, today 
or tomorrow they will be committed.


Gabor

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Dimitry Andric
On 2010-08-14 17:53, Andrey Chernov wrote:
 From my point of 
 view importing of the latest GNU grep instead would have more benefits. 

Unfortunately GNU grep switched to GPLv3 as of version 2.5.3.  The last
GPLv2 version of grep is 2.5.1, which is already in our tree.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread M. Warner Losh
In message: 4c66c0a4.3000...@freebsd.org
Gabor Kovesdan ga...@freebsd.org writes:
: Yes, I'm sorry for my slow reaction, I got a flu some time ago and
: that prevented me from fixing the bugs earlier. I have several fixes
: in my working copy, which are being discussed with my
: mentor. Probably, today or tomorrow they will be committed.

I don't see a huge issue with it being default for a while, so your
slowness due to flu is OK, imho.  This is -current after all, and
bumps in the road are to be expected.  Reverting to gnu-grep being
default is likely good until you can resolve the issues that you've
talked about in other posts.  IMHO, it's unlikely additional testing
and exposure will, at this time, give you any new information.  Once
things have been tuned up, problems fixed, etc, it would likely make
sense to try this again (with special attention to the issues raised
this time, since there's good reason to believe people will try them
first thing if the switch is thrown back to default again).

I hope you're leaving it in the tree as an option, however, since BSD
grep is good enough for many users of grep.  They have been using it
for years and years without hassle as a port because their grep needs
are simple, and performance requirements modest.  For these folks,
license is the deciding factor.

Warner
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Ivan Voras
On 13.8.2010 11:34, Doug Barton wrote:
 On 08/13/2010 02:08, Gabor Kovesdan wrote:
 Ok, I'll take care of this soon, and make GNU grep default, again with a
 knob to build BSD grep. I agree with you that we cannot allow such a big
 performance drawback but I my measures only showed significant
 differences for very big searches and I didn't imagine that it could add
 up to such a big diference.
 
 To be fair, I didn't notice a performance difference either until I
 started revamping this code that calls my parse_index() for every single
 installed port. Given a 22,042 line INDEX file, that's enough to add up
 to something noticeable.

Wouldn't this might, just might, be an indication that one of the
following is true:

1) writing complex performance-sensitive utilities in shell code simply
sucks because it's too sensitive to issues like borderline behaviours of
utilities

2) implementing complex data structures that might save you reparsing on
the order of complexity of O(npkg * nlines) is too demanding in shell
code and this means it's not exactly the best tool for the job

?

This post brought to you by The Legue for Retiring Shell Scripts Longer
Than 100 Lines - our motto is Fighting against the tide - why not? :)


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Doug Barton
Ivan,

I know that you mean this at least semi-humorously, however I'm going to
provide a dead-serious reply below.

On 08/14/2010 16:04, Ivan Voras wrote:
 On 13.8.2010 11:34, Doug Barton wrote:

 To be fair, I didn't notice a performance difference either until I
 started revamping this code that calls my parse_index() for every single
 installed port. Given a 22,042 line INDEX file, that's enough to add up
 to something noticeable.
 
 Wouldn't this might, just might, be an indication that one of the
 following is true:
 
 1) writing complex performance-sensitive utilities in shell code simply
 sucks because it's too sensitive to issues like borderline behaviours of
 utilities

As someone who used to make a pretty good living writing highly
performance-oriented CGI applications in perl I would agree with you
here to some extent. The original version of what could reasonably be
called an antecedent to what is now portmaster was 102 lines, but only
49 were actual code (the rest were comments or whitespace). The current
behemoth (my dev version that is) is 3,702 lines, 3,069 of which is
actual code. So yes, there is an element of insanity here (and yes, the
current code is under-commented, for those keeping score at home).

 2) implementing complex data structures that might save you reparsing on
 the order of complexity of O(npkg * nlines) is too demanding in shell
 code and this means it's not exactly the best tool for the job

Again, partial agreement. One of the reasons I resisted INDEX support
for so long was that my original idea of it was to do exactly what you
suggest here, parse it once then look up the data internally. However
even though I _can_ do this in shell it actually makes the performance
worse since now I've got his huge memory footprint to pass around every
time portmaster calls itself recursively (which for those who don't know
is portmaster's entire model of operation).

BUT, none of that is germane to my actual argument. I was very careful
to NOT say, BSD grep is slow, which screws up portmaster, so the
default has to change. What I said was, BSD grep is anywhere from 6 to
15 TIMES slower than GNU grep in all cases, so the default needs to
change.

If you insist on applying that directly to portmaster, I will say that
implementing it in shell is a very conscious design tradeoff. If I
hadn't already observed the hilarity ensuing around portupgrade/ruby
updates, and I was sitting down today to design a ports management
tool from scratch, I'd use perl, no question. Even without its own db
everything that portmaster does could be done more easily and faster in
perl. However, even granting THIS point the fact remains that the
previous status quo was 1) a text file data store with a known, (mostly)
easy to parse structure, and 2) an easy to use, fast tool to access that
data with.

Your line of reasoning boils down to, You shouldn't object to the new
tool being slower because you are doing things you shouldn't have been
doing with the old tool in the first place. Even IF I were willing to
grant you that point inre portmaster (I'm not, but let's just say ...)
are you willing to tell every user of grep for every other purpose
(including all the many places it's used in the base, like /etc,
/etc/rc.d, the build ), You have to put up with a slow grep because
?


Doug

-- 

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Doug Barton
On 08/14/2010 09:10, Gabor Kovesdan wrote:
 Em 2010.08.13. 10:52, Roman Divacky escreveu:
 what about optimizing BSD grep instead?

 [... picking one mail from the many that suggest this ...]

... and responding to your message for the same reason ... :)

[Snipping the bit about why it's a hard problem not likely to be solved
in the next few weeks.]

 If you can make suggestions to make BSD grep faster without touching the
 regex library please do it and if we can get a performance that is
 acceptable, we can reconsider leaving it the default if nobody objects.
 I'll check Sean's suggestions and make some measures how much does that
 help.

As I posted to you privately, the results I got with JUST Sean's patch
on the test case I posted previously were:

GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 31 seconds

With the more complete patch you provided me privately I was able to
shave one more second off the BSD grep case. So that's a lot better than
the 47 seconds it was previously, but still a long way to go.

I also have a new test case script which actually IS something that
portmaster does, and in fact is the ugliest and most difficult search
that it has to perform, finding an installed port based on grep'ing
+CONTENTS files for an ORIGIN pattern:

http://people.freebsd.org/~dougb/grep-time-trial-2.sh.txt

Typical times for me, with 489 ports:

GNU grep
Elapsed time: 3 seconds

BSD grep
Elapsed time: 17 seconds

(And before anyone bothers to reply saying Use pkg_info -O for that
I'll save you the trouble. My version is from 10-20% faster. Not sure
why, don't really care.) :)

For those whose line of reasoning was, But this is -current, so it's ok
for things to be screwed up my response is, only to a point. In the
real world, people who don't care about performance and/or don't use
grep in interesting and imaginative ways aren't going to mind BSD grep
as the default, but also don't provide really useful test cases. It
works fine up to the 80'th percentile has already been demonstrated by
various pointyhat runs, etc.

Sophisticated users who DO care about performance and/or DO use grep in
interesting and creative ways will put up with the breakage for a while,
then switch their make.conf to use GNU grep, usually silently. Therefore
they stop providing ANY test data at all, never mind useful.

However, given the very small number of people who actually test
-current in the first place, the population I am really concerned about
is the group of people who casually try -current, see that It's really
slow sometimes, don't/can't figure out why, and then get discouraged
and just stop using -current at all. Now you might reply, Great! Good
riddance to those dilettantes! However I believe rather strongly that
we want to make the -current environment MORE friendly to users, even
casual users. Who do you think is actually going to test What will
become 9.0-RELEASE if we don't?

OTOH, leaving it in, but switching the default gives those who are
highly motivated to test and/or improve it a very easy way to do so,
without causing problems for anyone else. It also makes it that much
easier to make it the default again when it IS ready for prime time.

Meanwhile, in response to everyone else, a simple question. How many
TIMES (not percentages, multiples) slower is it Ok for BSD grep to be in
comparison to GNU grep and stay the default?


Doug

-- 

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Steve Kargl
On Sat, Aug 14, 2010 at 06:12:34PM -0700, Doug Barton wrote:
 
 Sophisticated users who DO care about performance and/or DO use grep in
 interesting and creative ways will put up with the breakage for a while,
 then switch their make.conf to use GNU grep, usually silently. Therefore
 they stop providing ANY test data at all, never mind useful.
 

Whereas switching the default back to GNU grep *guarantees*
neither unsophisticated nor sophosticated user will test
BSD grep.

It seems that you are letting a poor design decision with
respect to portmaster impair others contribution to FreeBSD.
I suspect that you could have added a USE_GREP knob to
the port infrastructure and updated your port to use
ports/textproc/gnugrep in the time that you have used to
post and reply here.

-- 
Steve
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Doug Barton

On 08/14/2010 18:34, Steve Kargl wrote:

On Sat, Aug 14, 2010 at 06:12:34PM -0700, Doug Barton wrote:


Sophisticated users who DO care about performance and/or DO use
grep in interesting and creative ways will put up with the breakage
for a while, then switch their make.conf to use GNU grep, usually
silently. Therefore they stop providing ANY test data at all, never
mind useful.



Whereas switching the default back to GNU grep


... until the performance is acceptably comparable ...


*guarantees* neither unsophisticated nor sophosticated user will
test BSD grep.


... except for those who are already highly motivated to do so.


It seems that you are letting a poor design decision with respect to
portmaster impair others contribution to FreeBSD.


I was hoping to avoid commenting on this, but my feeling (and I would be 
glad to be wrong about it) from reading the responses is that there is a 
fair degree of knee-jerk reaction to what seems to be There's big bad 
dougb picking on some poor innocent developer again! going on here; and 
criticizing MY development skills either A) makes you feel better, B) 
makes you think that you're dishing out to me a little of what you think 
I'm dishing out to Gabor, or both. Well fine, hope you're feeling good 
about yourself, and you made me feel really small and bad. Good on you.


Meanwhile, substitute my stupid way of doing things and defective 
programming skills for any other workload of your choice. Are you really 
going to tell me you've never had to grep a 20,000 line file? Are you 
really going to tell me that you've never had to grep something the size 
of the FreeBSD source and/or ports trees for all the instances of $FOO? 
And you didn't answer either of the questions I had in the post you 
responded to, so let me make it easier for you.


Our default grep should be significantly slower than the old grep because:

I think that new grep which is  times slower than the old grep is 
still in the acceptable range.



Doug

--

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Randy Bush
 I think that new grep which is  times slower than the old grep is 
 still in the acceptable range.

aol

you are forcing more time to be spent on the mailing list than working
the code.  and many of us have to care about the license issue.

randy
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Doug Barton

On 08/14/2010 19:02, Randy Bush wrote:

I think that new grep which is  times slower than the old grep is
still in the acceptable range.


aol

you are forcing more time to be spent on the mailing list than working
the code.


Not my intention at all, but I feel your pain. I really thought this was 
a slam dunk or I wouldn't have even brought it up.



and many of us have to care about the license issue.


But you're covered on the license issue. The code is in the base 
already, and no one is suggesting removing it (as Warner pointed out 
earlier today).


What I REALLY don't want to see happen here is a religious fervor about 
the license is more important than performance situation. If I wanted 
that kind of drama I'd use linux. (OTOH religious fervor about proving 
me wrong is almost as amusing as it is unfortunate, but that doesn't 
negatively impact our users.)


Meanwhile, I'm pretty confident this is my last post on the issue of 
changing the default. I've said everything I wanted to say, and 
obviously it's not as clear cut as I thought it was.


Doug

--

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


TIMEOUT (was: Re: Official request: Please make GNU grep the default)

2010-08-14 Thread Colin Percival
Hi all,

Over the past 18 hours, I've received 22 emails in this thread.

In email number 5, sent a mere 25 minutes after the thread started, gabor@
said that he agreed that the performance penalty in BSD grep compared to
GNU grep was excessive and that he was going to revert back to having GNU
grep as the default.

Why are we still discussing this?  If and when gabor@ (or someone else) has
improved BSD grep performance and thinks that it's time to flip the switch
back again, I'm sure there will be ample opportunity for everybody to run
their favourite grep benchmarks, report numbers, and discuss the performance
differences before BSD grep is (re-)made the default.

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Andrew Thompson
On 15 August 2010 13:55, Doug Barton do...@freebsd.org wrote:
 Our default grep should be significantly slower than the old grep because:

 I think that new grep which is  times slower than the old grep is still
 in the acceptable range.


I think that new grep which is 1000 times slower than the old grep is
still in the acceptable range.

Can we drop this now.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-14 Thread Justin Hibbits

My $0.02 may not be worth much, but ...

On Aug 14, 2010, at 9:55 PM, Doug Barton wrote:
I was hoping to avoid commenting on this, but my feeling (and I  
would be glad to be wrong about it) from reading the responses is  
that there is a fair degree of knee-jerk reaction to what seems to  
be There's big bad dougb picking on some poor innocent developer  
again! going on here; and criticizing MY development skills either  
A) makes you feel better, B) makes you think that you're dishing out  
to me a little of what you think I'm dishing out to Gabor, or both.  
Well fine, hope you're feeling good about yourself, and you made me  
feel really small and bad. Good on you.


Meanwhile, substitute my stupid way of doing things and defective  
programming skills for any other workload of your choice. Are you  
really going to tell me you've never had to grep a 20,000 line file?  
Are you really going to tell me that you've never had to grep  
something the size of the FreeBSD source and/or ports trees for all  
the instances of $FOO? And you didn't answer either of the questions  
I had in the post you responded to, so let me make it easier for you.


Our default grep should be significantly slower than the old grep  
because:


I think that new grep which is  times slower than the old grep  
is still in the acceptable range.



Doug



Why not perform a run or two with portmaster and bsdgrep with  
profiling, and send Gabor those results?  It would certainly help  
pinpoint the slowdown, and you would have something to point to to say  
X in bsdgrep is slow, so we should switch back to GNU grep until  
that's fixed rather than just bsdgrep is slow, fix it.  Like most  
people here, I agree that such a performance difference is rather  
unacceptable for a production system, but since this is -current,  
fixing the performance issue should be done rather than casting the  
whole thing aside until it's up to par.


There are 28 messages in this thread already, with no consensus in  
sight.


- Justin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Gabor Kovesdan

Em 2010.08.13. 10:43, Doug Barton escreveu:

My reason is simple, performance. While doing some portmaster work
recently I was regression testing some changes I made to the --index*
options and noticed that things were dramatically slower than the last
time I tested those features. Thinking that I had made a programming
mistake I dug into my code, and while the regexps that I was using could
be tuned for slightly better performance the problem was not in my code.
I then installed textproc/gnugrep to compare, and the differences were
very dramatic using a highly pessimized test case (finding a match on
the last line of INDEX). The script I used to test is at
http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
result was:

GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 47 seconds
   
Ok, I'll take care of this soon, and make GNU grep default, again with a 
knob to build BSD grep. I agree with you that we cannot allow such a big 
performance drawback but I my measures only showed significant 
differences for very big searches and I didn't imagine that it could add 
up to such a big diference. I'm sorry for the bad decision I took making 
it default.


Gabor

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Roman Divacky
On Fri, Aug 13, 2010 at 01:43:16AM -0700, Doug Barton wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256
 
 Gabor,
 
 I hope at this point it goes without saying that I have a lot of respect
 for the work you've done on BSD grep, and I've already told you that I
 think you're very courageous for taking the project on. I've been
 testing and evaluating it for some time now, and I think I've given it a
 fair trial. You've done a fairly good job of responding to bug reports,
 and I understand that the exposure BSD grep has received as the default
 in HEAD has been very valuable in exposing additional areas that need
 work. However, with all that in mind I am officially asking you to
 please change the default in HEAD to GNU grep. (Note, I am _not_ asking
 you to remove BSD grep from the tree, just to change the default.)
 
 My reason is simple, performance. While doing some portmaster work
 recently I was regression testing some changes I made to the --index*
 options and noticed that things were dramatically slower than the last
 time I tested those features. Thinking that I had made a programming
 mistake I dug into my code, and while the regexps that I was using could
 be tuned for slightly better performance the problem was not in my code.
 I then installed textproc/gnugrep to compare, and the differences were
 very dramatic using a highly pessimized test case (finding a match on
 the last line of INDEX). The script I used to test is at
 http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
 result was:
 
 GNU grep
 Elapsed time: 2 seconds
 
 BSD grep
 Elapsed time: 47 seconds

what about optimizing BSD grep instead?
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Doug Barton
On 08/13/2010 02:08, Gabor Kovesdan wrote:
 Ok, I'll take care of this soon, and make GNU grep default, again with a
 knob to build BSD grep. I agree with you that we cannot allow such a big
 performance drawback but I my measures only showed significant
 differences for very big searches and I didn't imagine that it could add
 up to such a big diference.

To be fair, I didn't notice a performance difference either until I
started revamping this code that calls my parse_index() for every single
installed port. Given a 22,042 line INDEX file, that's enough to add up
to something noticeable.

 I'm sorry for the bad decision I took making it default.

As I've said all along, I don't think you made a bad decision in having
it as the default to start. It was certainly different than what we
usually do with new utilities, but that didn't make the decision wrong.
I asked you at the time to keep an open mind about the possibility that
the default might need to be flipped, and I appreciate you being
reasonable about it now.


Doug

-- 

Improve the effectiveness of your Internet presence with
a domain name makeover!http://SupersetSolutions.com/

Computers are useless. They can only give you answers.
-- Pablo Picasso

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Anonymous
Doug Barton do...@freebsd.org writes:

[...]
 My reason is simple, performance. While doing some portmaster work
 recently I was regression testing some changes I made to the --index*
 options and noticed that things were dramatically slower than the last
 time I tested those features. Thinking that I had made a programming
 mistake I dug into my code, and while the regexps that I was using could
 be tuned for slightly better performance the problem was not in my code.
 I then installed textproc/gnugrep to compare, and the differences were
 very dramatic using a highly pessimized test case (finding a match on
 the last line of INDEX). The script I used to test is at
 http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
 result was:

 GNU grep
 Elapsed time: 2 seconds

 BSD grep
 Elapsed time: 47 seconds

Why not allow people to use grep(1) from ports in portmaster, e.g. by
not overriding user-specified PATH?
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Matthias Andree

Gabor Kovesdan wrote on 2010-08-13:


Em 2010.08.13. 10:43, Doug Barton escreveu:

My reason is simple, performance. While doing some portmaster work
recently I was regression testing some changes I made to the --index*
options and noticed that things were dramatically slower than the last
time I tested those features. Thinking that I had made a programming
mistake I dug into my code, and while the regexps that I was using could
be tuned for slightly better performance the problem was not in my code.
I then installed textproc/gnugrep to compare, and the differences were
very dramatic using a highly pessimized test case (finding a match on
the last line of INDEX). The script I used to test is at
http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
result was:

GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 47 seconds

Ok, I'll take care of this soon, and make GNU grep default, again with a  
knob to build BSD grep. I agree with you that we cannot allow such a big  
performance drawback but I my measures only showed significant  
differences for very big searches and I didn't imagine that it could add  
up to such a big diference. I'm sorry for the bad decision I took making  
it default.


Without knowing any of the details (I am not using 9-CURRENT), Gabor, I  
suggest that you check the documentation around Google's RE2 library  
(which is in C++); there are quite a few bits of information relating to  
(including worst-case) performance of regexp matchers, both directly in  
the re2 documentation, as well as indirect through links and references.   
Might be worth a read, together with profiling Doug's test case if he  
could tell you how to reproduce those.


--
Matthias Andree
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Matthias Andree

I wrote:


Might be worth a read, together with profiling Doug's test case if he
could tell you how to reproduce those.


Make that since he has provided the means to reproduce those. I had  
read, but not realized, Doug uploaded the test case.


--
Matthias Andree
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Gabor Kovesdan

Em 2010.08.13. 13:09, Matthias Andree escreveu:

Gabor Kovesdan wrote on 2010-08-13:


Em 2010.08.13. 10:43, Doug Barton escreveu:

My reason is simple, performance. While doing some portmaster work
recently I was regression testing some changes I made to the --index*
options and noticed that things were dramatically slower than the last
time I tested those features. Thinking that I had made a programming
mistake I dug into my code, and while the regexps that I was using 
could
be tuned for slightly better performance the problem was not in my 
code.

I then installed textproc/gnugrep to compare, and the differences were
very dramatic using a highly pessimized test case (finding a match on
the last line of INDEX). The script I used to test is at
http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
result was:

GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 47 seconds

Ok, I'll take care of this soon, and make GNU grep default, again 
with a knob to build BSD grep. I agree with you that we cannot allow 
such a big performance drawback but I my measures only showed 
significant differences for very big searches and I didn't imagine 
that it could add up to such a big diference. I'm sorry for the bad 
decision I took making it default.


Without knowing any of the details (I am not using 9-CURRENT), Gabor, 
I suggest that you check the documentation around Google's RE2 library 
(which is in C++); there are quite a few bits of information relating 
to (including worst-case) performance of regexp matchers, both 
directly in the re2 documentation, as well as indirect through links 
and references.  Might be worth a read, together with profiling Doug's 
test case if he could tell you how to reproduce those.


Thanks, Matthias. I haven't looked deeply at this but iirc it uses 
Perl-syntax. We need an efficient, wchar-aware, POSIX(ish) regex library 
with a good license and atm only TRE conforms to these criteria. 
Besides, we need GNU-style regex support, which will have to be added to 
TRE before we can replace our libc-regex.


Gabor
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Gabor Kovesdan

Em 2010.08.13. 13:33, Anonymous escreveu:

Doug Bartondo...@freebsd.org  writes:

[...]
   

My reason is simple, performance. While doing some portmaster work
recently I was regression testing some changes I made to the --index*
options and noticed that things were dramatically slower than the last
time I tested those features. Thinking that I had made a programming
mistake I dug into my code, and while the regexps that I was using could
be tuned for slightly better performance the problem was not in my code.
I then installed textproc/gnugrep to compare, and the differences were
very dramatic using a highly pessimized test case (finding a match on
the last line of INDEX). The script I used to test is at
http://people.freebsd.org/~dougb/grep-time-trial.sh.txt and a typical
result was:

GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 47 seconds
 

Why not allow people to use grep(1) from ports in portmaster, e.g. by
not overriding user-specified PATH?
   
It would be a working solution but having seen the performance issue, it 
may also cause troubles elsewhere.


Gabor
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Official request: Please make GNU grep the default

2010-08-13 Thread Sean C. Farley

On Fri, 13 Aug 2010, Gabor Kovesdan wrote:


Em 2010.08.13. 10:43, Doug Barton escreveu:
My reason is simple, performance. While doing some portmaster work 
recently I was regression testing some changes I made to the --index* 
options and noticed that things were dramatically slower than the 
last time I tested those features. Thinking that I had made a 
programming mistake I dug into my code, and while the regexps that I 
was using could be tuned for slightly better performance the problem 
was not in my code.  I then installed textproc/gnugrep to compare, 
and the differences were very dramatic using a highly pessimized test 
case (finding a match on the last line of INDEX). The script I used 
to test is at http://people.freebsd.org/~dougb/grep-time-trial.sh.txt 
and a typical result was:


GNU grep
Elapsed time: 2 seconds

BSD grep
Elapsed time: 47 seconds

Ok, I'll take care of this soon, and make GNU grep default, again with 
a knob to build BSD grep. I agree with you that we cannot allow such a 
big performance drawback but I my measures only showed significant 
differences for very big searches and I didn't imagine that it could 
add up to such a big diference. I'm sorry for the bad decision I took 
making it default.


This should trim some time off BSD grep.  It removes the lock/unlock for 
each fgetc() by locking/unlocking the file once.  stdio can be slow.


You probably want to replace flockfile() with ftrylockfile() if threads 
will be involved at some point (threading or making a libgrep that may 
be used in a threaded process).


Sean
--
s...@freebsd.orgIndex: file.c
===
--- file.c  (revision 210862)
+++ file.c  (working copy)
@@ -74,7 +74,7 @@
 
switch (filebehave) {
case FILE_STDIO:
-   return (fgetc(f-f));
+   return (getc_unlocked(f-f));
case FILE_GZIP:
return (gzgetc(f-gzf));
case FILE_BZIP:
@@ -189,6 +189,7 @@
f = grep_malloc(sizeof *f);
 
if ((f-f = fdopen(STDIN_FILENO, r)) != NULL) {
+   flockfile(f-f);
f-stdin = true;
return (f);
}
@@ -238,6 +239,7 @@
 
switch (filebehave) {
case FILE_STDIO:
+   funlockfile(f-f);
fclose(f-f);
break;
case FILE_GZIP:
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org