Re: grep: convert fgetln to getline

2019-01-31 Thread Gilles Chehade
On Thu, Jan 31, 2019 at 03:10:53PM +0200, Lauri Tirkkonen wrote:
> On Wed, Jan 30 2019 20:32:50 -0500, Ted Unangst wrote:
> > Thanks for digging into this. I went ahead and committed your diff.
> 
> Thanks for committing it.
> 
> You know, having seen fgetln's allocation strategy and its usage of the
> stdio internals, I couldn't help but wonder if it's something that could
> be eventually removed entirely. It's used in a bunch of places in-tree
> for sure, but not too many to convert, I think; and I've already done a
> few...
> 
> It's not exactly a serious suggestion at this point; I realize there
> probably is third-party software too that uses this function (Linuxes
> provide compat for it in libbsd; there must be a reason for that). I'm
> speaking as someone who's been removing a bunch of crap from that other
> OS I mentioned, so that's my reason for this line of thinking slash
> pipe-dreaming ;)
> 

quite amazing to watch tedu talk to another tedu :-|

-- 
Gilles Chehade @poolpOrg

https://www.poolp.org tip me: https://paypal.me/poolpOrg



Re: grep: convert fgetln to getline

2019-01-31 Thread Lauri Tirkkonen
On Wed, Jan 30 2019 20:32:50 -0500, Ted Unangst wrote:
> Thanks for digging into this. I went ahead and committed your diff.

Thanks for committing it.

You know, having seen fgetln's allocation strategy and its usage of the
stdio internals, I couldn't help but wonder if it's something that could
be eventually removed entirely. It's used in a bunch of places in-tree
for sure, but not too many to convert, I think; and I've already done a
few...

It's not exactly a serious suggestion at this point; I realize there
probably is third-party software too that uses this function (Linuxes
provide compat for it in libbsd; there must be a reason for that). I'm
speaking as someone who's been removing a bunch of crap from that other
OS I mentioned, so that's my reason for this line of thinking slash
pipe-dreaming ;)

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-30 Thread Ted Unangst
Lauri Tirkkonen wrote:
> In summary: sorry for the bug I introduced - please apply the diff above
> to fix memory mapping files in grep. As for the fgetln->getline
> conversion, I think this result should show that fgetln is not
> particularly smart about how it's calling mmap -- getline seems better.

Thanks for digging into this. I went ahead and committed your diff.

Now we wait for people to complain...



Re: grep: convert fgetln to getline

2019-01-27 Thread Lauri Tirkkonen
On Thu, Jan 24 2019 17:03:57 -0700, Theo de Raadt wrote:
> Scott Cheloha  wrote:
> 
> > > On Jan 24, 2019, at 06:19, Lauri Tirkkonen  wrote:
> > > 
> > > [...]
> > >  
> > > I haven't done any actual measurements though, so it's possible my
> > > reading is wrong.
> > 
> > Is there a "grepbench" or canonical corpus we can use to get a 95%
> > CI on this and future changes?  People talk about grep(1) perf. like
> > CPU time and memory use, there ought to be something better than
> > "let's search the Linux kernel source for a string that isn't there."
> 
> A study of call patterns with ltrace might help

Ok, so let's do some measurements. First, I compiled four copies of
grep:
/usr/bin/grep   # no modification; uses fgetln
/usr/bin/grep.small # as above, with -DSMALL
/usr/bin/grep.getline   # uses getline through my diff
/usr/bin/grep.small.getline # as above, with -DSMALL

First let's look at a somewhat pathological case: a ~2GB file where each
line is longer than the previous line, by one byte.

cd /tmp
python3 -c 'for i in range(65536): print("a"*i)' > foo
ltrace -u libc -f fgetln.out grep b foo
ltrace -u libc -f getline.out grep.getline b foo
ltrace -u libc -f fgetln.small.out grep.small b foo
ltrace -u libc -f getline.small.out grep.small.getline b foo

side note: my shell reported the following times for the above commands.
these are not conclusive but it is very interesting that if we disable
the mmap path (which supposedly is an optimisation), the process
finishes quite a bit faster, with a lot less user time spent than the
non-SMALL version, regardless of whether we use fgetln or getline...

ltrace -u libc -f fgetln.out grep b foo  12.24s user 5.26s system 62% cpu 
27.797 total
ltrace -u libc -f getline.out grep.getline b foo  13.45s user 3.67s system 64% 
cpu 26.361 total
ltrace -u libc -f fgetln.small.out grep.small b foo  3.62s user 5.00s system 
42% cpu 20.418 total
ltrace -u libc -f getline.small.out grep.small.getline b foo  3.79s user 3.25s 
system 35% cpu 19.726 total

Since both /usr/bin/grep and /usr/bin/grep.getline are able to use
mmap() on this file, I expect no difference in the number of allocations
they make. ltrace agrees:

rush /tmp % kdump -f getline.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)'
9
rush /tmp % kdump -f fgetln.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)' 
9

So how about the -DSMALL versions? These should be more allocations.

rush /tmp % kdump -f fgetln.small.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)'
12
rush /tmp % kdump -f getline.small.out|egrep -c 
'"(malloc|calloc|realloc|recalloc)'
12

Well, that's actually quite surprising - far fewer allocations than I
expected. Maybe I'm measuring the wrong thing somehow (I'm not really
familiar with ltrace). Let's take a look at what ktrace has to say about
these as a sanity check.

ktrace -tc -f kt.fgetln.out grep b foo
ktrace -tc -f kt.getline.out grep.getline b foo
ktrace -tc -f kt.fgetln.small.out grep.small b foo
ktrace -tc -f kt.getline.small.out grep.small.getline b foo

rush /tmp % kdump -f kt.fgetln.out|grep -c 'CALL *mmap.*MAP_ANON'
27886
rush /tmp % kdump -f kt.getline.out|grep -c 'CALL *mmap.*MAP_ANON'
52
rush /tmp % kdump -f kt.fgetln.small.out|grep -c 'CALL *mmap.*MAP_ANON'
27883
rush /tmp % kdump -f kt.getline.small.out|grep -c 'CALL *mmap.*MAP_ANON'
51

So, yes, the fgetln case for this file is indeed hitting MAP_ANON mmaps
way more often. But now I'm also a bit confused - why does the non-SMALL
case also call MAP_ANON mmap so many times? It's supposed to just
memory-map the file.

 41564 grep CALL  
mmap(0,0x80008000,0x1,0x2,0,0)
 41564 grep RET   mmap -1 errno 22 Invalid argument

wait - fd 0? It turns out that mmfile.c passes mmf->fd to mmap, but does
not actually initialize it. That's my fault, sorry - I apparently broke
it in the previous commit to grep (although I don't know why I wouldn't
get a maybe-uninitialized warning from this; perhaps it's because of the
grep_malloc wrapper?). Let's fix it:

diff --git a/usr.bin/grep/mmfile.c b/usr.bin/grep/mmfile.c
index fa5a0ea7752..d866dd3977b 100644
--- a/usr.bin/grep/mmfile.c
+++ b/usr.bin/grep/mmfile.c
@@ -49,6 +49,7 @@ mmopen(int fd, struct stat *st)
if (st->st_size > SIZE_MAX) /* too big to mmap */
goto ouch;
mmf->len = (size_t)st->st_size;
+   mmf->fd = fd;
mmf->base = mmap(NULL, mmf->len, PROT_READ, MAP_PRIVATE, mmf->fd, 
(off_t)0);
if (mmf->base == MAP_FAILED)
goto ouch;

and then rerun the ktrace tests. The naive 'time' test shows a clear
improvement now that we are actually memory mapping the file again:

ktrace -tc -f kt.fgetln.out grep b foo  11.43s user 1.58s 

Re: grep: convert fgetln to getline

2019-01-24 Thread Lauri Tirkkonen
On Thu, Jan 24 2019 18:08:25 -0600, Scott Cheloha wrote:
> not to be standoffish, I'm just travelling so I can't really look into it.

yeah, it's not like I have infinite time either. I'll try to do
something with this over the weekend but no promises.

> if you came forward with malloc numbers before/after with sufficient
> N to get to a standard CI, that would help your case and be useful
> in the future.

It's been a while since I did any statistical analysis. To be honest,
this seems super overkill for measuring a code path that is not actually
taken most of the time; the test would have to compile both of the greps
under test with -DSMALL to avoid the mmap path. But fine, since it seems
so important to you I'll try to get *something*...

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-24 Thread Theo de Raadt
Scott Cheloha  wrote:

> > On Jan 24, 2019, at 06:19, Lauri Tirkkonen  wrote:
> > 
> > [...]
> >  
> > I haven't done any actual measurements though, so it's possible my
> > reading is wrong.
> 
> Is there a "grepbench" or canonical corpus we can use to get a 95%
> CI on this and future changes?  People talk about grep(1) perf. like
> CPU time and memory use, there ought to be something better than
> "let's search the Linux kernel source for a string that isn't there."

A study of call patterns with ltrace might help



Re: grep: convert fgetln to getline

2019-01-24 Thread Scott Cheloha
> On Jan 24, 2019, at 06:19, Lauri Tirkkonen  wrote:
> 
> [...]
>  
> I haven't done any actual measurements though, so it's possible my
> reading is wrong.

Is there a "grepbench" or canonical corpus we can use to get a 95%
CI on this and future changes?  People talk about grep(1) perf. like
CPU time and memory use, there ought to be something better than
"let's search the Linux kernel source for a string that isn't there."

... although that isn't a *terrible* benchmark.

What have you got?



Re: grep: convert fgetln to getline

2019-01-24 Thread Lauri Tirkkonen
On Thu, Jan 24 2019 13:58:04 +0200, Lauri Tirkkonen wrote:
> And even then I think the cost is negligible: getline grows
> the buffer in powers of 2, so only lines that happen to be twice as long
> as any previously encountered line pay the price.

Actually, I went to skim the fgetln() implementation and it might very
well be that fgetln() does *more* reallocations than getline. The
storage that fgetln() uses is hidden from the user, but obviously it has
to store its buffer somewhere. It's a member of struct FILE:

125 /* separate buffer for fgetln() when line crosses buffer 
boundary */
126 struct  __sbuf _lb; /* buffer for fgetln() */

and it is expanded by __slbexpand() in stdio/fgetln.c, which just does
recallocarray(). To me it looks like the buffer is either incremented by
OPTIMISTIC (#define'd to be 80 bytes), or again by exactly as much as is
required by the current line if that wasn't enough.

I haven't done any actual measurements though, so it's possible my
reading is wrong.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-24 Thread Lauri Tirkkonen
On Thu, Jan 24 2019 04:40:08 -0700, Theo de Raadt wrote:
> >On Thu, Jan 24 2019 04:22:20 -0700, Theo de Raadt wrote:
> >> I would like to know if this does more malloc.  I worry it is an additional
> >> level of malloc per line.
> >
> >It does do more malloc than plain fgetln since fgetln does no copying,
> >but nowhere near every line. The same buffer lnbuf is used for each
> >line, and libc getline() reallocates it if it is not large enough.
> 
> If there is more allocation, it will be more expensive.  Our malloc
> has many checks and also fresh allocations trigger expensive code paths
> in mmap intentionally -- this makes our runtime somewhat slower, but finds
> and fixes bugs in the same software running on other operating systems
> who lack our attitude towards rigor over performance).
> 
> question is, are you fixing a stylistic nit, or making it fundamentally
> better.

I do think I am making it fundamentally better, since fgetln is a broken
interface (it returns lines that are not null terminated).

The actual reason I started doing this was porting grep to an OS without
fgetln(). But I don't think it's a bad thing to reduce fgetln usage in
OpenBSD either.

> A bunch of us run a whole ton of grep during our work...
> 
> that's my basic question.

Sure, I understand this and I do agree grep performance matters. I just
want to reiterate that this diff only affects performance if the mmap
code path is not taken (ie. either stream is not seekable (determined
with isatty), or if grep was built -DSMALL (bsd.rd grep I think), or if
mmap fails). And even then I think the cost is negligible: getline grows
the buffer in powers of 2, so only lines that happen to be twice as long
as any previously encountered line pay the price.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-24 Thread Theo de Raadt
>On Thu, Jan 24 2019 04:22:20 -0700, Theo de Raadt wrote:
>> I would like to know if this does more malloc.  I worry it is an additional
>> level of malloc per line.
>
>It does do more malloc than plain fgetln since fgetln does no copying,
>but nowhere near every line. The same buffer lnbuf is used for each
>line, and libc getline() reallocates it if it is not large enough.

If there is more allocation, it will be more expensive.  Our malloc
has many checks and also fresh allocations trigger expensive code paths
in mmap intentionally -- this makes our runtime somewhat slower, but finds
and fixes bugs in the same software running on other operating systems
who lack our attitude towards rigor over performance).

question is, are you fixing a stylistic nit, or making it fundamentally
better.  A bunch of us run a whole ton of grep during our work...

that's my basic question.




Re: grep: convert fgetln to getline

2019-01-24 Thread Lauri Tirkkonen
On Thu, Jan 24 2019 04:22:20 -0700, Theo de Raadt wrote:
> I would like to know if this does more malloc.  I worry it is an additional
> level of malloc per line.

It does do more malloc than plain fgetln since fgetln does no copying,
but nowhere near every line. The same buffer lnbuf is used for each
line, and libc getline() reallocates it if it is not large enough.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-24 Thread Theo de Raadt
I would like to know if this does more malloc.  I worry it is an additional
level of malloc per line.

>On Wed, Jan 23 2019 18:01:24 -0500, Ted Unangst wrote:
>> Lauri Tirkkonen wrote:
>> > > > oh, interesting. that's sloppy. can you please fix that first, 
>> > > > separately?
>> > > 
>> > > Sure, here it is.
>> > 
>> > Could you please take a look at it? It's been a couple weeks.
>> 
>> yup, sorry, slipped by. committed.
>
>Thanks. What about the fgetln->getline diff (same as I sent previously
>applies cleanly; reattached below)? I know you said you're concerned
>about the extra copy, but as I pointed out the getline code path will
>only be taken if mmopen() fails, the file is not seekable, or if grep
>was compiled -DSMALL.
>
>diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c
>index 4b3c689e4ab..87c49dd5cc0 100644
>--- a/usr.bin/grep/file.c
>+++ b/usr.bin/grep/file.c
>@@ -34,10 +34,8 @@
> #include "grep.h"
> 
> static charfname[PATH_MAX];
>-#ifndef NOZ
> static char   *lnbuf;
>-static size_t  lnbuflen;
>-#endif
>+static size_t  lnbufsize;
> 
> #define FILE_STDIO0
> #define FILE_MMAP 1
>@@ -73,9 +71,9 @@ gzfgetln(gzFile *f, size_t *len)
>   else
>   errx(2, "%s: %s", fname, gzerrstr);
>   }
>-  if (n >= lnbuflen) {
>-  lnbuflen *= 2;
>-  lnbuf = grep_realloc(lnbuf, ++lnbuflen);
>+  if (n >= lnbufsize) {
>+  lnbufsize *= 2;
>+  lnbuf = grep_realloc(lnbuf, ++lnbufsize);
>   }
>   if (c == '\n')
>   break;
>@@ -182,7 +180,13 @@ grep_fgetln(file_t *f, size_t *l)
> {
>   switch (f->type) {
>   case FILE_STDIO:
>-  return fgetln(f->f, l);
>+  if ((*l = getline(, , f->f)) == -1) {
>+  if (ferror(f->f))
>+  err(2, "%s: getline", fname);
>+  else
>+  return NULL;
>+  }
>+  return lnbuf;
> #ifndef SMALL
>   case FILE_MMAP:
>   return mmfgetln(f->mmf, l);
>diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c
>index 913cc97a0f3..cfac24b12aa 100644
>--- a/usr.bin/grep/grep.c
>+++ b/usr.bin/grep/grep.c
>@@ -224,15 +224,19 @@ read_patterns(const char *fn)
> {
>   FILE *f;
>   char *line;
>-  size_t len;
>+  ssize_t len;
>+  size_t linesize;
> 
>   if ((f = fopen(fn, "r")) == NULL)
>   err(2, "%s", fn);
>-  while ((line = fgetln(f, )) != NULL)
>+  line = NULL;
>+  linesize = 0;
>+  while ((len = getline(, , f)) != -1)
>   add_pattern(line, *line == '\n' ? 0 : len);
>   if (ferror(f))
>   err(2, "%s", fn);
>   fclose(f);
>+  free(line);
> }
> 
> int
>-- 
>Lauri Tirkkonen | lotheac @ IRCnet
>
>



Re: grep: convert fgetln to getline

2019-01-23 Thread Lauri Tirkkonen
On Wed, Jan 23 2019 18:01:24 -0500, Ted Unangst wrote:
> Lauri Tirkkonen wrote:
> > > > oh, interesting. that's sloppy. can you please fix that first, 
> > > > separately?
> > > 
> > > Sure, here it is.
> > 
> > Could you please take a look at it? It's been a couple weeks.
> 
> yup, sorry, slipped by. committed.

Thanks. What about the fgetln->getline diff (same as I sent previously
applies cleanly; reattached below)? I know you said you're concerned
about the extra copy, but as I pointed out the getline code path will
only be taken if mmopen() fails, the file is not seekable, or if grep
was compiled -DSMALL.

diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c
index 4b3c689e4ab..87c49dd5cc0 100644
--- a/usr.bin/grep/file.c
+++ b/usr.bin/grep/file.c
@@ -34,10 +34,8 @@
 #include "grep.h"
 
 static char fname[PATH_MAX];
-#ifndef NOZ
 static char*lnbuf;
-static size_t   lnbuflen;
-#endif
+static size_t   lnbufsize;
 
 #define FILE_STDIO 0
 #define FILE_MMAP  1
@@ -73,9 +71,9 @@ gzfgetln(gzFile *f, size_t *len)
else
errx(2, "%s: %s", fname, gzerrstr);
}
-   if (n >= lnbuflen) {
-   lnbuflen *= 2;
-   lnbuf = grep_realloc(lnbuf, ++lnbuflen);
+   if (n >= lnbufsize) {
+   lnbufsize *= 2;
+   lnbuf = grep_realloc(lnbuf, ++lnbufsize);
}
if (c == '\n')
break;
@@ -182,7 +180,13 @@ grep_fgetln(file_t *f, size_t *l)
 {
switch (f->type) {
case FILE_STDIO:
-   return fgetln(f->f, l);
+   if ((*l = getline(, , f->f)) == -1) {
+   if (ferror(f->f))
+   err(2, "%s: getline", fname);
+   else
+   return NULL;
+   }
+   return lnbuf;
 #ifndef SMALL
case FILE_MMAP:
return mmfgetln(f->mmf, l);
diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c
index 913cc97a0f3..cfac24b12aa 100644
--- a/usr.bin/grep/grep.c
+++ b/usr.bin/grep/grep.c
@@ -224,15 +224,19 @@ read_patterns(const char *fn)
 {
FILE *f;
char *line;
-   size_t len;
+   ssize_t len;
+   size_t linesize;
 
if ((f = fopen(fn, "r")) == NULL)
err(2, "%s", fn);
-   while ((line = fgetln(f, )) != NULL)
+   line = NULL;
+   linesize = 0;
+   while ((len = getline(, , f)) != -1)
add_pattern(line, *line == '\n' ? 0 : len);
if (ferror(f))
err(2, "%s", fn);
fclose(f);
+   free(line);
 }
 
 int
-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-23 Thread Ted Unangst
Lauri Tirkkonen wrote:
> > > oh, interesting. that's sloppy. can you please fix that first, separately?
> > 
> > Sure, here it is.
> 
> Could you please take a look at it? It's been a couple weeks.

yup, sorry, slipped by. committed.



Re: grep: convert fgetln to getline

2019-01-23 Thread Lauri Tirkkonen
On Mon, Jan 07 2019 23:01:47 +0200, Lauri Tirkkonen wrote:
> On Mon, Jan 07 2019 15:41:53 -0500, Ted Unangst wrote:
> > Lauri Tirkkonen wrote:
> > > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote:
> > > > Lauri Tirkkonen wrote:
> > > > > Hi, another simple diff converting fgetln usage to getline.
> > > > > 
> > > > > I also considered converting grep_fgetln to grep_getline, but that 
> > > > > would
> > > > > mean that for FILE_MMAP we'd have to copy the string. So this diff 
> > > > > keeps
> > > > > the grep_fgetln interface as is, but avoids using fgetln from libc 
> > > > > (and
> > > > > adds error handling for FILE_STDIO).
> > > > 
> > > > this looks good and seems to work. thanks.
> > > 
> > > actually, it doesn't work :) the added error handling catches that
> > > directory fd's are being passed into grep_fgetln, causing an error exit
> > > with grep -r (or just grep foo /etc/*). I'm working on a second diff to
> > > refactor the file handling a bit.
> > 
> > oh, interesting. that's sloppy. can you please fix that first, separately?
> 
> Sure, here it is.

Could you please take a look at it? It's been a couple weeks.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-14 Thread Lauri Tirkkonen
On Mon, Jan 07 2019 23:01:47 +0200, Lauri Tirkkonen wrote:
> On Mon, Jan 07 2019 15:41:53 -0500, Ted Unangst wrote:
> > Lauri Tirkkonen wrote:
> > > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote:
> > > > Lauri Tirkkonen wrote:
> > > > > Hi, another simple diff converting fgetln usage to getline.
> > > > > 
> > > > > I also considered converting grep_fgetln to grep_getline, but that 
> > > > > would
> > > > > mean that for FILE_MMAP we'd have to copy the string. So this diff 
> > > > > keeps
> > > > > the grep_fgetln interface as is, but avoids using fgetln from libc 
> > > > > (and
> > > > > adds error handling for FILE_STDIO).
> > > > 
> > > > this looks good and seems to work. thanks.
> > > 
> > > actually, it doesn't work :) the added error handling catches that
> > > directory fd's are being passed into grep_fgetln, causing an error exit
> > > with grep -r (or just grep foo /etc/*). I'm working on a second diff to
> > > refactor the file handling a bit.
> > 
> > oh, interesting. that's sloppy. can you please fix that first, separately?
> 
> Sure, here it is.

ping.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-07 Thread Lauri Tirkkonen
On Mon, Jan 07 2019 15:41:53 -0500, Ted Unangst wrote:
> Lauri Tirkkonen wrote:
> > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote:
> > > Lauri Tirkkonen wrote:
> > > > Hi, another simple diff converting fgetln usage to getline.
> > > > 
> > > > I also considered converting grep_fgetln to grep_getline, but that would
> > > > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps
> > > > the grep_fgetln interface as is, but avoids using fgetln from libc (and
> > > > adds error handling for FILE_STDIO).
> > > 
> > > this looks good and seems to work. thanks.
> > 
> > actually, it doesn't work :) the added error handling catches that
> > directory fd's are being passed into grep_fgetln, causing an error exit
> > with grep -r (or just grep foo /etc/*). I'm working on a second diff to
> > refactor the file handling a bit.
> 
> oh, interesting. that's sloppy. can you please fix that first, separately?

Sure, here it is. The idea is to use grep_fdopen for everything, making
grep_open just open the file and pass the fd to grep_fdopen, do fstat()
in grep_fdopen and modify mmopen() accordingly (it no longer needs to
open the file or stat it).

grep_tree is modified to not call procfile() on FTS_D (FTS_DP was
already there). In addition grep_fdopen returns NULL and sets errno ==
EISDIR if it encounters a directory; this can happen if a directory
argument is given on command line, or for recursive grep, if the file
being processed is a symlink to a directory, which fts doesn't notice.
procfile() handles the EISDIR return silently, matching current
behavior.

I also removed the useless mode argument from grep_open and grep_fdopen;
we always open files read-only.

diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c
index 4b3c689e4ab..2befc4304c1 100644
--- a/usr.bin/grep/file.c
+++ b/usr.bin/grep/file.c
@@ -26,7 +26,10 @@
  * SUCH DAMAGE.
  */
 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -90,68 +93,64 @@ gzfgetln(gzFile *f, size_t *len)
 #endif
 
 file_t *
-grep_fdopen(int fd, char *mode)
+grep_fdopen(int fd)
 {
file_t *f;
+   struct stat sb;
 
if (fd == STDIN_FILENO)
snprintf(fname, sizeof fname, "(standard input)");
-   else
+   else if (fname[0] == '\0')
snprintf(fname, sizeof fname, "(fd %d)", fd);
 
+   if (fstat(fd, ) == -1)
+   return NULL;
+   if (S_ISDIR(sb.st_mode)) {
+   errno = EISDIR;
+   return NULL;
+   }
+
f = grep_malloc(sizeof *f);
 
 #ifndef NOZ
if (Zflag) {
f->type = FILE_GZIP;
f->noseek = lseek(fd, 0L, SEEK_SET) == -1;
-   if ((f->gzf = gzdopen(fd, mode)) != NULL)
+   if ((f->gzf = gzdopen(fd, "r")) != NULL)
return f;
-   } else
+   }
 #endif
-   {
-   f->type = FILE_STDIO;
-   f->noseek = isatty(fd);
-   if ((f->f = fdopen(fd, mode)) != NULL)
-   return f;
+   f->noseek = isatty(fd);
+#ifndef SMALL
+   /* try mmap first; if it fails, try stdio */
+   if (!f->noseek && (f->mmf = mmopen(fd, )) != NULL) {
+   f->type = FILE_MMAP;
+   return f;
}
+#endif
+   f->type = FILE_STDIO;
+   if ((f->f = fdopen(fd, "r")) != NULL)
+   return f;
 
free(f);
return NULL;
 }
 
 file_t *
-grep_open(char *path, char *mode)
+grep_open(char *path)
 {
file_t *f;
+   int fd;
 
snprintf(fname, sizeof fname, "%s", path);
 
-   f = grep_malloc(sizeof *f);
-   f->noseek = 0;
-
-#ifndef NOZ
-   if (Zflag) {
-   f->type = FILE_GZIP;
-   if ((f->gzf = gzopen(fname, mode)) != NULL)
-   return f;
-   } else
-#endif
-   {
-#ifndef SMALL
-   /* try mmap first; if it fails, try stdio */
-   if ((f->mmf = mmopen(fname, mode)) != NULL) {
-   f->type = FILE_MMAP;
-   return f;
-   }
-#endif
-   f->type = FILE_STDIO;
-   if ((f->f = fopen(path, mode)) != NULL)
-   return f;
-   }
+   if ((fd = open(fname, O_RDONLY)) == -1)
+   return NULL;
 
-   free(f);
-   return NULL;
+   f = grep_fdopen(fd);
+   if (f == NULL)
+   close(fd);
+   return f;
 }
 
 int
diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c
index 913cc97a0f3..e5e48fa163a 100644
--- a/usr.bin/grep/grep.c
+++ b/usr.bin/grep/grep.c
@@ -63,9 +63,7 @@ intFflag; /* -F: interpret pattern as list of 
fixed strings */
 int Hflag; /* -H: always print filename header */
 int Lflag; /* -L: only show names of files with no matches */
 int Rflag; /* -R: recursively search directory trees */
-#ifndef NOZ
 int Zflag; /* -Z: decompress input before processing */

Re: grep: convert fgetln to getline

2019-01-07 Thread Ted Unangst
Lauri Tirkkonen wrote:
> On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote:
> > Lauri Tirkkonen wrote:
> > > Hi, another simple diff converting fgetln usage to getline.
> > > 
> > > I also considered converting grep_fgetln to grep_getline, but that would
> > > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps
> > > the grep_fgetln interface as is, but avoids using fgetln from libc (and
> > > adds error handling for FILE_STDIO).
> > 
> > this looks good and seems to work. thanks.
> 
> actually, it doesn't work :) the added error handling catches that
> directory fd's are being passed into grep_fgetln, causing an error exit
> with grep -r (or just grep foo /etc/*). I'm working on a second diff to
> refactor the file handling a bit.

oh, interesting. that's sloppy. can you please fix that first, separately?

(wrt getline, theo raised some concern that getline copies more data than
fgetln does. we care quite a bit about grep performance, so we need to
consider that some more.)



Re: grep: convert fgetln to getline

2019-01-07 Thread Lauri Tirkkonen
On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote:
> Lauri Tirkkonen wrote:
> > Hi, another simple diff converting fgetln usage to getline.
> > 
> > I also considered converting grep_fgetln to grep_getline, but that would
> > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps
> > the grep_fgetln interface as is, but avoids using fgetln from libc (and
> > adds error handling for FILE_STDIO).
> 
> this looks good and seems to work. thanks.

actually, it doesn't work :) the added error handling catches that
directory fd's are being passed into grep_fgetln, causing an error exit
with grep -r (or just grep foo /etc/*). I'm working on a second diff to
refactor the file handling a bit.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: grep: convert fgetln to getline

2019-01-06 Thread Ted Unangst
Lauri Tirkkonen wrote:
> Hi, another simple diff converting fgetln usage to getline.
> 
> I also considered converting grep_fgetln to grep_getline, but that would
> mean that for FILE_MMAP we'd have to copy the string. So this diff keeps
> the grep_fgetln interface as is, but avoids using fgetln from libc (and
> adds error handling for FILE_STDIO).

this looks good and seems to work. thanks.



grep: convert fgetln to getline

2019-01-06 Thread Lauri Tirkkonen
Hi, another simple diff converting fgetln usage to getline.

I also considered converting grep_fgetln to grep_getline, but that would
mean that for FILE_MMAP we'd have to copy the string. So this diff keeps
the grep_fgetln interface as is, but avoids using fgetln from libc (and
adds error handling for FILE_STDIO).

diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c
index 4b3c689e4ab..87c49dd5cc0 100644
--- a/usr.bin/grep/file.c
+++ b/usr.bin/grep/file.c
@@ -34,10 +34,8 @@
 #include "grep.h"
 
 static char fname[PATH_MAX];
-#ifndef NOZ
 static char*lnbuf;
-static size_t   lnbuflen;
-#endif
+static size_t   lnbufsize;
 
 #define FILE_STDIO 0
 #define FILE_MMAP  1
@@ -73,9 +71,9 @@ gzfgetln(gzFile *f, size_t *len)
else
errx(2, "%s: %s", fname, gzerrstr);
}
-   if (n >= lnbuflen) {
-   lnbuflen *= 2;
-   lnbuf = grep_realloc(lnbuf, ++lnbuflen);
+   if (n >= lnbufsize) {
+   lnbufsize *= 2;
+   lnbuf = grep_realloc(lnbuf, ++lnbufsize);
}
if (c == '\n')
break;
@@ -182,7 +180,13 @@ grep_fgetln(file_t *f, size_t *l)
 {
switch (f->type) {
case FILE_STDIO:
-   return fgetln(f->f, l);
+   if ((*l = getline(, , f->f)) == -1) {
+   if (ferror(f->f))
+   err(2, "%s: getline", fname);
+   else
+   return NULL;
+   }
+   return lnbuf;
 #ifndef SMALL
case FILE_MMAP:
return mmfgetln(f->mmf, l);
diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c
index 913cc97a0f3..cfac24b12aa 100644
--- a/usr.bin/grep/grep.c
+++ b/usr.bin/grep/grep.c
@@ -224,15 +224,19 @@ read_patterns(const char *fn)
 {
FILE *f;
char *line;
-   size_t len;
+   ssize_t len;
+   size_t linesize;
 
if ((f = fopen(fn, "r")) == NULL)
err(2, "%s", fn);
-   while ((line = fgetln(f, )) != NULL)
+   line = NULL;
+   linesize = 0;
+   while ((len = getline(, , f)) != -1)
add_pattern(line, *line == '\n' ? 0 : len);
if (ferror(f))
err(2, "%s", fn);
fclose(f);
+   free(line);
 }
 
 int

-- 
Lauri Tirkkonen | lotheac @ IRCnet