Re: grep: convert fgetln to getline
On Thu, Jan 31, 2019 at 03:10:53PM +0200, Lauri Tirkkonen wrote: > On Wed, Jan 30 2019 20:32:50 -0500, Ted Unangst wrote: > > Thanks for digging into this. I went ahead and committed your diff. > > Thanks for committing it. > > You know, having seen fgetln's allocation strategy and its usage of the > stdio internals, I couldn't help but wonder if it's something that could > be eventually removed entirely. It's used in a bunch of places in-tree > for sure, but not too many to convert, I think; and I've already done a > few... > > It's not exactly a serious suggestion at this point; I realize there > probably is third-party software too that uses this function (Linuxes > provide compat for it in libbsd; there must be a reason for that). I'm > speaking as someone who's been removing a bunch of crap from that other > OS I mentioned, so that's my reason for this line of thinking slash > pipe-dreaming ;) > quite amazing to watch tedu talk to another tedu :-| -- Gilles Chehade @poolpOrg https://www.poolp.org tip me: https://paypal.me/poolpOrg
Re: grep: convert fgetln to getline
On Wed, Jan 30 2019 20:32:50 -0500, Ted Unangst wrote: > Thanks for digging into this. I went ahead and committed your diff. Thanks for committing it. You know, having seen fgetln's allocation strategy and its usage of the stdio internals, I couldn't help but wonder if it's something that could be eventually removed entirely. It's used in a bunch of places in-tree for sure, but not too many to convert, I think; and I've already done a few... It's not exactly a serious suggestion at this point; I realize there probably is third-party software too that uses this function (Linuxes provide compat for it in libbsd; there must be a reason for that). I'm speaking as someone who's been removing a bunch of crap from that other OS I mentioned, so that's my reason for this line of thinking slash pipe-dreaming ;) -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
Lauri Tirkkonen wrote: > In summary: sorry for the bug I introduced - please apply the diff above > to fix memory mapping files in grep. As for the fgetln->getline > conversion, I think this result should show that fgetln is not > particularly smart about how it's calling mmap -- getline seems better. Thanks for digging into this. I went ahead and committed your diff. Now we wait for people to complain...
Re: grep: convert fgetln to getline
On Thu, Jan 24 2019 17:03:57 -0700, Theo de Raadt wrote: > Scott Cheloha wrote: > > > > On Jan 24, 2019, at 06:19, Lauri Tirkkonen wrote: > > > > > > [...] > > > > > > I haven't done any actual measurements though, so it's possible my > > > reading is wrong. > > > > Is there a "grepbench" or canonical corpus we can use to get a 95% > > CI on this and future changes? People talk about grep(1) perf. like > > CPU time and memory use, there ought to be something better than > > "let's search the Linux kernel source for a string that isn't there." > > A study of call patterns with ltrace might help Ok, so let's do some measurements. First, I compiled four copies of grep: /usr/bin/grep # no modification; uses fgetln /usr/bin/grep.small # as above, with -DSMALL /usr/bin/grep.getline # uses getline through my diff /usr/bin/grep.small.getline # as above, with -DSMALL First let's look at a somewhat pathological case: a ~2GB file where each line is longer than the previous line, by one byte. cd /tmp python3 -c 'for i in range(65536): print("a"*i)' > foo ltrace -u libc -f fgetln.out grep b foo ltrace -u libc -f getline.out grep.getline b foo ltrace -u libc -f fgetln.small.out grep.small b foo ltrace -u libc -f getline.small.out grep.small.getline b foo side note: my shell reported the following times for the above commands. these are not conclusive but it is very interesting that if we disable the mmap path (which supposedly is an optimisation), the process finishes quite a bit faster, with a lot less user time spent than the non-SMALL version, regardless of whether we use fgetln or getline... ltrace -u libc -f fgetln.out grep b foo 12.24s user 5.26s system 62% cpu 27.797 total ltrace -u libc -f getline.out grep.getline b foo 13.45s user 3.67s system 64% cpu 26.361 total ltrace -u libc -f fgetln.small.out grep.small b foo 3.62s user 5.00s system 42% cpu 20.418 total ltrace -u libc -f getline.small.out grep.small.getline b foo 3.79s user 3.25s system 35% cpu 19.726 total Since both /usr/bin/grep and /usr/bin/grep.getline are able to use mmap() on this file, I expect no difference in the number of allocations they make. ltrace agrees: rush /tmp % kdump -f getline.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 9 rush /tmp % kdump -f fgetln.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 9 So how about the -DSMALL versions? These should be more allocations. rush /tmp % kdump -f fgetln.small.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 12 rush /tmp % kdump -f getline.small.out|egrep -c '"(malloc|calloc|realloc|recalloc)' 12 Well, that's actually quite surprising - far fewer allocations than I expected. Maybe I'm measuring the wrong thing somehow (I'm not really familiar with ltrace). Let's take a look at what ktrace has to say about these as a sanity check. ktrace -tc -f kt.fgetln.out grep b foo ktrace -tc -f kt.getline.out grep.getline b foo ktrace -tc -f kt.fgetln.small.out grep.small b foo ktrace -tc -f kt.getline.small.out grep.small.getline b foo rush /tmp % kdump -f kt.fgetln.out|grep -c 'CALL *mmap.*MAP_ANON' 27886 rush /tmp % kdump -f kt.getline.out|grep -c 'CALL *mmap.*MAP_ANON' 52 rush /tmp % kdump -f kt.fgetln.small.out|grep -c 'CALL *mmap.*MAP_ANON' 27883 rush /tmp % kdump -f kt.getline.small.out|grep -c 'CALL *mmap.*MAP_ANON' 51 So, yes, the fgetln case for this file is indeed hitting MAP_ANON mmaps way more often. But now I'm also a bit confused - why does the non-SMALL case also call MAP_ANON mmap so many times? It's supposed to just memory-map the file. 41564 grep CALL mmap(0,0x80008000,0x1,0x2,0,0) 41564 grep RET mmap -1 errno 22 Invalid argument wait - fd 0? It turns out that mmfile.c passes mmf->fd to mmap, but does not actually initialize it. That's my fault, sorry - I apparently broke it in the previous commit to grep (although I don't know why I wouldn't get a maybe-uninitialized warning from this; perhaps it's because of the grep_malloc wrapper?). Let's fix it: diff --git a/usr.bin/grep/mmfile.c b/usr.bin/grep/mmfile.c index fa5a0ea7752..d866dd3977b 100644 --- a/usr.bin/grep/mmfile.c +++ b/usr.bin/grep/mmfile.c @@ -49,6 +49,7 @@ mmopen(int fd, struct stat *st) if (st->st_size > SIZE_MAX) /* too big to mmap */ goto ouch; mmf->len = (size_t)st->st_size; + mmf->fd = fd; mmf->base = mmap(NULL, mmf->len, PROT_READ, MAP_PRIVATE, mmf->fd, (off_t)0); if (mmf->base == MAP_FAILED) goto ouch; and then rerun the ktrace tests. The naive 'time' test shows a clear improvement now that we are actually memory mapping the file again: ktrace -tc -f kt.fgetln.out grep b foo 11.43s user 1.58s
Re: grep: convert fgetln to getline
On Thu, Jan 24 2019 18:08:25 -0600, Scott Cheloha wrote: > not to be standoffish, I'm just travelling so I can't really look into it. yeah, it's not like I have infinite time either. I'll try to do something with this over the weekend but no promises. > if you came forward with malloc numbers before/after with sufficient > N to get to a standard CI, that would help your case and be useful > in the future. It's been a while since I did any statistical analysis. To be honest, this seems super overkill for measuring a code path that is not actually taken most of the time; the test would have to compile both of the greps under test with -DSMALL to avoid the mmap path. But fine, since it seems so important to you I'll try to get *something*... -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
Scott Cheloha wrote: > > On Jan 24, 2019, at 06:19, Lauri Tirkkonen wrote: > > > > [...] > > > > I haven't done any actual measurements though, so it's possible my > > reading is wrong. > > Is there a "grepbench" or canonical corpus we can use to get a 95% > CI on this and future changes? People talk about grep(1) perf. like > CPU time and memory use, there ought to be something better than > "let's search the Linux kernel source for a string that isn't there." A study of call patterns with ltrace might help
Re: grep: convert fgetln to getline
> On Jan 24, 2019, at 06:19, Lauri Tirkkonen wrote: > > [...] > > I haven't done any actual measurements though, so it's possible my > reading is wrong. Is there a "grepbench" or canonical corpus we can use to get a 95% CI on this and future changes? People talk about grep(1) perf. like CPU time and memory use, there ought to be something better than "let's search the Linux kernel source for a string that isn't there." ... although that isn't a *terrible* benchmark. What have you got?
Re: grep: convert fgetln to getline
On Thu, Jan 24 2019 13:58:04 +0200, Lauri Tirkkonen wrote: > And even then I think the cost is negligible: getline grows > the buffer in powers of 2, so only lines that happen to be twice as long > as any previously encountered line pay the price. Actually, I went to skim the fgetln() implementation and it might very well be that fgetln() does *more* reallocations than getline. The storage that fgetln() uses is hidden from the user, but obviously it has to store its buffer somewhere. It's a member of struct FILE: 125 /* separate buffer for fgetln() when line crosses buffer boundary */ 126 struct __sbuf _lb; /* buffer for fgetln() */ and it is expanded by __slbexpand() in stdio/fgetln.c, which just does recallocarray(). To me it looks like the buffer is either incremented by OPTIMISTIC (#define'd to be 80 bytes), or again by exactly as much as is required by the current line if that wasn't enough. I haven't done any actual measurements though, so it's possible my reading is wrong. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
On Thu, Jan 24 2019 04:40:08 -0700, Theo de Raadt wrote: > >On Thu, Jan 24 2019 04:22:20 -0700, Theo de Raadt wrote: > >> I would like to know if this does more malloc. I worry it is an additional > >> level of malloc per line. > > > >It does do more malloc than plain fgetln since fgetln does no copying, > >but nowhere near every line. The same buffer lnbuf is used for each > >line, and libc getline() reallocates it if it is not large enough. > > If there is more allocation, it will be more expensive. Our malloc > has many checks and also fresh allocations trigger expensive code paths > in mmap intentionally -- this makes our runtime somewhat slower, but finds > and fixes bugs in the same software running on other operating systems > who lack our attitude towards rigor over performance). > > question is, are you fixing a stylistic nit, or making it fundamentally > better. I do think I am making it fundamentally better, since fgetln is a broken interface (it returns lines that are not null terminated). The actual reason I started doing this was porting grep to an OS without fgetln(). But I don't think it's a bad thing to reduce fgetln usage in OpenBSD either. > A bunch of us run a whole ton of grep during our work... > > that's my basic question. Sure, I understand this and I do agree grep performance matters. I just want to reiterate that this diff only affects performance if the mmap code path is not taken (ie. either stream is not seekable (determined with isatty), or if grep was built -DSMALL (bsd.rd grep I think), or if mmap fails). And even then I think the cost is negligible: getline grows the buffer in powers of 2, so only lines that happen to be twice as long as any previously encountered line pay the price. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
>On Thu, Jan 24 2019 04:22:20 -0700, Theo de Raadt wrote: >> I would like to know if this does more malloc. I worry it is an additional >> level of malloc per line. > >It does do more malloc than plain fgetln since fgetln does no copying, >but nowhere near every line. The same buffer lnbuf is used for each >line, and libc getline() reallocates it if it is not large enough. If there is more allocation, it will be more expensive. Our malloc has many checks and also fresh allocations trigger expensive code paths in mmap intentionally -- this makes our runtime somewhat slower, but finds and fixes bugs in the same software running on other operating systems who lack our attitude towards rigor over performance). question is, are you fixing a stylistic nit, or making it fundamentally better. A bunch of us run a whole ton of grep during our work... that's my basic question.
Re: grep: convert fgetln to getline
On Thu, Jan 24 2019 04:22:20 -0700, Theo de Raadt wrote: > I would like to know if this does more malloc. I worry it is an additional > level of malloc per line. It does do more malloc than plain fgetln since fgetln does no copying, but nowhere near every line. The same buffer lnbuf is used for each line, and libc getline() reallocates it if it is not large enough. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
I would like to know if this does more malloc. I worry it is an additional level of malloc per line. >On Wed, Jan 23 2019 18:01:24 -0500, Ted Unangst wrote: >> Lauri Tirkkonen wrote: >> > > > oh, interesting. that's sloppy. can you please fix that first, >> > > > separately? >> > > >> > > Sure, here it is. >> > >> > Could you please take a look at it? It's been a couple weeks. >> >> yup, sorry, slipped by. committed. > >Thanks. What about the fgetln->getline diff (same as I sent previously >applies cleanly; reattached below)? I know you said you're concerned >about the extra copy, but as I pointed out the getline code path will >only be taken if mmopen() fails, the file is not seekable, or if grep >was compiled -DSMALL. > >diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c >index 4b3c689e4ab..87c49dd5cc0 100644 >--- a/usr.bin/grep/file.c >+++ b/usr.bin/grep/file.c >@@ -34,10 +34,8 @@ > #include "grep.h" > > static charfname[PATH_MAX]; >-#ifndef NOZ > static char *lnbuf; >-static size_t lnbuflen; >-#endif >+static size_t lnbufsize; > > #define FILE_STDIO0 > #define FILE_MMAP 1 >@@ -73,9 +71,9 @@ gzfgetln(gzFile *f, size_t *len) > else > errx(2, "%s: %s", fname, gzerrstr); > } >- if (n >= lnbuflen) { >- lnbuflen *= 2; >- lnbuf = grep_realloc(lnbuf, ++lnbuflen); >+ if (n >= lnbufsize) { >+ lnbufsize *= 2; >+ lnbuf = grep_realloc(lnbuf, ++lnbufsize); > } > if (c == '\n') > break; >@@ -182,7 +180,13 @@ grep_fgetln(file_t *f, size_t *l) > { > switch (f->type) { > case FILE_STDIO: >- return fgetln(f->f, l); >+ if ((*l = getline(, , f->f)) == -1) { >+ if (ferror(f->f)) >+ err(2, "%s: getline", fname); >+ else >+ return NULL; >+ } >+ return lnbuf; > #ifndef SMALL > case FILE_MMAP: > return mmfgetln(f->mmf, l); >diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c >index 913cc97a0f3..cfac24b12aa 100644 >--- a/usr.bin/grep/grep.c >+++ b/usr.bin/grep/grep.c >@@ -224,15 +224,19 @@ read_patterns(const char *fn) > { > FILE *f; > char *line; >- size_t len; >+ ssize_t len; >+ size_t linesize; > > if ((f = fopen(fn, "r")) == NULL) > err(2, "%s", fn); >- while ((line = fgetln(f, )) != NULL) >+ line = NULL; >+ linesize = 0; >+ while ((len = getline(, , f)) != -1) > add_pattern(line, *line == '\n' ? 0 : len); > if (ferror(f)) > err(2, "%s", fn); > fclose(f); >+ free(line); > } > > int >-- >Lauri Tirkkonen | lotheac @ IRCnet > >
Re: grep: convert fgetln to getline
On Wed, Jan 23 2019 18:01:24 -0500, Ted Unangst wrote: > Lauri Tirkkonen wrote: > > > > oh, interesting. that's sloppy. can you please fix that first, > > > > separately? > > > > > > Sure, here it is. > > > > Could you please take a look at it? It's been a couple weeks. > > yup, sorry, slipped by. committed. Thanks. What about the fgetln->getline diff (same as I sent previously applies cleanly; reattached below)? I know you said you're concerned about the extra copy, but as I pointed out the getline code path will only be taken if mmopen() fails, the file is not seekable, or if grep was compiled -DSMALL. diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c index 4b3c689e4ab..87c49dd5cc0 100644 --- a/usr.bin/grep/file.c +++ b/usr.bin/grep/file.c @@ -34,10 +34,8 @@ #include "grep.h" static char fname[PATH_MAX]; -#ifndef NOZ static char*lnbuf; -static size_t lnbuflen; -#endif +static size_t lnbufsize; #define FILE_STDIO 0 #define FILE_MMAP 1 @@ -73,9 +71,9 @@ gzfgetln(gzFile *f, size_t *len) else errx(2, "%s: %s", fname, gzerrstr); } - if (n >= lnbuflen) { - lnbuflen *= 2; - lnbuf = grep_realloc(lnbuf, ++lnbuflen); + if (n >= lnbufsize) { + lnbufsize *= 2; + lnbuf = grep_realloc(lnbuf, ++lnbufsize); } if (c == '\n') break; @@ -182,7 +180,13 @@ grep_fgetln(file_t *f, size_t *l) { switch (f->type) { case FILE_STDIO: - return fgetln(f->f, l); + if ((*l = getline(, , f->f)) == -1) { + if (ferror(f->f)) + err(2, "%s: getline", fname); + else + return NULL; + } + return lnbuf; #ifndef SMALL case FILE_MMAP: return mmfgetln(f->mmf, l); diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c index 913cc97a0f3..cfac24b12aa 100644 --- a/usr.bin/grep/grep.c +++ b/usr.bin/grep/grep.c @@ -224,15 +224,19 @@ read_patterns(const char *fn) { FILE *f; char *line; - size_t len; + ssize_t len; + size_t linesize; if ((f = fopen(fn, "r")) == NULL) err(2, "%s", fn); - while ((line = fgetln(f, )) != NULL) + line = NULL; + linesize = 0; + while ((len = getline(, , f)) != -1) add_pattern(line, *line == '\n' ? 0 : len); if (ferror(f)) err(2, "%s", fn); fclose(f); + free(line); } int -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
Lauri Tirkkonen wrote: > > > oh, interesting. that's sloppy. can you please fix that first, separately? > > > > Sure, here it is. > > Could you please take a look at it? It's been a couple weeks. yup, sorry, slipped by. committed.
Re: grep: convert fgetln to getline
On Mon, Jan 07 2019 23:01:47 +0200, Lauri Tirkkonen wrote: > On Mon, Jan 07 2019 15:41:53 -0500, Ted Unangst wrote: > > Lauri Tirkkonen wrote: > > > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote: > > > > Lauri Tirkkonen wrote: > > > > > Hi, another simple diff converting fgetln usage to getline. > > > > > > > > > > I also considered converting grep_fgetln to grep_getline, but that > > > > > would > > > > > mean that for FILE_MMAP we'd have to copy the string. So this diff > > > > > keeps > > > > > the grep_fgetln interface as is, but avoids using fgetln from libc > > > > > (and > > > > > adds error handling for FILE_STDIO). > > > > > > > > this looks good and seems to work. thanks. > > > > > > actually, it doesn't work :) the added error handling catches that > > > directory fd's are being passed into grep_fgetln, causing an error exit > > > with grep -r (or just grep foo /etc/*). I'm working on a second diff to > > > refactor the file handling a bit. > > > > oh, interesting. that's sloppy. can you please fix that first, separately? > > Sure, here it is. Could you please take a look at it? It's been a couple weeks. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
On Mon, Jan 07 2019 23:01:47 +0200, Lauri Tirkkonen wrote: > On Mon, Jan 07 2019 15:41:53 -0500, Ted Unangst wrote: > > Lauri Tirkkonen wrote: > > > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote: > > > > Lauri Tirkkonen wrote: > > > > > Hi, another simple diff converting fgetln usage to getline. > > > > > > > > > > I also considered converting grep_fgetln to grep_getline, but that > > > > > would > > > > > mean that for FILE_MMAP we'd have to copy the string. So this diff > > > > > keeps > > > > > the grep_fgetln interface as is, but avoids using fgetln from libc > > > > > (and > > > > > adds error handling for FILE_STDIO). > > > > > > > > this looks good and seems to work. thanks. > > > > > > actually, it doesn't work :) the added error handling catches that > > > directory fd's are being passed into grep_fgetln, causing an error exit > > > with grep -r (or just grep foo /etc/*). I'm working on a second diff to > > > refactor the file handling a bit. > > > > oh, interesting. that's sloppy. can you please fix that first, separately? > > Sure, here it is. ping. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
On Mon, Jan 07 2019 15:41:53 -0500, Ted Unangst wrote: > Lauri Tirkkonen wrote: > > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote: > > > Lauri Tirkkonen wrote: > > > > Hi, another simple diff converting fgetln usage to getline. > > > > > > > > I also considered converting grep_fgetln to grep_getline, but that would > > > > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps > > > > the grep_fgetln interface as is, but avoids using fgetln from libc (and > > > > adds error handling for FILE_STDIO). > > > > > > this looks good and seems to work. thanks. > > > > actually, it doesn't work :) the added error handling catches that > > directory fd's are being passed into grep_fgetln, causing an error exit > > with grep -r (or just grep foo /etc/*). I'm working on a second diff to > > refactor the file handling a bit. > > oh, interesting. that's sloppy. can you please fix that first, separately? Sure, here it is. The idea is to use grep_fdopen for everything, making grep_open just open the file and pass the fd to grep_fdopen, do fstat() in grep_fdopen and modify mmopen() accordingly (it no longer needs to open the file or stat it). grep_tree is modified to not call procfile() on FTS_D (FTS_DP was already there). In addition grep_fdopen returns NULL and sets errno == EISDIR if it encounters a directory; this can happen if a directory argument is given on command line, or for recursive grep, if the file being processed is a symlink to a directory, which fts doesn't notice. procfile() handles the EISDIR return silently, matching current behavior. I also removed the useless mode argument from grep_open and grep_fdopen; we always open files read-only. diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c index 4b3c689e4ab..2befc4304c1 100644 --- a/usr.bin/grep/file.c +++ b/usr.bin/grep/file.c @@ -26,7 +26,10 @@ * SUCH DAMAGE. */ +#include #include +#include +#include #include #include #include @@ -90,68 +93,64 @@ gzfgetln(gzFile *f, size_t *len) #endif file_t * -grep_fdopen(int fd, char *mode) +grep_fdopen(int fd) { file_t *f; + struct stat sb; if (fd == STDIN_FILENO) snprintf(fname, sizeof fname, "(standard input)"); - else + else if (fname[0] == '\0') snprintf(fname, sizeof fname, "(fd %d)", fd); + if (fstat(fd, ) == -1) + return NULL; + if (S_ISDIR(sb.st_mode)) { + errno = EISDIR; + return NULL; + } + f = grep_malloc(sizeof *f); #ifndef NOZ if (Zflag) { f->type = FILE_GZIP; f->noseek = lseek(fd, 0L, SEEK_SET) == -1; - if ((f->gzf = gzdopen(fd, mode)) != NULL) + if ((f->gzf = gzdopen(fd, "r")) != NULL) return f; - } else + } #endif - { - f->type = FILE_STDIO; - f->noseek = isatty(fd); - if ((f->f = fdopen(fd, mode)) != NULL) - return f; + f->noseek = isatty(fd); +#ifndef SMALL + /* try mmap first; if it fails, try stdio */ + if (!f->noseek && (f->mmf = mmopen(fd, )) != NULL) { + f->type = FILE_MMAP; + return f; } +#endif + f->type = FILE_STDIO; + if ((f->f = fdopen(fd, "r")) != NULL) + return f; free(f); return NULL; } file_t * -grep_open(char *path, char *mode) +grep_open(char *path) { file_t *f; + int fd; snprintf(fname, sizeof fname, "%s", path); - f = grep_malloc(sizeof *f); - f->noseek = 0; - -#ifndef NOZ - if (Zflag) { - f->type = FILE_GZIP; - if ((f->gzf = gzopen(fname, mode)) != NULL) - return f; - } else -#endif - { -#ifndef SMALL - /* try mmap first; if it fails, try stdio */ - if ((f->mmf = mmopen(fname, mode)) != NULL) { - f->type = FILE_MMAP; - return f; - } -#endif - f->type = FILE_STDIO; - if ((f->f = fopen(path, mode)) != NULL) - return f; - } + if ((fd = open(fname, O_RDONLY)) == -1) + return NULL; - free(f); - return NULL; + f = grep_fdopen(fd); + if (f == NULL) + close(fd); + return f; } int diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c index 913cc97a0f3..e5e48fa163a 100644 --- a/usr.bin/grep/grep.c +++ b/usr.bin/grep/grep.c @@ -63,9 +63,7 @@ intFflag; /* -F: interpret pattern as list of fixed strings */ int Hflag; /* -H: always print filename header */ int Lflag; /* -L: only show names of files with no matches */ int Rflag; /* -R: recursively search directory trees */ -#ifndef NOZ int Zflag; /* -Z: decompress input before processing */
Re: grep: convert fgetln to getline
Lauri Tirkkonen wrote: > On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote: > > Lauri Tirkkonen wrote: > > > Hi, another simple diff converting fgetln usage to getline. > > > > > > I also considered converting grep_fgetln to grep_getline, but that would > > > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps > > > the grep_fgetln interface as is, but avoids using fgetln from libc (and > > > adds error handling for FILE_STDIO). > > > > this looks good and seems to work. thanks. > > actually, it doesn't work :) the added error handling catches that > directory fd's are being passed into grep_fgetln, causing an error exit > with grep -r (or just grep foo /etc/*). I'm working on a second diff to > refactor the file handling a bit. oh, interesting. that's sloppy. can you please fix that first, separately? (wrt getline, theo raised some concern that getline copies more data than fgetln does. we care quite a bit about grep performance, so we need to consider that some more.)
Re: grep: convert fgetln to getline
On Sun, Jan 06 2019 14:02:16 -0500, Ted Unangst wrote: > Lauri Tirkkonen wrote: > > Hi, another simple diff converting fgetln usage to getline. > > > > I also considered converting grep_fgetln to grep_getline, but that would > > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps > > the grep_fgetln interface as is, but avoids using fgetln from libc (and > > adds error handling for FILE_STDIO). > > this looks good and seems to work. thanks. actually, it doesn't work :) the added error handling catches that directory fd's are being passed into grep_fgetln, causing an error exit with grep -r (or just grep foo /etc/*). I'm working on a second diff to refactor the file handling a bit. -- Lauri Tirkkonen | lotheac @ IRCnet
Re: grep: convert fgetln to getline
Lauri Tirkkonen wrote: > Hi, another simple diff converting fgetln usage to getline. > > I also considered converting grep_fgetln to grep_getline, but that would > mean that for FILE_MMAP we'd have to copy the string. So this diff keeps > the grep_fgetln interface as is, but avoids using fgetln from libc (and > adds error handling for FILE_STDIO). this looks good and seems to work. thanks.
grep: convert fgetln to getline
Hi, another simple diff converting fgetln usage to getline. I also considered converting grep_fgetln to grep_getline, but that would mean that for FILE_MMAP we'd have to copy the string. So this diff keeps the grep_fgetln interface as is, but avoids using fgetln from libc (and adds error handling for FILE_STDIO). diff --git a/usr.bin/grep/file.c b/usr.bin/grep/file.c index 4b3c689e4ab..87c49dd5cc0 100644 --- a/usr.bin/grep/file.c +++ b/usr.bin/grep/file.c @@ -34,10 +34,8 @@ #include "grep.h" static char fname[PATH_MAX]; -#ifndef NOZ static char*lnbuf; -static size_t lnbuflen; -#endif +static size_t lnbufsize; #define FILE_STDIO 0 #define FILE_MMAP 1 @@ -73,9 +71,9 @@ gzfgetln(gzFile *f, size_t *len) else errx(2, "%s: %s", fname, gzerrstr); } - if (n >= lnbuflen) { - lnbuflen *= 2; - lnbuf = grep_realloc(lnbuf, ++lnbuflen); + if (n >= lnbufsize) { + lnbufsize *= 2; + lnbuf = grep_realloc(lnbuf, ++lnbufsize); } if (c == '\n') break; @@ -182,7 +180,13 @@ grep_fgetln(file_t *f, size_t *l) { switch (f->type) { case FILE_STDIO: - return fgetln(f->f, l); + if ((*l = getline(, , f->f)) == -1) { + if (ferror(f->f)) + err(2, "%s: getline", fname); + else + return NULL; + } + return lnbuf; #ifndef SMALL case FILE_MMAP: return mmfgetln(f->mmf, l); diff --git a/usr.bin/grep/grep.c b/usr.bin/grep/grep.c index 913cc97a0f3..cfac24b12aa 100644 --- a/usr.bin/grep/grep.c +++ b/usr.bin/grep/grep.c @@ -224,15 +224,19 @@ read_patterns(const char *fn) { FILE *f; char *line; - size_t len; + ssize_t len; + size_t linesize; if ((f = fopen(fn, "r")) == NULL) err(2, "%s", fn); - while ((line = fgetln(f, )) != NULL) + line = NULL; + linesize = 0; + while ((len = getline(, , f)) != -1) add_pattern(line, *line == '\n' ? 0 : len); if (ferror(f)) err(2, "%s", fn); fclose(f); + free(line); } int -- Lauri Tirkkonen | lotheac @ IRCnet