On Mon, Jul 22, 2019 at 2:44 PM Rob Landley <[email protected]> wrote: > > On 7/22/19 10:17 AM, enh via Toybox wrote: > > thoughts? > > Sorry, laptop battery ran out a few days ago (got caught inside by a > thunderstorm and couldn't get back to my charger before the battery drained > enough the Dell bios woke up from suspend and resumed-to-nothing), and > thunderbird doesn't remember open reply windows the way kmail used to. > > > On Fri, Jul 19, 2019 at 9:55 AM enh <[email protected]> wrote: > >> > >> I started this last night, but thought I'd aim to send multiple small > >> patches rather than work through all the callers and send one big patch. > > This has also been on my todo list, it's just big and touches a lot of files > some of which are dirty in my tree, so I hadn't gotten all the way through it > yet. :) > > >> I've deliberately chosen the ugly name `allocated_length` because we've > >> had historical bugs where folks think this a line length in the sense of > >> the return value. I do wonder whether we should actually have some kind > >> of getline() wrapper that hides the `char *`/`size_t` pair in lib/, > > As long as we're changing things anyway, a wrapper that simplifies the API > would > probably make sense? Just always malloc it and return the pointer. Let's see, > is: > > char *blah = xgetline(fp); > > enough info? For the /n version it is (we can strlen), for the delim version > we > want the returned length. and it's a minor optimization to _have_ the > length... > > char *blah = xgetline(FILE *fp, unsigned *len); > > Where if len is NULL it doesn't try to set it, so the "simple call is > s = xgetline(fp, 0); > > And then it can return NUL for EOF. (The downside of this is it wouldn't > report > read errors, although you can always zero and check errno. In theory we could > use int *len and return -1 but... why? Returning null for EOF on a read error > is > right 95% of the time anyway, and the function itself could error_msg() if we > really wanted it to, but we probably don't?) > > On my todo list this is entangled with "make the data member of double_list a > void * instead of a char *" because both design decisions were fallout from > implementing patch.c back in the day. (Although the double_list thing also > involves turning dlist_add_nomalloc into just dlist_add and then making the > current behavior dlist_add_wrap()... possibly struct double_list should _just_ > have the prev and next pointers, and then it can be a first member of > subclassed > double lists and we can dlist_add(&list, &new->dl); > > Ahem. I have a whole lot of polishing work I _could_ do if I wasn't busy with > everything else. :) > > >> which makes the function easier to use in most cases but does add the > >> less common gotcha that you wouldn't be able to getline() through > >> multiple files at once (which does happen in at least one toy). > >> > >> But maybe the real fix is to look harder for places where we can just > >> use loopfiles_lines? > > Also worth doing, yes. Especially since I want to make the plumbing for > loopfiles_lines() try to read larger blocks and chop them up itself. > > >> Speaking of which, should we actually add two more > >> arguments to that? Specifically: switch it to getdelim() rather than > >> getline() behind the scenes, and also add a way to have the trailing > >> '\n' automatically removed, since that seems to be what most callers > >> want? > > Yes, the xgetline() above should NUL out the \n before returning the string. > > There should be a getdelim version, but NUL-ing out the delimiter still makes > sense there. (And nulling out a NUL is a NOP so shouldn't hurt anything.) > ALTHOUGH, it's possible the delimiter could be a string, allowing us to have > utf8 delimiters? (Which makes NUL delimiters awkward: should it treat a NULL > pointer as meaning delimit with NUL, should it treat an empty string that way, > should the delimiter have a length...?) > > char *s = xgetdelim(FILE *fp, char *delim, int delimlen, int *len); > > Eh, it's not _too_ awkward. But is there ever a case where we'd want a > delimiter > with a NUL _in_ it? (Probably not?) In which case the above is > xgetdelim_utf8() > and then we'd have an: > > char *s = xgetdelim(FILE *fp, char delim, int *len); > > Wrapper to call it. And then xgetline() could also be a wrapepr for > xgetdelim_utf8(), and _that_ would be the function doing the fancy "read 64k > at > at a time and chop lines out of it to xstrndup(), calling xrealloc() as > necessary when you cross a block boundary or hit a crazy long one". AND we > could > put a ceiling on line length if we wanted to, although... do we want to? Under > what circumstances? (Query available memory? What does ulimit -d actually > limit > again? The problem is malloc() only allocating _virtual_ memory and then the > OOM > killer telling you when you've run out of physical memory, often long after > swap > thrashing...) > > Ahem. There's a reason this hairball is still on the todo list. :) > > >> Anyway, that seemed like enough questions that it was time to send this > >> initial patch out before doing too much more... > >> --- > >> lib/password.c | 9 +++++---- > >> toys/other/rev.c | 26 ++++++++++++-------------- > >> toys/posix/uudecode.c | 21 +++++++++++++-------- > >> 3 files changed, 30 insertions(+), 26 deletions(-) > > Applied. > > Longer-term the input batching is a reasonable performance win, something you > care about in grep. And that rewrites xgetline() and xgetdelim() wrappers > around > a new function anyway, and as long as we're doing that we may as well change > the > API to a) always return a malloced string, b) optionally return the length of > the string...
i'm not sure what you mean here. the underlying getdelim()/getline() does this (via the buffer in the FILE*). it's the get_line() that we're removing that _doesn't_ buffer input. switching to getline() [or loopfiles_lines, which uses it] gets you all the way to where you want to be from that perspective. > Ah, wait. There ARE times we care about the delimiter being there or not: the > last line isn't guaranteed to have it and there are subtle behavior changes in > sed and such (which I implemented!) based on this. (It's related to the > sed-of-a-binary thing, you don't want to add a NUL to the end of the file if > there wasn't one...) on the [i think valid] assumption that we only need '\n' or '\0' as delimiters, a flag for that and a flag for whether or not to keep the delimiter in loopfiles_lines seems like it would be more than enough to be getting on with? better than where we are today, anyway. apart from the fact a few locals need to be promoted to [TT.] globals, loopfile_lines() seems strictly like an improvement for code currently using getline() too? the attached patch switches over nl.c, for example, and it's less code and a larger scope for the reuse of the getline() buffer (even ignoring the pessimal use of getline() in nl.c that allocated+deallocated a new buffer per line). [PATCH] nl: switch from getline() to loopfiles_lines(). This was basically just to help me think out loud while discussing loopfiles_lines(). --- toys/posix/nl.c | 46 ++++++++++++++++++---------------------------- 1 file changed, 18 insertions(+), 28 deletions(-) > Grrr. <marvin the martian voice>design, design...</marvin the martian voice> > > Rob > > (This is why I didn't immediately reply to the first email.)
From bfb2b7ed43f8ef145f2e4044480c2f3c84e39c67 Mon Sep 17 00:00:00 2001 From: Elliott Hughes <[email protected]> Date: Mon, 22 Jul 2019 16:24:46 -0700 Subject: [PATCH] nl: switch from getline() to loopfiles_lines(). This was basically just to help me think out loud while discussing loopfiles_lines(). --- toys/posix/nl.c | 46 ++++++++++++++++++---------------------------- 1 file changed, 18 insertions(+), 28 deletions(-) diff --git a/toys/posix/nl.c b/toys/posix/nl.c index 1b7b390a..ef2d7ab3 100644 --- a/toys/posix/nl.c +++ b/toys/posix/nl.c @@ -35,36 +35,25 @@ GLOBALS( // Count of consecutive blank lines for -l has to persist between files long lcount; + long slen; ) -static void do_nl(int fd, char *name) +static void do_nl(char **pline, long len) { - FILE *f = xfdopen(fd, "r"); - int w = TT.w, slen = strlen(TT.s); - - for (;;) { - char *line = 0; - size_t temp; - int match = *TT.b != 'n'; - - if (getline(&line, &temp, f) < 1) { - if (ferror(f)) perror_msg_raw(name); - break; - } - - if (*TT.b == 'p') match = !regexec((void *)(toybuf+16), line, 0, 0, 0); - if (TT.l || *TT.b == 't') - if (*line == '\n') match = TT.l && ++TT.lcount >= TT.l; - if (match) { - TT.lcount = 0; - printf(toybuf, w, TT.v++, TT.s); - } else printf("%*c", (int)w+slen, ' '); - xprintf("%s", line); - - free(line); - } - - fclose(f); + char *line; + int match = *TT.b != 'n'; + + if (!pline) return; + line = *pline; + + if (*TT.b == 'p') match = !regexec((void *)(toybuf+16), line, 0, 0, 0); + if (TT.l || *TT.b == 't') + if (*line == '\n') match = TT.l && ++TT.lcount >= TT.l; + if (match) { + TT.lcount = 0; + printf(toybuf, TT.w, TT.v++, TT.s); + } else printf("%*c", (int)(TT.w+TT.slen), ' '); + xprintf("%s", line); } void nl_main(void) @@ -72,6 +61,7 @@ void nl_main(void) char *clip = ""; if (!TT.s) TT.s = "\t"; + TT.slen = strlen(TT.s); if (!TT.n || !strcmp(TT.n, "rn")); // default else if (!strcmp(TT.n, "ln")) clip = "-"; @@ -86,5 +76,5 @@ void nl_main(void) REG_NOSUB | (toys.optflags&FLAG_E)*REG_EXTENDED); else if (!strchr("atn", *TT.b)) error_exit("bad -b '%s'", TT.b); - loopfiles (toys.optargs, do_nl); + loopfiles_lines(toys.optargs, do_nl); } -- 2.22.0.657.g960e92d24f-goog
_______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
