On 7/22/19 10:17 AM, enh via Toybox wrote: > thoughts? Sorry, laptop battery ran out a few days ago (got caught inside by a thunderstorm and couldn't get back to my charger before the battery drained enough the Dell bios woke up from suspend and resumed-to-nothing), and thunderbird doesn't remember open reply windows the way kmail used to.
> On Fri, Jul 19, 2019 at 9:55 AM enh <[email protected]> wrote: >> >> I started this last night, but thought I'd aim to send multiple small >> patches rather than work through all the callers and send one big patch. This has also been on my todo list, it's just big and touches a lot of files some of which are dirty in my tree, so I hadn't gotten all the way through it yet. :) >> I've deliberately chosen the ugly name `allocated_length` because we've >> had historical bugs where folks think this a line length in the sense of >> the return value. I do wonder whether we should actually have some kind >> of getline() wrapper that hides the `char *`/`size_t` pair in lib/, As long as we're changing things anyway, a wrapper that simplifies the API would probably make sense? Just always malloc it and return the pointer. Let's see, is: char *blah = xgetline(fp); enough info? For the /n version it is (we can strlen), for the delim version we want the returned length. and it's a minor optimization to _have_ the length... char *blah = xgetline(FILE *fp, unsigned *len); Where if len is NULL it doesn't try to set it, so the "simple call is s = xgetline(fp, 0); And then it can return NUL for EOF. (The downside of this is it wouldn't report read errors, although you can always zero and check errno. In theory we could use int *len and return -1 but... why? Returning null for EOF on a read error is right 95% of the time anyway, and the function itself could error_msg() if we really wanted it to, but we probably don't?) On my todo list this is entangled with "make the data member of double_list a void * instead of a char *" because both design decisions were fallout from implementing patch.c back in the day. (Although the double_list thing also involves turning dlist_add_nomalloc into just dlist_add and then making the current behavior dlist_add_wrap()... possibly struct double_list should _just_ have the prev and next pointers, and then it can be a first member of subclassed double lists and we can dlist_add(&list, &new->dl); Ahem. I have a whole lot of polishing work I _could_ do if I wasn't busy with everything else. :) >> which makes the function easier to use in most cases but does add the >> less common gotcha that you wouldn't be able to getline() through >> multiple files at once (which does happen in at least one toy). >> >> But maybe the real fix is to look harder for places where we can just >> use loopfiles_lines? Also worth doing, yes. Especially since I want to make the plumbing for loopfiles_lines() try to read larger blocks and chop them up itself. >> Speaking of which, should we actually add two more >> arguments to that? Specifically: switch it to getdelim() rather than >> getline() behind the scenes, and also add a way to have the trailing >> '\n' automatically removed, since that seems to be what most callers >> want? Yes, the xgetline() above should NUL out the \n before returning the string. There should be a getdelim version, but NUL-ing out the delimiter still makes sense there. (And nulling out a NUL is a NOP so shouldn't hurt anything.) ALTHOUGH, it's possible the delimiter could be a string, allowing us to have utf8 delimiters? (Which makes NUL delimiters awkward: should it treat a NULL pointer as meaning delimit with NUL, should it treat an empty string that way, should the delimiter have a length...?) char *s = xgetdelim(FILE *fp, char *delim, int delimlen, int *len); Eh, it's not _too_ awkward. But is there ever a case where we'd want a delimiter with a NUL _in_ it? (Probably not?) In which case the above is xgetdelim_utf8() and then we'd have an: char *s = xgetdelim(FILE *fp, char delim, int *len); Wrapper to call it. And then xgetline() could also be a wrapepr for xgetdelim_utf8(), and _that_ would be the function doing the fancy "read 64k at at a time and chop lines out of it to xstrndup(), calling xrealloc() as necessary when you cross a block boundary or hit a crazy long one". AND we could put a ceiling on line length if we wanted to, although... do we want to? Under what circumstances? (Query available memory? What does ulimit -d actually limit again? The problem is malloc() only allocating _virtual_ memory and then the OOM killer telling you when you've run out of physical memory, often long after swap thrashing...) Ahem. There's a reason this hairball is still on the todo list. :) >> Anyway, that seemed like enough questions that it was time to send this >> initial patch out before doing too much more... >> --- >> lib/password.c | 9 +++++---- >> toys/other/rev.c | 26 ++++++++++++-------------- >> toys/posix/uudecode.c | 21 +++++++++++++-------- >> 3 files changed, 30 insertions(+), 26 deletions(-) Applied. Longer-term the input batching is a reasonable performance win, something you care about in grep. And that rewrites xgetline() and xgetdelim() wrappers around a new function anyway, and as long as we're doing that we may as well change the API to a) always return a malloced string, b) optionally return the length of the string... Ah, wait. There ARE times we care about the delimiter being there or not: the last line isn't guaranteed to have it and there are subtle behavior changes in sed and such (which I implemented!) based on this. (It's related to the sed-of-a-binary thing, you don't want to add a NUL to the end of the file if there wasn't one...) Grrr. <marvin the martian voice>design, design...</marvin the martian voice> Rob (This is why I didn't immediately reply to the first email.) _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
