On Mon, Jul 22, 2019 at 2:44 PM Rob Landley <[email protected]> wrote:
>
> On 7/22/19 10:17 AM, enh via Toybox wrote:
> > thoughts?
>
> Sorry, laptop battery ran out a few days ago (got caught inside by a
> thunderstorm and couldn't get back to my charger before the battery drained
> enough the Dell bios woke up from suspend and resumed-to-nothing), and
> thunderbird doesn't remember  open reply windows the way kmail used to.
>
> > On Fri, Jul 19, 2019 at 9:55 AM enh <[email protected]> wrote:
> >>
> >> I started this last night, but thought I'd aim to send multiple small
> >> patches rather than work through all the callers and send one big patch.
>
> This has also been on my todo list, it's just big and touches a lot of files
> some of which are dirty in my tree, so I hadn't gotten all the way through it
> yet. :)
>
> >> I've deliberately chosen the ugly name `allocated_length` because we've
> >> had historical bugs where folks think this a line length in the sense of
> >> the return value. I do wonder whether we should actually have some kind
> >> of getline() wrapper that hides the `char *`/`size_t` pair in lib/,
>
> As long as we're changing things anyway, a wrapper that simplifies the API 
> would
> probably make sense? Just always malloc it and return the pointer. Let's see, 
> is:
>
>   char *blah = xgetline(fp);
>
> enough info? For the /n version it is (we can strlen), for the delim version 
> we
> want the returned length. and it's a minor optimization to _have_ the 
> length...
>
>   char *blah = xgetline(FILE *fp, unsigned *len);
>
> Where if len is NULL it doesn't try to set it, so the "simple call is
> s = xgetline(fp, 0);
>
> And then it can return NUL for EOF. (The downside of this is it wouldn't 
> report
> read errors, although you can always zero and check errno. In theory we could
> use int *len and return -1 but... why? Returning null for EOF on a read error 
> is
> right 95% of the time anyway, and the function itself could error_msg() if we
> really wanted it to, but we probably don't?)
>
> On my todo list this is entangled with "make the data member of double_list a
> void * instead of a char *" because both design decisions were fallout from
> implementing patch.c back in the day. (Although the double_list thing also
> involves turning dlist_add_nomalloc into just dlist_add and then making the
> current behavior dlist_add_wrap()... possibly struct double_list should _just_
> have the prev and next pointers, and then it can be a first member of 
> subclassed
> double lists and we can dlist_add(&list, &new->dl);
>
> Ahem. I have a whole lot of polishing work I _could_ do if I wasn't busy with
> everything else. :)
>
> >> which makes the function easier to use in most cases but does add the
> >> less common gotcha that you wouldn't be able to getline() through
> >> multiple files at once (which does happen in at least one toy).
> >>
> >> But maybe the real fix is to look harder for places where we can just
> >> use loopfiles_lines?
>
> Also worth doing, yes. Especially since I want to make the plumbing for
> loopfiles_lines() try to read larger blocks and chop them up itself.
>
> >> Speaking of which, should we actually add two more
> >> arguments to that? Specifically: switch it to getdelim() rather than
> >> getline() behind the scenes, and also add a way to have the trailing
> >> '\n' automatically removed, since that seems to be what most callers
> >> want?
>
> Yes, the xgetline() above should NUL out the \n before returning the string.
>
> There should be a getdelim version, but NUL-ing out the delimiter still makes
> sense there. (And nulling out a NUL is a NOP so shouldn't hurt anything.)
> ALTHOUGH, it's possible the delimiter could be a string, allowing us to have
> utf8 delimiters? (Which makes NUL delimiters awkward: should it treat a NULL
> pointer as meaning delimit with NUL, should it treat an empty string that way,
> should the delimiter have a length...?)
>
>   char *s = xgetdelim(FILE *fp, char *delim, int delimlen, int *len);
>
> Eh, it's not _too_ awkward. But is there ever a case where we'd want a 
> delimiter
> with a NUL _in_ it? (Probably not?) In which case the above is 
> xgetdelim_utf8()
> and then we'd have an:
>
>   char *s = xgetdelim(FILE *fp, char delim, int *len);
>
> Wrapper to call it. And then xgetline() could also be a wrapepr for
> xgetdelim_utf8(), and _that_ would be the function doing the fancy "read 64k 
> at
> at a time and chop lines out of it to xstrndup(), calling xrealloc() as
> necessary when you cross a block boundary or hit a crazy long one". AND we 
> could
> put a ceiling on line length if we wanted to, although... do we want to? Under
> what circumstances? (Query available memory? What does ulimit -d actually 
> limit
> again? The problem is malloc() only allocating _virtual_ memory and then the 
> OOM
> killer telling you when you've run out of physical memory, often long after 
> swap
> thrashing...)
>
> Ahem. There's a reason this hairball is still on the todo list. :)
>
> >> Anyway, that seemed like enough questions that it was time to send this
> >> initial patch out before doing too much more...
> >> ---
> >>  lib/password.c        |  9 +++++----
> >>  toys/other/rev.c      | 26 ++++++++++++--------------
> >>  toys/posix/uudecode.c | 21 +++++++++++++--------
> >>  3 files changed, 30 insertions(+), 26 deletions(-)
>
> Applied.
>
> Longer-term the input batching is a reasonable performance win, something you
> care about in grep. And that rewrites xgetline() and xgetdelim() wrappers 
> around
> a new function anyway, and as long as we're doing that we may as well change 
> the
> API to a) always return a malloced string, b) optionally return the length of
> the string...

i'm not sure what you mean here. the underlying getdelim()/getline()
does this (via the buffer in the FILE*). it's the get_line() that
we're removing that _doesn't_ buffer input. switching to getline() [or
loopfiles_lines, which uses it] gets you all the way to where you want
to be from that perspective.

> Ah, wait. There ARE times we care about the delimiter being there or not: the
> last line isn't guaranteed to have it and there are subtle behavior changes in
> sed and such (which I implemented!) based on this. (It's related to the
> sed-of-a-binary thing, you don't want to add a NUL to the end of the file if
> there wasn't one...)

on the [i think valid] assumption that we only need '\n' or '\0' as
delimiters, a flag for that and a flag for whether or not to keep the
delimiter in loopfiles_lines seems like it would be more than enough
to be getting on with? better than where we are today, anyway.

apart from the fact a few locals need to be promoted to [TT.] globals,
loopfile_lines() seems strictly like an improvement for code currently
using getline() too? the attached patch switches over nl.c, for
example, and it's less code and a larger scope for the reuse of the
getline() buffer (even ignoring the pessimal use of getline() in nl.c
that allocated+deallocated a new buffer per line).

[PATCH] nl: switch from getline() to loopfiles_lines().

This was basically just to help me think out loud while discussing
loopfiles_lines().
---
 toys/posix/nl.c | 46 ++++++++++++++++++----------------------------
 1 file changed, 18 insertions(+), 28 deletions(-)

> Grrr. <marvin the martian voice>design, design...</marvin the martian voice>
>
> Rob
>
> (This is why I didn't immediately reply to the first email.)
From bfb2b7ed43f8ef145f2e4044480c2f3c84e39c67 Mon Sep 17 00:00:00 2001
From: Elliott Hughes <[email protected]>
Date: Mon, 22 Jul 2019 16:24:46 -0700
Subject: [PATCH] nl: switch from getline() to loopfiles_lines().

This was basically just to help me think out loud while discussing
loopfiles_lines().
---
 toys/posix/nl.c | 46 ++++++++++++++++++----------------------------
 1 file changed, 18 insertions(+), 28 deletions(-)

diff --git a/toys/posix/nl.c b/toys/posix/nl.c
index 1b7b390a..ef2d7ab3 100644
--- a/toys/posix/nl.c
+++ b/toys/posix/nl.c
@@ -35,36 +35,25 @@ GLOBALS(
 
   // Count of consecutive blank lines for -l has to persist between files
   long lcount;
+  long slen;
 )
 
-static void do_nl(int fd, char *name)
+static void do_nl(char **pline, long len)
 {
-  FILE *f = xfdopen(fd, "r");
-  int w = TT.w, slen = strlen(TT.s);
-
-  for (;;) {
-    char *line = 0;
-    size_t temp;
-    int match = *TT.b != 'n';
-
-    if (getline(&line, &temp, f) < 1) {
-      if (ferror(f)) perror_msg_raw(name);
-      break;
-    }
-
-    if (*TT.b == 'p') match = !regexec((void *)(toybuf+16), line, 0, 0, 0);
-    if (TT.l || *TT.b == 't')
-      if (*line == '\n') match = TT.l && ++TT.lcount >= TT.l;
-    if (match) {
-      TT.lcount = 0;
-      printf(toybuf, w, TT.v++, TT.s);
-    } else printf("%*c", (int)w+slen, ' ');
-    xprintf("%s", line);
-
-    free(line);
-  }
-
-  fclose(f);
+  char *line;
+  int match = *TT.b != 'n';
+
+  if (!pline) return;
+  line = *pline;
+
+  if (*TT.b == 'p') match = !regexec((void *)(toybuf+16), line, 0, 0, 0);
+  if (TT.l || *TT.b == 't')
+    if (*line == '\n') match = TT.l && ++TT.lcount >= TT.l;
+  if (match) {
+    TT.lcount = 0;
+    printf(toybuf, TT.w, TT.v++, TT.s);
+  } else printf("%*c", (int)(TT.w+TT.slen), ' ');
+  xprintf("%s", line);
 }
 
 void nl_main(void)
@@ -72,6 +61,7 @@ void nl_main(void)
   char *clip = "";
 
   if (!TT.s) TT.s = "\t";
+  TT.slen = strlen(TT.s);
 
   if (!TT.n || !strcmp(TT.n, "rn")); // default
   else if (!strcmp(TT.n, "ln")) clip = "-";
@@ -86,5 +76,5 @@ void nl_main(void)
       REG_NOSUB | (toys.optflags&FLAG_E)*REG_EXTENDED);
   else if (!strchr("atn", *TT.b)) error_exit("bad -b '%s'", TT.b);
 
-  loopfiles (toys.optargs, do_nl);
+  loopfiles_lines(toys.optargs, do_nl);
 }
-- 
2.22.0.657.g960e92d24f-goog

_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Reply via email to