Re: ksh: skip long history lines
On Wed, Oct 18 2017, Ori Bernstein wrote: > On Thu, 19 Oct 2017 00:22:51 +0200 > Jeremie Courreges-Anglas wrote: > >> Those variables are static so that error messages are only printed once >> in the shell lifetime. history reload happens each time the shell >> detects that the histfile has been modified, which can happen often if >> like me you have multiple interactives shells running at the same time. > > Makes sense, although I had been thinking that it could lead to lost > warnings when changing histfile; I'd hope that a corrupt histfile isn't > a persistent condition. That's a fair point, I think I understand your previous proposal better. Warning only once per histfile would need something like a file-global variable - that we should reset in hist_init. This sounds like extra quirkiness that would make the code more complicated for a limited benefit. In the end, it might eat some of the precious 24 lines of your terminal, but you can also just fix the offending lines. > In any case, this looks pretty good to me. Thanks for the feedback. >> > Maybe hoist this outside of the loop. >> >> I don't see a reason for this. (Now I see.) [...] Updated diff. The logic has changed, hopefully giving more accurate warning messages. Feedback / oks welcome. Index: history.c === RCS file: /d/cvs/src/bin/ksh/history.c,v retrieving revision 1.72 diff -u -p -p -u -r1.72 history.c --- history.c 18 Oct 2017 15:41:25 - 1.72 +++ history.c 19 Oct 2017 07:56:05 - @@ -740,23 +740,38 @@ static void history_load(Source *s) { char*p, encoded[LINE + 1], line[LINE + 1]; + int toolongseen = 0; rewind(histfh); + line_co = 1; /* just read it all; will auto resize history upon next command */ - for (line_co = 1; ; line_co++) { - p = fgets(encoded, sizeof(encoded), histfh); - if (p == NULL || feof(histfh) || ferror(histfh)) - break; + while (fgets(encoded, sizeof(encoded), histfh)) { if ((p = strchr(encoded, '\n')) == NULL) { - bi_errorf("history file is corrupt"); - return; + /* discard overlong line */ + do { + /* maybe a missing trailing newline? */ + if (strlen(encoded) != sizeof(encoded) - 1) { + bi_errorf("history file is corrupt"); + return; + } + } while (fgets(encoded, sizeof(encoded), histfh) + && strchr(encoded, '\n') == NULL); + + if (!toolongseen) { + toolongseen = 1; + bi_errorf("ignored history line(s) longer than" + " %d bytes", LINE); + } + + continue; } *p = '\0'; s->line = line_co; s->cmd_offset = line_co; strunvis(line, encoded); histsave(line_co, line, 0); + line_co++; } history_write(); -- jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: ksh: skip long history lines
Jeremie Courreges-Anglas(j...@wxcvbn.org) on 2017.10.19 00:22:51 +0200: > > Hi Ori, > > thanks for your feedback. Reply and updated diff below, > > On Wed, Oct 18 2017, Ori Bernstein wrote: > > On Wed, 18 Oct 2017 22:33:51 +0200 > > Jeremie Courreges-Anglas wrote: > > > >> > >> It would be nice to support arbitrarily long lines, but a first step > >> would be to skip them gracefuly. > >> > >> The code modifies the loop condition: no tests against ferror(3)/feof(3) > >> are performed any more (I don't see their point). > >> > >> We could print a warning on ferror(), though, but that would be another > >> diff. > >> > >> ok? o > >> > >> > >> Index: history.c > >> === > >> RCS file: /d/cvs/src/bin/ksh/history.c,v > >> retrieving revision 1.72 > >> diff -u -p -p -u -r1.72 history.c > >> --- history.c 18 Oct 2017 15:41:25 - 1.72 > >> +++ history.c 18 Oct 2017 20:18:24 - > >> @@ -739,24 +739,45 @@ history_close(void) > >> static void > >> history_load(Source *s) > >> { > >> - char*p, encoded[LINE + 1], line[LINE + 1]; > >> + char*nl, *p, encoded[LINE + 1], line[LINE + 1]; > >> > >>rewind(histfh); > >> + line_co = 1; > >> > >>/* just read it all; will auto resize history upon next command */ > >> - for (line_co = 1; ; line_co++) { > >> - p = fgets(encoded, sizeof(encoded), histfh); > >> - if (p == NULL || feof(histfh) || ferror(histfh)) > >> - break; > >> - if ((p = strchr(encoded, '\n')) == NULL) { > >> - bi_errorf("history file is corrupt"); > >> - return; > >> + for (p = fgets(encoded, sizeof(encoded), histfh); p != NULL; > >> + p = fgets(encoded, sizeof(encoded), histfh)) { > > > > The redundant calls prevent this code from scanning as well as it > > could for me. Perhaps: > > > > for(;;) { > > if ((p = fgets(...) == NULL) > > break; > > > > Just a minor cosmetic nitpick, it's not a big deal either way. > > I must admit that the code is needlessly redundant, but I prefer to have > the loop condition inside... the loop condition. > > >> + if ((nl = strchr(encoded, '\n')) == NULL) { > >> + static int corrupt, toolong; > > > > I'm not sure why these need to be static. I'm not an expert on > > the code, but it looks like histsave can be called multiple > > times as the shell runs, which triggers a history reload. > > Keeping this state seems like it would be wrong. > > Those variables are static so that error messages are only printed once > in the shell lifetime. history reload happens each time the shell > detects that the histfile has been modified, which can happen often if > like me you have multiple interactives shells running at the same time. > > > Maybe hoist this outside of the loop. > > I don't see a reason for this. > > >> + /* no trailing newline? */ > >> + if (strlen(p) != sizeof(encoded) - 1) { > >> + if (!corrupt) { > >> + corrupt = 1; > >> + bi_errorf("history file is corrupt"); > >> + } > >> + return; > > > > Why does `corrupt` need to be recorded at all? There's a > > return right after it. > > Also to limit the amount of crap printed on screen, though maybe we > should warn every time in this case. Here's an updated diff that does > so and hopefully makes the intents clearer. It also adds a comment > suggested by anton@ > > If your histfile ends with an overlong line which is also not terminated > by a newline, the do...while loop would silently ignore this last line. > I think that's acceptable, but YMMV. > > > Index: history.c > === > RCS file: /d/cvs/src/bin/ksh/history.c,v > retrieving revision 1.72 > diff -u -p -p -u -r1.72 history.c > --- history.c 18 Oct 2017 15:41:25 - 1.72 > +++ history.c 18 Oct 2017 22:16:43 - > @@ -739,24 +739,42 @@ history_close(void) > static void > history_load(Source *s) > { > - char*p, encoded[LINE + 1], line[LINE + 1]; > + char*nl, *p, encoded[LINE + 1], line[LINE + 1]; > > rewind(histfh); > + line_co = 1; > > /* just read it all; will auto resize history upon next command */ > - for (line_co = 1; ; line_co++) { > - p = fgets(encoded, sizeof(encoded), histfh); > - if (p == NULL || feof(histfh) || ferror(histfh)) > - break; > - if ((p = strchr(encoded, '\n')) == NULL) { > - bi_errorf("history file is corrupt"); > - return; > + while ((p = fgets(encoded, sizeof(encoded), histfh)) != NULL) { > + if ((nl = strchr(encoded, '\n')) == NULL) { > +
Re: ksh: skip long history lines
On Thu, 19 Oct 2017 00:22:51 +0200 Jeremie Courreges-Anglas wrote: > Those variables are static so that error messages are only printed once > in the shell lifetime. history reload happens each time the shell > detects that the histfile has been modified, which can happen often if > like me you have multiple interactives shells running at the same time. Makes sense, although I had been thinking that it could lead to lost warnings when changing histfile; I'd hope that a corrupt histfile isn't a persistent condition. In any case, this looks pretty good to me. > > Maybe hoist this outside of the loop. > > I don't see a reason for this. > > >> + /* no trailing newline? */ > >> + if (strlen(p) != sizeof(encoded) - 1) { > >> + if (!corrupt) { > >> + corrupt = 1; > >> + bi_errorf("history file is corrupt"); > >> + } > >> + return; > > > > Why does `corrupt` need to be recorded at all? There's a > > return right after it. > > Also to limit the amount of crap printed on screen, though maybe we > should warn every time in this case. Here's an updated diff that does > so and hopefully makes the intents clearer. It also adds a comment > suggested by anton@ > > If your histfile ends with an overlong line which is also not terminated > by a newline, the do...while loop would silently ignore this last line. > I think that's acceptable, but YMMV. > > > Index: history.c > === > RCS file: /d/cvs/src/bin/ksh/history.c,v > retrieving revision 1.72 > diff -u -p -p -u -r1.72 history.c > --- history.c 18 Oct 2017 15:41:25 - 1.72 > +++ history.c 18 Oct 2017 22:16:43 - > @@ -739,24 +739,42 @@ history_close(void) > static void > history_load(Source *s) > { > - char*p, encoded[LINE + 1], line[LINE + 1]; > + char*nl, *p, encoded[LINE + 1], line[LINE + 1]; > > rewind(histfh); > + line_co = 1; > > /* just read it all; will auto resize history upon next command */ > - for (line_co = 1; ; line_co++) { > - p = fgets(encoded, sizeof(encoded), histfh); > - if (p == NULL || feof(histfh) || ferror(histfh)) > - break; > - if ((p = strchr(encoded, '\n')) == NULL) { > - bi_errorf("history file is corrupt"); > - return; > + while ((p = fgets(encoded, sizeof(encoded), histfh)) != NULL) { > + if ((nl = strchr(encoded, '\n')) == NULL) { > + static int warned; > + > + /* no trailing newline? */ > + if (strlen(p) != sizeof(encoded) - 1) { > + bi_errorf("history file is corrupt"); > + return; > + } > + > + if (!warned) { > + warned = 1; > + bi_errorf( > + "ignoring history line(s) longer than %d" > + " bytes", LINE); > + } > + > + /* discard overlong line */ > + do { > + p = fgets(encoded, sizeof(encoded), histfh); > + } while (p != NULL && strchr(p, '\n') == NULL); > + > + continue; > } > - *p = '\0'; > + *nl = '\0'; > s->line = line_co; > s->cmd_offset = line_co; > strunvis(line, encoded); > histsave(line_co, line, 0); > + line_co++; > } > > history_write(); > > -- > jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE > -- Ori Bernstein
Re: ksh: skip long history lines
Hi Ori, thanks for your feedback. Reply and updated diff below, On Wed, Oct 18 2017, Ori Bernstein wrote: > On Wed, 18 Oct 2017 22:33:51 +0200 > Jeremie Courreges-Anglas wrote: > >> >> It would be nice to support arbitrarily long lines, but a first step >> would be to skip them gracefuly. >> >> The code modifies the loop condition: no tests against ferror(3)/feof(3) >> are performed any more (I don't see their point). >> >> We could print a warning on ferror(), though, but that would be another >> diff. >> >> ok? >> >> >> Index: history.c >> === >> RCS file: /d/cvs/src/bin/ksh/history.c,v >> retrieving revision 1.72 >> diff -u -p -p -u -r1.72 history.c >> --- history.c18 Oct 2017 15:41:25 - 1.72 >> +++ history.c18 Oct 2017 20:18:24 - >> @@ -739,24 +739,45 @@ history_close(void) >> static void >> history_load(Source *s) >> { >> -char*p, encoded[LINE + 1], line[LINE + 1]; >> +char*nl, *p, encoded[LINE + 1], line[LINE + 1]; >> >> rewind(histfh); >> +line_co = 1; >> >> /* just read it all; will auto resize history upon next command */ >> -for (line_co = 1; ; line_co++) { >> -p = fgets(encoded, sizeof(encoded), histfh); >> -if (p == NULL || feof(histfh) || ferror(histfh)) >> -break; >> -if ((p = strchr(encoded, '\n')) == NULL) { >> -bi_errorf("history file is corrupt"); >> -return; >> +for (p = fgets(encoded, sizeof(encoded), histfh); p != NULL; >> + p = fgets(encoded, sizeof(encoded), histfh)) { > > The redundant calls prevent this code from scanning as well as it > could for me. Perhaps: > > for(;;) { > if ((p = fgets(...) == NULL) > break; > > Just a minor cosmetic nitpick, it's not a big deal either way. I must admit that the code is needlessly redundant, but I prefer to have the loop condition inside... the loop condition. >> +if ((nl = strchr(encoded, '\n')) == NULL) { >> +static int corrupt, toolong; > > I'm not sure why these need to be static. I'm not an expert on > the code, but it looks like histsave can be called multiple > times as the shell runs, which triggers a history reload. > Keeping this state seems like it would be wrong. Those variables are static so that error messages are only printed once in the shell lifetime. history reload happens each time the shell detects that the histfile has been modified, which can happen often if like me you have multiple interactives shells running at the same time. > Maybe hoist this outside of the loop. I don't see a reason for this. >> +/* no trailing newline? */ >> +if (strlen(p) != sizeof(encoded) - 1) { >> +if (!corrupt) { >> +corrupt = 1; >> +bi_errorf("history file is corrupt"); >> +} >> +return; > > Why does `corrupt` need to be recorded at all? There's a > return right after it. Also to limit the amount of crap printed on screen, though maybe we should warn every time in this case. Here's an updated diff that does so and hopefully makes the intents clearer. It also adds a comment suggested by anton@ If your histfile ends with an overlong line which is also not terminated by a newline, the do...while loop would silently ignore this last line. I think that's acceptable, but YMMV. Index: history.c === RCS file: /d/cvs/src/bin/ksh/history.c,v retrieving revision 1.72 diff -u -p -p -u -r1.72 history.c --- history.c 18 Oct 2017 15:41:25 - 1.72 +++ history.c 18 Oct 2017 22:16:43 - @@ -739,24 +739,42 @@ history_close(void) static void history_load(Source *s) { - char*p, encoded[LINE + 1], line[LINE + 1]; + char*nl, *p, encoded[LINE + 1], line[LINE + 1]; rewind(histfh); + line_co = 1; /* just read it all; will auto resize history upon next command */ - for (line_co = 1; ; line_co++) { - p = fgets(encoded, sizeof(encoded), histfh); - if (p == NULL || feof(histfh) || ferror(histfh)) - break; - if ((p = strchr(encoded, '\n')) == NULL) { - bi_errorf("history file is corrupt"); - return; + while ((p = fgets(encoded, sizeof(encoded), histfh)) != NULL) { + if ((nl = strchr(encoded, '\n')) == NULL) { + static int warned; + + /* no trailing newline? */ + if (strlen(p) != sizeof(encoded) - 1) { + bi_errorf("history file is corrupt"); +
Re: ksh: skip long history lines
On Wed, 18 Oct 2017 22:33:51 +0200 Jeremie Courreges-Anglas wrote: > > It would be nice to support arbitrarily long lines, but a first step > would be to skip them gracefuly. > > The code modifies the loop condition: no tests against ferror(3)/feof(3) > are performed any more (I don't see their point). > > We could print a warning on ferror(), though, but that would be another > diff. > > ok? > > > Index: history.c > === > RCS file: /d/cvs/src/bin/ksh/history.c,v > retrieving revision 1.72 > diff -u -p -p -u -r1.72 history.c > --- history.c 18 Oct 2017 15:41:25 - 1.72 > +++ history.c 18 Oct 2017 20:18:24 - > @@ -739,24 +739,45 @@ history_close(void) > static void > history_load(Source *s) > { > - char*p, encoded[LINE + 1], line[LINE + 1]; > + char*nl, *p, encoded[LINE + 1], line[LINE + 1]; > > rewind(histfh); > + line_co = 1; > > /* just read it all; will auto resize history upon next command */ > - for (line_co = 1; ; line_co++) { > - p = fgets(encoded, sizeof(encoded), histfh); > - if (p == NULL || feof(histfh) || ferror(histfh)) > - break; > - if ((p = strchr(encoded, '\n')) == NULL) { > - bi_errorf("history file is corrupt"); > - return; > + for (p = fgets(encoded, sizeof(encoded), histfh); p != NULL; > + p = fgets(encoded, sizeof(encoded), histfh)) { The redundant calls prevent this code from scanning as well as it could for me. Perhaps: for(;;) { if ((p = fgets(...) == NULL) break; Just a minor cosmetic nitpick, it's not a big deal either way. > + if ((nl = strchr(encoded, '\n')) == NULL) { > + static int corrupt, toolong; I'm not sure why these need to be static. I'm not an expert on the code, but it looks like histsave can be called multiple times as the shell runs, which triggers a history reload. Keeping this state seems like it would be wrong. Maybe hoist this outside of the loop. > + /* no trailing newline? */ > + if (strlen(p) != sizeof(encoded) - 1) { > + if (!corrupt) { > + corrupt = 1; > + bi_errorf("history file is corrupt"); > + } > + return; Why does `corrupt` need to be recorded at all? There's a return right after it. > + } > + > + if (!toolong) { > + toolong = 1; > + bi_errorf( > + "ignoring history line(s) longer than %d" > + " bytes", LINE); > + } > + > + do { > + p = fgets(encoded, sizeof(encoded), histfh); > + } while (p != NULL && strchr(p, '\n') == NULL); > + > + continue; > } > - *p = '\0'; > + *nl = '\0'; > s->line = line_co; > s->cmd_offset = line_co; > strunvis(line, encoded); > histsave(line_co, line, 0); > + line_co++; > } > > history_write(); > > -- > jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE > -- Ori Bernstein