Possibly fgetln(3) CAVEATS could benefit from some of what you have written here, although it is all in the manual already just in several places.
Or at least a link to getline(3)? On Sun, Jul 02, 2017 at 12:26:19AM +0200, Ingo Schwarze wrote: > Hi, > > Nicholas Marriott wrote on Sat, Jul 01, 2017 at 09:48:08PM +0100: > > > Safer how? > > 1. fgetln(3) returns a char array that is not NUL-terminated. > That is prone to buffer overrun bugs in general. > > getline(3) always returns proper NUL-terminated C strings. > > 2. The array returned from fgetln(3) contains a trailing '\n' in > most cases, but if the stream ends without it, there won't be > one, which aggravates the danger of item 1: if a careless > programmer reads the buffer until '\n', they will get away with > it on all valid text input files, so the problem is likely to > go undetected during testing, but they may yet be vulnerable > to buffer overrun on crafted input lacking the trailing newline. > > With getline(3), lack of the trailing '\n' is much less of an > issue because people will likely look for the terminating '\0' > in the first place, not for the terminating '\n'. > > 3. For this reason, the fgetln(3) manual page contains a lengthy > CAVEATS section recommending a complicated ten-line idiom to > NUL-terminate the buffer. In general, if something requires > a ten-line idiom, it can hardly be called safe. Adding the > missing two lines for error checking at the end, the complete > example is essentially 19 lines. > > EXAMPLES in fgetline(3) contains a code snipped doing about the > same, and even adding two more lines to strip '\n', it is only > 10 lines long including full initialization and error checking. > So that's half the code, much more intuitive, and much simpler, > without any manual malloc(3) or memcpy(3) or manually assigning > '\0' to some position calculated from pointer arithmetics, > which we all know is prone to off-by-one. > > 4. With fgetln(3), the returned pointer becomes invalid after the > next I/O operation on the stream. So you say line = fgetln(..), > do something else that you consider unrelated because you never > use "line" in it, and the next time you access "line", boom. > That can be very surprising. In general, library interfaces > passing out pointers to library-owned internal memory for the > user to mess with tend to be somewhat risky. > > By contrast, getline(3) copies the data to a dedicated buffer, > which won't become invalid on I/O. So the next time you use > it, it is guaranteed to be still valid, even if that is much > later and the file in question has long been closed. Of course, > if you pass the same buffer to getline(3) again, then after > that, it will contain the *next* line; but that is not surprising > at all, because that is exactly what you explicitly asked for. > > 5. The fgetln(3) manual says that you are allowed to modify the > returned buffer, but item 4 still applies, which aggravates the > danger inherent in item 4. Why would you modify the buffer, > unless you want to do something else after that and then, yet > later, look at your changes again? But then you must be *very* > careful to not do I/O on that file descriptor while doing > the "something else", or you are in for a surprise. > > getline(3) returns an allocated buffer for the user to free(3). > That's an easy to understand operation mode with clearly > assigned responsibilies, and it is so obvious that the user > can alter the buffer - after all, it's their own buffer - > that the manual doesn't even need to mention it. > > Your code follows the CAVEATS section in the fgetln(3) manual quite > closely, and i would have been quite surprised if you had screwed > such a thing up. > > But with most other people, i would have felt somewhat worried > about their code if they had asked me "safer how?" *after* using > fgetln(3)... >:-o > > > I don't mind too much, but you will also need to initialize size > > to 0 (it is currently uninitialized) before calling getline(). > > Ouch. No interface can be so simple that there is no way to > misuse it dangerously, even if it sets fewer traps for the > unwary than fgetln(3). > > >> Index: magic-load.c > >> =================================================================== > >> RCS file: /cvs/src/usr.bin/file/magic-load.c,v > >> retrieving revision 1.25 > >> diff -u -p -r1.25 magic-load.c > >> --- magic-load.c 1 Jul 2017 14:34:29 -0000 1.25 > >> +++ magic-load.c 1 Jul 2017 18:44:42 -0000 > >> @@ -1071,6 +1071,7 @@ magic_load(FILE *f, const char *path, in > >> struct magic_line *ml = NULL, *parent, *parent0; > >> char *line, *tmp; > >> size_t size; > >> + ssize_t slen; > >> u_int at, level, n, i; > >> > >> m = xcalloc(1, sizeof *m); > >> @@ -1084,15 +1085,11 @@ magic_load(FILE *f, const char *path, in > >> > >> at = 0; > >> tmp = NULL; > >> - while ((line = fgetln(f, &size))) { > > Hmmm, the current code does not test ferror(3) after falling out > of the "while" loop - is it intentional to ignore I/O errors? > In that case, an explicit comment "ignore I/O errors" after > the end of the while loop might make sense. > > >> - if (line[size - 1] == '\n') > >> - line[size - 1] = '\0'; > >> - else { > >> - tmp = xmalloc(size + 1); > >> - memcpy(tmp, line, size); > >> - tmp[size] = '\0'; > >> - line = tmp; > >> - } > >> + while ((slen = getline(&tmp, &size, f)) != -1) { > >> + line = tmp; > > I don't see a need for two char * variables here, that's an artifact > of the typical fgetln(3) complications. Just > > char *line = NULL; > size_t linesize = 0; > ssize_t slen; > > while ((slen = getline(&line, &linesize, f)) != -1) { > if (line[slen - 1] == '\n') > /* etc. */ > } > free(line); > if (ferror(f)) > err(1, "%s", path); > > ought to suffice. > > Yours, > Ingo > > >> + if (line[slen - 1] == '\n') > >> + line[slen - 1] = '\0'; > >> + > >> at++; > >> > >> while (isspace((u_char)*line))