Right, I wasn't clear - I didn't mean "why is getline better than
fgetln", I was asking "why do you want to change fgetln to getline in
this code".

It is fine if the answer is "I am worried that another developer might
copy or modify the code and mess it up" rather than "the way file is
using fgetln is not safe" but that was not clear to me.




On Sun, Jul 02, 2017 at 12:26:19AM +0200, Ingo Schwarze wrote:
> Hi,
> 
> Nicholas Marriott wrote on Sat, Jul 01, 2017 at 09:48:08PM +0100:
> 
> > Safer how?
> 
>  1. fgetln(3) returns a char array that is not NUL-terminated.
>     That is prone to buffer overrun bugs in general.
> 
>     getline(3) always returns proper NUL-terminated C strings.
> 
>  2. The array returned from fgetln(3) contains a trailing '\n' in
>     most cases, but if the stream ends without it, there won't be
>     one, which aggravates the danger of item 1: if a careless
>     programmer reads the buffer until '\n', they will get away with
>     it on all valid text input files, so the problem is likely to
>     go undetected during testing, but they may yet be vulnerable
>     to buffer overrun on crafted input lacking the trailing newline.
> 
>     With getline(3), lack of the trailing '\n' is much less of an
>     issue because people will likely look for the terminating '\0'
>     in the first place, not for the terminating '\n'.
> 
>  3. For this reason, the fgetln(3) manual page contains a lengthy
>     CAVEATS section recommending a complicated ten-line idiom to
>     NUL-terminate the buffer.  In general, if something requires
>     a ten-line idiom, it can hardly be called safe.  Adding the
>     missing two lines for error checking at the end, the complete
>     example is essentially 19 lines.
> 
>     EXAMPLES in fgetline(3) contains a code snipped doing about the
>     same, and even adding two more lines to strip '\n', it is only
>     10 lines long including full initialization and error checking.
>     So that's half the code, much more intuitive, and much simpler,
>     without any manual malloc(3) or memcpy(3) or manually assigning
>     '\0' to some position calculated from pointer arithmetics,
>     which we all know is prone to off-by-one.
> 
>  4. With fgetln(3), the returned pointer becomes invalid after the
>     next I/O operation on the stream.  So you say line = fgetln(..),
>     do something else that you consider unrelated because you never
>     use "line" in it, and the next time you access "line", boom.
>     That can be very surprising.  In general, library interfaces
>     passing out pointers to library-owned internal memory for the
>     user to mess with tend to be somewhat risky.
> 
>     By contrast, getline(3) copies the data to a dedicated buffer,
>     which won't become invalid on I/O.  So the next time you use
>     it, it is guaranteed to be still valid, even if that is much
>     later and the file in question has long been closed.  Of course,
>     if you pass the same buffer to getline(3) again, then after
>     that, it will contain the *next* line; but that is not surprising
>     at all, because that is exactly what you explicitly asked for.
> 
>  5. The fgetln(3) manual says that you are allowed to modify the
>     returned buffer, but item 4 still applies, which aggravates the
>     danger inherent in item 4.  Why would you modify the buffer,
>     unless you want to do something else after that and then, yet
>     later, look at your changes again?  But then you must be *very*
>     careful to not do I/O on that file descriptor while doing
>     the "something else", or you are in for a surprise.
> 
>     getline(3) returns an allocated buffer for the user to free(3).
>     That's an easy to understand operation mode with clearly
>     assigned responsibilies, and it is so obvious that the user
>     can alter the buffer - after all, it's their own buffer -
>     that the manual doesn't even need to mention it.
> 
> Your code follows the CAVEATS section in the fgetln(3) manual quite
> closely, and i would have been quite surprised if you had screwed
> such a thing up.
> 
> But with most other people, i would have felt somewhat worried
> about their code if they had asked me "safer how?" *after* using
> fgetln(3)...  >:-o
> 
> > I don't mind too much, but you will also need to initialize size
> > to 0 (it is currently uninitialized) before calling getline().
> 
> Ouch.  No interface can be so simple that there is no way to
> misuse it dangerously, even if it sets fewer traps for the
> unwary than fgetln(3).
> 
> >> Index: magic-load.c
> >> ===================================================================
> >> RCS file: /cvs/src/usr.bin/file/magic-load.c,v
> >> retrieving revision 1.25
> >> diff -u -p -r1.25 magic-load.c
> >> --- magic-load.c   1 Jul 2017 14:34:29 -0000       1.25
> >> +++ magic-load.c   1 Jul 2017 18:44:42 -0000
> >> @@ -1071,6 +1071,7 @@ magic_load(FILE *f, const char *path, in
> >>    struct magic_line       *ml = NULL, *parent, *parent0;
> >>    char                    *line, *tmp;
> >>    size_t                   size;
> >> +  ssize_t                  slen;
> >>    u_int                    at, level, n, i;
> >>  
> >>    m = xcalloc(1, sizeof *m);
> >> @@ -1084,15 +1085,11 @@ magic_load(FILE *f, const char *path, in
> >>  
> >>    at = 0;
> >>    tmp = NULL;
> >> -  while ((line = fgetln(f, &size))) {
> 
> Hmmm, the current code does not test ferror(3) after falling out
> of the "while" loop - is it intentional to ignore I/O errors?
> In that case, an explicit comment "ignore I/O errors" after
> the end of the while loop might make sense.
> 
> >> -          if (line[size - 1] == '\n')
> >> -                  line[size - 1] = '\0';
> >> -          else {
> >> -                  tmp = xmalloc(size + 1);
> >> -                  memcpy(tmp, line, size);
> >> -                  tmp[size] = '\0';
> >> -                  line = tmp;
> >> -          }
> >> +  while ((slen = getline(&tmp, &size, f)) != -1) {
> >> +          line = tmp;
> 
> I don't see a need for two char * variables here, that's an artifact
> of the typical fgetln(3) complications.  Just
> 
>       char    *line = NULL;
>       size_t   linesize = 0;
>       ssize_t  slen;
> 
>       while ((slen = getline(&line, &linesize, f)) != -1) {
>               if (line[slen - 1] == '\n')
>                       /* etc. */
>       }
>       free(line);
>       if (ferror(f))
>               err(1, "%s", path);
> 
> ought to suffice.
> 
> Yours,
>   Ingo
> 
> >> +          if (line[slen - 1] == '\n')
> >> +                  line[slen - 1] = '\0';
> >> +
> >>            at++;
> >>  
> >>            while (isspace((u_char)*line))

Reply via email to