Hi,

Todd C. Miller wrote on Tue, Nov 02, 2021 at 08:12:00AM -0600:
> On Mon, 01 Nov 2021 21:04:54 -0500, Scott Cheloha wrote:

>> Yes it would be simpler.  However I didn't want to start changing the
>> input -- which we currently don't do -- without discussing it.
>>
>> The standard says we should "write one copy of each input line on the
>> output." So, if we are being strict, we don't add a newline that isn't
>> there, because that isn't what we read.  Any other interpretation
>> requires handwaving about what an "input line" even is.

> The System V version of uniq actually ignores the last line if it
> doesn't end in a newline.  For example:
> 
>     $ printf "bar\nfoo\nfool" | uniq
>     bar
>     foo
> 
> AIX, Solaris, and HP-UX still exhibit this behavior.  What happens
> is that the gline() function (which reads the line) returns non-zero
> when it hits EOF, discarding any input in that line.  Interestingly,
> it does realloc the line buffer as needed to handle long lines.
> 
> So really, this is a corner case where you can't count on consistent
> behavior among implementations and we just need to do what we think
> is best.  If you prefer we retain the existing behavior wrt a final
> line without a newline that is OK with me.

Just a quick comment without looking at the code:

POSIX says on
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uniq.html :

  INPUT FILES
    The input file shall be a text file.

By definition, a file that does not end in a newline character
is *NOT* a text file.  Consequently, what uniq(1) does in this case
is unspecified.
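
To illustrate the definition: POSIX defines a line as zero or more
non-newline characters plus a terminating newline, and a text file as
zero or more such lines.  A minimal sketch of that one check follows.
The function name is made up for this example, and it deliberately
ignores the other text file requirements (no NUL bytes, no line longer
than LINE_MAX):

```python
def ends_like_text_file(data: bytes) -> bool:
    """Check only the trailing-newline part of the text file definition.

    Hypothetical helper for illustration.  An empty file counts as a
    text file because it consists of zero lines.
    """
    return data == b"" or data.endswith(b"\n")


print(ends_like_text_file(b"bar\nfoo\n"))     # complete last line
print(ends_like_text_file(b"bar\nfoo\nfool")) # incomplete last line
```

With the input from Todd's example, the second call reports that the
data is not a text file, so uniq(1) is free to do anything with it.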

My personal opinion is that in general, if utilities are specified to
process text files and the terminating newline is missing, it is best to
behave as if the mandatory terminating newline were present.  The reason
why I think so is that forgetting the terminating newline is a popular
user error, and silently assuming a terminating newline causes less
surprise with users than silently discarding the incomplete last line.
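
A rough sketch of what "behave as if the newline were present" could
mean for a uniq-like filter.  This is not the OpenBSD code (which is C
and which I have not looked at for this mail); the function name is
invented for the example, and it assumes plain adjacent-duplicate
removal without any options:

```python
import io


def uniq_lines(stream):
    """Yield one copy of each run of identical adjacent lines.

    Assumption for illustration: a final line that lacks its
    terminating newline is treated as if the newline were present.
    """
    prev = None
    for line in stream:
        if not line.endswith(b"\n"):
            line += b"\n"  # supply the missing terminator
        if line != prev:
            yield line
            prev = line


# Same input as in the System V example quoted above.
result = b"".join(uniq_lines(io.BytesIO(b"bar\nfoo\nfool")))
print(result)
```

One consequence of this choice is that an unterminated last line
compares equal to an identical terminated line before it, so the input
"a\na" produces a single "a" line, and the output always ends in a
newline even when the input did not.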

Most users are not even aware that a file without a terminating newline
is not a text file, so it definitely is less surprising for the vast
majority.  But even those who are aware of that quirk in the definition
of what a text file is (like myself) are regularly surprised by the
consequences...

Yours,
  Ingo
