Hi Theo,

Theo de Raadt wrote on Mon, Dec 20, 2021 at 10:37:00AM -0700:
> Ingo Schwarze <schwa...@usta.de> wrote:

>> The patch(1) manual talks about "lines" throughout,
>> and for binary files, a concept of "lines" does not even exist.

> That is a bit strong.  Some utilities designed for "text files" have no
> problem being both 8-bit clean and very-long-line capable.  For example,
> you can use emacs to edit a /bsd.  Add a space at the start of the file,
> save it.  re-edit the file, delete the space, and you get the same
> result.  I changed the internals of mg decades ago to also be 8-bit
> clean + very-long-line capable.

Yes, i agree such extensions are often useful.

> I think in POSIX this concept of "text
> file" is very weakly defined -- there are a variety of line-length
> issues, weird issues relating to \n and \r translation, obvious
> inability to handle embedded NUL etc, etc etc, and even end-of-file
> newline termination.
> 
> It is a bit of a copout to say "text file", when the truth really is
> "imperfect content handling".  OTOH, adding perfect handling in many unix
> programs would be basically impossible, the complexity would be pretty high.
> 
> It may even be that the patch inputfile format cannot handle arbitrary data.
> It certainly has newline followed by '+', '-', or ' ' and ending with '\n'
> as context tracking, so it might be impossible.
 
>> The problem arises from the manual failing to mention that patch(1)
>> operates on text files.

> Yes, I am OK with it saying so.

>> Many standard utilities operate on text files
>> only, the concept of a text-file is both well-known and defined by
>> POSIX, so there is no need to re-explain what that means in individual
>> pages.

> Nope, I think POSIX fails to create a good definition.

Admittedly, the POSIX definition is a bit weak in some respects,
but all the same, it is usable as a minimum requirement that many
standard utilities have in common.

> Look at the
> recent commit to uniq.c, you even commented on it.  patch can handle files
> without trailing \n, right?   If the POSIX definition was real, patch would
> probably need to fail upon those files, creating other problems.

If POSIX says "The foobar utility operates on text files", that does not
mean that it requires foobar to fail if the input is not a text file.
It merely means that the behaviour is unspecified in that case.

It may occasionally be useful to add statements similar to the following
to the STANDARDS section:

  As an extension to that specification,

  foobar can handle files containing NUL bytes
or
  foobar can handle files containing bytes that do not form characters
or
  foobar can handle lines longer than LINE_MAX bytes
or
  foobar can handle files lacking the terminating newline character
  on the last line

Only in cases where such an extension is useful for some practical
purpose and we are willing to maintain it, of course.
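
To make the four cases above concrete, here is a rough sketch in Python
that reports in which respects a byte string falls short of the POSIX
text file definition.  The function name, the default of 2048 for
LINE_MAX (the POSIX minimum; the real limit is system-specific) and
the use of UTF-8 as the character encoding are assumptions made for
illustration only; no utility discussed in this thread works this way.

```python
def text_file_violations(data: bytes, line_max: int = 2048,
                         encoding: str = "utf-8") -> list:
    """List the ways in which data is not a POSIX text file.

    Hypothetical helper: line_max and encoding are assumed defaults,
    not values read from the running system or locale.
    """
    problems = []

    # Lines of a text file must not contain NUL characters.
    if b"\0" in data:
        problems.append("contains NUL bytes")

    # Every byte sequence must form characters in the encoding.
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        problems.append("contains bytes that do not form characters")

    # After splitting on newline, every element but the last was
    # terminated by a newline; the last one is empty exactly when
    # the data ends with a newline (or is empty altogether).
    lines = data.split(b"\n")

    # LINE_MAX counts bytes including the newline character.
    if any(len(line) + 1 > line_max for line in lines):
        problems.append("has lines longer than LINE_MAX bytes")

    if lines[-1]:
        problems.append("lacks the terminating newline on the last line")

    return problems
```

An empty list means the input meets the minimum requirement; each
entry corresponds to one extension a utility would have to support
in order to handle that input gracefully.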

Yours,
  Ingo
