Re: document patch(1) not supporting binary diffs

2021-12-21 Thread Jason McIntyre
On Tue, Dec 21, 2021 at 09:43:27AM +0100, Ingo Schwarze wrote:
> Jason McIntyre wrote on Mon, Dec 20, 2021 at 04:13:19PM +:
> 
> > i'm ok with your diff
> 
> Thanks for checking.
> 
> > but it is slightly misleading in context of
> > reading from stdin. i suppose that is no biggie.
> 
> I think i see why you say so.
> 
> Speaking *formally*, what we usually do is describing what the utility
> does by default in the first sentence, then explain how command line
> options modify that behaviour below "The options are as follows".
> We could do that here, somewhat like this:
> 
>   patch reads a text file called the "patch file" from standard input,
>   containing any of the four forms ... yadayada ...
>   and applies those differences to an original text file ...
> 
> Besides, usually, we would *not* show a SYNOPSIS line
> 
>   patch  
> In this particular case, however, typing "patch  maybe "patch -i patchfile" is so dominant in practice that focussing
> on it seems to help users more than adhering to formal practices.
> 
> I think calling this "misleading" is a bit strong because very often,
> the first sentence in our manual pages describes the default or most
> important mode of operation and subsequent sentences modify that by
> detailing other modes.  In this case, not only does the next sentence
> say what happens if no patch file is specified, but we also have the
> "patch  

hi.

my concern was that we would change a piece of text that talked
generally about patch files (without overly describing the format) into
a specific reference to an exact file (.Ar patchfile). the former does
not generally exclude stdin, but the latter seemed to do so. or at least
leave some ambiguity.

so i hoped to avoid the issue with a simpler fix. still, as noted, i
didn;t feel it to be overly important since, as you say, the document
does then go on to explain things. and we do usually talk about how
things work in general, then flesh out differences.

> > it would be simpler to just s/patch file/text file/ maybe?
> 
> True, that would be simpler and not outright wrong.  But in normal
> operation, patch(1) deals with many text files, so i worry that
> people might get confused as to which text file we are talking about.
> It seems valuable for the first sentence to make it clear that one
> among all these files is special and to introduce the term "patch file"
> or ".Ar patchfile".
> 

heh, so the opposite of what i'm describing.

> So to first get the issue reported by chrisz@ out of the way, i
> committed the patch as-is.
> 

sure, makes sense

> We could say
> 
>   .Nm
>   takes a text file called the
>   .Sq patch file
>   containing any of the four forms ...
> 
> Do you think that is better, like in the patch below?
> 

yes, because it seems to address all our points. on the other hand, i
can;t help but feel that the original wording was better, even if it has
the issue about not being clear that it should be a text file. it was
simpler and easier to understand.

would it work to reinstate that original wording ("patch will take a
patch file"), but somehow strongarm the second sentence ("If
patchfile is omitted") into one describing both that it shold be a text
file, and what happens if it is omitted?

i.e. leave the original wording of sentence 1, but adjust that of
sentence 2.

what do you think?
jmc

> Yours,
>   Ingo
> 
> 
> Index: patch.1
> ===
> RCS file: /cvs/src/usr.bin/patch/patch.1,v
> retrieving revision 1.34
> diff -u -r1.34 patch.1
> --- patch.1   21 Dec 2021 08:07:20 -  1.34
> +++ patch.1   21 Dec 2021 08:37:54 -
> @@ -47,8 +47,8 @@
>  .Pf \*(Lt Ar patchfile
>  .Sh DESCRIPTION
>  .Nm
> -takes the text file
> -.Ar patchfile
> +takes a text file called the
> +.Dq patch file
>  containing any of the four forms of difference
>  listing produced by the
>  .Xr diff 1
> @@ -56,7 +56,7 @@
>  producing a patched version.
>  If
>  .Ar patchfile
> -is omitted, or is a hyphen, the patch will be read from the standard input.
> +is omitted or is a hyphen, the patch file is read from the standard input.
>  .Pp
>  .Nm
>  will attempt to determine the type of the diff listing, unless overruled by a
> 



Re: document patch(1) not supporting binary diffs

2021-12-21 Thread Ingo Schwarze
Hi Theo,

Theo de Raadt wrote on Mon, Dec 20, 2021 at 10:37:00AM -0700:
> Ingo Schwarze  wrote:

>> The patch(1) manual talks about "lines" throughout,
>> and for binary files, a concept of "lines" does not even exist.

> That is a bit strong.  Some utilities designed for "text files" have no
> problem being both 8-bit clean and very-long-line capable.  For example,
> you can use emacs to edit a /bsd.  Add a space at the start of the file,
> save it.  re-edit the file, delete the space, and you get the same
> result.  I changed the internals of mg decades ago to also be 8-bit
> clean + very-long-line capable.

Yes, i agree such extensions are often useful.

> I think in POSIX this concept of "text
> file" is very weakly defined -- there are a variety of line-length
> issues, weird issues relating to \n and \r translation, obvious
> inability to handle embeded NUL etc, etc etc, and even end-of-file
> newline termination.
> 
> It is a bit of a copout to say "text file", when the truth really is is
> "imperfect content handling".  OTOH, adding perfect handling in many unix
> programs would be basically impossible, the complexity would be pretty high.
> 
> It may even be that the patch inputfile format cannot handle arbitrary data.
> It certainly has newline followed by '+', '-', or ' ' and ending with '\n'
> as context tracking, so it might be impossible.
 
>> The problem arises from the manual failing to mention that patch(1)
>> operates on text files.

> Yes, I am OK with it saying so.

>> Many standard utilities operate on text files
>> only, the concept of a text-file is both well-known and defined by
>> POSIX, so there is no need to re-explain what that means in individual
>> pages.

> Nope, I think POSIX fails to create a good definition.

Admittedly, the POSIX definition is a bit weak in some respects,
but all the same, it is usable as a minmium requirement that many
standard utilities have in common.

> Look at the
> recent commit to uniq.c, you even commented on it.  patch can handle files
> without trailing \n, right?   If the POSIX definition was real, patch would
> probably need to fail upon those files, creating other problems.

If POSIX says "The foobar utility operates on text files", that does not
mean that it requires foobar to fail if the input is not a text file.
It merely means that the behaviour is unspecified in that case.

It may occasionally be useful to add statements similar to the following
to the STANDARDS section:

  As an extension to that specification,

  foobar can handle files containing NUL bytes
or
  foobar can handle files containing bytes that do not form characters
or
  foobar can handle lines longer than LINE_MAX bytes
or
  foobar can handle files lacking the terminating newline character
  on the last line

Only in cases where such an extension is useful for some practical
purpose and we are willing to maintain it, of course.

Yours,
  Ingo



Re: document patch(1) not supporting binary diffs

2021-12-21 Thread Ingo Schwarze
Jason McIntyre wrote on Mon, Dec 20, 2021 at 04:13:19PM +:

> i'm ok with your diff

Thanks for checking.

> but it is slightly misleading in context of
> reading from stdin. i suppose that is no biggie.

I think i see why you say so.

Speaking *formally*, what we usually do is describing what the utility
does by default in the first sentence, then explain how command line
options modify that behaviour below "The options are as follows".
We could do that here, somewhat like this:

  patch reads a text file called the "patch file" from standard input,
  containing any of the four forms ... yadayada ...
  and applies those differences to an original text file ...

Besides, usually, we would *not* show a SYNOPSIS line

  patch  it would be simpler to just s/patch file/text file/ maybe?

True, that would be simpler and not outright wrong.  But in normal
operation, patch(1) deals with many text files, so i worry that
people might get confused as to which text file we are talking about.
It seems valuable for the first sentence to make it clear that one
among all these files is special and to introduce the term "patch file"
or ".Ar patchfile".

So to first get the issue reported by chrisz@ out of the way, i
committed the patch as-is.

We could say

  .Nm
  takes a text file called the
  .Sq patch file
  containing any of the four forms ...

Do you think that is better, like in the patch below?

Yours,
  Ingo


Index: patch.1
===
RCS file: /cvs/src/usr.bin/patch/patch.1,v
retrieving revision 1.34
diff -u -r1.34 patch.1
--- patch.1 21 Dec 2021 08:07:20 -  1.34
+++ patch.1 21 Dec 2021 08:37:54 -
@@ -47,8 +47,8 @@
 .Pf \*(Lt Ar patchfile
 .Sh DESCRIPTION
 .Nm
-takes the text file
-.Ar patchfile
+takes a text file called the
+.Dq patch file
 containing any of the four forms of difference
 listing produced by the
 .Xr diff 1
@@ -56,7 +56,7 @@
 producing a patched version.
 If
 .Ar patchfile
-is omitted, or is a hyphen, the patch will be read from the standard input.
+is omitted or is a hyphen, the patch file is read from the standard input.
 .Pp
 .Nm
 will attempt to determine the type of the diff listing, unless overruled by a



Re: document patch(1) not supporting binary diffs

2021-12-20 Thread Theo de Raadt
Ingo Schwarze  wrote:

> The patch(1) manual talks about "lines" throughout,
> and for binary files, a concept of "lines" does not even exist.

That is a bit strong.  Some utilities designed for "text files" have no
problem being both 8-bit clean and very-long-line capable.  For example,
you can use emacs to edit a /bsd.  Add a space at the start of the file,
save it.  re-edit the file, delete the space, and you get the same
result.  I changed the internals of mg decades ago to also be 8-bit
clean + very-long-line capable.  I think in POSIX this concept of "text
file" is very weakly defined -- there are a variety of line-length
issues, weird issues relating to \n and \r translation, obvious
inability to handle embeded NUL etc, etc etc, and even end-of-file
newline termination.

It is a bit of a copout to say "text file", when the truth really is is
"imperfect content handling".  OTOH, adding perfect handling in many unix
programs would be basically impossible, the complexity would be pretty high.

It may even be that the patch inputfile format cannot handle arbitrary data.
It certainly has newline followed by '+', '-', or ' ' and ending with '\n'
as context tracking, so it might be impossible.

> The problem arises from the manual failing to mention that patch(1)
> operates on text files.

Yes, I am OK with it saying so.

> Many standard utilities operate on text files
> only, the concept of a text-file is both well-known and defined by
> POSIX, so there is no need to re-explain what that means in individual
> pages.

Nope, I think POSIX fails to create a good definition.  Look at the
recent commit to uniq.c, you even commented on it.  patch can handle files
without trailing \n, right?   If the POSIX definition was real, patch would
probably need to fail upon those files, creating other problems.




Re: document patch(1) not supporting binary diffs

2021-12-20 Thread Jason McIntyre
On Mon, Dec 20, 2021 at 04:45:58PM +0100, Ingo Schwarze wrote:
> Hi Christopher,
> 
> Christopher Zimmermann wrote on Mon, Dec 20, 2021 at 04:01:49PM +0100:
> 
> > base patch cannot work with diffs of binary files. It might help to say 
> > so in the manpage since other implementations do support this (ab)use of 
> > patch. OK?
> 
> I agree you are pointing out a slight problem with the manual page,
> but i dislike your patch for two reasons:
> 
> This is not a bug, but a fundamental design decision.
> The patch(1) manual talks about "lines" throughout,
> and for binary files, a concept of "lines" does not even exist.
> 
> And the basic idea of a tool ought to be described in the first
> sentence of the description, not as an afterthought at the very end.
> 
> The problem arises from the manual failing to mention that patch(1)
> operates on text files.  Many standard utilities operate on text files
> only, the concept of a text-file is both well-known and defined by
> POSIX, so there is no need to re-explain what that means in individual
> pages.
> 
> Consequently, i think something like the following would be better.
> 
> Yours,
>   Ingo
> 
> 
> Index: patch.1
> ===
> RCS file: /cvs/src/usr.bin/patch/patch.1,v
> retrieving revision 1.33
> diff -u -r1.33 patch.1
> --- patch.1   9 Nov 2021 16:13:40 -   1.33
> +++ patch.1   20 Dec 2021 15:42:10 -
> @@ -47,10 +47,12 @@
>  .Pf \*(Lt Ar patchfile
>  .Sh DESCRIPTION
>  .Nm
> -will take a patch file containing any of the four forms of difference
> +takes the text file
> +.Ar patchfile
> +containing any of the four forms of difference
>  listing produced by the
>  .Xr diff 1
> -program and apply those differences to an original file,
> +program and applies those differences to an original text file,
>  producing a patched version.
>  If
>  .Ar patchfile
> 

hi ingo.

i'm ok with your diff but it is slightly misleading in context of
reading from stdin. i suppose that is no biggie.

it would be simpler to just s/patch file/text file/ maybe?

if not, i'm still ok with your version.

jmc



Re: document patch(1) not supporting binary diffs

2021-12-20 Thread Ingo Schwarze
Hi Christopher,

Christopher Zimmermann wrote on Mon, Dec 20, 2021 at 04:01:49PM +0100:

> base patch cannot work with diffs of binary files. It might help to say 
> so in the manpage since other implementations do support this (ab)use of 
> patch. OK?

I agree you are pointing out a slight problem with the manual page,
but i dislike your patch for two reasons:

This is not a bug, but a fundamental design decision.
The patch(1) manual talks about "lines" throughout,
and for binary files, a concept of "lines" does not even exist.

And the basic idea of a tool ought to be described in the first
sentence of the description, not as an afterthought at the very end.

The problem arises from the manual failing to mention that patch(1)
operates on text files.  Many standard utilities operate on text files
only, the concept of a text-file is both well-known and defined by
POSIX, so there is no need to re-explain what that means in individual
pages.

Consequently, i think something like the following would be better.

Yours,
  Ingo


Index: patch.1
===
RCS file: /cvs/src/usr.bin/patch/patch.1,v
retrieving revision 1.33
diff -u -r1.33 patch.1
--- patch.1 9 Nov 2021 16:13:40 -   1.33
+++ patch.1 20 Dec 2021 15:42:10 -
@@ -47,10 +47,12 @@
 .Pf \*(Lt Ar patchfile
 .Sh DESCRIPTION
 .Nm
-will take a patch file containing any of the four forms of difference
+takes the text file
+.Ar patchfile
+containing any of the four forms of difference
 listing produced by the
 .Xr diff 1
-program and apply those differences to an original file,
+program and applies those differences to an original text file,
 producing a patched version.
 If
 .Ar patchfile