Re: "svn patch" and the TAB character

2019-12-18 Thread Doug Robinson
Brane, Stefan, Daniel, Nathan:

Thank you for the discussion - it filled in history that I could not find.
I understand the issues now.  Cheers!

Happy Holidays!

Doug

On Tue, Dec 17, 2019 at 9:42 AM Branko Čibej  wrote:

> On 17.12.2019 14:58, Stefan Sperling wrote:
> > On Tue, Dec 17, 2019 at 08:33:09AM -0500, Doug Robinson wrote:
> >>>  And of course, both the
> >>> filename and the label (= the part after the tab character) may
> contain an
> >>> arbitrary number of spaces.
> >> The problem is parsing the line into proper tokens when every character
> out
> >> there can be part of the file name.  There must be a structure to the
> field
> >> or parsing is not really parsing anymore.
> > The output of 'svn diff' was not designed with 'svn patch' in mind.
> > The diff command preceded the patch command by many years.
> >
> > I still believe that making --patch-compatible produce filenames without
> > any trailing annotations in the output of 'svn diff' is the best answer.
>
>
> Agreed. Plain-vanilla BSD patch should be able to apply such patches, as
> should 'svn patch'.
>
>
> > This whole discussion started off with the problem statement that the
> > --patch-compatible option produces a diff which cannot be applied after
> > tabs are replaced with spaces.
> >
> > This filename parsing problem isn't limited to 'svn patch'. Other patch
> > implementations might run into the same problem. The way 'svn patch'
> finds
> > the filename is based on Larry Wall's original patch program.
>
> Precisely. That TAB *is* the canonical delimiter. The (implicit)
> specification is decades old, and it would be an error to try to
> second-guess it at this point.
>
> Designing a stricter universal format for transmitting contextual
> changes is a different matter entirely, one that we tried to tackle in
> the past and failed; other version control systems tried and failed,
> too. It would be a nice thing to have, but it's hardly something this
> community can do in isolation.
>
> > Generally, patches being mangled in transit is a notorious problem
> especially
> > with web-based email. It's not really a safe format for this purpose. If
> patches
> > get changed in transit in any way there is no guarantee they will apply
> cleanly.
> > That's not a new problem, it's a long-standing issue. And it's not
> SVN-specific.
> > There is already a way to transfer changes without such problems: svn
> commit
>
> And note that the tabs-to-spaces headache doesn't stop with parsing the
> filename out of the header, it affects the filename itself. I can put
> tabs in a filename on most Unix filesystems ... that would probably
> break even the original patch tool.
>
> -- Brane
>


-- 
*DOUGLAS B ROBINSON* SENIOR PRODUCT MANAGER

T +1 925 396 1125
*E* doug.robin...@wandisco.com

-- 


* *

**The *LiveData* Company
*Find out more 
*wandisco.com *



 



THIS MESSAGE AND ANY ATTACHMENTS 
ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED
*


If this message was 
misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not 
waive any confidentiality or privilege. If you are not the intended 
recipient, please notify us immediately and destroy the message without 
disclosing its contents to anyone. Any distribution, use or copying of this 
email or the information it contains by other than an intended recipient is 
unauthorized. The views and opinions expressed in this email message are 
the author's own and may not reflect the views and opinions of WANdisco, 
unless the author is authorized by WANdisco to express such views or 
opinions on its behalf. All email sent to or from this address is subject 
to electronic storage and review by WANdisco. Although WANdisco operates 
anti-virus programs, it does not accept responsibility for any damage 
whatsoever caused by viruses being passed.


Re: "svn patch" and the TAB character

2019-12-17 Thread Branko Čibej
On 17.12.2019 14:58, Stefan Sperling wrote:
> On Tue, Dec 17, 2019 at 08:33:09AM -0500, Doug Robinson wrote:
>>>  And of course, both the
>>> filename and the label (= the part after the tab character) may contain an
>>> arbitrary number of spaces.
>> The problem is parsing the line into proper tokens when every character out
>> there can be part of the file name.  There must be a structure to the field
>> or parsing is not really parsing anymore.
> The output of 'svn diff' was not designed with 'svn patch' in mind.
> The diff command preceded the patch command by many years.
>
> I still believe that making --patch-compatible produce filenames without
> any trailing annotations in the output of 'svn diff' is the best answer.


Agreed. Plain-vanilla BSD patch should be able to apply such patches, as
should 'svn patch'.


> This whole discussion started off with the problem statement that the
> --patch-compatible option produces a diff which cannot be applied after
> tabs are replaced with spaces.
>
> This filename parsing problem isn't limited to 'svn patch'. Other patch
> implementations might run into the same problem. The way 'svn patch' finds
> the filename is based on Larry Wall's original patch program.

Precisely. That TAB *is* the canonical delimiter. The (implicit)
specification is decades old, and it would be an error to try to
second-guess it at this point.

Designing a stricter universal format for transmitting contextual
changes is a different matter entirely, one that we tried to tackle in
the past and failed; other version control systems tried and failed,
too. It would be a nice thing to have, but it's hardly something this
community can do in isolation.

> Generally, patches being mangled in transit is a notorious problem especially
> with web-based email. It's not really a safe format for this purpose. If 
> patches
> get changed in transit in any way there is no guarantee they will apply 
> cleanly.
> That's not a new problem, it's a long-standing issue. And it's not 
> SVN-specific.
> There is already a way to transfer changes without such problems: svn commit

And note that the tabs-to-spaces headache doesn't stop with parsing the
filename out of the header, it affects the filename itself. I can put
tabs in a filename on most Unix filesystems ... that would probably
break even the original patch tool.

-- Brane


Re: "svn patch" and the TAB character

2019-12-17 Thread Stefan Sperling
On Tue, Dec 17, 2019 at 08:33:09AM -0500, Doug Robinson wrote:
> >  And of course, both the
> > filename and the label (= the part after the tab character) may contain an
> > arbitrary number of spaces.
> 
> The problem is parsing the line into proper tokens when every character out
> there can be part of the file name.  There must be a structure to the field
> or parsing is not really parsing anymore.

The output of 'svn diff' was not designed with 'svn patch' in mind.
The diff command preceded the patch command by many years.

I still believe that making --patch-compatible produce filenames without
any trailing annotations in the output of 'svn diff' is the best answer.
This whole discussion started off with the problem statement that the
--patch-compatible option produces a diff which cannot be applied after
tabs are replaced with spaces.

This filename parsing problem isn't limited to 'svn patch'. Other patch
implementations might run into the same problem. The way 'svn patch' finds
the filename is based on Larry Wall's original patch program.

Generally, patches being mangled in transit is a notorious problem especially
with web-based email. It's not really a safe format for this purpose. If patches
get changed in transit in any way there is no guarantee they will apply cleanly.
That's not a new problem, it's a long-standing issue. And it's not SVN-specific.
There is already a way to transfer changes without such problems: svn commit


Re: "svn patch" and the TAB character

2019-12-17 Thread Doug Robinson
On Mon, Dec 16, 2019 at 8:55 PM Daniel Shahaf 
wrote:
> Doug Robinson wrote on Mon, Dec 16, 2019 at 11:13:25 -0500:
> > So the two file names, differing only by a TAB in the "right place" will
> > currently have completely different behaviors:
> >
> >   My File NameSPACE(Revision 12)
> >   My File NameTAB(Revision 12)
> >
> > I see no point in maintaining that the "TAB" is critically significant
and
> > different from the "SPACE".  The difference is easier parsing vs. 1
visually
> > indistinguishable use case (depending on what tool you are using to
view the
> > diff file).
> >
> > And the "sequence of SPACE" instead of just a single "SPACE" falls into
the
> > same visually indistinguishable category.  So keep the parsing simple
and
> > just declare file names with "SP/TAB(Revision #)" at the end to not be
> > supported.
>
> You can't assume the string after the tab will be "(revision %ld)".
>
> First of all, as I already pointed out, that string is translatable.

Then we're screwed.  Some languages will have the revision number before
the word.  Some may actually require characters on both sides of the
revision number.

There's a reason that languages like C, C++, Java, etc. have "keywords" -
so they can properly parse the code.  In this case the "code" is in the
form of a "diff" file and the language-variant "SVN diff" needs a
fixed-text keyword with a specific operational sequence, e.g. "(KEYWORD
OPERAND)", i.e. "(revision 1234)".  Short of that parsing is purely
accidental.

> Second of
> all, we also have to support patches generated by third-party tools,
which may
> contain arbitrary text after the tab character.

Define the language.  Require the tooling to comply.  Trying to make
everything work (everybody happy) is the path to failure.

>  And of course, both the
> filename and the label (= the part after the tab character) may contain an
> arbitrary number of spaces.

The problem is parsing the line into proper tokens when every character out
there can be part of the file name.  There must be a structure to the field
or parsing is not really parsing anymore.

> Please propose an algorithm for parsing a filename out of a diff header
line
> (a '---' line or a '+++' line) that doesn't contain tabs, under these
conditions.
> (We don't have to fix _all_ cases, but fixing the bug just for English
speakers
> isn't going to fly.)

Anything I propose without keywords/structure I can trivially
counter-example.  Something will break.  As is the current situation.

-- 
*DOUGLAS B ROBINSON* SENIOR PRODUCT MANAGER

T +1 925 396 1125
*E* doug.robin...@wandisco.com

-- 


* *

**The *LiveData* Company
*Find out more 
*wandisco.com *



 



THIS MESSAGE AND ANY ATTACHMENTS 
ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED
*


If this message was 
misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not 
waive any confidentiality or privilege. If you are not the intended 
recipient, please notify us immediately and destroy the message without 
disclosing its contents to anyone. Any distribution, use or copying of this 
email or the information it contains by other than an intended recipient is 
unauthorized. The views and opinions expressed in this email message are 
the author's own and may not reflect the views and opinions of WANdisco, 
unless the author is authorized by WANdisco to express such views or 
opinions on its behalf. All email sent to or from this address is subject 
to electronic storage and review by WANdisco. Although WANdisco operates 
anti-virus programs, it does not accept responsibility for any damage 
whatsoever caused by viruses being passed.


Re: "svn patch" and the TAB character

2019-12-16 Thread Daniel Shahaf
Daniel Shahaf wrote on Tue, 17 Dec 2019 01:55 +00:00:
> I suppose that simply trying to repeatedly s/\s+\S+$// might work well 
> enough?  That is:

In English: remove all non-whitespaces at the end of the string, then
remove all trailing whitespace.

> data = line[len('--- '):].rstrip('\n')
> need_confirmation = False
> while len(data) > 0:
> if exists(data):
> break
> else:
> data =~ s/\s+\S+$//;
> need_confirmation = True
> else:
> raise Exception("{!r} does not exist".format(line[4:]))
> if need_confirmation:
> prompt_for_confirmation(data)
>


Re: "svn patch" and the TAB character

2019-12-16 Thread Daniel Shahaf
Doug Robinson wrote on Mon, Dec 16, 2019 at 11:13:25 -0500:
> So the two file names, differing only by a TAB in the "right place" will
> currently have completely different behaviors:
> 
>   My File NameSPACE(Revision 12)
>   My File NameTAB(Revision 12)
> 
> I see no point in maintaining that the "TAB" is critically significant and
> different from the "SPACE".  The difference is easier parsing vs. 1 visually
> indistinguishable use case (depending on what tool you are using to view the
> diff file).
> 
> And the "sequence of SPACE" instead of just a single "SPACE" falls into the
> same visually indistinguishable category.  So keep the parsing simple and
> just declare file names with "SP/TAB(Revision #)" at the end to not be
> supported.

You can't assume the string after the tab will be "(revision %ld)".

First of all, as I already pointed out, that string is translatable.  Second of
all, we also have to support patches generated by third-party tools, which may
contain arbitrary text after the tab character.  And of course, both the
filename and the label (= the part after the tab character) may contain an
arbitrary number of spaces.

Please propose an algorithm for parsing a filename out of a diff header line
(a '---' line or a '+++' line) that doesn't contain tabs, under these 
conditions.
(We don't have to fix _all_ cases, but fixing the bug just for English speakers
isn't going to fly.)

---

I suppose that simply trying to repeatedly s/\s+\S+$// might work well enough?  
That is:

data = line[len('--- '):].rstrip('\n')
need_confirmation = False
while len(data) > 0:
if exists(data):
break
else:
data =~ s/\s+\S+$//;
need_confirmation = True
else:
raise Exception("{!r} does not exist".format(line[4:]))
if need_confirmation:
prompt_for_confirmation(data)


Re: "svn patch" and the TAB character

2019-12-16 Thread Doug Robinson
On Sat, Dec 14, 2019 at 6:13 AM Stefan Sperling  wrote:

> 'svn patch' already behaves like regular patch, i.e. it assumes the whole
> line
> denotes a file name if no TAB can be found.
>
> The problem is that svn diff's revision number marker " (revision XY)" must
> be separated by a TAB. Otherwise it becomes part of the file name.
> Perhaps --patch-compatible should omit those markers?
>

The real problem is historical: poorly defined headers.
If the headers were simply:

  Something1: 
  Something2: 

Then there would be no problem.  With the current setup, the ambiguity means
that some file names are explicitly disallowed from being supported due to a
lack of disambiguating syntax (some type of quoting).  So the two file
names,
differing only by a TAB in the "right place" will currently have
completely different
behaviors:

  My File NameSPACE(Revision 12)
  My File NameTAB(Revision 12)

I see no point in maintaining that the "TAB" is critically significant and
different
from the "SPACE".  The difference is easier parsing vs. 1 visually
indistinguishable
use case (depending on what tool you are using to view the diff file).

And the "sequence of SPACE" instead of just a single "SPACE" falls into the
same visually indistinguishable category.  So keep the parsing simple and
just
declare file names with "SP/TAB(Revision #)" at the end to not be supported.

Cheers.

Doug
-- 
*DOUGLAS B ROBINSON* SENIOR PRODUCT MANAGER

T +1 925 396 1125
*E* doug.robin...@wandisco.com

-- 


* *

**The *LiveData* Company
*Find out more 
*wandisco.com *



 



THIS MESSAGE AND ANY ATTACHMENTS 
ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED
*


If this message was 
misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not 
waive any confidentiality or privilege. If you are not the intended 
recipient, please notify us immediately and destroy the message without 
disclosing its contents to anyone. Any distribution, use or copying of this 
email or the information it contains by other than an intended recipient is 
unauthorized. The views and opinions expressed in this email message are 
the author's own and may not reflect the views and opinions of WANdisco, 
unless the author is authorized by WANdisco to express such views or 
opinions on its behalf. All email sent to or from this address is subject 
to electronic storage and review by WANdisco. Although WANdisco operates 
anti-virus programs, it does not accept responsibility for any damage 
whatsoever caused by viruses being passed.


Re: "svn patch" and the TAB character

2019-12-14 Thread Nathan Hartman
On Fri, Dec 13, 2019 at 5:00 PM Doug Robinson 
wrote:

> If I use "svn diff --patch compatible" to produce a patch file.  Then edit
> that patch file using an editor with settings that replace the TAB with a
> "proper" number of SPACE characters, the "svn patch" command will declare:
>
> $ svn patch ../n.txt
> Skipped missing target: 'TheFile  (revision 18)'
> Summary of conflicts:
>

FWIW I did see this recently when trying to run a reproduction script from
the TortoiseSVN forum. I realized that the tabs became spaced, fixed it,
and then the patch worked.

I don't have an opinion whether this behavior should be changed or not, but
perhaps we can try to detect this case and improve the error message?

Nathan


Re: Translatable diff headers (was: Re: "svn patch" and the TAB character)

2019-12-14 Thread Stefan Sperling
On Sat, Dec 14, 2019 at 12:28:04PM +, Daniel Shahaf wrote:
> Stefan Sperling wrote on Sat, 14 Dec 2019 11:13 +00:00:
> > The problem is that svn diff's revision number marker " (revision XY)" must
> > be separated by a TAB.
> 
> In diff generation, the whole string is translated: _("%s\t(revision
> %ld)").  It seems to me that if a translator changed the start of the
> string from "%s\t" to anything else, patches would no longer apply,
> would they?  If so, shouldn't we make just the "(revision %ld)" part
> translatable?
> 

Yes, that would be better indeed.


Translatable diff headers (was: Re: "svn patch" and the TAB character)

2019-12-14 Thread Daniel Shahaf
Stefan Sperling wrote on Sat, 14 Dec 2019 11:13 +00:00:
> The problem is that svn diff's revision number marker " (revision XY)" must
> be separated by a TAB.

In diff generation, the whole string is translated: _("%s\t(revision
%ld)").  It seems to me that if a translator changed the start of the
string from "%s\t" to anything else, patches would no longer apply,
would they?  If so, shouldn't we make just the "(revision %ld)" part
translatable?


Re: "svn patch" and the TAB character

2019-12-14 Thread Stefan Sperling
On Sat, Dec 14, 2019 at 10:59:01AM +, Daniel Shahaf wrote:
> Doug Robinson wrote on Fri, 13 Dec 2019 21:59 +00:00:
> > If I [...] edit that patch file using an editor with settings that replace 
> > the TAB 
> > with a "proper" number of SPACE characters,
> 
> Don't do that.  If the context lines start with tabs (ignoring the first
> column), they would be corrupted and the patch wouldn't apply — and even
> if it did apply, the indentation would be off-by-one or off-by-eight.  
> However:
> 
> > $ svn patch ../n.txt 
> > Skipped missing target: 'TheFile (revision 18)'
> > Summary of conflicts:
> >  Skipped paths: 1
> > 
> > (the TAB was between "TheFile" and "(revision...)".
> 
> Fixing this specific case would be helpful to projects that use spaces-
> only indentation where the unidiff got tabs-to-spaces'ed at some point
> (e.g., some terminal emulators do this).  I'm not opposed to it, though
> I don't think it's high priority either.  Does anyone have an algorithm
> to propose?  (An algorithm for deriving the name of a file to patch from
> the +++, ---, and "Index:" lines.)
> 
> Cheers,
> 
> Daniel

'svn patch' already behaves like regular patch, i.e. it assumes the whole line
denotes a file name if no TAB can be found.

The problem is that svn diff's revision number marker " (revision XY)" must
be separated by a TAB. Otherwise it becomes part of the file name.
Perhaps --patch-compatible should omit those markers?


Re: "svn patch" and the TAB character

2019-12-14 Thread Daniel Shahaf
Doug Robinson wrote on Fri, 13 Dec 2019 21:59 +00:00:
> If I [...] edit that patch file using an editor with settings that replace 
> the TAB 
> with a "proper" number of SPACE characters,

Don't do that.  If the context lines start with tabs (ignoring the first
column), they would be corrupted and the patch wouldn't apply — and even
if it did apply, the indentation would be off-by-one or off-by-eight.  However:

> $ svn patch ../n.txt 
> Skipped missing target: 'TheFile (revision 18)'
> Summary of conflicts:
>  Skipped paths: 1
> 
> (the TAB was between "TheFile" and "(revision...)".

Fixing this specific case would be helpful to projects that use spaces-
only indentation where the unidiff got tabs-to-spaces'ed at some point
(e.g., some terminal emulators do this).  I'm not opposed to it, though
I don't think it's high priority either.  Does anyone have an algorithm
to propose?  (An algorithm for deriving the name of a file to patch from
the +++, ---, and "Index:" lines.)

Cheers,

Daniel


"svn patch" and the TAB character

2019-12-13 Thread Doug Robinson
Folks:

If I use "svn diff --patch compatible" to produce a patch file.  Then edit
that patch file using an editor with settings that replace the TAB with a
"proper" number of SPACE characters, the "svn patch" command will declare:

$ svn patch ../n.txt
Skipped missing target: 'TheFile  (revision 18)'
Summary of conflicts:
  Skipped paths: 1

(the TAB was between "TheFile" and "(revision...)".  Changing the spaces
back to a tab enables the "svn patch" command to work.

I'm wondering if this is:
A. "goodness" as in "protection against diff chunks being
contaminated/modified" or
B. "bug" as in "the other patch tools out there don't enforce that TAB".

I looked through old postings and didn't see a discussion.

Thoughts?

Cheers.

Doug
-- 
*DOUGLAS B ROBINSON* SENIOR PRODUCT MANAGER

T +1 925 396 1125
*E* doug.robin...@wandisco.com

-- 


* *

**The *LiveData* Company
*Find out more 
*wandisco.com *



 



THIS MESSAGE AND ANY ATTACHMENTS 
ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED
*


If this message was 
misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not 
waive any confidentiality or privilege. If you are not the intended 
recipient, please notify us immediately and destroy the message without 
disclosing its contents to anyone. Any distribution, use or copying of this 
email or the information it contains by other than an intended recipient is 
unauthorized. The views and opinions expressed in this email message are 
the author's own and may not reflect the views and opinions of WANdisco, 
unless the author is authorized by WANdisco to express such views or 
opinions on its behalf. All email sent to or from this address is subject 
to electronic storage and review by WANdisco. Although WANdisco operates 
anti-virus programs, it does not accept responsibility for any damage 
whatsoever caused by viruses being passed.