Re: SHA-1 collision in repository?

2018-02-22 Thread Philip Martin
Branko Čibej  writes:

> On 22.02.2018 21:30, Myria wrote:
>> When we try to commit a very specific version of a very specific
>> binary file, we get a SHA-1 collision error from the Subversion
>> repository:
>>
>> D:\confidential>svn commit secret.bin -m "Testing broken commit"
>> Sending        secret.bin
>> Transmitting file data .svn: E16: Commit failed (details follow):
>> svn: E16: SHA1 of reps '604440 34 134255 136680
>> c9f4fabc4d093612fece03c339401058
>> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0
>> 134255 136680 c9f4fabc4d093612fece03c339401058
>> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches
>> (db11617ef1454332336e00abc311d44bc698f3b3) but contents differ
>>
>>
>> What can cause this?
>
> The simplest explanation would be a corruption of the existing
> representation on disk. Note that both the MD5 and the SHA1 checksums
> appear to match, as do the sizes; which makes it even more likely that
> it's the same file but the copy in the repository is somehow corrupted.

That pattern, all of MD5, SHA1 and size matching, is exactly what
happens if a SHA1 collision is committed using an old version of
Subversion where the rep-cache does not detect collisions.  The first
part of the collision would have been committed in r604440 and the
second part in r605556.

If that is the case, and a SHA1 collision did occur, then:

   svnadmin verify -r604440 path/to/repository

will succeed while:

   svnadmin verify -r605556 path/to/repository

will fail with an MD5 checksum error.

If this is what you see then unfortunately the colliding r605556 content
has been elided and the r605556 revision is corrupt.

You should be able to retrieve the first part of the collision from
r604440; it will be one of the files given by:

  svn log -v -r604440

The second part in r605556 is missing :-(  but it will be one of the
files given by:

  svn log -v -r605556

However, your failing commit would also be a SHA1 collision with the
r604440 content (it might be identical to the missing content in
r605556).
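
A minimal sketch of the two checks above, assuming svnadmin is on the PATH. The helper names are hypothetical, the repository path is a placeholder, and the revision numbers are the ones taken from the error message. A True result only means the repository shows the pattern described here; it does not by itself prove a collision.

```python
import subprocess


def verify_command(repo_path, revision):
    """Build the `svnadmin verify -rREV PATH` command line shown above."""
    return ["svnadmin", "verify", "-r{}".format(revision), repo_path]


def revision_verifies(repo_path, revision):
    """Return True if svnadmin exits without error for this one revision."""
    result = subprocess.run(
        verify_command(repo_path, revision),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0


def matches_collision_pattern(repo_path, first_rev=604440, second_rev=605556):
    """True when the first revision verifies and the second one fails."""
    return (revision_verifies(repo_path, first_rev)
            and not revision_verifies(repo_path, second_rev))
```

Calling matches_collision_pattern("path/to/repository") runs both verifications in turn; the svnadmin error output for the failing revision still needs to be read to confirm that it is an MD5 checksum error.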

-- 
Philip


Re: SHA-1 collision in repository?

2018-02-22 Thread Matt Simmons
I would get more advice from people here before you invest that time. I'm a
relative amateur and would listen to people with more experience than
myself.

--Matt

On Thu, Feb 22, 2018 at 2:29 PM, Myria  wrote:

> That was one document we ran into when searching, yes.
>
> We can do an svnsync, but this will take about a week to run--the
> repository is 43 GB with 600,000 commits.  I guess we'll start it now.
>
> On Thu, Feb 22, 2018 at 2:04 PM, Matt Simmons  wrote:
> > Hi Melissa,
> >
> > That definitely is interesting.
> >
> > I assume you have read
> > http://blogs.collab.net/subversion/subversion-sha1-collision-problem-statement-prevention-remediation-options
> >
> > If you do an svnsync to another location and attempt the commit there, does
> > the problem replicate itself?
> >
> > --Matt
> >
> >
> > On Thu, Feb 22, 2018 at 12:30 PM, Myria  wrote:
> >>
> >> When we try to commit a very specific version of a very specific
> >> binary file, we get a SHA-1 collision error from the Subversion
> >> repository:
> >>
> >> D:\confidential>svn commit secret.bin -m "Testing broken commit"
> >> Sending        secret.bin
> >> Transmitting file data .svn: E16: Commit failed (details follow):
> >> svn: E16: SHA1 of reps '604440 34 134255 136680
> >> c9f4fabc4d093612fece03c339401058
> >> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0
> >> 134255 136680 c9f4fabc4d093612fece03c339401058
> >> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches
> >> (db11617ef1454332336e00abc311d44bc698f3b3) but contents differ
> >>
> >>
> >> What can cause this?  This file is a binary pixel shader compiled from
> >> a build process.  It's most certainly not Google's SHA-1 collision PDF
> >> files.  We also scanned the repository to confirm that nobody has
> >> committed Google's collision files.
> >>
> >> Occam's Razor suggests that something is wrong with our repository or
> >> Subversion itself, rather than this being a true SHA-1 collision.  In
> >> that case, what is wrong with our repository?
> >>
> >> If this really is a SHA-1 collision, it would be major cryptography
> >> news that someone randomly ran into a second collision without even
> >> trying.  In that case, is there a method by which we could recover the
> >> two files that supposedly have the same SHA-1?  The collision doesn't
> >> appear to be in the file itself, but in some sort of diff or revision
> >> output?
> >>
> >> Thanks,
> >>
> >> Melissa
> >
> >
> >
> >
> > --
> > "Today, vegetables... Tomorrow, the world!"
>



-- 
"Today, vegetables... Tomorrow, the world!"


Re: SHA-1 collision in repository?

2018-02-22 Thread Myria
That was one document we ran into when searching, yes.

We can do an svnsync, but this will take about a week to run--the
repository is 43 GB with 600,000 commits.  I guess we'll start it now.

On Thu, Feb 22, 2018 at 2:04 PM, Matt Simmons  wrote:
> Hi Melissa,
>
> That definitely is interesting.
>
> I assume you have read
> http://blogs.collab.net/subversion/subversion-sha1-collision-problem-statement-prevention-remediation-options
>
> If you do an svnsync to another location and attempt the commit there, does
> the problem replicate itself?
>
> --Matt
>
>
> On Thu, Feb 22, 2018 at 12:30 PM, Myria  wrote:
>>
>> When we try to commit a very specific version of a very specific
>> binary file, we get a SHA-1 collision error from the Subversion
>> repository:
>>
>> D:\confidential>svn commit secret.bin -m "Testing broken commit"
>> Sending        secret.bin
>> Transmitting file data .svn: E16: Commit failed (details follow):
>> svn: E16: SHA1 of reps '604440 34 134255 136680
>> c9f4fabc4d093612fece03c339401058
>> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0
>> 134255 136680 c9f4fabc4d093612fece03c339401058
>> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches
>> (db11617ef1454332336e00abc311d44bc698f3b3) but contents differ
>>
>>
>> What can cause this?  This file is a binary pixel shader compiled from
>> a build process.  It's most certainly not Google's SHA-1 collision PDF
>> files.  We also scanned the repository to confirm that nobody has
>> committed Google's collision files.
>>
>> Occam's Razor suggests that something is wrong with our repository or
>> Subversion itself, rather than this being a true SHA-1 collision.  In
>> that case, what is wrong with our repository?
>>
>> If this really is a SHA-1 collision, it would be major cryptography
>> news that someone randomly ran into a second collision without even
>> trying.  In that case, is there a method by which we could recover the
>> two files that supposedly have the same SHA-1?  The collision doesn't
>> appear to be in the file itself, but in some sort of diff or revision
>> output?
>>
>> Thanks,
>>
>> Melissa
>
>
>
>
> --
> "Today, vegetables... Tomorrow, the world!"


Re: SHA-1 collision in repository?

2018-02-22 Thread Branko Čibej
On 22.02.2018 21:30, Myria wrote:
> When we try to commit a very specific version of a very specific
> binary file, we get a SHA-1 collision error from the Subversion
> repository:
>
> D:\confidential>svn commit secret.bin -m "Testing broken commit"
> Sending        secret.bin
> Transmitting file data .svn: E16: Commit failed (details follow):
> svn: E16: SHA1 of reps '604440 34 134255 136680
> c9f4fabc4d093612fece03c339401058
> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0
> 134255 136680 c9f4fabc4d093612fece03c339401058
> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches
> (db11617ef1454332336e00abc311d44bc698f3b3) but contents differ
>
>
> What can cause this?

The simplest explanation would be a corruption of the existing
representation on disk. Note that both the MD5 and the SHA1 checksums
appear to match, as do the sizes; which makes it even more likely that
it's the same file but the copy in the repository is somehow corrupted.

-- Brane



Re: SHA-1 collision in repository?

2018-02-22 Thread Matt Simmons
Hi Melissa,

That definitely is interesting.

I assume you have read
http://blogs.collab.net/subversion/subversion-sha1-collision-problem-statement-prevention-remediation-options


If you do an svnsync to another location and attempt the commit there, does
the problem replicate itself?

--Matt


On Thu, Feb 22, 2018 at 12:30 PM, Myria  wrote:

> When we try to commit a very specific version of a very specific
> binary file, we get a SHA-1 collision error from the Subversion
> repository:
>
> D:\confidential>svn commit secret.bin -m "Testing broken commit"
> Sending        secret.bin
> Transmitting file data .svn: E16: Commit failed (details follow):
> svn: E16: SHA1 of reps '604440 34 134255 136680
> c9f4fabc4d093612fece03c339401058
> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0
> 134255 136680 c9f4fabc4d093612fece03c339401058
> db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches
> (db11617ef1454332336e00abc311d44bc698f3b3) but contents differ
>
>
> What can cause this?  This file is a binary pixel shader compiled from
> a build process.  It's most certainly not Google's SHA-1 collision PDF
> files.  We also scanned the repository to confirm that nobody has
> committed Google's collision files.
>
> Occam's Razor suggests that something is wrong with our repository or
> Subversion itself, rather than this being a true SHA-1 collision.  In
> that case, what is wrong with our repository?
>
> If this really is a SHA-1 collision, it would be major cryptography
> news that someone randomly ran into a second collision without even
> trying.  In that case, is there a method by which we could recover the
> two files that supposedly have the same SHA-1?  The collision doesn't
> appear to be in the file itself, but in some sort of diff or revision
> output?
>
> Thanks,
>
> Melissa
>



-- 
"Today, vegetables... Tomorrow, the world!"


SHA-1 collision in repository?

2018-02-22 Thread Myria
When we try to commit a very specific version of a very specific
binary file, we get a SHA-1 collision error from the Subversion
repository:

D:\confidential>svn commit secret.bin -m "Testing broken commit"
Sending        secret.bin
Transmitting file data .svn: E16: Commit failed (details follow):
svn: E16: SHA1 of reps '604440 34 134255 136680
c9f4fabc4d093612fece03c339401058
db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0
134255 136680 c9f4fabc4d093612fece03c339401058
db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches
(db11617ef1454332336e00abc311d44bc698f3b3) but contents differ


What can cause this?  This file is a binary pixel shader compiled from
a build process.  It's most certainly not Google's SHA-1 collision PDF
files.  We also scanned the repository to confirm that nobody has
committed Google's collision files.

Occam's Razor suggests that something is wrong with our repository or
Subversion itself, rather than this being a true SHA-1 collision.  In
that case, what is wrong with our repository?

If this really is a SHA-1 collision, it would be major cryptography
news that someone randomly ran into a second collision without even
trying.  In that case, is there a method by which we could recover the
two files that supposedly have the same SHA-1?  The collision doesn't
appear to be in the file itself, but in some sort of diff or revision
output?

Thanks,

Melissa


Re: auto-props syntax in file vs. property

2018-02-22 Thread Chris
Hi Brane,

Thanks for the reply. Now I understand why it's acting the way it is. It would
have been nicer with different separators for the two cases, but it is what it
is and I agree that it works.
The downside is that my initial fix (earlier in this thread) for
svn_apply_autoprops.py isn't correct, since I need to prune the second ; of
each ;; before it calls propset. I need to try another fix then (unless someone
has already fixed that in the repo).

/Chris



On Thu, 2/22/18, Branko Čibej  wrote:

 Subject: Re: auto-props syntax in file vs. property
 To: users@subversion.apache.org
 Date: Thursday, February 22, 2018, 2:10 PM
 
 On 22.02.2018 13:52, Chris wrote:
 > Re-awakening my previous thread about the auto-properties. I get really
 > confused by where to use ;; and ; as a separator.
 >
 > I currently have this in the auto-props on the repo:
 > *.txt = svn:mime-type=text/plain;;charset=iso-8859-1;svn:eol-style=LF
 >
 > And then if I add a text file:
 > prompt>  touch foo.txt; svn add foo.txt; svn pg svn:mime-type foo.txt
 > A         foo.txt
 > text/plain;charset=iso-8859-1
 
 More completely: 'svn proplist -v' would show the following properties:
 
     svn:mime-type=text/plain;charset=iso-8859-1
     svn:eol-style=LF
 
 > So the property itself is with just one semicolon in there despite the
 > auto-prop having ;;
 > Is this the correct behavior?
 
 Yes of course. In the auto-props configuration, a single semicolon separates
 individual properties. If you want a semicolon within a property value, you
 have to write ;; in the auto-props configuration to get the ; in the
 property value.
 
 If instead you'd had this auto-props configuration:
 
 *.txt = svn:mime-type=text/plain;charset=iso-8859-1;svn:eol-style=LF
 
 Then, when you added a file to Subversion, you'd get the following
 properties set:
 
     svn:mime-type=text/plain
     charset=iso-8859-1
     svn:eol-style=LF
 
 which is probably not what you want.
 
 > While if I do the same thing manually, i.e.
 >
 > prompt> touch foo; svn add foo; svn propset svn:mime-type
 > "text/plain;;charset=iso-8859-1" foo; svn pg svn:mime-type foo
 > A         foo
 > property 'svn:mime-type' set on 'foo'
 > text/plain;;charset=iso-8859-1
 >
 > That is, I'm passing in the exact string that I have in my auto-props into
 > propset for a file without .txt-suffix so I don't get the auto-properties.
 > But as you can see, the resulting property now has double semicolons.
 
 On the command-line you can only set a single property value at a time,
 so there's no need to escape the ';' delimiter.
 But the auto-props configuration isn't a single property; it's several
 properties, delimited with a single ';'.
 
 > My guess is that the former is the intended behavior and I should not be
 > passing in the ";;" into the manual command,
 
 Yes.
 
 >  but I'm getting really confused here. It seems very error-prone that manual
 > propset can't use the strings from the config file or auto-props without
 > getting a different result.
 >
 > Which version is the correct one, or do both actually do the job?
 
 Each does its job in its own context.
 
 --
 Brane


Re: auto-props syntax in file vs. property

2018-02-22 Thread Branko Čibej
On 22.02.2018 13:52, Chris wrote:
> Re-awakening my previous thread about the auto-properties. I get really 
> confused by where to use ;; and ; as a separator.
>
> I currently have this in the auto-props on the repo:
> *.txt = svn:mime-type=text/plain;;charset=iso-8859-1;svn:eol-style=LF
>
> And then if I add a text file:
> prompt>  touch foo.txt; svn add foo.txt; svn pg svn:mime-type foo.txt
> A foo.txt
> text/plain;charset=iso-8859-1

More completely: 'svn proplist -v' would show the following properties:

svn:mime-type=text/plain;charset=iso-8859-1
svn:eol-style=LF

> So the property itself is with just one semicolon in there despite the 
> auto-prop having ;;
> Is this the correct behavior?

Yes of course. In the auto-props configuration, a single semicolon separates
individual properties. If you want a semicolon within a property value, you
have to write ;; in the auto-props configuration to get the ; in the
property value.

If instead you'd had this auto-props configuration:

*.txt = svn:mime-type=text/plain;charset=iso-8859-1;svn:eol-style=LF


Then, when you added a file to Subversion, you'd get the following
properties set:

svn:mime-type=text/plain
charset=iso-8859-1
svn:eol-style=LF


which is probably not what you want.
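
The rule above can be sketched in Python. This is an illustration only, not Subversion's actual parser: the function name split_autoprops is hypothetical, and the sketch assumes semicolons are consumed left to right, so that a ';;' pair becomes a literal ';' and a lone ';' ends a property.

```python
def split_autoprops(value):
    """Split an auto-props value into (name, value) pairs.

    Assumption: ';' separates properties and ';;' is an escaped ';',
    with semicolons consumed left to right.
    """
    raw_props = []
    current = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == ';':
            if i + 1 < len(value) and value[i + 1] == ';':
                current.append(';')  # escaped semicolon stays in the value
                i += 2
                continue
            raw_props.append(''.join(current))  # lone ';' ends the property
            current = []
        else:
            current.append(ch)
        i += 1
    raw_props.append(''.join(current))

    pairs = []
    for prop in raw_props:
        prop = prop.strip()
        if not prop:
            continue  # ignore empty entries such as a trailing ';'
        name, _, val = prop.partition('=')
        pairs.append((name.strip(), val.strip()))
    return pairs
```

Fed the two configurations discussed here, the escaped form yields two properties (svn:mime-type with the charset inside its value, plus svn:eol-style), while the unescaped form yields three, with charset as a separate property.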


> While if I do the same thing manually, i.e.
>
> prompt> touch foo; svn add foo; svn propset svn:mime-type 
> "text/plain;;charset=iso-8859-1" foo; svn pg svn:mime-type foo
> A foo
> property 'svn:mime-type' set on 'foo'
> text/plain;;charset=iso-8859-1
>
> That is, I'm passing in the exact string that I have in my auto-props into
> propset for a file without .txt-suffix so I don't get the auto-properties.
> But as you can see, the resulting property now has double semicolons.

On the command-line you can only set a single property value at a time,
so there's no need to escape the ';' delimiter.
But the auto-props configuration isn't a single property; it's several
properties, delimited with a single ';'.

> My guess is that the former is the intended behavior and I should not be 
> passing in the ";;" into the manual command,

Yes.

>  but I'm getting really confused here. It seems very error-prone that manual
> propset can't use the strings from the config file or auto-props without
> getting a different result.
>
> Which version is the correct one, or do both actually do the job?

Each does its job in its own context.

-- Brane


Re: auto-props syntax in file vs. property

2018-02-22 Thread Chris
Re-awakening my previous thread about the auto-properties. I get really 
confused by where to use ;; and ; as a separator.

I currently have this in the auto-props on the repo:
*.txt = svn:mime-type=text/plain;;charset=iso-8859-1;svn:eol-style=LF

And then if I add a text file:
prompt>  touch foo.txt; svn add foo.txt; svn pg svn:mime-type foo.txt
A foo.txt
text/plain;charset=iso-8859-1

So the property itself is with just one semicolon in there despite the 
auto-prop having ;;
Is this the correct behavior?

While if I do the same thing manually, i.e.

prompt> touch foo; svn add foo; svn propset svn:mime-type 
"text/plain;;charset=iso-8859-1" foo; svn pg svn:mime-type foo
A foo
property 'svn:mime-type' set on 'foo'
text/plain;;charset=iso-8859-1

That is, I'm passing in the exact string that I have in my auto-props into
propset for a file without .txt-suffix so I don't get the auto-properties. But
as you can see, the resulting property now has double semicolons.

My guess is that the former is the intended behavior and I should not be
passing in the ";;" into the manual command, but I'm getting really confused
here. It seems very error-prone that manual propset can't use the strings from
the config file or auto-props without getting a different result.

Which version is the correct one, or do both actually do the job?

BR
  Chris





On Wed, 1/10/18, Daniel Shahaf  wrote:

 Subject: Re: auto-props syntax in file vs. property
 To: "Chris" , users@subversion.apache.org
 Date: Wednesday, January 10, 2018, 8:51 PM
 
 Chris wrote on Wed, 10 Jan 2018 08:26 +:
 > I think the fix to svn_apply_autoprops.py should be something like below
 > (/subversion/trunk/contrib/client-side/svn_apply_autoprops.py)
 > If anyone with commit rights wants to fix it on the repo, feel free to
 > use the below, or improve it as necessary (my python knowledge is
 > non-existing)
 >
 > Index: svn_apply_autoprops.py
 > ===
 > --- svn_apply_autoprops.py      (revision 103617)
 > +++ svn_apply_autoprops.py      (revision 103618)
 > @@ -101,7 +101,11 @@
 >      # leading and trailing whitespce from the propery names and
 >      # values.
 >      props_list = []
 > -    for prop in props.split(';'):
 > +    # Since ;; is a separator within one property, we need to do
 > +    # regex and use both negative lookahead and lookbehind to avoid
 > +    # ever matching a more than one semicolon in the split
 > +    semicolonpattern = re.compile("(?
 > +    for prop in re.split(semicolonpattern, props):
 
 That's clever, but it will misparse sequences of three or more semicolons
 in a row, such as
 
 *.foo = key=val;;with;;semicolons;;;anotherkey=anotherval
 
 Daniel
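
Daniel's objection can be checked with a short sketch. The pattern used below is only a guess at the kind of expression the patch comment describes (a ';' with no ';' directly before or after it); the regex in the quoted patch is cut off in this archive, so this is an assumption, not the original code.

```python
import re

# Assumed pattern: match a ';' that has no ';' directly before or after it.
LONE_SEMICOLON = re.compile(r"(?<!;);(?!;)")


def split_with_lookaround(props):
    """Split on semicolons that stand alone, leaving ';;' runs untouched."""
    return LONE_SEMICOLON.split(props)


example = "key=val;;with;;semicolons;;;anotherkey=anotherval"
parts = split_with_lookaround(example)
# Every ';' in the ';;;' run has a ';' neighbour, so no split happens there
# and 'anotherkey' is never separated from 'key'.
```

With this pattern the example comes back as a single piece, so the ';;;' run is never recognised as an escaped ';' followed by a separator.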