Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-02-01 Thread Nathan Hartman
On Thu, Feb 1, 2024 at 5:26 PM Daniel Sahlberg wrote:
>
> Gentlemen,
>
> It seems you have both had your say about what flaws there have been in the
> process. Can we please leave this part of the discussion and continue with the
> technical issues? I'd hate for this discussion to turn into pie-throwing where
> someone ends up feeling offended and leaves the community. We are such a small
> community and we can't afford to lose someone just because an argument turns
> toxic (it has happened before, so let's make sure it doesn't happen again,
> please).

I completely agree. Yes, there has been disagreement about process,
but it is counterproductive to debate that anymore. Let's focus on the
technical question and try to reach some consensus on what (if
anything) to do.

> As for the technical side, can we break down the current status and the 
> desired future status to some points and then look at what options we have 
> for solutions?
>
> Currently we use SHA1, which has known attacks. What are the risks?
> - It has been argued that `svn st` will, especially with no-pristines, be
> extra vulnerable to not detecting a modified file if someone can create a
> collision with the checksum of the original file.
> - Someone also argued that software could potentially be banned just
> because it uses a checksum with a known attack, even if the checksum isn't
> used in a security-critical way.

I was the one who spoke about that possibility.

Just one example: NIST has already recommended that federal agencies
stop using SHA-1 for "signatures and other operations threatened by
collision attacks" and by 31 Dec 2030 NIST will publish "a revision of
FIPS 180 that removes the SHA-1 specification" and "Modules that still
use SHA-1 after 2030 will not be permitted for purchase by the federal
government." All those quotes are taken from [1], which was one of the
top hits in a recent DuckDuckGo search. (I don't remember the exact
search.)

Now, even if SVN's use cases of SHA1 are agreed by the developers to
be completely safe, I think it is a real possibility that some sites
could ban SVN because they consider SHA1 a banned algorithm, and even
if we explain that SVN's use of SHA1 is completely safe, those
explanations might not be acceptable in those settings, even if we are
right.

Given the way technology is used, understood, and sometimes (often?)
misunderstood, I can imagine a ridiculous scenario in which Subversion
could use 8-bit CRC, but not SHA1, even though SHA1 is much stronger
than 8-bit CRC, just because SHA1 is "banned" and 8-bit CRC is not.

> What options do we have and how do they mitigate the above risks?
> - Evgeny has already shown a possible solution with a salted hash (keeping
> SHA-1).
> - Can we switch to another hash function completely and does it offer any 
> benefits compared to the salted SHA-1?
> - Should we even do both?
>
> Any other points?
>
> Any thoughts?
>
> I would like to see this thread progress and I hope we can find consensus on 
> a way forward.
>
> Kind regards,
> Daniel Sahlberg

I, too, hope the community can come together and reach a consensus,
whatever that ends up being.

[1] https://www.securityweek.com/nist-retire-27-year-old-sha-1-cryptographic-algorithm/

Cheers,
Nathan


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-02-01 Thread Daniel Sahlberg
Gentlemen,

It seems you have both had your say about what flaws there have been in the
process. Can we please leave this part of the discussion and continue with
the technical issues? I'd hate for this discussion to turn into pie-throwing
where someone ends up feeling offended and leaves the community. We are such
a small community and we can't afford to lose someone just because an
argument turns toxic (it has happened before, so let's make sure it doesn't
happen again, please).

As for the technical side, can we break down the current status and the
desired future status to some points and then look at what options we have
for solutions?

Currently we use SHA1, which has known attacks. What are the risks?
- It has been argued that `svn st` will, especially with no-pristines, be
extra vulnerable to not detecting a modified file if someone can create a
collision with the checksum of the original file.
- Someone also argued that software could potentially be banned just
because it uses a checksum with a known attack, even if the checksum isn't
used in a security-critical way.
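
To make the first risk concrete, here is a minimal sketch of checksum-only
modification detection. It is an illustration under stated assumptions, not
how Subversion's status code is implemented: the function names are
hypothetical, and because no real SHA-1 collision is generated here, the
"weak" case truncates SHA-1 to one byte purely so that a collision can be
found by brute force.

```python
import hashlib


def checksum(data, nbytes=20):
    """Return the first nbytes of SHA-1(data); 20 bytes is the full digest."""
    return hashlib.sha1(data).digest()[:nbytes]


def looks_modified(recorded, working, nbytes=20):
    """Checksum-only check: without a pristine copy, this is all we can compare."""
    return checksum(working, nbytes) != recorded


def find_collision(original, nbytes):
    """Brute-force a different byte string with the same truncated checksum.

    Only feasible for tiny nbytes; stands in for a real collision attack.
    """
    target = checksum(original, nbytes)
    i = 0
    while True:
        candidate = b"tampered-%d" % i
        if candidate != original and checksum(candidate, nbytes) == target:
            return candidate
        i += 1


original = b"original file contents\n"
recorded_weak = checksum(original, nbytes=1)
tampered = find_collision(original, nbytes=1)

# The tampered file differs, yet the weak checksum-only check reports "unmodified".
print(tampered != original, looks_modified(recorded_weak, tampered, nbytes=1))
```

The point of the sketch is only that whoever can produce a second file with
the same recorded checksum defeats a check of this shape; whether that is
practical against full SHA-1 in a given working copy is the question under
discussion.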

What options do we have and how do they mitigate the above risks?
- Evgeny has already shown a possible solution with a salted hash (keeping
SHA-1).
- Can we switch to another hash function completely and does it offer any
benefits compared to the salted SHA-1?
- Should we even do both?

Any other points?

Any thoughts?

I would like to see this thread progress and I hope we can find consensus
on a way forward.

Kind regards,
Daniel Sahlberg


On Thu, 18 Jan 2024 at 14:36, Evgeny Kotkov via dev
<dev@subversion.apache.org> wrote:

> Daniel Shahaf  writes:
>
> > Procedurally, the long hiatus is counterproductive.
>
> This reminds me that the substantive discussion of your veto ended with my
> email from 8 Feb 2023 that had four direct questions to you and was left
> without an answer:
>
> ``
>   > That's not how design discussions work.  A design discussion doesn't go
>   > "state decision; state pros; implement"; it goes "state problem; discuss
>   > potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).
>
>   Well, I think it may not be as simple as it seems to you.  Who decided that
>   we should follow the process you're describing?  Is there a thread with a
>   consensus on this topic?  Or do you insist on using this specific process
>   because it's the only process that seems obvious to you?  What alternatives
>   to it have been considered?
>
>   As far as I can tell, the process you're suggesting is effectively a
>   waterfall-like process, and there are quite a lot of concerns about its
>   effectiveness, because the decisions have to be made in the conditions of
>   a lack of information.
> ``
>
> It's been more than 11 months since that email, and those questions still
> don't have an answer.  So if we are to resume this discussion, let's do it
> from the proper point.
>
> > You guys are welcome to try to /convince/ me to change my opinion, or to
> > have the veto invalidated.  In either case, you will be more likely to
> > succeed should your arguments relate not only to the veto's implications
> > but also to its /sine qua non/ component: its rationale.
>
> Just in case, my personal opinion here is that the veto is invalid.
>
> Firstly, based on my understanding, the ASF rules prohibit casting a veto
> without an appropriate technical justification (see [1], which I personally
> agree with).  Secondly, it seems that the process you are imposing hasn't been
> accepted in this community.  As far as I know, this topic was tangentially
> discussed before (see [2], for example), and it looks like there hasn't been
> a consensus to change our current Commit-Then-Review process into some
> sort of Review-Then-Commit.
>
> (At the same time I won't even try to /convince/ you, sorry.)
>
> [1] https://www.apache.org/foundation/voting.html
> [2] https://lists.apache.org/thread/ow2x68g2k4lv2ycr81d14p8r8w2jj1xl
>
>
> Regards,
> Evgeny Kotkov
>


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-18 Thread Evgeny Kotkov via dev
Daniel Sahlberg  writes:

> As far as I understand, the point of multi-hash is to keep the WC format
> between versions (so older clients can continue to use the WC).

Just as a minor note, the working copies created using the implementation
on the `pristine-checksum-salt` branch don't multi-hash the contents, but
rather make the [single] used checksum kind configurable and persist it at
the moment when a working copy is created or upgraded.

> I need some help to understand how that would work in practice. Let's say
> that 1.15 adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
> - A 1.17 client will only use SHAABC or SHAXYZ hashes.
> - A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
> - A 1.15 client can only use SHA1 and SHAABC hashes.
>
> How can these work together? A WC created in 1.17 can't be used by a 1.15
> client and a WC created in 1.15 (with SHA1) can't be used by a 1.17 client.
> How is this different from bumping the format? How do we detect this?

In the current design available on the `pristine-checksum-salt` branch, the
supported checksum kinds are tied to a working copy format, and any supported
checksum kind may additionally use a dynamic salt.  For example, format 33
supports only SHA-1 (regular or dynamically salted), but a newer format 34
can add support for another checksum kind such as SHA-2 if necessary.
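
As a rough illustration of the design described above, the following sketch
ties the set of allowed checksum kinds to a working copy format and applies an
optional salt. Only "format 33 supports SHA-1, optionally salted" comes from
the description; the table name, the function names, the use of SHA-256 as the
example SHA-2 kind for format 34, the 16-byte salt length, and the choice to
prepend the salt to the content are all assumptions made for the example and
may not match the `pristine-checksum-salt` branch.

```python
import hashlib
import os

# Hypothetical table of checksum kinds accepted by each working copy format.
# The entry for 34 is only an example of "another checksum kind such as SHA-2".
SUPPORTED_KINDS = {
    33: {"sha1"},
    34: {"sha1", "sha256"},
}


def new_salt(nbytes=16):
    """Generate a random salt once, when the working copy is created."""
    return os.urandom(nbytes)


def pristine_checksum(content, kind, wc_format, salt=b""):
    """Hash content with the working copy's single configured checksum kind.

    Assumption: the salt is simply prepended to the content before hashing.
    An empty salt gives the regular (unsalted) checksum.
    """
    if kind not in SUPPORTED_KINDS.get(wc_format, set()):
        raise ValueError(f"format {wc_format} does not support {kind}")
    h = hashlib.new(kind)
    h.update(salt)
    h.update(content)
    return h.hexdigest()


salt = new_salt()
print(pristine_checksum(b"hello\n", "sha1", 33))        # regular SHA-1
print(pristine_checksum(b"hello\n", "sha1", 33, salt))  # salted SHA-1
```

With a salt that is unknown until the working copy exists, a pair of files
prepared in advance to collide under plain SHA-1 would in general no longer
share a checksum in that working copy, which is the property the salted
variant is after.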

When an existing working copy is upgraded to a newer format, its current
checksum kind is retained as is (we can't rehash the content in a
`--store-pristine=no` case because the pristines are not available).

I don't know if we'll find ourselves having to forcefully phase out SHA-1
*even* for such working copies that retain an older checksum kind, i.e.,
it might be enough to use the new checksum kind only for freshly created
working copies.  However, there would be a few options to consider:

I think that milder options could include warning the user to check out a
new working copy (that would use a different checksum kind), and a harsher
option could mean adding a new format that doesn't support SHA-1 under
any circumstances, and declaring all previously available working copy
formats unsupported.


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-18 Thread Branko Čibej

On 18.01.2024 08:43, Daniel Sahlberg wrote:
> As far as I understand, the point of multi-hash is to keep the WC
> format between versions (so older clients can continue to use the WC).
> I need some help to understand how that would work in practice. Let's
> say that 1.15 adds SHAABC, 1.16 adds SHAXYZ. Then 1.17 drops SHA1. But...
>
> - A 1.17 client will only use SHAABC or SHAXYZ hashes.
> - A 1.16 client can use SHA1, SHAABC and SHAXYZ hashes.
> - A 1.15 client can only use SHA1 and SHAABC hashes.
>
> How can these work together? A WC created in 1.17 can't be used by a
> 1.15 client and a WC created in 1.15 (with SHA1) can't be used by a
> 1.17 client. How is this different from bumping the format? How do we
> detect this?


It's just another dimension of changing the format. When you introduce 
multihash, you have to bump the format number so that clients that don't 
know about it won't try to use the WC. Clients that _do_ know about it 
will have to check which hash algorithm(s) are used in any case.
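
The two-step check described above (format number first, then hash algorithm)
can be sketched as follows. This is only an illustration of the idea: the
format numbers 1 and 2 are placeholders rather than real working copy format
numbers, SHAABC and SHAXYZ are the made-up algorithm names from the question,
and the `Client` type and `can_open` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    max_format: int   # newest WC format this client understands
    kinds: frozenset  # hash algorithms this client can compute


# Placeholder formats: 1 = first multihash format, 2 = format that adds SHAXYZ.
CLIENT_1_15 = Client(max_format=1, kinds=frozenset({"sha1", "shaabc"}))
CLIENT_1_16 = Client(max_format=2, kinds=frozenset({"sha1", "shaabc", "shaxyz"}))
CLIENT_1_17 = Client(max_format=2, kinds=frozenset({"shaabc", "shaxyz"}))


def can_open(client, wc_format, wc_kind):
    """A client can use a WC only if it knows the format AND the hash kind."""
    if wc_format > client.max_format:
        return False  # format bump keeps clients that don't know about it out
    return wc_kind in client.kinds


for name, client in [("1.15", CLIENT_1_15), ("1.16", CLIENT_1_16), ("1.17", CLIENT_1_17)]:
    print(name,
          can_open(client, 1, "sha1"),
          can_open(client, 1, "shaabc"),
          can_open(client, 2, "shaxyz"))
```

In this toy model a working copy at the first multihash format using SHAABC
is usable by all three clients, while the SHA1 and SHAXYZ working copies are
each rejected by one of them, by the hash check and the format check
respectively.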



> At least, we'd need some method of updating the hashes in the
> database, akin to the WC format upgrades in some versions (was it 1.8?).


"svn upgrade" is where this would happen. On the multi-wc-format branch 
(if memory serves), it accepts a target WC version -- which is 
equivalent to the feature set supported by the WC. There's no reason why 
it couldn't also grow a "--force-hash=quantum-entangled" option.


-- Brane

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-18 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> Procedurally, the long hiatus is counterproductive.

This reminds me that the substantive discussion of your veto ended with my
email from 8 Feb 2023 that had four direct questions to you and was left
without an answer:

``
  > That's not how design discussions work.  A design discussion doesn't go
  > "state decision; state pros; implement"; it goes "state problem; discuss
  > potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

  Well, I think it may not be as simple as it seems to you.  Who decided that
  we should follow the process you're describing?  Is there a thread with a
  consensus on this topic?  Or do you insist on using this specific process
  because it's the only process that seems obvious to you?  What alternatives
  to it have been considered?

  As far as I can tell, the process you're suggesting is effectively a
  waterfall-like process, and there are quite a lot of concerns about its
  effectiveness, because the decisions have to be made in the conditions of
  a lack of information.
``

It's been more than 11 months since that email, and those questions still
don't have an answer.  So if we are to resume this discussion, let's do it
from the proper point.

> You guys are welcome to try to /convince/ me to change my opinion, or to
> have the veto invalidated.  In either case, you will be more likely to
> succeed should your arguments relate not only to the veto's implications
> but also to its /sine qua non/ component: its rationale.

Just in case, my personal opinion here is that the veto is invalid.

Firstly, based on my understanding, the ASF rules prohibit casting a veto
without an appropriate technical justification (see [1], which I personally
agree with).  Secondly, it seems that the process you are imposing hasn't been
accepted in this community.  As far as I know, this topic was tangentially
discussed before (see [2], for example), and it looks like there hasn't been
a consensus to change our current Commit-Then-Review process into some
sort of Review-Then-Commit.

(At the same time I won't even try to /convince/ you, sorry.)

[1] https://www.apache.org/foundation/voting.html
[2] https://lists.apache.org/thread/ow2x68g2k4lv2ycr81d14p8r8w2jj1xl


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-17 Thread Daniel Sahlberg
@Karl Fogel ,  @Evgeny Kotkov


Any chance for a comment on the questions in this thread?

I've also added my own comment below.

Kind regards,
Daniel



On Sun, 14 Jan 2024 at 00:56, Nathan Hartman wrote:

> On Fri, Jan 12, 2024 at 3:51 PM Johan Corveleyn  wrote:
>
>> On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf wrote:
>> ...
>> > Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
>> > I had the context in our heads, and the cache misses took their toll in
>> > tuits and in wallclock time.  Furthermore, I have less spare time for
>> > dev@ discussions than I did when I cast the veto (= a year ago next
>> > Saturday).  Going forward it might be preferable for threads not to
>> > hibernate.
>>
>> I agree, but obviously the hibernation is not some deliberate action
>> by anyone. It's just that most of us here have less spare time for
>> dev@ discussions (and for SVN development) than before. Especially for
>> such complex matters, and especially when people feel they are
>> walking into a minefield. There are only a few active devs left, and
>> tuits are running low ...
>>
>> ...
>> > That being the case, I have considered whether merging the feature
>> > branch outweighs letting dev@ take a not-only-/pro forma/ role in
>> > design discussions.  I am of the opinion that it does not, and
>> > therefore I reäffirm the veto.
>>
>> It has become more clear to me (I was only following tangentially)
>> that your veto is focused on the development methodology and the lack
>> of design discussion. Is that a valid reason for a veto? We are low on
>> resources, someone still finds time to make some progress, no one
>> blocks it on technical grounds, and then someone vetoes it because we
>> don't have enough resources?
>>
>> That puts us pretty much in deadlock, because we are too low on
>> resources. Or maybe I misunderstand?
>>
>> To be clear: I appreciate your input, Daniel, and your insistence on a
>> more thorough design discussion. I assume it's coming from a genuine
>> concern that we formulate problems well, and think hard about possible
>> solutions (focusing on the precise problem we are trying to solve).
>> But at the end of the day, if that design discussion doesn't happen
>> (or not enough to your satisfaction anyway), is that grounds for a
>> veto? For me it's a tough call, because on the one hand you have a
>> point, but on the other hand ... you're blocking _some_ progress
>> because the process behind it is not perfect (which is hard to do with
>> the 3.25 tuits we have left).
>>
>> > P.S.  Could that BRANCH-README please state what's the problem the branch
>> > means to solve, i.e., the goal / acceptance test?  "Make it possible to
>> > «svn add» SHA-1 collisions"?
>>
>> I agree that would be a good step.
>>
>> I too find it a bit unclear what problem we're actually trying to
>> solve, apart from a vague feeling that SHA-1 will become more and more
>> broken over time, and that this will cause fatal injury to SVN (in its
>> WC, protocol, dump format, or repository). And perhaps the fact that
>> security auditors are becoming more and more triggered by seeing SHA-1
>> (even if they don't understand the way it is used and its
>> ramifications). Making it possible to 'svn add' SHA-1 collisions is
>> not it, I think.
>>
>> --
>> Johan
>>
>
>
> Johan's reply sums up my thoughts pretty closely.
>
> I would very much like to *avoid* all of the following: deadlock, bad
> feelings, and members of this small community leaving because of deadlocks
> or bad feelings.
>
> I agree that (at the very least), BRANCH-README should define what problem
> the branch aims to solve, and perhaps that's really the main thing we need
> to discuss and resolve.
>
> Johan touched on one issue with SHA1: regardless of how it is actually used
> in SVN and whether it is adequate for those purposes, there is customer
> perception. I can imagine, for example, the IT dept of some big
> $corporation could blacklist SHA1 because it is considered broken for
> cryptographic purposes. But they could blacklist it for everything. Even
> though it is safe and effective for our use cases, try explaining that to
> an admin who is struggling to meet such a blanket policy.
>
> I would like to add another reason to think about a post-SHA1 future: I'm
> writing on mobile so I can't easily grep for things now, but could our
> dependencies eventually remove the SHA1 implementation? (I just saw
> something about removal of DSA from some famous lib not too long ago. SHA1
> could be next?)
>
> When would SHA1 disappear? I don't know, but I consider it plausible that
> it happens within about 5 years.
>
> If SHA1 is removed in the future, there will need to be a mad dash to
> replace it. Or we'll have to add a new dependency to use an alternate
> implementation. Or we'll have to implement our own SHA1 or copy some code
> into SVN. All of these seem bad to me.
>
> Switching to a different hash is also a bad idea, I think, because it is

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-13 Thread Nathan Hartman
On Sat, Jan 13, 2024 at 3:56 PM Nathan Hartman wrote:

> Pros: Future-proofing against the real and perceived brokenness of any
> hash types.
>

I meant to write:

Pros: Future-proofing against the real and perceived brokenness of any hash
types, or the deprecation and later removal of their implementations from
our deps.

Cheers,
Nathan


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-13 Thread Nathan Hartman
On Fri, Jan 12, 2024 at 3:51 PM Johan Corveleyn  wrote:

> On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf wrote:
> ...
> > Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
> > I had the context in our heads, and the cache misses took their toll in
> > tuits and in wallclock time.  Furthermore, I have less spare time for
> > dev@ discussions than I did when I cast the veto (= a year ago next
> > Saturday).  Going forward it might be preferable for threads not to
> > hibernate.
>
> I agree, but obviously the hibernation is not some deliberate action
> by anyone. It's just that most of us here have less spare time for
> dev@ discussions (and for SVN development) than before. Especially for
> such complex matters, and especially when people feel they are
> walking into a minefield. There are only a few active devs left, and
> tuits are running low ...
>
> ...
> > That being the case, I have considered whether merging the feature
> > branch outweighs letting dev@ take a not-only-/pro forma/ role in
> > design discussions.  I am of the opinion that it does not, and
> > therefore I reäffirm the veto.
>
> It has become more clear to me (I was only following tangentially)
> that your veto is focused on the development methodology and the lack
> of design discussion. Is that a valid reason for a veto? We are low on
> resources, someone still finds time to make some progress, no one
> blocks it on technical grounds, and then someone vetoes it because we
> don't have enough resources?
>
> That puts us pretty much in deadlock, because we are too low on
> resources. Or maybe I misunderstand?
>
> To be clear: I appreciate your input, Daniel, and your insistence on a
> more thorough design discussion. I assume it's coming from a genuine
> concern that we formulate problems well, and think hard about possible
> solutions (focusing on the precise problem we are trying to solve).
> But at the end of the day, if that design discussion doesn't happen
> (or not enough to your satisfaction anyway), is that grounds for a
> veto? For me it's a tough call, because on the one hand you have a
> point, but on the other hand ... you're blocking _some_ progress
> because the process behind it is not perfect (which is hard to do with
> the 3.25 tuits we have left).
>
> > P.S.  Could that BRANCH-README please state what's the problem the branch
> > means to solve, i.e., the goal / acceptance test?  "Make it possible to
> > «svn add» SHA-1 collisions"?
>
> I agree that would be a good step.
>
> I too find it a bit unclear what problem we're actually trying to
> solve, apart from a vague feeling that SHA-1 will become more and more
> broken over time, and that this will cause fatal injury to SVN (in its
> WC, protocol, dump format, or repository). And perhaps the fact that
> security auditors are becoming more and more triggered by seeing SHA-1
> (even if they don't understand the way it is used and its
> ramifications). Making it possible to 'svn add' SHA-1 collisions is
> not it, I think.
>
> --
> Johan
>


Johan's reply sums up my thoughts pretty closely.

I would very much like to *avoid* all of the following: deadlock, bad
feelings, and members of this small community leaving because of deadlocks
or bad feelings.

I agree that (at the very least), BRANCH-README should define what problem
the branch aims to solve, and perhaps that's really the main thing we need
to discuss and resolve.

Johan touched on one issue with SHA1: regardless of how it is actually used in
SVN and whether it is adequate for those purposes, there is customer
perception. I can imagine, for example, the IT dept of some big
$corporation could blacklist SHA1 because it is considered broken for
cryptographic purposes. But they could blacklist it for everything. Even
though it is safe and effective for our use cases, try explaining that to
an admin who is struggling to meet such a blanket policy.

I would like to add another reason to think about a post-SHA1 future: I'm
writing on mobile so I can't easily grep for things now, but could our
dependencies eventually remove the SHA1 implementation? (I just saw
something about removal of DSA from some famous lib not too long ago. SHA1
could be next?)

When would SHA1 disappear? I don't know, but I consider it plausible that
it happens within about 5 years.

If SHA1 is removed in the future, there will need to be a mad dash to
replace it. Or we'll have to add a new dependency to use an alternate
implementation. Or we'll have to implement our own SHA1 or copy some code
into SVN. All of these seem bad to me.

Switching to a different hash is also a bad idea, I think, because it is
likely to suffer the same problems as SHA1 later on, as cryptography
research proceeds and newer hashes become declared broken.

I'll try to describe what I think is a best case scenario: Support
multi-hash in 1.15 in format 32 WCs. SHA1 can continue to be the default
but we should be careful not to require a SHA1 implementation to 

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-13 Thread Daniel Sahlberg
On Sat, 13 Jan 2024 at 00:50, Johan Corveleyn wrote:

> On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf wrote:
> ...
> > Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
> > I had the context in our heads, and the cache misses took their toll in
> > tuits and in wallclock time.  Furthermore, I have less spare time for
> > dev@ discussions than I did when I cast the veto (= a year ago next
> > Saturday).  Going forward it might be preferable for threads not to
> > hibernate.
>
> I agree, but obviously the hibernation is not some deliberate action
> by anyone. It's just that most of us here have less spare time for
> dev@ discussions (and for SVN development) than before. Especially for
> such complex matters, and especially when people feel they are
> walking into a minefield. There are only a few active devs left, and
> tuits are running low ...
>

I agree with Johan on this. The long hiatus is unfortunate. But it won't
help to point fingers at this point.



>
> ...
> > That being the case, I have considered whether merging the feature
> > branch outweighs letting dev@ take a not-only-/pro forma/ role in
> > design discussions.  I am of the opinion that it does not, and
> > therefore I reäffirm the veto.
>
> It has become more clear to me (I was only following tangentially)
> that your veto is focused on the development methodology and the lack
> of design discussion. Is that a valid reason for a veto? We are low on
> resources, someone still finds time to make some progress, no one
> blocks it on technical grounds, and then someone vetoes it because we
> don't have enough resources?
>
> That puts us pretty much in deadlock, because we are too low on
> resources. Or maybe I misunderstand?
>
> To be clear: I appreciate your input, Daniel, and your insistence on a
> more thorough design discussion. I assume it's coming from a genuine
> concern that we formulate problems well, and think hard about possible
> solutions (focusing on the precise problem we are trying to solve).
> But at the end of the day, if that design discussion doesn't happen
> (or not enough to your satisfaction anyway), is that grounds for a
> veto? For me it's a tough call, because on the one hand you have a
> point, but on the other hand ... you're blocking _some_ progress
> because the process behind it is not perfect (which is hard to do with
> the 3.25 tuits we have left).
>
> > P.S.  Could that BRANCH-README please state what's the problem the branch
> > means to solve, i.e., the goal / acceptance test?  "Make it possible to
> > «svn add» SHA-1 collisions"?
>
> I agree that would be a good step.
>
> I too find it a bit unclear what problem we're actually trying to
> solve, apart from a vague feeling that SHA-1 will become more and more
> broken over time, and that this will cause fatal injury to SVN (in its
> WC, protocol, dump format, or repository). And perhaps the fact that
> security auditors are becoming more and more triggered by seeing SHA-1
> (even if they don't understand the way it is used and its
> ramifications). Making it possible to 'svn add' SHA-1 collisions is
> not it, I think.
>

I also agree with this.

From what I remember of the earlier discussions, there were concerns that a
changed file might go undetected if someone changes it to another file that
collides with the original file. I think that might be a valid point,
especially if we don't have the pristine files anymore.

I'd also like to understand why we need the multi-checksum format instead
of just plainly switching to XXX (insert favourite checksumming algorithm
here). Does it help us to have multiple types of checksums available? Would
we use BOTH as a resort (likelihood of collision in SHA1 and in XXX at the
same time approaching zero)? Does it help backwards/forwards compatibility?

Kind regards,
Daniel Sahlberg


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-12 Thread Johan Corveleyn
On Fri, Jan 12, 2024 at 12:37 PM Daniel Shahaf  wrote:
...
> Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
> I had the context in our heads, and the cache misses took their toll in
> tuits and in wallclock time.  Furthermore, I have less spare time for
> dev@ discussions than I did when I cast the veto (= a year ago next
> Saturday).  Going forward it might be preferable for threads not to
> hibernate.

I agree, but obviously the hibernation is not some deliberate action
by anyone. It's just that most of us here have less spare time for
dev@ discussions (and for SVN development) than before. Especially for
such complex matters, and especially when people feel they are
walking into a minefield. There are only a few active devs left, and
tuits are running low ...

...
> That being the case, I have considered whether merging the feature
> branch outweighs letting dev@ take a not-only-/pro forma/ role in
> design discussions.  I am of the opinion that it does not, and
> therefore I reäffirm the veto.

It has become more clear to me (I was only following tangentially)
that your veto is focused on the development methodology and the lack
of design discussion. Is that a valid reason for a veto? We are low on
resources, someone still finds time to make some progress, no one
blocks it on technical grounds, and then someone vetoes it because we
don't have enough resources?

That puts us pretty much in deadlock, because we are too low on
resources. Or maybe I misunderstand?

To be clear: I appreciate your input, Daniel, and your insistence on a
more thorough design discussion. I assume it's coming from a genuine
concern that we formulate problems well, and think hard about possible
solutions (focusing on the precise problem we are trying to solve).
But at the end of the day, if that design discussion doesn't happen
(or not enough to your satisfaction anyway), is that grounds for a
veto? For me it's a tough call, because on the one hand you have a
point, but on the other hand ... you're blocking _some_ progress
because the process behind it is not perfect (which is hard to do with
the 3.25 tuits we have left).

> P.S.  Could that BRANCH-README please state what's the problem the branch
> means to solve, i.e., the goal / acceptance test?  "Make it possible to
> «svn add» SHA-1 collisions"?

I agree that would be a good step.

I too find it a bit unclear what problem we're actually trying to
solve, apart from a vague feeling that SHA-1 will become more and more
broken over time, and that this will cause fatal injury to SVN (in its
WC, protocol, dump format, or repository). And perhaps the fact that
security auditors are becoming more and more triggered by seeing SHA-1
(even if they don't understand the way it is used and its
ramifications). Making it possible to 'svn add' SHA-1 collisions is
not it, I think.

-- 
Johan


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-12 Thread Daniel Shahaf
Karl Fogel wrote on Wed, 03 Jan 2024 22:13 +00:00:
> On 01 Apr 2023, Evgeny Kotkov via dev wrote:
> > Daniel Shahaf  writes:
> > 
> > > What's the question or action item to/for me?  Thanks.
> > 
> > I'm afraid I don't fully understand your question.  As you
> > probably remember, the change is blocked by your veto.  To my
> > knowledge, this veto hasn't been revoked as of now, and I simply
> > mentioned that in my email.  It is entirely your decision
> > whether or not to take any action regarding this matter.
> 
> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. Evgeny would
> like to merge this into trunk -- on the grounds, I believe, that it is
> strictly an improvement over what we have now, and it opens the door to
> further future improvements (each of which would go through the usual
> discussion & consensus process, of course).

So, I looked.

This thread comprises 237 posts spanning 30 months (July 2021 through
today).  On 2023-01-20 I cast a veto.  There was some activity
afterwards, but until the parent post of this one, the thread has been
silent for the better part of a year; and now I'm being asked to
withdraw my veto.

Procedurally, the long hiatus is counterproductive.  Neither kfogel nor
I had the context in our heads, and the cache misses took their toll in
tuits and in wallclock time.  Furthermore, I have less spare time for
dev@ discussions than I did when I cast the veto (= a year ago next
Saturday).  Going forward it might be preferable for threads not to
hibernate.

You didn't link the veto, so I had to go grep for it.  It is,
presumably, this one:

>>>> # Archived-At: 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3C904aded6-5ef0-4123-ade0-e23a3bb56726%40app.fastmail.com%3E
>>>> Date: Fri, 20 Jan 2023 12:15:24 +0000
>>>> From: Daniel Shahaf
>>>> To: dev@subversion.apache.org
>>>> Subject: Re: Switching from SHA1 to a checksum type without known 
>>>> collisions in 1.15 working copy format
>>>> Message-Id: <904aded6-5ef0-4123-ade0-e23a3bb56...@app.fastmail.com>
>>>> 
>>>> Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
>>>> > I can complete the work on this branch and bring it to a production-ready
>>>> > state, assuming there are no objections.
>>>> 
>>>> Your assumption is counterfactual:
>>>> 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>>>> 
>>>> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>>>> 
>>>> Objections have been raised, been left unanswered, and now
>>>> implementation work has commenced following the original design.  That's
>>>> not acceptable.  I'm vetoing the change until a non-rubber-stamp design
>>>> discussion has been completed on the public dev@ list.

So, this veto being in front of me, let me reply to the request that
I withdraw it:

> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. Evgeny would
> like to merge this into trunk -- on the grounds, I believe, that it is
> strictly an improvement over what we have now, and it opens the door to
> further future improvements (each of which would go through the usual
> discussion & consensus process, of course).
> 
> Evgeny's work is on this branch...
> 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
> 
> ...which in turn branched from
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
> 
> I used this command to get an overview of the work:
> 
> $ svn cat 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README

As far as I can tell, the request for veto withdrawal is grounded only
in the fact that the veto, whilst in force, prevents the feature branch
from being merged/released.  The request does not allege the veto was
invalid or unfounded in the first place; nor that the veto has /become/
invalid or unfounded due to time having passed; nor that modifications
or alterations to the code [or, in this case, to the decision-making
process] have been made and are believed to have addressed the veto's
grounds.

In summary, the request only deals with the fact of a veto and its
formal/procedural implications, but does not deal with the substantive
justification for the veto at all.

That being the case, I have no reason to believe the original grounds of
the veto have been addressed.

That being the case, I have considered whether merging the feature
branch o

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-04 Thread Karl Fogel

On 04 Jan 2024, Daniel Shahaf wrote:
> Acknowledging receipt.  I'll reply substantively when I have the
> time to swap in the context.


Thanks.  Yeah, I went through the same context-swapping-in process 
yesterday before posting!


Best regards,
-Karl


>> Evgeny's work is on this branch...
>>
>> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
>>
>> ...which in turn branched from
>> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
>>
>> I used this command to get an overview of the work:
>>
>> $ svn cat
>> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README
>>
>> (The work is several months old now, but for the sake of
>> discussion let's assume it's mergeable, passes all tests, etc.
>> Obviously, Evgeny's only going to merge it when all of those
>> conditions are true -- maybe some minor tweaks will be needed to
>> get it there, I don't know.)
>>
>> Best regards,
>> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-04 Thread Daniel Shahaf
Karl Fogel wrote on Wed, 03 Jan 2024 22:13 +00:00:
> On 01 Apr 2023, Evgeny Kotkov via dev wrote:
>>Daniel Shahaf  writes:
>>
>>> What's the question or action item to/for me?  Thanks.
>>
>>I'm afraid I don't fully understand your question.  As you
>>probably remember, the change is blocked by your veto.  To my
>>knowledge, this veto hasn't been revoked as of now, and I simply
>>mentioned that in my email.  It is entirely your decision
>>whether or not to take any action regarding this matter.
>
> So AIUI, Evgeny is asking you to withdraw your veto, Daniel. 
> Evgeny would like to merge this into trunk -- on the grounds, I 
> believe, that it is strictly an improvement over what we have now, 
> and it opens the door to further future improvements (each of 
> which would go through the usual discussion & consensus process, 
> of course).
>

Acknowledging receipt.  I'll reply substantively when I have the time to swap 
in the context.

Daniel

> Evgeny's work is on this branch...
>
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt
>
> ...which in turn branched from 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.
>
> I used this command to get an overview of the work:
>
> $ svn cat 
> https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README
>
> (The work is several months old now, but for the sake of 
> discussion let's assume it's mergeable, passes all tests, etc. 
> Obviously, Evgeny's only going to merge it when all of those 
> conditions are true -- maybe some minor tweaks will be needed to 
> get it there, I don't know.)
>
> Best regards,
> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2024-01-03 Thread Karl Fogel

On 01 Apr 2023, Evgeny Kotkov via dev wrote:

> Daniel Shahaf  writes:
>
>> What's the question or action item to/for me?  Thanks.
>
> I'm afraid I don't fully understand your question.  As you
> probably remember, the change is blocked by your veto.  To my
> knowledge, this veto hasn't been revoked as of now, and I simply
> mentioned that in my email.  It is entirely your decision
> whether or not to take any action regarding this matter.


So AIUI, Evgeny is asking you to withdraw your veto, Daniel. 
Evgeny would like to merge this into trunk -- on the grounds, I 
believe, that it is strictly an improvement over what we have now, 
and it opens the door to further future improvements (each of 
which would go through the usual discussion & consensus process, 
of course).


Evgeny's work is on this branch...

https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

...which in turn branched from 
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind.


I used this command to get an overview of the work:

$ svn cat 
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt/BRANCH-README


(The work is several months old now, but for the sake of 
discussion let's assume it's mergeable, passes all tests, etc. 
Obviously, Evgeny's only going to merge it when all of those 
conditions are true -- maybe some minor tweaks will be needed to 
get it there, I don't know.)


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-04-01 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> What's the question or action item to/for me?  Thanks.

I'm afraid I don't fully understand your question.  As you probably remember,
the change is blocked by your veto.  To my knowledge, this veto hasn't been
revoked as of now, and I simply mentioned that in my email.  It is entirely
your decision whether or not to take any action regarding this matter.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-03-31 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Wed, 22 Mar 2023 15:23 +00:00:
> This change is still blocked by a veto, but if danielsh changes his
> mind and there are no other objections, I'm ready to complete the few
> remaining bits and merge it to trunk.

What's the question or action item to/for me?  Thanks.

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-03-22 Thread Evgeny Kotkov via dev
Evgeny Kotkov  writes:

> > Now, how hard would this be to actually implement?
>
> To have a more or less accurate estimate, I went ahead and prepared the
> first-cut implementation of an approach that makes the pristine checksum
> kind configurable in a working copy.
>
> The current implementation passes all tests in my environment and seems to
> work in practice.  It is available on the branch:
>
>   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind
>
> The implementation on the branch allows creating working copies that use a
> checksum kind other than SHA-1.

I extended the current implementation to use a dynamically salted SHA-1
checksum, rather than a SHA-1 with a statically hardcoded salt.
The dynamic salt is generated during the creation of a wc.db.
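
For readers unfamiliar with the idea, here is a minimal sketch of a
per-working-copy salted checksum.  It is an illustration only, not the code
on the branch: the function names, the 16-byte salt length, and the choice
to prepend the salt to the content are all assumptions.

```python
import hashlib
import os


def new_salt(nbytes: int = 16) -> bytes:
    """Return a random salt (hypothetical: generated once per working copy)."""
    return os.urandom(nbytes)


def salted_sha1(salt: bytes, content: bytes) -> str:
    """SHA-1 over salt followed by content (assumed layout, for illustration)."""
    return hashlib.sha1(salt + content).hexdigest()


# The same content gets a different digest in each working copy, so a pair of
# files crafted to collide under plain SHA-1 is not expected to collide under
# a salt the attacker did not know in advance.
wc1_salt = new_salt()
wc2_salt = new_salt()
content = b"hello, pristine store\n"
print(salted_sha1(wc1_salt, content))
print(salted_sha1(wc2_salt, content))
```

A design consequence of any such scheme is that the digest is only meaningful
inside the working copy that owns the salt, so it cannot be compared directly
against an unsalted SHA-1 supplied from elsewhere.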

The implementation is available on a separate branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-salt

The change is fairly large, but I think it should solve
the potential problem without any practical drawbacks, except for the lack
of the mentioned ra_serf fetch optimization.

So overall I'd propose to bring this change to trunk, to improve the current
state around checksum collisions in the working copy, and to also have the
infrastructure for supporting different checksum kinds in place, in case
we need it in the future.

This change is still blocked by a veto, but if danielsh changes his
mind and there are no other objections, I'm ready to complete the few
remaining bits and merge it to trunk.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-07 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> Look, it's pretty simple.  You said "We should do Y because it
> addresses X".  You didn't explain why X needs to be addressed, didn't
> consider what alternatives there are to Y, didn't consider any cons that
> Y may have… and when people had questions, you just began to
> implement Y, without responding to or even acknowledging those
> questions.
>
> That's not how design discussions work.  A design discussion doesn't go
> "state decision; state pros; implement"; it goes "state problem; discuss
> potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

Well, I think it may not be as simple as it seems to you.  Who decided that
we should follow the process you're describing?  Is there a thread with a
consensus on this topic?  Or do you insist on using this specific process
because it's the only process that seems obvious to you?  What alternatives
to it have been considered?

As far as I can tell, the process you're suggesting is effectively a
waterfall-like process, and there are quite a lot of concerns about its
effectiveness, because the decisions have to be made in the conditions of
a lack of information.

Personally, I prefer an alternative process that starts from finding out
all available bits of information, which are then used to make informed
decisions.  The unfortunate reality, however, is that the only guaranteed
way of collecting all information means implementing all (or almost all)
significant parts in code.  Roughly speaking, this process looks like a
research project that gets completed by trial and error.

Based on what you've been saying so far, I wouldn't be surprised if you
disagree.  But I still think that forcing the others to follow a certain
process by such means as vetoing a code change is maybe a bit over the
top.  (In the meantime, I certainly won't object if you're going to use this
waterfall-like process for the changes that you implement yourself.)


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-06 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Sun, Jan 29, 2023 at 16:37:20 +0300:
> Daniel Shahaf  writes:
> 
> > > (I'm not saying that the above rules have to be used in this particular 
> > > case
> > >  and that a veto is invalid, but still thought it’s worth mentioning.)
> > >
> >
> > I vetoed the change because it hadn't been designed on the dev@ list,
> > had not garnered dev@'s consensus, and was being railroaded through.
> > (as far as I could tell)
> 
> I have *absolutely* no idea where "being railroaded through" comes from.
> Really, it's a wrong way of portraying and thinking about the events that have
> happened so far.
> 
> Reiterating over those events: I wrote an email containing my thoughts
> and explaining the motivation for such change.  I didn't reply to some of
> the questions (including some tricky questions, such as the one featuring
> a theoretical hash function), because they have been at least partly
> answered by others in the thread, and I didn't have anything valuable
> to add at that time.
> 
> During that time, I was actively coding the core part of the change,
> to check if it's possible technically.  Which is important, as far as
> I believe, because not all theoretically possible solutions can be implemented
> without facing significant practical or implementation-related issues, and
> it seems to me that you significantly undervalue such an approach.
> 

Quoting myself from elsethread: [3]

- If the branch is seen and presented as a PoC for furthering discussion
  and for discovering practical considerations (e.g., that
  PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
  or the ra_serf sha1 optimization that anyone implementing the branch
  would run into), it's likely a good thing.

> I do not say my actions were exemplary, but as far as I can tell, they're
> pretty much in line with how svn-dev has been operating so far.  But, it all
> resulted in an unclear veto without any _technical_ arguments, where what's
> being vetoed is unclear as well, because the change was not ready at the
> moment the veto was cast.
> 

Look, it's pretty simple.  You said "We should do Y because it
addresses X".  You didn't explain why X needs to be addressed, didn't
consider what alternatives there are to Y, didn't consider any cons that
Y may have… and when people had questions, you just began to
implement Y, without responding to or even acknowledging those
questions.

That's not how design discussions work.  A design discussion doesn't go
"state decision; state pros; implement"; it goes "state problem; discuss
potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

That's why I called veto: not because I considered any particular
proposal then on the table unreasonable, but because I considered /the
decision process being used/ unreasonable (cf. [7]).

> And because your veto goes in favor of a specific process

Yes, I'm arguing in favour of first defining a problem, then considering
solutions to it, both their pros and cons, and only then deciding what
to implement.  This process isn't unique, novel, or singular; it's
standard in multiple disciplines [4–7].

>   (considering that
> no other arguments were given), the only thing that's *actually* being
> railroaded is an odd form of an RTC (review-then-commit) process that is
> against our usual CTR (commit-then-review) [1,2].  That's railroading,
> because it hasn't been explicitly discussed anywhere and a consensus
> on it has not been reached.

This thread was started on 2022-12-20 [1], with the idiomatic
"Thoughts?" sign-off.  The first relevant code was committed on
2023-01-19 [2].

That is: the change followed RTC to begin with.  Considering that both
[1] and [2] were authored by you personally, I find it difficult to
charitably interpret your claim that "an odd form of [RTC]" was being
"railroaded", as RTC rather than "our usual CTR [process]" was being
followed at your own decision.

It's perhaps worth pointing out the veto followed the branch creation
because that was the point when I gave up on waiting for someone to
respond to the objections that had been made by then.  It wasn't a veto
on using a branch, as I have clarified: [3]

I didn't object to the use of a branch /per se/.  I objected to the
treating of objections that *had already been posted* as though they had
never been posted.  *That's* not acceptable.

So, no, I wasn't advocating /either/ RTC or CTR; I was advocating that
the "R" step happen at all.  A branch may take place before, during, or
after discussion — see [3] for more — but the important thing is that
discussion happen.  The OP doesn't have to agree with all points made,
but doesn't get to ignore them and proceed as though they have never
been posted.

Daniel

[1] 

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-06 Thread Daniel Shahaf
Karl Fogel wrote on Mon, Jan 30, 2023 at 17:26:03 -0600:
> On 29 Jan 2023, Evgeny Kotkov via dev wrote:
> > I have *absolutely* no idea where "being railroaded through" comes
> > from.  Really, it's a wrong way of portraying and thinking about the
> > events that have happened so far.
> > 
> > Reiterating over those events: I wrote an email containing my
> > thoughts and explaining the motivation for such change.  I didn't
> > reply to some of the questions (including some tricky questions,
> > such as the one featuring a theoretical hash function), because they
> > have been at least partly answered by others in the thread, and I
> > didn't have anything valuable to add at that time.
> > 
> > During that time, I was actively coding the core part of the change,
> > to check if it's possible technically.  Which is important, as far
> > as I believe, because not all theoretically possible solutions can
> > be implemented without facing significant practical or
> > implementation-related issues, and it seems to me that you
> > significantly undervalue such an approach.
> > 
> > I do not say my actions were exemplary, but as far as I can tell,
> > they're pretty much in line with how svn-dev has been operating so
> > far. But, it all resulted in an unclear veto without any _technical_
> > arguments, where what's being vetoed is unclear as well, because the
> > change was not ready at the moment the veto was cast.
> > 
> > And because your veto goes in favor of a specific process
> > (considering that no other arguments were given), the only thing
> > that's *actually* being railroaded is an odd form of an RTC
> > (review-then-commit) process that is against our usual CTR
> > (commit-then-review) [1,2].  That's railroading, because it hasn't
> > been explicitly discussed anywhere and a consensus on it has not
> > been reached.
> 
> Daniel, given what's in Evgeny's branch now, could you summarize your
> current technical objections if any?
> 
> If they are something like "This code is solving the wrong problem(s)" or
> "I'm not sure what problem(s) it's supposed to solve", those count as
> technical objections.  It's just that it would be useful to have the
> objection(s) gathered in one place. This thread has been long and somewhat
> digressive -- I'm not saying that's due to you -- and I at least have found
> it a bit difficult to keep track of the concrete objections versus various
> interesting but ultimately theoretical points.
> 

Quoting my other reply just now:

[…] it's pretty simple.  [The OP] said "We should do Y because it
addresses X".  [The OP] didn't explain why X needs to be addressed, didn't
consider what alternatives there are to Y, didn't consider any cons that
Y may have… and when people had questions, [the OP] just began to
implement Y, without responding to or even acknowledging those
questions.

That's not how design discussions work.  A design discussion doesn't go
"state decision; state pros; implement"; it goes "state problem; discuss
potential solutions, pros, cons; decide; implement" (cf. [4, 5, 6]).

That's why I called veto: not because I considered any particular
proposal then on the table unreasonable, but because I considered /the
decision process being used/ unreasonable (cf. [7]).

Concretely: Why would migrating away from SHA-1 be a good thing in the
first place?  Assuming that it /would/ be a good thing, what alternative
ways are there to achieve whatever the goodness may be (new feature /
bugfix / resilience to some attack vector / etc.)?  What are the
potential *downsides* of migrating away from SHA-1?

The same, restated at a higher level of abstraction: "Migrate
away from SHA-1" is a means, not an end.  Define the ends and have
a non-predetermined-outcome discussion on how to achieve them.

"Reduce the security impact to our users of second-preimage attacks
against SHA-1" would be an end.  I don't know whether it's the only one
or whether there are additional ones.

[As to the branch, I'm not sure whether to restate my position on it or
not — so I'll restate it, erring on the side of including too much
rather than too little, but feel free to ignore the following paragraph
at will:]

Was the branch commenced as a PoC / smoke test, to explore one proposed
direction and to be discarded if the consensus compass should end up
pointing towards another cardinal direction?  Or was it commenced on the
assumption that consensus on migrating from SHA-1 to SHA-256 went without
saying, had already formed, or would necessarily have formed by 1.15.0-rc1?

> The reason I'm supportive of Evgeny's direction is that his changes, if
> completed, would offer a solution to the (admittedly still somewhat distant)
> security concern I raised early on. Essentially, I'm worried that
> second-preimage attacks on SHA-1 are coming eventually (maybe I'm wrong
> about this -- they are after all significantly harder than mere collision
> attacks).  

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-02-06 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Sun, Jan 29, 2023 at 16:36:12 +0300:
> Daniel Shahaf  writes:
> 
> > > That could happen after a public disclosure of a pair of executable
> > > files/scripts where the forged version allows for remote code execution.
> > > Or maybe something similar with a file format that is often stored in
> > > repositories and that can be executed or used by a build script, etc.
> > >
> >
> > Err, hang on.  Your reference described a chosen-prefix attack, while
> > this scenario concerns a single public collision.  These are two
> > different things.
> 
> A chosen-prefix attack allows finding more meaningful collisions such as
> working executables/scripts.  When such collisions are made public, they
> would have a greater exploitation potential than just a random collision.
> 

Right.  So we're assuming Mallory generates a chosen-prefix collision,
and then somehow pulls off steps #1 and #2-as-amended [both quoted
below], with Alice noticing none of that.

That still sounds like something we should assume Mallory can pull off.

> > Disclosure of a pair of executable files/scripts isn't by itself
> > a problem unless one of the pair ("file A") is in a repository
> > somewhere.  Now, was the colliding file ("file B") generated _before_ or
> > _after_ file A was committed?
> >
> > - If _before_, then it would seem Mallory had somehow managed to:
> >
> >   1. get a file of his choosing committed to Alice's repository; and
> >
> >   2. get a wc of Alice's repository into one of the codepaths that
> >  assume SHA-1 is one-to-one / collision-free (currently that's the
> >  ra_serf optimization and the 1.15 wc status).
> 
> Not only.  There are cases when the working copy itself installs the working
> file with a hash lookup in the pristine store.  This is more true for 1.14
> than trunk, because in trunk we have the streamy checkout/update that avoids
> such lookups by writing straight to the working file.  However, some of
> the code paths still install the contents from the pristine store by hash.
> Examples include reverting a file, copying an unmodified file, switching
> a file with keywords, the mentioned ra_serf optimization, etc.
> 
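
To make the install-by-hash concern concrete, here is a toy model of a
content-addressed store.  It is not Subversion's pristine store:
ToyPristineStore, weak_digest and find_collision are hypothetical names, and
the digest is deliberately truncated to one byte of SHA-1 so that a collision
can be found by brute force.

```python
import hashlib


def weak_digest(content: bytes) -> str:
    """Toy stand-in for a broken hash: only the first byte of SHA-1."""
    return hashlib.sha1(content).hexdigest()[:2]


class ToyPristineStore:
    """Stores each content once, keyed by digest (first writer wins)."""

    def __init__(self) -> None:
        self._by_digest = {}

    def add(self, content: bytes) -> str:
        key = weak_digest(content)
        # A later file with the same digest is assumed to be identical.
        self._by_digest.setdefault(key, content)
        return key

    def install(self, key: str) -> bytes:
        """Return the content to write to the working file for this key."""
        return self._by_digest[key]


def find_collision(target: bytes) -> bytes:
    """Brute-force different content with the same weak digest."""
    want = weak_digest(target)
    i = 0
    while True:
        candidate = b"candidate-%d" % i
        if candidate != target and weak_digest(candidate) == want:
            return candidate
        i += 1


store = ToyPristineStore()
file_a = b"harmless build script\n"
file_b = find_collision(file_a)
key_a = store.add(file_a)
key_b = store.add(file_b)
# Installing "file B" by hash silently yields file A's content.
print(key_a == key_b, store.install(key_b) == file_a)
```

In this model, whichever colliding file reaches the store first determines
what any later install by hash writes out, which is the content swap
described in the quoted paragraph.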

Thanks.  In terms of that step #2, all these are also candidates for
"one of the codepaths", then.

> >   Now, step #1 seems plausible enough.  As to step #2, it's not clear to
> >   me how file B would reach the wc in step #2…
> 
> If Mallory has write access, she could commit both files, thus arranging for
> a possible content change if both files are checked out to a single working
> copy.  This isn't the same as just directly modifying the target file, because
> file content isn't expected to change due to changes in other files (that can
> be of any type), so this attack has much better chances of being unnoticed.
> 

Well, yes, but the write access requirement lowers severity.

> If Mallory doesn't have write access, there should be other vectors, such
> as distributing a pair of files (harmless in the context of their respective
> file formats) separately via two upstream channels.  Then, if both of the
> upstream distributions are committed into a repository and their files are
> checked out together, the content will change, allowing for a malicious
> action.

I take it we're still under the assumption that someone's repository has
rep-sharing disabled (or unsupported, i.e., pre-1.6 format) despite the
recommendation in security/sha1-advisory.txt, since otherwise the commit
would be rejected.

So, back to my question which you have snipped:

> >   So, I agree it's a scenario we should address.  What options do we
> >   have to address it?  (I grant that migrating away from SHA-1 is one
> >   option.)

Care to address that?

Daniel

> 
> Regards,
> Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-31 Thread Karl Fogel

On 31 Jan 2023, Daniel Shahaf wrote:

> Karl Fogel wrote on Mon, 30 Jan 2023 23:26 +00:00:
>> Daniel, given what's in Evgeny's branch now, could you summarize
>> your current technical objections if any?
>
> Certainly, but I won't have time to do so today.


Oh, my gosh, I'd be the last person to ever complain about someone 
not being prompt in sending a detailed technical reply here :-). 
It takes me *weeks* sometimes.  Whenever you get time is good.


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-31 Thread Daniel Shahaf
Karl Fogel wrote on Mon, 30 Jan 2023 23:26 +00:00:
> Daniel, given what's in Evgeny's branch now, could you summarize 
> your current technical objections if any?

Certainly, but I won't have time to do so today.


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-30 Thread Karl Fogel

On 29 Jan 2023, Evgeny Kotkov via dev wrote:
> I have *absolutely* no idea where "being railroaded through" comes from.
> Really, it's a wrong way of portraying and thinking about the events that have
> happened so far.
>
> Reiterating over those events: I wrote an email containing my thoughts
> and explaining the motivation for such change.  I didn't reply to some of
> the questions (including some tricky questions, such as the one featuring
> a theoretical hash function), because they have been at least partly
> answered by others in the thread, and I didn't have anything valuable
> to add at that time.
>
> During that time, I was actively coding the core part of the change,
> to check if it's possible technically.  Which is important, as far as
> I believe, because not all theoretically possible solutions can be implemented
> without facing significant practical or implementation-related issues, and
> it seems to me that you significantly undervalue such an approach.
>
> I do not say my actions were exemplary, but as far as I can tell, they're
> pretty much in line with how svn-dev has been operating so far.  But, it all
> resulted in an unclear veto without any _technical_ arguments, where what's
> being vetoed is unclear as well, because the change was not ready at the
> moment the veto was cast.
>
> And because your veto goes in favor of a specific process (considering that
> no other arguments were given), the only thing that's *actually* being
> railroaded is an odd form of an RTC (review-then-commit) process that is
> against our usual CTR (commit-then-review) [1,2].  That's railroading,
> because it hasn't been explicitly discussed anywhere and a consensus
> on it has not been reached.


Daniel, given what's in Evgeny's branch now, could you summarize 
your current technical objections if any?


If they are something like "This code is solving the wrong 
problem(s)" or "I'm not sure what problem(s) it's supposed to 
solve", those count as technical objections.  It's just that it 
would be useful to have the objection(s) gathered in one place. 
This thread has been long and somewhat digressive -- I'm not 
saying that's due to you -- and I at least have found it a bit 
difficult to keep track of the concrete objections versus various 
interesting but ultimately theoretical points.


The reason I'm supportive of Evgeny's direction is that his 
changes, if completed, would offer a solution to the (admittedly 
still somewhat distant) security concern I raised early on. 
Essentially, I'm worried that second-preimage attacks on SHA-1 are 
coming eventually (maybe I'm wrong about this -- they are after 
all significantly harder than mere collision attacks).  *If* such 
attacks become possible, then our WC could report a file as 
unmodified when in fact it is modified, which would have real 
security implications, as I outlined.
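
As a rough illustration of that failure mode, the sketch below models a
status check that compares only a recorded digest against the digest of the
working file.  This is a simplified model, not Subversion's actual status
logic (which also looks at things like size and timestamps); all names are
hypothetical, and the "weak" digest is SHA-1 truncated to one byte so that a
second preimage can be brute-forced.

```python
import hashlib
from typing import Callable

Digest = Callable[[bytes], str]


def sha1_hex(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()


def truncated_sha1_hex(content: bytes) -> str:
    """Toy stand-in for a hash with feasible second preimages."""
    return hashlib.sha1(content).hexdigest()[:2]


def looks_modified(recorded: str, working: bytes, digest: Digest) -> bool:
    """Report a file as modified only if its digest differs from the record."""
    return digest(working) != recorded


def second_preimage(original: bytes, digest: Digest) -> bytes:
    """Brute-force different content with the same digest.

    Only practical for the toy digest; do not call this with sha1_hex.
    """
    want = digest(original)
    i = 0
    while True:
        candidate = b"tampered-%d" % i
        if candidate != original and digest(candidate) == want:
            return candidate
        i += 1


original = b"int main(void) { return 0; }\n"
tampered = second_preimage(original, truncated_sha1_hex)

# With the weak digest the tampered file is reported as unmodified;
# with full SHA-1 the same tampered content is still detected today.
print(looks_modified(truncated_sha1_hex(original), tampered, truncated_sha1_hex))
print(looks_modified(sha1_hex(original), tampered, sha1_hex))
```

The point of the model is only that, without a pristine copy to compare
against, detection is exactly as strong as the digest's second-preimage
resistance.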


Like I said, this is far from urgent, and IMHO it certainly should 
not delay a release of our new pristineless feature.  But when and 
if Evgeny's branch is ready (where "ready" presumably includes 
something other than salted SHA-1 as the other checksum option), I 
would like to see these changes go in, unless we identify some 
harm from them.


For everyone's ease of reference:

$ svn cat 
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind/BRANCH-README


$ svn log --stop-on-copy 
https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind/


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-29 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> > (I'm not saying that the above rules have to be used in this particular case
> >  and that a veto is invalid, but still thought it’s worth mentioning.)
> >
>
> I vetoed the change because it hadn't been designed on the dev@ list,
> had not garnered dev@'s consensus, and was being railroaded through.
> (as far as I could tell)

I have *absolutely* no idea where "being railroaded through" comes from.
Really, it's a wrong way of portraying and thinking about the events that have
happened so far.

To recap those events: I wrote an email containing my thoughts
and explaining the motivation for such a change.  I didn't reply to some of
the questions (including some tricky questions, such as the one featuring
a theoretical hash function), because they have been at least partly
answered by others in the thread, and I didn't have anything valuable
to add at that time.

During that time, I was actively coding the core part of the change,
to check if it's possible technically.  Which is important, as far as
I believe, because not all theoretically possible solutions can be implemented
without facing significant practical or implementation-related issues, and
it seems to me that you significantly undervalue such an approach.

I do not say my actions were exemplary, but as far as I can tell, they're
pretty much in line with how svn-dev has been operating so far.  But, it all
resulted in an unclear veto without any _technical_ arguments, where what's
being vetoed is unclear as well, because the change was not ready at the
moment the veto was cast.

And because your veto goes in favor of a specific process (considering that
no other arguments were given), the only thing that's *actually* being
railroaded is an odd form of an RTC (review-then-commit) process that is
against our usual CTR (commit-then-review) [1,2].  That's railroading,
because it hasn't been explicitly discussed anywhere and a consensus
on it has not been reached.

[1] https://www.apache.org/foundation/glossary.html#CommitThenReview
[2] https://www.apache.org/foundation/glossary.html#ReviewThenCommit


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-29 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> > That could happen after a public disclosure of a pair of executable
> > files/scripts where the forged version allows for remote code execution.
> > Or maybe something similar with a file format that is often stored in
> > repositories and that can be executed or used by a build script, etc.
> >
>
> Err, hang on.  Your reference described a chosen-prefix attack, while
> this scenario concerns a single public collision.  These are two
> different things.

A chosen-prefix attack allows finding more meaningful collisions such as
working executables/scripts.  When such collisions are made public, they
would have a greater exploitation potential than just a random collision.

> Disclosure of a pair of executable files/scripts isn't by itself
> a problem unless one of the pair ("file A") is in a repository
> somewhere.  Now, was the colliding file ("file B") generated _before_ or
> _after_ file A was committed?
>
> - If _before_, then it would seem Mallory had somehow managed to:
>
>   1. get a file of his choosing committed to Alice's repository; and
>
>   2. get a wc of Alice's repository into one of the codepaths that
>      assume SHA-1 is one-to-one / collision-free (currently that's the
>      ra_serf optimization and the 1.15 wc status).

Not only.  There are cases when the working copy itself installs the working
file with a hash lookup in the pristine store.  This is more true for 1.14
than trunk, because in trunk we have the streamy checkout/update that avoids
such lookups by writing straight to the working file.  However, some of
the code paths still install the contents from the pristine store by hash.
Examples include reverting a file, copying an unmodified file, switching
a file with keywords, the mentioned ra_serf optimization, etc.

>   Now, step #1 seems plausible enough.  As to step #2, it's not clear to
>   me how file B would reach the wc in step #2…

If Mallory has write access, she could commit both files, thus arranging for
a possible content change if both files are checked out to a single working
copy.  This isn't the same as just directly modifying the target file, because
file content isn't expected to change due to changes in other files (that can
be of any type), so this attack has a much better chance of going unnoticed.

If Mallory doesn't have write access, there should be other vectors, such
as distributing a pair of files (harmless in the context of their respective
file formats) separately via two upstream channels.  Then, if both of the
upstream distributions are committed into a repository and their files are
checked out together, the content will change, allowing for a malicious
action.


Regards,
Evgeny Kotkov


Glossary of attacks (was: Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format)

2023-01-26 Thread Daniel Shahaf
Definitions of attacks:

1. Collision attack:
   Given h(),
   find x₁, x₂ such that h(x₁) == h(x₂).

2. Second preimage attack:
   Given h() and x,
   find x′ such that h(x) == h(x′).

3. First preimage attack:
   Given h() and y,
   find x such that h(x) == y.

4. Chosen prefix attack:
   Given h(), p₁, and p₂,
   find m₁, m₂ such that h(m₁) == h(m₂) and m₁.startswith(p₁) and
   m₂.startswith(p₂).
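
The definitions above can be made concrete with a toy hash.  The following is
a minimal sketch, not an attack on SHA-1: it assumes a deliberately weak
stand-in, `toy_hash` (SHA-1 truncated to two bytes), so that brute force
finishes quickly.  All function names here are hypothetical and chosen only
for illustration.

```python
import hashlib
from itertools import count


def toy_hash(data: bytes, nbytes: int = 2) -> bytes:
    """SHA-1 truncated to nbytes bytes; weak on purpose so searches finish fast."""
    return hashlib.sha1(data).digest()[:nbytes]


def find_collision():
    """Collision attack: find any x1 != x2 with h(x1) == h(x2)."""
    seen = {}  # toy_hash(x) -> x, for every x tried so far
    for i in count():
        x = str(i).encode()
        h = toy_hash(x)
        if h in seen:
            return seen[h], x
        seen[h] = x


def find_second_preimage(x: bytes) -> bytes:
    """Second preimage attack: given x, find x2 != x with h(x2) == h(x)."""
    target = toy_hash(x)
    for i in count():
        candidate = b"candidate-" + str(i).encode()
        if candidate != x and toy_hash(candidate) == target:
            return candidate


def find_chosen_prefix_collision(p1: bytes, p2: bytes):
    """Chosen-prefix attack: find colliding m1, m2 that start with p1, p2."""
    seen = {}  # toy_hash(m1) -> m1, for every m1 tried so far
    for i in count():
        suffix = str(i).encode()
        m1, m2 = p1 + suffix, p2 + suffix
        seen[toy_hash(m1)] = m1
        match = seen.get(toy_hash(m2))
        if match is not None and match != m2:
            return match, m2
```

With a two-byte output the collision search needs only a few hundred tries
(the birthday bound), while the second preimage search needs on the order of
65536.  That gap is the reason a second preimage attack is considered much
harder than a collision attack on a full-size hash.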

Daniel Shahaf wrote on Thu, Jan 26, 2023 at 09:33:59 +:
> Evgeny Kotkov via dev wrote on Mon, Jan 23, 2023 at 02:28:50 +0300:
> > However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's
> > probably only a matter of time until the situation becomes worse.
> > 
> 
> Quoting the third hunk of 
> :
> 
> What's the acceptance test we use for candidate checksum algorithms?
> 
> You say we should switch to a checksum algorithm that doesn't have known
> collisions, but, why should we require that?  Consider the following
> 160-bit checksum algorithm:
> .
> 1. If the input consists of 40 ASCII lowercase hex digits and
>    nothing else, return the input.
> 2. Else, return the SHA-1 of the input.
> 
> This algorithm has a trivial first preimage attack.  If a wc used this
> identity-then-sha1 algorithm instead of SHA-1, then… what?
> 
> > That could happen after a public disclosure of a pair of executable
> > files/scripts where the forged version allows for remote code execution.
> > Or maybe something similar with a file format that is often stored in
> > repositories and that can be executed or used by a build script, etc.
> > 
> 
> Err, hang on.  Your reference described a chosen-prefix attack, while
> this scenario concerns a single public collision.  These are two
> different things.
> 
> Disclosure of a pair of executable files/scripts isn't by itself
> a problem unless one of the pair ("file A") is in a repository
> somewhere.  Now, was the colliding file ("file B") generated _before_ or
> _after_ file A was committed?
> 
> - If _before_, then it would seem Mallory had somehow managed to:
> 
>   1. get a file of his choosing committed to Alice's repository; and
> 
>   2. get a wc of Alice's repository into one of the codepaths that
>      assume SHA-1 is one-to-one / collision-free (currently that's the
>      ra_serf optimization and the 1.15 wc status).
> 
>   Now, step #1 seems plausible enough.  As to step #2, it's not clear to
>   me how file B would reach the wc in step #2… but insofar as security
>   assumptions go, it seems reasonable to assume Mallory can make this
>   happen.
> 
>   So, I agree it's a scenario we should address.  What options do we
>   have to address it?  (I grant that migrating away from SHA-1 is one
>   option.)
> 
> - If _after_, then you're presuming not simply a collision attack but
>   a second preimage attack.  Should we assume Mallory to be able to
>   mount a second preimage attack?
> 
> Chosen-prefix collision attacks can help Mallory in a variant of the
> "before" case: Mallory computes a collision, sends file A to Alice (who
> commits it), and invokes his assumed ability to inject file B into
> Alice's wc.  This would work for file formats that ignore the unchosen
> suffix.


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-26 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Mon, Jan 23, 2023 at 02:28:50 +0300:
> Daniel Shahaf  writes:
> 
> > > I can complete the work on this branch and bring it to a production-ready
> > > state, assuming there are no objections.
> >
> > Your assumption is counterfactual:
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> 
> I don't see any explicit objections in these two emails (here I assume that
> if something is not clear to a PMC member, it doesn't automatically become
> an objection).  If the "why?" question is indeed an objection, then I would
> say it has already been discussed and responded to in the thread.
> 

The "Why?" was sent _after_ the post you're quoting, and in any case was
just an elevator pitch summary of something I had explained more verbosely.

The first post in this thread asserts X is a problem and Y is a solution
to it, and argues that Y is a good thing.  However, that post does not
explain /why/ X is a problem, does not consider alternatives to Y, and
does not consider possible cons of Y.  That's what's missing.

> Now, returning to the problem:
> 
> As described in the advisory [1], we have a supported configuration that
> makes data forgery possible:
> 
> - A repository with disabled rep-sharing allows storing different files with
>   colliding SHA-1 values.
> - Having a repository with disabled rep-sharing is a supported configuration.
>   There may be a certain number of such repositories in the wild
>   (for example, created with SVN < 1.6 and not upgraded afterwards).
> - A working copy uses an assumption that the pristine contents are equal if
>   their SHA-1 hashes are equal.
> - So committing different files with colliding SHA-1 values makes it possible
>   to forge the contents of a file that will be checked-out and used by the
>   client.
> 
> I would say that this state is worrying just by itself.
> 

I assume this situation could happen accidentally, say, if someone adds
shattered-1.pdf and shattered-2.pdf to the same wc in a particular way.
That is, I'm not assuming "forgery" (which implies Mallory is involved).

Still, this is a potential data integrity issue with the new-in-1.15 wc
format, so we should address it before the release.  What are our
options to address that?  Switching to another checksum is an option,
yes, but we [as in, dev@] don't seem to have considered any alternatives
to that.

Just off the top of my head, we could:

- Encourage or require use of rep-sharing
  [the advisory already recommends this]

- Encourage or require use of tools/hook-scripts/reject-detected-sha1-collisions.sh
  [the advisory already recommends this]

- Have f32 wc's refuse to talk to servers that don't detect SHA-1
  collisions.  (1.15 users will still be able to interoperate with old
  servers by using f31.)

And there may be more options.  (Lurkers are invited to speak up!)

> However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's
> probably only a matter of time until the situation becomes worse.
> 

Quoting the third hunk of 
:

What's the acceptance test we use for candidate checksum algorithms?

You say we should switch to a checksum algorithm that doesn't have known
collisions, but, why should we require that?  Consider the following
160-bit checksum algorithm:
.
1. If the input consists of 40 ASCII lowercase hex digits and
   nothing else, return the input.
2. Else, return the SHA-1 of the input.

This algorithm has a trivial first preimage attack.  If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
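
For reference, the two-step algorithm above can be written out directly.
This is a minimal sketch; the names `identity_then_sha1` and `first_preimage`
are made up for illustration, and the checksum is represented as a 40-digit
hex string.

```python
import hashlib
import string

HEX_DIGITS = set(string.digits + "abcdef")


def identity_then_sha1(data: bytes) -> str:
    """Return the input itself if it is 40 lowercase hex digits, else its SHA-1."""
    if len(data) == 40 and all(chr(b) in HEX_DIGITS for b in data):
        return data.decode("ascii")
    return hashlib.sha1(data).hexdigest()


def first_preimage(y: str) -> bytes:
    """Trivial first preimage: any 40-hex-digit output y is its own preimage."""
    return y.encode("ascii")
```

For example, with `y = hashlib.sha1(b"hello").hexdigest()`, both `b"hello"`
and `first_preimage(y)` map to `y`, so the same trick also yields a collision
between any ordinary input and the hex form of its own digest.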

> That could happen after a public disclosure of a pair of executable
> files/scripts where the forged version allows for remote code execution.
> Or maybe something similar with a file format that is often stored in
> repositories and that can be executed or used by a build script, etc.
> 

Err, hang on.  Your reference described a chosen-prefix attack, while
this scenario concerns a single public collision.  These are two
different things.

Disclosure of a pair of executable files/scripts isn't by itself
a problem unless one of the pair ("file A") is in a repository
somewhere.  Now, was the colliding file ("file B") generated _before_ or
_after_ file A was committed?

- If _before_, then it would seem Mallory had somehow managed to:

  1. get a file of his choosing committed to Alice's repository; and

  2. get a wc of Alice's repository into one of the codepaths that
     assume SHA-1 is one-to-one / collision-free (currently that's the
     ra_serf optimization and the 1.15 wc status).

  Now, step #1 seems 

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Evgeny Kotkov via dev
Daniel Shahaf  writes:

> > I can complete the work on this branch and bring it to a production-ready
> > state, assuming there are no objections.
>
> Your assumption is counterfactual:
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E

I don't see any explicit objections in these two emails (here I assume that
if something is not clear to a PMC member, it doesn't automatically become
an objection).  If the "why?" question is indeed an objection, then I would
say it has already been discussed and responded to in the thread.

Now, returning to the problem:

As described in the advisory [1], we have a supported configuration that
makes data forgery possible:

- A repository with disabled rep-sharing allows storing different files with
  colliding SHA-1 values.
- Having a repository with disabled rep-sharing is a supported configuration.
  There may be a certain number of such repositories in the wild
  (for example, created with SVN < 1.6 and not upgraded afterwards).
- A working copy uses an assumption that the pristine contents are equal if
  their SHA-1 hashes are equal.
- So committing different files with colliding SHA-1 values makes it possible
  to forge the contents of a file that will be checked-out and used by the
  client.

I would say that this state is worrying just by itself.
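
The chain of points above can be illustrated with a toy model.  This is only
a sketch under stated assumptions: `PristineStore` is a hypothetical
stand-in, not the libsvn_wc pristine store, and `weak_hash` (the content
length) stands in for a checksum with known collisions so that the effect is
reproducible without real SHA-1 collision files.

```python
import hashlib


def weak_hash(data: bytes) -> str:
    """Deliberately weak stand-in checksum: equal-length inputs collide."""
    return str(len(data))


def strong_hash(data: bytes) -> str:
    """A checksum without known collisions, for comparison."""
    return hashlib.sha256(data).hexdigest()


class PristineStore:
    """Toy content store that names each blob by its checksum."""

    def __init__(self, hash_func):
        self._hash = hash_func
        self._blobs = {}

    def add(self, content: bytes) -> str:
        key = self._hash(content)
        # Assumption being modelled: equal checksums imply equal contents,
        # so an existing entry is neither compared nor overwritten.
        self._blobs.setdefault(key, content)
        return key

    def read(self, key: str) -> bytes:
        return self._blobs[key]


file_a = b"exit 0\n"   # the file everyone reviewed
file_b = b"exit 1\n"   # a different file with the same weak checksum

store = PristineStore(weak_hash)
key_a = store.add(file_a)
key_b = store.add(file_b)
installed_b = store.read(key_b)   # contents of file_a, not file_b
```

With `weak_hash`, `installed_b` equals `file_a`: the second file is silently
replaced by the first.  Constructing the store with `strong_hash` instead
keeps the two files apart.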

However, with the feasibility of chosen-prefix attacks on SHA-1 [2], it's
probably only a matter of time until the situation becomes worse.

That could happen after a public disclosure of a pair of executable
files/scripts where the forged version allows for remote code execution.
Or maybe something similar with a file format that is often stored in
repositories and that can be executed or used by a build script, etc.

[1] https://subversion.apache.org/security/sha1-advisory.txt
[2] https://sha-mbles.github.io/


Speaking of the proposed switch to SHA-256 or a different checksum, there's
an argument by contradiction: if we were designing the pristineless working
copy from scratch today, would we choose SHA-1 as the best available hash
that can be used to assert content equality?  If yes, how can one prove that?

> Objections have been raised, been left unanswered, and now implementation
> work has commenced following the original design.  That's not acceptable.
> I'm vetoing the change until a non-rubber-stamp design discussion has
> been completed on the public dev@ list.

I would like to note that vetoing a code modification should be accompanied
by a technical justification, and I have certain doubts that the above
arguments qualify as such:

https://www.apache.org/foundation/voting.html
[[[
To prevent vetoes from being used capriciously, the voter must provide
with the veto a technical justification showing why the change is bad
(opens a security exposure, negatively affects performance, etc. ).
A veto without a justification is invalid and has no weight.
]]]

(I'm not saying that the above rules have to be used in this particular case
 and that a veto is invalid, but still thought it’s worth mentioning.)

Anyway, I'll stop working on the branch, because a veto has been cast.


Regards,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Nathan Hartman
Replying to multiple parts of this thread...

On Sat, Jan 21, 2023 at 12:58 PM Karl Fogel  wrote:
> *nod* This issue isn't important enough to me to continue the
> conversation -- I'd like for new hash algorithms to be possible,
> and I think Evgeny's work on it is worthwhile, but I don't feel
> nearly as strongly about this as I feel about making the new
> pristineless working copies available in an official release as
> soon as we can.

I think it's certainly worthwhile to explore the multi-hash feature,
and if it can be in 1.15, that's good too. But if it will take a while
to *hash* out the details (pun intended) then I'm okay with letting it
wait for a future release in the interest of getting the i525pod
feature out there, even though that means a (possible) future format
bump. (i525pod provides a substantial immediate benefit, while a format
bump isn't necessarily the end of the world).

Having said so, continuing to explore the multi-hash idea:

Previously, I wrote: "Since the premise of this feature is to support
adding new hash types without bumping wc formats, it follows that any
new hash type will create compatibility problems for clients that
support f32 but not the specific new hash type. In light of that, it
might just be better to bump the wc format and then you know at the
outset that you need to upgrade your client. Just thinking out loud
here but this might be (partly) mitigated by trying to guess which hash
types we might want in the future and supporting them now, even if no
existing client will actually use them, but I don't really like this
idea."

I didn't like my own idea at the time, but the following got me
thinking:

On Sun, Jan 22, 2023 at 7:41 AM Daniel Shahaf  wrote:
> The server is aware of what algorithm the wc uses on the wire, which is
> SHA-1 in ra_serf's download optimization and MD5 in
> svn_delta_editor_t::apply_textdelta() and svn_delta_editor_t::close_file().
> However, the algorithm(s) used by the wc for naming pristines and, in f32,
> for detecting local mods are implementation details of the wc.
>
> So, suppose the wc didn't hardcode _any particular_ hash function for
> naming pristines and for status walks — not md5, not sha1, not sha256 —
> but had each «svn checkout» run pick a hash function uniformly at random
> out of a large enough family of hash functions[1].  (Intuitively, think
> of a family of hash functions as a hash function with a random salt,
> similar to [2].)
>
> This way, even if someone tried to deliberately create a collision, they
> wouldn't be able to pick a collision "off the shelf", as with
> shattered.io; they'd need to compute a collision for the specific hash
> function ("salt") used by that particular wc.  That's more difficult than
> creating a collision in a well-known hash function, regardless of
> whether we treat the salt's value as a secret of the wc (as in, stored
> in a mode-0400 file under the .svn directory and not disclosed to the
> server) or as a value the attacker is assumed to know.
>
> So, that's one way to address the scenario kfogel described.

Suppose the wc is made to support multiple hash types, support is added
now for "many" hash types (leaving open the question of "how many" and
which ones for now), and at checkout time, one is chosen, either "at
random" as suggested by danielsh, or, say, by some explicit user
option.

Suppose also that there is a possibility for the user to blacklist some
hash types which the user does not want used at all.

Now, if a specific hash type is later cracked (in the shattered.io
sense), the security fix on SVN's end is to add that hash type to the
default blacklist of hash types. It would still be supported, but new
working copies wouldn't choose it. In the advisory for said fix, we'd
document a workaround for users who can't/won't upgrade: the steps
users can take to blacklist the affected hash types on their systems,
in effect getting the same outcome as upgrading.

One caveat: In either case (whether the user upgrades or applies the
workaround), they'd have to check out new working copies (or maybe run
some invocation of 'svn upgrade') or the existing hashes won't be
changed.
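
A minimal sketch of the selection step described above, assuming the hash
type is chosen once at checkout time.  The names `SUPPORTED`,
`DEFAULT_BLACKLIST` and `pick_checksum_kind` are hypothetical, and the list
of supported types is only an example, not a proposal.

```python
import hashlib
import secrets

# Example set of hash types a client might support (illustrative only).
SUPPORTED = ("sha1", "sha256", "sha512")

# Hash types that new working copies should no longer choose by default.
DEFAULT_BLACKLIST = frozenset({"sha1"})


def pick_checksum_kind(supported=SUPPORTED, blacklist=DEFAULT_BLACKLIST,
                       preferred=None):
    """Pick the hash type for a new working copy.

    Blacklisted types stay readable by the client but are never chosen.
    An explicit user preference wins, unless it is blacklisted.
    """
    allowed = [kind for kind in supported if kind not in blacklist]
    if not allowed:
        raise ValueError("every supported hash type is blacklisted")
    if preferred is not None:
        if preferred not in allowed:
            raise ValueError("hash type %r is not allowed" % preferred)
        return preferred
    return secrets.choice(allowed)


def checksum(kind: str, data: bytes) -> str:
    """Compute the checksum of data with the chosen hash type."""
    return hashlib.new(kind, data).hexdigest()
```

In this model the workaround for users who cannot upgrade amounts to passing
a larger `blacklist`; existing working copies keep whatever type they were
created with, which is the caveat noted above.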

And there's also this:

On Sat, Jan 21, 2023 at 5:25 AM Daniel Shahaf  wrote:
> For example, if we used another checksum algorithm, the attacker from
> your scenario might opt to edit the base checksums in .svn/wc.db and
> rename the .svn/pristine/ files accordingly.  That's much easier to pull
> off, and will be easy to adapt if we change the algorithm again, but on
> the other hand, requires write access to the .svn directory and is
> easier to discover.

Yup. Once an attacker has write access to the .svn contents, all bets
are off anyway.

Cheers,
Nathan


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Daniel Shahaf
[ tl;dr: See last paragraph for a concrete question about ra_serf. ]

Karl Fogel wrote on Fri, 20 Jan 2023 17:18 +00:00:
> Yes.  A hash is considered "broken" the moment security researchers 
> can generate a collision.

Consider the following uses of hash functions in our code:

- FSFS rep-cache uses SHA-1.

- The ra_serf download optimization uses SHA-1.

- The commit editor uses MD5 in apply_textdelta() and close_file().

The first one is fine, because FSFS rejects collisions in new commits
(as pointed out upthread).

The second one is not necessarily fine: a variation of the attack you (kfogel)
described could make a client wrongly trigger the optimization and end
up with the wrong fulltext.

The third one is fine, because the delta and its resulting fulltext's
checksum don't travel separately.

So, there you have it: a use of SHA-1 which can stay as-is, a use of SHA-1
which may need attention, and a use of MD5 which can stay as-is — all
in the same codebase.

Thus, whether a hash function is "broken" or not depends on the context
in which it is used.



To be clear, the ra_serf thing which "may need attention" is the use
of «final_sha1_checksum» in subversion/libsvn_ra_serf/update.c.  That's
a place where we assume SHA-1 is one-to-one.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Daniel Shahaf
[See below a proposal that libsvn_wc not use any fixed hash function.]

Martin Edgar Furter Rathod wrote on Sat, 21 Jan 2023 05:22 +00:00:
> On 20.01.23 22:48, Karl Fogel wrote:
>> On 20 Jan 2023, Nathan Hartman wrote:
>>> We already can't store files with identical SHA1 hashes, but AFAIK the
>>> only meaningful impact we've ever heard is that security researchers
>>> cannot track files they generate with deliberate collisions. The same
>>> would be true with any hash type, for collisions within that hash
>>> type.
>> 
>> Yes.  A hash is considered "broken" the moment security researchers can 
>> generate a collision.
>
> No matter what hash function you choose now, sooner or later it will be 
> broken.
>
> But a broken hash function can still be good enough for use in tools 
> like subversion if it is used correctly. Instead of just storing the 
> hash value subversion should also store a sequence number. Whenever a 
> collision happens subversion has to compare the two (or more) files 
> which have the same hash value.

So, basically, just do what the implementation of hashes (the data
structure mapping keys to values) does?

I think this would work in most of our uses of checksums, and make it
possible to have collisions in both the repository and the wc.
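
As a rough illustration of the hash-plus-sequence-number idea, here is a
minimal sketch of a store that tolerates collisions by comparing full
contents within a bucket, the way a hash table does.  `CollisionTolerantStore`
and `weak_hash` are hypothetical names; `weak_hash` is deliberately weak so
that collisions actually occur in the example.

```python
def weak_hash(data: bytes) -> str:
    """Deliberately weak stand-in checksum: equal-length inputs collide."""
    return str(len(data))


class CollisionTolerantStore:
    """Toy content store keyed by (checksum, sequence number)."""

    def __init__(self, hash_func):
        self._hash = hash_func
        self._buckets = {}  # checksum -> list of contents; index = sequence number

    def add(self, content: bytes):
        h = self._hash(content)
        bucket = self._buckets.setdefault(h, [])
        # On a checksum match, compare the full contents.
        for seq, existing in enumerate(bucket):
            if existing == content:
                return (h, seq)      # identical file: reuse the old pair
        bucket.append(content)       # colliding file: new sequence number
        return (h, len(bucket) - 1)

    def read(self, key):
        h, seq = key
        return self._buckets[h][seq]


store = CollisionTolerantStore(weak_hash)
key_a = store.add(b"exit 0\n")
key_b = store.add(b"exit 1\n")       # same checksum, different contents
key_a_again = store.add(b"exit 0\n")
```

Here `key_a` and `key_b` share a checksum but differ in sequence number, and
re-adding the first file returns `key_a` again.  Note that `add` needs the
stored contents in order to compare them.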

However, what about running `svn status` when there's an unhydrated file
that has been modified in a way that changes the fulltext but doesn't
change the checksum value?  In this case the BASE fulltext isn't
available locally to compare with.



I think there is actually something we can do about this: stop
hardcoding any particular hash function in libsvn_wc's internals.

The server is aware of what algorithm the wc uses on the wire, which is
SHA-1 in ra_serf's download optimization and MD5 in
svn_delta_editor_t::apply_textdelta() and svn_delta_editor_t::close_file().
However, the algorithm(s) used by the wc for naming pristines and, in f32,
for detecting local mods are implementation details of the wc.

So, suppose the wc didn't hardcode _any particular_ hash function for
naming pristines and for status walks — not md5, not sha1, not sha256 —
but had each «svn checkout» run pick a hash function uniformly at random
out of a large enough family of hash functions[1].  (Intuitively, think
of a family of hash functions as a hash function with a random salt,
similar to [2].)

This way, even if someone tried to deliberately create a collision, they
wouldn't be able to pick a collision "off the shelf", as with
shattered.io; they'd need to compute a collision for the specific hash
function ("salt") used by that particular wc.  That's more difficult than
creating a collision in a well-known hash function, regardless of
whether we treat the salt's value as a secret of the wc (as in, stored
in a mode-0400 file under the .svn directory and not disclosed to the
server) or as a value the attacker is assumed to know.

So, that's one way to address the scenario kfogel described.
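
A minimal sketch of what a per-working-copy hash function could look like.
It assumes HMAC-SHA-256 as the keyed family, which is a choice made for this
example and not something proposed in this thread, and it leaves out how the
salt would be stored under .svn.  `WorkingCopyHasher` is a hypothetical name.

```python
import hashlib
import hmac
import os


class WorkingCopyHasher:
    """One member of a keyed hash family, selected by a per-wc random salt."""

    def __init__(self, salt: bytes = None):
        # A fresh random salt is drawn once, e.g. at checkout time.
        self.salt = os.urandom(32) if salt is None else salt

    def checksum(self, content: bytes) -> str:
        # HMAC keys the hash with the salt, so a collision prepared in
        # advance for the plain, unkeyed hash is of no direct use here.
        return hmac.new(self.salt, content, hashlib.sha256).hexdigest()


wc_a = WorkingCopyHasher(salt=b"a" * 32)
wc_b = WorkingCopyHasher(salt=b"b" * 32)
same_wc_twice = wc_a.checksum(b"data") == wc_a.checksum(b"data")
across_wcs = wc_a.checksum(b"data") == wc_b.checksum(b"data")
```

Within one working copy the checksum is stable (`same_wc_twice` is true),
while two working copies with different salts name the same content
differently (`across_wcs` is false), so checksums of this kind would only be
meaningful inside the working copy that produced them.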

Thanks for speaking up, Martin.

Daniel

[1] I'm not making this term up; see, for instance, page 143 of
https://cseweb.ucsd.edu/~mihir/papers/gb.pdf.  "풦" is keyspace,
"D" is domain, "R" is range.  A random element K ∈ 풦 is chosen and the
hash function H_K [aka H with currying of the first parameter] is
used thereafter.

[2]
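import hashlib
import os
def f(foo):
    # Salted SHA-1: the salt selects one member of the hash family.
    return hashlib.sha1((str(foo) + f.salt).encode()).hexdigest()
f.salt = os.urandom(16).hex()  # stand-in for random_thing()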

> If the files are identical the old 
> hash+number pair is stored. If they differ the new file gets a new 
> sequence number and that hash+number pair is stored. Since collisions 
> almost never happen even if md5 is used the performance penalty will be 
> almost zero.
>
> The same thing has been discussed earlier and changing the hash function 
> will just solve the problem for a few years...
>
> Best regards,
> Martin


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-22 Thread Daniel Shahaf
To be clear, I wasn't vetoing changing the hash algorithm.  I was
vetoing making a change without discussion.  If there is discussion and
it results in consensus to change the algorithm, that'll be absolutely
fine by me.

Daniel

Karl Fogel wrote on Sat, 21 Jan 2023 17:58 +00:00:
> *nod* This issue isn't important enough to me to continue the 
> conversation -- I'd like for new hash algorithms to be possible, 
> and I think Evgeny's work on it is worthwhile, but I don't feel 
> nearly as strongly about this as I feel about making the new 
> pristineless working copies available in an official release as 
> soon as we can.
>
> Best regards,
> -Karl
>
> On 21 Jan 2023, Daniel Shahaf wrote:
>>Karl Fogel wrote on Fri, Jan 20, 2023 at 11:09:11 -0600:
>>> On 20 Jan 2023, Daniel Shahaf wrote:
>>> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
>>> > > I can complete the work on this branch and bring it to a
>>> > > production-ready state, assuming there are no objections.
>>> > 
>>> > Your assumption is counterfactual:
>>> > 
>>> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>>> > 
>>> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>>> > 
>>> > Objections have been raised, been left unanswered, and now
>>> > implementation work has commenced following the original design.
>>> > That's not acceptable.
>>> 
>>> I'm a little surprised by your reaction.
>>> 
>>> It is never "not acceptable" for someone to do implementation work on a
>>> branch while a discussion is happening, even if that discussion contains
>>> objections to or questions about the premise of the branch work.
>>> 
>>> It's a branch.  He didn't merge it to trunk, and he posted it as an explicit
>>> invitation for discussion.
>>> 
>>
>>I didn't object to the use of a branch /per se/.  I objected to the
>>treating of objections that *had already been posted* as though they had
>>never been posted.  *That's* not acceptable.
>>
>>However, since you ask, I don't think implementing a proposal on
>>a branch is necessarily a good idea:
>>
>>- If the branch is seen and presented as a PoC for furthering discussion
>>  and for discovering practical considerations (e.g., that
>>  PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
>>  or the ra_serf sha1 optimization that anyone implementing the branch
>>  would run into), it's likely a good thing.
>>
>>- On the other hand, when the branch implements the original proposal,
>>  whilst outstanding questions were not only not answered but also not
>>  acknowledged, that's quite another thing.  It can result in:
>>
>>  + The branch maintainer being biased in favour of the approach they
>>    have implemented.  (People tend not to argue against what they have
>>    expended resources on.  Cf. plan continuation bias, sunk cost
>>    fallacy.)
>>
>>  + dev@ being biased towards the approach that has been implemented
>>    (because it's a known entity; because no one is volunteering to
>>    implement another approach; because there's a desire to cut
>>    a minor release soon…).  This, in turn, can result in…
>>
>>  + …an incentive for participants *not* to hold open design
>>    discussions on dev@ in the first place.
>>
>>> > I'm vetoing the change until a non-rubber-stamp design
>>> > discussion has been completed on the public dev@ list.
>>> 
>>> Starting an implementation on a branch is a valuable contribution to a
>>> design discussion -- it's exactly the kind of "non-rubber-stamp"
>>> contribution one would want.
>>> 
>>
>>You're just repeating what you said above.
>>
>>> If you want to re-iterate points you've made that have been left unanswered,
>>> that would be a useful contribution -- perhaps some of those points will be
>>> updated now that there's actual code, or perhaps they won't.  Either way,
>>> what Evgeny is doing here seems very constructive to me, and entirely within
>>> the normal range of how we do things.
>>
>>Posting a paragraph such as the one I'm replying to is not "entirely
>>within the normal range of how we do things".  As to my points, see
>>.
>>They boil down to this:
>>
>> We should migrate away from SHA-1.
>> Why?
>>
>>Daniel
>>
>>> Best regards,
>>> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-21 Thread Karl Fogel
*nod* This issue isn't important enough to me to continue the 
conversation -- I'd like for new hash algorithms to be possible, 
and I think Evgeny's work on it is worthwhile, but I don't feel 
nearly as strongly about this as I feel about making the new 
pristineless working copies available in an official release as 
soon as we can.


Best regards,
-Karl

On 21 Jan 2023, Daniel Shahaf wrote:

> Karl Fogel wrote on Fri, Jan 20, 2023 at 11:09:11 -0600:
> > On 20 Jan 2023, Daniel Shahaf wrote:
> > > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > > > I can complete the work on this branch and bring it to a
> > > > production-ready state, assuming there are no objections.
> > >
> > > Your assumption is counterfactual:
> > >
> > > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> > >
> > > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> > >
> > > Objections have been raised, been left unanswered, and now
> > > implementation work has commenced following the original design. That's
> > > not acceptable.
> >
> > I'm a little surprised by your reaction.
> >
> > It is never "not acceptable" for someone to do implementation work on a
> > branch while a discussion is happening, even if that discussion contains
> > objections to or questions about the premise of the branch work.
> >
> > It's a branch.  He didn't merge it to trunk, and he posted it as an explicit
> > invitation for discussion.
>
> I didn't object to the use of a branch /per se/.  I objected to the
> treating of objections that *had already been posted* as though they had
> never been posted.  *That's* not acceptable.
>
> However, since you ask, I don't think implementing a proposal on
> a branch is necessarily a good idea:
>
> - If the branch is seen and presented as a PoC for furthering discussion
>   and for discovering practical considerations (e.g., that
>   PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
>   or the ra_serf sha1 optimization that anyone implementing the branch
>   would run into), it's likely a good thing.
>
> - On the other hand, when the branch implements the original proposal,
>   whilst outstanding questions were not only not answered but also not
>   acknowledged, that's quite another thing.  It can result in:
>
>   + The branch maintainer being biased in favour of the approach they
>     have implemented.  (People tend not to argue against what they have
>     expended resources on.  Cf. plan continuation bias, sunk cost
>     fallacy.)
>
>   + dev@ being biased towards the approach that has been implemented
>     (because it's a known entity; because no one is volunteering to
>     implement another approach; because there's a desire to cut
>     a minor release soon…).  This, in turn, can result in…
>
>   + …an incentive for participants *not* to hold open design
>     discussions on dev@ in the first place.
>
> > > I'm vetoing the change until a non-rubber-stamp design
> > > discussion has been completed on the public dev@ list.
> >
> > Starting an implementation on a branch is a valuable contribution to a
> > design discussion -- it's exactly the kind of "non-rubber-stamp"
> > contribution one would want.
>
> You're just repeating what you said above.
>
> > If you want to re-iterate points you've made that have been left unanswered,
> > that would be a useful contribution -- perhaps some of those points will be
> > updated now that there's actual code, or perhaps they won't.  Either way,
> > what Evgeny is doing here seems very constructive to me, and entirely within
> > the normal range of how we do things.
>
> Posting a paragraph such as the one I'm replying to is not "entirely
> within the normal range of how we do things".  As to my points, see
> .
> They boil down to this:
>
>  We should migrate away from SHA-1.
>  Why?
>
> Daniel
>
> > Best regards,
> > -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-21 Thread Daniel Shahaf
Karl Fogel wrote on Fri, Jan 20, 2023 at 11:18:56 -0600:
> On 20 Jan 2023, Nathan Hartman wrote:
> > Taking a step back, this discussion started because pristine-free WCs
> > are IIUC more dependent on comparing hashes than pristineful WCs, and
> > therefore a hash collision could have more impact in a pristine-free
> > WC. "Guarantees" were mentioned, but I think it's important to state
> > that there's only a guarantee of probability, since as mentioned above
> > all hashes will have collisions.
> 
> Sure, in a literal mathematical sense, but not in a sense that matters for
> our purposes here.
> 
> In the absence of an intentionally caused collision, a good hash function
> has *far* less chance of accidental collision than, say, the chance that
> your CPU will malfunction due to a stray cosmic ray, or the chance of us
> getting hit by a planet-destroying meteorite tomorrow.
> 
> For our purposes, "guarantee" is accurate.  No guarantee we make can be
> stronger than the inverse probability of a CPU/memory malfunction anyway.
> 

The probability of an accidental collision in a "good" N-bit hash
function is on the order of 1/√(2^N), which for sufficiently large N is
considered an acceptable risk.  That's invariant over time; however,
intentionally causing collisions becomes easier over time.
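
For a rough sense of the scale involved, the standard birthday
approximation can be evaluated directly.  This is only a sketch: it
assumes an ideal N-bit hash with uniformly distributed outputs, says
nothing about deliberate attacks, and the function name is made up for
illustration.

```python
import math

def accidental_collision_probability(num_items: int, hash_bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-k*(k-1) / 2^(N+1)).

    Assumes an ideal hash whose outputs are uniformly distributed.
    """
    pairs_over_space = num_items * (num_items - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-pairs_over_space)

# A billion distinct files under an ideal 160-bit hash.
print(accidental_collision_probability(10**9, 160))
# Around sqrt(2^N) = 2^80 items the probability stops being negligible.
print(accidental_collision_probability(2**80, 160))
```

The first figure depends only on N and the number of items, which is the
part that does not change over time.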

> > We already can't store files with identical SHA1 hashes, but AFAIK the
> > only meaningful impact we've ever heard is that security researchers
> > cannot track files they generate with deliberate collisions. The same
> > would be true with any hash type, for collisions within that hash
> > type.
> 
> Yes.  A hash is considered "broken" the moment security researchers can
> generate a collision.
> 

To be clear, is this what you're saying? —
.
Premise: There is a collision attack against SHA-1.
Conclusion: Subversion should stop using SHA-1.

This conclusion does not follow from this premise.  For instance, FSFS
checks for collisions, so it can actually use "File length in bytes" as
a checksum and everything would work; the only thing that would change
is that it would not be possible to commit a file that's the same
expanded_size as any other node-rev (including directories).
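
The shape of that argument can be sketched with a toy store that keys
content by a deliberately weak "checksum" and compares the actual bytes
whenever two keys match.  This is not FSFS code; the class and method
names are hypothetical, and the only point is that a collision check
turns a weak key into a refused write rather than silent aliasing.

```python
class CollisionCheckedStore:
    """Toy content store keyed by a deliberately weak 'checksum'.

    Hypothetical illustration only; this is not how FSFS is implemented.
    """

    def __init__(self, checksum=len):
        self._checksum = checksum   # any function bytes -> hashable key
        self._by_key = {}

    def put(self, content: bytes):
        key = self._checksum(content)
        existing = self._by_key.get(key)
        if existing is None:
            self._by_key[key] = content
            return key
        if existing == content:
            return key              # genuine duplicate: share the stored copy
        # Same key, different bytes: refuse rather than silently alias.
        raise ValueError(f"checksum collision on key {key!r}")

    def get(self, key) -> bytes:
        return self._by_key[key]


store = CollisionCheckedStore()     # "file length in bytes" as the checksum
store.put(b"abc")
store.put(b"abc")                   # accepted: identical content
try:
    store.put(b"xyz")               # same length, different content
except ValueError as err:
    print(err)
```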

And, anyway, the burden is not on me to disprove your claim, but on
you to prove it.

> FWIW, in one of my previous posts, I described a real-life scenario in which
> the ability to generate a chosen-plaintext collision in an SVN working copy
> would have security implications.

Yes, and as I have already asked: What other counters to that attack,
besides migrating away from SHA-1, have you considered?  Have you
considered the downsides of migrating away from SHA-1?

Also, /if/ we changed checksums, would that address the attack?  Put
differently, why is a similar attack impossible if we change the
checksum algorithm?  Why is use of SHA-1 a /sine qua non/ of your
scenario?

For example, if we used another checksum algorithm, the attacker from
your scenario might opt to edit the base checksums in .svn/wc.db and
rename the .svn/pristine/ files accordingly.  That's much easier to pull
off, and will be easy to adapt if we change the algorithm again, but on
the other hand, requires write access to the .svn directory and is
easier to discover.

Daniel

> Best regards,
> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-21 Thread Daniel Shahaf
Karl Fogel wrote on Fri, Jan 20, 2023 at 11:09:11 -0600:
> On 20 Jan 2023, Daniel Shahaf wrote:
> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > > I can complete the work on this branch and bring it to a
> > > production-ready
> > > state, assuming there are no objections.
> > 
> > Your assumption is counterfactual:
> > 
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> > 
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> > 
> > Objections have been raised, been left unanswered, and now
> > implementation work has commenced following the original design. That's
> > not acceptable.
> 
> I'm a little surprised by your reaction.
> 
> It is never "not acceptable" for someone to do implementation work on a
> branch while a discussion is happening, even if that discussion contains
> objections to or questions about the premise of the branch work.
> 
> It's a branch.  He didn't merge it to trunk, and he posted it as an explicit
> invitation for discussion.
> 

I didn't object to the use of a branch /per se/.  I objected to the
treating of objections that *had already been posted* as though they had
never been posted.  *That's* not acceptable.

However, since you ask, I don't think implementing a proposal on
a branch is necessarily a good idea:

- If the branch is seen and presented as a PoC for furthering discussion
  and for discovering practical considerations (e.g., that
  PRISTINE.MD5_CHECKSUM docstring I found yesterday during discussion,
  or the ra_serf sha1 optimization that anyone implementing the branch
  would run into), it's likely a good thing.
  
- On the other hand, when the branch implements the original proposal,
  whilst outstanding questions were not only not answered but also not
  acknowledged, that's quite another thing.  It can result in:

  + The branch maintainer being biased in favour of the approach they
    have implemented.  (People tend not to argue against what they have
    expended resources on.  Cf. plan continuation bias, sunk cost
    fallacy.)

  + dev@ being biased towards the approach that has been implemented
    (because it's a known entity; because no one is volunteering to
    implement another approach; because there's a desire to cut
    a minor release soon…).  This, in turn, can result in…

  + …an incentive for participants *not* to hold open design
    discussions on dev@ in the first place.

> > I'm vetoing the change until a non-rubber-stamp design
> > discussion has been completed on the public dev@ list.
> 
> Starting an implementation on a branch is a valuable contribution to a
> design discussion -- it's exactly the kind of "non-rubber-stamp"
> contribution one would want.
> 

You're just repeating what you said above.

> If you want to re-iterate points you've made that have been left unanswered,
> that would be a useful contribution -- perhaps some of those points will be
> updated now that there's actual code, or perhaps they won't.  Either way,
> what Evgeny is doing here seems very constructive to me, and entirely within
> the normal range of how we do things.

Posting a paragraph such as the one I'm replying to is not "entirely
within the normal range of how we do things".  As to my points, see
.
They boil down to this:

 We should migrate away from SHA-1.
 Why?

Daniel

> Best regards,
> -Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Martin Edgar Furter Rathod




On 20.01.23 22:48, Karl Fogel wrote:
> On 20 Jan 2023, Nathan Hartman wrote:
> > We already can't store files with identical SHA1 hashes, but AFAIK the
> > only meaningful impact we've ever heard is that security researchers
> > cannot track files they generate with deliberate collisions. The same
> > would be true with any hash type, for collisions within that hash
> > type.
>
> Yes.  A hash is considered "broken" the moment security researchers can
> generate a collision.


No matter what hash function you choose now, sooner or later it will be 
broken.


But a broken hash function can still be good enough for use in tools 
like subversion if it is used correctly. Instead of just storing the 
hash value subversion should also store a sequence number. Whenever a 
collision happens subversion has to compare the two (or more) files 
which have the same hash value. If the files are identical the old 
hash+number pair is stored. If they differ the new file gets a new 
sequence number and that hash+number pair is stored. Since collisions 
almost never happen even if md5 is used the performance penalty will be 
almost zero.
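
A minimal sketch of the hash+number idea, assuming an in-memory store
and hypothetical names throughout (nothing here is taken from the
Subversion code base).  The digest function is injectable so that the
collision path can be exercised without a real colliding pair.

```python
import hashlib

class SequencedStore:
    """Toy store keyed by (digest, sequence number); hypothetical sketch."""

    def __init__(self, digest_fn=None):
        # Default digest is SHA-1; any bytes -> str function can be injected.
        self._digest = digest_fn or (lambda data: hashlib.sha1(data).hexdigest())
        self._buckets = {}   # digest -> list of contents; index = sequence number

    def put(self, content: bytes):
        digest = self._digest(content)
        bucket = self._buckets.setdefault(digest, [])
        for seq, existing in enumerate(bucket):
            if existing == content:      # full comparison only on a digest match
                return digest, seq       # identical file: reuse the old pair
        bucket.append(content)           # differing file: next sequence number
        return digest, len(bucket) - 1

    def get(self, digest: str, seq: int) -> bytes:
        return self._buckets[digest][seq]
```

With a real hash nearly every bucket holds one entry, so the byte
comparison runs only when the same content is stored again or when a
collision actually occurs.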


The same thing has been discussed earlier and changing the hash function 
will just solve the problem for a few years...


Best regards,
Martin


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Karl Fogel

On 20 Jan 2023, Nathan Hartman wrote:
> Taking a step back, this discussion started because pristine-free WCs
> are IIUC more dependent on comparing hashes than pristineful WCs, and
> therefore a hash collision could have more impact in a pristine-free
> WC. "Guarantees" were mentioned, but I think it's important to state
> that there's only a guarantee of probability, since as mentioned above
> all hashes will have collisions.

Sure, in a literal mathematical sense, but not in a sense that
matters for our purposes here.

In the absence of an intentionally caused collision, a good hash
function has *far* less chance of accidental collision than, say,
the chance that your CPU will malfunction due to a stray cosmic
ray, or the chance of us getting hit by a planet-destroying
meteorite tomorrow.

For our purposes, "guarantee" is accurate.  No guarantee we make
can be stronger than the inverse probability of a CPU/memory
malfunction anyway.

> We already can't store files with identical SHA1 hashes, but AFAIK the
> only meaningful impact we've ever heard is that security researchers
> cannot track files they generate with deliberate collisions. The same
> would be true with any hash type, for collisions within that hash
> type.

Yes.  A hash is considered "broken" the moment security researchers
can generate a collision.

FWIW, in one of my previous posts, I described a real-life
scenario in which the ability to generate a chosen-plaintext
collision in an SVN working copy would have security implications.


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Karl Fogel

On 20 Jan 2023, Daniel Shahaf wrote:

> Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > I can complete the work on this branch and bring it to a production-ready
> > state, assuming there are no objections.
>
> Your assumption is counterfactual:
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>
> Objections have been raised, been left unanswered, and now
> implementation work has commenced following the original design.  That's
> not acceptable.

I'm a little surprised by your reaction.

It is never "not acceptable" for someone to do implementation work
on a branch while a discussion is happening, even if that
discussion contains objections to or questions about the premise
of the branch work.

It's a branch.  He didn't merge it to trunk, and he posted it as
an explicit invitation for discussion.

> I'm vetoing the change until a non-rubber-stamp design
> discussion has been completed on the public dev@ list.

Starting an implementation on a branch is a valuable contribution
to a design discussion -- it's exactly the kind of
"non-rubber-stamp" contribution one would want.

If you want to re-iterate points you've made that have been left
unanswered, that would be a useful contribution -- perhaps some of
those points will be updated now that there's actual code, or
perhaps they won't.  Either way, what Evgeny is doing here seems
very constructive to me, and entirely within the normal range of
how we do things.


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Daniel Shahaf
Nathan Hartman wrote on Fri, 20 Jan 2023 14:51 +00:00:
> 1. Pros/cons of switching from SHA1 to another hash.
⋮
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility,

Actually, I think it's MD5, not SHA-1, that we have to support
indefinitely, since our uses of SHA-1 fall into two categories:

- Accompanied by MD5.  (wc.db PRISTINE table, FSFS node-rev headers,
  dumpfiles' Text-content-* headers)

- An optional optimization.  (ra_serf, rep-cache.db)

>  but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden,

Good point.  Then perhaps we should continue to record two checksums, as
both wc.db and FSFS do?  If we record, say, both «(svn_checksum_kind_t)42»
checksums and «(svn_checksum_kind_t)value_of_the_month» checksums, then
we'll only need to be able to upgrade from the former.
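
Recording two checksums need not mean reading the data twice.  A
sketch, assuming the content is available as a byte stream; "sha1" and
"sha256" are arbitrary stand-ins for the two kinds, and the function
name is hypothetical.

```python
import hashlib
import io

def dual_checksums(stream, kinds=("sha1", "sha256"), chunk_size=65536):
    """Compute several digests over one stream in a single pass."""
    hashers = {kind: hashlib.new(kind) for kind in kinds}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {kind: hasher.hexdigest() for kind, hasher in hashers.items()}

print(dual_checksums(io.BytesIO(b"hello\n")))
```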

>but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.

Cheers,

Daniel

P.S.  wc-metadata.sql implies that having MD5 collisions in a wc is supported:

  /* wc-metadata.sql -- schema used in the wc-metadata SQLite database
   * This is intended for use with SQLite 3
  ⋮
  CREATE TABLE PRISTINE (
    /* The SHA-1 checksum of the pristine text. This is a unique key. The
       SHA-1 checksum of a pristine text is assumed to be unique among all
       pristine texts referenced from this database. */
    checksum  TEXT NOT NULL PRIMARY KEY,

  ⋮
    /* Alternative MD5 checksum used for communicating with older
       repositories. Not strictly guaranteed to be unique among table rows. */
    md5_checksum  TEXT NOT NULL
  );

  CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);
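
The difference between the two columns can be checked against SQLite
directly.  This sketch reduces the table to the two columns quoted
above (the elided ones are left out) and uses placeholder strings
instead of real digests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRISTINE (
      checksum      TEXT NOT NULL PRIMARY KEY,
      md5_checksum  TEXT NOT NULL
    );
    CREATE INDEX I_PRISTINE_MD5 ON PRISTINE (md5_checksum);
""")

# Two rows sharing an MD5 value are accepted: the index is not UNIQUE.
conn.execute("INSERT INTO PRISTINE VALUES ('sha1-aaaa', 'md5-same')")
conn.execute("INSERT INTO PRISTINE VALUES ('sha1-bbbb', 'md5-same')")

# A repeated primary-key checksum is rejected by the schema itself.
try:
    conn.execute("INSERT INTO PRISTINE VALUES ('sha1-aaaa', 'md5-other')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows_with_same_md5 = conn.execute(
    "SELECT COUNT(*) FROM PRISTINE WHERE md5_checksum = 'md5-same'"
).fetchone()[0]
print(duplicate_rejected, rows_with_same_md5)
```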


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Nathan Hartman
On Fri, Jan 20, 2023 at 9:51 AM Nathan Hartman  wrote:
>
> On Fri, Jan 20, 2023 at 7:18 AM Daniel Shahaf  wrote:
> >
> > Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > > I can complete the work on this branch and bring it to a production-ready
> > > state, assuming there are no objections.
> >
> > Your assumption is counterfactual:
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
> >
> > https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
> >
> > Objections have been raised, been left unanswered, and now
> > implementation work has commenced following the original design.  That's
> > not acceptable.  I'm vetoing the change until a non-rubber-stamp design
> > discussion has been completed on the public dev@ list.
>
>
> I think we can start by discussing some of the pros and cons.
>
> There are two separate things here but they end up being mixed
> together in the discussions:
>
> 1. Pros/cons of switching from SHA1 to another hash.
> 2. Supporting different hash types in f32.
>
> Regarding the first item:
>
> Do we need to switch from SHA1 to another hash? One con that was
> already mentioned [1] is that we'll never really be able to switch
> away from SHA1, as there are existing clients, servers, and working
> copies out there. Not only will we have to support SHA1 forever for
> backwards compatibility, but any new hash that is ever added will need
> to be supported forever as well. If we accumulate many of those, it
> might become a burden, but perhaps there will be only one new hash and
> it will be the "blessed" one for the next 20 years.
>
> There were concerns about collisions; since the space of possible
> input datasets is infinite and the hash code size is fixed and finite
> (pretty large, but very much finite), there will always be collisions
> with any hash. The significant questions are: how small is the
> probability of a collision, and (for the purposes of security) how
> hard is it to generate input data that produces a collision? The
> answer to the first question is fixed; the second one is probably
> expected to change over time, as algorithms are studied and new
> vulnerabilities are found. Which hash type do you pick, and who knows
> if a hash thought to be very strong (today) later proves easier to
> crack than one that is thought not as strong? We can only guess.
>
> Taking a step back, this discussion started because pristine-free WCs
> are IIUC more dependent on comparing hashes than pristineful WCs, and
> therefore a hash collision could have more impact in a pristine-free
> WC. "Guarantees" were mentioned, but I think it's important to state
> that there's only a guarantee of probability, since as mentioned above
> all hashes will have collisions.
>
> We already can't store files with identical SHA1 hashes, but AFAIK the
> only meaningful impact we've ever heard is that security researchers
> cannot track files they generate with deliberate collisions. The same
> would be true with any hash type, for collisions within that hash
> type.
>
> Advantages of switching to a new hash type might include: reducing the
> already small probability of collisions; choosing an algorithm that is
> faster or that has (or is expected to have in the future) hardware
> acceleration on commodity systems, perhaps addressing user perception
> (if SHA1 is seen as old and uncool), but then again, we can't really
> get rid of SHA1...
>
> [1] https://lists.apache.org/thread/v3dv1dtod2t9yrf920h4838g2t0l94cw
>
> Regarding the second item:
>
> Since the premise of this feature is to support adding new hash types
> without bumping wc formats, it follows that any new hash type will
> create compatibility problems for clients that support f32 but not the
> specific new hash type. In light of that, it might just be better to
> bump the wc format and then you know at the outset that you need to
> upgrade your client. Just thinking out loud here but this might be
> (partly) mitigated by trying to guess which hash types we might want
> in the future and supporting them now, even if no existing client will
> actually use them, but I don't really like this idea.
>
> I'll have to return later with more thoughts...

Just quickly I want to say that although I mentioned mostly cons
above, I don't want to appear to be against switching hashes nor
against supporting multiple hash types in f32; rather, since the
i525-pod feature necessitated a format bump anyway, I do think it
makes sense to consider adding such changes now, to avoid a future
format bump, and I'm considering arguments contrary to that from a
desire to be unbiased about it.

I have more thoughts (including more pros) but have some things to
attend to now.

Looking forward to hearing others' thoughts as well.

Cheers,
Nathan


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Nathan Hartman
On Fri, Jan 20, 2023 at 7:18 AM Daniel Shahaf  wrote:
>
> Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> > I can complete the work on this branch and bring it to a production-ready
> > state, assuming there are no objections.
>
> Your assumption is counterfactual:
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E
>
> https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E
>
> Objections have been raised, been left unanswered, and now
> implementation work has commenced following the original design.  That's
> not acceptable.  I'm vetoing the change until a non-rubber-stamp design
> discussion has been completed on the public dev@ list.


I think we can start by discussing some of the pros and cons.

There are two separate things here but they end up being mixed
together in the discussions:

1. Pros/cons of switching from SHA1 to another hash.
2. Supporting different hash types in f32.

Regarding the first item:

Do we need to switch from SHA1 to another hash? One con that was
already mentioned [1] is that we'll never really be able to switch
away from SHA1, as there are existing clients, servers, and working
copies out there. Not only will we have to support SHA1 forever for
backwards compatibility, but any new hash that is ever added will need
to be supported forever as well. If we accumulate many of those, it
might become a burden, but perhaps there will be only one new hash and
it will be the "blessed" one for the next 20 years.

There were concerns about collisions; since the space of possible
input datasets is infinite and the hash code size is fixed and finite
(pretty large, but very much finite), there will always be collisions
with any hash. The significant questions are: how small is the
probability of a collision, and (for the purposes of security) how
hard is it to generate input data that produces a collision? The
answer to the first question is fixed; the second one is probably
expected to change over time, as algorithms are studied and new
vulnerabilities are found. Which hash type do you pick, and who knows
if a hash thought to be very strong (today) later proves easier to
crack than one that is thought not as strong? We can only guess.

Taking a step back, this discussion started because pristine-free WCs
are IIUC more dependent on comparing hashes than pristineful WCs, and
therefore a hash collision could have more impact in a pristine-free
WC. "Guarantees" were mentioned, but I think it's important to state
that there's only a guarantee of probability, since as mentioned above
all hashes will have collisions.

We already can't store files with identical SHA1 hashes, but AFAIK the
only meaningful impact we've ever heard is that security researchers
cannot track files they generate with deliberate collisions. The same
would be true with any hash type, for collisions within that hash
type.

Advantages of switching to a new hash type might include: reducing the
already small probability of collisions; choosing an algorithm that is
faster or that has (or is expected to have in the future) hardware
acceleration on commodity systems, perhaps addressing user perception
(if SHA1 is seen as old and uncool), but then again, we can't really
get rid of SHA1...

[1] https://lists.apache.org/thread/v3dv1dtod2t9yrf920h4838g2t0l94cw

Regarding the second item:

Since the premise of this feature is to support adding new hash types
without bumping wc formats, it follows that any new hash type will
create compatibility problems for clients that support f32 but not the
specific new hash type. In light of that, it might just be better to
bump the wc format and then you know at the outset that you need to
upgrade your client. Just thinking out loud here but this might be
(partly) mitigated by trying to guess which hash types we might want
in the future and supporting them now, even if no existing client will
actually use them, but I don't really like this idea.
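
The compatibility concern can be made concrete with a small sketch.
The capability sets, exception names and function below are all
hypothetical; they only model a client that understands the format
number but not the checksum kind recorded in a particular working copy.

```python
class UnsupportedFormat(Exception):
    """The working copy format is newer than this client understands."""

class UnsupportedChecksumKind(Exception):
    """The format is understood, but the recorded checksum kind is not."""

# Hypothetical capabilities of one particular client build.
SUPPORTED_FORMATS = {31, 32}
SUPPORTED_CHECKSUM_KINDS = {"sha1"}

def check_openable(wc_format: int, checksum_kind: str) -> None:
    if wc_format not in SUPPORTED_FORMATS:
        raise UnsupportedFormat(f"format {wc_format}: upgrade your client")
    # Passing the format check is no longer enough to know the client works.
    if checksum_kind not in SUPPORTED_CHECKSUM_KINDS:
        raise UnsupportedChecksumKind(
            f"format {wc_format} is supported, checksum kind "
            f"{checksum_kind!r} is not"
        )

check_openable(32, "sha1")            # fine
try:
    check_openable(32, "sha256")      # same format, unknown kind
except UnsupportedChecksumKind as err:
    print(err)
```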

I'll have to return later with more thoughts...

Cheers,
Nathan


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-20 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Thu, 19 Jan 2023 18:52 +00:00:
> I can complete the work on this branch and bring it to a production-ready
> state, assuming there are no objections.

Your assumption is counterfactual:

https://mail-archives.apache.org/mod_mbox/subversion-dev/202301.mbox/%3C20230119152001.GA27446%40tarpaulin.shahaf.local2%3E

https://mail-archives.apache.org/mod_mbox/subversion-dev/202212.mbox/%3CCAMHy98NqYBLZaTL5-FAbf24RR6bagPN1npC5gsZenewZb0-EuQ%40mail.gmail.com%3E

Objections have been raised, been left unanswered, and now
implementation work has commenced following the original design.  That's
not acceptable.  I'm vetoing the change until a non-rubber-stamp design
discussion has been completed on the public dev@ list.

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Karl Fogel

On 19 Jan 2023, Evgeny Kotkov wrote:
> To have a more or less accurate estimate, I went ahead and prepared the
> first-cut implementation of an approach that makes the pristine checksum
> kind configurable in a working copy.
>
> The current implementation passes all tests in my environment and seems to
> work in practice.  It is available on the branch:
>
>   https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind
>
> The implementation on the branch allows creating working copies that use a
> checksum kind other than SHA-1.
>
> The checksum kind is persisted in the settings table.  Upgraded working copies
> of the older formats will have SHA-1 recorded as their pristine checksum kind
> and will continue to use it for compatibility.  Newly created working copies
> of the latest format (with --compatible-version=1.15 or --store-pristine=no),
> as currently implemented, will use the new pristine checksum kind.
>
> Currently, as a proof-of-concept, the branch uses salted SHA-1 as the new
> pristine checksum kind.  For the production-ready state, I plan to support
> using multiple new checksum types such as SHA-256.  I think that it would
> be useful for future compatibility, because if we encounter any issues with
> one checksum kind, we could then switch to a different kind without having
> to change the working copy format.
>
> One thing worth noting is that ra_serf contains a specific optimization for
> the skelta-style updates that allows skipping a GET request if the pristine
> store already contains an entry with the specified SHA-1 checksum.  Switching
> to a different checksum type for the pristine entries is going to disable
> that specific optimization.  Re-enabling it would require an update of the
> server-side.  I consider this to be out of scope for this branch.
>
> I can complete the work on this branch and bring it to a production-ready
> state, assuming there are no objections.


This sounds great to me; thank you, Evgeny.  I agree that the 
server-side companion change is (or anyway can be) out-of-scope 
here -- the perfect should not be the enemy of the good, etc.


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Karl Fogel

On 19 Jan 2023, Daniel Shahaf wrote:

https://subversion.apache.org/security/sha1-advisory.txt


That's a well-written advisory.  I was surprised to see that there 
is no date on it, though -- from looking at the page, one would 
have no quick way of knowing the date it was published (although 
one would know that it must have been published 2017 or after, 
since it references events in 2017).


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Now, how hard would this be to actually implement?

To have a more or less accurate estimate, I went ahead and prepared the
first-cut implementation of an approach that makes the pristine checksum
kind configurable in a working copy.

The current implementation passes all tests in my environment and seems to
work in practice.  It is available on the branch:

  https://svn.apache.org/repos/asf/subversion/branches/pristine-checksum-kind

The implementation on the branch allows creating working copies that use a
checksum kind other than SHA-1.

The checksum kind is persisted in the settings table.  Upgraded working copies
of the older formats will have SHA-1 recorded as their pristine checksum kind
and will continue to use it for compatibility.  Newly created working copies
of the latest format (with --compatible-version=1.15 or --store-pristine=no),
as currently implemented, will use the new pristine checksum kind.

Currently, as a proof-of-concept, the branch uses salted SHA-1 as the new
pristine checksum kind.  For the production-ready state, I plan to support
using multiple new checksum types such as SHA-256.  I think that it would
be useful for future compatibility, because if we encounter any issues with
one checksum kind, we could then switch to a different kind without having
to change the working copy format.
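
To make the "multiple checksum kinds" idea concrete, here is a minimal
sketch of a registry that maps a kind name to a hasher.  The kind names and
the way the salt is applied (hashed before the file contents) are
assumptions for illustration, not necessarily what the branch does.

```python
import hashlib


def salted_sha1(salt):
    """Return a hasher factory that feeds a per-working-copy salt first."""
    def factory():
        h = hashlib.sha1()
        h.update(salt)
        return h
    return factory


def make_registry(salt):
    """Map hypothetical checksum-kind names to hasher factories."""
    return {
        "sha1": hashlib.sha1,
        "sha1-salted": salted_sha1(salt),
        "sha256": hashlib.sha256,
    }


def pristine_checksum(registry, kind, data):
    """Compute the hex digest of `data` using the configured kind."""
    h = registry[kind]()
    h.update(data)
    return h.hexdigest()
```

With a registry like this, switching kinds is a matter of changing the
recorded name; the rest of the code only ever calls `pristine_checksum`.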

One thing worth noting is that ra_serf contains a specific optimization for
the skelta-style updates that allows skipping a GET request if the pristine
store already contains an entry with the specified SHA-1 checksum.  Switching
to a different checksum type for the pristine entries is going to disable
that specific optimization.  Re-enabling it would require an update of the
server-side.  I consider this to be out of scope for this branch.
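
The effect on that optimization can be sketched as follows.  This is a toy
model, not ra_serf code: the store class, the `fetch_file` helper, and the
`do_get` callable are all hypothetical.

```python
import hashlib


class PristineStore:
    """Toy pristine store keyed by (checksum kind, hex digest)."""

    def __init__(self, kind):
        self.kind = kind
        self._entries = {}

    def add(self, data):
        digest = hashlib.new(self.kind, data).hexdigest()
        self._entries[(self.kind, digest)] = data
        return digest

    def lookup(self, kind, digest):
        return self._entries.get((kind, digest))


def fetch_file(store, server_sha1, do_get):
    """Return (contents, get_was_issued).

    `server_sha1` stands for the SHA-1 the server reports for the file;
    `do_get` is a hypothetical callable that performs the network request.
    """
    cached = store.lookup("sha1", server_sha1)
    if cached is not None:
        return cached, False  # served from the pristine store, no GET
    return do_get(), True     # no entry under that SHA-1, GET is needed
```

A store keyed by SHA-256 never has an entry under the server's SHA-1, so
every lookup misses until the server can report the new checksum kind too.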

I can complete the work on this branch and bring it to a production-ready
state, assuming there are no objections.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2023-01-19 Thread Daniel Shahaf
Karl Fogel wrote on Thu, Dec 29, 2022 at 17:35:44 -0600:
> On 29 Dec 2022, Evgeny Kotkov wrote:
> > Karl Fogel  writes:
> > 
> > > Now, how hard would this be to actually implement?
> > 
> > I plan to take a more detailed look at that, but I'm currently on
> > vacation for the New Year holidays.
> 
> That's great to hear, Evgeny.  In the meantime, enjoy your vacation!

Any news on this?  Over here it's still not clear to me what problem
would be solved by switching away from SHA-1, what alternative solutions
to that problem have been considered, and whether anyone has actually
stopped to consider /both/ the pros and cons of switching away from SHA-1.

Karl Fogel wrote on Wed, Dec 28, 2022 at 09:10:31 -0400:
> On 28 Dec 2022, Daniel Sahlberg wrote:
> > Since we need to be backwards compatible with older v1 clients, can
> > this check ever be removed (before Subversion 2)?
> > 
> > So, while I believe f32 is a good opportunity to switch to a new
> > hash, what is the problem we would like to solve with a new hash?
> 
> As I said before, even if we couldn't think of a concrete problem right now,
> the mere fact that a former guarantee [1] has become a non-guarantee is
> enough motivation.  We can't anticipate all the problems that might arise
> from people being able to craft local content that looks unmodified to
> Subversion.  (As you implied, r1794611 has no effect for content that is
> never committed to the repository.)
> 
> Of course, my saying "This matters just through reasoning from first
> principles, therefore we should fix it" would count for a lot more if I were
> volunteering to fix it, which I'm not alas. But I do think we don't need to
> search further for justifications. What we already know is enough: our hash
> algorithm is known to be collidable, yet what we're using it for depends on
> non-collidability; therefore, switching to a better algorithm is a good
> idea.
> 

Agreed that we shouldn't limit ourselves to problems/attacks we can
imagine.

However, it does not follow from "the mere fact that a former guarantee
has become a non-guarantee" that we should switch the checksum
algorithm.  What does follow from that is that we should review our
design, identify the places that depend on the no-longer-valid
guarantee, assess the implications for each of them, and then determine
what sort of changes may be needed.

In other words, we should do what we do whenever we write an advisory.

Which reminds me:

https://subversion.apache.org/security/sha1-advisory.txt

Daniel

> However, it needn't be a blocker for the next release, for the reason Brane
> gave.
> 
> Best regards,
> -Karl
> 
> [1] "Former guarantee" meaning "former guarantee for all practical
> purposes", of course, since in the past there weren't ways to make
> collisions happen.


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-29 Thread Karl Fogel

On 29 Dec 2022, Evgeny Kotkov wrote:
> Karl Fogel  writes:
>
> > Now, how hard would this be to actually implement?
>
> I plan to take a more detailed look at that, but I'm currently on vacation
> for the New Year holidays.

That's great to hear, Evgeny.  In the meantime, enjoy your vacation!


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-29 Thread Branko Čibej

On 28.12.2022 13:34, Daniel Sahlberg wrote:
> Since we need to be backwards compatible with older v1 clients, can
> this check ever be removed (before Subversion 2)?


The case you're citing is specific to the repository, you could easily 
have a repository format that uses different hashes. The same for the RA 
layer, where we have capability negotiation; likewise for the WC. We'll 
always need compatibility with older formats, but a new enough client 
and server could use, e.g., SHA-256 or -512 all the way from WC to 
repository.


> So, while I believe f32 is a good opportunity to switch to a new hash,
> what is the problem we would like to solve with a new hash?


On the other hand, there can be no "switching to" a new hash, because 
you don't know what the server actually supports -- hence, we'll always 
have to keep SHA-1 around. :) IMO Karl described one possible attack 
vector, and given the context (Wordpress...) it's probably only a matter 
of time before it happens.
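
The capability-negotiation point can be illustrated with a small sketch:
pick the most preferred checksum kind both sides support, and keep SHA-1
as the fallback.  The kind names and the preference order are assumptions
for the example; this is not the actual RA capability mechanism.

```python
# Preference order is an assumption for the example: strongest first.
PREFERENCE = ["sha512", "sha256", "sha1"]


def negotiate_checksum_kind(client_kinds, server_kinds):
    """Pick the most preferred kind both sides support, else SHA-1."""
    common = set(client_kinds) & set(server_kinds)
    for kind in PREFERENCE:
        if kind in common:
            return kind
    # Older peers are assumed to always understand SHA-1.
    return "sha1"
```

For example, a client offering SHA-1 and SHA-256 against a server that only
knows SHA-1 ends up on SHA-1, which is why the old kind has to stay around.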



-- Brane

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-29 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> Now, how hard would this be to actually implement?

I plan to take a more detailed look at that, but I'm currently on vacation
for the New Year holidays.


Thanks,
Evgeny Kotkov


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-28 Thread Karl Fogel

On 28 Dec 2022, Branko Čibej wrote:
> My point was that we shouldn't have to worry about format bumps as
> much any more because we have infrastructure in the client for
> supporting multiple WC formats. That includes optional pristines,
> different hashes, compressed pristines, etc. etc.

Thank you for the reminder -- that is indeed important here.

On 28 Dec 2022, Daniel Sahlberg wrote:
> Since we need to be backwards compatible with older v1 clients, can
> this check ever be removed (before Subversion 2)?
>
> So, while I believe f32 is a good opportunity to switch to a new
> hash, what is the problem we would like to solve with a new hash?


As I said before, even if we couldn't think of a concrete problem 
right now, the mere fact that a former guarantee [1] has become a 
non-guarantee is enough motivation.  We can't anticipate all the 
problems that might arise from people being able to craft local 
content that looks unmodified to Subversion.  (As you implied, 
r1794611 has no effect for content that is never committed to the 
repository.)


Of course, my saying "This matters just through reasoning from 
first principles, therefore we should fix it" would count for a 
lot more if I were volunteering to fix it, which I'm not alas. 
But I do think we don't need to search further for justifications. 
What we already know is enough: our hash algorithm is known to be 
collidable, yet what we're using it for depends on 
non-collidability; therefore, switching to a better algorithm is a 
good idea.


However, it needn't be a blocker for the next release, for the 
reason Brane gave.


Best regards,
-Karl

[1] "Former guarantee" meaning "former guarantee for all practical 
purposes", of course, since in the past there weren't ways to make 
collisions happen.


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-28 Thread Daniel Sahlberg
On Wed, 28 Dec 2022 at 08:48, Branko Čibej wrote:

> On 27.12.2022 02:56, Karl Fogel wrote:
>
> Now, how hard would this be to actually implement?  The
> pristineless-format WC upgrade is an opportunity to make other format
> changes, but I'd hate to block the release of pristineless working copies
> on this...
>
>
> My point was that we shouldn't have to worry about format bumps as much
> any more because we have infrastructure in the client for supporting
> multiple WC formats. That includes optional pristines, different hashes,
> compressed pristines, etc. etc.
>

Evgeny has a point that when going from 31 to 32, we know that all
pristines are there and we can rehash them in place. If/when we create
format X with the new XYZ-hash, we either have to download all missing
pristines or we have to support multiple hashes for each file.
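
A minimal sketch of that trade-off, assuming a pristine store modelled as a
plain dictionary (a hypothetical stand-in for the real store): entries whose
contents are present can be rehashed locally, while entries without contents
are reported as needing a fetch or continued support for the old hash.

```python
import hashlib


def rehash_pristines(pristines, new_kind="sha256"):
    """Rehash pristines whose contents are present locally.

    `pristines` maps an old SHA-1 hex digest to the file contents, or to
    None when the pristine has not been fetched (pristines-on-demand).
    Returns (old -> new digest mapping, list of digests lacking contents).
    """
    remapped, missing = {}, []
    for old_digest, data in pristines.items():
        if data is None:
            missing.append(old_digest)  # cannot rehash without the bytes
        else:
            remapped[old_digest] = hashlib.new(new_kind, data).hexdigest()
    return remapped, missing
```

When every pristine is present, as in a format 31 working copy, `missing`
is empty and the whole upgrade can happen offline.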

I've been thinking about this question and while I don't know all the
background, there seem to be two different questions:
- Detecting changes in the WC. Karl has an excellent scenario where this
might be a problem, but switching to a new hash only makes this scenario
more expensive. Thus: What is the definition of "expensive enough"? I
believe this is a different way of asking the same question posed by
DanielSh about the criteria for a new hash.
- Storing files with hash collisions. Subversion prevents this (with
E160067) and as far as I understand this is because of r1794611 (by Stefan
Sperling) and the log message argues:

[[[
However, similar problems still exist in (at least) the RA layer and the
working copy. Until those are fixed, rejecting content which causes a hash
collision is the safest approach and avoids the undesired consequences of
storing such content.
]]]

Since we need to be backwards compatible with older v1 clients, can this
check ever be removed (before Subversion 2)?

So, while I believe f32 is a good opportunity to switch to a new hash, what
is the problem we would like to solve with a new hash?

Kind regards,
Daniel Sahlberg


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-27 Thread Branko Čibej

On 27.12.2022 02:56, Karl Fogel wrote:
> Now, how hard would this be to actually implement?  The
> pristineless-format WC upgrade is an opportunity to make other format
> changes, but I'd hate to block the release of pristineless working
> copies on this...


My point was that we shouldn't have to worry about format bumps as much 
any more because we have infrastructure in the client for supporting 
multiple WC formats. That includes optional pristines, different hashes, 
compressed pristines, etc. etc.


-- Brane

Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format

2022-12-26 Thread Karl Fogel

On 20 Dec 2022, Evgeny Kotkov via dev wrote:
> [Moving discussion to a new thread]
>
> We currently have a problem that a working copy relies on the checksum type
> with known collisions (SHA1).  A solution to that problem is to switch to a
> different checksum type without known collisions in one of the newer working
> copy formats.
>
> Since we plan on shipping a new working copy format in 1.15, this seems to
> be an appropriate moment of time to decide whether we'd also want to switch
> to a checksum type without known collisions in that new format.
>
> Below are the arguments for including a switch to a different checksum type
> in the working copy format for 1.15:
>
> 1) Since the "is the file modified?" check now compares checksums, leaving
>    everything as-is may be considered a regression, because it would
>    introduce additional cases where a working copy currently relies on
>    comparing checksums with known collisions.
>
> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).
>
> 3) While the pristines-on-demand feature is not released, upgrading with a
>    switch to the new checksum type seems to be possible without requiring a
>    network fetch.  But if some of the pristines are optional, we lose the
>    possibility to rehash all contents in place.  So we might find ourselves
>    having to choose between two worse alternatives of either requiring a
>    network fetch during upgrade or entirely prohibiting an upgrade of
>    working copies with optional pristines.
>
> Thoughts?


A few thoughts:

First, Daniel Shahaf raises the question of whether there is 
really a problem here.  I.e., Why do we care about possible 
collisions when they're unlikely to happen in practice unless 
deliberately caused?


My answer is: we should care because it's very difficult to 
imagine all the consequences -- including but not limited to 
clever deliberate attacks -- that might follow from losing a 
property we formerly had.  The hash semantics we have always 
assumed are "If the file is modified, the hash will change."  When 
those semantics change, we don't need to be able to think 
immediately of a specific problematic scenario to know that this 
is a significant development.  We've lost the guarantee; that's 
enough to be worth worrying about.


BUT, if you want a scenario, here's one:

I have put WordPress installations under Subversion version 
control before.  Once, I detected an attack on one of those 
WordPress servers when one of the things the attacker did was 
modify some of the WordPress scripts on the server.  Those files 
showed up as modified when I ran 'svn st', and from there I ran 
'svn diff' and figured out what had happened.  But a super-careful 
attacker could make modifications that leave the 
version-controlled files with the same SHA1 hash they had before, 
thus making it harder to detect the attack.


Yes, I realize there are other ways to detect modifications, and 
that random attackers are unlikely to take the trouble to preserve 
hashes.  On the other hand, a well-resourced spear-phishing 
attacker who knows something about the usage of SVN at their 
target might indeed try a hash-preserving approach to breaking in. 
The point is, if we're counting on the hashes having certain 
semantics, then our users are counting on it too.  If SHA1 no 
longer has those semantics, we should upgrade.


Second, +1 to what Branko said: we should upgrade to a new hash 
when we upgrade a working copy anyway, but new clients should 
still be able to handle the old hash in old working copies without 
upgrading them.


Now, how hard would this be to actually implement?  The 
pristineless-format WC upgrade is an opportunity to make other 
format changes, but I'd hate to block the release of pristineless 
working copies on this...


Best regards,
-Karl


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Daniel Shahaf
Evgeny Kotkov via dev wrote on Tue, Dec 20, 2022 at 11:14:00 +0300:
> [Moving discussion to a new thread]
> 
> We currently have a problem that a working copy relies on the checksum type
> with known collisions (SHA1).  A solution to that problem

Why is libsvn_wc's use of SHA-1 a problem?  What's the scenario wherein
Subversion will behave differently than it should?

> is to switch to a different checksum type without known collisions in
> one of the newer working copy formats.

Such as SHA-1 salted by NODES.LOCAL_RELPATH and NODES.WC_ID (or a per-wc UUID)?

> Since we plan on shipping a new working copy format in 1.15, this seems to
> be an appropriate moment of time to decide whether we'd also want to switch
> to a checksum type without known collisions in that new format.
> 

What's the acceptance test we use for candidate checksum algorithms?

You say we should switch to a checksum algorithm that doesn't have known
collisions, but, why should we require that?  Consider the following
160-bit checksum algorithm:
.
1. If the input consists of 40 ASCII lowercase hex digits and
   nothing else, return the input.
2. Else, return the SHA-1 of the input.

This algorithm has a trivial first preimage attack.  If a wc used this
identity-then-sha1 algorithm instead of SHA-1, then… what?
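
For concreteness, the two-step algorithm above can be written out as below.
Returning the result as 40 hex digits is one reading of "return the input";
the function names are made up for the example.

```python
import hashlib
import re

_HEX40 = re.compile(rb"[0-9a-f]{40}")


def identity_then_sha1(data):
    """The two-step checksum described above, as 40 hex digits."""
    if _HEX40.fullmatch(data):
        # Step 1: 40 lowercase hex digits and nothing else: return the input.
        return data.decode("ascii")
    # Step 2: anything else gets ordinary SHA-1.
    return hashlib.sha1(data).hexdigest()


def first_preimage(target_digest):
    """Trivial first preimage: the digest text itself hashes to the digest."""
    return target_digest.encode("ascii")
```

Given the checksum of any file, `first_preimage` produces a different
40-byte input with the same checksum, without any search at all.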

> Below are the arguments for including a switch to a different checksum type
> in the working copy format for 1.15:
> 
> 1) Since the "is the file modified?" check now compares checksums, leaving
>    everything as-is may be considered a regression, because it would
>    introduce additional cases where a working copy currently relies on
>    comparing checksums with known collisions.
> 

Well, SHA-1 is still collision-free so long as one is not deliberately
trying to use collisions, so this would only be a regression if we
consider "Deliberately store files that have the same checksum" to be
a use-case.  Do we?

I recall we discussed this when shattered.io was announced, and we
didn't rush to upgrade the checksums we use everywhere, so I guess back
then we came to the conclusion that wasn't a use-case.  (Of course we
can change our opinion; that's just a datapoint, and there may be more,
on both sides, in the old thread.)

I looked for the old thread and didn't find it.  (I looked in the
private@ archives too in case the thread was there.)

> 2) We already need a working copy format bump for the pristines-on-demand
>    feature.  So using that format bump to solve the SHA1 issue might reduce
>    the overall number of required bumps for users (assuming that we'll still
>    need to switch from SHA1 at some point later).
> 

Considering that 1.15 will support reading and writing both f31 and f32,
the "overall number of required bumps" between 1.8 and trunk@HEAD is
zero, meaning the proposed change can't reduce that number.

> 3) While the pristines-on-demand feature is not released, upgrading
>    with a switch to the new checksum type seems to be possible without
>    requiring a network fetch.

I infer the scenario in question here is upgrading a (say) pristineless
wc to a newer format that supports a new checksum algorithm.

>    But if some of the pristines are optional, we lose the possibility
>    to rehash all contents in place.  So we might find ourselves having
>    to choose between two worse alternatives of either requiring
>    a network fetch during upgrade or entirely prohibiting an upgrade
>    of working copies with optional pristines.

Why would we want to rehash everything in place?  The 1.15→1.16 upgrade
could simply leave pristineless files' checksums as SHA-1 until the next
«svn up», just like «svnadmin upgrade» of FSFS doesn't retroactively add
SHA-1 checksums to node-rev headers or "-file" or "-dir" indicators in
the changed-paths section.
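
One way to picture that lazy approach is to record the checksum kind per
node, so that nodes still carrying SHA-1 and nodes already migrated can
coexist.  The node record and its field names here are hypothetical, not
the wc.db layout.

```python
import hashlib


def is_unmodified(node, working_data):
    """Compare working contents against the node's recorded checksum.

    `node` is a hypothetical record with "kind" and "digest" fields.
    """
    digest = hashlib.new(node["kind"], working_data).hexdigest()
    return digest == node["digest"]


def on_update(node, new_pristine_data, new_kind="sha256"):
    """When an update brings fresh contents anyway, record the new kind."""
    node["kind"] = new_kind
    node["digest"] = hashlib.new(new_kind, new_pristine_data).hexdigest()
```

Under this scheme the upgrade itself touches no file contents; each node
moves to the new kind the next time an update supplies its contents.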

There may be yet other alternatives.

> Thoughts?

I'm not voting either -0 or +0 at this time.

Cheers,

Daniel


Re: Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Branko Čibej

On 20.12.2022 09:14, Evgeny Kotkov wrote:

> 2) We already need a working copy format bump for the pristines-on-demand
> feature.  So using that format bump to solve the SHA1 issue might reduce
> the overall number of required bumps for users (assuming that we'll still
> need to switch from SHA1 at some point later).


Using a new hashing algorithm in the working copy is relatively simple. 
Making such a change backwards-compatible is not. It would be really 
nice if this could be done in a way that allows newer clients to still 
support older working copies without upgrading them; after all, we have 
the infrastructure for this in place now.


-- Brane

Switching from SHA1 to a checksum type without known collisions in 1.15 working copy format (was: Re: Getting to first release of pristines-on-demand feature (#525).)

2022-12-20 Thread Evgeny Kotkov via dev
Karl Fogel  writes:

> > While here, I would like to raise a topic of incorporating a switch from
> > SHA1 to a different checksum type (without known collisions) for the new
> > working copy format.  This topic is relevant to the pristines-on-demand
> > branch, because the new "is the file modified?" check relies on the
> > checksum comparison, instead of comparing the contents of working and
> > pristine files.
> >
> > And so while I consider it to be out of the scope of the pristines-on-
> > demand branch, I think that we might want to evaluate if this is something
> > that should be a part of the next release.
>
> Good point.  Maybe worth a new thread?

[Moving discussion to a new thread]

We currently have a problem that a working copy relies on the checksum type
with known collisions (SHA1).  A solution to that problem is to switch to a
different checksum type without known collisions in one of the newer working
copy formats.

Since we plan on shipping a new working copy format in 1.15, this seems to
be an appropriate moment of time to decide whether we'd also want to switch
to a checksum type without known collisions in that new format.

Below are the arguments for including a switch to a different checksum type
in the working copy format for 1.15:

1) Since the "is the file modified?" check now compares checksums, leaving
   everything as-is may be considered a regression, because it would
   introduce additional cases where a working copy currently relies on
   comparing checksums with known collisions.

2) We already need a working copy format bump for the pristines-on-demand
   feature.  So using that format bump to solve the SHA1 issue might reduce
   the overall number of required bumps for users (assuming that we'll still
   need to switch from SHA1 at some point later).

3) While the pristines-on-demand feature is not released, upgrading with a
   switch to the new checksum type seems to be possible without requiring a
   network fetch.  But if some of the pristines are optional, we lose the
   possibility to rehash all contents in place.  So we might find ourselves
   having to choose between two worse alternatives of either requiring a
   network fetch during upgrade or entirely prohibiting an upgrade of
   working copies with optional pristines.
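
The difference behind argument 1 can be shown with a simplified sketch
(these two functions are illustrative, not the libsvn_wc code): a contents
comparison needs the pristine bytes, while a checksum comparison needs only
the recorded digest and so inherits any weakness of the hash.

```python
import hashlib


def modified_by_contents(working_data, pristine_data):
    """Byte-for-byte comparison; needs the pristine contents on disk."""
    return working_data != pristine_data


def modified_by_checksum(working_data, recorded_sha1):
    """Checksum comparison; needs only the recorded hex digest."""
    return hashlib.sha1(working_data).hexdigest() != recorded_sha1
```

For ordinary edits the two agree.  They differ only for two distinct inputs
with the same SHA-1, where the checksum check reports "unmodified".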

Thoughts?


Thanks,
Evgeny Kotkov