Re: Hash algorithm analysis

2018-09-18 Thread Linus Torvalds
On Tue, Sep 18, 2018 at 8:18 AM Joan Daemen  wrote:
>
> 3) The relatively large state in the sponge construction increases the 
> generic strength against attacks when the input contains redundancy or
> has a certain form. For instance, if the input is restricted to be text in 
> ASCII (such as source code), then the collision-resistance grows
> higher than the nominal 2^{c/2}. Such an effect does not exist with 
> narrow-pipe Merkle-Damgård. (This may be what Linus had intuitively in mind.)

Answering to just this part:

No, what I had in mind was literally just exactly the kind of attack
that SHA1 broke for - attacking the internal state vector directly,
and not paying any penalty for it, because the stat size is the same
as the final hash size.

The length extension attack is just the simplest and most trivial
version of that kind of attack - because the internal state vector
*is* the result, and you just continue using it.

But that trivial length extension thing not the real problem, it's
just the absolutely simplest symptom of the real problem.

I think that the model where the internal state of the hash is the
same width as the final result is simply broken. It was what broke
SHA1, and that problem is shared with SHA2.

"Length extension" is just the simplest way to say "broken by design", imho.

Because the length extension attack is just the most trivial attack,
but it isn't the fundamental problem. It was just the first and the
cheapest attack found, but it was also the most special-cased and
least interesting. You need to have a very special case (with that
secret at the beginning etc) to make the pure length extension attack
interesting. And git has no secrets, so in that sense "length
extension" by itself is totally immaterial. But the basic problem of
internal hash size obviously wasn't.

So I would say that length extension is a direct result of the _real_
problem, which is that the hash exposes _all_ of the internal data.

That is what makes length extension possible - because you can just
continue from a known state, and there is absolutely nothing hidden -
and yes, that's a really easy special case where you don't even need
to actually break the hash at all.

But I argue that it's _also_ one big part of what made SHAttered
practical, and I think the underlying problem is exactly the same.
When the internal state is the same size as the hash, you can attack
the internal state itself for basically the same cost as attacking the
whole hash.

So you can pick-and-choose the weakest point.

Which is basically exactly what SHAttered did. No, it wasn't the
trivial "just add to the end", but it used the exact same underlying
weakness as one part of the attack.

*This* is why I dislike SHA2. It has basically the exact same basic
weakness that we already know SHA1 fell for. The hashing details are
different, and hopefully that means that there aren't the same kind of
patterns that can be generated to do the "attack the internal hash
state" part, but I don't understand why people seem to ignore that
other fundamental issue.

Something like SHA-512/256 would have been better, but I think almost
nobody does that in hardware, which was one of the big advantages of
plain SHA2.

The main reason I think SHA2 is acceptable is simply that 256 bits is
a lot. So even if somebody comes up with a shortcut that weakens it by
tens of bits, nobody really cares. Plus I'm obviously not a
cryptographer, so I didn't feel like I was going to fight it a lot.

But yes, I'd have probably gone with any of the other alternatives,
because I think it's a bit silly that we're switching hashes to
another hash that has (at least in part) the *exact* same issue as the
one people call broken.

(And yes, the hashing details are different, so it's "exactly the
same" only wrt that internal state part - not the bitpattern finding
part that made the attack on the internal state much cheaper. Real
cryptographers obviously found that "figure out the weakness of the
hashing" to be the more interesting and novel part over the trivial
internal hash size part).

That said..

The real reason I think SHA2 is the right choice was simply that there
needs to be a decision, and none of the choices were *wrong*.
Sometimes just the _act_ of making a decision is more important than
_what_ the decision is.

And hey, it is also likely that the reason _I_ get hung up on just the
size of the internal state is that exactly because I am _not_ a
cryptographer, that kind of high-level stuff is the part I understand.
When you start talking about why the exact rules of Merkle–Damgård
constructions work, my eyes just glaze over.

So I'm probably - no, certainly - myopic and looking at only one part
of the issue to begin with.

The end result is that I argued for more bits in the internal state
(and apparently wide vs narrow is the technical term), and I would
have seen parallel algorithms as a bonus for the large-file case. None
of which argued for SHA2.

But 

Re: Hash algorithm analysis

2018-09-18 Thread Jonathan Nieder
Hi,

A quick note.

Joan Daemen wrote:

> when going over my todo list I was confronted with the mail of Dan
> Shumow on the successor of SHA-1 for git. I know the decision was
> made and it is not my intention to change it, but please see below
> some comments on Dan's arguments.

When the time comes for the next hash change in Git, it will be useful
to be able to look back over this discussion.  Thanks for adding
details.

[...]
> On 30/07/2018 22:01, Dan Shumow wrote:

>> So, I also want to state my biases in favor of SHA2 as an employee
>> of Microsoft. [...] As such, and reflecting this bias, in the
>> internal discussions that Johannes alluded to, SHA2 and SHA3 were
>> the primary suggestions.  There was a slight preference for SHA2
>> because SHA3 is not exposed through the windows cryptographic APIs
>> (though Git does not use those, so this is a nonissue for this
>> discussion.)
>
> We find it cynical to bring up a Microsoft-internal argument that is
> actually not relevant to Git.

On the contrary, I am quite grateful that Dan was up front about where
his preference comes from, *especially* when the reasons are not
relevant to Git.  It is useful background for better understanding his
rationale and understanding the ramifications for some subset of
users.

In other words, consider someone active in the Git project that
disagrees with the decision to use SHA2.  This explanation by Dan can
help such a person understand where the disagreement is coming from
and whether we are making the decision for the wrong reasons (because
Git on Windows does not even use those APIs).

[...]
> 3) The relatively large state in the sponge construction increases
> the generic strength against attacks when the input contains
> redundancy or has a certain form. For instance, if the input is
> restricted to be text in ASCII (such as source code), then the
> collision-resistance grows higher than the nominal 2^{c/2}. Such an
> effect does not exist with narrow-pipe Merkle-Damgård. (This may be
> what Linus had intuitively in mind.)

Interesting.

[...]
> [2] Daniel J. Bernstein, Cost analysis of hash collisions: Will
> quantum computers make SHARCS obsolete? Workshop Record of
> SHARCS'09.

I remember that paper!  Thanks for the pointer.

Sincerely,
Jonathan


Re: Hash algorithm analysis

2018-09-18 Thread Joan Daemen
Dear all,

when going over my todo list I was confronted with the mail of Dan Shumow on 
the successor of SHA-1 for git. I know the decision was made and it is not my 
intention to change it, but please see below some comments on Dan's arguments.

In short, I argue below that SHA256 has no serious advantages when compared to 
KangarooTwelve. In that light, the fact that SHA2 was designed behind closed 
doors (like SHA-1 was) should be enough reason to skip it entirely in an 
undertaking that takes open-source seriously.

Kind regards,

Joan

PS: In my comments below I use "we" as I discussed them with the members of the 
Keccak team being Gilles Van Assche, Michaël Peeters, Guido Bertoni, Ronny Van 
Keer and Seth Hoffert, and we agree on all of them.

On 30/07/2018 22:01, Dan Shumow wrote:
> Hello all.   Johannes, thanks for adding me to this discussion.
>
> So, as one of the coauthors of the SHA-1 collision detection code, I just 
> wanted to chime in and say I'm glad to see the move to a longer hash 
> function.  Though, as a cryptographer, I have a few thoughts on the matter 
> that I thought I would share.
>
> I think that moving to SHA256 is a fine change, and I support it.
>
> I'm not anywhere near the expert in this that Joan Daeman is. 

Note that the correct spelling is "Daemen". But anyway, it is not a matter of 
being expert, but a matter of taking the arguments at face value. 

> I am someone who has worked in this space more or less peripherally.  
> However, I agree with Adam Langley that basically all of the finalists for a 
> hash function replacement are about the same for the security needs of Git.  
> I think that, for this community, other software engineering considerations 
> should be more important to the selection process.

We are also with Adam on the engineering considerations. We see the parallelism 
that K12 can exploit adaptively (unlike SHA256) as an example of such a 
consideration.

> I think Joan's survey of cryptanalysis papers and the numbers that he gives 
> are interesting, and I had never seen the comparison laid out like that.  So, 
> I think that there is a good argument to be made that SHA3 has had more 
> cryptanalysis than SHA2.  Though, Joan, are the papers that you surveyed only 
> focused on SHA2?  I'm curious if you think that the design/construction of 
> SHA2, as it can be seen as an iteration of MD5/SHA1, means that the 
> cryptanalysis papers on those constructions can be considered to apply to 
> SHA2?  

This argument works both ways, i.e., the knowledge and experience of the 
symmetric cryptography community in general has also contributed to our choices 
in Keccak and in K12 (including the experience gained by Rijndael/AES). But in 
the end, the only objective metric we have for comparing public scrutiny is the 
amount of cryptanalysis (and analysis) published, and there Keccak simply 
scores better.

> Again, I'm not an expert in this, but I do know that Marc Steven's techniques 
> for constructing collisions also provided some small cryptanalytic 
> improvements against the SHA2 family as well.  I also think that while the 
> paper survey is a good way to look over all of this, the more time in the 
> position of high profile visibility that SHA2 has can give us some confidence 
> as well.

High profile visibility to implementers does not mean more cryptanalysis, since 
users and implementers are usually not cryptanalysts. Actually, one of the 
reasons that SHA2 attracted much less cryptanalysis than you would expect due 
to its age is that during the SHA3 competition all cryptanalysts pointed their 
arrows to SHA3 candidates.

> Also something worth pointing out is that the connection SHA2 has to SHA1 
> means that if Marc Steven's cryptanalysis of MD5/SHA-1 were ever successfully 
> applied to SHA2, the SHA1 collision detection approach could be applied there 
> as well, thus providing a drop in replacement in that situation.  That said, 
> we don't know that there is not a similar way of addressing issues with the 
> SHA3/Sponge design.  It's just that because we haven't seen any weaknesses of 
> this sort in similar designs, we just don't know what a similar approach 
> would be there yet.  I don't want to put too much stock in this argument, 
> it's just saying "Well, we already know how SHA2 is likely to break, and 
> we've had fixes for similar things in the past."  This is pragmatic but not 
> inspiring or confidence building.
>
> So, I also want to state my biases in favor of SHA2 as an employee of 
> Microsoft.  Microsoft, being a corporation headquartered in a America, with 
> the US Gov't as a major customer definitely prefers to defer to the US Gov't 
> NIST standardization process.  And from that perspective SHA2 or SHA3 would 
> be good choices.  I, personally, think that the NIST process is the best we 
> have.  It is relatively transparent, and NIST employs a fair number of very 
> competent cryptographers.  Also, I am encouraged 

Re: Hash algorithm analysis

2018-08-02 Thread Jonathan Nieder
Hi Dan,

Dan Shumow wrote:

[replying out of order for convenience]
> However, I agree with Adam Langley that basically all of the
> finalists for a hash function replacement are about the same for the
> security needs of Git.  I think that, for this community, other
> software engineering considerations should be more important to the
> selection process.

Thanks for this clarification, which provides some useful context to
your opinion that was previously relayed by Dscho.

[...]
> So, as one of the coauthors of the SHA-1 collision detection code, I
> just wanted to chime in and say I'm glad to see the move to a longer
> hash function.  Though, as a cryptographer, I have a few thoughts on
> the matter that I thought I would share.
>
> I think that moving to SHA256 is a fine change, and I support it.

More generally, thanks for weighing in and for explaining your
rationale.  Even (especially) having already made the decision, it's
comforting to hear a qualified person endorsing that choice.

Sincerely,
Jonathan


RE: Hash algorithm analysis

2018-07-30 Thread Dan Shumow
Hello all.   Johannes, thanks for adding me to this discussion.

So, as one of the coauthors of the SHA-1 collision detection code, I just 
wanted to chime in and say I'm glad to see the move to a longer hash function.  
Though, as a cryptographer, I have a few thoughts on the matter that I thought 
I would share.

I think that moving to SHA256 is a fine change, and I support it.

I'm not anywhere near the expert in this that Joan Daeman is.  I am someone who 
has worked in this space more or less peripherally.  However, I agree with Adam 
Langley that basically all of the finalists for a hash function replacement are 
about the same for the security needs of Git.  I think that, for this 
community, other software engineering considerations should be more important 
to the selection process.

I think Joan's survey of cryptanalysis papers and the numbers that he gives are 
interesting, and I had never seen the comparison laid out like that.  So, I 
think that there is a good argument to be made that SHA3 has had more 
cryptanalysis than SHA2.  Though, Joan, are the papers that you surveyed only 
focused on SHA2?  I'm curious if you think that the design/construction of 
SHA2, as it can be seen as an iteration of MD5/SHA1, means that the 
cryptanalysis papers on those constructions can be considered to apply to SHA2? 
 Again, I'm not an expert in this, but I do know that Marc Steven's techniques 
for constructing collisions also provided some small cryptanalytic improvements 
against the SHA2 family as well.  I also think that while the paper survey is a 
good way to look over all of this, the more time in the position of high 
profile visibility that SHA2 has can give us some confidence as well.

Also something worth pointing out is that the connection SHA2 has to SHA1 means 
that if Marc Steven's cryptanalysis of MD5/SHA-1 were ever successfully applied 
to SHA2, the SHA1 collision detection approach could be applied there as well, 
thus providing a drop in replacement in that situation.  That said, we don't 
know that there is not a similar way of addressing issues with the SHA3/Sponge 
design.  It's just that because we haven't seen any weaknesses of this sort in 
similar designs, we just don't know what a similar approach would be there yet. 
 I don't want to put too much stock in this argument, it's just saying "Well, 
we already know how SHA2 is likely to break, and we've had fixes for similar 
things in the past."  This is pragmatic but not inspiring or confidence 
building.

So, I also want to state my biases in favor of SHA2 as an employee of 
Microsoft.  Microsoft, being a corporation headquartered in a America, with the 
US Gov't as a major customer definitely prefers to defer to the US Gov't NIST 
standardization process.  And from that perspective SHA2 or SHA3 would be good 
choices.  I, personally, think that the NIST process is the best we have.  It 
is relatively transparent, and NIST employs a fair number of very competent 
cryptographers.  Also, I am encouraged by the widespread international 
participation that the NIST competitions and selection processes attract.

As such, and reflecting this bias, in the internal discussions that Johannes 
alluded to, SHA2 and SHA3 were the primary suggestions.  There was a slight 
preference for SHA2 because SHA3 is not exposed through the windows 
cryptographic APIs (though Git does not use those, so this is a nonissue for 
this discussion.)

I also wanted to thank Johannes for keeping the cryptographers that he 
discussed this with anonymous.  After all, cryptographers are known for being 
private.  And I wanted to say that Johannes did, in fact, accurately represent 
our internal discussions on the matter.

I also wanted to comment on the discussion of the "internal state having the 
same size as the output."  Linus referred to this several times.  This is known 
as narrow-pipe vs wide-pipe in the hash function design literature.  Linus is 
correct that wide-pipe designs are more in favor currently, and IIRC, all of 
the serious SHA3 candidates employed this.  That said, it did seem that in the 
discussion this was being equated with "length extension attacks."  And that 
connection is just not accurate.  Length extension attacks are primarily a 
motivation of the HMAC liked nested hashing design for MACs, because of a 
potential forgery attack.  Again, this doesn't really matter because the 
decision has been made despite this discussion.  I just wanted to set the 
record straight about this, as to avoid doing the right thing for the wrong 
reason (T.S. Elliot's "greatest treason.")

One other thing that I wanted to throw out there for the future is that in the 
crypto community there is currently a very large push to post quantum 
cryptography.  Whether the threat of quantum computers is real or imagined this 
is a hot area of research, with a NIST competition to select post quantum 
asymmetric cryptographic algorithms.  That is not directly of 

Re: Hash algorithm analysis

2018-07-30 Thread Johannes Schindelin
Hi Brian,

On Tue, 24 Jul 2018, brian m. carlson wrote:

> On Tue, Jul 24, 2018 at 02:13:07PM -0700, Junio C Hamano wrote:
> > Yup.  I actually was leaning toward saying "all of them are OK in
> > practice, so the person who is actually spear-heading the work gets to
> > choose", but if we picked SHA-256 now, that would not be a choice that
> > Brian has to later justify for choosing against everybody else's
> > wishes, which makes it the best choice ;-)
> 
> This looks like a rough consensus.

As I grew really uncomfortable with having a decision that seems to be
based on hunches by non-experts (we rejected the preference of the only
cryptographer who weighed in, after all, precisely like we did over a
decade ago), I asked whether I could loop in one of our in-house experts
with this public discussion.

Y'all should be quite familiar with his work: Dan Shumow.

Dan, thank you for agreeing to chime in publicly.

Ciao,
Dscho


Re: Hash algorithm analysis

2018-07-26 Thread Johannes Schindelin
Hi Joan,

On Sun, 22 Jul 2018, Joan Daemen wrote:

> I wanted to react to some statements I read in this discussion. But
> first let me introduce myself. I'm Joan Daemen and I'm working in
> symmetric cryptography since 1988. Vincent Rijmen and I designed
> Rijndael that was selected to become AES and Guido Bertoni, Michael
> Peeters and Gilles Van Assche and I (the Keccak team, later extended
> with Ronny Van Keer) designed Keccak that was selected to become SHA3.
> Of course as a member of the Keccak team I'm biased in this discussion
> but I'll try to keep it factual.

Thank you *so* much for giving your valuable time and expertise on this
subject.

I really would hate for the decision to be made due to opinions of people
who are overconfident in their abilities to judge cryptographic matters
despite clearly being out of their league (which includes me, I want to
add specifically).

On a personal note: back in the day, I have been following the Keccak with
a lot of interest, being intrigued by the deliberate deviation from the
standard primitives, and I am pretty much giddy about the fact that I am
talking to you right now.

> [... interesting, and thorough background information ...]
> 
> Anyway, these numbers support the opinion that the safety margins taken
> in K12 are better understood than those in SHA-256, SHA-512 and BLAKE2.

This is very, very useful information in my mind.

> Adam Langley continues:
> 
>   Thus I think the question is primarily concerned with performance and 
> implementation availability
> 
> 
> Table 2 in our ACNS paper on K12 (available at
> https://eprint.iacr.org/2016/770) shows that performance of K12 is quite
> competitive. Moreover, there is a lot of code available under CC0
> license in the Keccak Code Package on github
> https://github.com/gvanas/KeccakCodePackage. If there is shortage of
> code for some platforms in the short term, we will be happy to work on that.
> 
> In the long term, it is likely that the relative advantage of K12 will
> increase as it has more potential for hardware acceleration, e.g., by
> instruction set extension. This is thanks to the fact that it does not
> use addition, as opposed to so-called addition-xor-rotation (ARX)
> designs such as the SHA-2 and BLAKE2 families. This is already
> illustrated in our Table 2 I referred to above, in the transition from
> Skylake to SkylakeX.

I *really* hope that more accessible hardware acceleration for this
materializes at some stage. And by "more accessible", I mean commodity
hardware such as ARM or AMD/Intel processors: big hosters could relatively
easily develop appropriate FPGAs (we already do this for AI, after all).

> Maybe also interesting for this discussion are the two notes we (Keccak
> team) wrote on our choice to not go for ARX and the one on "open source
> crypto" at https://keccak.team/2017/not_arx.html and
> https://keccak.team/2017/open_source_crypto.html respectively.

I had read those posts when they came out, and still find them insightful.
Hopefully other readers of this mailing list will spend the time to read
them, too.

Again, thank you so much for a well-timed dose of domain expertise in
this thread.

Ciao,
Dscho


Re: Hash algorithm analysis

2018-07-26 Thread Johannes Schindelin
Hi Eric,

On Sun, 22 Jul 2018, Eric Deplagne wrote:

> On Sun, 22 Jul 2018 14:21:48 +, brian m. carlson wrote:
> > On Sun, Jul 22, 2018 at 11:34:42AM +0200, Eric Deplagne wrote:
> > > On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> > > > I don't know your colleagues, and they haven't commented here.  One
> > > > person that has commented here is Adam Langley.  It is my impression
> > > > (and anyone is free to correct me if I'm incorrect) that he is indeed a
> > > > cryptographer.  To quote him[0]:
> > > > 
> > > >   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
> > > >   K12, etc are all secure to the extent that I don't believe that making
> > > >   comparisons between them on that axis is meaningful. Thus I think the
> > > >   question is primarily concerned with performance and implementation
> > > >   availability.
> > > > 
> > > >   […]
> > > > 
> > > >   So, overall, none of these choices should obviously be excluded. The
> > > >   considerations at this point are not cryptographic and the tradeoff
> > > >   between implementation ease and performance is one that the git
> > > >   community would have to make.
> > > 
> > >   Am I completely out of the game, or the statement that
> > > "the considerations at this point are not cryptographic"
> > >   is just the wrongest ?
> > > 
> > >   I mean, if that was true, would we not be sticking to SHA1 ?
> > 
> > I snipped a portion of the context, but AGL was referring to the
> > considerations involved in choosing from the proposed ones for NewHash.
> > In context, he meant that the candidates for NewHash “are all secure”
> > and are therefore a better choice than SHA-1.
> 
>   Maybe a little bit sensitive, but I really did read
> "we don't care if it's weak or strong, that's not the matter".

Thank you for your concern. I agree that we need to be careful in
considering the security implications. We made that mistake before (IIRC
there was a cryptographer who was essentially shouted off the list when he
suggested *not* to hard-code SHA-1), and we should absolutely refrain from
making that same mistake again.

> > I think we can all agree that SHA-1 is weak and should be replaced.

Indeed.

So at this point, we already excluded pretty much all the unsafe options
(although it does concern me that BLAKE2b has been weakened purposefully,
I understand the reasoning, but still).

Which means that by now, considering the security implications of the
cipher is no longer a criterion that helps us whittle down the candidates
further.

So from my point of view, there are two criterions that can help us
further:

- Which cipher is the least likely to be broken (or just weakened by new
  attacks)?

- As energy considerations not only ecologically inspired, but also in
  terms of money for elecricity: which cipher is most likely to get decent
  hardware support any time soon?

Even if my original degree (prime number theory) is closer to
cryptanalysis than pretty much all other prolific core Git contributors, I
do not want you to trust *my* word on answering those questions.

Therefore, I will ask my colleagues to enter the hornet's nest that is
this mailing list.

Ciao,
Dscho

Re: Hash algorithm analysis

2018-07-24 Thread brian m. carlson
On Tue, Jul 24, 2018 at 02:13:07PM -0700, Junio C Hamano wrote:
> Yup.  I actually was leaning toward saying "all of them are OK in
> practice, so the person who is actually spear-heading the work gets
> to choose", but if we picked SHA-256 now, that would not be a choice
> that Brian has to later justify for choosing against everybody
> else's wishes, which makes it the best choice ;-)

This looks like a rough consensus.  And fortunately, I was going to pick
SHA-256 and implemented it over the weekend.

Things I thought about in this regard:
* When you compare against SHA1DC, most vectorized SHA-256
  implementations are indeed faster, even without acceleration.
* If we're doing signatures with OpenPGP (or even, I suppose, CMS),
  we're going to be using SHA-2, so it doesn't make sense to have our
  security depend on two separate algorithms when either one of them
  alone could break the security when we could just depend on one.

I'll be sending out some patches, probably in a few days, with SHA-256
and some test fixes.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-07-24 Thread Junio C Hamano
Linus Torvalds  writes:

> On Tue, Jul 24, 2018 at 12:01 PM Edward Thomson
>  wrote:
>>
>> Switching gears, if I look at this from the perspective of the libgit2
>> project, I would also prefer SHA-256 or SHA3 over blake2b.  To support
>> blake2b, we'd have to include - and support - that code ourselves.  But
>> to support SHA-256, we would simply use the system's crypto libraries
>> that we already take a dependecy on (OpenSSL, mbedTLS, CryptoNG, or
>> SecureTransport).
>
> I think this is probably the single strongest argument for sha256.
> "It's just there".

Yup.  I actually was leaning toward saying "all of them are OK in
practice, so the person who is actually spear-heading the work gets
to choose", but if we picked SHA-256 now, that would not be a choice
that Brian has to later justify for choosing against everybody
else's wishes, which makes it the best choice ;-)



Re: Hash algorithm analysis

2018-07-24 Thread Jonathan Nieder
Hi,

Linus Torvalds wrote:
> On Tue, Jul 24, 2018 at 12:01 PM Edward Thomson
>  wrote:

>> Switching gears, if I look at this from the perspective of the libgit2
>> project, I would also prefer SHA-256 or SHA3 over blake2b.  To support
>> blake2b, we'd have to include - and support - that code ourselves.  But
>> to support SHA-256, we would simply use the system's crypto libraries
>> that we already take a dependecy on (OpenSSL, mbedTLS, CryptoNG, or
>> SecureTransport).

Just to be clear, OpenSSL has built-in blake2b support.

[...]
> So I'm not a huge fan of sha256, partly because of my disappointment
> in lack of hw acceleration in releant markets (sure, it's fairly
> common in ARM, but nobody sane uses ARM for development because of
> _other_ reasons). And partly because I don't like how the internal
> data size is the same as the final hash. But that second issue is an
> annoyance with it, not a real issue - in the absence of weaknesses
> it's a non-issue, and any future weaknesses might affect any other
> choice too.

Thanks for saying this.  With this in mind, I think we have a clear
way forward: we should use SHA-256.

My main complaint about it is that it is not a tree hash, but the
common availability in libraries trumps that (versus SHA-256x16, say).
I also was excited about K12, both because I like a world where Keccak
gets wide hardware accelaration (improving PRNGs and other
applications) and because of Keccak team's helpfulness throughout the
process of helping us evaluate this, and it's possible that some day
in the future we may want to switch to something like it.  But today,
as mentioned in [1] and [2], there is value in settling on one
standard and SHA2-256 is the obvious standard today.

Thanks,
Jonathan

[1] 
https://public-inbox.org/git/cal9pxlynvlccqv1ftra3r4kuoamdzof29hjehv2jxrbhj1n...@mail.gmail.com/
[2] 
https://public-inbox.org/git/20180723183523.gb9...@aiede.svl.corp.google.com/


Re: Hash algorithm analysis

2018-07-24 Thread Linus Torvalds
On Tue, Jul 24, 2018 at 12:01 PM Edward Thomson
 wrote:
>
> Switching gears, if I look at this from the perspective of the libgit2
> project, I would also prefer SHA-256 or SHA3 over blake2b.  To support
> blake2b, we'd have to include - and support - that code ourselves.  But
> to support SHA-256, we would simply use the system's crypto libraries
> that we already take a dependecy on (OpenSSL, mbedTLS, CryptoNG, or
> SecureTransport).

I think this is probably the single strongest argument for sha256.
"It's just there".

The hardware acceleration hasn't become nearly as ubiquitous as I
would have hoped, and honestly, sha256 _needs_ hw acceleration more
than some of the alternatives in the first place.

But sha256 does have the big advantage of just having been around and
existing in pretty much every single crypto library.

So I'm not a huge fan of sha256, partly because of my disappointment
in lack of hw acceleration in releant markets (sure, it's fairly
common in ARM, but nobody sane uses ARM for development because of
_other_ reasons). And partly because I don't like how the internal
data size is the same as the final hash. But that second issue is an
annoyance with it, not a real issue - in the absence of weaknesses
it's a non-issue, and any future weaknesses might affect any other
choice too.

So hey, if people are actually at the point where the lack of choice
holds up development, we should just pick one. And despite what I've
said in this discussion, sha256 would have been my first choice, just
because it's the "obvious" choice. The exact same way that SHA1 was
the obvious choice (for pretty much the same infrastructure reasons)
back in 2005.

And maybe the hw acceleration landscape will actually improve. I think
AMD actually does do the SHA extensions in Zen/TR.

So I think Junio should just pick one. And I'll stand up and say

 "Let's just pick one.

  And sha256 is certainly the safe choice in that it won't strike
  anybody as being the _wrong_ choice per se, even if not everybody will
  necessarily agree it's the _bext_ choice".

but in the end I think Junio should be the final arbiter. I think all
of the discussed choices are perfectly fine in practice.

Btw, the one thing I *would* suggest is that the git community just
also says that the current hash is not SHA1, but SHA1DC.

Support for "plain" SHA1 should be removed entirely. If we add a lot
of new infrastructure to support a new more secure hash, we should not
have the old fallback for the known-weak one. Just make SHA1DC the
only one git can be built with.

   Linus


Re: Hash algorithm analysis

2018-07-24 Thread Edward Thomson
On Fri, Jul 20, 2018 at 09:52:20PM +, brian m. carlson wrote:
> 
> To summarize the discussion that's been had in addition to the above,
> Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> over SHA-256 over SHA3-256, although any of them would be fine.
> 
> Are there other contributors who have a strong opinion?  Are there
> things I can do to help us coalesce around an option?

Overall, I prefer SHA-256.

I mentioned this at the contributor summit - so this may have been
captured in the notes.  But if not, when I look at this from the
perspective of my day job at Notorious Big Software Company, we would
prefer SHA-256 due to its performance characteristics and the
availability of hardware acceleration.  We think about git object ids
in a few different ways:

Obviously we use git as a version control system - we have a significant
investment in hosting repositories (for both internal Microsoft teams
and our external customers).  What may be less obvious is that often,
git blob ids are used as fingerprints: on a typical Windows machine,
you don't have the command-line hash functions (md5sum and friends),
but every developer has git installed.  So we end up calculating git
object ids in places within the development pipeline that are beyond the
scope of just version control.  Not to dwell too much on implementation
details, but this is especially advantageous for us in (say) labs where
we can ensure that particular hardware is available to speed this up as
necessary.

Switching gears, if I look at this from the perspective of the libgit2
project, I would also prefer SHA-256 or SHA3 over blake2b.  To support
blake2b, we'd have to include - and support - that code ourselves.  But
to support SHA-256, we would simply use the system's crypto libraries
that we already take a dependecy on (OpenSSL, mbedTLS, CryptoNG, or
SecureTransport).  All of those support SHA-256 and none of them include
support for blake2b.  That means if there's a problem with (say)
OpenSSL's SHA-256 implementation, then it will be fixed by their vendor.
If there's a problem with libb2, then that's now my responsibility.

This is not to suggest that one library is of higher or lower quality
than another.  And surely we would try to use the same blake2b library
that git itself is using to minimize some of this risk (so that at least
we're all in the same boat and can leverage each other's communications
to users) but even then, there will be inevitable drift between our
vendored dependencies and the upstream code.  You can see this in action
in xdiff: git's xdiff has deviated from upstream, and libgit2 has taken
git's and ours has deviated from that.

Cheers-
-ed


Re: Hash algorithm analysis

2018-07-23 Thread Jonathan Nieder
Hi Yves,

demerphq wrote:
> On Sun, 22 Jul 2018 at 01:59, brian m. carlson
>  wrote:

>> I will admit that I don't love making this decision by myself, because
>> right now, whatever I pick, somebody is going to be unhappy.
[...]
> I do not envy you this decision.
>
> Personally I would aim towards pushing this decision out to the git
> user base and facilitating things so we can choose whatever hash
> function (and config) we wish, including ones not invented yet.

There are two separate pieces to this.

One is configurability at compile time.  So far that has definitely
been a goal, because we want to be ready to start the transition to
another hash, and quickly, as soon as the new hash is discovered to be
weak.  This also means that people can experiment with new hashes and
in a controlled environment (where the users can afford to build from
source), some users might prefer some bespoke hash for reasons only
known to them. ;-)

Another piece is configurability at run time.  This is a harder sell
because it has some negative effects in the ecosystem:

 - performance impact from users having to maintain a translation table
   between the different hash functions in use

 - security impact, in the form of downgrade attacks

 - dependency bloat, from Git having to be able to compute all hash
   functions permitted in that run-time configuration

The security impact can be mitigated by keeping the list of supported
hashes small (i.e. two or three instead of 10ish).  Each additional
hash function is a potential liability (just as in SSL), so they have
to earn their keep.

The performance impact is unavoidable if we encourage Git servers to
pick their favorite hash function instead of making a decision in the
project.  This can in turn affect security, since it would increase
the switching cost away from SHA-1, with the likely effect being that
most users stay on SHA-1.  I don't want to go there.

So I would say, support for arbitrary hash functions at compile time
and in file formats is important and I encourage you to hold us to
that (when reviewing patches, etc).  But in the standard Git build
configuration that most people run, I believe it is best to support
only SHA-1 + our chosen replacement hash.

Thanks,
Jonathan


Re: Hash algorithm analysis

2018-07-23 Thread Linus Torvalds
On Mon, Jul 23, 2018 at 5:48 AM Sitaram Chamarty  wrote:
>
> I would suggest (a) hash size of 256 bits and (b) choice of any hash
> function that can produce such a hash.  If people feel strongly that 256
> bits may also turn out to be too small (really?) then a choice of 256 or
> 512, but not arbitrary sizes.

Honestly, what's the expected point of 512-bit hashes?

The _only_ point of a 512-bit hash is that it's going to grow objects
in incompressible ways, and use more memory. Just don't do it.

If somebody can break a 256-bit hash, you have two choices:

 (a) the hash function itself was broken, and 512 bits isn't the
solution to it anyway, even if it can certainly hide the problem

 (b) you had some "new math" kind of unexpected breakthrough, which
means that 512 bits might not be much  better either.

Honestly, the number of particles in the observable universe is on the
order of 2**256. It's a really really big number.

Don't make the code base more complex than it needs to be. Make a
informed technical decision, and say "256 bits is a *lot*".

The difference between engineering and theory is that engineering
makes trade-offs. Good software is well *engineered*, not theorized.

Also, I would suggest that git default to "abbrev-commit=40", so that
nobody actually *sees* the new bits by default. So the perl scripts
etc that use "[0-9a-f]{40}" as a hash pattern would just silently
continue to work.

Because backwards compatibility is important (*)

 Linus

(*) And 2**160 is still a big big number, and hasn't really been a
practical problem, and SHA1DC is likely a good hash for the next
decade or longer.


Re: Hash algorithm analysis

2018-07-23 Thread Stefan Beller
On Mon, Jul 23, 2018 at 5:41 AM demerphq  wrote:
>
> On Sun, 22 Jul 2018 at 01:59, brian m. carlson
>  wrote:
> > I will admit that I don't love making this decision by myself, because
> > right now, whatever I pick, somebody is going to be unhappy.  I want to
> > state, unambiguously, that I'm trying to make a decision that is in the
> > interests of the Git Project, the community, and our users.
> >
> > I'm happy to wait a few more days to see if a consensus develops; if so,
> > I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> > a decision and write my patches accordingly.  The community is free, as
> > always, to reject my patches if taking them is not in the interest of
> > the project.
>
> Hi Brian.
>
> I do not envy you this decision.
>
> Personally I would aim towards pushing this decision out to the git
> user base and facilitating things so we can choose whatever hash
> function (and config) we wish, including ones not invented yet.

By Git user base you actually mean millions of people?
(And they'll have different opinions and needs)

One of the goals of the hash transition is to pick a hash function
such that git repositories are compatible.
If users were to pick their own hashes, we would need to not just give
a SHA-1 ->  transition plan, but we'd have to make sure the
full matrix of possible hashes is interchangeable as we have no idea
of what the user would think of "safer". For example one server operator
might decide to settle on SHA2 and another would settle on blake2,
whereas a user that uses both servers as remotes settles with k12.

Then there would be a whole lot of conversion going on (you cannot talk
natively to a remote with a different hash; checking pgp signatures is
also harder as you have an abstraction layer in between).

I would rather just have the discussion now and then provide only one
conversion tool which might be easy to adapt, but after the majority
converted, it is rather left to bitrot instead of needing to support ongoing
conversions.

On the other hand, even if we'd provide a "different hashes are fine"
solution, I would think the network effect would make sure that eventually
most people end up with one hash.

One example of using different hashes successfully are transports,
like TLS, SSH. The difference there is that it is a point-to-point communication
whereas a git repository needs to be read by many parties involved; also
a communication over TLS/SSH is ephemeral unlike objects in Git.

> Failing that I would aim towards a hashing strategy which has the most
> flexibility. Keccak for instance has the interesting property that its
> security level is tunable, and that it can produce aribitrarily long
> hashes.  Leaving aside other concerns raised elsewhere in this thread,
> these two features alone seem to make it a superior choice for an
> initial implementation. You can find bugs by selecting unusual hash
> sizes, including very long ones, and you can provide ways to tune the
> function to peoples security and speed preferences.  Someone really
> paranoid can specify an unusually large round count and a very long
> hash.

I do not object to this in theory, but I would rather not want to burden
the need to write code for this on the community.

> I am not a cryptographer.

Same here. My personal preference would be blake2b
as that is the fastest IIRC.

Re-reading brians initial mail, I think we should settle on
SHA-256, as that is a conservative choice for security and
the winner in HW accelerated setups, and not to shabby in
a software implementation; it is also widely available.

Stefan


Re: Hash algorithm analysis

2018-07-23 Thread demerphq
On Mon, 23 Jul 2018 at 14:48, Sitaram Chamarty  wrote:
> On 07/23/2018 06:10 PM, demerphq wrote:
> > On Sun, 22 Jul 2018 at 01:59, brian m. carlson
> >  wrote:
> >> I will admit that I don't love making this decision by myself, because
> >> right now, whatever I pick, somebody is going to be unhappy.  I want to
> >> state, unambiguously, that I'm trying to make a decision that is in the
> >> interests of the Git Project, the community, and our users.
> >>
> >> I'm happy to wait a few more days to see if a consensus develops; if so,
> >> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> >> a decision and write my patches accordingly.  The community is free, as
> >> always, to reject my patches if taking them is not in the interest of
> >> the project.
> >
> > Hi Brian.
> >
> > I do not envy you this decision.
> >
> > Personally I would aim towards pushing this decision out to the git
> > user base and facilitating things so we can choose whatever hash
> > function (and config) we wish, including ones not invented yet.
> >
> > Failing that I would aim towards a hashing strategy which has the most
> > flexibility. Keccak for instance has the interesting property that its
> > security level is tunable, and that it can produce aribitrarily long
> > hashes.  Leaving aside other concerns raised elsewhere in this thread,
> > these two features alone seem to make it a superior choice for an
> > initial implementation. You can find bugs by selecting unusual hash
> > sizes, including very long ones, and you can provide ways to tune the
> > function to peoples security and speed preferences.  Someone really
> > paranoid can specify an unusually large round count and a very long
> > hash.
> >
> > Also frankly I keep thinking that the ability to arbitrarily extend
> > the hash size has to be useful /somewhere/ in git.
>
> I would not suggest arbitrarily long hashes.  Not only would it
> complicate a lot of code, it is not clear that it has any real benefit.

It has the benefit of armoring the code for the *next* hash change,
and making it clear that such decisions are arbitrary and should not
be depended on.

> Plus, the code contortions required to support arbitrarily long hashes
> would be more susceptible to potential bugs and exploits, simply by
> being more complex code.  Why take chances?

I think the benefits would outweight the risks.

> I would suggest (a) hash size of 256 bits and (b) choice of any hash
> function that can produce such a hash.  If people feel strongly that 256
> bits may also turn out to be too small (really?) then a choice of 256 or
> 512, but not arbitrary sizes.

I am aware of too many systems that cannot change their size and are
locked into woefully bad decisions that were made long ago to buy
this.

Making it a per-repo option, would eliminate assumptions and make for
a more secure and flexible tool.

Anyway, I am not going to do the work so my opinion is worth the price
of the paper I sent it on. :-)

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Hash algorithm analysis

2018-07-23 Thread Sitaram Chamarty



On 07/23/2018 06:10 PM, demerphq wrote:
> On Sun, 22 Jul 2018 at 01:59, brian m. carlson
>  wrote:
>> I will admit that I don't love making this decision by myself, because
>> right now, whatever I pick, somebody is going to be unhappy.  I want to
>> state, unambiguously, that I'm trying to make a decision that is in the
>> interests of the Git Project, the community, and our users.
>>
>> I'm happy to wait a few more days to see if a consensus develops; if so,
>> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
>> a decision and write my patches accordingly.  The community is free, as
>> always, to reject my patches if taking them is not in the interest of
>> the project.
> 
> Hi Brian.
> 
> I do not envy you this decision.
> 
> Personally I would aim towards pushing this decision out to the git
> user base and facilitating things so we can choose whatever hash
> function (and config) we wish, including ones not invented yet.
> 
> Failing that I would aim towards a hashing strategy which has the most
> flexibility. Keccak for instance has the interesting property that its
> security level is tunable, and that it can produce aribitrarily long
> hashes.  Leaving aside other concerns raised elsewhere in this thread,
> these two features alone seem to make it a superior choice for an
> initial implementation. You can find bugs by selecting unusual hash
> sizes, including very long ones, and you can provide ways to tune the
> function to peoples security and speed preferences.  Someone really
> paranoid can specify an unusually large round count and a very long
> hash.
> 
> Also frankly I keep thinking that the ability to arbitrarily extend
> the hash size has to be useful /somewhere/ in git.

I would not suggest arbitrarily long hashes.  Not only would it
complicate a lot of code, it is not clear that it has any real benefit.

Plus, the code contortions required to support arbitrarily long hashes
would be more susceptible to potential bugs and exploits, simply by
being more complex code.  Why take chances?

I would suggest (a) hash size of 256 bits and (b) choice of any hash
function that can produce such a hash.  If people feel strongly that 256
bits may also turn out to be too small (really?) then a choice of 256 or
512, but not arbitrary sizes.

Sitaram
also not a cryptographer!


Re: Hash algorithm analysis

2018-07-23 Thread demerphq
On Sun, 22 Jul 2018 at 01:59, brian m. carlson
 wrote:
> I will admit that I don't love making this decision by myself, because
> right now, whatever I pick, somebody is going to be unhappy.  I want to
> state, unambiguously, that I'm trying to make a decision that is in the
> interests of the Git Project, the community, and our users.
>
> I'm happy to wait a few more days to see if a consensus develops; if so,
> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> a decision and write my patches accordingly.  The community is free, as
> always, to reject my patches if taking them is not in the interest of
> the project.

Hi Brian.

I do not envy you this decision.

Personally I would aim towards pushing this decision out to the git
user base and facilitating things so we can choose whatever hash
function (and config) we wish, including ones not invented yet.

Failing that I would aim towards a hashing strategy which has the most
flexibility. Keccak for instance has the interesting property that its
security level is tunable, and that it can produce aribitrarily long
hashes.  Leaving aside other concerns raised elsewhere in this thread,
these two features alone seem to make it a superior choice for an
initial implementation. You can find bugs by selecting unusual hash
sizes, including very long ones, and you can provide ways to tune the
function to peoples security and speed preferences.  Someone really
paranoid can specify an unusually large round count and a very long
hash.

Also frankly I keep thinking that the ability to arbitrarily extend
the hash size has to be useful /somewhere/ in git.

cheers,
Yves
I am not a cryptographer.
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Hash algorithm analysis

2018-07-22 Thread Adam Langley
Somewhere upthread, Brian refers to me as a cryptographer. That's
flattering (thank you), but probably not really true even on a good
day. And certainly not true next to Joan Daemen. I do have experience
with crypto at scale and in ecosystems, though.

Joan's count of cryptanalysis papers is a reasonable way to try and
bring some quantitative clarity to an otherwise subjective topic. But
still, despite lacking any counterpoint to it, I find myself believing
that practical concerns are a stronger differentiater here.

But the world is in a position where a new, common hash function might
crystalise, and git could be the start of that. What that means for
the ecosystem is is that numerous libraries need to grow
implementations optimised for 3+ platforms and those platforms (esp
Intel) often need multiple versions (e.g. for different vector widths)
with code-size concerns pushing back at the same time. Intrinsics
still don't cut it, so that means hand-assembly and thus dealing with
gas vs Windows, CFI metadata, etc. Licensing differences mean that
code-sharing doesn't work nearly as well as one might hope.

Then complexity spreads upwards as testing matrices expand with the
combination of each signature algorithm with the new hash function,
options in numerous protocols etc.

In short, picking just one would be lovely.

For that reason, I've held back from SHA3 (which I consider distinct
from K12) because I didn't feel that it relieved enough pressure:
people who wanted more performance weren't going to be satisfied.
Other than that, I don't have strong feelings and, to be clear, K12
seems like a fine option.

But it does seem that a) there is probably not any more information to
discover that is going to alter your decision and b) waiting a short
to medium amount of time is probably not going to bring any definitive
developments either.


Cheers

AGL


Re: Hash algorithm analysis

2018-07-22 Thread Joan Daemen
Dear all,

I wanted to react to some statements I read in this discussion. But
first let me introduce myself. I'm Joan Daemen and I'm working in
symmetric cryptography since 1988. Vincent Rijmen and I designed
Rijndael that was selected to become AES and Guido Bertoni, Michael
Peeters and Gilles Van Assche and I (the Keccak team, later extended
with Ronny Van Keer) designed Keccak that was selected to become SHA3.
Of course as a member of the Keccak team I'm biased in this discussion
but I'll try to keep it factual.

Adam Langley says:

  I think this group can safely assume that SHA-256, SHA-512, BLAKE2, K12, etc 
are all secure to the extent that I don't believe that making
  comparisons between them on that axis is meaningful.

If never any cryptographic algorithms would be broken, this would be
true. Actually, one can manage the risk by going for cryptographic
algorithms with higher security assurance. In symmetric crypto one
compares security assurance of cryptographic algorithms by the amount of
third-party cryptanalysis, and a good indication of that is the number
of peer-reviewed papers published.

People tend to believe that the SHA2 functions have received more
third-party cryptanalysis than Keccak, but this is not supported by the
facts. We recently did a count of number of cryptanalysis papers for
different hash functions and found the following:

- Keccak: 35 third-party cryptanalysis papers dealing with the
permutation underlying Keccak, most of them at venues with peer review
(see https://keccak.team/third_party.html) This cryptanalysis carries
over to K12 as it is a tree hashing mode built on top of a reduced-round
Keccak variant.

- SHA-256 and SHA-512 together: we found 21 third-party cryptanalysis
papers dealing with the compression functions of SHA-256 or SHA-512.

- BLAKE2: the BLAKE2 webpage blake2.net lists 4 third-party
cryptanalysis papers. There are also a handful of cryptanalysis papers
on its predecessor BLAKE, but these results do not necessarily carry
over as the two compression functions in the different BLAKE2 variants
are different from the two compression functions in the different BLAKE
variants.

I was not surprised by the relatively low number of SHA-2 cryptanalysis
papers we found as during the SHA-3 competition all cryptanalysts were
focusing on SHA-3 candidates and after the competition attention shifted
to authenticated encryption.

Anyway, these numbers support the opinion that the safety margins taken
in K12 are better understood than those in SHA-256, SHA-512 and BLAKE2.

Adam Langley continues:

Thus I think the question is primarily concerned with performance and 
implementation availability


Table 2 in our ACNS paper on K12 (available at
https://eprint.iacr.org/2016/770) shows that performance of K12 is quite
competitive. Moreover, there is a lot of code available under CC0
license in the Keccak Code Package on github
https://github.com/gvanas/KeccakCodePackage. If there is shortage of
code for some platforms in the short term, we will be happy to work on that.

In the long term, it is likely that the relative advantage of K12 will
increase as it has more potential for hardware acceleration, e.g., by
instruction set extension. This is thanks to the fact that it does not
use addition, as opposed to so-called addition-xor-rotation (ARX)
designs such as the SHA-2 and BLAKE2 families. This is already
illustrated in our Table 2 I referred to above, in the transition from
Skylake to SkylakeX.

Maybe also interesting for this discussion are the two notes we (Keccak
team) wrote on our choice to not go for ARX and the one on "open source
crypto" at https://keccak.team/2017/not_arx.html and
https://keccak.team/2017/open_source_crypto.html respectively.

Kind regards,

Joan Daemen



On 22/07/2018 01:59, brian m. carlson wrote:
> On Sun, Jul 22, 2018 at 12:38:41AM +0200, Johannes Schindelin wrote:
>> Do you really want to value contributors' opinion more than
>> cryptographers'? I mean, that's exactly what got us into this hard-coded
>> SHA-1 mess in the first place.
> I agree (believe me, of all people, I agree) that hard-coding SHA-1 was
> a bad choice in retrospect.  But I've solicited contributors' opinions
> because the Git Project needs to make a decision *for this project*
> about the algorithm we're going to use going forward.
>
>> And to set the record straight: I do not have a strong preference of the
>> hash algorithm. But cryprographers I have the incredible luck to have
>> access to, by virtue of being a colleague, did mention their preference.
> I don't know your colleagues, and they haven't commented here.  One
> person that has commented here is Adam Langley.  It is my impression
> (and anyone is free to correct me if I'm incorrect) that he is indeed a
> cryptographer.  To quote him[0]:
>
>   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
>   K12, etc are all secure to the extent that I don't believe that making
>   

Re: Hash algorithm analysis

2018-07-22 Thread Eric Deplagne
On Sun, 22 Jul 2018 14:21:48 +, brian m. carlson wrote:
> On Sun, Jul 22, 2018 at 11:34:42AM +0200, Eric Deplagne wrote:
> > On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> > > I don't know your colleagues, and they haven't commented here.  One
> > > person that has commented here is Adam Langley.  It is my impression
> > > (and anyone is free to correct me if I'm incorrect) that he is indeed a
> > > cryptographer.  To quote him[0]:
> > > 
> > >   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
> > >   K12, etc are all secure to the extent that I don't believe that making
> > >   comparisons between them on that axis is meaningful. Thus I think the
> > >   question is primarily concerned with performance and implementation
> > >   availability.
> > > 
> > >   […]
> > > 
> > >   So, overall, none of these choices should obviously be excluded. The
> > >   considerations at this point are not cryptographic and the tradeoff
> > >   between implementation ease and performance is one that the git
> > >   community would have to make.
> > 
> >   Am I completely out of the game, or the statement that
> > "the considerations at this point are not cryptographic"
> >   is just the wrongest ?
> > 
> >   I mean, if that was true, would we not be sticking to SHA1 ?
> 
> I snipped a portion of the context, but AGL was referring to the
> considerations involved in choosing from the proposed ones for NewHash.
> In context, he meant that the candidates for NewHash “are all secure”
> and are therefore a better choice than SHA-1.

  Maybe a little bit sensitive, but I really did read
"we don't care if it's weak or strong, that's not the matter".

> I think we can all agree that SHA-1 is weak and should be replaced.
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



-- 
  Eric Deplagne


signature.asc
Description: Digital signature


Re: Hash algorithm analysis

2018-07-22 Thread brian m. carlson
On Sun, Jul 22, 2018 at 11:34:42AM +0200, Eric Deplagne wrote:
> On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> > I don't know your colleagues, and they haven't commented here.  One
> > person that has commented here is Adam Langley.  It is my impression
> > (and anyone is free to correct me if I'm incorrect) that he is indeed a
> > cryptographer.  To quote him[0]:
> > 
> >   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
> >   K12, etc are all secure to the extent that I don't believe that making
> >   comparisons between them on that axis is meaningful. Thus I think the
> >   question is primarily concerned with performance and implementation
> >   availability.
> > 
> >   […]
> > 
> >   So, overall, none of these choices should obviously be excluded. The
> >   considerations at this point are not cryptographic and the tradeoff
> >   between implementation ease and performance is one that the git
> >   community would have to make.
> 
>   Am I completely out of the game, or the statement that
> "the considerations at this point are not cryptographic"
>   is just the wrongest ?
> 
>   I mean, if that was true, would we not be sticking to SHA1 ?

I snipped a portion of the context, but AGL was referring to the
considerations involved in choosing from the proposed ones for NewHash.
In context, he meant that the candidates for NewHash “are all secure”
and are therefore a better choice than SHA-1.

I think we can all agree that SHA-1 is weak and should be replaced.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-07-22 Thread Eric Deplagne
On Sat, 21 Jul 2018 23:59:41 +, brian m. carlson wrote:
> On Sun, Jul 22, 2018 at 12:38:41AM +0200, Johannes Schindelin wrote:
> > Do you really want to value contributors' opinion more than
> > cryptographers'? I mean, that's exactly what got us into this hard-coded
> > SHA-1 mess in the first place.
> 
> I agree (believe me, of all people, I agree) that hard-coding SHA-1 was
> a bad choice in retrospect.  But I've solicited contributors' opinions
> because the Git Project needs to make a decision *for this project*
> about the algorithm we're going to use going forward.
> 
> > And to set the record straight: I do not have a strong preference of the
> > hash algorithm. But cryprographers I have the incredible luck to have
> > access to, by virtue of being a colleague, did mention their preference.
> 
> I don't know your colleagues, and they haven't commented here.  One
> person that has commented here is Adam Langley.  It is my impression
> (and anyone is free to correct me if I'm incorrect) that he is indeed a
> cryptographer.  To quote him[0]:
> 
>   I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
>   K12, etc are all secure to the extent that I don't believe that making
>   comparisons between them on that axis is meaningful. Thus I think the
>   question is primarily concerned with performance and implementation
>   availability.
> 
>   […]
> 
>   So, overall, none of these choices should obviously be excluded. The
>   considerations at this point are not cryptographic and the tradeoff
>   between implementation ease and performance is one that the git
>   community would have to make.

  Am I completely out of the game, or the statement that
"the considerations at this point are not cryptographic"
  is just the wrongest ?

  I mean, if that was true, would we not be sticking to SHA1 ?

> I'm aware that cryptographers tend to prefer algorithms that have been
> studied longer over ones that have been studied less.  They also prefer
> algorithms built in the open to ones developed behind closed doors.
> 
> SHA-256 has the benefit that it has been studied for a long time, but it
> was also designed in secret by the NSA.  SHA3-256 was created with
> significant study in the open, but is not as mature.  BLAKE2b has been
> incorporated into standards like Argon2, but has been weakened slightly
> for performance.
> 
> I'm not sure that there's a really obvious choice here.
> 
> I'm at the point where to continue the work that I'm doing, I need to
> make a decision.  I'm happy to follow the consensus if there is one, but
> it does not appear that there is.
> 
> I will admit that I don't love making this decision by myself, because
> right now, whatever I pick, somebody is going to be unhappy.  I want to
> state, unambiguously, that I'm trying to make a decision that is in the
> interests of the Git Project, the community, and our users.
> 
> I'm happy to wait a few more days to see if a consensus develops; if so,
> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> a decision and write my patches accordingly.  The community is free, as
> always, to reject my patches if taking them is not in the interest of
> the project.
> 
> [0] 
> https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
> -- 
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



-- 
  Eric Deplagne


signature.asc
Description: Digital signature


Re: Hash algorithm analysis

2018-07-21 Thread brian m. carlson
On Sun, Jul 22, 2018 at 12:38:41AM +0200, Johannes Schindelin wrote:
> Do you really want to value contributors' opinion more than
> cryptographers'? I mean, that's exactly what got us into this hard-coded
> SHA-1 mess in the first place.

I agree (believe me, of all people, I agree) that hard-coding SHA-1 was
a bad choice in retrospect.  But I've solicited contributors' opinions
because the Git Project needs to make a decision *for this project*
about the algorithm we're going to use going forward.

> And to set the record straight: I do not have a strong preference of the
> hash algorithm. But cryprographers I have the incredible luck to have
> access to, by virtue of being a colleague, did mention their preference.

I don't know your colleagues, and they haven't commented here.  One
person that has commented here is Adam Langley.  It is my impression
(and anyone is free to correct me if I'm incorrect) that he is indeed a
cryptographer.  To quote him[0]:

  I think this group can safely assume that SHA-256, SHA-512, BLAKE2,
  K12, etc are all secure to the extent that I don't believe that making
  comparisons between them on that axis is meaningful. Thus I think the
  question is primarily concerned with performance and implementation
  availability.

  […]

  So, overall, none of these choices should obviously be excluded. The
  considerations at this point are not cryptographic and the tradeoff
  between implementation ease and performance is one that the git
  community would have to make.

I'm aware that cryptographers tend to prefer algorithms that have been
studied longer over ones that have been studied less.  They also prefer
algorithms built in the open to ones developed behind closed doors.

SHA-256 has the benefit that it has been studied for a long time, but it
was also designed in secret by the NSA.  SHA3-256 was created with
significant study in the open, but is not as mature.  BLAKE2b has been
incorporated into standards like Argon2, but has been weakened slightly
for performance.

I'm not sure that there's a really obvious choice here.

I'm at the point where to continue the work that I'm doing, I need to
make a decision.  I'm happy to follow the consensus if there is one, but
it does not appear that there is.

I will admit that I don't love making this decision by myself, because
right now, whatever I pick, somebody is going to be unhappy.  I want to
state, unambiguously, that I'm trying to make a decision that is in the
interests of the Git Project, the community, and our users.

I'm happy to wait a few more days to see if a consensus develops; if so,
I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
a decision and write my patches accordingly.  The community is free, as
always, to reject my patches if taking them is not in the interest of
the project.

[0] 
https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-07-21 Thread Linus Torvalds
On Sat, Jul 21, 2018 at 3:39 PM Johannes Schindelin
 wrote:
>
> Do you really want to value contributors' opinion more than
> cryptographers'? I mean, that's exactly what got us into this hard-coded
> SHA-1 mess in the first place.

Don't be silly.

Other real cryptographers consider SHA256 to be a problem.

Really. It's amenable to the same hack on the internal hash that made
for the SHAttered break.

So your argument that "cryptographers prefer SHA256" is simply not true.

Your real argument is that know at least one cryptographer that you
work with that prefers it.

Don't try to make that into some generic "cryptographers prefer it".
It's not like cryptographers have issues with blake2b either, afaik.

And blake2b really _does_ have very real advantages.

If you can actually point to some "a large percentage of
cryptographers prefer it", you'd have a point.

But as it is, you don't have data, you have an anecdote, and you try
to use that anecdote to put down other peoples opinions.

Intellectually dishonest, that is.

   Linus


Re: Hash algorithm analysis

2018-07-21 Thread Johannes Schindelin
Hi Brian,

On Fri, 20 Jul 2018, brian m. carlson wrote:

> On Mon, Jun 11, 2018 at 12:29:42PM -0700, Jonathan Nieder wrote:
> > My understanding of the discussion so far:
> > 
> > Keccak team encourages us[1] to consider a variant like K12 instead of
> > SHA3.
> > 
> > AGL explains[2] that the algorithms considered all seem like
> > reasonable choices and we should decide using factors like
> > implementation ease and performance.
> > 
> > If we choose a Keccak-based function, AGL also[3] encourages using a
> > variant like K12 instead of SHA3.
> > 
> > Dscho strongly prefers[4] SHA-256, because of
> > - wide implementation availability, including in future hardware
> > - has been widely analyzed
> > - is fast
> > 
> > Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
> > it is constructed.
> 
> I know this discussion has sort of petered out, but I'd like to see if
> we can revive it.  I'm writing index v3 and having a decision would help
> me write tests for it.
> 
> To summarize the discussion that's been had in addition to the above,
> Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> over SHA-256 over SHA3-256, although any of them would be fine.
> 
> Are there other contributors who have a strong opinion?  Are there
> things I can do to help us coalesce around an option?

Do you really want to value contributors' opinion more than
cryptographers'? I mean, that's exactly what got us into this hard-coded
SHA-1 mess in the first place.

And to set the record straight: I do not have a strong preference of the
hash algorithm. But cryprographers I have the incredible luck to have
access to, by virtue of being a colleague, did mention their preference.

I see no good reason to just blow their advice into the wind.

Ciao,
Dscho

Re: Hash algorithm analysis

2018-07-21 Thread brian m. carlson
On Sat, Jul 21, 2018 at 09:52:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Jul 20 2018, brian m. carlson wrote:
> > I know this discussion has sort of petered out, but I'd like to see if
> > we can revive it.  I'm writing index v3 and having a decision would help
> > me write tests for it.
> >
> > To summarize the discussion that's been had in addition to the above,
> > Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> > over SHA-256 over SHA3-256, although any of them would be fine.
> >
> > Are there other contributors who have a strong opinion?  Are there
> > things I can do to help us coalesce around an option?
> 
> I have a vague recollection of suggesting something similar in the past,
> but can't find that E-Mail (and maybe it never happened), but for
> testing purposes isn't in simplest if we just have some "test SHA-1"
> algorithm where we pretent that all inputs like "STRING" are really
> "PREFIX-STRING" for the purposes of hashing, or fake shortening /
> lengthening the hash to test arbitrary lenghts of N (just by repeating
> the hash from the beginning is probably good enough...).
> 
> That would make such patches easier to review, since we wouldn't need to
> carry hundreds/thousands of lines of dense hashing code, but a more
> trivial wrapper around SHA-1, and we could have some test mode where we
> could compile & run tests with an arbitrary hash length to make sure
> everything's future proof even after we move to NewHash.

I think Stefan suggested this approach.  It is viable for testing some
aspects of the code, but not others.  It doesn't work for synthesizing
partial collisions or the bisect tests (since bisect falls back to
object ID as a disambiguator).

I had tried this approach (using a single zero-byte as a prefix), but
for whatever reason, it ended up producing inconsistent results when I
hashed.  I'm unclear what went wrong in that approach, but I finally
discarded it after spending an hour or two staring at it.  I'm not
opposed to someone else providing it as an option, though.

Also, after feedback from Eric Sunshine, I decided to adopt an approach
for my hash-independent tests series that used the name of the hash
within the tests so that we could support additional algorithms (such as
a pseudo-SHA-1).  That work necessarily involves having a name for the
hash, which is why I haven't revisited it.

As for arbitrary hash sizes, there is some code which necessarily needs
to depend on a fixed hash size.  A lot of our Perl code matches
[0-9a-f]{40}, which needs to change.  There's no reason we couldn't
adopt such testing in the future, but it might end up being more
complicated than we want.  I have strived to reduce the dependence on
fixed-size constants wherever possible, though.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-07-21 Thread Ævar Arnfjörð Bjarmason


On Fri, Jul 20 2018, brian m. carlson wrote:

> On Mon, Jun 11, 2018 at 12:29:42PM -0700, Jonathan Nieder wrote:
>> My understanding of the discussion so far:
>>
>> Keccak team encourages us[1] to consider a variant like K12 instead of
>> SHA3.
>>
>> AGL explains[2] that the algorithms considered all seem like
>> reasonable choices and we should decide using factors like
>> implementation ease and performance.
>>
>> If we choose a Keccak-based function, AGL also[3] encourages using a
>> variant like K12 instead of SHA3.
>>
>> Dscho strongly prefers[4] SHA-256, because of
>> - wide implementation availability, including in future hardware
>> - has been widely analyzed
>> - is fast
>>
>> Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
>> it is constructed.
>
> I know this discussion has sort of petered out, but I'd like to see if
> we can revive it.  I'm writing index v3 and having a decision would help
> me write tests for it.
>
> To summarize the discussion that's been had in addition to the above,
> Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> over SHA-256 over SHA3-256, although any of them would be fine.
>
> Are there other contributors who have a strong opinion?  Are there
> things I can do to help us coalesce around an option?

I have a vague recollection of suggesting something similar in the past,
but can't find that E-Mail (and maybe it never happened), but for
testing purposes isn't in simplest if we just have some "test SHA-1"
algorithm where we pretent that all inputs like "STRING" are really
"PREFIX-STRING" for the purposes of hashing, or fake shortening /
lengthening the hash to test arbitrary lenghts of N (just by repeating
the hash from the beginning is probably good enough...).

That would make such patches easier to review, since we wouldn't need to
carry hundreds/thousands of lines of dense hashing code, but a more
trivial wrapper around SHA-1, and we could have some test mode where we
could compile & run tests with an arbitrary hash length to make sure
everything's future proof even after we move to NewHash.


Re: Hash algorithm analysis

2018-07-20 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> I know this discussion has sort of petered out, but I'd like to see if
> we can revive it.  I'm writing index v3 and having a decision would help
> me write tests for it.

Nice!  That's awesome.

> To summarize the discussion that's been had in addition to the above,
> Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
> over SHA-256 over SHA3-256, although any of them would be fine.
>
> Are there other contributors who have a strong opinion?  Are there
> things I can do to help us coalesce around an option?

My advice would be to go with BLAKE2b.  If anyone objects, we can deal
with their objection at that point (and I volunteer to help with
cleaning up any mess in rewriting test cases that this advice causes).

Full disclosure: my preference order (speaking for myself and no one
else) is

 K12 > BLAKE2bp-256 > SHA-256x16 > BLAKE2b > SHA-256 > SHA-512/256 > SHA3-256

so I'm not just asking you to go with my favorite. ;-)

Jonathan


Re: Hash algorithm analysis

2018-07-20 Thread brian m. carlson
On Mon, Jun 11, 2018 at 12:29:42PM -0700, Jonathan Nieder wrote:
> My understanding of the discussion so far:
> 
> Keccak team encourages us[1] to consider a variant like K12 instead of
> SHA3.
> 
> AGL explains[2] that the algorithms considered all seem like
> reasonable choices and we should decide using factors like
> implementation ease and performance.
> 
> If we choose a Keccak-based function, AGL also[3] encourages using a
> variant like K12 instead of SHA3.
> 
> Dscho strongly prefers[4] SHA-256, because of
> - wide implementation availability, including in future hardware
> - has been widely analyzed
> - is fast
> 
> Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
> it is constructed.

I know this discussion has sort of petered out, but I'd like to see if
we can revive it.  I'm writing index v3 and having a decision would help
me write tests for it.

To summarize the discussion that's been had in addition to the above,
Ævar has also stated a preference for SHA-256 and I would prefer BLAKE2b
over SHA-256 over SHA3-256, although any of them would be fine.

Are there other contributors who have a strong opinion?  Are there
things I can do to help us coalesce around an option?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-06-21 Thread brian m. carlson
On Mon, Jun 11, 2018 at 11:19:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This is a great summary. Thanks.
> 
> In case it's not apparent from what follows, I have a bias towards
> SHA-256. Reasons for that, to summarize some of the discussion the last
> time around[1], and to add more details:

To summarize my view, I think my ordered preference of hashes is
BLAKE2b, SHA-256, and SHA3-256.

I agree with AGL that all three of these options are secure and will be
for some time.  I believe there's sufficient literature on all three of
them and there will continue to be for some time.  I've seen and read
papers from the IACR archves on all three of them, and because all three
are widely used, they'll continue to be interesting to cryptologists for
a long time to come.

I'm personally partial to having full preimage resistance, which I think
makes SHAKE128 less appealing.  SHAKE128 also has fewer crypto library
implementations than the others.

My rationale for this ordering is essentially performance.  BLAKE2b is
quite fast on all known hardware, and it is almost as fast as an
accelerated SHA-256.  The entire rationale for BLAKE2b is to give people
a secure algorithm that is faster than MD5 and SHA-1, so there's no
reason to use an insecure algorithm.  It also greatly outperforms the
other two even in pure C, which matters for the fallback implementation
we'll need to ship.

I tend to think SHA3-256 is the most conservative of these choices as
far as security.  It has had an open development process and has a large
security margin.  It has gained a lot of cryptanalysis and come out
quite well, and the versatility of the Keccak sponge construction means
that it's going to get a lot more attention.  Pretty much the only
downside is its performance relative to the other two.

I placed SHA-256 in the middle because of its potential for acceleration
on Intel hardware.  I know such changes are coming, but they won't
likely be here for another two years.

While hashing performance isn't a huge concern for Git now, I had
planned to implement an rsync-based delta algorithm for large files that
could make storing some large files in a Git repository viable (of
course, there will still be many cases where Git LFS and git-annex are
better).  The algorithm is extremely sensitive to hash performance and
would simply not be viable with an unaccelerated SHA-256 or SHA3-256,
although it would perform reasonably well with BLAKE2b.

Having said that, I'd be happy with any of the three, and would support
a consensus around any of them as well.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-06-21 Thread Johannes Schindelin
Hi Ævar,

On Mon, 11 Jun 2018, Ævar Arnfjörð Bjarmason wrote:

> On Sat, Jun 09 2018, brian m. carlson wrote:
> 
> [Expanding the CC list to what we had in the last "what hash" thread[1]
> last year].
> 
> > == Discussion of Candidates
> >
> > I've implemented and tested the following algorithms, all of which are
> > 256-bit (in alphabetical order):
> >
> > * BLAKE2b (libb2)
> > * BLAKE2bp (libb2)
> > * KangarooTwelve (imported from the Keccak Code Package)
> > * SHA-256 (OpenSSL)
> > * SHA-512/256 (OpenSSL)
> > * SHA3-256 (OpenSSL)
> > * SHAKE128 (OpenSSL)
> >
> > I also rejected some other candidates.  I couldn't find any reference or
> > implementation of SHA256×16, so I didn't implement it.  I didn't
> > consider SHAKE256 because it is nearly identical to SHA3-256 in almost
> > all characteristics (including performance).
> >
> > I imported the optimized 64-bit implementation of KangarooTwelve.  The
> > AVX2 implementation was not considered for licensing reasons (it's
> > partially generated from external code, which falls foul of the GPL's
> > "preferred form for modifications" rule).
> >
> > === BLAKE2b and BLAKE2bp
> >
> > These are the non-parallelized and parallelized 64-bit variants of
> > BLAKE2.
> >
> > Benefits:
> > * Both algorithms provide 256-bit preimage resistance.
> >
> > Downsides:
> > * Some people are uncomfortable that the security margin has been
> >   decreased from the original SHA-3 submission, although it is still
> >   considered secure.
> > * BLAKE2bp, as implemented in libb2, uses OpenMP (and therefore
> >   multithreading) by default.  It was no longer possible to run the
> >   testsuite with -j3 on my laptop in this configuration.
> >
> > === Keccak-based Algorithms
> >
> > SHA3-256 is the 256-bit Keccak algorithm with 24 rounds, processing 136
> > bytes at a time.  SHAKE128 is an extendable output function with 24
> > rounds, processing 168 bytes at a time.  KangarooTwelve is an extendable
> > output function with 12 rounds, processing 136 bytes at a time.
> >
> > Benefits:
> > * SHA3-256 provides 256-bit preimage resistance.
> > * SHA3-256 has been heavily studied and is believed to have a large
> >   security margin.
> >
> > I noted the following downsides:
> > * There's a lack of a availability of KangarooTwelve in other
> >   implementations.  It may be the least available option in terms of
> >   implementations.
> > * Some people are uncomfortable that the security margin of
> >   KangarooTwelve has been decreased, although it is still considered
> >   secure.
> > * SHAKE128 and KangarooTwelve provide only 128-bit preimage resistance.
> >
> > === SHA-256 and SHA-512/256
> >
> > These are the 32-bit and 64-bit SHA-2 algorithms that are 256 bits in
> > size.
> >
> > I noted the following benefits:
> > * Both algorithms are well known and heavily analyzed.
> > * Both algorithms provide 256-bit preimage resistance.
> >
> > == Implementation Support
> >
> > |===
> > | Implementation | OpenSSL | libb2 | NSS | ACC | gcrypt | Nettle| CL  |
> > | SHA-1  |    |   |    |    |   |  | {1} |
> > | BLAKE2b| f   |  | | |   |   | {2} |
> > | BLAKE2bp   | |  | | ||   | |
> > | KangarooTwelve | |   | | ||   | |
> > | SHA-256|    |   |    |    |   |  | {1} |
> > | SHA-512/256|    |   | | ||  | {3} |
> > | SHA3-256   |    |   | | |   |  | {4} |
> > | SHAKE128   |    |   | | |   |   | {5} |
> > |===
> >
> > f: future version (expected 1.2.0)
> > ACC: Apple Common Crypto
> > CL: Command-line
> >
> > :1: OpenSSL, coreutils, Perl Digest::SHA.
> > :2: OpenSSL, coreutils.
> > :3: OpenSSL
> > :4: OpenSSL, Perl Digest::SHA3.
> > :5: Perl Digest::SHA3.
> >
> > === Performance Analysis
> >
> > The test system used below is my personal laptop, a 2016 Lenovo ThinkPad
> > X1 Carbon with an Intel i7-6600U CPU (2.60 GHz) running Debian unstable.
> >
> > I implemented a test tool helper to compute speed much like OpenSSL
> > does.  Below is a comparison of speeds.  The columns indicate the speed
> > in KiB/s for chunks of the given size.  The runs are representative of
> > multiple similar runs.
> >
> > 256 and 1024 bytes were chosen to represent common tree and commit
> > object sizes and the 8 KiB is an approximate average blob size.
> >
> > Algorithms are sorted by performance on the 1 KiB column.
> >
> > |===
> > | Implementation | 256 B  | 1 KiB  | 8 KiB  | 16 KiB |
> > | SHA-1 (OpenSSL)| 513963 | 685966 | 748993 | 754270 |
> > | BLAKE2b (libb2)| 488123 | 552839 | 576246 | 579292 |
> > | SHA-512/256 (OpenSSL)  | 181177 | 349002 | 499113 | 495169 |
> > | BLAKE2bp (libb2)   | 139891 | 344786 | 488390 | 522575 |
> > | SHA-256 (OpenSSL)  | 264276 | 333560 | 357830 | 355761 |
> > | 

Re: Hash algorithm analysis

2018-06-15 Thread Gilles Van Assche
On 14/06/18 01:58, brian m. carlson wrote:
>>> I imported the optimized 64-bit implementation of KangarooTwelve.
>>> The AVX2 implementation was not considered for licensing reasons
>>> (it's partially generated from external code, which falls foul of
>>> the GPL's "preferred form for modifications" rule). 
>>
>> Indeed part of the AVX2 code in the Keccak code package is an
>> extension of the implementation in OpenSSL (written by Andy
>> Polyakov). The assembly code is generated by a Perl script, and we
>> extended it to fit in the KCP's internal API.
>>
>> Would it solve this licensing problem if we remap our extensions to
>> the Perl script, which would then become "the source"? 
>
> The GPLv2 requires "the preferred form of the work for making
> modifications to it". If that form is the Perl script, then yes, that
> would be sufficient. If your code is dissimilar enough that editing it
> directly is better than editing the Perl script, then it might already
> meet the definition.
>
> I don't do assembly programming, so I don't know what forms one
> generally wants for editing assembly. Apparently OpenSSL wants a Perl
> script, but that is, I understand, less common. What would you use if
> you were going to improve it?

The Perl script would be more flexible in case one needs to improve the
implementation. It allows one to use meaningful symbolic names for the
registers. My colleague Ronny, who did the extension, worked directly
with physical register names and considered the output of the Perl
script as his source. But this extension could probably be done also at
the level of the Perl script.

Kind regards,
Gilles



Re: Hash algorithm analysis

2018-06-13 Thread brian m. carlson
On Tue, Jun 12, 2018 at 06:21:21PM +0200, Gilles Van Assche wrote:
> Hi,
> 
> On 10/06/18 00:49, brian m. carlson wrote:
> > I imported the optimized 64-bit implementation of KangarooTwelve. The
> > AVX2 implementation was not considered for licensing reasons (it's
> > partially generated from external code, which falls foul of the GPL's
> > "preferred form for modifications" rule).
> 
> Indeed part of the AVX2 code in the Keccak code package is an extension
> of the implementation in OpenSSL (written by Andy Polyakov). The
> assembly code is generated by a Perl script, and we extended it to fit
> in the KCP's internal API.
> 
> Would it solve this licensing problem if we remap our extensions to the
> Perl script, which would then become "the source"?

The GPLv2 requires "the preferred form of the work for making
modifications to it".  If that form is the Perl script, then yes, that
would be sufficient.  If your code is dissimilar enough that editing it
directly is better than editing the Perl script, then it might already
meet the definition.

I don't do assembly programming, so I don't know what forms one
generally wants for editing assembly.  Apparently OpenSSL wants a Perl
script, but that is, I understand, less common.  What would you use if
you were going to improve it?

> On 12/06/18 00:35, brian m. carlson wrote:
> > While I think K12 is an interesting algorithm, I'm not sure we're
> > going to get as good of performance out of it as we might want due to
> > the lack of implementations.
> 
> Implementation availability is indeed important. The effort to transform
> an implementation of SHAKE128 into one of K12 is limited due to the
> reuse of their main components (round function, sponge construction). So
> the availability of SHA-3/Keccak implementations can benefit that of K12
> if there is sufficient interest. E.g., the SHA-3/Keccak instructions in
> ARMv8.2 can speed up K12 as well.

That's good to know.  I wasn't aware that ARM was providing Keccak
instructions, but it's good to see that new chips are providing them.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-06-12 Thread Gilles Van Assche
Hi,

On 10/06/18 00:49, brian m. carlson wrote:
> I imported the optimized 64-bit implementation of KangarooTwelve. The
> AVX2 implementation was not considered for licensing reasons (it's
> partially generated from external code, which falls foul of the GPL's
> "preferred form for modifications" rule).

Indeed part of the AVX2 code in the Keccak code package is an extension
of the implementation in OpenSSL (written by Andy Polyakov). The
assembly code is generated by a Perl script, and we extended it to fit
in the KCP's internal API.

Would it solve this licensing problem if we remap our extensions to the
Perl script, which would then become "the source"?


On 12/06/18 00:35, brian m. carlson wrote:
>> My understanding is that all the algorithms we're discussing are
>> believed to be approximately equivalent in security. That's a strange
>> thing to say when e.g. K12 uses fewer rounds than SHA3 of the same
>> permutation, but it is my understanding nonetheless. We don't know
>> yet how these hash algorithms will ultimately break. 
>
> With the exception of variations in preimage security, I expect that's
> correct. I think implementation availability and performance are the
> best candidates for consideration.

Note that we recently updated the paper on K12 (accepted at ACNS 2018),
with more details on performance and security.
https://eprint.iacr.org/2016/770

>> My understanding of the discussion so far:
>>
>> Keccak team encourages us[1] to consider a variant like K12 instead
>> of SHA3. 
>
> While I think K12 is an interesting algorithm, I'm not sure we're
> going to get as good of performance out of it as we might want due to
> the lack of implementations.

Implementation availability is indeed important. The effort to transform
an implementation of SHAKE128 into one of K12 is limited due to the
reuse of their main components (round function, sponge construction). So
the availability of SHA-3/Keccak implementations can benefit that of K12
if there is sufficient interest. E.g., the SHA-3/Keccak instructions in
ARMv8.2 can speed up K12 as well.

Kind regards,
Gilles



Re: Hash algorithm analysis

2018-06-11 Thread Linus Torvalds
On Mon, Jun 11, 2018 at 4:27 PM Ævar Arnfjörð Bjarmason
 wrote:
> >
> > And no, I'm not a cryptographer. But honestly, length extension
> > attacks were how both md5 and sha1 were broken in practice, so I'm
> > just going "why would we go with a crypto choice that has that known
> > weakness? That's just crazy".
>
> What do you think about Johannes's summary of this being a non-issue for
> Git in
> https://public-inbox.org/git/alpine.DEB.2.21.1.1706151122180.4200@virtualbox/
> ?

I agree that the fact that git internal data is structured and all
meaningful (and doesn't really have ignored state) makes it *much*
harder to attack the basic git objects, since you not only have to
generate a good hash, the end result has to also *parse* and there is
not really any hidden non-parsed data that you can use to hide the
attack.

And *if* you are using git for source code, the same is pretty much
true even for the blob objects - an attacking object will stand out
like a sore thumb in "diff" etc.

So I don't disagree with Johannes in that sense: I think git does
fundamentally tend to have some extra validation in place, and there's
a reason why the examples for both the md5 and the sha1 attack were
pdf files.

That said, even if git internal ("metadata") objects like trees and
commits tend to not have opaque parts to them and are thus pretty hard
to attack, the blob objects are still an attack vector for projects
that use git for non-source-code (and even source projects do embed
binary files - including pdf files - even though they might not be "as
interesting" to attack). So you do want to protect those too.

And hey, protecting the metadata objects is good just to protect
against annoyances. Sure, you should always sanity check the object at
receive time anyway, but even so, if somebody is able to generate a
blob object that hashes to the same hash as a metadata object (ie tree
or commit), that really could be pretty damn annoying.

And the whole "intermediate hashed state is same size as final hash
state" just _fundamentally_ means that if you find a weakness in the
hash, you can now attack that weakness without having to worry about
the attack being fundamentally more expensive.

That's essentially what SHAttered relied on. It didn't rely on a
secret and a hash and length extension, but it *did* rely on the same
mechanism that a length extension attack relies on, where you can
basically attack the state in the middle with no extra cost.

Maybe some people don't consider it a length extension attack for that
reason, but it boils down to much the same basic situation where you
can attack the internal hash state and cause a state collision. And
you can try to find the patterns that then cause that state collision
when you've found a weakness in the hash.

With SHA3 or k12, you can obviously _also_ try to attack the hash
state and cause a collision, but because the intermediate state is
much bigger than the final hash, you're just making things *way*
harder for yourself if you try that.

  Linus


Re: Hash algorithm analysis

2018-06-11 Thread David Lang

On Tue, 12 Jun 2018, Ævar Arnfjörð Bjarmason wrote:


From a performance standpoint, I have to say (once more) that crypto
performance actually mattered a lot less than I originally thought it
would. Yes, there are phases that do care, but they are rare.


One real-world case is rebasing[1]. As noted in that E-Mail of mine a
year ago we can use SHA1DC v.s. OpenSSL as a stand-in for the sort of
performance difference we might expect between hash functions, although
as you note this doesn't account for the difference in length.


when you are rebasing, how many hashes do you need to calculate? a few dozen, a 
few hundred, a few thousand, a few hundered thousand?


If the common uses of rebasing are on the low end, then the fact that the hash 
takes a bit longer won't matter much because the entire job is so fast.


And at the high end, I/O will probably dominate.

so where does it really make a human visible difference?

David Lang

Re: Hash algorithm analysis

2018-06-11 Thread Ævar Arnfjörð Bjarmason


On Mon, Jun 11 2018, Linus Torvalds wrote:

> On Mon, Jun 11, 2018 at 12:29 PM Jonathan Nieder  wrote:
>>
>> Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
>> it is constructed.
>
> Yeah, I really think that it's a mistake to switch to something that
> has the same problem SHA1 had.
>
> That doesn't necessarily mean SHA3, but it does mean "bigger
> intermediate hash state" (so no length extension attack), which could
> be SHA3, but also SHA-512/256 or K12.
>
> Honestly, git has effectively already moved from SHA1 to SHA1DC.
>
> So the actual known attack and weakness of SHA1 should simply not be
> part of the discussion for the next hash. You can basically say "we're
> _already_ on the second hash, we just picked one that was so
> compatible with SHA1 that nobody even really noticed.
>
> The reason to switch is
>
>  (a) 160 bits may not be enough
>
>  (b) maybe there are other weaknesses in SHA1 that SHA1DC doesn't catch.
>
>  (c) others?
>
> Obviously all of the choices address (a).

FWIW I updated our docs 3 months ago to try to address some of this:
https://github.com/git/git/commit/5988eb631a

> But at least for me, (b) makes me go "well, SHA2 has the exact same
> weak inter-block state attack, so if there are unknown weaknesses in
> SHA1, then what about unknown weaknesses in SHA2"?
>
> And no, I'm not a cryptographer. But honestly, length extension
> attacks were how both md5 and sha1 were broken in practice, so I'm
> just going "why would we go with a crypto choice that has that known
> weakness? That's just crazy".

What do you think about Johannes's summary of this being a non-issue for
Git in
https://public-inbox.org/git/alpine.DEB.2.21.1.1706151122180.4200@virtualbox/
?

> From a performance standpoint, I have to say (once more) that crypto
> performance actually mattered a lot less than I originally thought it
> would. Yes, there are phases that do care, but they are rare.

One real-world case is rebasing[1]. As noted in that E-Mail of mine a
year ago we can use SHA1DC v.s. OpenSSL as a stand-in for the sort of
performance difference we might expect between hash functions, although
as you note this doesn't account for the difference in length.

With our perf tests, in t/perf on linux.git:

$ GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_REPEAT_COUNT=10 
GIT_PERF_MAKE_COMMAND='if pwd | grep -q $(git rev-parse origin/master); then 
make -j8 CFLAGS=-O3 DC_SHA1=Y; else make -j8 CFLAGS=-O3 OPENSSL_SHA1=Y; fi' 
./run origin/master~ origin/master -- p3400-rebase.sh
Test
origin/master~origin/master


3400.2: rebase on top of a lot of unrelated changes 
1.38(1.19+0.11)   1.40(1.23+0.10) +1.4%
3400.4: rebase a lot of unrelated changes without split-index   
4.07(3.28+0.66)   4.62(3.71+0.76) +13.5%
3400.6: rebase a lot of unrelated changes with split-index  
3.41(2.94+0.38)   3.35(2.87+0.37) -1.8%

On a bigger monorepo I have here:

Test
origin/master~origin/master

---
3400.2: rebase on top of a lot of unrelated changes 
1.39(1.19+0.17)   1.34(1.16+0.16) -3.6%
3400.4: rebase a lot of unrelated changes without split-index   
6.67(3.37+0.63)   6.95(3.90+0.62) +4.2%
3400.6: rebase a lot of unrelated changes with split-index  
3.70(2.85+0.45)   3.73(2.85+0.41) +0.8%

I didn't paste any numbers in that E-Mail a year ago, maybe I produced
them differently, but this is clerly not that of a "big difference". But
this is one way to see the difference.

> For example, I think SHA1 performance has probably mattered most for
> the index and pack-file, where it's really only used as a fancy CRC.
> For most individual object cases, it is almost never an issue.

Yeah there's lots of things we could optimize there, but we are going to
need to hash things to create the commit in e.g. the rebase case, but
much of that could probably be done more efficiently without switching
the hash.

> From a performance angle, I think the whole "256-bit hashes are
> bigger" is going to be the more noticeable performance issue, just
> because things like delta compression and zlib - both of which are
> very *real* and present performance issues - will have more data that
> they need to work on. The performance difference between different
> hashing functions is likely not all that noticeable in most common
> cases as long as we're not talking orders of magnitude.
>
> And yes, I guess we're in the "approaching an order of magnitude"
> performance difference, but we should actually compare not to OpenSSL
> SHA1, but to SHA1DC. See above.
>
> Personally, the fact that the Keccak people would suggest K12 makes me
> think 

Re: Hash algorithm analysis

2018-06-11 Thread brian m. carlson
On Mon, Jun 11, 2018 at 12:29:42PM -0700, Jonathan Nieder wrote:
> brian m. carlson wrote:
> 
> > == Discussion of Candidates
> >
> > I've implemented and tested the following algorithms, all of which are
> > 256-bit (in alphabetical order):
> 
> Thanks for this.  Where can I read your code?

https://github.com/bk2204/git.git, test-hashes branch.  You will need to
have libb2 and OPENSSL_SHA1 set.  It's a bit of a hack, so don't look
too hard.

> [...]
> > I also rejected some other candidates.  I couldn't find any reference or
> > implementation of SHA256×16, so I didn't implement it.
> 
> Reference: https://eprint.iacr.org/2012/476.pdf

Thanks for that reference.

> If consensus turns toward it being the right hash function to use,
> then we can pursue finding or writing a good high-quality
> implementation.  But I tend to suspect that the lack of wide
> implementation availability is a reason to avoid it unless we find
> SHA-256 to be too slow.

I agree.  Implementation availability is important.  Whatever we provide
is likely going to be portable C code, which is going to be slower than
an optimized implementation.

> [...]
> > * BLAKE2bp, as implemented in libb2, uses OpenMP (and therefore
> >   multithreading) by default.  It was no longer possible to run the
> >   testsuite with -j3 on my laptop in this configuration.
> 
> My understanding is that BLAKE2bp is better able to make use of simd
> instructions than BLAKE2b.  Is there a way to configure libb2 to take
> advantage of that without multithreading?

You'll notice below that I have both BLAKE2bp with and without
threading.  I recompiled libb2 to not use threading, and it still didn't
perform as well.

libb2 is written by the authors of BLAKE2, so it's the most favorable
implementation we're likely to get.

> [...]
> > |===
> > | Implementation | 256 B  | 1 KiB  | 8 KiB  | 16 KiB |
> > | SHA-1 (OpenSSL)| 513963 | 685966 | 748993 | 754270 |
> > | BLAKE2b (libb2)| 488123 | 552839 | 576246 | 579292 |
> > | SHA-512/256 (OpenSSL)  | 181177 | 349002 | 499113 | 495169 |
> > | BLAKE2bp (libb2)   | 139891 | 344786 | 488390 | 522575 |
> > | SHA-256 (OpenSSL)  | 264276 | 333560 | 357830 | 355761 |
> > | KangarooTwelve | 239305 | 307300 | 355257 | 364261 |
> > | SHAKE128 (OpenSSL) | 154775 | 253344 | 337811 | 346732 |
> > | SHA3-256 (OpenSSL) | 128597 | 185381 | 198931 | 207365 |
> > | BLAKE2bp (libb2; threaded) |  12223 |  49306 | 132833 | 179616 |
> > |===
> 
> That's a bit surprising, since my impression (e.g. in the SUPERCOP
> benchmarks you cite) is that there are secure hash functions that
> allow comparable or even faster performance than SHA-1 with large
> inputs on a single core.  In Git we also care about performance with
> small inputs, creating a bit of a trade-off.
> 
> More on the subject of blake2b vs blake2bp:
> 
> - blake2b is faster for small inputs (under 1k, say). If this is
>   important then we could set a threshold, e.g. 512 bytes, for
>   swtiching to blake2bp.
> 
> - blake2b is supported in OpenSSL and likely to get x86-optimized
>   versions in the future. blake2bp is not in OpenSSL.

Correct.  BLAKE2b in OpenSSL is currently 512-bit only, but it's
intended to add support for 256-bit versions soon.  I think the benefit
of sticking to one hash function altogether is significant, so I think
we should one that has good all-around performance instead of trying to
split between different ones.

> My understanding is that all the algorithms we're discussing are
> believed to be approximately equivalent in security.  That's a strange
> thing to say when e.g. K12 uses fewer rounds than SHA3 of the same
> permutation, but it is my understanding nonetheless.  We don't know
> yet how these hash algorithms will ultimately break.

With the exception of variations in preimage security, I expect that's
correct.  I think implementation availability and performance are the
best candidates for consideration.

> My understanding of the discussion so far:
> 
> Keccak team encourages us[1] to consider a variant like K12 instead of
> SHA3.

While I think K12 is an interesting algorithm, I'm not sure we're going
to get as good of performance out of it as we might want due to the lack
of implementations.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Hash algorithm analysis

2018-06-11 Thread Ævar Arnfjörð Bjarmason


On Sat, Jun 09 2018, brian m. carlson wrote:

[Expanding the CC list to what we had in the last "what hash" thread[1]
last year].

> == Discussion of Candidates
>
> I've implemented and tested the following algorithms, all of which are
> 256-bit (in alphabetical order):
>
> * BLAKE2b (libb2)
> * BLAKE2bp (libb2)
> * KangarooTwelve (imported from the Keccak Code Package)
> * SHA-256 (OpenSSL)
> * SHA-512/256 (OpenSSL)
> * SHA3-256 (OpenSSL)
> * SHAKE128 (OpenSSL)
>
> I also rejected some other candidates.  I couldn't find any reference or
> implementation of SHA256×16, so I didn't implement it.  I didn't
> consider SHAKE256 because it is nearly identical to SHA3-256 in almost
> all characteristics (including performance).
>
> I imported the optimized 64-bit implementation of KangarooTwelve.  The
> AVX2 implementation was not considered for licensing reasons (it's
> partially generated from external code, which falls foul of the GPL's
> "preferred form for modifications" rule).
>
> === BLAKE2b and BLAKE2bp
>
> These are the non-parallelized and parallelized 64-bit variants of
> BLAKE2.
>
> Benefits:
> * Both algorithms provide 256-bit preimage resistance.
>
> Downsides:
> * Some people are uncomfortable that the security margin has been
>   decreased from the original SHA-3 submission, although it is still
>   considered secure.
> * BLAKE2bp, as implemented in libb2, uses OpenMP (and therefore
>   multithreading) by default.  It was no longer possible to run the
>   testsuite with -j3 on my laptop in this configuration.
>
> === Keccak-based Algorithms
>
> SHA3-256 is the 256-bit Keccak algorithm with 24 rounds, processing 136
> bytes at a time.  SHAKE128 is an extendable output function with 24
> rounds, processing 168 bytes at a time.  KangarooTwelve is an extendable
> output function with 12 rounds, processing 136 bytes at a time.
>
> Benefits:
> * SHA3-256 provides 256-bit preimage resistance.
> * SHA3-256 has been heavily studied and is believed to have a large
>   security margin.
>
> I noted the following downsides:
> * There's a lack of a availability of KangarooTwelve in other
>   implementations.  It may be the least available option in terms of
>   implementations.
> * Some people are uncomfortable that the security margin of
>   KangarooTwelve has been decreased, although it is still considered
>   secure.
> * SHAKE128 and KangarooTwelve provide only 128-bit preimage resistance.
>
> === SHA-256 and SHA-512/256
>
> These are the 32-bit and 64-bit SHA-2 algorithms that are 256 bits in
> size.
>
> I noted the following benefits:
> * Both algorithms are well known and heavily analyzed.
> * Both algorithms provide 256-bit preimage resistance.
>
> == Implementation Support
>
> |===
> | Implementation | OpenSSL | libb2 | NSS | ACC | gcrypt | Nettle| CL  |
> | SHA-1  |    |   |    |    |   |  | {1} |
> | BLAKE2b| f   |  | | |   |   | {2} |
> | BLAKE2bp   | |  | | ||   | |
> | KangarooTwelve | |   | | ||   | |
> | SHA-256|    |   |    |    |   |  | {1} |
> | SHA-512/256|    |   | | ||  | {3} |
> | SHA3-256   |    |   | | |   |  | {4} |
> | SHAKE128   |    |   | | |   |   | {5} |
> |===
>
> f: future version (expected 1.2.0)
> ACC: Apple Common Crypto
> CL: Command-line
>
> :1: OpenSSL, coreutils, Perl Digest::SHA.
> :2: OpenSSL, coreutils.
> :3: OpenSSL
> :4: OpenSSL, Perl Digest::SHA3.
> :5: Perl Digest::SHA3.
>
> === Performance Analysis
>
> The test system used below is my personal laptop, a 2016 Lenovo ThinkPad
> X1 Carbon with an Intel i7-6600U CPU (2.60 GHz) running Debian unstable.
>
> I implemented a test tool helper to compute speed much like OpenSSL
> does.  Below is a comparison of speeds.  The columns indicate the speed
> in KiB/s for chunks of the given size.  The runs are representative of
> multiple similar runs.
>
> 256 and 1024 bytes were chosen to represent common tree and commit
> object sizes and the 8 KiB is an approximate average blob size.
>
> Algorithms are sorted by performance on the 1 KiB column.
>
> |===
> | Implementation | 256 B  | 1 KiB  | 8 KiB  | 16 KiB |
> | SHA-1 (OpenSSL)| 513963 | 685966 | 748993 | 754270 |
> | BLAKE2b (libb2)| 488123 | 552839 | 576246 | 579292 |
> | SHA-512/256 (OpenSSL)  | 181177 | 349002 | 499113 | 495169 |
> | BLAKE2bp (libb2)   | 139891 | 344786 | 488390 | 522575 |
> | SHA-256 (OpenSSL)  | 264276 | 333560 | 357830 | 355761 |
> | KangarooTwelve | 239305 | 307300 | 355257 | 364261 |
> | SHAKE128 (OpenSSL) | 154775 | 253344 | 337811 | 346732 |
> | SHA3-256 (OpenSSL) | 128597 | 185381 | 198931 | 207365 |
> | BLAKE2bp (libb2; threaded) |  12223 |  49306 | 132833 | 179616 |
> |===
>
> SUPERCOP (a crypto 

Re: Hash algorithm analysis

2018-06-11 Thread Linus Torvalds
On Mon, Jun 11, 2018 at 12:29 PM Jonathan Nieder  wrote:
>
> Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
> it is constructed.

Yeah, I really think that it's a mistake to switch to something that
has the same problem SHA1 had.

That doesn't necessarily mean SHA3, but it does mean "bigger
intermediate hash state" (so no length extension attack), which could
be SHA3, but also SHA-512/256 or K12.

Honestly, git has effectively already moved from SHA1 to SHA1DC.

So the actual known attack and weakness of SHA1 should simply not be
part of the discussion for the next hash. You can basically say "we're
_already_ on the second hash, we just picked one that was so
compatible with SHA1 that nobody even really noticed.

The reason to switch is

 (a) 160 bits may not be enough

 (b) maybe there are other weaknesses in SHA1 that SHA1DC doesn't catch.

 (c) others?

Obviously all of the choices address (a).

But at least for me, (b) makes me go "well, SHA2 has the exact same
weak inter-block state attack, so if there are unknown weaknesses in
SHA1, then what about unknown weaknesses in SHA2"?

And no, I'm not a cryptographer. But honestly, length extension
attacks were how both md5 and sha1 were broken in practice, so I'm
just going "why would we go with a crypto choice that has that known
weakness? That's just crazy".

>From a performance standpoint, I have to say (once more) that crypto
performance actually mattered a lot less than I originally thought it
would. Yes, there are phases that do care, but they are rare.

For example, I think SHA1 performance has probably mattered most for
the index and pack-file, where it's really only used as a fancy CRC.
For most individual object cases, it is almost never an issue.

>From a performance angle, I think the whole "256-bit hashes are
bigger" is going to be the more noticeable performance issue, just
because things like delta compression and zlib - both of which are
very *real* and present performance issues - will have more data that
they need to work on. The performance difference between different
hashing functions is likely not all that noticeable in most common
cases as long as we're not talking orders of magnitude.

And yes, I guess we're in the "approaching an order of magnitude"
performance difference, but we should actually compare not to OpenSSL
SHA1, but to SHA1DC. See above.

Personally, the fact that the Keccak people would suggest K12 makes me
think that should be a front-runner, but whatever. I don't think the
128-bit preimage case is an issue, since 128 bits is the brute-force
cost for any 256-bit hash.

But hey, I picked sha1 to begin with, so take any input from me with
that historical pinch of salt in mind ;)

Linus


Re: Hash algorithm analysis

2018-06-11 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> == Discussion of Candidates
>
> I've implemented and tested the following algorithms, all of which are
> 256-bit (in alphabetical order):

Thanks for this.  Where can I read your code?

[...]
> I also rejected some other candidates.  I couldn't find any reference or
> implementation of SHA256×16, so I didn't implement it.

Reference: https://eprint.iacr.org/2012/476.pdf

If consensus turns toward it being the right hash function to use,
then we can pursue finding or writing a good high-quality
implementation.  But I tend to suspect that the lack of wide
implementation availability is a reason to avoid it unless we find
SHA-256 to be too slow.

[...]
> * BLAKE2bp, as implemented in libb2, uses OpenMP (and therefore
>   multithreading) by default.  It was no longer possible to run the
>   testsuite with -j3 on my laptop in this configuration.

My understanding is that BLAKE2bp is better able to make use of simd
instructions than BLAKE2b.  Is there a way to configure libb2 to take
advantage of that without multithreading?

E.g. https://github.com/sneves/blake2-avx2 looks promising for that.

[...]
> |===
> | Implementation | 256 B  | 1 KiB  | 8 KiB  | 16 KiB |
> | SHA-1 (OpenSSL)| 513963 | 685966 | 748993 | 754270 |
> | BLAKE2b (libb2)| 488123 | 552839 | 576246 | 579292 |
> | SHA-512/256 (OpenSSL)  | 181177 | 349002 | 499113 | 495169 |
> | BLAKE2bp (libb2)   | 139891 | 344786 | 488390 | 522575 |
> | SHA-256 (OpenSSL)  | 264276 | 333560 | 357830 | 355761 |
> | KangarooTwelve | 239305 | 307300 | 355257 | 364261 |
> | SHAKE128 (OpenSSL) | 154775 | 253344 | 337811 | 346732 |
> | SHA3-256 (OpenSSL) | 128597 | 185381 | 198931 | 207365 |
> | BLAKE2bp (libb2; threaded) |  12223 |  49306 | 132833 | 179616 |
> |===

That's a bit surprising, since my impression (e.g. in the SUPERCOP
benchmarks you cite) is that there are secure hash functions that
allow comparable or even faster performance than SHA-1 with large
inputs on a single core.  In Git we also care about performance with
small inputs, creating a bit of a trade-off.

More on the subject of blake2b vs blake2bp:

- blake2b is faster for small inputs (under 1k, say). If this is
  important then we could set a threshold, e.g. 512 bytes, for
  swtiching to blake2bp.

- blake2b is supported in OpenSSL and likely to get x86-optimized
  versions in the future. blake2bp is not in OpenSSL.

[...]
> == Summary
>
> The algorithms with the greatest implementation availability are
> SHA-256, SHA3-256, BLAKE2b, and SHAKE128.
>
> In terms of command-line availability, BLAKE2b, SHA-256, SHA-512/256,
> and SHA3-256 should be available in the near future on a reasonably
> small Debian, Ubuntu, or Fedora install.
>
> As far as security, the most conservative choices appear to be SHA-256,
> SHA-512/256, and SHA3-256.

SHA-256x16 has the same security properties as SHA-256.  Picking
between those two is a tradeoff between performance and implementation
availability.

My understanding is that all the algorithms we're discussing are
believed to be approximately equivalent in security.  That's a strange
thing to say when e.g. K12 uses fewer rounds than SHA3 of the same
permutation, but it is my understanding nonetheless.  We don't know
yet how these hash algorithms will ultimately break.

My understanding of the discussion so far:

Keccak team encourages us[1] to consider a variant like K12 instead of
SHA3.

AGL explains[2] that the algorithms considered all seem like
reasonable choices and we should decide using factors like
implementation ease and performance.

If we choose a Keccak-based function, AGL also[3] encourages using a
variant like K12 instead of SHA3.

Dscho strongly prefers[4] SHA-256, because of
- wide implementation availability, including in future hardware
- has been widely analyzed
- is fast

Yves Orton and Linus Torvalds prefer[5] SHA3 over SHA2 because of how
it is constructed.

Thanks,
Jonathan

[1] 
https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
[2] 
https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
[3] https://www.imperialviolet.org/2017/05/31/skipsha3.html
[4] 
https://public-inbox.org/git/alpine.DEB.2.21.1.1706151122180.4200@virtualbox/
[5] 
https://public-inbox.org/git/ca+55afwun0kibpdqk2zrxzxkok8-aaub2njzqqkcpq1ddhd...@mail.gmail.com/


Hash algorithm analysis

2018-06-09 Thread brian m. carlson
== Discussion of Candidates

I've implemented and tested the following algorithms, all of which are
256-bit (in alphabetical order):

* BLAKE2b (libb2)
* BLAKE2bp (libb2)
* KangarooTwelve (imported from the Keccak Code Package)
* SHA-256 (OpenSSL)
* SHA-512/256 (OpenSSL)
* SHA3-256 (OpenSSL)
* SHAKE128 (OpenSSL)

I also rejected some other candidates.  I couldn't find any reference or
implementation of SHA256×16, so I didn't implement it.  I didn't
consider SHAKE256 because it is nearly identical to SHA3-256 in almost
all characteristics (including performance).

I imported the optimized 64-bit implementation of KangarooTwelve.  The
AVX2 implementation was not considered for licensing reasons (it's
partially generated from external code, which falls foul of the GPL's
"preferred form for modifications" rule).

=== BLAKE2b and BLAKE2bp

These are the non-parallelized and parallelized 64-bit variants of
BLAKE2.

Benefits:
* Both algorithms provide 256-bit preimage resistance.

Downsides:
* Some people are uncomfortable that the security margin has been
  decreased from the original SHA-3 submission, although it is still
  considered secure.
* BLAKE2bp, as implemented in libb2, uses OpenMP (and therefore
  multithreading) by default.  It was no longer possible to run the
  testsuite with -j3 on my laptop in this configuration.

=== Keccak-based Algorithms

SHA3-256 is the 256-bit Keccak algorithm with 24 rounds, processing 136
bytes at a time.  SHAKE128 is an extendable output function with 24
rounds, processing 168 bytes at a time.  KangarooTwelve is an extendable
output function with 12 rounds, processing 136 bytes at a time.

Benefits:
* SHA3-256 provides 256-bit preimage resistance.
* SHA3-256 has been heavily studied and is believed to have a large
  security margin.

I noted the following downsides:
* There's a lack of a availability of KangarooTwelve in other
  implementations.  It may be the least available option in terms of
  implementations.
* Some people are uncomfortable that the security margin of
  KangarooTwelve has been decreased, although it is still considered
  secure.
* SHAKE128 and KangarooTwelve provide only 128-bit preimage resistance.

=== SHA-256 and SHA-512/256

These are the 32-bit and 64-bit SHA-2 algorithms that are 256 bits in
size.

I noted the following benefits:
* Both algorithms are well known and heavily analyzed.
* Both algorithms provide 256-bit preimage resistance.

== Implementation Support

|===
| Implementation | OpenSSL | libb2 | NSS | ACC | gcrypt | Nettle| CL  |
| SHA-1  |    |   |    |    |   |  | {1} |
| BLAKE2b| f   |  | | |   |   | {2} |
| BLAKE2bp   | |  | | ||   | |
| KangarooTwelve | |   | | ||   | |
| SHA-256|    |   |    |    |   |  | {1} |
| SHA-512/256|    |   | | ||  | {3} |
| SHA3-256   |    |   | | |   |  | {4} |
| SHAKE128   |    |   | | |   |   | {5} |
|===

f: future version (expected 1.2.0)
ACC: Apple Common Crypto
CL: Command-line

:1: OpenSSL, coreutils, Perl Digest::SHA.
:2: OpenSSL, coreutils.
:3: OpenSSL
:4: OpenSSL, Perl Digest::SHA3.
:5: Perl Digest::SHA3.

=== Performance Analysis

The test system used below is my personal laptop, a 2016 Lenovo ThinkPad
X1 Carbon with an Intel i7-6600U CPU (2.60 GHz) running Debian unstable.

I implemented a test tool helper to compute speed much like OpenSSL
does.  Below is a comparison of speeds.  The columns indicate the speed
in KiB/s for chunks of the given size.  The runs are representative of
multiple similar runs.

256 and 1024 bytes were chosen to represent common tree and commit
object sizes and the 8 KiB is an approximate average blob size.

Algorithms are sorted by performance on the 1 KiB column.

|===
| Implementation | 256 B  | 1 KiB  | 8 KiB  | 16 KiB |
| SHA-1 (OpenSSL)| 513963 | 685966 | 748993 | 754270 |
| BLAKE2b (libb2)| 488123 | 552839 | 576246 | 579292 |
| SHA-512/256 (OpenSSL)  | 181177 | 349002 | 499113 | 495169 |
| BLAKE2bp (libb2)   | 139891 | 344786 | 488390 | 522575 |
| SHA-256 (OpenSSL)  | 264276 | 333560 | 357830 | 355761 |
| KangarooTwelve | 239305 | 307300 | 355257 | 364261 |
| SHAKE128 (OpenSSL) | 154775 | 253344 | 337811 | 346732 |
| SHA3-256 (OpenSSL) | 128597 | 185381 | 198931 | 207365 |
| BLAKE2bp (libb2; threaded) |  12223 |  49306 | 132833 | 179616 |
|===

SUPERCOP (a crypto benchmarking tool;
https://bench.cr.yp.to/results-hash.html) has also benchmarked these
algorithms.  Note that BLAKE2bp is not listed, KangarooTwelve is k12,
SHA-512/256 is equivalent to sha512, SHA3-256 is keccakc512, and SHAKE128 is
keccakc256.

Information is for kizomba, a Kaby Lake system.  Counts are in cycles
per byte (smaller is better; sorted by