
Re: Migrating away from SHA-1?

2016-06-24 Thread brian m. carlson

On Sat, Jun 18, 2016 at 03:10:27AM +0100, Leo Gaspard wrote:
> First, sorry for not having this message threaded: I'm not subscribed to
> the list and haven't found a way to get a Message-Id from gmane.

Sorry it's taken so long to get back to this.  I've been at a
conference.

> So, my questions to the git team:
>  * Is there a consensus that git should migrate away from SHA-1 before
> it gets a collision attack, because that would mean a chosen-prefix
> collision isn't far away and people wouldn't have the time to upgrade?

I plan on adding support for a new hash as soon as that's possible, but
I don't have a firm timeline.  This is a volunteer effort in my own
limited free time.

>  * Is there a consensus that Peter Anvin's amended transition plan is
> the way to go?

I'm not planning on changing algorithms in the middle of a repository.
This will only be available on new or imported repositories.

My current thinking on proposed algorithms is SHA3-256 or BLAKE2b-256.
The cryptanalysis on SHA-256 indicates that it may not be a great
long-term choice, and I expect people won't want to change algorithms
frequently.

If time becomes extremely urgent, we can always add support for a
160-bit hash first (e.g. BLAKE2b-160) and then finish the object_id
transition later as it becomes convenient.  I'd like to avoid that,
though.
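To make the size argument concrete: a 160-bit BLAKE2b digest is 20 bytes, the same as SHA-1, so it would fit the existing 20-byte object-name slots unchanged. SHA3-256 and BLAKE2b-256 need 32 bytes (64 hex digits). The sketch below is illustrative only. The table, `find_algo` and `fits_legacy_slot` are hypothetical names, not git code; only the digest sizes are properties of the algorithms themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a candidate object-name hash. */
struct hash_algo_info {
	const char *name;
	size_t rawsz;   /* digest size in bytes */
	size_t hexsz;   /* hex form: two characters per byte */
};

/* Digest sizes are fixed by each algorithm's definition. */
static const struct hash_algo_info candidate_algos[] = {
	{ "sha1",        20, 40 },
	{ "blake2b-160", 20, 40 },
	{ "sha3-256",    32, 64 },
	{ "blake2b-256", 32, 64 },
};

/* Size of today's object-name slot (unsigned char [20]). */
#define LEGACY_RAWSZ 20

static const struct hash_algo_info *find_algo(const char *name)
{
	for (size_t i = 0; i < sizeof(candidate_algos) / sizeof(candidate_algos[0]); i++)
		if (!strcmp(candidate_algos[i].name, name))
			return &candidate_algos[i];
	return NULL;
}

/* True if the algorithm's digest fits the existing 20-byte slot. */
static bool fits_legacy_slot(const char *name)
{
	const struct hash_algo_info *algo = find_algo(name);
	return algo && algo->rawsz <= LEGACY_RAWSZ;
}
```

A 160-bit stopgap could therefore ship before every caller is converted, while a 256-bit hash needs the wider object_id work to be finished first.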

>  * If the two conditions above are fulfilled, has work started on it
> yet? (I guess that, since Brian Carlson started his work 9 weeks ago and
> was talking about working on it over the weekend, he should have
> finished by now, so I'm excluding this.)

It takes a long time to get a patch series through.  I'm rather busy and
don't always have time to rebase and address issues during the week.

>  * If the first two conditions are fulfilled, is there anything I could
> do to help this transition? (including helping Brian if his work hasn't
> actually ended yet)

You're welcome to send patches if you like.  I try to avoid areas I know
are under heavy development, like the refs code.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

signature.asc
Description: PGP signature

Re: Migrating away from SHA-1?

2016-06-17 Thread Eric Wong

Leo Gaspard  wrote:
> First, sorry for not having this message threaded: I'm not subscribed to
> the list and haven't found a way to get a Message-Id from gmane.

Appending "/raw" to the gmane URL will get you the raw message
with full headers:

  article.gmane.org/gmane.comp.version-control.git/$NUMBER/raw

You can also use that article $NUMBER via NNTP on news.gmane.org.

> So, my questions to the git team:

It is customary to Cc: all relevant parties involved with that
thread since they may not all be subscribed, either.

>  * Is there a consensus that git should migrate away from SHA-1 before
> it gets a collision attack, because that would mean a chosen-prefix
> collision isn't far away and people wouldn't have the time to upgrade?
>  * Is there a consensus that Peter Anvin's amended transition plan is
> the way to go?
>  * If the two conditions above are fulfilled, has work started on it
> yet? (I guess that, since Brian Carlson started his work 9 weeks ago and
> was talking about working on it over the weekend, he should have
> finished by now, so I'm excluding this.)

AFAIK, brian is still working on it.  The last series on the matter
begins here:
http://mid.gmane.org/20160607005716.69222-2-sand...@crustytoothpaste.net
I'm just on the sidelines observing :)

>  * If the first two conditions are fulfilled, is there anything I could
> do to help this transition? (including helping Brian if his work hasn't
> actually ended yet)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Migrating away from SHA-1?

2016-06-17 Thread Leo Gaspard

First, sorry for not having this message threaded: I'm not subscribed to
the list and haven't found a way to get a Message-Id from gmane.

As an end user who relies heavily on commit signatures, I just wanted to
ask a few questions about the migration away from SHA-1.

SHA-1 already suffers from a freestart collision attack. Based on what I
understand of git's object model, a chosen-prefix collision attack
(perhaps somewhat improved) is enough to get reviewers to accept and
sign an innocuous-looking patch, and then swap it for an evil-doing one
-- which *will be signed*.

As for the argument that slipping underhanded code past review is an
easier entry point (Theodore Ts'o, 2016-04-14 22:40:51 GMT): in one of my
use cases, a repo with my dotfiles lives on an untrusted server. Yet I
download and execute them without fear, because each commit is
PGP-signed with my key. The point is that in some cases, getting code
past review is not even a possible entry point, so SHA-1 seems to
be(come) the weakest link.

So I don't see how one could disagree with Jeff King's email of
2016-04-12 23:15:19 GMT.

Peter Anvin (2016-04-14 17:28:50 GMT) has a point that there is no need
to hurry (chosen-prefix collisions may still be quite a long way off,
though this is all guesswork), and that quality is important. Yet Jeff
King's proposal (2016-04-12 23:42:52 GMT), amended by Junio Hamano
(2016-04-13 01:03:02 GMT) and by Jeff himself (2016-04-13 01:36:32 GMT),
seems to have met no opposition.

So, my questions to the git team:
 * Is there a consensus that git should migrate away from SHA-1 before
it gets a collision attack, because that would mean a chosen-prefix
collision isn't far away and people wouldn't have the time to upgrade?
 * Is there a consensus that Peter Anvin's amended transition plan is
the way to go?
 * If the two conditions above are fulfilled, has work started on it
yet? (I guess that, since Brian Carlson started his work 9 weeks ago and
was talking about working on it over the weekend, he should have
finished by now, so I'm excluding this.)
 * If the first two conditions are fulfilled, is there anything I could
do to help this transition? (including helping Brian if his work hasn't
actually ended yet)

Sorry for bringing up a subject that seems to come up quite often, and
for this long block of text,
Leo Gaspard




Re: Migrating away from SHA-1?

2016-04-17 Thread brian m. carlson

On Tue, Apr 12, 2016 at 06:58:10PM -0700, H. Peter Anvin wrote:
> On April 12, 2016 6:51:12 PM PDT, Duy Nguyen  wrote:
> >On Wed, Apr 13, 2016 at 5:38 AM, H. Peter Anvin  wrote:
> >> OK, I'm going to open this can of worms...
> >>
> >> At what point do we migrate from SHA-1?
> >
> >Brian Carlson has been slowly refactoring the git code base, abstracting
> >SHA-1 away. Once that work is done, I think we can talk about moving
> >away from SHA-1. The process is slow because it likely causes
> >conflicts with in-flight topics. A quick grep shows we still have
> >about 300 SHA-1 references, so it'll be quite some time.
> 
> Well, at least it sounds like work is underway.  That is a big deal.

Yes, it's a bunch of slow manual refactoring, and I've been busy as
we've been doing house- and car-related things recently.  I'll try to
spend a little more time on it this weekend.

The first step is to convert all of the individual places that use
unsigned char [20] to use struct object_id, which can then be extended
to use different hash algorithms.  There are also constants,
GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ, that abstract the 20 and 40 values in
the codebase so they can be changed in the future.
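A minimal sketch of what that conversion buys, assuming the struct simply wraps the 20 raw bytes and the two constants are defined as described above (20 and twice that). The helpers `oid_equal` and `oid_to_hex_buf` are illustrative stand-ins, not necessarily git's actual functions.

```c
#include <stdbool.h>
#include <string.h>

#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)

/* Wrapping the raw bytes lets the size (and later the algorithm) change in one place. */
struct object_id {
	unsigned char hash[GIT_SHA1_RAWSZ];
};

/* Before: !memcmp(a, b, 20) on bare unsigned char[20] buffers.
 * After: callers compare whole object ids without knowing the length. */
static bool oid_equal(const struct object_id *a, const struct object_id *b)
{
	return !memcmp(a->hash, b->hash, GIT_SHA1_RAWSZ);
}

/* Writes the hex form into buf, which must hold GIT_SHA1_HEXSZ + 1 bytes. */
static char *oid_to_hex_buf(char *buf, const struct object_id *oid)
{
	static const char hex[] = "0123456789abcdef";
	for (int i = 0; i < GIT_SHA1_RAWSZ; i++) {
		buf[2 * i] = hex[oid->hash[i] >> 4];
		buf[2 * i + 1] = hex[oid->hash[i] & 0xf];
	}
	buf[GIT_SHA1_HEXSZ] = '\0';
	return buf;
}
```

Once every caller goes through the struct and the constants, growing `hash` to 32 bytes touches the definitions rather than hundreds of call sites.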

While this is a project I've been mostly working on, I have no objection
to other people sending in a patch or series as they feel like it.


Re: Migrating away from SHA-1?

2016-04-14 Thread Jeff King

On Thu, Apr 14, 2016 at 07:18:53PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > [2] Somewhere in the list archive is my patch to find partial
> > collisions like "git commit --sha1=31337", and I did in fact use
> > that micro-optimization. That, along with multi-threading, made it
> > feasible to do 6-8 character prefixes, as I recall.
> 
> In our testsuite, we have a test that uses many objects, all of
> which have object names that begin with 10 '0' characters.

Can you give more details on which test? 10 zeroes is 40 bits, which
means that by random chance, only about one in a trillion objects would
match that. We certainly didn't hit that randomly, and it seems like it
would be computationally expensive to have come up with the input for
even one such object, let alone "many".
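The arithmetic behind that estimate, assuming object names behave like uniformly random hex digits: each digit carries 4 bits, so a prefix of k zero digits turns up once in 16^k objects. The helper name is hypothetical.

```c
#include <stdint.h>

/* Each hex digit of an object name carries 4 bits, so requiring k leading
 * zero digits is a 4k-bit condition: one object in 16^k matches by chance. */
static uint64_t objects_per_zero_prefix(unsigned k)
{
	return (uint64_t)1 << (4 * k);  /* valid for k <= 15 */
}
```

For k = 10 this gives 2^40 = 1,099,511,627,776, roughly one in a trillion. For the 6-8 digit prefixes mentioned above it is about 1.7 x 10^7 to 4.3 x 10^9 attempts, so 10 digits is 256 times harder than 8.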

-Peff

Re: Migrating away from SHA-1?

2016-04-14 Thread Junio C Hamano

Jeff King  writes:

> [2] Somewhere in the list archive is my patch to find partial
> collisions like "git commit --sha1=31337", and I did in fact use
> that micro-optimization. That, along with multi-threading, made it
> feasible to do 6-8 character prefixes, as I recall.

In our testsuite, we have a test that uses many objects, all of
which have object names that begin with 10 '0' characters.

Re: Migrating away from SHA-1?

2016-04-14 Thread Jeff King

On Thu, Apr 14, 2016 at 06:40:51PM -0400, Theodore Ts'o wrote:

> Also, remember that while we can write programs that look for
> suspicious git objects that have stuff hidden after the null
> terminator (in fact, maybe that would be a good thing to add to git,
> hmmm?)[...]

Detecting the hidden bytes is underway elsewhere on the list. And while
I think it's a good idea to do so, I don't think it really introduces
a meaningful defense against collision attacks.

You can also hide bytes in arbitrary headers in a git object[1], and
they will not be shown by default. Adding the extra bytes at the end is
certainly easier if you're micro-optimizing the collision process[2],
but I don't think it changes the fundamental equation. It reduces the
work you do per-sha1 by a constant factor, but not the number of sha1s
you expect to compute.

-Peff

[1] Obviously neither "extra headers" nor "stuff after NUL" applies to
patches sent by email, where everything short of binary-diffs is
human-readable. So for the kernel, you're really talking about
attacking a lieutenant whose repo gets pulled. But there are plenty
of other projects that "git merge" from strangers.

[2] Somewhere in the list archive is my patch to find partial
collisions like "git commit --sha1=31337", and I did in fact use
that micro-optimization. That, along with multi-threading, made it
feasible to do 6-8 character prefixes, as I recall.
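To illustrate the micro-optimization in footnote [2]: SHA-1 consumes input in 64-byte blocks, so everything before the bytes being varied can be hashed once, and each attempt resumes from that saved state. This is not the patch mentioned above. It is a self-contained stand-in that bundles a minimal SHA-1, hashes blobs rather than commits (git names a blob by hashing `blob <size>\0` plus the content), and uses hypothetical helper names (`blob_begin`, `find_nonce`). It shows the point being made: the saved state lowers the cost of each attempt, but the expected number of attempts is still about 16^k for a k-digit prefix.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Minimal SHA-1 (FIPS 180-4), included only so the sketch is self-contained.
 * h: chaining state, len: bytes absorbed, buf/used: pending partial block. */
struct sha1_ctx { uint32_t h[5]; uint64_t len; unsigned char buf[64]; size_t used; };

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
	for (int i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | p[4 * i + 3];
	for (int i = 16; i < 80; i++)
		w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	for (int i = 0; i < 80; i++) {
		uint32_t f, k;
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5a827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ed9eba1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8f1bbcdc; }
		else             { f = b ^ c ^ d;                   k = 0xca62c1d6; }
		uint32_t t = ROL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

static void sha1_init(struct sha1_ctx *c)
{
	static const uint32_t iv[5] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0 };
	memcpy(c->h, iv, sizeof(iv));
	c->len = 0; c->used = 0;
}

static void sha1_update(struct sha1_ctx *c, const void *data, size_t n)
{
	const unsigned char *p = data;
	c->len += n;
	while (n--) {
		c->buf[c->used++] = *p++;
		if (c->used == 64) {
			sha1_block(c->h, c->buf);
			c->used = 0;
		}
	}
}

static void sha1_final(struct sha1_ctx *c, unsigned char out[20])
{
	uint64_t bits = c->len * 8;  /* message length, captured before padding */
	unsigned char pad = 0x80, zero = 0, lenbuf[8];
	sha1_update(c, &pad, 1);
	while (c->used != 56)
		sha1_update(c, &zero, 1);
	for (int i = 0; i < 8; i++)
		lenbuf[i] = (unsigned char)(bits >> (56 - 8 * i));
	sha1_update(c, lenbuf, 8);
	for (int i = 0; i < 20; i++)
		out[i] = (unsigned char)(c->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Git names a blob by hashing "blob <size>\0" followed by the content. */
static void blob_begin(struct sha1_ctx *c, size_t size)
{
	char hdr[32];
	int n = snprintf(hdr, sizeof(hdr), "blob %zu", size);
	sha1_init(c);
	sha1_update(c, hdr, (size_t)n + 1);  /* +1 also hashes the NUL */
}

/* Hypothetical vanity search: append an 8-hex-digit nonce to `prefix` until
 * the blob's name starts with `zeros` zero hex digits. Header and prefix are
 * hashed once; each attempt copies that state and hashes only the nonce. */
static long find_nonce(const char *prefix, unsigned zeros, unsigned char out[20])
{
	struct sha1_ctx base, c;
	size_t plen = strlen(prefix);
	blob_begin(&base, plen + 8);
	sha1_update(&base, prefix, plen);
	for (long nonce = 0; nonce < 0x7fffffffL; nonce++) {
		char tail[9]; unsigned i;
		snprintf(tail, sizeof(tail), "%08lx", (unsigned long)nonce);
		c = base;  /* resume from the saved midstate */
		sha1_update(&c, tail, 8);
		sha1_final(&c, out);
		for (i = 0; i < zeros; i++)  /* digit i: high nibble if i is even */
			if ((out[i / 2] >> (i % 2 ? 0 : 4)) & 0xf)
				break;
		if (i == zeros)
			return nonce;
	}
	return -1;
}
```

With a prefix much longer than 64 bytes (a real commit, say), the saved state covers most of the input, which is where the constant-factor saving comes from.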

Re: Migrating away from SHA-1?

2016-04-14 Thread Theodore Ts'o

On Thu, Apr 14, 2016 at 10:28:50AM -0700, H. Peter Anvin wrote:
> 
> Either way, I agree with Ted that we have enough time to do it
> right, but that is a good reason to do it sooner rather than later
> (see also my note about freezing the cryptographic properties).

Sure, I think we should do it as well.  But the fact that the attacker
will likely need to get a commit into the tree in order to be able to
carry out a collision attack means that it's easier (and probably less
detectable) to get some underhanded C code into the tree.  For one
thing, you just need to introduce it via a patch ("Hi, I'm super eager
newbie Nick, here's a cleanup patch!"), as opposed to getting a
sublieutenant to accept a git pull request.

Also, remember that while we can write programs that look for
suspicious git objects that have stuff hidden after the null
terminator (in fact, maybe that would be a good thing to add to git,
hmmm?), the state of the art in detecting underhanded C code which is
deliberately designed to not be noticed by static code checkers (or
humans doing a superficial code review, for that matter) is not
particularly encouraging to me.

- Ted

Re: Migrating away from SHA-1?

2016-04-14 Thread H. Peter Anvin

On April 14, 2016 10:23:03 AM PDT, David Turner  wrote:
>On Wed, 2016-04-13 at 21:53 -0400, Theodore Ts'o wrote:
>> On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
>> > 
>> > If SHA-1 is broken (in certain ways), someone *can* replace an
>> > arbitrary blob.  GPG does not help in this case, because the signature
>> > is over the commit object (which points to a tree, which eventually
>> > points to the blob), and the commit hasn't changed.  So the GPG
>> > signature will still verify.
>> 
>> The "in certain ways" is the critical bit.  The question is whether
>> you are trying to replace an arbitrary blob, or a blob that was
>> submitted under your control.
>> 
>> If you are trying to replace an arbitrary blob, you need to carry out
>> a preimage attack.  That means that given a particular hash, you need
>> to find another blob that has the same hash.  SHA-1 is currently
>> resistant against preimage attacks (that is, you need to use brute
>> force, so the work factor is 2**159).
>> 
>> If you are trying to replace a blob which is under your control, then
>> all you need is a collision attack, and this is where SHA-1 has been
>> weakened.  It is now possible to find a collision with a work factor
>> of 2**69, instead of the 2**80 that a generic birthday attack needs.
>> 
>> It was an MD5 collision which was involved in the Flame attack.
>> Someone (probably in the US or Israeli intelligence services)
>> submitted a Certificate Signing Request (CSR) to the Microsoft
>> Terminal Services Licensing server.  That CSR was under the control of
>> the attacker, and it resulted in a certificate where parts of the
>> certificate could be swapped out with the corresponding fields from
>> another CSR (which was not submitted to the Certification Authority)
>> which had the code signing bit set.
>> 
>> So in order to carry out this attack, not only did the (cough)
>> "unknown" attackers have to come up with a collision, but the two
>> colliding blobs had to be parsable as valid CSRs, one of which had
>> to pass inspection by the automated CA signing authority, and the
>> other of which had to have the desired code signing bits set so the
>> attacker could sabotage an Iranian nuclear centrifuge.
>> 
>> OK, so how does this map to git?  First of all, from a collision
>> perspective, the two blobs have to map into valid C code, one of which
>> has to be innocuous enough that any humans who review the patch
>> and/or git pull request don't notice anything wrong.
>
>It looks like Linux contains at least some firmware which would be hard
>to audit.  One random example is:
>firmware/bnx2x/bnx2x-e1h-6.2.9.0.fw.ihex

Either way, I agree with Ted that we have enough time to do it right,
but that is a good reason to do it sooner rather than later (see also
my note about freezing the cryptographic properties).
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

Re: Migrating away from SHA-1?

2016-04-14 Thread David Turner

On Wed, 2016-04-13 at 21:53 -0400, Theodore Ts'o wrote:
> On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
> > 
> > If SHA-1 is broken (in certain ways), someone *can* replace an
> > arbitrary blob.  GPG does not help in this case, because the signature
> > is over the commit object (which points to a tree, which eventually
> > points to the blob), and the commit hasn't changed.  So the GPG
> > signature will still verify.
> 
> The "in certain ways" is the critical bit.  The question is whether
> you are trying to replace an arbitrary blob, or a blob that was
> submitted under your control.
> 
> If you are trying to replace an arbitrary blob, you need to carry out
> a preimage attack.  That means that given a particular hash, you need
> to find another blob that has the same hash.  SHA-1 is currently
> resistant against preimage attacks (that is, you need to use brute
> force, so the work factor is 2**159).
> 
> If you are trying to replace a blob which is under your control, then
> all you need is a collision attack, and this is where SHA-1 has been
> weakened.  It is now possible to find a collision with a work factor
> of 2**69, instead of the 2**80 that a generic birthday attack needs.
> 
> It was an MD5 collision which was involved in the Flame attack.
> Someone (probably in the US or Israeli intelligence services)
> submitted a Certificate Signing Request (CSR) to the Microsoft
> Terminal Services Licensing server.  That CSR was under the control of
> the attacker, and it resulted in a certificate where parts of the
> certificate could be swapped out with the corresponding fields from
> another CSR (which was not submitted to the Certification Authority)
> which had the code signing bit set.
> 
> So in order to carry out this attack, not only did the (cough)
> "unknown" attackers have to come up with a collision, but the two
> colliding blobs had to be parsable as valid CSRs, one of which had
> to pass inspection by the automated CA signing authority, and the
> other of which had to have the desired code signing bits set so the
> attacker could sabotage an Iranian nuclear centrifuge.
> 
> OK, so how does this map to git?  First of all, from a collision
> perspective, the two blobs have to map into valid C code, one of which
> has to be innocuous enough that any humans who review the patch
> and/or git pull request don't notice anything wrong.

It looks like Linux contains at least some firmware which would be hard
to audit.  One random example is:
firmware/bnx2x/bnx2x-e1h-6.2.9.0.fw.ihex


Re: Migrating away from SHA-1?

2016-04-14 Thread Joey Hess

Theodore Ts'o wrote:
> OK, so how does this map to git?  First of all, from a collision
> perspective, the two blobs have to map into valid C code

Git provides other places to hide the colliding blobs; the best seems to
be as an added header in the commit object, or as trailing data after a \0
in the commit message. Git is very good at hiding such potentially
colliding data from the user, as https://github.com/joeyh/supercollider
demonstrates.

commit 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7
Author: Joey Hess 
Date:   Fri Sep 9 11:49:21 2011 -0400

an innocent commit

If this were a sha1 colliding attack, there would be some sort of binary
garbage below. Which there isn't. So this can be safely merged.

joey@darkstar:~/tmp/supercollider>git cat-file -p 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7
tree 735a7633237c07b398856005de3bc9ea00446747
author Joey Hess  1315583361 -0400
committer Joey Hess  1315583361 -0400

an innocent commit

If this were a sha1 colliding attack, there would be some sort of binary
garbage below. Which there isn't. So this can be safely merged.

??b???[?i??ͯ?t?
2??os??Q??H?޸*zl?RA˂q?E
?E7???\?m???U?>MU 
GY?d)?ȼ??'g?~D??ɯhQ????/"E??X?m???^͸??S?D??;w6(?`??>?縘?AѲ?*!??@v>?8??2?!??=*?J

???ynH???c?w?\??K7???N?6?????A5?FM?wZ?~?pKY?R???s7??(?ƶ?_"??m%1a??ʀ??K[
t????!A0?ΈfT.?T?w?ᛵƌ?р???aco?V/2??nَ?
?}?6?_?z?{

(The other possibility would be to hide the colliding blob in the tree
object, but that seems unlikely.)
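As an illustration of how a reviewer-side check for both hiding places described above might look: it flags commit headers that `git log` does not show by default, and any NUL byte in the message. This is a hypothetical sketch, not git's fsck logic. The list of expected header names is an assumption based on the headers git commonly writes, and the input is assumed to be the raw commit content as printed by `git cat-file commit <id>`.

```c
#include <stddef.h>
#include <string.h>

#define SUSPECT_NUL_IN_BODY     1  /* bytes hidden after a NUL in the message */
#define SUSPECT_UNKNOWN_HEADER  2  /* header that `git log` would not display */

/* Assumed list of headers a normal commit carries. */
static int is_known_header(const char *line, size_t len)
{
	static const char *const known[] = {
		"tree", "parent", "author", "committer", "encoding", "mergetag", "gpgsig"
	};
	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
		size_t k = strlen(known[i]);
		if (len > k && !memcmp(line, known[i], k) && line[k] == ' ')
			return 1;
	}
	return 0;
}

/* buf/len: raw commit content (headers, blank line, message). */
static int scan_commit(const char *buf, size_t len)
{
	int flags = 0;
	size_t pos = 0;
	while (pos < len) {
		const char *nl = memchr(buf + pos, '\n', len - pos);
		size_t end = nl ? (size_t)(nl - buf) : len;
		if (end == pos) {  /* blank line: headers are over */
			pos = end + 1;
			break;
		}
		/* Lines starting with a space continue a multi-line header (e.g. gpgsig). */
		if (buf[pos] != ' ' && !is_known_header(buf + pos, end - pos))
			flags |= SUSPECT_UNKNOWN_HEADER;
		pos = end + 1;
	}
	if (pos < len && memchr(buf + pos, '\0', len - pos))
		flags |= SUSPECT_NUL_IN_BODY;
	return flags;
}
```

A check like this removes the most convenient hiding places, but, as pointed out elsewhere in the thread, it does not change how much work finding a collision takes.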

-- 
see shy jo


Re: Migrating away from SHA-1?

2016-04-13 Thread Theodore Ts'o

On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
> 
> If SHA-1 is broken (in certain ways), someone *can* replace an
> arbitrary blob.  GPG does not help in this case, because the signature
> is over the commit object (which points to a tree, which eventually
> points to the blob), and the commit hasn't changed.  So the GPG
> signature will still verify.

The "in certain ways" is the critical bit.  The question is whether
you are trying to replace an arbitrary blob, or a blob that was
submitted under your control.

If you are trying to replace an arbitrary blob, you need to carry out
a preimage attack.  That means that given a particular hash, you need
to find another blob that has the same hash.  SHA-1 is currently
resistant against preimage attacks (that is, you need to use brute
force, so the work factor is 2**159).

If you are trying to replace a blob which is under your control, then
all you need is a collision attack, and this is where SHA-1 has been
weakened.  It is now possible to find a collision with a work factor
of 2**69, instead of the 2**80 that a generic birthday attack needs.
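For reference, the generic costs behind the two cases above, for an ideal hash with an n-bit output (n = 160 for SHA-1). These are order-of-magnitude figures; the 2**69 is the attack cost quoted in the message, not derived here.

```latex
\begin{aligned}
\text{preimage, generic brute force:} &\quad \sim 2^{n} = 2^{160} \text{ hash evaluations} \\
\text{collision, generic birthday bound:} &\quad \sim 2^{n/2} = 2^{80} \text{ hash evaluations} \\
\text{speed-up of a } 2^{69} \text{ collision attack:} &\quad 2^{80} / 2^{69} = 2^{11} = 2048
\end{aligned}
```

The collision case is the cheap one because any colliding pair will do, which is why it only helps an attacker who controls both blobs.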

It was an MD5 collision which was involved in the Flame attack.
Someone (probably in the US or Israeli intelligence services)
submitted a Certificate Signing Request (CSR) to the Microsoft
Terminal Services Licensing server.  That CSR was under the control of
the attacker, and it resulted in a certificate where parts of the
certificate could be swapped out with the corresponding fields from
another CSR (which was not submitted to the Certification Authority)
which had the code signing bit set.

So in order to carry out this attack, not only did the (cough)
"unknown" attackers have to come up with a collision, but the two
colliding blobs had to be parsable as valid CSRs, one of which had
to pass inspection by the automated CA signing authority, and the
other of which had to have the desired code signing bits set so the
attacker could sabotage an Iranian nuclear centrifuge.

OK, so how does this map to git?  First of all, from a collision
perspective, the two blobs have to map into valid C code, one of which
has to be innocuous enough that any humans who review the patch
and/or git pull request don't notice anything wrong.  The second has
to contain whatever security backdoor the attacker is going to try to
introduce into the git tree.  Ideally this should also pass muster
with humans who are inspecting the code, but if the attack is targeted
against a specific victim who is not likely to look at the code, it
might be okay if something like this:

#if 0  /* this is needed to make the hash collision work */
aev2Ein4Hagh8eimshood5aTeteiVo9hOhchohN6jiem6AiNEipeeR3Pie4ePaeJ
fo8eLa9ateeKie5VeG5eZuu2Sahqu1Ohai9ohGhuAevoot5OtohQuai7koo4IeTh
ohCefae4Ahkah0eiku2Efo0iuHai8ideaRooth8wVahlia0nuu1eeSh5oht1Kaer
aiJi4chunahK9oozpaiWu7viee5aiFahud6Ee2zieich1veKque6PhiaAit1shie
#endif

... was hidden in the middle of the replacement blob.  One would
*hope*, though, that if something like this appeared in a blob that
was being sent to the upstream repository, even a sloppy GitHub pull
request reviewer would notice.

That's because in this scenario, the attacker needs to get the first
(innocuous) blob into the git tree, which means they need to be
trusted enough to get it in.  And so the question which comes to mind
is: if you are that trusted (or if the git pull review process is that
crappy), might it not be easier to simply introduce obfuscated code
that has a security weakness?  That is, something from the Underhanded
C Contest, or a seemingly accidental buffer overrun, hopefully one that
isn't noticed by static code checkers.  If you do that, you don't even
need to figure out how to create a SHA-1 collision.

Does that mean that we shouldn't figure out how to migrate to another
hash function?  No, it's probably worth planning how to do it.  But we
probably have a fair amount of time to get this right.

Cheers,

- Ted

Re: Migrating away from SHA-1?

2016-04-12 Thread H. Peter Anvin

On April 12, 2016 6:51:12 PM PDT, Duy Nguyen  wrote:
>On Wed, Apr 13, 2016 at 5:38 AM, H. Peter Anvin  wrote:
>> OK, I'm going to open this can of worms...
>>
>> At what point do we migrate from SHA-1?
>
>Brian Carlson has been slowly refactoring the git code base, abstracting
>SHA-1 away. Once that work is done, I think we can talk about moving
>away from SHA-1. The process is slow because it likely causes
>conflicts with in-flight topics. A quick grep shows we still have
>about 300 SHA-1 references, so it'll be quite some time.

Well, at least it sounds like work is underway.  That is a big deal.

Re: Migrating away from SHA-1?

2016-04-12 Thread Duy Nguyen

On Wed, Apr 13, 2016 at 5:38 AM, H. Peter Anvin  wrote:
> OK, I'm going to open this can of worms...
>
> At what point do we migrate from SHA-1?

Brian Carlson has been slowly refactoring the git code base, abstracting
SHA-1 away. Once that work is done, I think we can talk about moving
away from SHA-1. The process is slow because it likely causes
conflicts with in-flight topics. A quick grep shows we still have
about 300 SHA-1 references, so it'll be quite some time.
-- 
Duy

Re: Migrating away from SHA-1?

2016-04-12 Thread H. Peter Anvin


On 04/12/16 18:03, Junio C Hamano wrote:


> > and so on. Of course trees don't have any space for this; they have a
> > fixed-length for the hash part of each record, which is basically:
> >
> >  NUL <20-byte-sha1>
> >
> > So we'd probably need a "treev2" object type that gives room for an
> > algorithm byte (or we'd have to try to shove it into the mode, but since
> > old versions won't know the new algorithm anyway, I don't think it
> > solves that much...). Or you can just define for the whole tree object
> > (either implicit in its type, or in a header) that it always uses
> > algorithm X.
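A sketch of reading one tree entry in git's layout: an octal mode, a space, the name, a NUL, then the raw hash bytes. The struct and function names are hypothetical, not git's internal API. Nothing in the entry records the hash length or algorithm, so the reader has to be told from outside; that is why the options above are a new tree type or a per-tree declaration.

```c
#include <stddef.h>
#include <string.h>

/* One tree entry: "<octal mode> <name>\0<raw hash bytes>". */
struct tree_entry {
	unsigned mode;
	const char *name;           /* points into the buffer, NUL-terminated there */
	const unsigned char *hash;  /* hash_len raw bytes */
};

/*
 * Parses the entry at buf[0..len) and returns the bytes consumed, or 0 if
 * malformed. hash_len (20 for SHA-1) must come from the caller, because the
 * entry itself does not say how long the hash is.
 */
static size_t parse_tree_entry(const unsigned char *buf, size_t len,
			       size_t hash_len, struct tree_entry *out)
{
	size_t i = 0;
	unsigned mode = 0;
	while (i < len && buf[i] >= '0' && buf[i] <= '7')
		mode = mode * 8 + (unsigned)(buf[i++] - '0');
	if (i == 0 || i >= len || buf[i] != ' ')
		return 0;
	const unsigned char *nul = memchr(buf + i + 1, '\0', len - i - 1);
	if (!nul || (size_t)(nul - buf) + 1 + hash_len > len)
		return 0;
	out->mode = mode;
	out->name = (const char *)(buf + i + 1);
	out->hash = nul + 1;
	return (size_t)(nul - buf) + 1 + hash_len;
}
```

The same bytes can only be read correctly with the right `hash_len`; a reader guessing 32 instead of 20 would run past the entry or misparse the next one.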


> This will hurt the performance a lot during the transition period as
> it no longer will be possible to rely on "most of the time a fine
> grained commit changes only a small part of the tree, and we can
> cheaply avoid descending into trees that haven't changed because we
> can tell that the corresponding tree objects in the pre- and post-
> trees have the same object name" optimization.  But we cannot avoid
> it.



Not really, because you can point to the algoX hash even for the 
existing objects.


Perhaps the tree object can add a format descriptor at the beginning; 
something like:


 


> > Transitioning to that would be something like:
> >
> >   0. Overhaul all of the git code to handle arbitrary-sized object ids.
> >
> >   1. Decide on the new algorithm and implement it in git.
> >
> >   2. Recognize parameterized object ids in commits and tags (designing
> >      format, implementing the reading side).
> >
> >   3. Recognize parameterized object ids somehow in trees (designing
> >      format, implementing the reading side).
> >
> >   4. Teach the object database to index objects by the new algorithm (or
> >      possibly both algorithms).
> >
> >   5. Add a protocol extension so that both sides can decide which
> >      algorithm is being used when they talk about oids.
> >
> >   6. Add a config option to write references in objects using the new
> >      algorithm.
> >
> >   7. After a while, flip the config option on. Hopefully the readers
> >      from steps 1-5 have percolated to the masses by then, and it's not
> >      a horrible flag day.
> >
> > We're basically on step 0 right now. I'm sure I'm missing some
> > subtleties in there, too.


> One subtlety is that 7. "not a flag day" may not be a good thing.
>
> There has to be a section of a history that spans the transition,
> set of commits and trees that have pointers to both kinds of object
> names.  The narrower such a section of the history, the more
> pleasant to use the result of the transition would be.
>
> Different projects that can have their own flag days at their own
> pace is a good thing, so the above observation does not invalidate
> your transition plan, though.


I don't think there is any way this can *not* be by repository and 
somehow require a manual operation in order to preserve the 
cryptographic integrity.  In some ways, the transition point and the 
transition table becomes a special kind of tag object.  There may have 
to be more than one in the case of commits in multiple trees.
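A sketch of the reading side of step 2 (recognizing parameterized object ids in commits), using the `tree sha256:1234abcd...` syntax floated in this thread. An unprefixed 40-digit value is treated as legacy SHA-1, so existing commits still parse. The `sha1:` prefix, the enum, and the function names are assumptions for illustration; no format had been decided.

```c
#include <string.h>

/* Hypothetical algorithm identifiers for parameterized object ids. */
enum oid_algo { OID_INVALID = 0, OID_SHA1, OID_SHA256 };

static int all_hex(const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!s[i] || !strchr("0123456789abcdef", s[i]))
			return 0;
	return 1;
}

/*
 * Parses the value of a "tree"/"parent" line. "sha256:" + 64 hex digits is
 * the new algorithm; a bare 40-digit value is legacy SHA-1. On success
 * *hex points at the digits and the algorithm is returned.
 */
static enum oid_algo parse_param_oid(const char *value, const char **hex)
{
	static const struct { const char *prefix; enum oid_algo algo; size_t hexsz; } algos[] = {
		{ "sha256:", OID_SHA256, 64 },
		{ "sha1:",   OID_SHA1,   40 },
	};
	size_t len = strlen(value);
	for (size_t i = 0; i < sizeof(algos) / sizeof(algos[0]); i++) {
		size_t plen = strlen(algos[i].prefix);
		if (!strncmp(value, algos[i].prefix, plen)) {
			if (len - plen != algos[i].hexsz || !all_hex(value + plen, len - plen))
				return OID_INVALID;
			*hex = value + plen;
			return algos[i].algo;
		}
	}
	/* No prefix: what every existing commit contains. */
	if (len == 40 && all_hex(value, 40)) {
		*hex = value;
		return OID_SHA1;
	}
	return OID_INVALID;
}
```

Accepting both forms is what lets readers ship (steps 2-5) long before writers flip the config option (step 7).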



Re: Migrating away from SHA-1?

2016-04-12 Thread Jeff King

On Tue, Apr 12, 2016 at 06:03:02PM -0700, Junio C Hamano wrote:

> > So we'd probably need a "treev2" object type that gives room for an
> > algorithm byte (or we'd have to try to shove it into the mode, but since
> > old versions won't know the new algorithm anyway, I don't think it
> > solves that much...). Or you can just define for the whole tree object
> > (either implicit in its type, or in a header) that it always uses
> > algorithm X.
> 
> This will hurt the performance a lot during the transition period as
> it no longer will be possible to rely on "most of the time a fine
> grained commit changes only a small part of the tree, and we can
> cheaply avoid descending into trees that haven't changed because we
> can tell that the corresponding tree objects in the pre- and post-
> trees have the same object name" optimization.  But we cannot avoid
> it.

Yeah. I'd hope in general that there would be a single commit that does
the transition, and we'd only pay it when doing diffs across the
boundary. And even then, I think a local-only cache of aliases could
mitigate the worst of it.
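One way to picture the "local-only cache of aliases": when a diff crosses the transition and one side names a subtree by SHA-1 while the other uses the new hash, a table recording which old and new names denote the same content lets the diff keep skipping unchanged subtrees. Everything here (struct, function, the short stand-in strings in the usage) is hypothetical; a real cache would be indexed rather than scanned linearly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local-only alias: one tree's content, named by both algorithms. */
struct oid_alias {
	const char *sha1_hex;
	const char *sha256_hex;
};

/*
 * Can a diff skip descending into these two subtrees? Names in the same
 * algorithm are compared directly; a SHA-1/SHA-256 pair is looked up in the
 * alias table. A miss returns false, so the caller descends as before.
 */
static bool same_tree(const char *a, bool a_is_sha256,
		      const char *b, bool b_is_sha256,
		      const struct oid_alias *aliases, size_t n)
{
	if (a_is_sha256 == b_is_sha256)
		return strcmp(a, b) == 0;
	const char *old_id = a_is_sha256 ? b : a;
	const char *new_id = a_is_sha256 ? a : b;
	for (size_t i = 0; i < n; i++)  /* a real cache would be indexed, not scanned */
		if (!strcmp(aliases[i].sha1_hex, old_id) &&
		    !strcmp(aliases[i].sha256_hex, new_id))
			return true;
	return false;
}
```

A miss only costs the optimization, never correctness, which is why such a cache can stay local and be rebuilt at will.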

> >   7. After a while, flip the config option on. Hopefully the readers
> >  from steps 1-5 have percolated to the masses by then, and it's not
> >  a horrible flag day.
> >
> > We're basically on step 0 right now. I'm sure I'm missing some
> > subtleties in there, too.
> 
> One subtlety is that 7. "not a flag day" may not be a good thing.
> 
> There has to be a section of a history that spans the transition,
> set of commits and trees that have pointers to both kinds of object
> names.  The narrower such a section of the history, the more
> pleasant to use the result of the transition would be.
> 
> Different projects that can have their own flag days at their own
> pace is a good thing, so the above observation does not invalidate
> your transition plan, though.

Good point. I do think projects would do well to have a moment where
they switch to the new format, and don't freely intermingle. We could
possibly do some magic there to help things out. For example, if we are
building on a commit that is sha-2, we automatically use sha-2 for the
new objects that point to it. And then the "flag day" for a project is
simply that somebody pushes to "master" using sha-2, and everybody
else's git (which learned long ago to speak the new algorithm) just
picks it up.

Of course that's not exactly a flag day for projects that branch from
old history for bugfixes. But it might be close enough.
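A rough Python sketch of the "follow the parent" rule described above. It assumes a hypothetical `choose_algorithm` helper, object ids modeled as (algorithm, hex) pairs, and a fixed oldest-to-newest ordering of algorithm names; none of these come from git itself.

```python
from typing import Sequence, Tuple

# Hypothetical model: an object id is (algorithm name, hex digest).
ObjectId = Tuple[str, str]

# Assumed ordering of algorithms, oldest first.
ALGORITHM_ORDER = ("sha1", "sha256")


def choose_algorithm(parents: Sequence[ObjectId], configured: str = "sha1") -> str:
    """Pick the hash for a new commit. Once any parent uses a newer
    algorithm, keep using it, so a branch never drops back to sha1."""
    chosen = configured
    for algo, _hex in parents:
        if ALGORITHM_ORDER.index(algo) > ALGORITHM_ORDER.index(chosen):
            chosen = algo
    return chosen


# A merge of old and new history stays on the new algorithm.
print(choose_algorithm([("sha1", "ab" * 20), ("sha256", "cd" * 32)]))  # sha256
```

Under this rule the "flag day" is just the first sha256 commit pushed to a shared branch; everyone who builds on top of it inherits the new algorithm without touching their config.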

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Migrating away from SHA-1?

2016-04-12 Thread Junio C Hamano

Jeff King  writes:

> So a slightly nicer thing is to parameterize the algorithm for every
> object name reference. So commits look like:
>
>   tree sha256:1234abcd...
>   parent sha256:1234abcd...
>
> and so on. Of course trees don't have any space for this; they have a
> fixed-length for the hash part of each record, which is basically:
>
> NUL <20-byte-sha1>
>
> So we'd probably need a "treev2" object type that gives room for an
> algorithm byte (or we'd have to try to shove it into the mode, but since
> old versions won't know the new algorithm anyway, I don't think it
> solves that much...). Or you can just define for the whole tree object
> (either implicit in its type, or in a header) that it always uses
> algorithm X.

This will hurt the performance a lot during the transition period as
it no longer will be possible to rely on "most of the time a fine
grained commit changes only a small part of the tree, and we can
cheaply avoid descending into trees that haven't changed because we
can tell that the corresponding tree objects in the pre- and post-
trees have the same object name" optimization.  But we cannot avoid
it.
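The optimization referred to here can be sketched as a toy model in Python; this is not git's tree-diff code. Object ids are hypothetical "algo:placeholder" strings, the `store` dict stands in for the object database, and blobs are plain bytes. Equal ids let the diff skip a subtree without opening it, but an id computed with a different algorithm proves nothing, so the diff has to descend and, at the leaves, compare content.

```python
from typing import Dict, List, Optional, Tuple, Union

# Toy object database: an oid maps to a tree (name -> (kind, oid)) or to blob bytes.
Tree = Dict[str, Tuple[str, str]]
Store = Dict[str, Union[Tree, bytes]]


def algo(oid: str) -> str:
    return oid.split(":", 1)[0]


def diff_trees(store: Store, old: str, new: str, prefix: str = "",
               opened: Optional[List[str]] = None) -> List[str]:
    """Return changed paths, recording every tree we had to open."""
    if opened is not None:
        opened.append(prefix or "/")
    old_tree, new_tree = store[old], store[new]
    changed: List[str] = []
    for name in sorted(old_tree.keys() | new_tree.keys()):
        a, b = old_tree.get(name), new_tree.get(name)
        path = prefix + name
        if a == b:
            continue  # same object name: skip without reading the object
        if a is None or b is None or a[0] != b[0]:
            changed.append(path)  # added, deleted, or changed type
        elif a[0] == "tree":
            # Different oids: a real change, or merely a different algorithm.
            changed += diff_trees(store, a[1], b[1], path + "/", opened)
        elif algo(a[1]) == algo(b[1]) or store[a[1]] != store[b[1]]:
            # Same algorithm with different ids means different bytes;
            # across algorithms we have to compare the bytes themselves.
            changed.append(path)
    return changed


# Placeholder ids (not real hashes) for the same content under two algorithms.
store: Store = {
    "sha1:b1": b"hello\n", "sha256:b2": b"hello\n", "sha1:b3": b"bye\n",
    "sha1:lib1": {"f.c": ("blob", "sha1:b1")},
    "sha256:lib2": {"f.c": ("blob", "sha256:b2")},
    "sha1:root1": {"README": ("blob", "sha1:b1"), "lib": ("tree", "sha1:lib1")},
    "sha1:root1b": {"README": ("blob", "sha1:b3"), "lib": ("tree", "sha1:lib1")},
    "sha256:root2": {"README": ("blob", "sha256:b2"), "lib": ("tree", "sha256:lib2")},
}
```

Diffing `sha1:root1` against `sha1:root1b` opens only the root, because `lib` has the same id on both sides. Diffing `sha1:root1` against `sha256:root2` finds no changes at all, yet still has to open `lib/` and compare blob bytes; on a large tree that extra work is the transition-period cost.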

> Transitioning to that would be something like:
>
>   0. Overhaul all of the git code to handle arbitrary-sized object ids.
>
>   1. Decide on the new algorithm and implement it in git.
>
>   2. Recognize parameterized object ids in commits and tags (designing
>  format, implementing the reading side).
>
>   3. Recognize parameterized object ids somehow in trees (designing
>  format, implementing the reading side).
>
>   4. Teach the object database to index objects by the new algorithm (or
>  possibly both algorithms).
>
>   5. Add a protocol extension so that both sides can decide which
>  algorithm is being used when they talk about oids.
>
>   6. Add a config option to write references in objects using the new
>  algorithm.
>
>   7. After a while, flip the config option on. Hopefully the readers
>  from steps 1-5 have percolated to the masses by then, and it's not
>  a horrible flag day.
>
> We're basically on step 0 right now. I'm sure I'm missing some
> subtleties in there, too.

One subtlety is that 7. "not a flag day" may not be a good thing.

There has to be a section of history that spans the transition: a
set of commits and trees that have pointers to both kinds of object
names.  The narrower that section of history, the more pleasant the
result of the transition will be to use.

Letting different projects have their own flag days at their own
pace is a good thing, so the above observation does not invalidate
your transition plan, though.

Re: Migrating away from SHA-1?

2016-04-12 Thread Jeff King

On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:

> It would be possible, of course, to GPG-sign the entire commit's
> transitive data (rather than just the SHA1s of same).  But as far as I
> know, that is not ever what is done.

There is a project called git-evtag which does this, and you can find
it mentioned on the list. The problem is just that it's not very efficient.
That's maybe OK for tag-signing, which is relatively rare. It wouldn't
really work for commit-signing.
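To see why signing the transitive content is expensive, here is a toy Python checksum in the spirit of git-evtag. It is not git-evtag's actual format: it assumes an in-memory `store` dict standing in for the object database, folds each path and blob into SHA-512, and reports how many blobs it had to read.

```python
import hashlib
from typing import Dict, Tuple, Union

# Toy object database: oid -> tree (name -> (kind, oid)) or blob bytes.
Store = Dict[str, Union[Dict[str, Tuple[str, str]], bytes]]


def deep_checksum(store: Store, tree_oid: str) -> Tuple[str, int]:
    """Fold every path and blob reachable from tree_oid into one SHA-512.

    The digest depends only on content, not on the (possibly weak) object
    ids, but computing it means reading the whole tree every time."""
    h = hashlib.sha512()
    blobs_read = 0

    def walk(oid: str, prefix: str) -> None:
        nonlocal blobs_read
        for name, (kind, child) in sorted(store[oid].items()):
            path = prefix + name
            if kind == "tree":
                walk(child, path + "/")
                continue
            # Length-prefix each field so distinct trees cannot yield the
            # same byte stream.
            for field in (path.encode(), store[child]):
                h.update(len(field).to_bytes(8, "big"))
                h.update(field)
            blobs_read += 1

    walk(tree_oid, "")
    return h.hexdigest(), blobs_read


# Placeholder oids: the same content stored under ids from two different hashes.
store: Store = {
    "sha1:t": {"a.txt": ("blob", "sha1:x"), "src": ("tree", "sha1:s")},
    "sha1:s": {"main.c": ("blob", "sha1:y")},
    "sha1:x": b"alpha\n", "sha1:y": b"int main(void) { return 0; }\n",
    "sha256:t": {"a.txt": ("blob", "sha256:x"), "src": ("tree", "sha256:s")},
    "sha256:s": {"main.c": ("blob", "sha256:y")},
    "sha256:x": b"alpha\n", "sha256:y": b"int main(void) { return 0; }\n",
}
```

Both trees give the same digest because only content matters, but each call reads every blob. That is tolerable for an occasional release tag and far too slow for every commit.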

-Peff

Re: Migrating away from SHA-1?

2016-04-12 Thread Jeff King

On Tue, Apr 12, 2016 at 03:38:04PM -0700, H. Peter Anvin wrote:

> For existing repositories we will need to have a migration mechanism. Since
> we can't modify objects without completely invalidating the cryptographic
> properties, what I would suggest is that we leave the existing objects as
> is, with a persistent lookup table from SHA-1 to , and have that
> lookup table signed (e.g. GPG) by the person responsible for converting the
> repository.  This freezes the cryptographic status of the existing SHA-1
> objects at the time the conversion happens.  This is a very good reason to
> do this before SHA-1 is actually broken.  In contrast, SHA-2 has been
> surprisingly resistant to cryptanalysis, to the point that SHA-3 was
> motivated by performance and the desire to have a well-tested function based
> on entirely different principles should a generic attack against the common
> structure of MD5/SHA-1/SHA-2 ever be found.

There are a few threads in the list archive discussing options, if you
search.

A conversion table like you mention seems like a "step 2". I think the
first step is figuring out what the new format looks like, and how
objects refer to each other.

The absolute simplest thing that could work is literally replacing sha1
with a 160-bit truncation of sha-256, telling everybody to convert their
repos, and accepting that existing gpg signatures and external sha1
references are all obsolete. Old versions of git are obsolete, but the
code changes are very minor.
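For reference, git names an object by hashing a short header (type, a space, the size in decimal, and a NUL byte) followed by the object's content. A minimal Python sketch of that, plus the "truncated SHA-256" option described above; the `sha256-160` label is made up for illustration.

```python
import hashlib


def object_id(obj_type: str, data: bytes, algo: str = "sha1") -> str:
    """Hash "<type> <size>", a NUL byte, then the content, as git does."""
    payload = f"{obj_type} {len(data)}".encode() + b"\0" + data
    if algo == "sha1":
        return hashlib.sha1(payload).hexdigest()
    if algo == "sha256-160":
        # Hypothetical "simplest thing": keep 160-bit (40 hex digit) ids so
        # fixed-size buffers stay the same, but compute them with SHA-256.
        return hashlib.sha256(payload).hexdigest()[:40]
    raise ValueError(f"unknown algorithm {algo!r}")


print(object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The printed value is git's well-known empty-blob id. Every id changes under the truncated variant even though its length does not, which is exactly why existing signatures and external sha1 references would become obsolete.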

That sucks for a lot of reasons, obviously.

So a slightly nicer thing is to parameterize the algorithm for every
object name reference. So commits look like:

  tree sha256:1234abcd...
  parent sha256:1234abcd...

and so on. Of course trees don't have any space for this; they have a
fixed-length for the hash part of each record, which is basically:

NUL <20-byte-sha1>

So we'd probably need a "treev2" object type that gives room for an
algorithm byte (or we'd have to try to shove it into the mode, but since
old versions won't know the new algorithm anyway, I don't think it
solves that much...). Or you can just define for the whole tree object
(either implicit in its type, or in a header) that it always uses
algorithm X.
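The fixed-width problem is easy to see in code. In git's current tree format each entry is the octal mode, a space, the name, a NUL byte, and then exactly 20 raw bytes of sha1, with no length or algorithm marker. The "treev2" layout in this Python sketch, with one algorithm byte before the id, is only one hypothetical option, and its byte values are invented here.

```python
import hashlib
from typing import Iterator, Tuple

# Hypothetical algorithm codes and raw id sizes for a "treev2" entry.
ALGO_CODES = {"sha1": (0x01, 20), "sha256": (0x02, 32)}


def tree_entry_v1(mode: str, name: str, raw_id: bytes) -> bytes:
    """Current layout: "<mode> <name>", NUL, then exactly 20 raw bytes."""
    if len(raw_id) != 20:
        raise ValueError("v1 tree entries only have room for a 20-byte sha1")
    return f"{mode} {name}".encode() + b"\0" + raw_id


def tree_entry_v2(mode: str, name: str, algo: str, raw_id: bytes) -> bytes:
    """Hypothetical layout: an algorithm byte tells the reader the id size."""
    code, size = ALGO_CODES[algo]
    if len(raw_id) != size:
        raise ValueError(f"{algo} ids are {size} bytes")
    return f"{mode} {name}".encode() + b"\0" + bytes([code]) + raw_id


def parse_tree_v2(buf: bytes) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (mode, name, algorithm, hex id) for each hypothetical v2 entry."""
    by_code = {code: (algo, size) for algo, (code, size) in ALGO_CODES.items()}
    pos = 0
    while pos < len(buf):
        nul = buf.index(b"\0", pos)
        mode, name = buf[pos:nul].decode().split(" ", 1)
        algo, size = by_code[buf[nul + 1]]
        start = nul + 2
        yield mode, name, algo, buf[start:start + size].hex()
        pos = start + size


old = hashlib.sha1(b"old file").digest()
new = hashlib.sha256(b"new dir").digest()
buf = tree_entry_v2("100644", "README", "sha1", old) + tree_entry_v2("40000", "lib", "sha256", new)
for entry in parse_tree_v2(buf):
    print(entry)
```

A v1 reader has no way to skip a 32-byte id, which is why old versions cannot read such trees whichever way the algorithm is encoded.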

And then the "new" objects can refer to the older sha1 objects directly
(either via "sha1:1234abcd", or we'd probably define a parameter-less
reference to mean "sha1:"), and that essentially grafts the old history
to the new. You can always walk the old history. And because we're
really talking about collision attacks and not pre-image attacks, it
probably remains fairly trustworthy for chaining (because nobody is
making _new_ objects and referring to them via sha1).
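A Python sketch of reading the parameterized header lines, with a bare id defaulting to sha1 as suggested above. The `algo:hex` syntax is the proposal from this thread, not an existing git format, and the table of hex lengths is an assumption.

```python
import re
from typing import Tuple

# Assumed hex lengths per algorithm.
HEX_LEN = {"sha1": 40, "sha256": 64}
HEADER = re.compile(r"^(tree|parent) (?:(\w+):)?([0-9a-f]+)$")


def parse_ref(line: str) -> Tuple[str, str, str]:
    """Split a commit header like "parent sha256:<hex>" into
    (field, algorithm, hex). A bare id is treated as sha1, so existing
    objects keep parsing unchanged."""
    m = HEADER.match(line)
    if not m:
        raise ValueError(f"not a tree/parent header: {line!r}")
    field, algo, hexid = m.group(1), m.group(2) or "sha1", m.group(3)
    if algo not in HEX_LEN:
        raise ValueError(f"unknown algorithm {algo!r}")
    if len(hexid) != HEX_LEN[algo]:
        raise ValueError(f"{algo} ids have {HEX_LEN[algo]} hex digits")
    return field, algo, hexid


print(parse_ref("parent sha256:" + "b" * 64)[:2])  # ('parent', 'sha256')
```

Checking the length against the named algorithm catches a truncated or mislabeled id early instead of letting it fail as a missing object later.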

And then if you buy into the collision vs pre-image thing above, there's
not much point in caring about the mapping between sha1 and the new
algorithm. The old ones are set in stone and probably fine. You might
want such a mapping for performance (e.g., so that you can immediately
tell that an old sha-1 tree and a new sha-2 tree have an empty diff,
even though they have different ids), but that's purely a local thing.
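A minimal Python sketch of such a purely local mapping, assuming a hypothetical `AliasCache` class and "algo:hex" ids (the short ids in the test are placeholders). Because it never leaves the machine it needs no signature; it only answers "are these the same object?" faster, and answers "don't know" when it has no entry.

```python
from typing import Dict, Optional


def algo(oid: str) -> str:
    return oid.split(":", 1)[0]


class AliasCache:
    """Local-only map from a sha1 id to the new-algorithm id of the same object."""

    def __init__(self) -> None:
        self._to_new: Dict[str, str] = {}

    def record(self, sha1_oid: str, new_oid: str) -> None:
        self._to_new[sha1_oid] = new_oid

    def same_object(self, a: str, b: str) -> Optional[bool]:
        """True/False when the ids alone decide it, None when content must be compared."""
        if algo(a) == algo(b):
            return a == b  # one hash function: equal ids iff equal content
        for old, new in ((a, b), (b, a)):
            if algo(old) == "sha1":
                mapped = self._to_new.get(old)
                if mapped is not None and algo(mapped) == algo(new):
                    return mapped == new
        return None  # the cache cannot tell; fall back to reading the objects
```

A tree diff could consult `same_object` before descending, turning the "old sha-1 tree vs. new sha-2 tree with an empty diff" case back into a single lookup.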

So perhaps you were thinking of something in between, or an alternative
plan altogether.  I haven't been able to think of a scheme that is
secure, convenient, and involves less work than the one above.

Transitioning to that would be something like:

  0. Overhaul all of the git code to handle arbitrary-sized object ids.

  1. Decide on the new algorithm and implement it in git.

  2. Recognize parameterized object ids in commits and tags (designing
 format, implementing the reading side).

  3. Recognize parameterized object ids somehow in trees (designing
 format, implementing the reading side).

  4. Teach the object database to index objects by the new algorithm (or
 possibly both algorithms).

  5. Add a protocol extension so that both sides can decide which
 algorithm is being used when they talk about oids.

  6. Add a config option to write references in objects using the new
 algorithm.

  7. After a while, flip the config option on. Hopefully the readers
 from steps 1-5 have percolated to the masses by then, and it's not
 a horrible flag day.

We're basically on step 0 right now. I'm sure I'm missing some
subtleties in there, too.
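Step 5 could be as simple as a preference-ordered match. This Python sketch covers the decision only and is not git's wire protocol: each side is assumed to advertise a list of algorithm names, and the first client preference the server also supports wins.

```python
from typing import Optional, Sequence


def negotiate(client_prefs: Sequence[str], server_supported: Sequence[str]) -> Optional[str]:
    """Return the first algorithm in the client's preference order that the
    server also speaks, or None if they share none (the fetch must stop)."""
    offered = set(server_supported)
    for algo in client_prefs:
        if algo in offered:
            return algo
    return None


print(negotiate(["sha256", "sha1"], ["sha1"]))  # sha1
```

Letting the client keep "sha1" at the end of its list is what allows new clients to keep talking to old servers during steps 1-6.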

Things get simpler if you don't fully parameterize (e.g., just assume
everything is moved to the new algorithm, and provide a "legacy" parent
pointer for connecting to sha1 history). But part of this would be
future-proofing for a day when sha-2 fails.

-Peff

Re: Migrating away from SHA-1?

2016-04-12 Thread David Turner

On Tue, 2016-04-12 at 16:00 -0700, Stefan Beller wrote:
> On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin 
> wrote:
> > OK, I'm going to open this can of worms...
> > 
> > At what point do we migrate from SHA-1?  At this point the
> > cryptanalysis of SHA-1 is most likely a matter of time.
> 
> And I thought the cryptographic properties of SHA1 did not matter for
> Git's use case.
> We could employ broken md5 or such as well.
> ( see http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
> )
> That is because security goes on top via gpg signing of tags/commits.
> 
> I am not sure if anyone came up with
> a counter argument to Linus's reasoning there?

Here's my reasoning as to why the security of SHA1 matters:

If SHA-1 is not broken, and someone hacks into e.g. kernel.org, they
can't replace an arbitrary blob with anything else without being
detected by git's automatic checksumming of objects.  GPG is necessary
here because otherwise the HEAD commit could be changed (to point to a
new tree that points to the new blob). 

If SHA-1 is broken (in certain ways), someone *can* replace an
arbitrary blob.  GPG does not help in this case, because the signature
is over the commit object (which points to a tree, which eventually
points to the blob), and the commit hasn't changed.  So the GPG
signature will still verify.
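The chain of trust described above can be made concrete in Python. Here the GPG check is assumed to have already vouched for `signed_commit_id`; everything below it is trusted only because each object hashes to the id recorded in its parent. The commit body is simplified (real commits also carry author and committer lines) and the tree has a single entry, but the object header and tree-entry layout follow git's.

```python
import hashlib


def oid(kind: str, body: bytes) -> str:
    """Git-style sha1 object name: "<kind> <size>", NUL, then the body."""
    return hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()


def make_tree(name: str, blob_id: str) -> bytes:
    # One tree entry: "<mode> <name>", NUL, then the raw 20-byte blob id.
    return f"100644 {name}".encode() + b"\0" + bytes.fromhex(blob_id)


def make_commit(tree_id: str) -> bytes:
    # Simplified commit body; real commits also have author/committer lines.
    return f"tree {tree_id}\n\nexample message\n".encode()


def verify_chain(signed_commit_id: str, commit: bytes, tree: bytes, blob: bytes) -> bool:
    """True if `blob` is what the signed commit (via its tree) points to."""
    if oid("commit", commit) != signed_commit_id:
        return False
    if commit.split(b"\n", 1)[0] != b"tree " + oid("tree", tree).encode():
        return False
    recorded_blob_id = tree.split(b"\0", 1)[1][:20].hex()
    # A different blob with this same sha1 would pass here too: that is the
    # danger of a broken hash, and GPG never sees the blob itself.
    return recorded_blob_id == oid("blob", blob)


blob = b"good code\n"
tree = make_tree("main.c", oid("blob", blob))
commit = make_commit(oid("tree", tree))
signed = oid("commit", commit)  # stands in for the id a GPG signature covers
```

Swapping in different bytes fails at the last step only because sha1 still distinguishes them; with a practical collision the final comparison would succeed for the attacker's blob.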

It would be possible, of course, to GPG-sign the entire commit's
transitive data (rather than just the SHA1s of same).  But as far as I
know, that is not ever what is done.

This is the argument for migration to a more-secure hash.

Re: Migrating away from SHA-1?

2016-04-12 Thread Jeff King

On Tue, Apr 12, 2016 at 04:00:18PM -0700, Stefan Beller wrote:

> On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin  wrote:
> > OK, I'm going to open this can of worms...
> >
> > At what point do we migrate from SHA-1?  At this point the cryptanalysis of
> > SHA-1 is most likely a matter of time.
> 
> And I thought the cryptographic properties of SHA1 did not matter for
> Git's use case.
> We could employ broken md5 or such as well.
> ( see
> http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
> )
> That is because security goes on top via gpg signing of tags/commits.
> 
> I am not sure if anyone came up with
> a counter argument to Linus's reasoning there?

I have never understood that reasoning at all, nor why it is so often
repeated.

The GPG signature is over a single object, that mentions other objects
by their sha1 ids. But users don't care that v1.0 is securely mapped to
tree 1234abcd. They care which files are in 1234abcd, and if sha1 is
broken, it means you can't credibly verify the content down to the blob
level.

There's some additional protection in that git generally prefers objects
it already has to new ones. So it's hard to reliably distribute your
evil colliding object, depending on where people might have fetched
from first. But:

  1. I know there's at least one race[1] where a colliding object can
 still enter the repository. There may be more that have either
 existed all along, or that have grown over the years. I don't think
 this is something we've paid attention to and tested.

  2. That helps some people, I guess, but it's little consolation to
 somebody who runs "git clone" followed by verifying the tag.

-Peff

[1] The race I am thinking of is that for performance reasons, we don't
re-scan the pack directory when index-pack checks has_sha1_file()
on an incoming object and it comes up negative. So if somebody else
is repacking, we might skip the collision check in such a case. At
least that race is not under control of an attacker, though.
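A toy Python model of the "prefer what we already have" rule and of the race in the footnote. It is not index-pack: the hypothetical `ObjectStore` truncates sha1 to a few hex digits purely so that a collision is easy to find, and `lookup_sees_existing=False` stands in for the stale pack-directory lookup.

```python
import hashlib
from typing import Dict


class ObjectStore:
    """Toy content-addressed store that keeps the copy it already has."""

    def __init__(self, hex_digits: int = 40) -> None:
        # Fewer than 40 digits deliberately weakens the hash for the demo.
        self.hex_digits = hex_digits
        self.objects: Dict[str, bytes] = {}

    def object_id(self, data: bytes) -> str:
        return hashlib.sha1(data).hexdigest()[: self.hex_digits]

    def add(self, data: bytes, lookup_sees_existing: bool = True) -> str:
        oid = self.object_id(data)
        existing = self.objects.get(oid) if lookup_sees_existing else None
        if existing is None:
            # Genuinely new, or the lookup missed an existing object (the
            # race): either way the incoming bytes are stored unchecked.
            self.objects[oid] = data
        elif existing != data:
            raise ValueError(f"hash collision on {oid}; keeping the old object")
        return oid


# With a 1-hex-digit "hash", 17 inputs must contain a collision.
store = ObjectStore(hex_digits=1)
seen: Dict[str, bytes] = {}
for i in range(17):
    data = str(i).encode()
    key = store.object_id(data)
    if key in seen:
        first, second = seen[key], data
        break
    seen[key] = data
```

Adding `second` after `first` is refused, but the same call with the lookup "missing" the existing object lets the colliding bytes in, which is the shape of the race described in the footnote.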

Re: Migrating away from SHA-1?

2016-04-12 Thread H. Peter Anvin

On 04/12/16 16:00, Stefan Beller wrote:
> On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin wrote:
>> OK, I'm going to open this can of worms...
>>
>> At what point do we migrate from SHA-1? At this point the cryptanalysis of
>> SHA-1 is most likely a matter of time.
>
> And I thought the cryptographic properties of SHA1 did not matter for
> Git's use case.
> We could employ broken md5 or such as well.
> ( see
> http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
> )
> That is because security goes on top via gpg signing of tags/commits.
>
> I am not sure if anyone came up with
> a counter argument to Linus's reasoning there?

Not true, because what we are signing is a chain of SHA-1s; the
signature is meaningless unless the integrity of the hash chain is
inviolate.

>> For existing repositories we will need to have a migration mechanism. Since
>> we can't modify objects without completely invalidating the cryptographic
>> properties, what I would suggest is that we leave the existing objects as
>> is, with a persistent lookup table from SHA-1 to , and have that
>> lookup table signed (e.g. GPG) by the person responsible for converting the
>> repository. This freezes the cryptographic status of the existing SHA-1
>> objects at the time the conversion happens. This is a very good reason to
>> do this before SHA-1 is actually broken. In contrast, SHA-2 has been
>> surprisingly resistant to cryptanalysis, to the point that SHA-3 was
>> motivated by performance and the desire to have a well-tested function based
>> on entirely different principles should a generic attack against the common
>> structure of MD5/SHA-1/SHA-2 ever be found.
>
> When the kernel moved from BitKeeper to Git, all history was thrown away
> and the new repository started from scratch. The old history could be
> grafted into the repo, if you cared, though.
>
> I'd propose to go that route again and use a sha1 graft history which
> you can optionally put into your new history for convenience.

That was done more for legal reasons than anything else, as far as I
understand. The userbase of git today is also much, much larger than
the userbase for BK ever was.

-hpa


Re: Migrating away from SHA-1?

2016-04-12 Thread Stefan Beller

On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin  wrote:
> OK, I'm going to open this can of worms...
>
> At what point do we migrate from SHA-1?  At this point the cryptanalysis of
> SHA-1 is most likely a matter of time.

And I thought the cryptographic properties of SHA1 did not matter for
Git's use case.
We could employ broken md5 or such as well.
( see
http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
)
That is because security goes on top via gpg signing of tags/commits.

I am not sure if anyone came up with
a counter argument to Linus's reasoning there?

>
> For existing repositories we will need to have a migration mechanism. Since
> we can't modify objects without completely invalidating the cryptographic
> properties, what I would suggest is that we leave the existing objects as
> is, with a persistent lookup table from SHA-1 to , and have that
> lookup table signed (e.g. GPG) by the person responsible for converting the
> repository.  This freezes the cryptographic status of the existing SHA-1
> objects at the time the conversion happens.  This is a very good reason to
> do this before SHA-1 is actually broken.  In contrast, SHA-2 has been
> surprisingly resistant to cryptanalysis, to the point that SHA-3 was
> motivated by performance and the desire to have a well-tested function based
> on entirely different principles should a generic attack against the common
> structure of MD5/SHA-1/SHA-2 ever be found.

When the kernel moved from BitKeeper to Git, all history was thrown away
and the new repository started from scratch. The old history could be
grafted into the repo, if you cared, though.

I'd propose to go that route again and use a sha1 graft history which
you can optionally put into your new history for convenience.

Stefan


Migrating away from SHA-1?

2016-04-12 Thread H. Peter Anvin


OK, I'm going to open this can of worms...

At what point do we migrate from SHA-1?  At this point the
cryptanalysis of SHA-1 is most likely a matter of time.


For existing repositories we will need to have a migration mechanism. 
Since we can't modify objects without completely invalidating the 
cryptographic properties, what I would suggest is that we leave the 
existing objects as is, with a persistent lookup table from SHA-1 to 
, and have that lookup table signed (e.g. GPG) by the person 
responsible for converting the repository.  This freezes the 
cryptographic status of the existing SHA-1 objects at the time the 
conversion happens.  This is a very good reason to do this before SHA-1
is actually broken.  In contrast, SHA-2 has been surprisingly resistant
to cryptanalysis, to the point that SHA-3 was motivated by performance
and the desire to have a well-tested function based on entirely
different principles should a generic attack against the common
structure of MD5/SHA-1/SHA-2 ever be found.
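A minimal Python sketch of the lookup-table idea, assuming SHA-256 as the new hash and a made-up "<sha1> <new-id>" line format. The point is that the table must serialize deterministically, so that anyone re-running the conversion gets byte-identical output and can check the converter's signature; the signing itself (for instance with `gpg --detach-sign`) is not shown.

```python
import hashlib
from typing import Dict


def serialize_table(mapping: Dict[str, str]) -> bytes:
    """One "<sha1> <new id>" line per object, sorted by sha1, so the same
    mapping always produces the same bytes regardless of insertion order."""
    return "".join(f"{old} {new}\n" for old, new in sorted(mapping.items())).encode()


def table_digest(mapping: Dict[str, str]) -> str:
    # The serialized bytes (or this digest of them) are what the person
    # doing the conversion would sign.
    return hashlib.sha256(serialize_table(mapping)).hexdigest()
```

Sorting by sha1 also lets a reader binary-search the table, and any later edit to it changes the digest and invalidates the signature.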


-hpa
