Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-17 Thread Peter Todd
On Tue, Oct 16, 2012 at 02:27:51PM -0400, Jeff King wrote:
> > The one reason why we *might* want to use SHA-3, BTW, is that it is a
> > radically different design from SHA-1 and SHA-2.  And if there is a
> > crypto hash failure which is bad enough that the security of git would
> > be affected, there's a chance that the same attack could significantly
> > affect SHA-2 as well.  The fact that SHA-3 is fundamentally different
> > from a cryptographic design perspective means that an attack that
> > impacts SHA-1/SHA-2 will not likely impact SHA-3, and vice versa.
> 
> Right. The point of having the SHA-3 contest was that we thought SHA-1's
> breakage meant that SHA-2 was going to fall next. But Schneier's
> comments before the winners were announced were basically "it turns out
> that SHA-2 is not broken like we thought, so there's no reason to ditch
> it, and the fact that it is well-studied and well-deployed may mean it's
> a good choice".
> 
> So I could go either way. This is not a decision we should make today,
> though, so we can wait and see which direction the world goes before
> picking an algorithm.

Do you really need to pick an algorithm and go through a full-on flag
day ten years down the road all over again? People don't really care
that a git revision is actually the hex-encoded SHA1 hash of a tree.
They just know it's this long string of "stuff" that uniquely identifies
a revision globally somehow. They know if they copy and paste the first
few characters of the string there is a small chance two revisions will
have the same first few characters, and if they copy and paste the whole
string the chance drops to "your whole dev team will be eaten by
wolves in tragic unrelated incidents" unlikely.
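
To put rough numbers on the abbreviated-versus-full revision intuition above, here is a minimal sketch of the standard birthday-bound estimate. It assumes hash output behaves like uniformly random data; the function name and the object counts are illustrative only and do not come from git.

```python
import math

def prefix_collision_probability(num_objects: int, hex_digits: int) -> float:
    """Birthday-bound estimate that any two of num_objects share a prefix.

    Assumes uniformly random digests, so a prefix of hex_digits characters
    takes one of 16**hex_digits equally likely values.
    """
    space = 16 ** hex_digits
    pairs = num_objects * (num_objects - 1) / 2
    # 1 - exp(-pairs/space), computed with expm1 to stay accurate near zero.
    return -math.expm1(-pairs / space)

# Hypothetical repository with a million objects, at several prefix lengths.
for digits in (7, 12, 40):
    p = prefix_collision_probability(1_000_000, digits)
    print(f"{digits:2d} hex digits, 1e6 objects: {p:.3g}")
```

This covers accidental collisions only; it says nothing about deliberately constructed ones, which is what the rest of the thread is about.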

So why bake in a single algorithm? We'll have to extend the length of a
whole revision string anyway - the alternatives start at 256bits - and
people are going to want to be able to specify the whole revision string
at least sometimes. Once you've gone through that pain, why have to
repeat it again in ten years?


Let's make revisions be a long but variable length string. A revision by
itself is meaningless of course. However, if you know of a repo that
contains that revision, you can convert it into something useful, like a
commit and associated tree. If you don't know, well, you'd be stuck
anyway right now. 

Now when you push and pull from a remote repo what'll happen is the repo
will figure out what type(s) of hash algorithm your client supports. A
Git 3000 user with a repo using SHA3072 can talk to a v0.1 client just
fine: they send the v0.1 client revisions calculated with an algorithm
they support, and when they pull revisions from that repo they calculate
new revisions with their preferred algorithm. If they want to do this a
lot, they maintain the two sets of digest tables next to each other,
with the SHA3072 table marked as preferred, and the rest kept only so
pushes and pulls can be fast. In most cases a project will convert to
one hash algorithm, but by having multi-hash support that conversion
doesn't have to be a flag day, and at the same time it's still easy to
look up old revisions by their old digests. Meanwhile the crypto-wonks
get to have their fun PGP signing and timestamping long, secure digests.
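
A minimal sketch of the "two sets of digest tables next to each other" idea, under loud assumptions: the class and method names are hypothetical, objects are hashed as raw bytes (real git prepends a type and size header), and SHA-256 stands in for the preferred algorithm with SHA-1 kept as the legacy table.

```python
import hashlib

class MultiHashStore:
    """Toy object store that indexes every object under several digests."""

    def __init__(self, preferred: str, legacy: tuple = ()):
        self.preferred = preferred
        self.algorithms = (preferred, *legacy)
        # One lookup table per algorithm: hex digest -> object bytes.
        self.tables = {name: {} for name in self.algorithms}

    def add(self, data: bytes) -> str:
        """Store data under every supported digest; return the preferred name."""
        for name in self.algorithms:
            self.tables[name][hashlib.new(name, data).hexdigest()] = data
        return hashlib.new(self.preferred, data).hexdigest()

    def get(self, digest: str) -> bytes:
        """Look an object up by any digest we keep a table for."""
        for table in self.tables.values():
            if digest in table:
                return table[digest]
        raise KeyError(digest)

    def name_for(self, digest: str, algorithm: str) -> str:
        """Translate a known digest into the one an older peer understands."""
        return hashlib.new(algorithm, self.get(digest)).hexdigest()

store = MultiHashStore("sha256", legacy=("sha1",))
new_name = store.add(b"example object")
print(new_name, store.name_for(new_name, "sha1"))
```

The legacy table exists only so that lookups and exchanges with older peers stay cheap; dropping it later would not change the preferred names.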

Note that we don't even have to shut out non-upgraded users from
participating. Machine-to-machine communication is not a problem as
outlined above, but even with stuff like mailing lists we can start
passing around concatenated revisions like the following: 

da39a3ee5e6b4b0d3255bfef95601890afd80709.e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
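
As a rough illustration, a tool could split such a concatenated revision as sketched below. The layout (40 hex digits of SHA-1, an optional period, then 64 hex digits of a 256-bit digest) is an assumption read off the example above, and `split_revision` is a hypothetical helper, not an existing git function.

```python
import hashlib

def split_revision(text: str) -> dict:
    """Split a concatenated revision into digests keyed by assumed algorithm."""
    hexdigits = text.replace(".", "").lower()
    if any(c not in "0123456789abcdef" for c in hexdigits):
        raise ValueError("revision must be hexadecimal")
    # Old clients only ever look at the first 40 hex digits.
    parts = {"sha1": hexdigits[:40]}
    if len(hexdigits) == 40 + 64:
        parts["sha256"] = hexdigits[40:]
    elif len(hexdigits) != 40:
        raise ValueError("unexpected revision length")
    return parts

rev = ("da39a3ee5e6b4b0d3255bfef95601890afd80709."
       "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
parts = split_revision(rev)
# The two example digests are the SHA-1 and SHA-256 of empty input.
print(parts["sha1"] == hashlib.sha1(b"").hexdigest())
print(parts["sha256"] == hashlib.sha256(b"").hexdigest())
```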

Old users just use the first bit. (the period isn't even required
really) If you think that's too long, there's a simple solution that
keeps -bit security, albeit one whose implications
lead you right to Linus's lines of thinking:

da39a3ee5e6b4b0d3255bfef95601890afd80709

That's just a SHA1 again. Of course, if you actually care about this
stuff you already have cryptographic infrastructure, and that
infrastructure can simply store *trusted* metadata in your repos
saying that the string 'foo' happens to be a valid alias for the actual
digest that *the user* can specify instead of that digest. It may even
be that for your security needs just timestamping those aliases is
enough. Either way while something needs to be calculating secure
hashes, and preferably Git mainline so push and pull works without
having to examine every last line of code, you can get away without
changing the UI very much.


Anyway, in the short term the people who care can write parallel digest
calculators; I personally have a use-case for one right now. Better code
to handle the cases where individual blobs have colliding hashes is
required as well in the medium term. Finally those who require it could
very well write parallel gits to effectively do the pulling and pushing
of their parallel calculated revision hashes if they really wanted to.


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread Junio C Hamano
Jeff King writes:

> A much bigger problem is the other places we reference sha1s. The
> obvious place is trees, which have no room for backup pointers (either
> in headers, or with a NUL trick).

This is a tangent (as I do not have anything particularly worth
adding on top of what has already been said around the exact
SHA-[123] topic), but we probably would want to start thinking about
the tree object format "v2" at some point.

Some random thoughts:

 - It is OK if existing versions of Git barfed when asked to read a
   tree object in the "v2" format.  The repository format version
   may need to be bumped up when writing such an object, and
   transfer protocols need to pay attention to it, to avoid
   transferring history with objects in newer representation to
   repositories with older repository format version.

 - We do not need a new "tree v2" object type.  Existing versions of
   Git will barf upon seeing such an object, but that won't be the
   only way to prevent existing versions of Git from misinterpreting
   a tree object recorded in the "v2" format as if it were in the
   current format (e.g. a non-octal in the mode field of the first
   entry causes tree-walk.c::get_mode() to barf).

 - We do not mind two tree objects that encode the same tree in the
   current and the enhanced formats to have different object names.
   In fact, we care more about the object names derived purely from
   the content of the object as an uninterpreted bytestream, so it
   is expected that they have different object names.

   This will make the path-limited traversal and diff to open more
   trees unnecessarily at the "version bump" boundary in the
   history, but that is normal (think of a project that used to
   record its text files with CRLF and one day decides to convert
   everything to LF; the trees before and after the conversion will
   record logically the same contents, so "git show" should give an
   empty diff, but the diff machinery needs to go into contents at
   the flag day boundary).

   As long as we do not let random "extension of the day" into the
   new format willy-nilly, the resulting history will still be
   useful and usable.  From that point of view, no parts of the
   additional information we would record in the updated format that
   is not present in the current format should be optional (iow,
   once you decide to use the "v2" format to record a certain tree,
   you will produce an identical and reproducible representation in
   "v2", regardless of your implementation).

All of the above are issues for Git 3.0 and beyond, though ;-).
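
To make the "object names derived purely from the content of the object as an uninterpreted bytestream" point concrete, here is a small sketch of how a git object name is computed: SHA-1 over a "<type> <size>" header, a NUL byte, and the raw content. It uses blobs and the CRLF/LF analogy from above rather than a real tree encoding; the helper name is illustrative.

```python
import hashlib

def object_name(obj_type: str, content: bytes) -> str:
    """SHA-1 over "<type> <size>", a NUL byte, then the raw content bytes."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

lf = b"hello\n"
crlf = b"hello\r\n"
# Logically the same text, but different bytes, hence different names.
print(object_name("blob", lf))
print(object_name("blob", crlf))
```

The same reasoning applies to a tree written in a hypothetical "v2" encoding: different bytes give a different name, even when the tree it describes is logically identical.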
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread Jeff King
On Tue, Oct 16, 2012 at 11:32:38AM -0700, da...@lang.hm wrote:

> >I don't see much point in it. If we want to add new backup pointers to
> >commit objects, it is very easy to do so by adding new header fields.
> >
> >A much bigger problem is the other places we reference sha1s. The
> >obvious place is trees, which have no room for backup pointers (either
> >in headers, or with a NUL trick). But it also means that any time you
> >have a sha1 that you arrive at in some other way than traversal from a
> >signature, you are vulnerable to attack. E.g., if I record a sha1 in an
> >external system, today I can be sure that when I fetch the object for
> >that sha1, it is valid (or I can check that it is valid by hashing it).
> >With sha1 collisions, I am vulnerable to attack.
> 
> If you have two hashes of the same contents (SHA1 and SHA3) and they
> both agree that the file has not been tampered with, you should still
> be in good shape just using the SHA1 as a reference.

But tampering is only part of it. We care about a chain of authenticity
from some known point (either a gpg signature, or a sha1 that you know
to be good because you recorded it from a trusted source). The
references between objects are the links in that chain.
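
A minimal sketch of what walking that chain means in practice: start from a trusted name and re-hash every object before following its link. The object format here (a first line of "parent <hex>") is a deliberately simplified, hypothetical stand-in for git's commit format.

```python
import hashlib

def name_of(data: bytes) -> str:
    """Object name: here simply the SHA-1 of the raw bytes."""
    return hashlib.sha1(data).hexdigest()

def parent_of(data: bytes) -> str:
    """Hypothetical format: first line is "parent <hex>", else a root."""
    first_line = data.split(b"\n", 1)[0].decode("ascii", "replace")
    prefix = "parent "
    return first_line[len(prefix):] if first_line.startswith(prefix) else ""

def verify_chain(store: dict, trusted_name: str) -> bool:
    """Follow parent links from a trusted name, re-hashing each object."""
    name = trusted_name
    while name:
        data = store.get(name)
        if data is None or name_of(data) != name:
            return False
        name = parent_of(data)
    return True

root = b"root\ninitial import\n"
tip = b"parent " + name_of(root).encode() + b"\nsecond change\n"
store = {name_of(root): root, name_of(tip): tip}
trusted = name_of(tip)  # e.g. recorded from a signed tag
print(verify_chain(store, trusted))

# Swapping an object without a collision is caught by the re-hash.
store[name_of(root)] = b"root\nmalicious import\n"
print(verify_chain(store, trusted))
```

The check is only as strong as the hash used for the links: if an attacker can produce a different object with the same name, every link still verifies.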

Whether an internal hash would help you would depend on the exact
details of the collision attack. Let's imagine you have a signed tag
that points to commit sha1 X. The pointed-to commit contains a trailer
that says "btw, my sha-3 is Y". An attacker doing a brute-force birthday
attack would do:

  1. Generate some potential contents for the object (generally, take a good
     and malicious version of the object, and add some random bits at
     the end).

  2. Generate the sha-3 trailer for each object and tack it on.

  3. Generate the sha1 for object+trailer.

  4. Remember the sha1 and contents of each object. If the sha1 matches
     something we generated before, we have a collision. Otherwise, goto
     step 1.

So each object, good or malicious, remains consistent with respect to
the sha-3 hash. We know it has not been tampered with since its
generation, but we do not know if it is the same object that the tagger
originally referenced.  We had to compute the sha-3 as part of
generating the object, but it was not actually part of the collision
attack; it just makes it a little more expensive to compute each
iteration. We still have to do only 2^80 iterations.
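
The four steps above can be sketched as a toy, with one big assumption: the outer SHA-1 is truncated to 24 bits so that a collision shows up after a few thousand iterations instead of 2^80. The trailer format and all function names are hypothetical; the point is only that each colliding object stays internally consistent with its SHA-3 trailer.

```python
import hashlib
from itertools import count

def with_trailer(body: bytes) -> bytes:
    """Step 2: append a trailer recording the SHA3-256 of the body."""
    digest = hashlib.sha3_256(body).hexdigest().encode()
    return body + b"\nsha3 " + digest + b"\n"

def trailer_ok(obj: bytes) -> bool:
    """Check that the embedded trailer matches the body it follows."""
    body, sep, trailer = obj.rpartition(b"\nsha3 ")
    expected = hashlib.sha3_256(body).hexdigest()
    return bool(sep) and trailer.strip().decode("ascii", "replace") == expected

def weak_name(obj: bytes, hex_digits: int) -> str:
    """Step 3, weakened: SHA-1 truncated so a collision is cheap to find."""
    return hashlib.sha1(obj).hexdigest()[:hex_digits]

def find_collision(good: bytes, evil: bytes, hex_digits: int = 6):
    """Step 4: remember names until a good and an evil variant collide."""
    seen_good, seen_evil = {}, {}
    for nonce in count():
        suffix = b"\nnonce %d" % nonce  # step 1: vary some bits at the end
        g = with_trailer(good + suffix)
        e = with_trailer(evil + suffix)
        g_name, e_name = weak_name(g, hex_digits), weak_name(e, hex_digits)
        seen_good[g_name] = g
        seen_evil[e_name] = e
        for name in (g_name, e_name):
            if name in seen_good and name in seen_evil:
                return name, seen_good[name], seen_evil[name]

name, g, e = find_collision(b"print('hello')", b"print('pwned')")
print(name, trailer_ok(g), trailer_ok(e), g != e)
```

Computing the inner hash adds a constant cost per iteration but does not change how many iterations a brute-force search needs.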

But nobody is worried about this 2^80 brute force attack. The problem
with sha-1 (as I understand it) is that there are tricks you can do when
making the modifications in step 1 that will make the sha1 from step 3
more likely to find a collision with something you've already generated.
The modifications you make in step 1 will affect the sha-3 hash in step
2, which ultimately impacts the sha1 hash in step 3. Whether and how
that affects your attack would depend on the exact details of the
tricks.

I don't keep up on the state of the art in sha-1 cracking, so maybe the
techniques happen in such a way that the extra hash would be a
significant impediment. Even if it is sufficient to stop current (or
whatever is "current" when sha1 is broken enough to worry about)
attacks, it is a weak point for future attacks.

-Peff


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread david

On Tue, 16 Oct 2012, Jeff King wrote:

> On Tue, Oct 16, 2012 at 01:58:06PM -0400, Theodore Ts'o wrote:
>
> > I seem to recall that there was at least some discussion at one point
> > about adding some extra fields to the commit object in a backwards
> > compatible way by adding it after the trailing NUL.  We didn't end up
> > doing it, but I could see it being a useful thing nonetheless (for
> > example, we could potentially put the backup SHA-2/SHA-3 pointer there).
>
> I don't see much point in it. If we want to add new backup pointers to
> commit objects, it is very easy to do so by adding new header fields.
>
> A much bigger problem is the other places we reference sha1s. The
> obvious place is trees, which have no room for backup pointers (either
> in headers, or with a NUL trick). But it also means that any time you
> have a sha1 that you arrive at in some other way than traversal from a
> signature, you are vulnerable to attack. E.g., if I record a sha1 in an
> external system, today I can be sure that when I fetch the object for
> that sha1, it is valid (or I can check that it is valid by hashing it).
> With sha1 collisions, I am vulnerable to attack.


If you have two hashes of the same contents (SHA1 and SHA3) and they both 
agree that the file has not been tampered with, you should still be in 
good shape just using the SHA1 as a reference.


David Lang


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread Jeff King
On Tue, Oct 16, 2012 at 01:58:06PM -0400, Theodore Ts'o wrote:

> I seem to recall that there was at least some discussion at one point
> about adding some extra fields to the commit object in a backwards
> compatible way by adding it after the trailing NUL.  We didn't end up
> doing it, but I could see it being a useful thing nonetheless (for
> example, we could potentially put the backup SHA-2/SHA-3 pointer there).

I don't see much point in it. If we want to add new backup pointers to
commit objects, it is very easy to do so by adding new header fields.

A much bigger problem is the other places we reference sha1s. The
obvious place is trees, which have no room for backup pointers (either
in headers, or with a NUL trick). But it also means that any time you
have a sha1 that you arrive at in some other way than traversal from a
signature, you are vulnerable to attack. E.g., if I record a sha1 in an
external system, today I can be sure that when I fetch the object for
that sha1, it is valid (or I can check that it is valid by hashing it).
With sha1 collisions, I am vulnerable to attack.

> What if we explicitly allow a length plus SHA-2/3 hash of the commit
> plus the fields after the SHA-2/3 hash as an extension?  This would
> allow a secure way of adding an extension, including perhaps adding
> backup SHA-2/3 parent pointers, which is something that would be
> useful to do from a security perspective if we really are worried
> about a catastrophic hash failure.

I'm not sure exactly what you mean. Extended parent pointers make sense,
but I don't see what you mean in your first sentence. It sounds like we
are SHA-2/3 hashing something internal to the object, but that doesn't
help. If the pointers are sha1, then I can always replace the whole
object with a colliding one, even if that object is internally
consistent with respect to sha-2.

> The one reason why we *might* want to use SHA-3, BTW, is that it is a
> radically different design from SHA-1 and SHA-2.  And if there is a
> crypto hash failure which is bad enough that the security of git would
> be affected, there's a chance that the same attack could significantly
> affect SHA-2 as well.  The fact that SHA-3 is fundamentally different
> from a cryptographic design perspective means that an attack that
> impacts SHA-1/SHA-2 will not likely impact SHA-3, and vice versa.

Right. The point of having the SHA-3 contest was that we thought SHA-1's
breakage meant that SHA-2 was going to fall next. But Schneier's
comments before the winners were announced were basically "it turns out
that SHA-2 is not broken like we thought, so there's no reason to ditch
it, and the fact that it is well-studied and well-deployed may mean it's
a good choice".

So I could go either way. This is not a decision we should make today,
though, so we can wait and see which direction the world goes before
picking an algorithm.

-Peff


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread Theodore Ts'o
I seem to recall that there was at least some discussion at one point
about adding some extra fields to the commit object in a backwards
compatible way by adding it after the trailing NUL.  We didn't end up
doing it, but I could see it being a useful thing nonetheless (for
example, we could potentially put the backup SHA-2/SHA-3 pointer there).

What if we explicitly allow a length plus SHA-2/3 hash of the commit
plus the fields after the SHA-2/3 hash as an extension?  This would
allow a secure way of adding an extension, including perhaps adding
backup SHA-2/3 parent pointers, which is something that would be
useful to do from a security perspective if we really are worried
about a catastrophic hash failure.

The one reason why we *might* want to use SHA-3, BTW, is that it is a
radically different design from SHA-1 and SHA-2.  And if there is a
crypto hash failure which is bad enough that the security of git would
be affected, there's a chance that the same attack could significantly
affect SHA-2 as well.  The fact that SHA-3 is fundamentally different
from a cryptographic design perspective means that an attack that
impacts SHA-1/SHA-2 will not likely impact SHA-3, and vice versa.

   - Ted


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread Jeff King
On Tue, Oct 16, 2012 at 01:34:41PM +0200, René Scharfe wrote:

> FWIW, I couldn't measure a performance difference for git log with and
> without the following patch, which catches commits created with your
> hash collision trick, but might be too strict:
> 
> diff --git a/commit.c b/commit.c
> index 213bc98..4cd1e83 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -262,6 +262,12 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
>  	if (item->object.parsed)
>  		return 0;
>  	item->object.parsed = 1;
> +
> +	if (memchr(buffer, '\0', size)) {
> +		return error("bogus commit contains a NUL character: %s",
> +			     sha1_to_hex(item->object.sha1));
> +	}
> +
> +

Hmm. Yeah, that should be relatively inexpensive, since we are about to
read through most of the bytes anyway (we probably have just zlib
inflated them all, so they would even be in cache). It might make more
of a difference for a raw traversal that is not even going to look
below the header, like rev-list or merge-base. But I couldn't measure a
difference doing "git rev-list HEAD >/dev/null" in either git.git or
linux-2.6.git.

So maybe it is worth doing preemptively. Even without security concerns,
we would be truncating the commit message, so it is probably better to
let the user know (a warning is probably more appropriate, though, just
in case somebody does have embedded NULs for historical reasons).
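
For readers who want to see why the embedded NUL matters, here is a small sketch using a simplified, hypothetical commit buffer (a placeholder tree line, no author or committer headers). It models the "warn rather than reject" variant suggested above; `visible_part` and `check_commit` are illustrative names, not git functions.

```python
import hashlib

def visible_part(commit: bytes) -> bytes:
    """What a NUL-terminated display would show: bytes before the first NUL."""
    return commit.split(b"\0", 1)[0]

def check_commit(commit: bytes) -> list:
    """Return warnings instead of rejecting the object outright."""
    warnings = []
    if b"\0" in commit:
        warnings.append("commit contains a NUL character; message may be truncated")
    return warnings

# Placeholder tree id; only the bytes after the NUL differ between a and b.
base = b"tree " + b"0" * 40 + b"\n\nfix typo\n"
a = base + b"\0hidden-1"
b = base + b"\0hidden-2"

print(visible_part(a) == visible_part(b))  # both display the same text
print(hashlib.sha1(a).hexdigest() == hashlib.sha1(b).hexdigest())
print(check_commit(a))
```

The hidden tail is what gives an attacker free bytes to vary while the displayed commit stays unchanged, so flagging it removes that hiding place.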

-Peff


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-16 Thread René Scharfe
On 15.10.2012 20:34, Jeff King wrote:
> On Mon, Oct 15, 2012 at 07:47:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>> On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto wrote:
>>> Very clear analysis. Well written. Perhaps it is time to update
>>> http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
>>>
>>> Hope this is useful
>>>
>>> http://www.schneier.com/crypto-gram-1210.html
>>
>> This would be concerning if the Git security model would break down if
>> someone found a SHA1 collision, but it really wouldn't.
>>
>> It's one thing to find *a* collision, it's quite another to:
>>
>>   1. Find a collision for the sha1 of harmless.c which I know you use,
>>      and replace it with evil.c.
>>
>>   2. Somehow make evil.c compile so that it actually does something
>>      useful and nefarious, and doesn't just make the C compiler puke.
>>
>>      If finding one arbitrary collision costs $43K in 2021 dollars
>>      getting past this point is going to take quite a large multiple of
>>      $43K.
> 
> There are easier attacks than that if you can hide arbitrary bytes
> inside a file. It's hard with C source code. The common one in hash
> collision detection circles is to put invisible cruft into binary
> document formats like PDF or Postscript. Git blobs themselves do not
> have such an invisible place to put it, but you might be storing a
> format that does.
> 
> But worse, git _commits_ have such an invisible portion. We calculate
> the sha1 over the full commit, but we tend to show only the portion up
> to the first NUL byte. I used that horrible trick in my "choose your own
> sha1 prefix" patch. However, we could mitigate that by checking for
> embedded NULs in git-fsck.

FWIW, I couldn't measure a performance difference for git log with and
without the following patch, which catches commits created with your
hash collision trick, but might be too strict:

diff --git a/commit.c b/commit.c
index 213bc98..4cd1e83 100644
--- a/commit.c
+++ b/commit.c
@@ -262,6 +262,12 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	if (item->object.parsed)
 		return 0;
 	item->object.parsed = 1;
+
+	if (memchr(buffer, '\0', size)) {
+		return error("bogus commit contains a NUL character: %s",
+			     sha1_to_hex(item->object.sha1));
+	}
+
 	tail += size;
 	if (tail <= bufptr + 46 || memcmp(bufptr, "tree ", 5) || bufptr[45] != '\n')
 		return error("bogus commit object %s", sha1_to_hex(item->object.sha1));



Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-15 Thread Jeff King
On Mon, Oct 15, 2012 at 09:09:44PM +0200, Elia Pinto wrote:

> Hmm, SHA-3 I suppose, Keccak, no? But really it is not so urgent, as you
> have already said.

It depends. Read what Schneier wrote right before they announced the
SHA-3 winner:

  https://www.schneier.com/crypto-gram-1210.html#2

There's really no security reason not to use SHA-2, and in fact it's
probably better, as it has been more widely studied at this point. But
that part is easy; it's the compatibility switch-over that's hard (we
could also even parameterize the hash, but that has some annoyances,
too).

-Peff


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-15 Thread Elia Pinto
Hmm, SHA-3 I suppose, Keccak, no? But really it is not so urgent, as you
have already said.

Best

2012/10/15, Jeff King :
> On Mon, Oct 15, 2012 at 07:47:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto wrote:
>> > Very clear analysis. Well written. Perhaps it is time to update
>> > http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
>> >
>> > Hope this is useful
>> >
>> > http://www.schneier.com/crypto-gram-1210.html
>>
>> This would be concerning if the Git security model would break down if
>> someone found a SHA1 collision, but it really wouldn't.
>>
>> It's one thing to find *a* collision, it's quite another to:
>>
>>  1. Find a collision for the sha1 of harmless.c which I know you use,
>>     and replace it with evil.c.
>>
>>  2. Somehow make evil.c compile so that it actually does something
>>     useful and nefarious, and doesn't just make the C compiler puke.
>>
>>     If finding one arbitrary collision costs $43K in 2021 dollars
>>     getting past this point is going to take quite a large multiple of
>>     $43K.
>
> There are easier attacks than that if you can hide arbitrary bytes
> inside a file. It's hard with C source code. The common one in hash
> collision detection circles is to put invisible cruft into binary
> document formats like PDF or Postscript. Git blobs themselves do not
> have such an invisible place to put it, but you might be storing a
> format that does.
>
> But worse, git _commits_ have such an invisible portion. We calculate
> the sha1 over the full commit, but we tend to show only the portion up
> to the first NUL byte. I used that horrible trick in my "choose your own
> sha1 prefix" patch. However, we could mitigate that by checking for
> embedded NULs in git-fsck.
>
>>  3. Somehow inject the new evil object into your repository, or
>>     convince you to re-clone it / clone it from somewhere you usually
>>     wouldn't.
>
> Yeah, this part is the kicker. With the commit NUL trick, you would make
> a useful commit and then ask somebody to pull it, and then later replace
> it with a commit pointing to an arbitrary tree. But if we assume we can
> detect that easily (which I think we can), we are left with replacing
> binary blobs that have hidden bits. And most projects do not take many
> such blobs, and the result is that you could only replace the contents
> of that particular blob, not an arbitrary part of the tree.
>
>> It would be very interesting to see an analysis that deals with some
>> actual Git-related security scenarios, instead of something that just
>> assumes that if someone finds *any* SHA1 collision the sky is going to
>> fall.
>
> I agree that most of the analysis is overblown. Having read the analysis
> Schneier pointed to, it actually is not that bad. We have 5-10 years to
> get to a point where it's really expensive and extremely complex to
> mount a single attack.
>
> That doesn't seem like an emergency to me. It sounds like something we
> should be thinking about (and we are). The simplest thing would be to
> wait for a moment when it makes sense to break compatibility (e.g., we
> decide that "git 2.0" is here, and everybody will have to rewrite to
> take advantage of new features, so we can jump to sha-2). We can also
> start building sha-2 history that references sha-1 history. That would
> mean everybody needs to upgrade their git, but that is not a problem
> that requires 5-10 years of foresight and planning.
>
> -Peff
>

-- 
Sent from my mobile device


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-15 Thread Jeff King
On Mon, Oct 15, 2012 at 07:47:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto wrote:
> > Very clear analysis. Well written. Perhaps it is time to update
> > http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
> >
> > Hope this is useful
> >
> > http://www.schneier.com/crypto-gram-1210.html
> 
> This would be concerning if the Git security model would break down if
> someone found a SHA1 collision, but it really wouldn't.
> 
> It's one thing to find *a* collision, it's quite another to:
> 
>  1. Find a collision for the sha1 of harmless.c which I know you use,
>     and replace it with evil.c.
> 
>  2. Somehow make evil.c compile so that it actually does something
>     useful and nefarious, and doesn't just make the C compiler puke.
> 
>     If finding one arbitrary collision costs $43K in 2021 dollars
>     getting past this point is going to take quite a large multiple of
>     $43K.

There are easier attacks than that if you can hide arbitrary bytes
inside a file. It's hard with C source code. The common one in hash
collision detection circles is to put invisible cruft into binary
document formats like PDF or Postscript. Git blobs themselves do not
have such an invisible place to put it, but you might be storing a
format that does.

But worse, git _commits_ have such an invisible portion. We calculate
the sha1 over the full commit, but we tend to show only the portion up
to the first NUL byte. I used that horrible trick in my "choose your own
sha1 prefix" patch. However, we could mitigate that by checking for
embedded NULs in git-fsck.

>  3. Somehow inject the new evil object into your repository, or
>     convince you to re-clone it / clone it from somewhere you usually
>     wouldn't.

Yeah, this part is the kicker. With the commit NUL trick, you would make
a useful commit and then ask somebody to pull it, and then later replace
it with a commit pointing to an arbitrary tree. But if we assume we can
detect that easily (which I think we can), we are left with replacing
binary blobs that have hidden bits. And most projects do not take many
such blobs, and the result is that you could only replace the contents
of that particular blob, not an arbitrary part of the tree.

> It would be very interesting to see an analysis that deals with some
> actual Git-related security scenarios, instead of something that just
> assumes that if someone finds *any* SHA1 collision the sky is going to
> fall.

I agree that most of the analysis is overblown. Having read the analysis
Schneier pointed to, it actually is not that bad. We have 5-10 years to
get to a point where it's really expensive and extremely complex to
mount a single attack.

That doesn't seem like an emergency to me. It sounds like something we
should be thinking about (and we are). The simplest thing would be to
wait for a moment when it makes sense to break compatibility (e.g., we
decide that "git 2.0" is here, and everybody will have to rewrite to
take advantage of new features, so we can jump to sha-2). We can also
start building sha-2 history that references sha-1 history. That would
mean everybody needs to upgrade their git, but that is not a problem
that requires 5-10 years of foresight and planning.

-Peff


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-15 Thread Elia Pinto
2012/10/15 Ævar Arnfjörð Bjarmason :
> On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto wrote:
>> Very clear analysis. Well written. Perhaps it is time to update
>> http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
>>
>> Hope this is useful
>>
>> http://www.schneier.com/crypto-gram-1210.html
>
> This would be concerning if the Git security model would break down if
> someone found a SHA1 collision, but it really wouldn't.
I know that perfectly well.
>
> It's one thing to find *a* collision, it's quite another to:
>
>  1. Find a collision for the sha1 of harmless.c which I know you use,
>     and replace it with evil.c.
>
>  2. Somehow make evil.c compile so that it actually does something
>     useful and nefarious, and doesn't just make the C compiler puke.
>
>     If finding one arbitrary collision costs $43K in 2021 dollars
>     getting past this point is going to take quite a large multiple of
>     $43K.
>
>  3. Somehow inject the new evil object into your repository, or
>     convince you to re-clone it / clone it from somewhere you usually
>     wouldn't.
>
> At some point in the early days of Git Linus went on a rant to this
> effect either on this list or on the LKML.
>
> Maybe it would be useful to include some of that instead?
>
What you wrote is a risk analysis. I appreciate that; I am also a security
professional.
> It would be very interesting to see an analysis that deals with some
> actual Git-related security scenarios, instead of something that just
> assumes that if someone finds *any* SHA1 collision the sky is going to
> fall.
What you wrote is a risk analysis. I appreciate that, as a security professional, too.

I agree, of course. However, that is quite different from saying that,
because of the birthday paradox, git is immune to collisions, at least
those caused by a cryptographic attack. But clearly the risk for a
project like git, which uses a cryptographic hash function as an
ordinary hash function, is zero in the absence of a real use case. In
computer security today the cryptography is rarely the point of attack,
as you have clearly said. But it is also the least visible one to point out.

It seemed interesting to cite Bruce's article here because the
topic has already been discussed on this list in the past. That's it.
No more, no less.

Thanks

Best Regards


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-15 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto wrote:
> Very clear analysis. Well written. Perhaps it is time to update
> http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
>
> Hope this is useful
>
> http://www.schneier.com/crypto-gram-1210.html

This would be concerning if the Git security model would break down if
someone found a SHA1 collision, but it really wouldn't.

It's one thing to find *a* collision, it's quite another to:

 1. Find a collision for the sha1 of harmless.c which I know you use,
    and replace it with evil.c.

 2. Somehow make evil.c compile so that it actually does something
    useful and nefarious, and doesn't just make the C compiler puke.

    If finding one arbitrary collision costs $43K in 2021 dollars
    getting past this point is going to take quite a large multiple of
    $43K.

 3. Somehow inject the new evil object into your repository, or
    convince you to re-clone it / clone it from somewhere you usually
    wouldn't.

At some point in the early days of Git Linus went on a rant to this
effect either on this list or on the LKML.

Maybe it would be useful to include some of that instead?

It would be very interesting to see an analysis that deals with some
actual Git-related security scenarios, instead of something that just
assumes that if someone finds *any* SHA1 collision the sky is going to
fall.