
Re: Git and SHA-1 security (again)

2016-08-22 Thread Philip Oakley


Sorry if I'm dropping in at the wrong point (this is one I'd bookmarked)..



From: "Duy Nguyen"
Sent: Wednesday, July 20, 2016 3:44 PM
> On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin wrote:
>> But that strategy *still* ignores the distributed nature of Git. Just
>> because *you* make that merge at a certain point does not necessarily mean
>> that I make it at that point, too.
>>
>> Any approach that tries to have one single point of conversion will most
>> likely fall short of a solution.
>
> OK I see the difference in our views now. To me an sha256 repo would
> see an sha1 repo as a _foreign_ DVCS, pretty much like git sees
> mercurial now. So a transition from sha1 to sha256 is not that
> different from cvs -> svn -> a dvcs bubble -> git.



I think that, within Git, it is possible to have inter-workability (for
those parts that negotiate) between instances with different views about the
availability of two hash types. Fetch/push negotiation is a normal part of
working with a remote.


>> To be honest, I am less concerned about the GPG-signed commits (after all,
>> after switching to a more secure hash algorithm, a maintainer could
>> cross-sign all signed commits, or only the branch tips or tags, as new
>> tags, to reinstitute trust).
>>
>> I am much more concerned about references to commits, both inside and
>> outside the repository. That is, if I read anywhere on the internet about
>> Git having added support for `git add --chmod=+x ` in 4e55ed3 (add:
>> add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that
>> commit by that reference.
>>
>> And I am of course concerned what should happen if a user wants to fetch
>> from, or push to, a SHA-1-hashed remote repository into, or from, a
>> SHA-256-hashed local one.
>
> to follow the above, in my view, interaction with sha1 repos go
> through some conversion bridges like what we have with hg and svn. I
> don't know if we are going this route. It's certainly simpler and
> people already have experiences (from previous migration) to prepare
> for it.
--


The main thought was that, rather than worrying about which advanced hash to
pick (with all the arguments that entails), it is worth reducing the
problem space to a 'toy problem' in order to look at the interaction issues.


For the toy problem view we'd keep the current oid length (so that the
transmission formats don't change size); however, we'd swap the old and new
roles, making sha1 the new hash and using an older, shorter hash (e.g. md5)
as the old one, to investigate the transition from a short to a long hash.


Keeping it as a 'toy problem' avoids folks having too much invested in the
new hash choice; instead, the interworking can be more easily sorted, and some
issues can be punted on (e.g. the choice of salt to extend the md5 to the
sha1, and collisions therein).
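
A minimal sketch of the toy setup described above, assuming that object ids are computed the way Git computes them today (hash of "<type> <size>", a NUL byte, then the body). The names `toy_old_oid` and `toy_new_oid`, the salt value, and the way the 16-byte MD5 digest is stretched to 20 bytes are all made up for illustration; the message deliberately leaves the salt question open.

```python
import hashlib

OID_LEN = 20  # bytes; same size as today's SHA-1 object ids


def object_payload(obj_type: str, body: bytes) -> bytes:
    """Bytes that Git hashes for an object: '<type> <size>' + NUL + body."""
    return f"{obj_type} {len(body)}".encode() + b"\0" + body


def toy_old_oid(obj_type: str, body: bytes, salt: bytes = b"toy-salt") -> str:
    """'Old' toy id: 16-byte MD5, extended to 20 bytes.

    The extension (first 4 bytes of MD5(salt + digest)) is a made-up
    placeholder for the salt question that the toy problem punts on.
    """
    digest = hashlib.md5(object_payload(obj_type, body)).digest()
    padding = hashlib.md5(salt + digest).digest()[: OID_LEN - len(digest)]
    return (digest + padding).hex()


def toy_new_oid(obj_type: str, body: bytes) -> str:
    """'New' toy id: plain SHA-1, i.e. what Git computes today."""
    return hashlib.sha1(object_payload(obj_type, body)).hexdigest()


if __name__ == "__main__":
    print("old (md5, padded):", toy_old_oid("blob", b"hello\n"))
    print("new (sha1):       ", toy_new_oid("blob", b"hello\n"))
```

Both ids print as 40 hex characters, so anything that only cares about the oid width is unaffected while the interworking between "old" and "new" is explored.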


--
Philip

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git and SHA-1 security (again)

2016-07-22 Thread Junio C Hamano

Johannes Schindelin  writes:

> Hi Junio,
>
> On Mon, 18 Jul 2016, Junio C Hamano wrote:
>
>> "brian m. carlson"  writes:
>> 
>> > I will say that the pack format will likely require some changes,
>> > because it assumes ...  The reason is that we can't have an
>> > unambiguous parse of the current objects if two hash algorithms are in
>> > use.  So when we look at a new hash, we need to provide an
>> > unambiguous way to know what hash is in use.  The two choices are to
>> > either require all objects to use the new hash, or to extend the objects
>> > to include the hash.  Until a couple days ago, I had planned to do the
>> > former.  I had not even considered using a multihash approach due to
>> > the complexity.
>> 
>> Objects in Git identify themselves, but once you introduce the second
>> hash function (as opposed to replacing the hash function with a new one),
>> you would allow people to call the same object by two names.  That has
>> interesting implications.
>> 
>> [...]
>
> So essentially you are saying that the multi-hash approach has too many
> negative implications, right? At least that is what I understand.
>
> Looks more and more like we do need to convert repositories wholesale, and
> keep a two-way mapping for talking to remote repositories.
>
> Would you concur?

Not necessarily.

That was me thinking aloud, listing some issues that I would imagine
to be tricky to solve, without even attempting to be exhaustive,
that I expect to see solved in a good end-result implementation.
For example, "I do not see a nice way to solve X myself without
doing Y" in the message you are responding to does not necessarily
mean there is no good solution to X (just "I do not think of any
offhand"), and it does not mean I think it is terrible that we have
to do Y to solve X.



Re: Git and SHA-1 security (again)

2016-07-21 Thread Johannes Schindelin

Hi Brian,

On Mon, 18 Jul 2016, brian m. carlson wrote:

> On Mon, Jul 18, 2016 at 09:00:06AM +0200, Johannes Schindelin wrote:
> 
> > FWIW it never crossed my mind to allow different same-sized hash
> > algorithms. So I never thought we'd need a way to distinguish, say,
> > BLAKE2b-256 from SHA-256.
> > 
> > Is there a good reason to add the maintenance burden of several 256-bit
> > hash algorithms, apart from speed (which in my mind should decide which
> > one to use, always, rather than letting the user choose)? It would also
> > complicate transport even further, let alone subtree merges from
> > differently-hashed repositories.
> 
> There are really three candidates:
> 
> * SHA-256 (the SHA-2 algorithm): While this looks good right now,
>   cryptanalysis is advancing.  This is not a good choice for a long-term
>   solution.
> * SHA3-256 (the SHA-3 algorithm): This is the conservative choice.  It's
>   also faster than SHA-256 on 64-bit systems.  It has a very
>   conservative security margin and is a good long-term choice.
> * BLAKE2b-256: This is the blisteringly fast choice.  It outperforms
>   SHA-1 and even MD5 on 64-bit systems.  This algorithm was designed so
>   that nobody would have a reason to use an insecure algorithm.  It will
>   probably be secure for some time, but maybe not as long as SHA3-256.
> 
> I'm only considering 256-bit hashes, because anything longer won't fit
> on an 80-column terminal in hex form.
> 
> The reason I had considered implementing both SHA3-256 and BLAKE2b-256
> is that I want there to be no reason not to upgrade.  People who need a
> FIPS-approved algorithm or want a long-term, conservative choice should
> use SHA3-256.  People who want even better performance than current Git
> would use BLAKE2b-256.
> 
> Performance comparison (my implementations):
> SHA-1:     437 MiB/s
> SHA-256:   196 MiB/s
> SHA3-256:  273 MiB/s
> BLAKE2b:   649 MiB/s
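
A rough way to reproduce a comparison like the one quoted above, assuming Python's standard `hashlib` implementations rather than the implementations that were actually measured. The function name `throughput_mib_per_s` and the buffer size are arbitrary choices for this sketch, and the absolute numbers will differ from the table depending on CPU and library build.

```python
import hashlib
import time

# Constructors for the four algorithms in the table; BLAKE2b is truncated
# to a 256-bit digest to match the "BLAKE2b-256" candidate.
ALGORITHMS = {
    "SHA-1": hashlib.sha1,
    "SHA-256": hashlib.sha256,
    "SHA3-256": hashlib.sha3_256,
    "BLAKE2b-256": lambda data: hashlib.blake2b(data, digest_size=32),
}


def throughput_mib_per_s(make_hash, data: bytes, rounds: int = 3) -> float:
    """Best-of-`rounds` hashing throughput for `data`, in MiB/s."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        make_hash(data).digest()
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / best


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zeros
    for name, make_hash in ALGORITHMS.items():
        rate = throughput_mib_per_s(make_hash, payload)
        print(f"{name:12s} {rate:8.0f} MiB/s")
```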

Those are impressive numbers on BLAKE2b. However, Keccak was chosen as
SHA-3 because it can be implemented in hardware more efficiently than
BLAKE (and hence, probably, also BLAKE2). Given that there are already SSE
instructions implementing SHA-1/SHA-256 on some CPUs [*1*], I would not be
surprised if SHA3 would also see some hardware support.

So speed seems less of a concern to me. We are talking about a multi-year
roadmap, after all.

And given the complications for public repository hosters, I would like to
settle for a single 256-bit hash. That'll be challenging enough.

Ciao,
Dscho

Footnote *1*: https://en.wikipedia.org/wiki/Intel_SHA_extensions

Re: Git and SHA-1 security (again)

2016-07-21 Thread Johannes Schindelin

Hi Brian,

On Mon, 18 Jul 2016, brian m. carlson wrote:

> On Mon, Jul 18, 2016 at 11:00:35AM -0700, Junio C Hamano wrote:
> > Continuing this thought process, I do not see a good way to allow us
> > to wean ourselves off of the old hash, unless we _break_ the pack
> > stream format so that each object in the pack carries not just the
> > data but also the hash algorithm to be used to _name_ it, so that
> > new objects will never be referred to using the old hash.
> 
> I think for this reason, I'm going to propose the following approach
> when we get there:
> 
> * We serialize the hash in the object formats, using multihash or
>   something similar.  This means that it is minimally painful if we ever
>   need to change in the future[0].

This adds a lot of redundancy, though, and has an adverse performance
impact, no?

Could we not simply require packs to identify the used hash *once*, and
use a single hash algorithm per repository?

That would mean that we would have to re-hash packs on-the-fly if, say,
talking to a SHA-1 remote from a SHA-256 local repository.
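
To make the redundancy concern concrete, here is a small sketch of a multihash-style value: one byte for the hash function, one byte for the digest length, then the digest. The codes 0x11 (sha1) and 0x12 (sha2-256) are taken from the multihash table; the helper names `encode` and `decode` are invented for this example, and nothing here reflects an actual Git object format.

```python
import hashlib

# Multihash function codes for the two algorithms used here.
CODES = {"sha1": 0x11, "sha256": 0x12}
NAMES = {code: name for name, code in CODES.items()}


def encode(algo: str, data: bytes) -> bytes:
    """Return <code><digest length><digest>; both prefix values fit in one byte."""
    digest = hashlib.new(algo, data).digest()
    return bytes([CODES[algo], len(digest)]) + digest


def decode(value: bytes) -> tuple[str, bytes]:
    """Split a multihash into (algorithm name, raw digest), checking the length."""
    code, length, digest = value[0], value[1], value[2:]
    if code not in NAMES or len(digest) != length:
        raise ValueError("unknown hash code or truncated digest")
    return NAMES[code], digest


if __name__ == "__main__":
    for algo in ("sha1", "sha256"):
        value = encode(algo, b"example")
        print(algo, len(value), value.hex())
```

Every reference grows by two bytes (22 instead of 20 for SHA-1, 34 instead of 32 for SHA-256), whereas declaring the algorithm once per pack or per repository would pay that cost a single time.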

> * Each repository carries exactly one hash algorithm, except for
>   submodule data.  If we don't do this, then some people will never
>   switch because the submodules they depend on haven't.

If we re-hash transparently, we could get away with SHA-256 even for
submodules.

> * If people on the new format need to refer to submodule commits using
>   SHA-1, then they have to use a prefix on the hash form; otherwise,
>   they can use the raw hash value (without any multihash prefix).
> * git fsck verifies one consistent algorithm (excepting submodule
>   references).
> 
> This preserves the security benefits, avoids future-proofing problems,
> and minimizes performance impacts due to naming like you mentioned.
> 
> [0] We are practically limited to 256-bit hashes because anything longer
> will wrap on an 80-column terminal when in hex form.

We are not really bound by the 80-column limit when choosing a hash
algorithm. We typically refer to a commit by a shorter name, and the
80-column limit applies only to Git's own source code.
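
The arithmetic behind the terminal-width argument, plus an illustration of why abbreviated names soften it. This is only a sketch: `shortest_unique_prefix` is a hypothetical helper written for this note, not how Git itself computes abbreviations, and the sample ids are made up.

```python
def hex_width(bits: int) -> int:
    """Number of hex characters needed to print a digest of `bits` bits."""
    return bits // 4


def shortest_unique_prefix(oid: str, all_oids: list[str], minimum: int = 7) -> str:
    """Shortest prefix of `oid` (at least `minimum` chars) that no other id shares."""
    others = [o for o in all_oids if o != oid]
    for n in range(minimum, len(oid) + 1):
        if not any(o.startswith(oid[:n]) for o in others):
            return oid[:n]
    return oid


if __name__ == "__main__":
    for bits in (160, 256, 512):
        print(bits, "bits ->", hex_width(bits), "hex chars")
    ids = ["abcdef0123", "abcdef0999", "1234567890"]  # made-up ids
    print(shortest_unique_prefix("abcdef0123", ids))
```

A 256-bit digest needs 64 hex characters and a 512-bit one needs 128, so only the latter overflows 80 columns in full form; an abbreviated name stays short regardless of the digest size.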

Ciao,
Dscho

Re: Git and SHA-1 security (again)

2016-07-21 Thread Johannes Schindelin

Hi Junio,

On Mon, 18 Jul 2016, Junio C Hamano wrote:

> "brian m. carlson"  writes:
> 
> > I will say that the pack format will likely require some changes,
> > because it assumes ...  The reason is that we can't have an
> > unambiguous parse of the current objects if two hash algorithms are in
> > use.  So when we look at a new hash, we need to provide an
> > unambiguous way to know what hash is in use.  The two choices are to
> > either require all objects to use the new hash, or to extend the objects
> > to include the hash.  Until a couple days ago, I had planned to do the
> > former.  I had not even considered using a multihash approach due to
> > the complexity.
> 
> Objects in Git identify themselves, but once you introduce the second
> hash function (as opposed to replacing the hash function with a new one),
> you would allow people to call the same object by two names.  That has
> interesting implications.
> 
> [...]

So essentially you are saying that the multi-hash approach has too many
negative implications, right? At least that is what I understand.

Looks more and more like we do need to convert repositories wholesale, and
keep a two-way mapping for talking to remote repositories.

Would you concur?
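
As an illustration of "the same object by two names" and of what a two-way mapping might hold, here is a sketch that names one blob under both SHA-1 and SHA-256 and records the pair in both directions. It assumes Git's current object naming ("<type> <size>", NUL, body); the `TwoWayMap` class is hypothetical and not part of any proposal in this thread.

```python
import hashlib


def object_name(algo: str, obj_type: str, body: bytes) -> str:
    """Name an object by hashing '<type> <size>' + NUL + body with `algo`."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


class TwoWayMap:
    """Hypothetical translation table between SHA-1 and SHA-256 names."""

    def __init__(self) -> None:
        self.to_new: dict[str, str] = {}
        self.to_old: dict[str, str] = {}

    def record(self, obj_type: str, body: bytes) -> tuple[str, str]:
        """Compute both names for an object and remember them in both directions."""
        old = object_name("sha1", obj_type, body)
        new = object_name("sha256", obj_type, body)
        self.to_new[old] = new
        self.to_old[new] = old
        return old, new


if __name__ == "__main__":
    mapping = TwoWayMap()
    old, new = mapping.record("blob", b"hello\n")
    print(old, "<->", new)
```

This only works as written for blobs. Trees, commits and tags embed the names of other objects, so their content would have to be rewritten with the translated names before hashing under the other algorithm.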

Ciao,
Dscho

Re: Git and SHA-1 security (again)

2016-07-20 Thread Junio C Hamano

Stefan Beller  writes:

>> to follow the above, in my view, interaction with sha1 repos go
>> through some conversion bridges like what we have with hg and svn. I
>> don't know if we are going this route. It's certainly simpler and
>> people already have experiences (from previous migration) to prepare
>> for it.
>
> When treating the SHA1 version as a foreign dvcs and the SHA256
> as the real deal, we could introduce "pointer objects", and during the
> conversion
> create a 4e55ed3 pointer that points to the SHA256 commit of (add:
> add --chmod=+x / --chmod=-x options, 2016-05-31).

Hmmm.  If you are designing these "pointer objects" to be extensible
enough to cover other foreign vcs (e.g. you make them
capable of mapping Subversion's r24323 to a matching commit in the
converted result), I would think it may be a very useful thing to
have, but I think it is pretty much orthogonal to the discussion in
this topic.  IOW, that can happen with or without change of the hash
function.

And looking at it that way, I am not sure if such a mapping feature
should require adding a new type of "object".

Re: Git and SHA-1 security (again)

2016-07-20 Thread Stefan Beller

On Wed, Jul 20, 2016 at 7:44 AM, Duy Nguyen  wrote:
> On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin
>  wrote:
>> But that strategy *still* ignores the distributed nature of Git. Just
>> because *you* make that merge at a certain point does not necessarily mean
>> that I make it at that point, too.
>>
>> Any approach that tries to have one single point of conversion will most
>> likely fall short of a solution.
>
> OK I see the difference in our views now. To me an sha256 repo would
> see an sha1 repo as a _foreign_ DVCS, pretty much like git sees
> mercurial now. So a transition from sha1 to sha256 is not that
> different from cvs -> svn -> a dvcs bubble -> git.
>
>> To be honest, I am less concerned about the GPG-signed commits (after all,
>> after switching to a more secure hash algorithm, a maintainer could
>> cross-sign all signed commits, or only the branch tips or tags, as new
>> tags, to reinstitute trust).
>>
>> I am much more concerned about references to commits, both inside and
>> outside the repository. That is, if I read anywhere on the internet about
>> Git having added support for `git add --chmod=+x ` in 4e55ed3 (add:
>> add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that
>> commit by that reference.
>>
>> And I am of course concerned what should happen if a user wants to fetch
>> from, or push to, a SHA-1-hashed remote repository into, or from, a
>> SHA-256-hashed local one.
>
> to follow the above, in my view, interaction with sha1 repos go
> through some conversion bridges like what we have with hg and svn. I
> don't know if we are going this route. It's certainly simpler and
> people already have experiences (from previous migration) to prepare
> for it.

When treating the SHA1 version as a foreign dvcs and the SHA256
as the real deal, we could introduce "pointer objects", and during the
conversion create a 4e55ed3 pointer that points to the SHA256 commit of
(add: add --chmod=+x / --chmod=-x options, 2016-05-31). Ideally we would
not even expose this sort of object a lot, e.g. git show  would just
redirect automatically. Instead of a new class of "pointer objects" we could
also solve this via a lot of refs. (refs/old-sha1/4e55ed3 pointing to
the converted
commit; Though we would need to accept partial refs names then :/)
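
The "partial ref names" problem is essentially a prefix lookup over old names. A sketch of that lookup, independent of whether the mapping lives in pointer objects or in refs: the class `OldNameIndex` and the full ids below are invented for illustration (only the abbreviation 4e55ed3 comes from the thread).

```python
class OldNameIndex:
    """Hypothetical lookup from (possibly abbreviated) SHA-1 names to SHA-256 names."""

    def __init__(self) -> None:
        self._new_by_old: dict[str, str] = {}

    def add(self, old_sha1: str, new_sha256: str) -> None:
        self._new_by_old[old_sha1] = new_sha256

    def resolve(self, prefix: str) -> str:
        """Return the SHA-256 name for a unique SHA-1 prefix, else raise KeyError."""
        matches = [
            new for old, new in self._new_by_old.items() if old.startswith(prefix)
        ]
        if len(matches) != 1:
            raise KeyError(f"{prefix!r} matches {len(matches)} old names")
        return matches[0]


if __name__ == "__main__":
    index = OldNameIndex()
    # Placeholder ids: the padding after the known abbreviation is made up.
    index.add("4e55ed3" + "0" * 33, "a" * 64)
    index.add("4e55aaa" + "0" * 33, "b" * 64)
    print(index.resolve("4e55ed3"))
```

A linear scan is enough for a sketch; a real implementation would presumably keep the old names sorted so that prefix lookups do not touch every entry.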

> --
> Duy

Re: Git and SHA-1 security (again)

2016-07-20 Thread Duy Nguyen

On Tue, Jul 19, 2016 at 8:58 PM, Herczeg Zsolt wrote:
> 2016-07-19 20:04 GMT+02:00 Duy Nguyen:
>> On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote:
>>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>>
>>>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote:
>>>>>
>>>>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>>>>
>>>>>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote:
>>>>>>>>
>>>>>>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>>>>>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>>>>>>> around when it's not secure anymore (I recall something about
>>>>>>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>>>>>>> object content). Rehashing would be done locally and is better
>>>>>>>> controlled.
>>>>>>>
>>>>>>> You could. But how would you determine whether to recreate the commit
>>>>>>> object from a SHA-1-ified version of the commit buffer? Fall back if
>>>>>>> the original did not match the signature?
>>>>>>
>>>>>> Any repo would have a cut point when they move to sha256 (or whatever
>>>>>> new hash), if we can record this somewhere (e.g. as a tag or a bunch
>>>>>> of tags, or some dummy commits to mark the heads of the repo) then we
>>>>>> only verify gpg signatures _in_ the repository before this point.
>>>>>
>>>>> remember that a repo doesn't have a single 'now', each branch has its
>>>>> own head, and you can easily go back to prior points and branch off from
>>>>> there.
>>>>>
>>>>> Since timestamps in repos can't be trusted (different people's clocks may
>>>>> not be in sync), how would you define this cutoff point?
>>>>
>>>> The set of all heads at the time the conversion happens (maybe plus
>>>> all the real tags). We can make an octopus merge commit to cover all
>>>> the heads, then it can be the reference point.
>>>
>>> so to make sure I'm understanding this, anything not reachable from that
>>> merge must be the new hash, correct? Including forks, merges, etc that
>>> happen from earlier points in the history.
>>
>> Yes everything except that merge and everything reachable from it, the
>> whole old clone, basically.
>
> It could work, but is it worth it?
>
> 1) If you use multihash, you should assume that anything with SHA1
> could be manipulated. That means you can "inject" something later to
> that "old clone" anyway.

No it's not multihash. The repo only uses sha256, but by substituting
it with sha1 using the same dag, we can recreate the exact same sha1
repo (up to the conversion point). This is mostly to avoid people
injecting something because _you_ generate the repo locally.
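
A sketch of the idea that the same DAG can be renamed under either hash: walk the commits parents-first and recompute each id from its content plus the (already recomputed) ids of its parents. The commit format here is heavily simplified (real commits also carry a tree, author and committer), and `commit_id` and `name_history` are names invented for this example.

```python
import hashlib


def commit_id(algo: str, message: str, parent_ids: list[str]) -> str:
    """Hash a simplified commit: 'parent <id>' lines, a blank line, the message."""
    lines = [f"parent {p}" for p in parent_ids] + ["", message]
    body = "\n".join(lines).encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def name_history(algo: str, commits: list[tuple[str, str, list[str]]]) -> dict[str, str]:
    """Name every commit under `algo`.

    `commits` holds (key, message, parent keys) tuples, parents listed first,
    so each parent's id is known before any child that refers to it.
    """
    ids: dict[str, str] = {}
    for key, message, parents in commits:
        ids[key] = commit_id(algo, message, [ids[p] for p in parents])
    return ids


if __name__ == "__main__":
    history = [
        ("a", "first", []),
        ("b", "second", ["a"]),
        ("m", "merge", ["a", "b"]),
    ]
    print(name_history("sha256", history))
    print(name_history("sha1", history))
```

Because the ids depend only on content and topology, running the SHA-1 pass locally over the SHA-256 history yields the same SHA-1 names every time, which is what would allow old signatures to be checked without storing SHA-1 names in the new repository.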

> 2) Even if the content is re-hashed, it's hard to understand for a
> user where the trust comes from. The user should decide whether he
> trusts (or not) the person who signed that octopus breakpoint.
>
> Even without git you can achieve this security: Get the complete old
> repository, make a signed tarball of it. If anytime later you want to
> check those signatures, you can just use that tarball. I don't think
> it's worth the trouble to create a native method for something which
> is rare, and can be worked around easily. It's actually easier for a
> user to understand the "trust relation" when using this workaround.
>
> Referring to that signed-tarball approach, you may just as well drop
> all signature data on conversion... As long as you can look up the
> references to old hashes easily, I think it's usable enough.

It's more or less the signed-tarball approach in my view, except that
you recreate that tarball dynamically with your sha256 repo (so this
tarball is "signed" with sha256).
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-20 Thread Duy Nguyen

On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin
 wrote:
> But that strategy *still* ignores the distributed nature of Git. Just
> because *you* make that merge at a certain point does not necessarily mean
> that I make it at that point, too.
>
> Any approach that tries to have one single point of conversion will most
> likely fall short of a solution.

OK I see the difference in our views now. To me an sha256 repo would
see an sha1 repo as a _foreign_ DVCS, pretty much like git sees
mercurial now. So a transition from sha1 to sha256 is not that
different from cvs -> svn -> a dvcs bubble -> git.

> To be honest, I am less concerned about the GPG-signed commits (after all,
> after switching to a more secure hash algorithm, a maintainer could
> cross-sign all signed commits, or only the branch tips or tags, as new
> tags, to reinstitute trust).
>
> I am much more concerned about references to commits, both inside and
> outside the repository. That is, if I read anywhere on the internet about
> Git having added support for `git add --chmod=+x ` in 4e55ed3 (add:
> add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that
> commit by that reference.
>
> And I am of course concerned what should happen if a user wants to fetch
> from, or push to, a SHA-1-hashed remote repository into, or from, a
> SHA-256-hashed local one.

to follow the above, in my view, interaction with sha1 repos go
through some conversion bridges like what we have with hg and svn. I
don't know if we are going this route. It's certainly simpler and
people already have experiences (from previous migration) to prepare
for it.
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-20 Thread Johannes Schindelin

Hi Duy,

On Tue, 19 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote:
> > On Tue, 19 Jul 2016, Duy Nguyen wrote:
> >
> >> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote:
> >>>
> >>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
> >>>
> >>>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote:
> >>>>>>
> >>>>>> But we can recreate SHA-1 from the same content and verify GPG,
> >>>>>> right?  I know it's super expensive, but it feels safer to not
> >>>>>> carry SHA-1 around when it's not secure anymore (I recall
> >>>>>> something about exploiting the weakest link when you have both
> >>>>>> sha1 and sha256 in the object content). Rehashing would be done
> >>>>>> locally and is better controlled.
> >>>>>
> >>>>> You could. But how would you determine whether to recreate the
> >>>>> commit object from a SHA-1-ified version of the commit buffer?
> >>>>> Fall back if the original did not match the signature?
> >>>>
> >>>> Any repo would have a cut point when they move to sha256 (or
> >>>> whatever new hash), if we can record this somewhere (e.g. as a tag
> >>>> or a bunch of tags, or some dummy commits to mark the heads of the
> >>>> repo) then we only verify gpg signatures _in_ the repository before
> >>>> this point.
> >>>
> >>> remember that a repo doesn't have a single 'now', each branch has
> >>> its own head, and you can easily go back to prior points and branch
> >>> off from there.
> >>>
> >>> Since timestamps in repos can't be trusted (different people's
> >>> clocks may not be in sync), how would you define this cutoff point?
> >>
> >> The set of all heads at the time the conversion happens (maybe plus
> >> all the real tags). We can make an octopus merge commit to cover all
> >> the heads, then it can be the reference point.
> >
> > so to make sure I'm understanding this, anything not reachable from
> > that merge must be the new hash, correct? Including forks, merges, etc
> > that happen from earlier points in the history.
> 
> Yes everything except that merge and everything reachable from it, the
> whole old clone, basically.

But that strategy *still* ignores the distributed nature of Git. Just
because *you* make that merge at a certain point does not necessarily mean
that I make it at that point, too.

Any approach that tries to have one single point of conversion will most
likely fall short of a solution.

To be honest, I am less concerned about the GPG-signed commits (after all,
after switching to a more secure hash algorithm, a maintainer could
cross-sign all signed commits, or only the branch tips or tags, as new
tags, to reinstitute trust).

I am much more concerned about references to commits, both inside and
outside the repository. That is, if I read anywhere on the internet about
Git having added support for `git add --chmod=+x ` in 4e55ed3 (add:
add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that
commit by that reference.

And I am of course concerned what should happen if a user wants to fetch
from, or push to, a SHA-1-hashed remote repository into, or from, a
SHA-256-hashed local one.

Ciao,
Dscho

Re: Git and SHA-1 security (again)

2016-07-19 Thread Herczeg Zsolt

2016-07-19 20:04 GMT+02:00 Duy Nguyen:
> On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote:
>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>
>>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote:
>>>>
>>>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>>>
>>>>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote:
>>>>>>>
>>>>>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>>>>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>>>>>> around when it's not secure anymore (I recall something about
>>>>>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>>>>>> object content). Rehashing would be done locally and is better
>>>>>>> controlled.
>>>>>>
>>>>>> You could. But how would you determine whether to recreate the commit
>>>>>> object from a SHA-1-ified version of the commit buffer? Fall back if
>>>>>> the original did not match the signature?
>>>>>
>>>>> Any repo would have a cut point when they move to sha256 (or whatever
>>>>> new hash), if we can record this somewhere (e.g. as a tag or a bunch
>>>>> of tags, or some dummy commits to mark the heads of the repo) then we
>>>>> only verify gpg signatures _in_ the repository before this point.
>>>>
>>>> remember that a repo doesn't have a single 'now', each branch has its
>>>> own head, and you can easily go back to prior points and branch off from
>>>> there.
>>>>
>>>> Since timestamps in repos can't be trusted (different people's clocks may
>>>> not be in sync), how would you define this cutoff point?
>>>
>>> The set of all heads at the time the conversion happens (maybe plus
>>> all the real tags). We can make an octopus merge commit to cover all
>>> the heads, then it can be the reference point.
>>
>> so to make sure I'm understanding this, anything not reachable from that
>> merge must be the new hash, correct? Including forks, merges, etc that
>> happen from earlier points in the history.
>
> Yes everything except that merge and everything reachable from it, the
> whole old clone, basically.

It could work, but is it worth it?

1) If you use multihash, you should assume that anything with SHA1
could be manipulated. That means you can "inject" something later to
that "old clone" anyway.
2) Even if the content is re-hashed, it's hard to understand for a
user where the trust comes from. The user should decide whether he
trusts (or not) the person who signed that octopus breakpoint.

Even without git you can achieve this security: Get the complete old
repository, make a signed tarball of it. If anytime later you want to
check those signatures, you can just use that tarball. I don't think
it's worth the trouble to create a native method for something which
is rare, and can be worked around easily. It's actually easier for a
user to understand the "trust relation" when using this workaround.

Referring to that signed-tarball approach, you may just as well drop
all signature data on conversion... As long as you can look up the
references to old hashes easily, I think it's usable enough.

Re: Git and SHA-1 security (again)

2016-07-19 Thread Junio C Hamano

Duy Nguyen  writes:

>> Even though that single operation might be possible, do not go
>> there.  A "pathname" identifies a "path", not its contents, and
>> "appending crap after path" breaks the data model badly.
>
> I thought about that but I thought all those operations required
> special treatment for submodules anyway.

Operations requiring special treatment do not make it right to
break the data model anyway, so...

Re: Git and SHA-1 security (again)

2016-07-19 Thread Duy Nguyen

On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote:
> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>
>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote:
>>>
>>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>>
>>>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote:
>>>>>>
>>>>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>>>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>>>>> around when it's not secure anymore (I recall something about
>>>>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>>>>> object content). Rehashing would be done locally and is better
>>>>>> controlled.
>>>>>
>>>>> You could. But how would you determine whether to recreate the commit
>>>>> object from a SHA-1-ified version of the commit buffer? Fall back if
>>>>> the original did not match the signature?
>>>>
>>>> Any repo would have a cut point when they move to sha256 (or whatever
>>>> new hash), if we can record this somewhere (e.g. as a tag or a bunch
>>>> of tags, or some dummy commits to mark the heads of the repo) then we
>>>> only verify gpg signatures _in_ the repository before this point.
>>>
>>> remember that a repo doesn't have a single 'now', each branch has its
>>> own head, and you can easily go back to prior points and branch off from
>>> there.
>>>
>>> Since timestamps in repos can't be trusted (different people's clocks may
>>> not be in sync), how would you define this cutoff point?
>>
>> The set of all heads at the time the conversion happens (maybe plus
>> all the real tags). We can make an octopus merge commit to cover all
>> the heads, then it can be the reference point.
>
> so to make sure I'm understanding this, anything not reachable from that
> merge must be the new hash, correct? Including forks, merges, etc that
> happen from earlier points in the history.

Yes everything except that merge and everything reachable from it, the
whole old clone, basically.
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-19 Thread David Lang


On Tue, 19 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote:
>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>
>>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote:
>>>>>
>>>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>>>> around when it's not secure anymore (I recall something about
>>>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>>>> object content). Rehashing would be done locally and is better
>>>>> controlled.
>>>>
>>>> You could. But how would you determine whether to recreate the commit
>>>> object from a SHA-1-ified version of the commit buffer? Fall back if the
>>>> original did not match the signature?
>>>
>>> Any repo would have a cut point when they move to sha256 (or whatever
>>> new hash), if we can record this somewhere (e.g. as a tag or a bunch
>>> of tags, or some dummy commits to mark the heads of the repo) then we
>>> only verify gpg signatures _in_ the repository before this point.
>>
>> remember that a repo doesn't have a single 'now', each branch has its own
>> head, and you can easily go back to prior points and branch off from there.
>>
>> Since timestamps in repos can't be trusted (different people's clocks may
>> not be in sync), how would you define this cutoff point?
>
> The set of all heads at the time the conversion happens (maybe plus
> all the real tags). We can make an octopus merge commit to cover all
> the heads, then it can be the reference point.


so to make sure I'm understanding this, anything not reachable from that merge 
must be the new hash, correct? Including forks, merges, etc that happen from 
earlier points in the history.


David Lang

Re: Git and SHA-1 security (again)

2016-07-19 Thread Duy Nguyen

On Tue, Jul 19, 2016 at 7:34 PM, David Lang  wrote:
> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>
>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin
>>  wrote:
>>>
>>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>>> around when it's not secure anymore (I recall something about
>>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>>> object content). Rehashing would be done locally and is better
>>>> controlled.
>>>
>>>
>>> You could. But how would you determine whether to recreate the commit
>>> object from a SHA-1-ified version of the commit buffer? Fall back if the
>>> original did not match the signature?
>>
>>
>> Any repo would have a cut point when they move to sha256 (or whatever
>> new hash), if we can record this somewhere (e.g. as a tag or a bunch
>> of tags, or some dummy commits to mark the heads of the repo) then we
>> only verify gpg signatures _in_ the repository before this point.
>
>
> remember that a repo doesn't have a single 'now', each branch has its own
> head, and you can easily go back to prior points and branch off from there.
>
> Since timestamps in repos can't be trusted (different people's clocks may
> not be in sync), how would you define this cutoff point?

The set of all heads at the time the conversion happens (maybe plus
all the real tags). We can make an octopus merge commit to cover all
the heads, then it can be the reference point.
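
A minimal sketch of that cut-off rule, assuming a toy commit graph held in a plain dict. The names `parents` and `reachable_from` and the commit labels are hypothetical, not Git internals. Everything reachable from the octopus merge would stay on the old hash; anything else would have to use the new one.

```python
def reachable_from(tip, parents):
    """Return the set of commits reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


# Toy history: branch heads B and C share root A; M is the octopus merge
# created at conversion time; D is committed on top of M afterwards, and
# E forks from the old commit A after the conversion.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "M": ["B", "C"],
    "D": ["M"],
    "E": ["A"],
}

old_hash_commits = reachable_from("M", parents)
new_hash_commits = set(parents) - old_hash_commits
```

Note that E lands in the new-hash set even though its parent is an old commit, which is the case David Lang asks about.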
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-19 Thread David Lang


On Tue, 19 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin
>  wrote:
>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>> around when it's not secure anymore (I recall something about
>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>> object content). Rehashing would be done locally and is better
>>> controlled.
>>
>> You could. But how would you determine whether to recreate the commit
>> object from a SHA-1-ified version of the commit buffer? Fall back if the
>> original did not match the signature?
>
> Any repo would have a cut point when they move to sha256 (or whatever
> new hash), if we can record this somewhere (e.g. as a tag or a bunch
> of tags, or some dummy commits to mark the heads of the repo) then we
> only verify gpg signatures _in_ the repository before this point.

remember that a repo doesn't have a single 'now', each branch has its own head,
and you can easily go back to prior points and branch off from there.

Since timestamps in repos can't be trusted (different people's clocks may not be
in sync), how would you define this cutoff point?


David Lang

Re: Git and SHA-1 security (again)

2016-07-19 Thread Duy Nguyen

On Tue, Jul 19, 2016 at 7:06 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> Post-shower thoughts. In a tree object, a submodule entry consists of
>> perm (S_IFGITLINK), hash (which is the external hash) and path. We
>> could fill the "hash" part with all zero (invalid, signature of new
>> submodule hash format), then append "/:" to
>> the "path" part. This way we don't have to update tree object or index
>> format. And I suspect the "path" part is available everywhere we need
>> to handle submodules already, so extracting the external hash should
>> be possible...
>
> Even though that single operation might be possible, do not go
> there.  A "pathname" identifies a "path", not its contents, and
> "appending crap after path" breaks the data model badly.  Also other
> things like merge, checkout and diff would break by butchering
> the ordering of the entries in tree objects.

I thought about that but I thought all those operations required
special treatment for submodules anyway. But I forgot about d/f
conflicts so yeah it's a bad idea. We still have some invalid "mode"
combinations that can be used as S_IFGITLINK2, then we can have
a variable-length hash field in the entry.
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-19 Thread Junio C Hamano

Duy Nguyen  writes:

> Post-shower thoughts. In a tree object, a submodule entry consists of
> perm (S_IFGITLINK), hash (which is the external hash) and path. We
> could fill the "hash" part with all zero (invalid, signature of new
> submodule hash format), then append "/:" to
> the "path" part. This way we don't have to update tree object or index
> format. And I suspect the "path" part is available everywhere we need
> to handle submodules already, so extracting the external hash should
> be possible...

Even though that single operation might be possible, do not go
there.  A "pathname" identifies a "path", not its contents, and
"appending crap after path" breaks the data model badly.  Also other
things like merge, checkout and diff would break by butchering
the ordering of the entries in tree objects.

Re: Git and SHA-1 security (again)

2016-07-19 Thread Duy Nguyen

On Mon, Jul 18, 2016 at 6:51 PM, Duy Nguyen  wrote:
> On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson
>  wrote:
>> I'm going to end up having to do something similar because of the issue
>> of submodules.  Submodules may still be SHA-1, while the main repo may
>> be a newer hash.
>
> Or even the other way around, main repo is one with sha1 while
> submodule is on sha256. I wonder if we should address this separately
> (and even in parallel with sha256 support), making submodules work
> with any external VCS system (that supports some basic operations
> we define).

Post-shower thoughts. In a tree object, a submodule entry consists of
perm (S_IFGITLINK), hash (which is the external hash) and path. We
could fill the "hash" part with all zero (invalid, signature of new
submodule hash format), then append "/:" to
the "path" part. This way we don't have to update tree object or index
format. And I suspect the "path" part is available everywhere we need
to handle submodules already, so extracting the external hash should
be possible...
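
For reference, a rough sketch of how entries are laid out in a current (SHA-1) tree object: each entry is `<octal mode> <path>\0` followed by 20 raw hash bytes, and a submodule entry uses mode 160000 (S_IFGITLINK). The parser name `parse_tree` and the hand-built tree body are illustrative assumptions, not code from Git.

```python
GITLINK_MODE = 0o160000  # S_IFGITLINK: the entry points at a submodule commit
SHA1_RAW_LEN = 20        # the hash field has a fixed width in this format


def parse_tree(raw):
    """Yield (mode, path, hex_hash) for each entry of a raw SHA-1 tree body."""
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)
        nul = raw.index(b"\0", space)
        mode = int(raw[pos:space], 8)
        path = raw[space + 1:nul].decode("utf-8")
        digest = raw[nul + 1:nul + 1 + SHA1_RAW_LEN]
        yield mode, path, digest.hex()
        pos = nul + 1 + SHA1_RAW_LEN


# Hand-built tree body: one regular file and one submodule entry.
raw_tree = (
    b"100644 README\0" + bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    + b"160000 sub\0" + bytes.fromhex("04b871796dc0420f8e7561a895b52484b701d51a")
)
entries = list(parse_tree(raw_tree))
submodules = [path for mode, path, _ in entries if mode == GITLINK_MODE]
```

Because the hash field has no length marker, a longer external hash cannot simply be dropped into it, which is what motivates the workaround floated above.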
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-19 Thread Duy Nguyen

On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin
 wrote:
>> But we can recreate SHA-1 from the same content and verify GPG, right?
>> I know it's super expensive, but it feels safer to not carry SHA-1
>> around when it's not secure anymore (I recall something about
>> exploiting the weakest link when you have both sha1 and sha256 in the
>> object content). Rehashing would be done locally and is better
>> controlled.
>
> You could. But how would you determine whether to recreate the commit
> object from a SHA-1-ified version of the commit buffer? Fall back if the
> original did not match the signature?

Any repo would have a cut point when they move to sha256 (or whatever
new hash), if we can record this somewhere (e.g. as a tag or a bunch
of tags, or some dummy commits to mark the heads of the repo) then we
only verify gpg signatures _in_ the repository before this point.

> That would pose at least these two problems:
>
> 1. The point of a signature is trust. If all of a sudden the signature
> does not match what is supposedly signed, that trust is broken.
>
> 2. The point of going to a stronger hash is to increase the trust. If
> any developer could decide to sign the SHA-1-ified version of any future
> commit, and Git validated it, it would be even worse than not switching
> to a new hash: it would leave us open to collision attacks *and* pretend
> that we prevented such attacks.

GPG signatures are still valid on the old repo (we will keep old repos
around forever, I suppose). And because they sign on the "weak" hash,
sha1, at some point they will be broken (but until then we can still
regenerate sha1 and verify locally). When sha1 is broken, GPG
signatures of the past can't be trusted anymore.

If people care enough about the past, they should re-sign (at least
for tags). Commits can be re-signed by the person who does the
conversion. Yes you have to trust that person. Sort of a painful fresh
start, with hopefully better security.
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-19 Thread David Lang


On Tue, 19 Jul 2016, Johannes Schindelin wrote:

> Hi Duy,
>
> On Mon, 18 Jul 2016, Duy Nguyen wrote:
>
>> On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson
>>  wrote:
>>> I'm going to end up having to do something similar because of the issue
>>> of submodules.  Submodules may still be SHA-1, while the main repo may
>>> be a newer hash.
>>
>> Or even the other way around, main repo is one with sha1 while
>> submodule is on sha256. I wonder if we should address this separately
>> (and even in parallel with sha256 support), making submodules work
>> with any external VCS system (that supports some basic operations
>> we define).
>
> It is safe to assume that any project using a submodule with a more secure
> hash would require Git tooling capable of said hash. It would hence make
> no sense to use SHA-1 for the super project.
>
> So I do not believe that we have to support the use case of a SHA-1-based
> project using SHA-256-based submodules.

they have different upstreams, what if the upstream of the submodule has
upgraded and is using signed commits of the sha-256 but the upstream of the
parent hasn't and is using signed commits of sha1?


David Lang

Re: Git and SHA-1 security (again)

2016-07-19 Thread Johannes Schindelin

Hi Duy,

On Mon, 18 Jul 2016, Duy Nguyen wrote:

> On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson
>  wrote:
> > I'm going to end up having to do something similar because of the issue
> > of submodules.  Submodules may still be SHA-1, while the main repo may
> > be a newer hash.
> 
> Or even the other way around, main repo is one with sha1 while
> submodule is on sha256. I wonder if we should address this separately
> (and even in parallel with sha256 support), making submodules work
> with any external VCS system (that supports some basic operations
> we define).

It is safe to assume that any project using a submodule with a more secure
hash would require Git tooling capable of said hash. It would hence make
no sense to use SHA-1 for the super project.

So I do not believe that we have to support the use case of a SHA-1-based
project using SHA-256-based submodules.

Ciao,
Dscho

Re: Git and SHA-1 security (again)

2016-07-19 Thread Johannes Schindelin

Hi Zsolt,

On Mon, 18 Jul 2016, Herczeg Zsolt wrote:

> >> My point is not to throw out old hashes and break signatures. My point
> >> is to convert the data storage, and use mapping to resolve problems
> >> with those old hashes and signatures.
> >
> > If you convert the data storage, then the SHA-1s listed in the commit
> > objects will have to be rewritten, and then the GPG signature will not
> > match anymore.
> >
> > Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e`
> > to see the anatomy of a gpg-signed commit object.
> >
> 
> Yes and no. That's the reason you need the two-way lookup table. If
> you need to verify a commit which was signed as SHA-1, you must use
> the lookup table in reverse.

That pretends that it is both easy and trustworthy to know when (and how)
to recreate the SHA-1-ified version of the commit object.

Neither is the case, though.

Ciao,
Johannes

Re: Git and SHA-1 security (again)

2016-07-19 Thread Johannes Schindelin

Hi Duy,

On Mon, 18 Jul 2016, Duy Nguyen wrote:

> On Mon, Jul 18, 2016 at 5:57 PM, Johannes Schindelin
>  wrote:
> >
> > On Mon, 18 Jul 2016, Herczeg Zsolt wrote:
> >
> >> >> I think converting is a much better option. Use a single-hash
> >> >> storage, and convert everything to that on import/clone/pull.
> >> >
> >> > That ignores two very important issues that I already had mentioned:
> >>
> >> That's not true. If you double-check the next part of my message, I
> >> just showed that an automatic two-way mapping could solve these
> >> problems! (I even gave brief explanations of how to handle referencing
> >> and signature verification in those cases.)
> >>
> >> My point is not to throw out old hashes and break signatures. My point
> >> is to convert the data storage, and use mapping to resolve problems
> >> with those old hashes and signatures.
> >
> > If you convert the data storage, then the SHA-1s listed in the commit
> > objects will have to be rewritten, and then the GPG signature will not
> > match anymore.
> 
> But we can recreate SHA-1 from the same content and verify GPG, right?
> I know it's super expensive, but it feels safer to not carry SHA-1
> around when it's not secure anymore (I recall something about
> exploiting the weakest link when you have both sha1 and sha256 in the
> object content). Rehashing would be done locally and is better
> controlled.

You could. But how would you determine whether to recreate the commit
object from a SHA-1-ified version of the commit buffer? Fall back if the
original did not match the signature? That would pose at least these two
problems:

1. The point of a signature is trust. If all of a sudden the signature
does not match what is supposedly signed, that trust is broken.

2. The point of going to a stronger hash is to increase the trust. If
any developer could decide to sign the SHA-1-ified version of any future
commit, and Git validated it, it would be even worse than not switching
to a new hash: it would leave us open to collision attacks *and* pretend
that we prevented such attacks.

The more I think about it, the more I am convinced that we have no choice
but to allow mixed hashes (i.e. both 160-bit SHA-1 and 256-bit new hash,
whatever we settle on). Otherwise there would be no reliable and
trustworthy upgrade path.

But maybe there is a clever strategy I failed to think of?

Ciao,
Dscho

Re: Git and SHA-1 security (again)

2016-07-18 Thread brian m. carlson

On Mon, Jul 18, 2016 at 11:00:35AM -0700, Junio C Hamano wrote:
> Continuing this thought process, I do not see a good way to allow us
> to wean ourselves off of the old hash, unless we _break_ the pack
> stream format so that each object in the pack carries not just the
> data but also the hash algorithm to be used to _name_ it, so that
> new objects will never be referred to using the old hash.

I think for this reason, I'm going to propose the following approach
when we get there:

* We serialize the hash in the object formats, using multihash or
  something similar.  This means that it is minimally painful if we ever
  need to change in the future[0].
* Each repository carries exactly one hash algorithm, except for
  submodule data.  If we don't do this, then some people will never
  switch because the submodules they depend on haven't.
* If people on the new format need to refer to submodule commits using
  SHA-1, then they have to use a prefix on the hash form; otherwise,
  they can use the raw hash value (without any multihash prefix).
* git fsck verifies one consistent algorithm (excepting submodule
  references).

This preserves the security benefits, avoids future-proofing problems,
and minimizes performance impacts due to naming like you mentioned.

[0] We are practically limited to 256-bit hashes because anything longer
will wrap on an 80-column terminal when in hex form.
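
To make the multihash idea concrete, here is a hedged sketch of the encoding: a function code, a digest length, then the digest. The codes 0x11 (sha1) and 0x12 (sha2-256) come from the multihash table; the helper names are made up for illustration, and nothing here reflects how Git would actually serialize objects.

```python
import hashlib

# Function codes from the multihash table: 0x11 is sha1, 0x12 is sha2-256.
MULTIHASH_CODES = {"sha1": 0x11, "sha256": 0x12}
CODE_TO_NAME = {code: name for name, code in MULTIHASH_CODES.items()}


def encode_multihash(algo, data):
    """Prefix the digest of `data` with <function code><digest length>."""
    digest = hashlib.new(algo, data).digest()
    # Both values are below 128 here, so each varint is a single byte.
    return bytes([MULTIHASH_CODES[algo], len(digest)]) + digest


def decode_multihash(blob):
    """Split a multihash back into (algorithm name, raw digest)."""
    code, length = blob[0], blob[1]
    digest = blob[2:2 + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return CODE_TO_NAME[code], digest


tagged = encode_multihash("sha256", b"hello\n")
algo, digest = decode_multihash(tagged)
```

The two-byte prefix is what lets a reader tell a SHA-1 submodule reference apart from a native hash without guessing from the length alone.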
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: Git and SHA-1 security (again)

2016-07-18 Thread brian m. carlson

On Mon, Jul 18, 2016 at 09:00:06AM +0200, Johannes Schindelin wrote:
> Hi Brian,
> 
> On Sun, 17 Jul 2016, brian m. carlson wrote:
> 
> > On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote:
> > > Out of curiosity: have you considered something like padding the SHA-1s
> > > with, say 0xa1, to the size of the new hash and using that padding to
> > > distinguish between old vs new hash?
> > 
> > I'm going to end up having to do something similar because of the issue
> > of submodules.  Submodules may still be SHA-1, while the main repo may
> > be a newer hash.  I was going to zero-pad, however.
> 
> I thought about zero-padding, but there are plenty of
> is_null_sha1()/is_null_oid() calls around. Of course, I assumed
> left-padding. But you may have thought of right-padding instead? That
> would make short name handling much easier, too.

I was going to right-pad.
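
A small sketch of what right-padding with zeros could look like, assuming a 32-byte new hash. The function names are hypothetical. It also shows the two properties discussed above: abbreviated names still match from the left, and the all-zero id stays all-zero.

```python
SHA1_RAW_LEN = 20
NEW_RAW_LEN = 32
PADDING = bytes(NEW_RAW_LEN - SHA1_RAW_LEN)  # 12 zero bytes


def right_pad_sha1(sha1_hex):
    """Extend a 40-digit SHA-1 name to 64 hex digits by appending zeros."""
    raw = bytes.fromhex(sha1_hex)
    if len(raw) != SHA1_RAW_LEN:
        raise ValueError("expected a 20-byte SHA-1")
    return (raw + PADDING).hex()


def looks_like_padded_sha1(name_hex):
    """Heuristic only: true when the trailing 12 bytes are all zero."""
    return bytes.fromhex(name_hex)[SHA1_RAW_LEN:] == PADDING


padded = right_pad_sha1("04b871796dc0420f8e7561a895b52484b701d51a")
null_padded = right_pad_sha1("00" * SHA1_RAW_LEN)
```

With left-padding, every short name would start with the same run of zeros, which is why right-padding makes short name handling easier.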

> FWIW it never crossed my mind to allow different same-sized hash
> algorithms. So I never thought we'd need a way to distinguish, say,
> BLAKE2b-256 from SHA-256.
> 
> Is there a good reason to add the maintenance burden of several 256-bit
> hash algorithms, apart from speed (which in my mind should decide which
> one to use, always, rather than letting the user choose)? It would also
> complicate transport even further, let alone subtree merges from
> differently-hashed repositories.

There are really three candidates:

* SHA-256 (the SHA-2 algorithm): While this looks good right now,
  cryptanalysis is advancing.  This is not a good choice for a long-term
  solution.
* SHA3-256 (the SHA-3 algorithm): This is the conservative choice.  It's
  also faster than SHA-256 on 64-bit systems.  It has a very
  conservative security margin and is a good long-term choice.
* BLAKE2b-256: This is the blisteringly fast choice.  It outperforms
  SHA-1 and even MD5 on 64-bit systems.  This algorithm was designed so
  that nobody would have a reason to use an insecure algorithm.  It will
  probably be secure for some time, but maybe not as long as SHA3-256.

I'm only considering 256-bit hashes, because anything longer won't fit
on an 80-column terminal in hex form.

The reason I had considered implementing both SHA3-256 and BLAKE2b-256
is that I want there to be no reason not to upgrade.  People who need a
FIPS-approved algorithm or want a long-term, conservative choice should
use SHA3-256.  People who want even better performance than current Git
would use BLAKE2b-256.

Performance comparison (my implementations):

SHA-1:     437 MiB/s
SHA-256:   196 MiB/s
SHA3-256:  273 MiB/s
BLAKE2b:   649 MiB/s
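
For anyone who wants to repeat a comparison like this locally, a rough sketch using Python's `hashlib` follows. It measures whatever implementations the local Python build links against, so the figures will not match the ones quoted here; the helper name and payload size are arbitrary choices.

```python
import hashlib
import time

CANDIDATES = {
    "SHA-1": lambda: hashlib.sha1(),
    "SHA-256": lambda: hashlib.sha256(),
    "SHA3-256": lambda: hashlib.sha3_256(),
    "BLAKE2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def throughput_mib_per_s(make_hasher, payload, rounds=4):
    """Hash `payload` `rounds` times and return MiB processed per second."""
    hasher = make_hasher()
    start = time.perf_counter()
    for _ in range(rounds):
        hasher.update(payload)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * rounds) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    payload = bytes(4 * 1024 * 1024)  # 4 MiB of zeros
    for name, make_hasher in CANDIDATES.items():
        rate = throughput_mib_per_s(make_hasher, payload)
        print(f"{name:12s} {rate:8.0f} MiB/s")
```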

I hadn't thought about subtree merges, though.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt

>> The reality of the current situation is that it's largely mitigated in
>> practice because:
>>
>> a) it's hard to hand someone a crafted blob to begin with for reasons
>> that have nothing to do with SHA-1 (they'll go "wtf is this garbage?")
>>
>> b) even in that case it's *very* hard to come up with two colliding
>> blobs that are *useful* for some nefarious purpose, e.g. a program A
>> that looks normal being replaced by an evil program B with the same
>> SHA-1.
>
> Thanks.  That's a nice rephrasing of
>
>   
> http://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901%40ppc970.osdl.org/
>
> where Linus explains SHA-1 is not the security, and the real
> security is in distribution.

If the real security is in the distribution, then why does git support
signed commits and objects?

The security of the signatures does depend on the hash. Saying the hash
is not a security feature and offering GPG signing based on that hash
is a damn big lie. You can change the hash algorithm to a secure one,
or change the signing method to be independent of the hash algorithm,
or you can stop offering signatures at all, but something has to be
done here.

Re: Git and SHA-1 security (again)

2016-07-18 Thread Jonathan Nieder

Junio C Hamano wrote:

> Continuing this thought process, I do not see a good way to allow us
> to wean ourselves off of the old hash, unless we _break_ the pack
> stream format so that each object in the pack carries not just the
> data but also the hash algorithm to be used to _name_ it, so that
> new objects will never be referred to using the old hash.

Taking a step further: I don't think that any backward-compatible
format change would address the security concerns with sufficiently
old hashing algorithms.

As long as my favorite repository is allowed to contain objects
identified by SHA-1, my adversary can exploit a SHA-1 collision using
signed tags referring (possibly indirectly) to backdated objects. The
Git object format does not include a proof of commit date, so I cannot
guarantee "Only old objects are named by SHA-1".

There is a way to get a backward-compatible *user experience* without
the format change being backward-compatible, though. Name all objects
in the repository using FuturisticHash. Also store enough information
to recover the old hashes, either in objects as a new field or in a
side table.
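
A minimal sketch of such a side table, with SHA-256 standing in for "FuturisticHash". It relies on the fact that Git names an object by hashing `<type> <size>\0<body>`; the class and function names are hypothetical. A blob is the easy case because its body contains no other object names; trees and commits would first need their embedded names rewritten.

```python
import hashlib


def git_object_id(algo, obj_type, body):
    """Hash `<type> <size>\\0<body>`, the way Git names loose objects."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


class HashMapping:
    """Two-way side table between old (SHA-1) and new (SHA-256) names."""

    def __init__(self):
        self.old_to_new = {}
        self.new_to_old = {}

    def record(self, obj_type, body):
        old = git_object_id("sha1", obj_type, body)
        new = git_object_id("sha256", obj_type, body)
        self.old_to_new[old] = new
        self.new_to_old[new] = old
        return old, new


table = HashMapping()
old_name, new_name = table.record("blob", b"hello\n")
```

The table only helps if it can be trusted, which is why the rest of this message argues for signing the mapping.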

If the old hash is broken, signatures using the old hash cannot be
trusted. An adversary could generate a collision to retroactively
change the meaning of an existing signature. To maintain the meaning
of old signatures, someone has to record the new names of all involved
objects, either before the state of the art in breaking the old hash
advances far enough or using a copy of the repository from before the
state of the art had advanced --- in effect you need new signatures to
maintain the meaning of old signatures. This could happen as part of
the process of updating a repository to use a new hash.

E.g.

object a787a87b98a7s98798a798b7a98b798a7b98a7b987a9b87a9b87a98b79a87b98a7b98a7b987a987987a878a78a
sha1tag object 04b871796dc0420f8e7561a895b52484b701d51a
type commit
tag signedtag
tagger C O Mitter 1465981006 +

signed tag

signed tag message body
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn
rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh
8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods
q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0
rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x
lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E=
=jpXa
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNATURE-----
...
-----END PGP SIGNATURE-----

This example uses a signature to attest that mapping
04b871796dc0420f8e7561a895b52484b701d51a->a787a87b98a7s98798a798b7a98b798a7b98a7b987a9b87a9b87a98b79a87b98a7b98a7b987a987987a878a78a
is correct. A more straightforward approach would be for the
conversion process to produce an out-of-band signed mapping list to
make the sha1tag usable without such a signature.

Summary:
* Git's properties depend on using a single hash function throughout
a repository. I don't think we should change that.

* A safe and mostly painless migration to a stronger hash function is
possible using a signed assertion (for example generated by the
conversion process) of the mapping from old object names to new
object names.

* Dealing with multiple such signed mappings (for example due to
separate conversion of repositories based on linux.git) is left as
an exercise to the reader.

Hope that helps,
Jonathan

Re: Git and SHA-1 security (again)

2016-07-18 Thread Junio C Hamano

Ævar Arnfjörð Bjarmason  writes:

> The reality of the current situation is that it's largely mitigated in
> practice because:
>
> a) it's hard to hand someone a crafted blob to begin with for reasons
> that have nothing to do with SHA-1 (they'll go "wtf is this garbage?")
>
> b) even in that case it's *very* hard to come up with two colliding
> blobs that are *useful* for some nefarious purpose, e.g. a program A
> that looks normal being replaced by an evil program B with the same
> SHA-1.

Thanks.  That's a nice rephrasing of

  
http://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901%40ppc970.osdl.org/

where Linus explains SHA-1 is not the security, and the real
security is in distribution.

Re: Git and SHA-1 security (again)

2016-07-18 Thread David Lang


On Mon, 18 Jul 2016, Herczeg Zsolt wrote:

>> In particular, as far as I know and as Theodore Ts'o's post describes
>> better than I could[1], you seem to be confusing preimage attacks with
>> collision attacks, and then concluding that because SHA1 is vulnerable
>> to collision attacks that use-cases that would need a preimage attack
>> to be compromised (which as far as I can tell, includes all your
>> examples) are also "broken".
>
> I understand the differences between the collision and preimage
> attacks. A collision attack is not that bad for git in a typical
> use-case. But I think that it's important to note that there are many
> use-cases which do need a hash safe from collision attacks. Some
> examples:
>
> You maintain a repository with gittorrent with signed commits. Others
> can use these signatures to verify it's original. Let's say you
> include some safe file (potentially binary) from a third-party
> contributor. That would be fine if the hash algo is safe. Currently
> there is the possibility that you received a (safe) file which was
> made to collide with another malicious one. Once you committed (and
> signed) that file, the attacker joins the gittorrent network and
> starts to distribute the malicious file. Suddenly most of your clients
> pulling are infected, however your signature is correct.
>
> Or, you would like to make a continuous delivery system, where you use
> signed tags. The delivery happens only when the signature is right, and
> the signer is responsible for it. Your colleague makes a collision,
> pushes the good file. You make all the tests, everything is fine, sign
> and push and wait for the delivery to happen. Your colleague changes
> the file on the server. The delivery makes a huge mess, and you're
> fired.
>
> Or, let's say you use a service like github, which is nice enough to
> make a repository for you, with .gitignore, licenses and everything.
> Likely, you'll never change those files. Let's say that service made
> one of those initial files to collide with something bad. That means they
> can "infect" anyone who is pulling your repo.
>
> Do you need more hypothetical stories? There are a lot. Of course they
> need a lot of work, and they're unlikely to happen. But it's possible.
> If you need trust, and gpg signatures, that means you need ultimate
> trust. What's the point in making GPG signatures anyway if you cannot
> ultimately trust them? You could just as well say: well that
> repository is only reachable by trustworthy persons, everything here
> is just fine and really made by the person named in the "author
> field".


All of your examples are actually preimage attacks. If the bad guy can tamper
with both the 'safe' and 'malicious' versions of the file, they don't
actually need the malicious version, they can attack you through the one you
think is 'safe'.


The 'collision' attack isn't that there is some increased chance of a random 
file colliding with your safe file, it's that if you are manipulating the 
contents of both files, you can create two that collide. This won't hurt a Git 
repository unless one of these manipulated files is able to be introduced as a 
legitimate part of the repo you are dealing with.
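
The difference in effort can be illustrated with a toy experiment. The sketch below truncates SHA-1 to 16 bits (an assumption made purely so both searches finish quickly) and compares an attacker who controls both inputs with one who must match a file chosen by someone else. All names are hypothetical.

```python
import hashlib
from itertools import count

BITS = 16  # deliberately tiny so both searches finish quickly


def tiny_hash(data):
    """SHA-1 truncated to BITS bits; a stand-in for a weak hash."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - BITS)


def find_collision():
    """Attacker picks BOTH inputs: stop at the first repeated hash value."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_second_preimage(target_msg):
    """Attacker must match the hash of a file someone else chose."""
    target = tiny_hash(target_msg)
    for i in count():
        msg = b"evil-%d" % i
        if tiny_hash(msg) == target:
            return msg, i + 1


a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"the file you reviewed")
print(collision_tries, preimage_tries)
```

On average the first search needs on the order of 2^(BITS/2) attempts and the second on the order of 2^BITS, which is the gap between the two attack classes being argued about here.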


David Lang

Re: Git and SHA-1 security (again)

2016-07-18 Thread Ævar Arnfjörð Bjarmason

On Mon, Jul 18, 2016 at 7:48 PM, Herczeg Zsolt  wrote:
>> In particular, as far as I know and as Theodore Ts'o's post describes
>> better than I could[1], you seem to be confusing preimage attacks with
>> collision attacks, and then concluding that because SHA1 is vulnerable
>> to collision attacks that use-cases that would need a preimage attack
>> to be compromised (which as far as I can tell, includes all your
>> examples) are also "broken".
>
> I understand the differences between the collision and preimage
> attacks.

Fair enough. The rest of your E-Mail certainly shows that you do, and
I didn't know anything about GitTorrent and this case where
it's vulnerable to collision attacks.

But I didn't get that impression from your initial E-Mail which
outright said:

Git signed tags and signed commits are cryptographically
insecure, they're useless at the moment.

It's important that those of us who *do* understand the difference
between collision and preimage attacks carefully phrase things, lest
they turn into FUD.

Your initial E-Mail does *not* make it sound like you're just talking
about the cases where someone's provided you with a crafted blob that
you've been tricked into signing, but rather makes it sound like
signed tags & commits are just categorically broken, even for preimage
attacks, which is not the case.

The reality of the current situation is that it's largely mitigated in
practice because:

a) it's hard to hand someone a crafted blob to begin with for reasons
that have nothing to do with SHA-1 (they'll go "wtf is this garbage?")

b) even in that case it's *very* hard to come up with two colliding
blobs that are *useful* for some nefarious purpose, e.g. a program A
that looks normal being replaced by an evil program B with the same
SHA-1.

Re: Git and SHA-1 security (again)

2016-07-18 Thread Junio C Hamano

"brian m. carlson"  writes:

> I will say that the pack format will likely require some changes,
> because it assumes ...
> The reason is that we can't have an unambiguous parse of the current
> objects if two hash algorithms are in use
> So when we look at a new hash, we need to provide an unambiguous way to
> know what hash is in use.  The two choices are to either require all
> objects use the new hash, or to extend the objects to include the hash.
> Until a couple days ago, I had planned to do the former.  I had not even
> considered using a multihash approach due to the complexity.

Objects in Git identify themselves, but once you introduce the
second hash function (as opposed to replacing the hash function to a
new one), you would allow people to call the same object by two
names.  That has interesting implications.

Let's say you have a blob at path F in a top-level tree object and
create a commit.  You have three objects in total, the tree knows
the blob as one name based on SHA-1 and the commit knows the tree as
one name based on SHA-1.  The same contents of the blob and the tree
could have different names based on SHA-256 in the future Git.

Let's further say you have a future Git and clone from the above
repository with three objects.  You get a pack stream, containing
the data for one commit, tree and blob each.  These objects do not
carry their own name as extra pieces of information.  You only get
their contents, and it is up to you to name them by hashing.  .idx
files are created by running index-pack while receiving the pack
data stream.  You _somehow_ need to know that these three objects
need to be hashed with SHA-1, even though you are SHA-256 capable,
because otherwise the object name recorded in the tree object for
the blob would not match what your .idx file would call the blob
data.  Also the object name recorded in the ref to point at the
commit would not match the commit object's object name, unless you
hash with SHA-1.  It is a possibility to always hash these objects
twice and record _both_ hashes in the updated .idx file; after all,
.idx files are strictly local matter.
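
A minimal Python sketch of the "hash twice and record both" possibility, assuming an object's name is the hash of a "<type> <size>\0" header followed by its content. The function names are hypothetical and this is not how index-pack is implemented; it only shows the same bytes receiving two names.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes, algo: str) -> str:
    """Name an object by hashing "<type> <size>\\0" followed by the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


def both_names(obj_type: str, content: bytes) -> dict:
    """Hypothetical .idx-style record carrying both names for one object."""
    return {algo: git_object_name(obj_type, content, algo)
            for algo in ("sha1", "sha256")}


names = both_names("blob", b"")
print(names["sha1"])    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(names["sha256"])  # 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813
```

The pack stream carries only the content, so which of the two names a tree or ref expects has to come from somewhere else.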

Now let's further say that you update the file F in the working
tree, and do "git commit -a" with updated version of Git.  What
should happen?  Assuming that we are trying to migrate to a
different hashing algorithm over time, we would want to create a new
blob under object name based on SHA-256, add that to the index and
write a new tree out, named by hashing with SHA-256.  We then record
that longer-named tree in a commit whose parent commit is still
named with SHA-1 based hash, and the new commit in turn is named by
hashing with SHA-256.

Then you push the result back.  Let's assume by now the place you
cloned from is also SHA-256 capable.  You look at the tips of refs
at your clone-source and discover that you would need to only send
the new commit, its tree and the updated blob.  You send data in
these three objects.  The receiving end would now need to do the
same "magically choose hash to make sure the new blob gets the name
that is recorded in the new tree (and the new tree the new commit)"
thing.  The same discussion applies if somebody else clones from you
at this point.  The objects introduced by the second commit all need
to be hashed with the new hash to be named, while the other objects
need to be hashed with the old hash.

Continuing this thought process, I do not see a good way to allow us
to wean ourselves off of the old hash, unless we _break_ the pack
stream format so that each object in the pack carries not just the
data but also the hash algorithm to be used to _name_ it, so that
new objects will never be referred to using the old hash.

It matters performance-wise that the weaning process go as quickly
as possible, once the system becomes capable of the new hash algorithm,
because during the transition period, we'd have to suffer the full
tree-diff becoming inefficient (Note: don't limit your thinking to
just "git diff" and "git log"; the same inefficiency hits "git
checkout" to switch branches and "git merge" to walk three trees in
parallel), because we cannot skip descending into subdirectories
based on the tree object name being equal, which guarantees that
everything under the hierarchy is equal.
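
A minimal Python sketch of that shortcut, under a toy model that is an assumption of this example: trees are dicts mapping a name to a (kind, object name) pair, held in a dict keyed by object name. When one subtree is known by an old-hash name on one side and a new-hash name on the other, the equality test fails and the walk has to descend even though nothing changed.

```python
def diff_trees(store, a_id, b_id, prefix="", visited=None):
    """Return changed paths between two trees, skipping identical subtrees."""
    if a_id == b_id:
        return []                       # equal names: nothing below can differ
    if visited is not None:
        visited.append(prefix or "/")   # record every tree we had to open
    a = store[a_id] if a_id else {}
    b = store[b_id] if b_id else {}
    changed = []
    for name in sorted(set(a) | set(b)):
        kind_a, id_a = a.get(name, (None, None))
        kind_b, id_b = b.get(name, (None, None))
        if (kind_a, id_a) == (kind_b, id_b):
            continue
        path = prefix + name
        if kind_a in (None, "tree") and kind_b in (None, "tree"):
            changed += diff_trees(store, id_a, id_b, path + "/", visited)
        else:
            changed.append(path)
    return changed


# Toy object store; all names are made up for illustration.
store = {
    "sha1:docs": {"readme": ("blob", "sha1:r")},
    "sha256:docs": {"readme": ("blob", "sha1:r")},   # same content, new name
    "sha1:src-v1": {"main.c": ("blob", "sha1:m1")},
    "sha1:src-v2": {"main.c": ("blob", "sha1:m2")},
    "root1": {"docs": ("tree", "sha1:docs"), "src": ("tree", "sha1:src-v1")},
    "root2": {"docs": ("tree", "sha1:docs"), "src": ("tree", "sha1:src-v2")},
    "root3": {"docs": ("tree", "sha256:docs"), "src": ("tree", "sha1:src-v1")},
}

opened = []
print(diff_trees(store, "root1", "root2", visited=opened), opened)
opened = []
print(diff_trees(store, "root1", "root3", visited=opened), opened)
```

The second call reports no changes, but only after opening "docs/", which the single-hash case would have skipped.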


Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt

> In particular, as far as I know and as Theodore Ts'o's post describes
> better than I could[1], you seem to be confusing preimage attacks with
> collision attacks, and then concluding that because SHA1 is vulnerable
> to collision attacks that use-cases that would need a preimage attack
> to be compromised (which as far as I can tell, includes all your
> examples) are also "broken".

I understand the differences between the collision and preimage
attacks. A collision attack is not that bad for git in a typical
use-case. But I think that it's important to note that there are many
use-cases which do need a hash safe from collision attack. Some
examples:

You maintain a repository with gittorrent with signed commits. Others
can use these signatures to verify that it's original. Let's say you
include some safe file (potentially binary) from a third-party
contributor. That would be fine if the hash algo is safe. Currently
there is the possibility that you received a (safe) file which was
made to collide with another malicious one. Once you committed (and
signed) that file, the attacker joins the gittorrent network and
starts to distribute the malicious file. Suddenly most of your clients
pulling are infected, even though your signature is correct.

Or, you would like to make a continuous delivery system, where you use
signed tags. The delivery happens only when signature is right, and
the signer is responsible for it. Your colleague makes a collision,
pushes the good-file. You make all the tests, everything is fine, sign
and push and wait for the delivery to happen. Your colleague changes
the file on the server. The delivery makes a huge mess, and you're
fired.

Or, let's say you use a service like github, which is nice enough to
make a repository for you, with .gitignore, licenses and everything.
Likely, you'll never change those files. Let's say that service made
one of those initial files to collide with something bad. That means they
can "infect" anyone who is pulling your repo.

Do you need more hypothetical stories? There are a lot. Of course they
need a lot of work, and they're unlikely to happen. But it's possible.
If you need trust and GPG signatures, that means you need ultimate
trust. What's the point in making GPG signatures anyway if you cannot
ultimately trust them? You could just as well say: well, that
repository is only reachable by trustworthy persons, everything here
is just fine and really made by the person named in the "author"
field.

Re: Git and SHA-1 security (again)

2016-07-18 Thread Duy Nguyen

On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson
 wrote:
> I'm going to end up having to do something similar because of the issue
> of submodules.  Submodules may still be SHA-1, while the main repo may
> be a newer hash.

Or even the other way around, main repo is one with sha1 while
submodule is on sha256. I wonder if we should address this separately
(and even in parallel with sha256 support), making submodules work
with any external VCS (that supports some basic operations
we define).
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-18 Thread Ævar Arnfjörð Bjarmason

On Sat, Jul 16, 2016 at 3:48 PM, Herczeg Zsolt  wrote:
> I would like to discuss an old topic from 2006. I understand it was
> already discussed. The only reason I'm sending this e-mail is to talk
> about a possible solution which didn't show up on this list before.

You mention the 2006 discussion, but I wonder if you've read the more
recent discussion from April on the subject.

> I think we all understand that SHA-1 is broken. It still works perfect
> as a storage key, but it's not cryptographically secure anymore. Git
> is not moving away from SHA-1 because it would break too many
> projects, and cryptographic security is not needed by git if you have
> your own repository.
>
> However I would like to show some big problems caused by SHA-1:
>  - Git signed tags and signed commits are cryptographically insecure,
> they're useless at the moment.
>  - Git Torrent (https://github.com/cjb/GitTorrent) is also
> cryptographically broken, however it would be an awesome experiment.
>  - Linus said: "You only need to know the SHA-1 of the top of your
> tree, and if you know that, you can trust your tree." That's not true
> anymore. You have to trust your computer, your servers, your git
> provider in a way that no-one can maliciously modify your data.

In particular, as far as I know and as Theodore Ts'o's post describes
better than I could[1], you seem to be confusing preimage attacks with
collision attacks, and then concluding that because SHA1 is vulnerable
to collision attacks that use-cases that would need a preimage attack
to be compromised (which as far as I can tell, includes all your
examples) are also "broken".

1. http://thread.gmane.org/gmane.comp.version-control.git/291305/focus=291511

Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt

Hi Johannes,

>> My point is not to throw out old hashes and break signatures. My point
>> is to convert the data storage, and use mapping to resolve problems
>> with those old hashes and signatures.
>
> If you convert the data storage, then the SHA-1s listed in the commit
> objects will have to be rewritten, and then the GPG signature will not
> match anymore.
>
> Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e`
> to see the anatomy of a gpg-signed commit object.
>

Yes and no. That's the reason you need the two-way lookup table. If
you need to verify a commit which was signed as SHA-1, you must use
the lookup table in reverse. This way you can reconstruct the original
commit structure, which can then be verified. Of course it's work to
do so but you only need to develop the new signature verification
algorithm. You save much more on the other side where you don't have
to rework all the other algorithms to multi-hash.

Another interesting point is that multi-hash storage actively hurts
signature security! (Duy just mentioned that while I was writing.) A
signed commit (or tag) is only as secure as the least secure hash it
refers to (directly or indirectly). Let's imagine that you make a new
commit, and there is an old file in the tree somewhere. That's a weak
point: because it has a SHA-1 hash, someone can replace it (and thus
change your commit's content).

I would clearly mark any signature as to whether it's SHA-1 or SHA2 (or
anything else) based, and allow only that hash in all the trees
and objects while verifying that commit. If it's not the same
hash type as the storage key, then use the lookup table for conversion
before the check. (This has some interesting side-effects, but it's all
about good implementation.)

Re: Git and SHA-1 security (again)

2016-07-18 Thread Duy Nguyen

On Mon, Jul 18, 2016 at 5:57 PM, Johannes Schindelin
 wrote:
> Hi Zsolt,
>
> On Mon, 18 Jul 2016, Herczeg Zsolt wrote:
>
>> >> I think converting is a much better option. Use a single-hash
>> >> storage, and convert everything to that on import/clone/pull.
>> >
>> > That ignores two very important issues that I already had mentioned:
>>
>> That's not true. If you double-check the next part of my message, you'll
>> see I just showed that an automatic two-way mapping could solve these
>> problems! (I even gave a brief explanation of how to handle referencing and
>> signature verification in those cases.)
>>
>> My point is not to throw out old hashes and break signatures. My point
>> is to convert the data storage, and use mapping to resolve problems
>> with those old hashes and signatures.
>
> If you convert the data storage, then the SHA-1s listed in the commit
> objects will have to be rewritten, and then the GPG signature will not
> match anymore.

But we can recreate SHA-1 from the same content and verify GPG, right?
I know it's super expensive, but it feels safer to not carry SHA-1
around when it's not secure anymore (I recall something about
exploiting the weakest link when you have both sha1 and sha256 in the
object content). Rehashing would be done locally and is better
controlled.
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-18 Thread Johannes Schindelin

Hi Zsolt,

On Mon, 18 Jul 2016, Herczeg Zsolt wrote:

> >> I think converting is a much better option. Use a single-hash
> >> storage, and convert everything to that on import/clone/pull.
> >
> > That ignores two very important issues that I already had mentioned:
> 
> That's not true. If you double-check the next part of my message, you'll
> see I just showed that an automatic two-way mapping could solve these
> problems! (I even gave a brief explanation of how to handle referencing and
> signature verification in those cases.)
> 
> My point is not to throw out old hashes and break signatures. My point
> is to convert the data storage, and use mapping to resolve problems
> with those old hashes and signatures.

If you convert the data storage, then the SHA-1s listed in the commit
objects will have to be rewritten, and then the GPG signature will not
match anymore.

Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e`
to see the anatomy of a gpg-signed commit object.
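
A rough Python sketch of that anatomy, assuming the signature sits in a "gpgsig" header whose continuation lines start with a single space, and that the signed payload is the commit text with that header removed. The sample commit is made up (the tree name is the well-known SHA-1 of the empty tree) and the signature body is a placeholder.

```python
def split_signed_commit(raw: str):
    """Return (payload, signature) for the text of a signed commit object."""
    headers, sep, message = raw.partition("\n\n")
    kept, sig, in_sig = [], [], False
    for line in headers.split("\n"):
        if line.startswith("gpgsig "):
            in_sig = True
            sig.append(line[len("gpgsig "):])
        elif in_sig and line.startswith(" "):
            sig.append(line[1:])        # continuation line of the signature
        else:
            in_sig = False
            kept.append(line)
    return "\n".join(kept) + sep + message, "\n".join(sig)


sample = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1468800000 +0000\n"
    "committer A U Thor <author@example.com> 1468800000 +0000\n"
    "gpgsig -----BEGIN PGP SIGNATURE-----\n"
    " \n"
    " (placeholder signature data)\n"
    " -----END PGP SIGNATURE-----\n"
    "\n"
    "Example subject\n"
)

payload, signature = split_signed_commit(sample)
print(payload)
```

The payload still names the tree (and any parents) by SHA-1, so rewriting those names changes the bytes the signature was made over.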

Ciao,
Johannes

Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt

>> I think converting is a much better option. Use a single-hash storage, and
>> convert everything to that on import/clone/pull.
>
> That ignores two very important issues that I already had mentioned:

That's not true. If you double-check the next part of my message, you'll
see I just showed that an automatic two-way mapping could solve these
problems! (I even gave a brief explanation of how to handle referencing
and signature verification in those cases.)

My point is not to throw out old hashes and break signatures. My point
is to convert the data storage, and use mapping to resolve problems
with those old hashes and signatures. A single-hash data storage is
obviously way easier to handle than a multi-hash mess. (See Linus's
old e-mail: multiple hashes [=meaning database keys] for the same
content is complete nonsense in git-speak)

> The "convert everything" strategy also ignores the problem of interacting
> with servers and collaborators. Think of hosting repositories,
> rediscovering forgotten work trees, and of the "D" in DSCM.

That's not an issue when we're working with a single repository. It's
reasonable to ask for all git clients of the same repository, to
support the same hash. Yes, you have the need to configure the hash
algo on a per-repository basis but that's all. For importing and
co-working between different repositories, it's a harder problem,
but it's possible to handle the conversions correctly.

Re: Git and SHA-1 security (again)

2016-07-18 Thread Johannes Schindelin

Hi Zsolt,

On Mon, 18 Jul 2016, Herczeg Zsolt wrote:

> I think converting is a much better option. Use a single-hash storage, and
> convert everything to that on import/clone/pull.

That ignores two very important issues that I already had mentioned:

- existing references, both in-repository, e.g. in commit messages
  referring to earlier commits, as well as out-of-repository, e.g.
  referring to commits in mails, blog posts, etc

- GPG-signed commits

Those issues cannot just be hand-waved away.

The "convert everything" strategy also ignores the problem of interacting
with servers and collaborators. Think of hosting repositories,
rediscovering forgotten work trees, and of the "D" in DSCM.

Ciao,
Johannes

Re: Git and SHA-1 security (again)

2016-07-18 Thread Johannes Schindelin

Hi Brian,

On Sun, 17 Jul 2016, brian m. carlson wrote:

> On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote:
> > Out of curiosity: have you considered something like padding the SHA-1s
> > with, say 0xa1, to the size of the new hash and using that padding to
> > distinguish between old vs new hash?
> 
> I'm going to end up having to do something similar because of the issue
> of submodules.  Submodules may still be SHA-1, while the main repo may
> be a newer hash.  I was going to zero-pad, however.

I thought about zero-padding, but there are plenty of
is_null_sha1()/is_null_oid() calls around. Of course, I assumed
left-padding. But you may have thought of right-padding instead? That
would make short name handling much easier, too.

FWIW it never crossed my mind to allow different same-sized hash
algorithms. So I never thought we'd need a way to distinguish, say,
BLAKE2b-256 from SHA-256.

Is there a good reason to add the maintenance burden of several 256-bit
hash algorithms, apart from speed (which in my mind should decide which
one to use, always, rather than letting the user choose)? It would also
complicate transport even further, let alone subtree merges from
differently-hashed repositories.

Ciao,
Dscho

Fwd: Git and SHA-1 security (again)

2016-07-17 Thread Herczeg Zsolt

Do you think the multi-hash approach is worth the added complexity? It'll
break a lot of things. I mean almost everything. All git algorithms
rely on the "same hash => same content" and "same content => same hash"
statements.

I think converting is a much better option. Use a single-hash storage,
and convert everything to that on import/clone/pull. I would only
introduce a two-way mapping table (old-hash <=> new-hash) to help
the users. In the normal workflow, everything can go with new-hashes
only.

That leaves most algorithms and code intact, and introduces a very few
new cases:

On import:
If the imported data does not match your selected new-hash format, add
its hash to the lookup table, then convert it to your selected
format, and handle it as such.

If a user references an old hash:
We can look-up that table forward, and find the referenced object in
storage. We can handle it from now as normal.

On an old signature verify:
Look-up the table forward, find the object by new key. Look-up
backwards for all referenced objects, reconstruct the "old-format" and
verify the hash on that.

If you double-check your hashes as you build the mapping, you can even
trust it, which makes the lookups and verifications very fast. You can
introduce as many hash mapping tables as you want, so you can not only
support an old-to-new hash transition, but there can be as many different
hashes in the world as you want. Your only rule is "you reference your
work in your current format", but you can look up any references which
were valid at the moment they were made.
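
A minimal Python sketch of such a two-way table, assuming both names are computed locally from the same bytes at import time. The class and method names are hypothetical, and only blobs are handled; converting trees and commits would also mean rewriting the names they contain, which is left out here.

```python
import hashlib


def blob_name(content: bytes, algo: str) -> str:
    """Name a blob by hashing "blob <size>\\0" followed by the content."""
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


class HashMappingStore:
    """Single-hash storage plus an old-hash <=> new-hash lookup table."""

    def __init__(self, old="sha1", new="sha256"):
        self.old, self.new = old, new
        self.objects = {}       # keyed by new-hash name only
        self.old_to_new = {}    # forward lookup for old references
        self.new_to_old = {}    # backward lookup for old signatures

    def import_blob(self, content: bytes) -> str:
        old_id = blob_name(content, self.old)
        new_id = blob_name(content, self.new)
        self.objects[new_id] = content
        self.old_to_new[old_id] = new_id
        self.new_to_old[new_id] = old_id
        return new_id

    def resolve(self, ref: str) -> str:
        """Accept an old or a new name and return the storage key."""
        return self.old_to_new.get(ref, ref)

    def read(self, ref: str) -> bytes:
        return self.objects[self.resolve(ref)]


repo = HashMappingStore()
new_id = repo.import_blob(b"")
print(repo.resolve("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391") == new_id)
```

Everything past resolve() sees new-hash names only, which is the point of keeping the table at the edge.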

(There are slight issues with this approach if we "convert then
convert". For example, when you import from A (sha1) repo to B
(sha2) repo it's perfect. But when you import the same commits from B
to C (sha3), you might lose sha1 references. That could be considered
normal if we want to keep-it-simple-stupid, only support a few
hashes, always going forward. Or you may add an extra "list" field to
objects, which could show what type of hashes you have to keep in
lookup-tables for that particular object. Or, you can even include a
list of old hashes in the object itself, which should make it to the
lookup table on import.)

Anyway, I think a single-hash storage, and the ability to hide any old
hashes from most of the internal algorithms is a key point in making
transition. If we want to provide multi-hash interface to users, then
we should look for "wrapper" solutions, that translates multi-hash
user needs to a single-hash backend.

Zsolt

Re: Git and SHA-1 security (again)

2016-07-17 Thread brian m. carlson

On Sun, Jul 17, 2016 at 12:23:49PM -0400, Theodore Ts'o wrote:
> On Sun, Jul 17, 2016 at 03:42:34PM +, brian m. carlson wrote:
> > As I said, I'm not planning on multiple hash support at first, but it
> > doesn't appear impossible if we go this route.  We might still have to
> > rewrite objects, but we can verify signatures over the legacy SHA-1
> > objects by forcing them into the old-style object format.
> 
> How hard would it be to make the on-disk format be multihash, even if
> there is no support for anything other than a single hash, at least
> for now?  That way we won't have to rewrite the objects twice.

Other than the amount of work to change reading from the on-disk format,
nothing prevents us from doing that, although I would recommend storing
the object database with the tag prefix if we do so (i.e., instead of
.git/objects/17, writing .git/objects/111417).  That future-proofs us
for when we change the hash.

I will say that the pack format will likely require some changes,
because it assumes things are 4-byte aligned.  It also assumes you can
use the object ID in the mmaped pack directly (4-byte aligned), which
you can no longer do.  We have some cases where we cast that memory
directly to struct object_id, which will no longer be valid, and even if
we add the two prefix bytes to struct object_id, that doesn't guarantee
that struct won't be aligned differently.

We could require that the pack format have two NUL bytes before the
hash, which would force it to be aligned.  We'd still have to make the
Git protocol negotiate the new extension and fail gracefully if the
version is too old.  We could do this by requiring a pack version 5,
which would simply cause older Gits to report errors.

It's a lot of work, and it's definitely a flag day.  That's why I had
planned to only do it with a new hash format: it would impact only
people who were moving to the new hash.  It also means that we get to
work out any problems with the design at that point and not be committed
to a design that might be inadequate.  This is a place where I don't
want to mess up.

> Personally, so long as the newer versions of the tree are secured, I
> wouldn't mind if the older commits stayed using SHA1 only.  The newer
> commits are the ones that are most important and security-critical
> anyway.  It seems like the main reason to rewrite all of the objects
> is to simplify the initial rollout of a newer hash algorithm, no?

The reason is that we can't have an unambiguous parse of the current
objects if two hash algorithms are in use.  tree objects don't use a hex
encoding of hashes; they use a binary encoding.  It's therefore possible
to create an ambiguous tree representation.
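
A small Python sketch of why the binary encoding matters, assuming tree entries are laid out as "<mode> <name>\0" followed by the raw hash bytes. The parser is hypothetical; the point is that it must be told the hash length, because nothing in the bytes says where one entry's hash ends.

```python
def parse_tree(data: bytes, hash_len: int):
    """Parse "<mode> <name>\\0<raw hash>" entries for a known hash length."""
    entries, pos = [], 0
    while pos < len(data):
        space = data.index(b" ", pos)       # raises ValueError if missing
        nul = data.index(b"\0", space)
        oid = data[nul + 1:nul + 1 + hash_len]
        if len(oid) != hash_len:
            raise ValueError("truncated entry")
        entries.append((data[pos:space].decode(),
                        data[space + 1:nul].decode(),
                        oid.hex()))
        pos = nul + 1 + hash_len
    return entries


# One entry carrying a made-up 32-byte object name.
tree = b"100644 F\0" + bytes(range(32))

print(parse_tree(tree, 32))
try:
    parse_tree(tree, 20)                    # wrong length: misreads the tail
except ValueError as err:
    print("cannot parse as 20-byte hashes:", err)
```

Here the wrong length happens to fail loudly; with other bytes the leftover tail could look like a further entry, which is the ambiguity.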

So when we look at a new hash, we need to provide an unambiguous way to
know what hash is in use.  The two choices are to either require all
objects use the new hash, or to extend the objects to include the hash.
Until a couple days ago, I had planned to do the former.  I had not even
considered using a multihash approach due to the complexity.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: Git and SHA-1 security (again)

2016-07-17 Thread Theodore Ts'o

On Sun, Jul 17, 2016 at 03:42:34PM +, brian m. carlson wrote:
> As I said, I'm not planning on multiple hash support at first, but it
> doesn't appear impossible if we go this route.  We might still have to
> rewrite objects, but we can verify signatures over the legacy SHA-1
> objects by forcing them into the old-style object format.

How hard would it be to make the on-disk format be multihash, even if
there is no support for anything other than a single hash, at least
for now?  That way we won't have to rewrite the objects twice.

Personally, so long as the newer versions of the tree are secured, I
wouldn't mind if the older commits stayed using SHA1 only.  The newer
commits are the ones that are most important and security-critical
anyway.  It seems like the main reason to rewrite all of the objects
is to simplify the initial rollout of a newer hash algorithm, no?

   - Ted

Re: Git and SHA-1 security (again)

2016-07-17 Thread brian m. carlson

On Sun, Jul 17, 2016 at 05:19:02PM +0200, Duy Nguyen wrote:
> On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson
>  wrote:
> > On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote:
> >> Out of curiosity: have you considered something like padding the SHA-1s
> >> with, say 0xa1, to the size of the new hash and using that padding to
> >> distinguish between old vs new hash?
> >
> > I'm going to end up having to do something similar because of the issue
> > of submodules.  Submodules may still be SHA-1, while the main repo may
> > be a newer hash.  I was going to zero-pad, however.  I was also, at
> > least at first, going to force a separate .git dir for those, to avoid
> > having to try to store two separate types of objects in the same repo.
> 
> If it's just the external hash representation, can we go with a prefix
>  to identify the hash algorithm? For example
> sha256:1234... is SHA-256 while 1235... by default is SHA-1 (but we
> could switch the default to SHA-256 via config file later, once SHA-1 is
> dead and nobody wants to type sha256: every time). It catches
> incorrect hash algorithm references.

I'd make it such that the default is that of the repo.  If the current
repo is generating SHA-256, say, then 473a0f4 refers to the empty blob.
If you want to refer to an SHA-1 object, then you write sha-1:e69de29.

On disk, multihash[0] seems like the right way to go.  We'd serialize
references to the SHA-1 and SHA-256 empty blobs as
1114e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 and
1220473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813
respectively.  This makes parsing significantly easier.  On disk, we
could write them into the object database as 1114e6/9de2… and
122047/3a0f….
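
A minimal Python sketch of that serialization, assuming the two-byte prefix is a function code (0x11 for SHA-1, 0x12 for SHA-256) followed by the digest length in bytes, which matches the two examples above. The helper names, including the six-character directory split, are hypothetical.

```python
import hashlib

# Multihash function codes as used in the examples above.
CODES = {"sha1": 0x11, "sha256": 0x12}
ALGOS = {code: algo for algo, code in CODES.items()}


def multihash_hex(algo: str, digest: bytes) -> str:
    """Prefix a raw digest with its function code and length."""
    return bytes([CODES[algo], len(digest)]).hex() + digest.hex()


def parse_multihash(name: str):
    """Return (algorithm, hex digest) for a full multihash name."""
    raw = bytes.fromhex(name)
    algo, length, digest = ALGOS[raw[0]], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match prefix")
    return algo, digest.hex()


def loose_object_path(name: str) -> str:
    """Hypothetical fan-out: prefix plus first digest byte as the directory."""
    return name[:6] + "/" + name[6:]


empty_blob = b"blob 0\0"
for algo in ("sha1", "sha256"):
    name = multihash_hex(algo, hashlib.new(algo, empty_blob).digest())
    print(name, loose_object_path(name))
```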

We could implement the default hash algorithm as extensions.hash and the
on-disk format (which would be a requirement for extensions.hash) as
extensions.explicitHash.

As I said, I'm not planning on multiple hash support at first, but it
doesn't appear impossible if we go this route.  We might still have to
rewrite objects, but we can verify signatures over the legacy SHA-1
objects by forcing them into the old-style object format.

[0] https://github.com/jbenet/multihash
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


Re: Git and SHA-1 security (again)

2016-07-17 Thread Duy Nguyen

On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson
 wrote:
> On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote:
>> Out of curiosity: have you considered something like padding the SHA-1s
>> with, say 0xa1, to the size of the new hash and using that padding to
>> distinguish between old vs new hash?
>
> I'm going to end up having to do something similar because of the issue
> of submodules.  Submodules may still be SHA-1, while the main repo may
> be a newer hash.  I was going to zero-pad, however.  I was also, at
> least at first, going to force a separate .git dir for those, to avoid
> having to try to store two separate types of objects in the same repo.

If it's just the external hash representation, can we go with a prefix
 to identify the hash algorithm? For example
sha256:1234... is SHA-256 while 1235... by default is SHA-1 (but we
could switch the default to SHA-256 via config file later, once SHA-1 is
dead and nobody wants to type sha256: every time). It catches
incorrect hash algorithm references.
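
A small Python sketch of parsing such prefixed names, assuming an "algo:" prefix and a configurable default for unprefixed names. The function is hypothetical, and a real implementation would have to coexist with the existing "rev:path" syntax, which this sketch ignores.

```python
HEX_LEN = {"sha1": 40, "sha256": 64}
HEX_DIGITS = set("0123456789abcdef")


def parse_object_ref(text: str, default_algo: str = "sha1"):
    """Split "sha256:1234..." into (algorithm, possibly abbreviated hex)."""
    prefix, sep, rest = text.partition(":")
    if sep:
        algo = prefix.lower().replace("-", "")   # accept "sha-1" and "sha1"
    else:
        algo, rest = default_algo, text
    if algo not in HEX_LEN:
        raise ValueError(f"unknown hash algorithm: {prefix}")
    if not rest or len(rest) > HEX_LEN[algo] or set(rest.lower()) - HEX_DIGITS:
        raise ValueError(f"not a valid {algo} name: {rest}")
    return algo, rest.lower()


print(parse_object_ref("sha256:1234"))
print(parse_object_ref("1235"))
print(parse_object_ref("473a0f4", default_algo="sha256"))
```

Switching the default via configuration then amounts to changing default_algo, and a name too long for the named algorithm is rejected rather than silently looked up.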
-- 
Duy

Re: Git and SHA-1 security (again)

2016-07-17 Thread brian m. carlson

On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote:
> Out of curiosity: have you considered something like padding the SHA-1s
> with, say 0xa1, to the size of the new hash and using that padding to
> distinguish between old vs new hash?

I'm going to end up having to do something similar because of the issue
of submodules.  Submodules may still be SHA-1, while the main repo may
be a newer hash.  I was going to zero-pad, however.  I was also, at
least at first, going to force a separate .git dir for those, to avoid
having to try to store two separate types of objects in the same repo.

The other limitation with this is that it isn't immediately obvious what
hash is in use just because it has a certain length.  For example, I
plan on implementing SHA3-256, but it's also possible I might add
BLAKE2b-256 for people for whom SHA3-256 is too slow.  There's no way to
distinguish between those two algorithms.  Thus allowing multiple hashes
in the same repo won't work without a format byte.

What I might do, however, is add multihash-style format information to
the on-disk format for non-SHA-1 repos.  Then SHA-1 compatibility could
come in a future iteration.  That would be compatible with the existing
refactor.
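
To make the length problem and the format-byte idea concrete, here is a
small sketch. The one-byte identifiers are invented for this example; they
are neither Git's on-disk values nor the real multihash codes, and the
function names are hypothetical.

```python
import hashlib

# Hypothetical one-byte format identifiers, invented for this sketch.
FORMAT_IDS = {"sha3-256": 0x01, "blake2b-256": 0x02}
ALGO_BY_ID = {v: k for k, v in FORMAT_IDS.items()}


def digest(algo, data):
    """Raw 32-byte digest of data under the named algorithm."""
    if algo == "sha3-256":
        return hashlib.sha3_256(data).digest()
    if algo == "blake2b-256":
        return hashlib.blake2b(data, digest_size=32).digest()
    raise ValueError("unsupported algorithm: " + algo)


def tagged_digest(algo, data):
    """Prefix the raw digest with its format byte."""
    return bytes([FORMAT_IDS[algo]]) + digest(algo, data)


def identify(tagged):
    """Return (algo, raw_digest) for a tagged digest."""
    return ALGO_BY_ID[tagged[0]], tagged[1:]
```

Both raw digests are 32 bytes long, so length alone cannot tell them apart;
the extra leading byte is what lets identify() recover the algorithm.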

> I guess that it would also be possible to introduce an opt-in "legacy mapper"
> which would generate a mapping locally of all objects' SHA-1 to whatever
> new hash you choose. Generating it locally would side-step the security
> issues of the SHA-1 algorithm. We would need to teach Git to pick that
> mapping up if available and use it, of course.

I think that might be easier.  Considering the number of tests that
hard-code object names, I might need that for the testsuite.
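
A minimal sketch of what such a locally generated mapping could look like,
assuming the objects are available as (type, content) pairs. The function
names are hypothetical; the only Git fact relied on is that an object's
name is the hash of "<type> <size>", a NUL byte, and the content.

```python
import hashlib


def object_name(algo, obj_type, content):
    """Hash an object the way Git names it:
    '<type> <size>' + NUL + content."""
    header = obj_type.encode() + b" " + str(len(content)).encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()


def build_legacy_map(objects, new_algo="sha3_256"):
    """Map each object's SHA-1 name to its name under new_algo.

    'objects' is an iterable of (type, content) pairs; in a real
    repository they would be read from the local object store.
    """
    return {
        object_name("sha1", t, c): object_name(new_algo, t, c)
        for t, c in objects
    }
```

This is only accurate for blobs as written. Trees, commits and tags embed
the names of other objects, so their content would have to be rewritten
with the new names before being hashed under the new algorithm.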
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

Re: Git and SHA-1 security (again)

2016-07-17 Thread Johannes Schindelin

Hi Brian,

On Sat, 16 Jul 2016, brian m. carlson wrote:

> My current plan is not to implement side-by-side data, for a couple
> reasons.

I am as guilty as the next person of having used the "deafbee (This is my
commit, 2007-08-21)" format to refer to older commits. So I do have
concerns about rewriting history when switching to a new hash.

I understand the technical challenges, of course.

Out of curiosity: have you considered something like padding the SHA-1s
with, say 0xa1, to the size of the new hash and using that padding to
distinguish between old vs new hash?

I guess that it would also be possible to introduce an opt-in "legacy mapper"
which would generate a mapping locally of all objects' SHA-1 to whatever
new hash you choose. Generating it locally would side-step the security
issues of the SHA-1 algorithm. We would need to teach Git to pick that
mapping up if available and use it, of course.

However, that latter solution would do nothing to address the problem of
existing GPG-signed commits.

Ciao,
Dscho

Re: Git and SHA-1 security (again)

2016-07-16 Thread brian m. carlson

On Sat, Jul 16, 2016 at 11:46:06PM +0200, Herczeg Zsolt wrote:
> Dear Brian,
> 
> Thank you for your response. It is very good to hear that changing the
> hash is on the git project's list. I haven't found any official
> communication on that topic since 2006.

There's been some recent discussion on the list about it.  It is less on
the Git project's list and more on my personal list.  It's my hope that
Junio and other contributors will decide to accept my patches when they
are ready.  Also, the plan is to keep SHA-1 available, probably as the
default, for backwards compatibility.

> I'll look into the contribution guide and the source code, to check
> if I can contribute to this transition. If you have any documentation
> or other related info, please point me towards it.

The major work at this point is turning instances of unsigned char [20]
into struct object_id, as well as converting hardcoded 20 and 40 (and
derivative values) to GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ.  This work
allows us to make as little code as possible know about the size of the
hash, as well as generally being easier to maintain.
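
The idea behind that conversion, sketched in Python rather than Git's C:
once the hash lives in one type and the sizes in named constants, callers
stop depending on the literal numbers 20 and 40. Only the names
GIT_SHA1_RAWSZ, GIT_SHA1_HEXSZ and object_id come from the paragraph above;
the class, its field and its methods are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Named after the constants mentioned above; the values are SHA-1's sizes.
GIT_SHA1_RAWSZ = 20                  # bytes in a raw SHA-1
GIT_SHA1_HEXSZ = 2 * GIT_SHA1_RAWSZ  # characters in its hex form


@dataclass(frozen=True)
class ObjectId:
    """Python stand-in for 'struct object_id': wraps the raw hash bytes."""
    raw: bytes

    def __post_init__(self):
        if len(self.raw) != GIT_SHA1_RAWSZ:
            raise ValueError("expected %d raw bytes" % GIT_SHA1_RAWSZ)

    @classmethod
    def from_hex(cls, hexstr):
        if len(hexstr) != GIT_SHA1_HEXSZ:
            raise ValueError("expected %d hex characters" % GIT_SHA1_HEXSZ)
        return cls(bytes.fromhex(hexstr))

    def to_hex(self):
        return self.raw.hex()
```

Code that only passes ObjectId values around never mentions a size, so a
different hash length would need changes in one place rather than at every
call site.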

You can look at the bc/cocci branch which was recently merged into next.
(It doesn't exist independently outside of next, so you'll have to
search through the history).  That work is what in my branches is called
object-id-part4.  I'm currently working on getting to the point of
converting get_tree_entry to use struct object_id, which is what will
become my object-id-part5.

I recommend if you're planning on doing some of this work that you try
to avoid areas which are under work by other developers, especially the
refs code, which is undergoing massive changes.  Other people will
appreciate it.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

Re: Git and SHA-1 security (again)

2016-07-16 Thread Herczeg Zsolt

Dear Brian,

Thank you for your response. It is very good to hear that changing the
hash is on the git project's list. I haven't found any official
communication on that topic since 2006.
I'll look into the contribution guide and the source code, to check
if I can contribute to this transition. If you have any documentation
or other related info, please point me towards it.

Thanks,
Zsolt Herczeg


2016-07-16 22:13 GMT+02:00 brian m. carlson :
> On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
>> But - and that's the main idea I'm writing here - changing the storage
>> keys does not mean you should drop your old hashes. If you change
>> the git data structure in such a way that it can keep multiple hashes for
>> the same "link" in each object (trees, commits, etc.), you can keep the
>> old ones right next to the new one. If you want to look up the
>> referenced object, you must use the newest hash - which is the key.
>> But if you want to verify some old hash, it's still possible! Just
>> look up the object by the new key, remove all the newer generation
>> keys, and verify the old hash on that.
>>
>> A storage structure like this would allow great flexibility:
>>  - You can change your hash algorithm in the future. If SHA-256
>> becomes broken, it's not a problem. Just re-hash the storage, and
>> append the new hashes to the git objects.
>>  - You can still verify your old hashes after a hash change - removing
>> the new hashes from the objects before hashing should give you back
>> the old objects, thus giving you the same hash as before.
>>  - That makes it possible for signed tags and commits to keep their
>> validity after a hash change! With a clever-enough new format, you can
>> even keep the validity of current hashes and signatures. (To be able to do
>> that, you should be able to calculate back the current format from the
>> new format.)
>>
>> Moving git forward to a format like this would solve the weak-key
>> problem in git forever. You would be able to configure your key algo
>> on a per-repository basis; you - and git - can do the daily work on
>> the newest hashes, while still carrying the old hashes and signatures,
>> in case you ever want to verify them. That would allow repositories to
>> gracefully change hashes in case they need to, and the only
>> compatibility limitation is that you must use a new enough git to
>> understand the new storage format.
>>
>> What are your thoughts on this approach? Will git ever reach a release
>> with exchangeable hash algorithm? Or should someone look for
>> alternatives if there's a need for cryptographic security?
>
> I'm working on adding new hash algorithm support in Git.  However, it
> requires a significant refactor of the code base.  My current plan is
> not to implement side-by-side data, for a couple reasons.
>
> One is that it requires significantly more work to implement and
> complicates the code.  It's also incompatible with all the refactoring
> I've done already.
>
> The second is that it requires that Git have the ability to store
> multiple hashes at once, which is very expensive in terms of memory.
> Moving from a 160-bit hash to a 256-bit hash (my current plan is
> SHA3-256) requires 1.6× the memory.  Storing both requires 2.6× the
> memory.  If you add a third hash, it requires even more.  Memory is
> often a constraint with using Git.
>
> The current plan is to use git-fast-import and git-fast-export to handle
> that conversion process, and then maybe provide wrappers to make it more
> transparent.
>
> The refactor is currently ongoing, but it is a free-time
> activity for me.
>
> If you'd like to follow the progress roughly, you can do so by checking
> the output of the following commands:
>
>   git grep 'unsigned char.*20' | wc -l
>   git grep 'struct object_id' | wc -l
>
> You are also welcome to contribute, of course.
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204

Re: Git and SHA-1 security (again)

2016-07-16 Thread brian m. carlson

On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
> But - and that's the main idea I'm writing here - changing the storage
> keys does not mean you should drop your old hashes. If you change
> the git data structure in such a way that it can keep multiple hashes for
> the same "link" in each object (trees, commits, etc.), you can keep the
> old ones right next to the new one. If you want to look up the
> referenced object, you must use the newest hash - which is the key.
> But if you want to verify some old hash, it's still possible! Just
> look up the object by the new key, remove all the newer generation
> keys, and verify the old hash on that.
> 
> A storage structure like this would allow great flexibility:
>  - You can change your hash algorithm in the future. If SHA-256
> becomes broken, it's not a problem. Just re-hash the storage, and
> append the new hashes to the git objects.
>  - You can still verify your old hashes after a hash change - removing
> the new hashes from the objects before hashing should give you back
> the old objects, thus giving you the same hash as before.
>  - That makes it possible for signed tags and commits to keep their
> validity after a hash change! With a clever-enough new format, you can
> even keep the validity of current hashes and signatures. (To be able to do
> that, you should be able to calculate back the current format from the
> new format.)
> 
> Moving git forward to a format like this would solve the weak-key
> problem in git forever. You would be able to configure your key algo
> on a per-repository basis; you - and git - can do the daily work on
> the newest hashes, while still carrying the old hashes and signatures,
> in case you ever want to verify them. That would allow repositories to
> gracefully change hashes in case they need to, and the only
> compatibility limitation is that you must use a new enough git to
> understand the new storage format.
> 
> What are your thoughts on this approach? Will git ever reach a release
> with exchangeable hash algorithm? Or should someone look for
> alternatives if there's a need for cryptographic security?

I'm working on adding new hash algorithm support in Git.  However, it
requires a significant refactor of the code base.  My current plan is
not to implement side-by-side data, for a couple reasons.

One is that it requires significantly more work to implement and
complicates the code.  It's also incompatible with all the refactoring
I've done already.

The second is that it requires that Git have the ability to store
multiple hashes at once, which is very expensive in terms of memory.
Moving from a 160-bit hash to a 256-bit hash (my current plan is
SHA3-256) requires 1.6× the memory.  Storing both requires 2.6× the
memory.  If you add a third hash, it requires even more.  Memory is
often a constraint with using Git.
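
For reference, those two factors follow directly from the hash widths. The
short calculation below only covers the bits needed per stored hash, not
Git's total memory use, and the variable names are made up for the example.

```python
from fractions import Fraction

SHA1_BITS = 160
NEW_BITS = 256  # e.g. SHA3-256

# Relative cost of storing only the new hash instead of SHA-1.
new_only = Fraction(NEW_BITS, SHA1_BITS)
# Relative cost of storing both hashes side by side.
both = Fraction(SHA1_BITS + NEW_BITS, SHA1_BITS)

print(float(new_only))  # 1.6
print(float(both))      # 2.6
```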

The current plan is to use git-fast-import and git-fast-export to handle
that conversion process, and then maybe provide wrappers to make it more
transparent.

The refactor is currently ongoing, but it is a free-time
activity for me.

If you'd like to follow the progress roughly, you can do so by checking
the output of the following commands:

  git grep 'unsigned char.*20' | wc -l
  git grep 'struct object_id' | wc -l

You are also welcome to contribute, of course.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

Git and SHA-1 security (again)

2016-07-16 Thread Herczeg Zsolt

Dear List Members, Git Developers,

I would like to discuss an old topic from 2006. I understand it was
already discussed. The only reason I'm sending this e-mail is to talk
about a possible solution which didn't show up on this list before.

I think we all understand that SHA-1 is broken. It still works perfectly
as a storage key, but it's not cryptographically secure anymore. Git
is not moving away from SHA-1 because it would break too many
projects, and cryptographic security is not needed by git if you have
your own repository.

However, I would like to show some big problems caused by SHA-1:
 - Git signed tags and signed commits are cryptographically insecure,
they're useless at the moment.
 - Git Torrent (https://github.com/cjb/GitTorrent) is also
cryptographically broken, although it would be an awesome experiment.
 - Linus said: "You only need to know the SHA-1 of the top of your
tree, and if you know that, you can trust your tree." That's not true
anymore. You have to trust your computer, your servers, and your git
provider not to let anyone maliciously modify your data.

I understand that git is perfect for a workflow where you have your
very own repository and you double-check any commits or diffs you
accept into it. But that's not everybody's workflow. For example: if
I want to blindly trust my colleague, I could just include all commits
he signed without review. Currently I can't do that. There are
workarounds of course: signing the e-mail he sends me, or signing the
entire git repository's tarball, etc. But that's not the right way
to do things.

As a final thought on this, I would like to say: Git is a great tool,
but it can be so much better with a safe hash.


I would like to propose a solution for changing git's hash algorithm.
It would be a breaking change, but I think it can be done pretty
painlessly. (If you read the discussion back in 2006, the problems of
moving are clear.)

In git, every piece of data has to have one and only one key - so a
hybrid hash is a no-go. That means changing the hash algo involves
re-hashing all the data in a git repository, but it's not that bad. On a
git clone, we actually re-hash everything to check integrity. Changing
all the keys shouldn't be worse than that.

But - and that's the main idea I'm writing here - changing the storage
keys does not mean you should drop your old hashes. If you change
the git data structure in such a way that it can keep multiple hashes for
the same "link" in each object (trees, commits, etc.), you can keep the
old ones right next to the new one. If you want to look up the
referenced object, you must use the newest hash - which is the key.
But if you want to verify some old hash, it's still possible! Just
look up the object by the new key, remove all the newer generation
keys, and verify the old hash on that.

A storage structure like this would allow great flexibility:
 - You can change your hash algorithm in the future. If SHA-256
becomes broken, it's not a problem. Just re-hash the storage, and
append the new hashes to the git objects.
 - You can still verify your old hashes after a hash change - removing
the new hashes from the objects before hashing should give you back
the old objects, thus giving you the same hash as before.
 - That makes it possible for signed tags and commits to keep their
validity after a hash change! With a clever-enough new format, you can
even keep the validity of current hashes and signatures. (To be able to do
that, you should be able to calculate back the current format from the
new format.)
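
A toy model of this scheme, to show the "strip the newer hashes, then
verify the old one" step. The serialization format, the "link" lines and
all names here are invented for illustration and do not correspond to
Git's real object formats.

```python
import hashlib

# Hash generations, oldest first. The newest one is the storage key.
GENERATIONS = ["sha1", "sha256"]


def serialize(payload, links, upto):
    """Serialize an object, keeping for each link only the hashes of
    generations 0..upto (i.e. stripping all newer ones)."""
    lines = []
    for link in links:
        kept = [link[algo] for algo in GENERATIONS[: upto + 1]]
        lines.append(b"link " + " ".join(kept).encode())
    return b"\n".join(lines) + b"\n\n" + payload


def name(payload, links, generation):
    """Object name under a generation: hash the form that contains only
    the hashes that existed at that generation."""
    data = serialize(payload, links, generation)
    return hashlib.new(GENERATIONS[generation], data).hexdigest()


# A leaf object, and a parent that links to it with both hashes.
leaf = {"sha1": name(b"leaf", [], 0), "sha256": name(b"leaf", [], 1)}
parent_old = name(b"parent", [leaf], 0)  # verifiable SHA-1 era name
parent_new = name(b"parent", [leaf], 1)  # storage key after the switch
```

Here parent_old is computed over a form that contains only SHA-1 links, so
it matches what a SHA-1-only repository would have produced, while
parent_new covers both hashes and serves as the lookup key.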

Moving git forward to a format like this would solve the weak-key
problem in git forever. You would be able to configure your key algo
on a per-repository basis; you - and git - can do the daily work on
the newest hashes, while still carrying the old hashes and signatures,
in case you ever want to verify them. That would allow repositories to
gracefully change hashes in case they need to, and the only
compatibility limitation is that you must use a new enough git to
understand the new storage format.

What are your thoughts on this approach? Will git ever reach a release
with exchangeable hash algorithm? Or should someone look for
alternatives if there's a need for cryptographic security?

Thank you for your time reading this.

References:
SHA-256 discussion in 2006:
http://www.gelato.unsw.edu.au/archives/git/0608/26446.html
Discussion about git signatures in 2014
https://www.mail-archive.com/git%40vger.kernel.org/msg61087.html
Linus's talk on git
https://www.youtube.com/watch?v=4XpnKHJAok8=56m20s

Kind regards,
Zsolt Herczeg
