Re: Git and SHA-1 security (again)

2016-07-19 Thread Herczeg Zsolt
2016-07-19 20:04 GMT+02:00 Duy Nguyen :
> On Tue, Jul 19, 2016 at 7:59 PM, David Lang  wrote:
>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>
>>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang  wrote:

 On Tue, 19 Jul 2016, Duy Nguyen wrote:

> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin
>  wrote:
>>>
>>>
>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>> around when it's not secure anymore (I recall something about
>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>> object content). Rehashing would be done locally and is better
>>> controlled.
>>
>>
>>
>> You could. But how would you determine whether to recreate the commit
>> object from a SHA-1-ified version of the commit buffer? Fall back if
>> the
>> original did not match the signature?
>
>
>
> Any repo would have a cut point when they move to sha256 (or whatever
> new hash), if we can record this somewhere (e.g. as a tag or a bunch
> of tags, or some dummy commits to mark the heads of the repo) then we
> only verify gpg signatures _in_ the repository before this point.



 remember that a repo doesn't have a single 'now', each branch has it's
 own
 head, and you can easily go back to prior points and branch off from
 there.

 Since timestamps in repos can't be trusted (different people's clocks may
 not be in sync), how would you define this cutoff point?
>>>
>>>
>>> The set of all heads at the time the conversion happens (maybe plus
>>> all the real tags). We can make an octopus merge commit to cover all
>>> the heads, then it can be the reference point.
>>
>>
>> so to make sure I'm understanding this, anything not reachable from that
>> merge must be the new hash, correct? Including forks, merges, etc that
>> happen from earlier points in the history.
>
> Yes everything except that merge and everything reachable from it, the
> whole old clone, basically.

It could work, but does it worth it?

1) If you use multihash, you should assume that anything with SHA1
could be manipulated. That means you can "inject" something later to
that "old clone" anyway.
2) Even if the content is re-hashed, it's hard to understand for a
user where the trust comes from. The user should decide weather he
trust (or not) the person who signed that octopus breakpoint.

Even without git you can achieve this security: Get the complete old
repository, make a signed tarball of it. If anytime later you want to
check that signatures, you can just use that tarball. I don't think
it's worth the trouble to create a native method for something which
is rare, and can be worked around easily. It's actually easier for a
user to understand the "trust relation" when using this workaround.

Referring to that signed-tarball approach, you may just as well drop
all signature data on conversion... As long as you can look up the
references to old hashes easily, I think it's usable enough.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt
>> The reality of the current situation is that it's largely mitigated in
>> practice because:
>>
>> a) it's hard to hand someone a crafted blob to begin with for reasons
>> that have nothing to do with SHA-1 (they'll go "wtf is this garbage?")
>>
>> b) even in that case it's *very* hard to come up with two colliding
>> blobs that are *useful* for some nefarious purpose, e.g. a program A
>> that looks normal being replaced by an evil program B with the same
>> SHA-1.
>
> Thanks.  That's a nice rephrasing of
>
>   
> http://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901%40ppc970.osdl.org/
>
> where Linus explains SHA-1 is not the security, and the real
> security is in distribution.

If the real security is in the distribution, than why git supports
signed commits and objects?

The security of the signatures do depend on the hash. Saying the hash
is not a security feature and offering GPG signing based on that hash
is a damn big lie. You can change the hash algorithm to a secure one,
or change the signing method to be independent of the hash algorithm,
or you can stop offering signatures at all, but something has to be
done here.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt
> In particular, as far as I know and as Theodore Ts'o's post describes
> better than I could[1], you seem to be confusing preimage attacks with
> collision attacks, and then concluding that because SHA1 is vulnerable
> to collision attacks that use-cases that would need a preimage attack
> to be compromised (which as far is I can tell, includes all your
> examples) are also "broken".

I understand the differences between the collision and preimage
attacks. A collision attack is not that bad for git in a typical
use-case. But I think that it's important to note that there are many
use-cases which do need a hash safe from collision attack. Some
examples:

You maintain a repository with gittorrent with signed commits Others
can use these signatures to verify it's original. Let's say you
include some safe file (potentially binary) from a third-party
contributor. That would be fine if the hash algo is safe. Currently
there is the possibility that you received a (safe) file which was
made to collide with another malicious one. Once you committed (and
signed) that file, the attacker joins the gittorrent network and
starts to distribute the malicious file. Suddenly most of your clients
pulling are infected however your signature is correct.

Or, you would like to make a continuous delivery system, where you use
signed tags. The delivery happens only when signature is right, and
the signer is responsible for it. Your colleague makes a collision,
pushes the good-file. You make all the tests, everything is fine, sign
and push and wait for the delivery to happen. Your colleague changes
the file on the server. The delivery makes a huge mass, and you're
fired.

Or, let's say you use a service like github, which is nice enough to
make a repository for you, with .gitignore, licenses and everything.
Likely, you'll never change dose files. Let's say that service made
one of those initial files to collide something bad. That means, they
can "infect" anyone, who is pulling your repo.

Do you need more hypothetical stories? There are a lot. Of course they
need a lot of work, and they're unlikely to happen. But it's possible.
If you need trust, and gpg signatures that means you need ultimate
trust. What's the point in making GPG signatures anyway if you cannot
ultimately trust them? You could just as well say: well that's
repository is only reachable by trustworthy persons, everything here
is just fine and really made by the person named in the "author
field".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt
Hi Johannes,

>> My point is not to throw out old hashes and break signatures. My point
>> is to convert the data storage, and use mapping to resolve problems
>> with those old hashes and signatures.
>
> If you convert the data storage, then the SHA-1s listed in the commit
> objects will have to be rewritten, and then the GPG signature will not
> match anymore.
>
> Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e`
> to see the anatomy of a gpg-signed commit object.
>

Yes and no. That's the reason you need the two-way lookup table. If
you need to verify a commit which was signed as SHA-1, you must use
the lookup table in reverse. This way you can reconstruct the original
commit structure, which than can be verified. Of course it's work to
do so but you only need to develop the new signature verification
algorithm. You save much more on the other side where you don't have
to rework all the other algorithms to multi-hash.

Another interesting point is that multi-hash storage, actively hurts
signature security! (Duy just mentoined that while I'm writing.) A
signed commit (or tag) is just as secure as the least secure hash it
refers (directly or indirectly). Let's imagine that you make a new a
commit, and there is on old file in the tree somewhere. That's a weak
point: cause it has SHA-1 hash, someone can replace it (and thus
change your commits content.

I would clearly mark any signature wether it's SHA-1 or SHA2 (or
anything else) based, and strictly allow that hash in all the trees
and objects while verifying that commit. If it's not the same
hash-type as the storage-key, than use the lookup table for conversion
before check. (This has some interesting side-effects, but it's all
about good implementation).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git and SHA-1 security (again)

2016-07-18 Thread Herczeg Zsolt
>> I think converting is a much better option. Use a single-hash storage, and
>> convert everything to that on import/clone/pull.
>
> That ignores two very important issues that I already had mentioned:

That's not true. If you double-check the next part of my message, you
I just showed that an automatic two-way mapping could solve these
problems! (I even give briefs explanation how to handle referencing
and signature verification in those cases.)

My point is not to throw out old hashes and break signatures. My point
is to convert the data storage, and use mapping to resolve problems
with those old hashes and signatures. A single-hash data storage is
obviously way easier to handle, than a multi-hash mass. (See Linus's
old e-mail: multiple hashes [=meaning database keys] for the same
content is a complete nonsense in git-speak)

> The "convert everything" strategy also ignores the problem of interacting
> with servers and collaborators. Think of hosting repositories,
> rediscovering forgotten work trees, and of the "D" in DSCM.

That's not an issue when we're working with a single repository. It's
reasonable to ask for all git clients of the same repository, to
support the same hash. Yes, you have the need to configure the hash
algo on a per-repository basis but that's all. For importing and
co-working between different repositories, it's a bit harder, problem,
but it's possible to handle the conversions correctly.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fwd: Git and SHA-1 security (again)

2016-07-17 Thread Herczeg Zsolt
Do you think the multi-hash approach worth the added complexity? It'll
break a lot of things. I mean almost everything. All git algorithms
rely on the "same hash => same content" "same content => same hash"
statements.

I think converting is a much better option. Use a single-hash storage,
and convert everything to that on import/clone/pull. I would only
introduce a a two-way mapping table (old-hash <=> new-hash) to help
the users. In the normal workflow, everything can go with new-hashes
only.

That leaves most algorithms and code intact, and introduces a very few
new cases:

On import:
If the imported data does not match your selected new-hash format, add
it's hash to the lookup table, than convert it to your selected
format, and handle it as such.

If a user references an old hash:
We can look-up that table forward, and find the referenced object in
storage. We can handle it from now as normal.

On an old signature verify:
Look-up the table forward, find the object by knew key. Look-up
backwards for all referenced objects, reconstruct the "old-format" and
verify the hash on that.

If you double-check your hashes as you build the mapping, you can even
trust it, which makes the lookups and verifys very fast. You can
introduce as many hash mapping tables as you want, so you can not only
support old-to-new has transition, but there can be as many different
hashes in the world as you want. Your only rule is "you reference your
work in your current format", but you can look-up any references which
was valid at the moment it was made.

(There are slight issues with this aproach if we "convert than
convert". As for example, when you import from A (sha1) repo to B
(sha2) repo it's perfectl. But when you import the same commits from B
to C (sha3), you might loose sha1 references. That could be considered
normal if we wan't to keep-it-simple-stupid, only support a few
hashes, always going forward. Or you may add an extra "list" field to
objects, which could show what type of hashes you have to keep in
lookup-tables for that particular object. Or, you can even include a
list of old hashes in the object itself, which should make it to the
lookup table on import.)

Anyway, I think a single-hash storage, and the ability to hide any old
hashes from most of the internal algorithms is a key point in making
transition. If we want to provide multi-hash interface to users, than
we should look for "wrapper" solutions, that translates multi-hash
user needs to a single-hash backend.

Zsolt
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git and SHA-1 security (again)

2016-07-16 Thread Herczeg Zsolt
Dear Brian,

Thank you for your response. It very good to hear that changing the
hash is on the git project's list. I haven't found any official
communication on that topic since 2006.
I'll look into the contributions guide and the source codes, to check
if I can contribute to this transition. If you have any documentation
or other related info, please point me towards it.

Thanks,
Zsolt Herczeg


2016-07-16 22:13 GMT+02:00 brian m. carlson <sand...@crustytoothpaste.net>:
> On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
>> But - and that's the main idea i'm writing here - changing the storage
>> keys does not mean you should drop your old hashes out. If you change
>> the git data structure in a way, that it can keep multiple hashes for
>> the same "link" in each objects (trees, commits, etc) you can keep the
>> old ones right next to the new one. If you want to look up the
>> referenced object, you must use the newest hash - which is the key.
>> But if you want to verify some old hash, it's still possible! Just
>> look up the objects by the new key, remove all the newer generation
>> keys, and verify the old hash on that.
>>
>> A storage structure like this would allow a very great flexibility:
>>  - You can change your hash algorithm in the future. If SHA-256
>> becomes broken, it's not a problem. Just re-hash the storage, and
>> append the new hashes the git objects.
>>  - You can still verify your old hashes after a hash change - removing
>> the new hashes from the objects before hashing should give you back
>> the old objects, thus giving you the same hash as before.
>>  - That makes possible for signed tags, and commits to keep their
>> validity after hash change! With a clever-enough new format, you can
>> even keep the validity of current hashes and signs. (To be able to do
>> that, you should be able to calculate back the current format from the
>> new format.)
>>
>> Moving git forward to a format like this would solve the weak-key
>> problem in git forever. You would be able to configure your key algo
>> on a per repository basis, you - and git - can do the daily work on
>> the newest hashes, while still carrying the old hashes and signatures,
>> in case you ever want to verify them. That would allow repositories to
>> gracefully change hashes in case they need to, and to only
>> compatibility limitation is that you must use a new enough git to
>> understand the new storage format.
>>
>> What are your thoughts on this approach? Will git ever reach a release
>> with exchangeable hash algorithm? Or should someone look for
>> alternatives if there's a need for cryptographic security?
>
> I'm working on adding new hash algorithm support in Git.  However, it
> requires a significant refactor of the code base.  My current plan is
> not to implement side-by-side data, for a couple reasons.
>
> One is that it requires significantly more work to implement and
> complicates the code.  It's also incompatible with all the refactoring
> I've done already.
>
> The second is that it requires that Git have the ability to store
> multiple hashes at once, which is very expensive in terms of memory.
> Moving from a 160-bit hash to a 256-bit hash (my current plan is
> SHA3-256) requires 1.6× the memory.  Storing both requires 2.6× the
> memory.  If you add a third hash, it requires even more.  Memory is
> often a constraint with using Git.
>
> The current plan is to use git-fast-import and git-fast-export to handle
> that conversion process, and then maybe provide wrappers to make it more
> transparent.
>
> Currently the process of the refactor is ongoing, but it is a free time
> activity for me.
>
> If you'd like to follow the progress roughly, you can do so by checking
> the output of the following commands:
>
>   git grep 'unsigned char.*20' | wc -l
>   git grep 'struct object_id' | wc -l
>
> You are also welcome to contribute, of course.
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Git and SHA-1 security (again)

2016-07-16 Thread Herczeg Zsolt
Dear List Members, Git Developers,

I would like to discuss an old topic from 2006. I understand it was
already discussed. The only reason i'm sending this e-mail is to talk
about a possible solution which didn't show up on this list before.

I think we all understand that SHA-1 is broken. It still works perfect
as a storage key, but it's not cryptographically secure anymore. Git
is not moving away from SHA-1 because it would break too many
projects, and cryptographic security is not needed but git if you have
your own repository.

However I would like to show some big problems caused by SHA-1:
 - Git signed tags and signed commits are cryptographically insecure,
they're useless at the moment.
 - Git Torrent (https://github.com/cjb/GitTorrent) is also
cryptographically broken, however it would be an awesome experiment.
 - Linus said: "You only need to know the SHA-1 of the top of your
tree, and if you know that, you can trust your tree." That's not true
anymore. You have to trust your computer, you servers, your git
provider in a way that no-one can maliciously modify your data.

I understand that git is perfect for a work flow, where you have your
very own repository and you double-check any commits or diffs you
accepting to it. But that's not everybody's work flow. For example: if
I want to blindly trust my college, I could just include all commits
he signed without review. Currently I can't do that. There are
workarounds of course: signing the e-mail he sends me, or signing the
entire git repository's tarball, etc... But that's not the right way
to do things.

As a final thought on this, I would like to say: Git is a great tool,
but it can be so much better with a safe hash.


I would like to propose a solution for changing git's hash algorithm:
It would be a breaking change, bit I think it can be done pretty
painless. (If you read the discussion back in 2006 the problems of
moving are clear.)

In git, every data has to have one and only one key - so a hybrid hash
is a no-go. That means changing hash algo involves re-hashing every
data in a git repository, but it's not that bad. On a git clone, we
actually re-hash everything to check integrity. Changing all the keys
shouldn't be worth than that.

But - and that's the main idea i'm writing here - changing the storage
keys does not mean you should drop your old hashes out. If you change
the git data structure in a way, that it can keep multiple hashes for
the same "link" in each objects (trees, commits, etc) you can keep the
old ones right next to the new one. If you want to look up the
referenced object, you must use the newest hash - which is the key.
But if you want to verify some old hash, it's still possible! Just
look up the objects by the new key, remove all the newer generation
keys, and verify the old hash on that.

A storage structure like this would allow a very great flexibility:
 - You can change your hash algorithm in the future. If SHA-256
becomes broken, it's not a problem. Just re-hash the storage, and
append the new hashes the git objects.
 - You can still verify your old hashes after a hash change - removing
the new hashes from the objects before hashing should give you back
the old objects, thus giving you the same hash as before.
 - That makes possible for signed tags, and commits to keep their
validity after hash change! With a clever-enough new format, you can
even keep the validity of current hashes and signs. (To be able to do
that, you should be able to calculate back the current format from the
new format.)

Moving git forward to a format like this would solve the weak-key
problem in git forever. You would be able to configure your key algo
on a per repository basis, you - and git - can do the daily work on
the newest hashes, while still carrying the old hashes and signatures,
in case you ever want to verify them. That would allow repositories to
gracefully change hashes in case they need to, and to only
compatibility limitation is that you must use a new enough git to
understand the new storage format.

What are your thoughts on this approach? Will git ever reach a release
with exchangeable hash algorithm? Or should someone look for
alternatives if there's a need for cryptographic security?

Thank you for your time reading this.

References:
SHA-256 discussion in 2006:
http://www.gelato.unsw.edu.au/archives/git/0608/26446.html
Discussion about git signatures in 2014
https://www.mail-archive.com/git%40vger.kernel.org/msg61087.html
Linus's talk on git
https://www.youtube.com/watch?v=4XpnKHJAok8=56m20s

Kind regards,
Zsolt Herczeg
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html