Re: Git and SHA-1 security (again)
2016-07-19 20:04 GMT+02:00 Duy Nguyen:
> On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote:
>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote:
>>>> On Tue, 19 Jul 2016, Duy Nguyen wrote:
>>>>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote:
>>>>>>> But we can recreate SHA-1 from the same content and verify GPG, right?
>>>>>>> I know it's super expensive, but it feels safer to not carry SHA-1
>>>>>>> around when it's not secure anymore (I recall something about
>>>>>>> exploiting the weakest link when you have both sha1 and sha256 in the
>>>>>>> object content). Rehashing would be done locally and is better
>>>>>>> controlled.
>>>>>>
>>>>>> You could. But how would you determine whether to recreate the commit
>>>>>> object from a SHA-1-ified version of the commit buffer? Fall back if
>>>>>> the original did not match the signature?
>>>>>
>>>>> Any repo would have a cut point when they move to sha256 (or whatever
>>>>> new hash), if we can record this somewhere (e.g. as a tag or a bunch
>>>>> of tags, or some dummy commits to mark the heads of the repo) then we
>>>>> only verify gpg signatures _in_ the repository before this point.
>>>>
>>>> remember that a repo doesn't have a single 'now', each branch has it's
>>>> own head, and you can easily go back to prior points and branch off
>>>> from there. Since timestamps in repos can't be trusted (different
>>>> people's clocks may not be in sync), how would you define this cutoff
>>>> point?
>>>
>>> The set of all heads at the time the conversion happens (maybe plus
>>> all the real tags). We can make an octopus merge commit to cover all
>>> the heads, then it can be the reference point.
>>
>> so to make sure I'm understanding this, anything not reachable from that
>> merge must be the new hash, correct? Including forks, merges, etc that
>> happen from earlier points in the history.
>
> Yes everything except that merge and everything reachable from it, the
> whole old clone, basically.

It could work, but is it worth it?
1) If you use multihash, you should assume that anything with SHA-1
could be manipulated. That means you can "inject" something later
into that "old clone" anyway.

2) Even if the content is re-hashed, it's hard for a user to understand
where the trust comes from. The user should decide whether he trusts
(or not) the person who signed that octopus breakpoint.

Even without git you can achieve this security: get the complete old
repository and make a signed tarball of it. If at any later time you
want to check those signatures, you can just use that tarball.

I don't think it's worth the trouble to create a native method for
something which is rare and can be worked around easily. It's
actually easier for a user to understand the "trust relation" when
using this workaround.

Referring to that signed-tarball approach, you may just as well drop
all signature data on conversion... As long as you can look up the
references to old hashes easily, I think it's usable enough.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Git and SHA-1 security (again)
>> The reality of the current situation is that it's largely mitigated in
>> practice because:
>>
>> a) it's hard to hand someone a crafted blob to begin with for reasons
>> that have nothing to do with SHA-1 (they'll go "wtf is this garbage?")
>>
>> b) even in that case it's *very* hard to come up with two colliding
>> blobs that are *useful* for some nefarious purpose, e.g. a program A
>> that looks normal being replaced by an evil program B with the same
>> SHA-1.
>
> Thanks. That's a nice rephrasing of
>
> http://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901%40ppc970.osdl.org/
>
> where Linus explains SHA-1 is not the security, and the real
> security is in distribution.

If the real security is in the distribution, then why does git support
signed commits and objects? The security of the signatures does depend
on the hash. Saying the hash is not a security feature and offering
GPG signing based on that hash is a damn big lie.

You can change the hash algorithm to a secure one, or change the
signing method to be independent of the hash algorithm, or you can
stop offering signatures at all, but something has to be done here.
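To make the dependency concrete, here is a minimal sketch of how git
derives object IDs and of what a signed commit actually contains. The
object header format ("<type> <size>", a NUL byte, then the body) and the
tree entry layout are git's own; the file name, the file content and the
trimmed-down commit body are assumptions made up for illustration (a real
commit also carries parent, author and committer lines).

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object ID: hash of '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# A blob is hashed over its raw content.
file_content = b"int main(void) { return 0; }\n"
blob_id = git_object_id("blob", file_content)

# A tree entry is '<mode> <name>' + NUL + the raw 20-byte ID of the blob.
tree_body = b"100644 main.c\0" + bytes.fromhex(blob_id)
tree_id = git_object_id("tree", tree_body)

def simplified_commit_body(tree: str, message: str) -> bytes:
    # Hypothetical, trimmed-down commit text (no parent/author/committer).
    return f"tree {tree}\n\n{message}\n".encode()

# This is the kind of text a GPG signature is computed over: it names the
# tree by its hash only and never includes the file content itself.
signed_text = simplified_commit_body(tree_id, "Initial commit")
print(signed_text.decode())
```

The signature binds the signer to a chain of hashes (commit, tree, blob),
so any file that hashes to the same blob ID would be covered by the same
signature.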
Re: Git and SHA-1 security (again)
> In particular, as far as I know and as Theodore Ts'o's post describes
> better than I could[1], you seem to be confusing preimage attacks with
> collision attacks, and then concluding that because SHA1 is vulnerable
> to collision attacks that use-cases that would need a preimage attack
> to be compromised (which as far is I can tell, includes all your
> examples) are also "broken".

I understand the difference between collision and preimage attacks.
A collision attack is not that bad for git in a typical use-case. But
I think it's important to note that there are many use-cases which
do need a hash that is safe from collision attacks. Some examples:

You maintain a repository distributed over gittorrent, with signed
commits. Others can use these signatures to verify that it's the
original. Let's say you include some safe file (potentially binary)
from a third-party contributor. That would be fine if the hash algo
were safe. Currently there is the possibility that you received a
(safe) file which was made to collide with another, malicious one.
Once you have committed (and signed) that file, the attacker joins the
gittorrent network and starts to distribute the malicious file.
Suddenly most clients pulling from you are infected, even though your
signature is correct.

Or, you would like to build a continuous delivery system where you use
signed tags. The delivery happens only when the signature is right,
and the signer is responsible for it. Your colleague makes a
collision and pushes the good file. You run all the tests, everything
is fine, you sign and push and wait for the delivery to happen. Your
colleague changes the file on the server. The delivery makes a huge
mess, and you're fired.

Or, let's say you use a service like github, which is nice enough to
create a repository for you, with .gitignore, licenses and everything.
Likely, you'll never change those files. Let's say that service made
one of those initial files collide with something bad. That means
they can "infect" anyone who is pulling your repo.
Do you need more hypothetical stories? There are a lot. Of course
they need a lot of work, and they're unlikely to happen. But it's
possible. If you need trust and GPG signatures, that means you need
ultimate trust. What's the point in making GPG signatures anyway if
you cannot ultimately trust them? You could just as well say: well,
that repository is only reachable by trustworthy persons, so everything
here is just fine and really made by the person named in the "author"
field.
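The stories above all follow the same pattern, which can be shown with a
toy model. This is only a sketch under loud assumptions: weak_hash() keeps
just the first 16 bits of SHA-1 so that a collision can be brute-forced in
a moment (it does not reflect the real cost of attacking SHA-1), the HMAC
stands in for a GPG signature, and the file contents and key are made up.

```python
import hashlib
import hmac
from itertools import count

def weak_hash(data: bytes) -> str:
    """Toy stand-in for a broken hash: only the first 16 bits of SHA-1."""
    return hashlib.sha1(data).hexdigest()[:4]

def find_collision(good_prefix: bytes, evil_prefix: bytes):
    """Brute-force a good/evil pair of files sharing the same weak hash."""
    seen = {}  # weak hash -> a good variant with that hash
    for i in count():
        suffix = str(i).encode()
        good = good_prefix + suffix
        seen.setdefault(weak_hash(good), good)
        evil = evil_prefix + suffix
        match = seen.get(weak_hash(evil))
        if match is not None:
            return match, evil

def sign(key: bytes, data: bytes) -> str:
    """Stand-in for a signature: it covers the hash, not the content."""
    return hmac.new(key, weak_hash(data).encode(), hashlib.sha256).hexdigest()

def verify(key: bytes, data: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(key, data), signature)

KEY = b"signer-secret"  # hypothetical signing key
good_file, evil_file = find_collision(b"print('hello') #", b"run_malware() #")

signature = sign(KEY, good_file)          # the victim reviews and signs this
print(verify(KEY, evil_file, signature))  # the swapped file still verifies
```

The signer only ever saw good_file, yet the signature is equally valid for
evil_file, because both map to the same hash.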
Re: Git and SHA-1 security (again)
Hi Johannes,

>> My point is not to throw out old hashes and break signatures. My point
>> is to convert the data storage, and use mapping to resolve problems
>> with those old hashes and signatures.
>
> If you convert the data storage, then the SHA-1s listed in the commit
> objects will have to be rewritten, and then the GPG signature will not
> match anymore.
>
> Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e`
> to see the anatomy of a gpg-signed commit object.
>

Yes and no. That's the reason you need the two-way lookup table. If
you need to verify a commit which was signed with SHA-1, you must use
the lookup table in reverse. This way you can reconstruct the original
commit structure, which can then be verified. Of course it takes work
to do so, but you only need to develop the new signature verification
algorithm. You save much more on the other side, where you don't have
to rework all the other algorithms for multi-hash.

Another interesting point is that multi-hash storage actively hurts
signature security! (Duy just mentioned that while I was writing.)

A signed commit (or tag) is only as secure as the least secure hash it
refers to (directly or indirectly). Let's imagine that you make a new
commit, and there is an old file in the tree somewhere. That's a weak
point: because it has a SHA-1 hash, someone can replace it (and thus
change your commit's content).

I would clearly mark every signature as SHA-1 or SHA-2 (or anything
else) based, and allow only that hash type in all the trees and objects
while verifying that commit. If it's not the same hash type as the
storage key, then use the lookup table for conversion before the check.
(This has some interesting side-effects, but it's all about good
implementation.)
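The "least secure hash" rule can be sketched as a walk over the object
graph. Everything here is a hypothetical model: the Obj class, the object
names and the algorithm labels are invented for illustration and are not
git data structures.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    algo: str         # hash type this object is addressed by
    refs: tuple = ()  # IDs of the objects it refers to

def reachable_algos(store: dict, root: str) -> set:
    """Collect every hash type reachable from root, directly or indirectly."""
    seen, stack, algos = set(), [root], set()
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        obj = store[oid]
        algos.add(obj.algo)
        stack.extend(obj.refs)
    return algos

def signature_is_uniform(store: dict, root: str, algo: str) -> bool:
    """Accept a signature only if everything below it uses that one hash."""
    return reachable_algos(store, root) == {algo}

# A new commit whose tree still points at one old, SHA-1-addressed file.
store = {
    "commit": Obj("sha256", ("tree",)),
    "tree": Obj("sha256", ("new_file", "old_file")),
    "new_file": Obj("sha256"),
    "old_file": Obj("sha1"),
}
print(signature_is_uniform(store, "commit", "sha256"))  # False: weak link
```

With the old file converted to the storage hash first, the same check
would pass, which is the case the lookup table is meant to handle.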
Re: Git and SHA-1 security (again)
>> I think converting is a much better option. Use a single-hash storage, and
>> convert everything to that on import/clone/pull.
>
> That ignores two very important issues that I already had mentioned:

That's not true. If you double-check the next part of my message,
you'll see I just showed that an automatic two-way mapping could solve
these problems! (I even gave a brief explanation of how to handle
referencing and signature verification in those cases.)

My point is not to throw out old hashes and break signatures. My point
is to convert the data storage, and use mapping to resolve problems
with those old hashes and signatures.

A single-hash data storage is obviously way easier to handle than a
multi-hash mess. (See Linus's old e-mail: multiple hashes [=meaning
database keys] for the same content is complete nonsense in
git-speak.)

> The "convert everything" strategy also ignores the problem of interacting
> with servers and collaborators. Think of hosting repositories,
> rediscovering forgotten work trees, and of the "D" in DSCM.

That's not an issue when we're working with a single repository. It's
reasonable to ask all git clients of the same repository to support
the same hash. Yes, you need to configure the hash algo on a
per-repository basis, but that's all.

For importing and collaborating between different repositories it's a
somewhat harder problem, but it's possible to handle the conversions
correctly.
Fwd: Git and SHA-1 security (again)
Do you think the multi-hash approach is worth the added complexity?
It'll break a lot of things. I mean almost everything. All git
algorithms rely on the "same hash => same content" and "same content =>
same hash" statements.

I think converting is a much better option. Use a single-hash storage,
and convert everything to that on import/clone/pull. I would only
introduce a two-way mapping table (old-hash <=> new-hash) to help the
users. In the normal workflow, everything can go with new hashes only.
That leaves most algorithms and code intact, and introduces very few
new cases:

On import: If the imported data does not match your selected new-hash
format, add its hash to the lookup table, then convert it to your
selected format, and handle it as such.

If a user references an old hash: We can look up that table forward and
find the referenced object in storage. We can handle it from then on as
normal.

On an old signature verify: Look up the table forward, find the object
by its new key. Look up backwards for all referenced objects,
reconstruct the "old format" and verify the hash on that.

If you double-check your hashes as you build the mapping, you can even
trust it, which makes the lookups and verifications very fast.

You can introduce as many hash mapping tables as you want, so you can
not only support an old-to-new hash transition, but there can be as
many different hashes in the world as you want. Your only rule is "you
reference your work in your current format", but you can look up any
reference which was valid at the moment it was made.

(There are slight issues with this approach if we "convert then
convert". For example, when you import from repo A (sha1) to repo B
(sha2) it's perfect. But when you import the same commits from B to C
(sha3), you might lose the sha1 references. That could be considered
normal if we want to keep-it-simple-stupid, only support a few hashes,
always going forward.
Or you may add an extra "list" field to objects, which could show what
types of hashes you have to keep in lookup tables for that particular
object. Or, you can even include a list of old hashes in the object
itself, which should make it into the lookup table on import.)

Anyway, I think a single-hash storage, and the ability to hide any old
hashes from most of the internal algorithms, is a key point in making
the transition. If we want to provide a multi-hash interface to users,
then we should look for "wrapper" solutions that translate multi-hash
user needs to a single-hash backend.

Zsolt
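The three cases above (import, old reference, old signature verify) can
be sketched with a small in-memory model. This is not git's object format:
the serialize() layout, the ConvertingStore class and the choice of
SHA-256 as the "new" hash are all assumptions made only to show how the
two-way table is used.

```python
import hashlib

def serialize(refs: list, payload: bytes) -> bytes:
    """Toy object format: one referenced ID per line, blank line, payload."""
    return "\n".join(refs).encode() + b"\n\n" + payload

class ConvertingStore:
    """Single-hash (SHA-256) storage plus a two-way old<=>new table."""

    def __init__(self):
        self.old_to_new = {}
        self.new_to_old = {}
        self.objects = {}  # new ID -> (new-format refs, payload)

    def import_old(self, refs_old: list, payload: bytes):
        """On import: record the old hash, then convert and store as new."""
        old_id = hashlib.sha1(serialize(refs_old, payload)).hexdigest()
        refs_new = [self.old_to_new[r] for r in refs_old]
        new_id = hashlib.sha256(serialize(refs_new, payload)).hexdigest()
        self.old_to_new[old_id] = new_id
        self.new_to_old[new_id] = old_id
        self.objects[new_id] = (refs_new, payload)
        return old_id, new_id

    def get(self, any_id: str):
        """Old reference: look up forward, then use the new key as normal."""
        return self.objects[self.old_to_new.get(any_id, any_id)]

    def verify_old(self, old_id: str) -> bool:
        """Old verify: map refs backwards, rebuild old format, re-hash."""
        refs_new, payload = self.get(old_id)
        refs_old = [self.new_to_old[r] for r in refs_new]
        return hashlib.sha1(serialize(refs_old, payload)).hexdigest() == old_id

store = ConvertingStore()
blob_old, blob_new = store.import_old([], b"file content\n")
tree_old, tree_new = store.import_old([blob_old], b"tree listing\n")
print(store.verify_old(tree_old))  # the old hash can still be re-derived
```

Only import_old(), get() and verify_old() ever touch old hashes; everything
else in this model works on new IDs alone, which is the point of hiding the
old hashes behind a lookup table.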
Re: Git and SHA-1 security (again)
Dear Brian,

Thank you for your response. It's very good to hear that changing the
hash is on the git project's list. I haven't found any official
communication on that topic since 2006.

I'll look into the contribution guide and the source code, to check
if I can contribute to this transition. If you have any documentation
or other related info, please point me towards it.

Thanks,
Zsolt Herczeg

2016-07-16 22:13 GMT+02:00 brian m. carlson <sand...@crustytoothpaste.net>:
> On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
>> But - and that's the main idea i'm writing here - changing the storage
>> keys does not mean you should drop your old hashes out. If you change
>> the git data structure in a way, that it can keep multiple hashes for
>> the same "link" in each objects (trees, commits, etc) you can keep the
>> old ones right next to the new one. If you want to look up the
>> referenced object, you must use the newest hash - which is the key.
>> But if you want to verify some old hash, it's still possible! Just
>> look up the objects by the new key, remove all the newer generation
>> keys, and verify the old hash on that.
>>
>> A storage structure like this would allow a very great flexibility:
>> - You can change your hash algorithm in the future. If SHA-256
>> becomes broken, it's not a problem. Just re-hash the storage, and
>> append the new hashes the git objects.
>> - You can still verify your old hashes after a hash change - removing
>> the new hashes from the objects before hashing should give you back
>> the old objects, thus giving you the same hash as before.
>> - That makes possible for signed tags, and commits to keep their
>> validity after hash change! With a clever-enough new format, you can
>> even keep the validity of current hashes and signs. (To be able to do
>> that, you should be able to calculate back the current format from the
>> new format.)
>>
>> Moving git forward to a format like this would solve the weak-key
>> problem in git forever. You would be able to configure your key algo
>> on a per repository basis, you - and git - can do the daily work on
>> the newest hashes, while still carrying the old hashes and signatures,
>> in case you ever want to verify them. That would allow repositories to
>> gracefully change hashes in case they need to, and to only
>> compatibility limitation is that you must use a new enough git to
>> understand the new storage format.
>>
>> What are your thoughts on this approach? Will git ever reach a release
>> with exchangeable hash algorithm? Or should someone look for
>> alternatives if there's a need for cryptographic security?
>
> I'm working on adding new hash algorithm support in Git. However, it
> requires a significant refactor of the code base. My current plan is
> not to implement side-by-side data, for a couple reasons.
>
> One is that it requires significantly more work to implement and
> complicates the code. It's also incompatible with all the refactoring
> I've done already.
>
> The second is that it requires that Git have the ability to store
> multiple hashes at once, which is very expensive in terms of memory.
> Moving from a 160-bit hash to a 256-bit hash (my current plan is
> SHA3-256) requires 1.6× the memory. Storing both requires 2.6× the
> memory. If you add a third hash, it requires even more. Memory is
> often a constraint with using Git.
>
> The current plan is to use git-fast-import and git-fast-export to handle
> that conversion process, and then maybe provide wrappers to make it more
> transparent.
>
> Currently the process of the refactor is ongoing, but it is a free time
> activity for me.
>
> If you'd like to follow the progress roughly, you can do so by checking
> the output of the following commands:
>
>   git grep 'unsigned char.*20' | wc -l
>   git grep 'struct object_id' | wc -l
>
> You are also welcome to contribute, of course.
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204
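The 1.6× and 2.6× figures quoted above follow directly from the hash
widths. A small worked check, counting only the bytes needed to hold the
object IDs themselves; the one-million-object repository size is an
arbitrary assumption for illustration.

```python
SHA1_BITS = 160
NEW_BITS = 256          # e.g. SHA3-256, as named in the quoted plan
OBJECTS = 1_000_000     # hypothetical number of object IDs held in memory

def id_bytes(bits_per_id: int, count: int) -> int:
    """Bytes needed to hold `count` raw object IDs of the given width."""
    return bits_per_id // 8 * count

old_only = id_bytes(SHA1_BITS, OBJECTS)             # 20 bytes per ID
new_only = id_bytes(NEW_BITS, OBJECTS)              # 32 bytes per ID
both = id_bytes(SHA1_BITS + NEW_BITS, OBJECTS)      # 52 bytes per ID

print(new_only / old_only)  # ratio when switching to the new hash
print(both / old_only)      # ratio when storing both side by side
```

For one million IDs that is 20 MB with SHA-1 alone, 32 MB with the
256-bit hash alone, and 52 MB when both are kept.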
Git and SHA-1 security (again)
Dear List Members, Git Developers,

I would like to discuss an old topic from 2006. I understand it was
already discussed. The only reason I'm sending this e-mail is to talk
about a possible solution which didn't show up on this list before.

I think we all understand that SHA-1 is broken. It still works
perfectly as a storage key, but it's not cryptographically secure
anymore. Git is not moving away from SHA-1 because it would break too
many projects, and cryptographic security is not needed by git if you
have your own repository.

However, I would like to show some big problems caused by SHA-1:
- Git signed tags and signed commits are cryptographically insecure;
they're useless at the moment.
- Git Torrent (https://github.com/cjb/GitTorrent) is also
cryptographically broken, although it would be an awesome experiment.
- Linus said: "You only need to know the SHA-1 of the top of your
tree, and if you know that, you can trust your tree." That's not true
anymore. You have to trust your computer, your servers and your git
provider, in the sense that no one can maliciously modify your data.

I understand that git is perfect for a workflow where you have your
very own repository and you double-check any commits or diffs you
accept into it. But that's not everybody's workflow. For example: if
I want to blindly trust my colleague, I could just include all commits
he signed without review. Currently I can't do that. There are
workarounds of course: signing the e-mail he sends me, or signing the
entire git repository's tarball, etc... But that's not the right way
to do things.

As a final thought on this, I would like to say: Git is a great tool,
but it can be so much better with a safe hash.

I would like to propose a solution for changing git's hash algorithm.
It would be a breaking change, but I think it can be done fairly
painlessly. (If you read the discussion back in 2006, the problems of
moving are clear.)
In git, every piece of data has to have one and only one key, so a
hybrid hash is a no-go. That means changing the hash algo involves
re-hashing all the data in a git repository, but it's not that bad. On
a git clone, we actually re-hash everything to check integrity.
Changing all the keys shouldn't be worse than that.

But - and that's the main idea I'm writing about here - changing the
storage keys does not mean you should drop your old hashes. If you
change the git data structure in such a way that it can keep multiple
hashes for the same "link" in each object (trees, commits, etc.), you
can keep the old ones right next to the new one. If you want to look up
the referenced object, you must use the newest hash, which is the key.
But if you want to verify some old hash, it's still possible! Just
look up the objects by the new key, remove all the newer-generation
keys, and verify the old hash on that.

A storage structure like this would allow great flexibility:
- You can change your hash algorithm in the future. If SHA-256
becomes broken, it's not a problem. Just re-hash the storage, and
append the new hashes to the git objects.
- You can still verify your old hashes after a hash change: removing
the new hashes from the objects before hashing should give you back
the old objects, thus giving you the same hash as before.
- That makes it possible for signed tags and commits to keep their
validity after a hash change! With a clever-enough new format, you can
even keep the validity of current hashes and signatures. (To be able to
do that, you should be able to calculate back the current format from
the new format.)

Moving git forward to a format like this would solve the weak-key
problem in git forever. You would be able to configure your key algo
on a per-repository basis; you - and git - can do the daily work on
the newest hashes, while still carrying the old hashes and signatures,
in case you ever want to verify them.
That would allow repositories to gracefully change hashes in case they
need to, and the only compatibility limitation is that you must use a
new enough git to understand the new storage format.

What are your thoughts on this approach? Will git ever reach a release
with an exchangeable hash algorithm? Or should someone look for
alternatives if there's a need for cryptographic security?

Thank you for your time reading this.

References:
SHA-256 discussion in 2006:
http://www.gelato.unsw.edu.au/archives/git/0608/26446.html
Discussion about git signatures in 2014:
https://www.mail-archive.com/git%40vger.kernel.org/msg61087.html
Linus's talk on git:
https://www.youtube.com/watch?v=4XpnKHJAok8=56m20s

Kind regards,
Zsolt Herczeg
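A minimal sketch of the "keep the old hash next to the new one" idea from
this message, under stated assumptions: the toy serialize() layout, the
GENERATIONS list and the choice of SHA-256 as the second generation are
invented for illustration and are not a proposed git format. It only
shows that stripping the newer-generation hashes from each link gives
back exactly the bytes an old client would have hashed.

```python
import hashlib

GENERATIONS = ["sha1", "sha256"]  # oldest first; the newest is the storage key

def serialize(payload: bytes, links: list, up_to: str) -> bytes:
    """Serialize an object, keeping link hashes only up to one generation."""
    keep = GENERATIONS[: GENERATIONS.index(up_to) + 1]
    lines = [" ".join(link[algo] for algo in keep) for link in links]
    return "\n".join(lines).encode() + b"\n\n" + payload

def object_ids(payload: bytes, links: list) -> dict:
    """One ID per generation, each computed over its own stripped format."""
    return {
        algo: hashlib.new(algo, serialize(payload, links, algo)).hexdigest()
        for algo in GENERATIONS
    }

blob_ids = object_ids(b"file content\n", [])
tree_ids = object_ids(b"tree payload\n", [blob_ids])  # link carries both hashes

# What a SHA-1-only client would have hashed for the same tree:
old_format = (blob_ids["sha1"] + "\n\n").encode() + b"tree payload\n"
legacy_tree_id = hashlib.sha1(old_format).hexdigest()

print(tree_ids["sha1"] == legacy_tree_id)  # old hash survives the upgrade
```

Adding a third generation in this model would mean appending one more name
to GENERATIONS and re-hashing; the IDs of the earlier generations would not
change.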