Re: Git and SHA-1 security (again)
Sorry if I'm dropping in at the wrong point (this is one I'd bookmarked).

From: "Duy Nguyen"
Sent: Wednesday, July 20, 2016 3:44 PM
> On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin wrote:
>> But that strategy *still* ignores the distributed nature of Git. Just because *you* make that merge at a certain point does not necessarily mean that I make it at that point, too.
>>
>> Any approach that tries to have one single point of conversion will most likely fall short of a solution.
>
> OK I see the difference in our views now. To me an sha256 repo would see an sha1 repo as a _foreign_ DVCS, pretty much like git sees mercurial now. So a transition from sha1 to sha256 is not that different from cvs -> svn -> a dvcs bubble -> git.

I think that, within Git, it is possible to have interworking (for those parts that negotiate) between instances that have different views about which of the two hash types are available. Fetch/push negotiation is a normal part of working with a remote.

>> To be honest, I am less concerned about the GPG-signed commits (after all, after switching to a more secure hash algorithm, a maintainer could cross-sign all signed commits, or only the branch tips or tags, as new tags, to reinstitute trust).
>>
>> I am much more concerned about references to commits, both inside and outside the repository. That is, if I read anywhere on the internet about Git having added support for `git add --chmod=+x ` in 4e55ed3 (add: add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that commit by that reference.
>>
>> And I am of course concerned what should happen if a user wants to fetch from, or push to, a SHA-1-hashed remote repository into, or from, a SHA-256-hashed local one.
>
> to follow the above, in my view, interaction with sha1 repos goes through some conversion bridges like what we have with hg and svn. I don't know if we are going this route. It's certainly simpler and people already have experience (from previous migrations) to prepare for it.
--
The main thought was that, rather than worrying about which advanced hash to pick (with all the arguments that entails), it is worth reducing the problem space to a 'toy problem' in order to look at the interaction issues.

For the toy problem we'd keep the current oid length (so that the transmission formats don't change size), but swap old and new: make sha1 the new hash and use an older, shorter hash (e.g. md5) to investigate the transition from a short to a long hash. Keeping it as a 'toy problem' avoids folks having too much invested in the choice of the new hash; instead the interworking can be sorted out more easily, and some issues can be punted on (e.g. the choice of salt used to extend md5 to sha1 length, and collisions therein).

--
Philip
-- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
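Philip's toy problem can be made concrete with a small sketch. It assumes Git's loose-object naming rule (hash over a `<type> <size>` header, a NUL byte, then the content) and simply swaps the hash function, so md5 plays the "old" short hash and sha1 the "new" one. The `object_id` helper is hypothetical, and it deliberately does not model the salt or padding that would be needed to stretch md5 to the current oid length.

```python
import hashlib


def object_id(obj_type: str, data: bytes, algo: str) -> str:
    """Name an object Git-style: hash of '<type> <size>', a NUL byte, then the content."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + data).hexdigest()


if __name__ == "__main__":
    content = b"hello world\n"
    old_name = object_id("blob", content, "md5")   # toy "old" hash: 128 bits
    new_name = object_id("blob", content, "sha1")  # toy "new" hash: 160 bits
    print(len(old_name), old_name)  # 32 hex digits
    print(len(new_name), new_name)  # 40 hex digits, same as `git hash-object`
```

The same content ends up with two unrelated names of different lengths, which is exactly the interworking problem the toy setup is meant to isolate.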
Re: Git and SHA-1 security (again)
Johannes Schindelin writes:

> Hi Junio,
>
> On Mon, 18 Jul 2016, Junio C Hamano wrote:
>
>> "brian m. carlson" writes:
>>
>> > I will say that the pack format will likely require some changes, because it assumes ... The reason is that we can't have an unambiguous parse of the current objects if two hash algorithms are in use. So when we look at a new hash, we need to provide an unambiguous way to know what hash is in use. The two choices are to either require all objects to use the new hash, or to extend the objects to include the hash. Until a couple days ago, I had planned to do the former. I had not even considered using a multihash approach due to the complexity.
>>
>> Objects in Git identify themselves, but once you introduce the second hash function (as opposed to replacing the hash function to a new one), you would allow people to call the same object by two names. That has interesting implications.
>>
>> [...]
>
> So essentially you are saying that the multi-hash approach has too many negative implications, right? At least that is what I understand.
>
> Looks more and more like we do need to convert repositories wholesale, and keep a two-way mapping for talking to remote repositories.
>
> Would you concur?

Not necessarily. That was me thinking aloud, listing some issues that I would imagine to be tricky to solve, without even attempting to be exhaustive, and that I expect to see solved in a good end-result implementation.

For example, "I do not see a nice way to solve X myself without doing Y" in the message you are responding to does not necessarily mean there is no good solution to X (just "I do not think of any offhand"), and it does not mean I think it is terrible that we have to do Y to solve X.
Re: Git and SHA-1 security (again)
Hi Brian,

On Mon, 18 Jul 2016, brian m. carlson wrote:

> On Mon, Jul 18, 2016 at 09:00:06AM +0200, Johannes Schindelin wrote:
>
> > FWIW it never crossed my mind to allow different same-sized hash algorithms. So I never thought we'd need a way to distinguish, say, BLAKE2b-256 from SHA-256.
> >
> > Is there a good reason to add the maintenance burden of several 256-bit hash algorithms, apart from speed (which in my mind should decide which one to use, always, rather than letting the user choose)? It would also complicate transport even further, let alone subtree merges from differently-hashed repositories.
>
> There are really three candidates:
>
> * SHA-256 (the SHA-2 algorithm): While this looks good right now, cryptanalysis is advancing. This is not a good choice for a long-term solution.
> * SHA3-256 (the SHA-3 algorithm): This is the conservative choice. It's also faster than SHA-256 on 64-bit systems. It has a very conservative security margin and is a good long-term choice.
> * BLAKE2b-256: This is the blisteringly fast choice. It outperforms SHA-1 and even MD5 on 64-bit systems. This algorithm was designed so that nobody would have a reason to use an insecure algorithm. It will probably be secure for some time, but maybe not as long as SHA3-256.
>
> I'm only considering 256-bit hashes, because anything longer won't fit on an 80-column terminal in hex form.
>
> The reason I had considered implementing both SHA3-256 and BLAKE2b-256 is that I want there to be no reason not to upgrade. People who need a FIPS-approved algorithm or want a long-term, conservative choice should use SHA3-256. People who want even better performance than current Git would use BLAKE2b-256.
>
> Performance comparison (my implementations):
> SHA-1: 437 MiB/s
> SHA-256: 196 MiB/s
> SHA3-256: 273 MiB/s
> BLAKE2b: 649 MiB/s

Those are impressive numbers on BLAKE2b.
However, Keccak was chosen as SHA-3 because it can be implemented in hardware more efficiently than BLAKE (and hence, probably, also BLAKE2). Given that there are already SSE instructions implementing SHA-1/SHA-256 on some CPUs [*1*], I would not be surprised if SHA3 also saw some hardware support. So speed seems less of a concern to me. We are talking about a multi-year roadmap, after all.

And given the complications for public repository hosters, I would like to settle on a single 256-bit hash. That'll be challenging enough.

Ciao,
Dscho

Footnote *1*: https://en.wikipedia.org/wiki/Intel_SHA_extensions
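To get a comparison like brian's on your own hardware, a rough single-threaded benchmark can be sketched with Python's `hashlib`. This measures whatever backend your Python build links against (often OpenSSL), not brian's implementations, so absolute numbers will differ from his table. It also prints each digest's hex length, which is where the 80-column argument comes from: 256 bits is 64 hex digits.

```python
import hashlib
import time

# Constructors for the four algorithms in brian's table; BLAKE2b is truncated to 256 bits.
ALGORITHMS = {
    "SHA-1": hashlib.sha1,
    "SHA-256": hashlib.sha256,
    "SHA3-256": hashlib.sha3_256,
    "BLAKE2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def throughput_mib_s(make_hash, data: bytes, rounds: int = 3) -> float:
    """Hash `data` `rounds` times on one thread and return MiB per second."""
    start = time.perf_counter()
    for _ in range(rounds):
        h = make_hash()
        h.update(data)
        h.digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return len(data) * rounds / (1024 * 1024) / elapsed


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes
    for name, make_hash in ALGORITHMS.items():
        hex_len = make_hash().digest_size * 2
        rate = throughput_mib_s(make_hash, payload)
        print(f"{name:12} {rate:9.1f} MiB/s  {hex_len} hex digits")
```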
Re: Git and SHA-1 security (again)
Hi Brian,

On Mon, 18 Jul 2016, brian m. carlson wrote:

> On Mon, Jul 18, 2016 at 11:00:35AM -0700, Junio C Hamano wrote:
> > Continuing this thought process, I do not see a good way to allow us to wean ourselves off of the old hash, unless we _break_ the pack stream format so that each object in the pack carries not just the data but also the hash algorithm to be used to _name_ it, so that new objects will never be referred to using the old hash.
>
> I think for this reason, I'm going to propose the following approach when we get there:
>
> * We serialize the hash in the object formats, using multihash or something similar. This means that it is minimally painful if we ever need to change in the future[0].

This adds a lot of redundancy, though, and has an adverse performance impact, no?

Could we not simply require packs to identify the used hash *once*, and use a single hash algorithm per repository? That would mean that we would have to re-hash packs on-the-fly if, say, talking to a SHA-1 remote from a SHA-256 local repository.

> * Each repository carries exactly one hash algorithm, except for submodule data. If we don't do this, then some people will never switch because the submodules they depend on haven't.

If we re-hash transparently, we could get away with SHA-256 even for submodules.

> * If people on the new format need to refer to submodule commits using SHA-1, then they have to use a prefix on the hash form; otherwise, they can use the raw hash value (without any multihash prefix).
> * git fsck verifies one consistent algorithm (excepting submodule references).
>
> This preserves the security benefits, avoids future-proofing problems, and minimizes performance impacts due to naming like you mentioned.
>
> [0] We are practically limited to 256-bit hashes because anything longer will wrap on an 80-column terminal when in hex form.

We are not really bound by the 80-column limit when choosing a hash algorithm.
We typically refer to a commit by a shorter name, and the 80-column limit applies only to Git's own source code.

Ciao,
Dscho
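The redundancy Johannes worries about can be quantified with a small sketch contrasting brian's per-object multihash prefix with a hash identified once per pack. The multihash encoding (varint function code, varint digest length, digest) follows the multiformats multihash format; the two function codes are taken from the multicodec table and should be checked there before relying on them. The 4-byte once-per-pack field is purely hypothetical and is not a description of Git's actual pack header.

```python
import hashlib

# Function codes from the multiformats multicodec table (verify against the table).
MULTIHASH_CODES = {"sha1": 0x11, "sha2-256": 0x12}


def uvarint(n: int) -> bytes:
    """Unsigned LEB128 varint (n >= 0), the integer encoding multihash uses."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def multihash(name: str, digest: bytes) -> bytes:
    """Self-describing id: <code varint><length varint><digest>."""
    return uvarint(MULTIHASH_CODES[name]) + uvarint(len(digest)) + digest


def naming_overhead(num_ids: int, per_object: bool, header_bytes: int = 4) -> int:
    """Extra bytes spent saying which hash is in use.

    per_object=True: every stored SHA-256 id carries a multihash prefix.
    per_object=False: a hypothetical fixed-size algorithm field, written once per pack.
    """
    prefix = len(uvarint(MULTIHASH_CODES["sha2-256"])) + len(uvarint(32))
    return num_ids * prefix if per_object else header_bytes


if __name__ == "__main__":
    digest = hashlib.sha256(b"blob 0\0").digest()
    mh = multihash("sha2-256", digest)
    print(mh[:2].hex(), len(mh))             # 1220 34
    print(naming_overhead(1_000_000, True))   # 2000000
    print(naming_overhead(1_000_000, False))  # 4
```

The per-object prefix is only two bytes, but it is paid for every id stored, whereas a single per-pack (or per-repository) declaration is a constant cost.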
Re: Git and SHA-1 security (again)
Hi Junio,

On Mon, 18 Jul 2016, Junio C Hamano wrote:

> "brian m. carlson" writes:
>
> > I will say that the pack format will likely require some changes, because it assumes ... The reason is that we can't have an unambiguous parse of the current objects if two hash algorithms are in use. So when we look at a new hash, we need to provide an unambiguous way to know what hash is in use. The two choices are to either require all objects to use the new hash, or to extend the objects to include the hash. Until a couple days ago, I had planned to do the former. I had not even considered using a multihash approach due to the complexity.
>
> Objects in Git identify themselves, but once you introduce the second hash function (as opposed to replacing the hash function to a new one), you would allow people to call the same object by two names. That has interesting implications.
>
> [...]

So essentially you are saying that the multi-hash approach has too many negative implications, right? At least that is what I understand.

Looks more and more like we do need to convert repositories wholesale, and keep a two-way mapping for talking to remote repositories.

Would you concur?

Ciao,
Dscho
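The two-way mapping Johannes suggests, and Junio's point that one object then has two names, can be sketched for the simplest case. The `TwoWayMap` class is hypothetical; it only handles blobs, whose names depend on nothing but their content, and reuses Git's loose-object naming rule with either SHA-1 or SHA-256.

```python
import hashlib


def object_id(obj_type: str, data: bytes, algo: str) -> str:
    """Git-style object name: hash over '<type> <size>', a NUL byte, then the content."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.new(algo, header + data).hexdigest()


class TwoWayMap:
    """Hypothetical SHA-1 <-> SHA-256 translation table kept beside a converted repo."""

    def __init__(self) -> None:
        self.sha1_to_sha256: dict[str, str] = {}
        self.sha256_to_sha1: dict[str, str] = {}

    def add_blob(self, data: bytes) -> tuple[str, str]:
        # Blobs reference no other objects, so both names come straight from the content.
        old = object_id("blob", data, "sha1")
        new = object_id("blob", data, "sha256")
        self.sha1_to_sha256[old] = new
        self.sha256_to_sha1[new] = old
        return old, new

    def to_local(self, sha1_id: str) -> str:
        """Translate an id received from a SHA-1 remote."""
        return self.sha1_to_sha256[sha1_id]

    def to_remote(self, sha256_id: str) -> str:
        """Translate a local id before sending it to a SHA-1 remote."""
        return self.sha256_to_sha1[sha256_id]
```

Trees, commits and tags embed the ids of other objects, so their entries cannot be filled in from content alone; the embedded ids have to be rewritten first and the object re-hashed.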
Re: Git and SHA-1 security (again)
Stefan Beller writes:

>> to follow the above, in my view, interaction with sha1 repos goes through some conversion bridges like what we have with hg and svn. I don't know if we are going this route. It's certainly simpler and people already have experience (from previous migrations) to prepare for it.
>
> When treating the SHA1 version as a foreign dvcs and the SHA256 as the real deal, we could introduce "pointer objects", and during the conversion create a 4e55ed3 pointer that points to the SHA256 commit of (add: add --chmod=+x / --chmod=-x options, 2016-05-31).

Hmm. If you are designing these "pointer objects" to be extensible enough to cover other foreign vcs (e.g. you make them capable of mapping Subversion's r24323 to a matching commit in the converted result), I would think it may be a very useful thing to have, but I think it is pretty much orthogonal to the discussion in this topic. IOW, that can happen with or without a change of the hash function.

And looking at it that way, I am not sure if such a mapping feature should require adding a new type of "object".
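Junio's point that a mapping feature need not be a new object type can be illustrated with a plain lookup table keyed by (foreign system, foreign id), which covers a Subversion revision and an old SHA-1 name alike. The line format and `parse_id_map` are made up for illustration, and the native ids in the sample are placeholders, not real commits. A real table would presumably store full SHA-1 ids; the short `4e55ed3` is used only because it is the reference from the thread.

```python
def parse_id_map(text: str) -> dict[tuple[str, str], str]:
    """Parse '<scheme> <foreign-id> <native-oid>' lines into a lookup table.

    Blank lines and '#' comments are skipped. The format is hypothetical.
    """
    table: dict[tuple[str, str], str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        scheme, foreign_id, native_oid = fields
        table[(scheme, foreign_id)] = native_oid
    return table


if __name__ == "__main__":
    sample = """
    # placeholder native ids, not real commits
    svn   r24323   NEWOID-A
    sha1  4e55ed3  NEWOID-B
    """
    ids = parse_id_map(sample)
    print(ids[("svn", "r24323")])    # NEWOID-A
    print(ids[("sha1", "4e55ed3")])  # NEWOID-B
```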
Re: Git and SHA-1 security (again)
On Wed, Jul 20, 2016 at 7:44 AM, Duy Nguyen wrote:
> On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin wrote:
>> But that strategy *still* ignores the distributed nature of Git. Just because *you* make that merge at a certain point does not necessarily mean that I make it at that point, too.
>>
>> Any approach that tries to have one single point of conversion will most likely fall short of a solution.
>
> OK I see the difference in our views now. To me an sha256 repo would see an sha1 repo as a _foreign_ DVCS, pretty much like git sees mercurial now. So a transition from sha1 to sha256 is not that different from cvs -> svn -> a dvcs bubble -> git.
>
>> To be honest, I am less concerned about the GPG-signed commits (after all, after switching to a more secure hash algorithm, a maintainer could cross-sign all signed commits, or only the branch tips or tags, as new tags, to reinstitute trust).
>>
>> I am much more concerned about references to commits, both inside and outside the repository. That is, if I read anywhere on the internet about Git having added support for `git add --chmod=+x ` in 4e55ed3 (add: add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that commit by that reference.
>>
>> And I am of course concerned what should happen if a user wants to fetch from, or push to, a SHA-1-hashed remote repository into, or from, a SHA-256-hashed local one.
>
> to follow the above, in my view, interaction with sha1 repos goes through some conversion bridges like what we have with hg and svn. I don't know if we are going this route. It's certainly simpler and people already have experience (from previous migrations) to prepare for it.

When treating the SHA1 version as a foreign dvcs and the SHA256 as the real deal, we could introduce "pointer objects", and during the conversion create a 4e55ed3 pointer that points to the SHA256 commit of (add: add --chmod=+x / --chmod=-x options, 2016-05-31).
Ideally we would not even expose this sort of object much, e.g. git show would just redirect automatically. Instead of a new class of "pointer objects" we could also solve this via a lot of refs (refs/old-sha1/4e55ed3 pointing to the converted commit; though we would need to accept partial ref names then :/).
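Stefan's refs-based variant, and the "partial ref names" problem he flags, can be sketched as a prefix lookup over a `refs/old-sha1/<full-sha1>` namespace. `resolve_old_name` is hypothetical and models neither Git's real ref storage nor its own abbreviated-name resolution; the minimum of 4 hex digits is an assumption, and the linear scan would presumably be replaced by a sorted index in practice.

```python
import re

NAMESPACE = "refs/old-sha1/"  # namespace suggested in the thread
HEX = re.compile(r"[0-9a-f]+")


def resolve_old_name(refs: dict[str, str], abbrev: str, min_len: int = 4) -> str:
    """Map an abbreviated SHA-1 name to the converted commit via refs/old-sha1/<full-sha1>.

    Raises KeyError if nothing matches and ValueError if the prefix is unusable or ambiguous.
    """
    abbrev = abbrev.lower()
    if len(abbrev) < min_len or not HEX.fullmatch(abbrev):
        raise ValueError(f"not a usable abbreviation: {abbrev!r}")
    matches = [
        target
        for name, target in refs.items()
        if name.startswith(NAMESPACE) and name[len(NAMESPACE):].startswith(abbrev)
    ]
    if not matches:
        raise KeyError(abbrev)
    if len(matches) > 1:
        raise ValueError(f"ambiguous abbreviation: {abbrev}")
    return matches[0]


if __name__ == "__main__":
    # Placeholder full SHA-1 names and converted ids, made up for the example.
    refs = {
        NAMESPACE + "abcd1" + "0" * 35: "NEW-1",
        NAMESPACE + "abcd2" + "0" * 35: "NEW-2",
        "refs/heads/main": "NEW-2",
    }
    print(resolve_old_name(refs, "abcd1"))  # NEW-1
```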
Re: Git and SHA-1 security (again)
On Tue, Jul 19, 2016 at 8:58 PM, Herczeg Zsolt wrote: > 2016-07-19 20:04 GMT+02:00 Duy Nguyen : >> On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote: >>> On Tue, 19 Jul 2016, Duy Nguyen wrote: >>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote: > > On Tue, 19 Jul 2016, Duy Nguyen wrote: > >> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin >> wrote: But we can recreate SHA-1 from the same content and verify GPG, right? I know it's super expensive, but it feels safer to not carry SHA-1 around when it's not secure anymore (I recall something about exploiting the weakest link when you have both sha1 and sha256 in the object content). Rehashing would be done locally and is better controlled. >>> >>> >>> >>> You could. But how would you determine whether to recreate the commit >>> object from a SHA-1-ified version of the commit buffer? Fall back if >>> the >>> original did not match the signature? >> >> >> >> Any repo would have a cut point when they move to sha256 (or whatever >> new hash), if we can record this somewhere (e.g. as a tag or a bunch >> of tags, or some dummy commits to mark the heads of the repo) then we >> only verify gpg signatures _in_ the repository before this point. > > > > remember that a repo doesn't have a single 'now', each branch has its > own > head, and you can easily go back to prior points and branch off from > there. > > Since timestamps in repos can't be trusted (different people's clocks may > not be in sync), how would you define this cutoff point? The set of all heads at the time the conversion happens (maybe plus all the real tags). We can make an octopus merge commit to cover all the heads, then it can be the reference point. >>> >>> >>> so to make sure I'm understanding this, anything not reachable from that >>> merge must be the new hash, correct? Including forks, merges, etc that >>> happen from earlier points in the history.
>>
>> Yes everything except that merge and everything reachable from it, the whole old clone, basically.
>
> It could work, but is it worth it?
>
> 1) If you use multihash, you should assume that anything with SHA1 could be manipulated. That means you can "inject" something later into that "old clone" anyway.

No, it's not multihash. The repo only uses sha256, but by substituting sha1 for it over the same DAG, we can recreate the exact same sha1 repo (up to the conversion point). This is mostly to prevent people from injecting something, because _you_ generate the repo locally.

> 2) Even if the content is re-hashed, it's hard for a user to understand where the trust comes from. The user should decide whether he trusts (or not) the person who signed that octopus breakpoint.
>
> Even without git you can achieve this security: get the complete old repository and make a signed tarball of it. If any time later you want to check those signatures, you can just use that tarball. I don't think it's worth the trouble to create a native method for something which is rare and can be worked around easily. It's actually easier for a user to understand the "trust relation" when using this workaround.
>
> Referring to that signed-tarball approach, you may just as well drop all signature data on conversion... As long as you can look up the references to old hashes easily, I think it's usable enough.

It's more or less the signed-tarball approach in my view, except that you recreate that tarball dynamically from your sha256 repo (so this tarball is "signed" with sha256).

--
Duy
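Duy's claim that the SHA-1 view can be recreated locally from a SHA-256-only repository rests on one property: an object's name depends only on its content and the names of the objects it points to. The sketch below uses a toy object store (not Git's real tree or commit encoding) and recomputes every id bottom-up with a chosen hash, substituting each referenced id as it goes. The store layout and helper names are assumptions for illustration; the recursive walk would also need to become iterative for histories deeper than Python's recursion limit.

```python
import hashlib

# Toy object store: key -> (type, keys of referenced objects, own payload).
# This is NOT Git's on-disk tree/commit format; it only keeps the property that
# an object's name depends on the names of the objects it references.
Store = dict[str, tuple[str, tuple[str, ...], bytes]]


def rehash(store: Store, key: str, algo: str, memo: dict[str, str]) -> str:
    """Return the id of `key` under `algo`, recomputing referenced ids first."""
    if key in memo:
        return memo[key]
    obj_type, children, payload = store[key]
    child_ids = [rehash(store, child, algo, memo) for child in children]
    body = b"".join(f"ref {cid}\n".encode("ascii") for cid in child_ids) + payload
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    oid = hashlib.new(algo, header + body).hexdigest()
    memo[key] = oid
    return oid


def rehash_all(store: Store, algo: str) -> dict[str, str]:
    """Compute ids for every object in the store under one hash algorithm."""
    memo: dict[str, str] = {}
    for key in store:
        rehash(store, key, algo, memo)
    return memo


if __name__ == "__main__":
    store: Store = {
        "blob1": ("blob", (), b"hello\n"),
        "tree1": ("tree", ("blob1",), b"100644 hello.txt\n"),
        "c1": ("commit", ("tree1",), b"first commit\n"),
        "c2": ("commit", ("tree1", "c1"), b"second commit\n"),
    }
    sha1_ids = rehash_all(store, "sha1")
    sha256_ids = rehash_all(store, "sha256")
    # Rehashing the same DAG again yields identical ids: the SHA-1 view is reproducible.
    print(rehash_all(store, "sha1") == sha1_ids)
    print(sha1_ids["c2"], sha256_ids["c2"])
```

Because nothing outside the local object store feeds into the recomputation, a third party cannot slip a different object into the regenerated SHA-1 view without also changing the SHA-256 content it is derived from.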
Re: Git and SHA-1 security (again)
On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin wrote:
> But that strategy *still* ignores the distributed nature of Git. Just because *you* make that merge at a certain point does not necessarily mean that I make it at that point, too.
>
> Any approach that tries to have one single point of conversion will most likely fall short of a solution.

OK I see the difference in our views now. To me an sha256 repo would see an sha1 repo as a _foreign_ DVCS, pretty much like git sees mercurial now. So a transition from sha1 to sha256 is not that different from cvs -> svn -> a dvcs bubble -> git.

> To be honest, I am less concerned about the GPG-signed commits (after all, after switching to a more secure hash algorithm, a maintainer could cross-sign all signed commits, or only the branch tips or tags, as new tags, to reinstitute trust).
>
> I am much more concerned about references to commits, both inside and outside the repository. That is, if I read anywhere on the internet about Git having added support for `git add --chmod=+x ` in 4e55ed3 (add: add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that commit by that reference.
>
> And I am of course concerned what should happen if a user wants to fetch from, or push to, a SHA-1-hashed remote repository into, or from, a SHA-256-hashed local one.

to follow the above, in my view, interaction with sha1 repos goes through some conversion bridges like what we have with hg and svn. I don't know if we are going this route. It's certainly simpler and people already have experience (from previous migrations) to prepare for it.

--
Duy
Re: Git and SHA-1 security (again)
Hi Duy, On Tue, 19 Jul 2016, Duy Nguyen wrote: > On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote: > > On Tue, 19 Jul 2016, Duy Nguyen wrote: > > > >> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote: > >>> > >>> On Tue, 19 Jul 2016, Duy Nguyen wrote: > >>> > On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin > wrote: > >> > >> > >> But we can recreate SHA-1 from the same content and verify GPG, > >> right? I know it's super expensive, but it feels safer to not > >> carry SHA-1 around when it's not secure anymore (I recall > >> something about exploiting the weakest link when you have both > >> sha1 and sha256 in the object content). Rehashing would be done > >> locally and is better controlled. > > > > You could. But how would you determine whether to recreate the > > commit object from a SHA-1-ified version of the commit buffer? > > Fall back if the original did not match the signature? > > Any repo would have a cut point when they move to sha256 (or > whatever new hash), if we can record this somewhere (e.g. as a tag > or a bunch of tags, or some dummy commits to mark the heads of the > repo) then we only verify gpg signatures _in_ the repository before > this point. > >>> > >>> remember that a repo doesn't have a single 'now', each branch has > >>> its own head, and you can easily go back to prior points and branch > >>> off from there. > >>> > >>> Since timestamps in repos can't be trusted (different people's > >>> clocks may not be in sync), how would you define this cutoff point? > >> > >> The set of all heads at the time the conversion happens (maybe plus > >> all the real tags). We can make an octopus merge commit to cover all > >> the heads, then it can be the reference point. > > > > so to make sure I'm understanding this, anything not reachable from > > that merge must be the new hash, correct? Including forks, merges, etc > > that happen from earlier points in the history.
>
> Yes everything except that merge and everything reachable from it, the whole old clone, basically.

But that strategy *still* ignores the distributed nature of Git. Just because *you* make that merge at a certain point does not necessarily mean that I make it at that point, too.

Any approach that tries to have one single point of conversion will most likely fall short of a solution.

To be honest, I am less concerned about the GPG-signed commits (after all, after switching to a more secure hash algorithm, a maintainer could cross-sign all signed commits, or only the branch tips or tags, as new tags, to reinstitute trust).

I am much more concerned about references to commits, both inside and outside the repository. That is, if I read anywhere on the internet about Git having added support for `git add --chmod=+x ` in 4e55ed3 (add: add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that commit by that reference.

And I am of course concerned what should happen if a user wants to fetch from, or push to, a SHA-1-hashed remote repository into, or from, a SHA-256-hashed local one.

Ciao,
Dscho
Re: Git and SHA-1 security (again)
2016-07-19 20:04 GMT+02:00 Duy Nguyen: > On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote: >> On Tue, 19 Jul 2016, Duy Nguyen wrote: >> >>> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote: On Tue, 19 Jul 2016, Duy Nguyen wrote: > On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin > wrote: >>> >>> >>> But we can recreate SHA-1 from the same content and verify GPG, right? >>> I know it's super expensive, but it feels safer to not carry SHA-1 >>> around when it's not secure anymore (I recall something about >>> exploiting the weakest link when you have both sha1 and sha256 in the >>> object content). Rehashing would be done locally and is better >>> controlled. >> >> >> >> You could. But how would you determine whether to recreate the commit >> object from a SHA-1-ified version of the commit buffer? Fall back if >> the >> original did not match the signature? > > > > Any repo would have a cut point when they move to sha256 (or whatever > new hash), if we can record this somewhere (e.g. as a tag or a bunch > of tags, or some dummy commits to mark the heads of the repo) then we > only verify gpg signatures _in_ the repository before this point. remember that a repo doesn't have a single 'now', each branch has its own head, and you can easily go back to prior points and branch off from there. Since timestamps in repos can't be trusted (different people's clocks may not be in sync), how would you define this cutoff point? >>> >>> >>> The set of all heads at the time the conversion happens (maybe plus >>> all the real tags). We can make an octopus merge commit to cover all >>> the heads, then it can be the reference point. >> >> >> so to make sure I'm understanding this, anything not reachable from that >> merge must be the new hash, correct? Including forks, merges, etc that >> happen from earlier points in the history. > > Yes everything except that merge and everything reachable from it, the > whole old clone, basically.

It could work, but is it worth it?
1) If you use multihash, you should assume that anything with SHA1 could be manipulated. That means you can "inject" something later into that "old clone" anyway.

2) Even if the content is re-hashed, it's hard for a user to understand where the trust comes from. The user should decide whether he trusts (or not) the person who signed that octopus breakpoint.

Even without git you can achieve this security: get the complete old repository and make a signed tarball of it. If any time later you want to check those signatures, you can just use that tarball. I don't think it's worth the trouble to create a native method for something which is rare and can be worked around easily. It's actually easier for a user to understand the "trust relation" when using this workaround.

Referring to that signed-tarball approach, you may just as well drop all signature data on conversion... As long as you can look up the references to old hashes easily, I think it's usable enough.
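The signed-tarball workaround needs nothing from Git. A minimal sketch, with hypothetical helper names: pack the complete old repository (including `.git`) into one archive and record a SHA-256 of the exact file, so that the copy you later verify can be checked against the one that was signed. The signature itself would come from a separate tool, for example `gpg --detach-sign` on the archive, which this sketch does not invoke.

```python
import hashlib
import os
import tarfile


def make_archive(src_dir: str, out_path: str) -> None:
    """Pack a complete copy of the old repository (working tree and .git) into one tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so its digest can be recorded and compared later."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```

Note that re-creating the archive later will generally not reproduce the same bytes (timestamps and compression metadata differ), so the trust anchor is the specific file that was signed, not the act of archiving.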
Re: Git and SHA-1 security (again)
Duy Nguyen writes:

>> Even though that single operation might be possible, do not go there. A "pathname" identifies a "path", not its contents, and "appending crap after path" breaks the data model badly.
>
> I thought about that but I thought all those operations required special treatment for submodules anyway.

The fact that those operations require special treatment does not make it right to break the data model anyway, so...
Re: Git and SHA-1 security (again)
On Tue, Jul 19, 2016 at 7:59 PM, David Lang wrote: > On Tue, 19 Jul 2016, Duy Nguyen wrote: > >> On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote: >>> >>> On Tue, 19 Jul 2016, Duy Nguyen wrote: >>> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote: >> >> >> But we can recreate SHA-1 from the same content and verify GPG, right? >> I know it's super expensive, but it feels safer to not carry SHA-1 >> around when it's not secure anymore (I recall something about >> exploiting the weakest link when you have both sha1 and sha256 in the >> object content). Rehashing would be done locally and is better >> controlled. > > > > You could. But how would you determine whether to recreate the commit > object from a SHA-1-ified version of the commit buffer? Fall back if > the > original did not match the signature? Any repo would have a cut point when they move to sha256 (or whatever new hash), if we can record this somewhere (e.g. as a tag or a bunch of tags, or some dummy commits to mark the heads of the repo) then we only verify gpg signatures _in_ the repository before this point. >>> >>> >>> >>> remember that a repo doesn't have a single 'now', each branch has its >>> own >>> head, and you can easily go back to prior points and branch off from >>> there. >>> >>> Since timestamps in repos can't be trusted (different people's clocks may >>> not be in sync), how would you define this cutoff point? >> >> >> The set of all heads at the time the conversion happens (maybe plus >> all the real tags). We can make an octopus merge commit to cover all >> the heads, then it can be the reference point. > > > so to make sure I'm understanding this, anything not reachable from that > merge must be the new hash, correct? Including forks, merges, etc that > happen from earlier points in the history.

Yes everything except that merge and everything reachable from it, the whole old clone, basically.
--
Duy
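Duy's classification rule (the octopus merge and everything reachable from it stay on the old hash; everything else uses the new one) is a plain graph reachability question. The sketch below walks parent links breadth-first from the cut-point commit; the commit names and parent table are made up for illustration. Commit E shows David's concern: it branches from an old commit after the conversion, so it is not reachable from the cut point and falls on the new-hash side even though its parent is old. The sketch does not address Johannes's objection that another clone would choose a different cut point.

```python
from collections import deque


def reachable(parents: dict[str, list[str]], start: str) -> set[str]:
    """All commits reachable from `start` (inclusive) by following parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # A-B-C is one branch and A-D another; M is the octopus "cut point" over both heads.
    # E forks from B after the conversion; F builds on top of M.
    parents = {
        "B": ["A"], "C": ["B"], "D": ["A"],
        "M": ["C", "D"],
        "E": ["B"], "F": ["M"],
    }
    old_side = reachable(parents, "M")
    for commit in sorted(parents.keys() | {"A"}):
        print(commit, "old hash" if commit in old_side else "new hash")
```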
Re: Git and SHA-1 security (again)
On Tue, 19 Jul 2016, Duy Nguyen wrote: On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote: On Tue, 19 Jul 2016, Duy Nguyen wrote: On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote: But we can recreate SHA-1 from the same content and verify GPG, right? I know it's super expensive, but it feels safer to not carry SHA-1 around when it's not secure anymore (I recall something about exploiting the weakest link when you have both sha1 and sha256 in the object content). Rehashing would be done locally and is better controlled. You could. But how would you determine whether to recreate the commit object from a SHA-1-ified version of the commit buffer? Fall back if the original did not match the signature? Any repo would have a cut point when they move to sha256 (or whatever new hash), if we can record this somewhere (e.g. as a tag or a bunch of tags, or some dummy commits to mark the heads of the repo) then we only verify gpg signatures _in_ the repository before this point. remember that a repo doesn't have a single 'now', each branch has its own head, and you can easily go back to prior points and branch off from there. Since timestamps in repos can't be trusted (different people's clocks may not be in sync), how would you define this cutoff point? The set of all heads at the time the conversion happens (maybe plus all the real tags). We can make an octopus merge commit to cover all the heads, then it can be the reference point.

so to make sure I'm understanding this, anything not reachable from that merge must be the new hash, correct? Including forks, merges, etc that happen from earlier points in the history.

David Lang
Re: Git and SHA-1 security (again)
On Tue, Jul 19, 2016 at 7:34 PM, David Lang wrote: > On Tue, 19 Jul 2016, Duy Nguyen wrote: > >> On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin >> wrote: But we can recreate SHA-1 from the same content and verify GPG, right? I know it's super expensive, but it feels safer to not carry SHA-1 around when it's not secure anymore (I recall something about exploiting the weakest link when you have both sha1 and sha256 in the object content). Rehashing would be done locally and is better controlled. >>> >>> >>> You could. But how would you determine whether to recreate the commit >>> object from a SHA-1-ified version of the commit buffer? Fall back if the >>> original did not match the signature? >> >> >> Any repo would have a cut point when they move to sha256 (or whatever >> new hash), if we can record this somewhere (e.g. as a tag or a bunch >> of tags, or some dummy commits to mark the heads of the repo) then we >> only verify gpg signatures _in_ the repository before this point. > > > remember that a repo doesn't have a single 'now', each branch has its own > head, and you can easily go back to prior points and branch off from there. > > Since timestamps in repos can't be trusted (different people's clocks may > not be in sync), how would you define this cutoff point? The set of all heads at the time the conversion happens (maybe plus all the real tags). We can make an octopus merge commit to cover all the heads, then it can be the reference point. -- Duy
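A minimal sketch of the cut-point rule as David restates it: everything reachable from the octopus merge may keep its SHA-1 name (and be GPG-verified that way), and anything else must use the new hash. The history and the single-letter commit IDs are hypothetical, and the dict of parents stands in for the parent pointers Git would read from commit objects.

```python
from collections import deque


def reachable(parents, start):
    """Return every commit reachable from `start` by following parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical history: commit -> list of parents.
history = {
    "A": [],
    "B": ["A"],         # tip of one branch at conversion time
    "C": ["A"],         # tip of another branch at conversion time
    "CUT": ["B", "C"],  # octopus merge recording every head
    "D": ["A"],         # branched off old history *after* the conversion
}

old_world = reachable(history, "CUT")


def must_use_new_hash(commit):
    # Only commits reachable from the cut point keep SHA-1 names.
    return commit not in old_world


print(sorted(old_world))       # ['A', 'B', 'C', 'CUT']
print(must_use_new_hash("D"))  # True, even though its parent is old
```

Commit D is exactly David's case: it forks from pre-conversion history but is not reachable from the cut point, so under this rule it has to be named with the new hash.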
Re: Git and SHA-1 security (again)
On Tue, 19 Jul 2016, Duy Nguyen wrote: On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote: But we can recreate SHA-1 from the same content and verify GPG, right? I know it's super expensive, but it feels safer to not carry SHA-1 around when it's not secure anymore (I recall something about exploiting the weakest link when you have both sha1 and sha256 in the object content). Rehashing would be done locally and is better controlled. You could. But how would you determine whether to recreate the commit object from a SHA-1-ified version of the commit buffer? Fall back if the original did not match the signature? Any repo would have a cut point when they move to sha256 (or whatever new hash), if we can record this somewhere (e.g. as a tag or a bunch of tags, or some dummy commits to mark the heads of the repo) then we only verify gpg signatures _in_ the repository before this point. remember that a repo doesn't have a single 'now', each branch has its own head, and you can easily go back to prior points and branch off from there. Since timestamps in repos can't be trusted (different people's clocks may not be in sync), how would you define this cutoff point? David Lang
Re: Git and SHA-1 security (again)
On Tue, Jul 19, 2016 at 7:06 PM, Junio C Hamano wrote: > Duy Nguyen writes: > >> Post-shower thoughts. In a tree object, a submodule entry consists of >> perm (S_IFGITLINK), hash (which is the external hash) and path. We >> could fill the "hash" part with all zero (invalid, signature of new >> submodule hash format), then append "/:" to >> the "path" part. This way we don't have to update tree object or index >> format. And I suspect the "path" part is available everywhere we need >> to handle submodules already, so extracting the external hash should >> be possible... > > Even though that single operation might be possible, do not go > there. A "pathname" identifies a "path", not its contents, and > "appending crap after path" breaks the data model badly. Also other > things like merge, checkout and diff would break by butchering the > ordering of the entries in tree objects. I thought about that, but I thought all those operations required special treatment for submodules anyway. But I forgot about d/f conflicts, so yeah, it's a bad idea. We still have some invalid "mode" combinations that could be used as S_IFGITLINK2; then we can have a variable-length hash field in the entry. -- Duy
Re: Git and SHA-1 security (again)
Duy Nguyen writes: > Post-shower thoughts. In a tree object, a submodule entry consists of > perm (S_IFGITLINK), hash (which is the external hash) and path. We > could fill the "hash" part with all zero (invalid, signature of new > submodule hash format), then append "/:" to > the "path" part. This way we don't have to update tree object or index > format. And I suspect the "path" part is available everywhere we need > to handle submodules already, so extracting the external hash should > be possible... Even though that single operation might be possible, do not go there. A "pathname" identifies a "path", not its contents, and "appending crap after path" breaks the data model badly. Also other things like merge, checkout and diff would break by butchering the ordering of the entries in tree objects.
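To make Junio's objection concrete, here is a toy sketch (not Git's code) of the tree-entry layout Duy describes: the mode, a space, the name, a NUL, then the raw binary object ID, with 160000 as the gitlink mode. The `/sha256:1234abcd` suffix is a hypothetical stand-in for the encoding the proposal leaves unspecified. Appending it to the name changes where the entry sorts, and it also makes the submodule look like a file inside a directory called `sub`.

```python
GITLINK_MODE = b"160000"  # mode Git records for a submodule (gitlink) entry


def tree_entry(mode: bytes, name: bytes, oid: bytes) -> bytes:
    # One tree entry: mode, a space, the name, a NUL, then the raw binary ID.
    return mode + b" " + name + b"\x00" + oid


def entry_order(names):
    # Simplification: Git compares directory names as if they ended in "/".
    # None of the entries below is a directory, so plain byte order is enough.
    return sorted(names)


null_oid = bytes(20)  # the all-zero "new format" marker from the proposal

# Hypothetical suffix; the thread does not fix the exact syntax.
original = [b"sub", b"sub.c"]
butchered = [b"sub/sha256:1234abcd", b"sub.c"]

print(entry_order(original))   # [b'sub', b'sub.c']
print(entry_order(butchered))  # [b'sub.c', b'sub/sha256:1234abcd']

# The encoded name now reads like a path inside a directory called "sub".
entry = tree_entry(GITLINK_MODE, butchered[0], null_oid)
print(entry.split(b"\x00")[0])  # b'160000 sub/sha256:1234abcd'
```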
Re: Git and SHA-1 security (again)
On Mon, Jul 18, 2016 at 6:51 PM, Duy Nguyen wrote: > On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson > wrote: >> I'm going to end up having to do something similar because of the issue >> of submodules. Submodules may still be SHA-1, while the main repo may >> be a newer hash. > > Or even the other way around, main repo is one with sha1 while > submodule is on sha256. I wonder if we should address this separately > (and even in parallel with sha256 support), making submodules work > with any external VCS system (that supports some basic operations > we define). Post-shower thoughts. In a tree object, a submodule entry consists of perm (S_IFGITLINK), hash (which is the external hash) and path. We could fill the "hash" part with all zero (invalid, signature of new submodule hash format), then append "/:" to the "path" part. This way we don't have to update tree object or index format. And I suspect the "path" part is available everywhere we need to handle submodules already, so extracting the external hash should be possible... -- Duy
Re: Git and SHA-1 security (again)
On Tue, Jul 19, 2016 at 9:18 AM, Johannes Schindelin wrote: >> But we can recreate SHA-1 from the same content and verify GPG, right? >> I know it's super expensive, but it feels safer to not carry SHA-1 >> around when it's not secure anymore (I recall something about >> exploiting the weakest link when you have both sha1 and sha256 in the >> object content). Rehashing would be done locally and is better >> controlled. > > You could. But how would you determine whether to recreate the commit > object from a SHA-1-ified version of the commit buffer? Fall back if the > original did not match the signature? Any repo would have a cut point when they move to sha256 (or whatever new hash), if we can record this somewhere (e.g. as a tag or a bunch of tags, or some dummy commits to mark the heads of the repo) then we only verify gpg signatures _in_ the repository before this point. > That would pose at least these two problems: > > 1. The point of a signature is trust. If all of a sudden the signature > does not match what is supposedly signed, that trust is broken. > > 2. The point of going to a stronger hash is to increase the trust. If > any developer could decide to sign the SHA-1-ified version of any future > commit, and Git validating it, it would be even worse than not switching > to a new hash: it would leave us open to collision attacks *and* pretend > that we prevented such attacks. GPG signatures are still valid on the old repo (we will keep old repos around forever, I suppose). And because they sign the "weak" hash, sha1, at some point they will be broken (but until then we can still regenerate sha1 and verify locally). When sha1 is broken, GPG signatures of the past can't be trusted anymore. If people care enough about the past, they should re-sign (at least for tags). Commits can be re-signed by the person who does the conversion. Yes, you have to trust that person. Sort of a painful fresh start, with hopefully better security.
-- Duy
Re: Git and SHA-1 security (again)
On Tue, 19 Jul 2016, Johannes Schindelin wrote: Hi Duy, On Mon, 18 Jul 2016, Duy Nguyen wrote: On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson wrote: I'm going to end up having to do something similar because of the issue of submodules. Submodules may still be SHA-1, while the main repo may be a newer hash. Or even the other way around, main repo is one with sha1 while submodule is on sha256. I wonder if we should address this separately (and even in parallel with sha256 support), making submodules work with any external VCS system (that supports some basic operations we define). It is safe to assume that any project using a submodule with a more secure hash would require Git tooling capable of said hash. It would hence make no sense to use SHA-1 for the super project. So I do not believe that we have to support the use case of a SHA-1-based project using SHA-256-based submodules. They have different upstreams. What if the upstream of the submodule has upgraded and is using signed sha-256 commits, but the upstream of the parent hasn't and is still using signed sha1 commits? David Lang
Re: Git and SHA-1 security (again)
Hi Duy, On Mon, 18 Jul 2016, Duy Nguyen wrote: > On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson > wrote: > > I'm going to end up having to do something similar because of the issue > > of submodules. Submodules may still be SHA-1, while the main repo may > > be a newer hash. > > Or even the other way around, main repo is one with sha1 while > submodule is on sha256. I wonder if we should address this separately > (and even in parallel with sha256 support), making submodules work > with any external VCS system (that supports some basic operations > we define). It is safe to assume that any project using a submodule with a more secure hash would require Git tooling capable of said hash. It would hence make no sense to use SHA-1 for the super project. So I do not believe that we have to support the use case of a SHA-1-based project using SHA-256-based submodules. Ciao, Dscho
Re: Git and SHA-1 security (again)
Hi Zsolt, On Mon, 18 Jul 2016, Herczeg Zsolt wrote: > >> My point is not to throw out old hashes and break signatures. My point > >> is to convert the data storage, and use mapping to resolve problems > >> with those old hashes and signatures. > > > > If you convert the data storage, then the SHA-1s listed in the commit > > objects will have to be rewritten, and then the GPG signature will not > > match anymore. > > > > Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e` > > to see the anatomy of a gpg-signed commit object. > > > > Yes and no. That's the reason you need the two-way lookup table. If > you need to verify a commit which was signed as SHA-1, you must use > the lookup table in reverse. That pretends that it is both easy and trustworthy to know when (and how) to recreate the SHA-1-ified version of the commit object. Neither is the case, though. Ciao, Johannes
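The sketch below illustrates why converting storage breaks signatures and what Herczeg's reverse lookup would have to do. It assumes the usual layout of a signed commit: a `gpgsig` header whose continuation lines start with a space and which is excluded from the signed bytes. The object names are placeholders derived from arbitrary strings rather than real Git object IDs, and `split_signature` and `rewrite_names` are hypothetical helpers, not Git functions.

```python
import hashlib


def split_signature(commit: bytes):
    """Split a signed commit buffer into (signed payload, signature text)."""
    payload, signature = [], []
    in_headers, in_sig = True, False
    for line in commit.splitlines(keepends=True):
        if in_headers and line.startswith(b"gpgsig "):
            in_sig = True
            signature.append(line[len(b"gpgsig "):])
        elif in_sig and line.startswith(b" "):
            signature.append(line[1:])  # continuation line of the header
        else:
            in_sig = False
            if line == b"\n":
                in_headers = False  # a blank line ends the header block
            payload.append(line)
    return b"".join(payload), b"".join(signature)


def rewrite_names(payload: bytes, mapping: dict) -> bytes:
    """Replace object names in the tree/parent headers using `mapping`."""
    out, in_headers = [], True
    for line in payload.splitlines(keepends=True):
        if line == b"\n":
            in_headers = False
        key, _, rest = line.partition(b" ")
        if in_headers and key in (b"tree", b"parent"):
            line = key + b" " + mapping[rest.strip()] + b"\n"
        out.append(line)
    return b"".join(out)


# Placeholder names: hashes of an arbitrary string, not real Git objects.
old_tree = hashlib.sha1(b"example tree").hexdigest().encode()
new_tree = hashlib.sha256(b"example tree").hexdigest().encode()

commit = (
    b"tree " + old_tree + b"\n"
    b"author A U Thor <author@example.com> 1468800000 +0000\n"
    b"committer A U Thor <author@example.com> 1468800000 +0000\n"
    b"gpgsig -----BEGIN PGP SIGNATURE-----\n"
    b" (signature body elided)\n"
    b" -----END PGP SIGNATURE-----\n"
    b"\n"
    b"Example signed commit\n"
)

payload, signature = split_signature(commit)
converted = rewrite_names(payload, {old_tree: new_tree})
restored = rewrite_names(converted, {new_tree: old_tree})

print(converted == payload)  # False: the signed bytes changed
print(restored == payload)   # True: the reverse lookup recovers them
```

The round trip only works because this toy commit is well-formed and the mapping is complete. Johannes's objection is about deciding *when* such a reconstruction should be trusted, which this sketch does not address.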
Re: Git and SHA-1 security (again)
Hi Duy, On Mon, 18 Jul 2016, Duy Nguyen wrote: > On Mon, Jul 18, 2016 at 5:57 PM, Johannes Schindelin > wrote: > > > > On Mon, 18 Jul 2016, Herczeg Zsolt wrote: > > > >> >> I think converting is a much better option. Use a single-hash > >> >> storage, and convert everything to that on import/clone/pull. > >> > > >> > That ignores two very important issues that I already had mentioned: > >> > >> That's not true. If you double-check the next part of my message, you'll see I > >> just showed that an automatic two-way mapping could solve these > >> problems! (I even gave brief explanations of how to handle referencing and > >> signature verification in those cases.) > >> > >> My point is not to throw out old hashes and break signatures. My point > >> is to convert the data storage, and use mapping to resolve problems > >> with those old hashes and signatures. > > > > If you convert the data storage, then the SHA-1s listed in the commit > > objects will have to be rewritten, and then the GPG signature will not > > match anymore. > > But we can recreate SHA-1 from the same content and verify GPG, right? > I know it's super expensive, but it feels safer to not carry SHA-1 > around when it's not secure anymore (I recall something about > exploiting the weakest link when you have both sha1 and sha256 in the > object content). Rehashing would be done locally and is better > controlled. You could. But how would you determine whether to recreate the commit object from a SHA-1-ified version of the commit buffer? Fall back if the original did not match the signature? That would pose at least these two problems: 1. The point of a signature is trust. If all of a sudden the signature does not match what is supposedly signed, that trust is broken. 2. The point of going to a stronger hash is to increase the trust.
If any developer could decide to sign the SHA-1-ified version of any future commit, and Git validating it, it would be even worse than not switching to a new hash: it would leave us open to collision attacks *and* pretend that we prevented such attacks. The more I think about it, the more I am convinced that we have no choice but allow mixed hashes (i.e. both 160-bit SHA-1 and 256-bit new hash, whatever we settle on). Otherwise there would be no reliable and trustworthy upgrade path. But maybe there is a clever strategy I failed to think of? Ciao, Dscho
Re: Git and SHA-1 security (again)
On Mon, Jul 18, 2016 at 11:00:35AM -0700, Junio C Hamano wrote: > Continuing this thought process, I do not see a good way to allow us > to wean ourselves off of the old hash, unless we _break_ the pack > stream format so that each object in the pack carries not just the > data but also the hash algorithm to be used to _name_ it, so that > new objects will never be referred to using the old hash. I think for this reason, I'm going to propose the following approach when we get there:

* We serialize the hash in the object formats, using multihash or something similar. This means that it is minimally painful if we ever need to change in the future[0].
* Each repository carries exactly one hash algorithm, except for submodule data. If we don't do this, then some people will never switch because the submodules they depend on haven't.
* If people on the new format need to refer to submodule commits using SHA-1, then they have to use a prefix on the hash form; otherwise, they can use the raw hash value (without any multihash prefix).
* git fsck verifies one consistent algorithm (excepting submodule references).

This preserves the security benefits, avoids future-proofing problems, and minimizes performance impacts due to naming like you mentioned.

[0] We are practically limited to 256-bit hashes because anything longer will wrap on an 80-column terminal when in hex form.

-- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204
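brian's first bullet mentions multihash. A minimal sketch of that layout, a varint function code, a varint digest length, then the digest, follows. The three function codes come from the multiformats multihash table. `encode` and `decode` are illustrative names, not any Git API.

```python
import hashlib

# Function codes from the multiformats multihash table.
CODES = {"sha1": 0x11, "sha2-256": 0x12, "sha3-256": 0x16}
NAMES = {code: name for name, code in CODES.items()}


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, the integer encoding multihash uses."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Decode a varint starting at `pos`; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def encode(name: str, digest: bytes) -> bytes:
    return varint(CODES[name]) + varint(len(digest)) + digest


def decode(buf: bytes):
    code, pos = read_varint(buf, 0)
    length, pos = read_varint(buf, pos)
    digest = buf[pos:pos + length]
    if len(digest) != length:
        raise ValueError("truncated multihash")
    return NAMES[code], digest


data = b"blob 6\x00hello\n"
old = encode("sha1", hashlib.sha1(data).digest())
new = encode("sha3-256", hashlib.sha3_256(data).digest())
print(old[:2].hex(), len(old))  # 1114 22
print(new[:2].hex(), len(new))  # 1620 34
print(decode(old)[0])           # sha1
```

The two-byte prefix is what would let a SHA-256 repository refer to a SHA-1 submodule commit unambiguously, at the cost of two bytes per serialized name.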
Re: Git and SHA-1 security (again)
On Mon, Jul 18, 2016 at 09:00:06AM +0200, Johannes Schindelin wrote: > Hi Brian, > > On Sun, 17 Jul 2016, brian m. carlson wrote: > > > On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote: > > > Out of curiosity: have you considered something like padding the SHA-1s > > > with, say 0xa1, to the size of the new hash and using that padding to > > > distinguish between old vs new hash? > > > > I'm going to end up having to do something similar because of the issue > > of submodules. Submodules may still be SHA-1, while the main repo may > > be a newer hash. I was going to zero-pad, however. > > I thought about zero-padding, but there are plenty of > is_null_sha1()/is_null_oid() calls around. Of course, I assumed > left-padding. But you may have thought of right-padding instead? That > would make short name handling much easier, too. I was going to right-pad. > FWIW it never crossed my mind to allow different same-sized hash > algorithms. So I never thought we'd need a way to distinguish, say, > BLAKE2b-256 from SHA-256. > > Is there a good reason to add the maintenance burden of several 256-bit > hash algorithms, apart from speed (which in my mind should decide which > one to use, always, rather than letting the user choose)? It would also > complicate transport even further, let alone subtree merges from > differently-hashed repositories. There are really three candidates:

* SHA-256 (the SHA-2 algorithm): While this looks good right now, cryptanalysis is advancing. This is not a good choice for a long-term solution.
* SHA3-256 (the SHA-3 algorithm): This is the conservative choice. It's also faster than SHA-256 on 64-bit systems. It has a very conservative security margin and is a good long-term choice.
* BLAKE2b-256: This is the blisteringly fast choice. It outperforms SHA-1 and even MD5 on 64-bit systems. This algorithm was designed so that nobody would have a reason to use an insecure algorithm. It will probably be secure for some time, but maybe not as long as SHA3-256.

I'm only considering 256-bit hashes, because anything longer won't fit on an 80-column terminal in hex form. The reason I had considered implementing both SHA3-256 and BLAKE2b-256 is that I want there to be no reason not to upgrade. People who need a FIPS-approved algorithm or want a long-term, conservative choice should use SHA3-256. People who want even better performance than current Git would use BLAKE2b-256.

Performance comparison (my implementations):

SHA-1: 437 MiB/s
SHA-256: 196 MiB/s
SHA3-256: 273 MiB/s
BLAKE2b: 649 MiB/s

I hadn't thought about subtree merges, though.

-- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204
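A sketch of the right-padding idea from this exchange, assuming a 32-byte object ID and zero padding as brian suggests. It shows why right-padding keeps abbreviated names working, and how trailing zeros could act as a probabilistic marker for an old SHA-1 name. `pad_sha1` and `looks_like_padded_sha1` are hypothetical helpers.

```python
import hashlib

NEW_LEN = 32  # bytes in a 256-bit object ID


def pad_sha1(oid: bytes) -> bytes:
    """Right-pad a 20-byte SHA-1 with zero bytes to NEW_LEN bytes."""
    if len(oid) != 20:
        raise ValueError("expected a 20-byte SHA-1")
    return oid + bytes(NEW_LEN - len(oid))


def looks_like_padded_sha1(oid: bytes) -> bool:
    # Heuristic: a genuine 256-bit digest ends in 12 zero bytes with
    # probability 2**-96, so trailing zeros work as a marker in practice.
    # The all-zero null ID is excluded so it is not mistaken for a SHA-1.
    return len(oid) == NEW_LEN and oid[20:] == bytes(12) and oid[:20] != bytes(20)


sha1 = hashlib.sha1(b"blob 6\x00hello\n").digest()  # Git's ID for "hello\n"
padded = pad_sha1(sha1)

print(padded.hex()[:7] == sha1.hex()[:7])  # True: short names still match
print(len(padded.hex()))                   # 64 hex digits, fits in 80 columns
print(looks_like_padded_sha1(padded))      # True
```

With left-padding, the leading hex digits of every old name would become zeros, which is why Johannes notes that right-padding makes short-name handling easier.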
Re: Git and SHA-1 security (again)
>> The reality of the current situation is that it's largely mitigated in >> practice because: >> >> a) it's hard to hand someone a crafted blob to begin with for reasons >> that have nothing to do with SHA-1 (they'll go "wtf is this garbage?") >> >> b) even in that case it's *very* hard to come up with two colliding >> blobs that are *useful* for some nefarious purpose, e.g. a program A >> that looks normal being replaced by an evil program B with the same >> SHA-1. > > Thanks. That's a nice rephrasing of > > > http://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901%40ppc970.osdl.org/ > > where Linus explains SHA-1 is not the security, and the real > security is in distribution. If the real security is in the distribution, then why does git support signed commits and objects? The security of the signatures does depend on the hash. Saying the hash is not a security feature and offering GPG signing based on that hash is a damn big lie. You can change the hash algorithm to a secure one, or change the signing method to be independent of the hash algorithm, or you can stop offering signatures at all, but something has to be done here.
Re: Git and SHA-1 security (again)
Junio C Hamano wrote: > Continuing this thought process, I do not see a good way to allow us > to wean ourselves off of the old hash, unless we _break_ the pack > stream format so that each object in the pack carries not just the > data but also the hash algorithm to be used to _name_ it, so that > new objects will never be referred to using the old hash. Taking a step further: I don't think that any backward-compatible format change would address the security concerns with sufficiently old hashing algorithms. As long as my favorite repository is allowed to contain objects identified by SHA-1, my adversary can exploit a SHA-1 collision using signed tags referring (possibly indirectly) to backdated objects. The Git object format does not include a proof of commit date, so I cannot guarantee "Only old objects are named by SHA-1". There is a way to get a backward-compatible *user experience* without the format change being backward-compatible, though. Name all objects in the repository using FuturisticHash. Also store enough information to recover the old hashes, either in objects as a new field or in a side table. If the old hash is broken, signatures using the old hash cannot be trusted. An adversary could generate a collision to retroactively change the meaning of an existing signature. To maintain the meaning of old signatures, someone has to record the new names of all involved objects, either before the state of the art in breaking the old hash advances far enough or using a copy of the repository from before the state of the art had advanced --- in effect you need new signatures to maintain the meaning of old signatures. This could happen as part of the process of updating a repository to use a new hash. E.g. 
object a787a87b98a7s98798a798b7a98b798a7b98a7b987a9b87a9b87a98b79a87b98a7b98a7b987a987987a878a78a sha1tag object 04b871796dc0420f8e7561a895b52484b701d51a type commit tag signedtag tagger C O Mitter 1465981006 + signed tag signed tag message body -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJXYRhOAAoJEGEJLoW3InGJklkIAIcnhL7RwEb/+QeX9enkXhxn rxfdqrvWd1K80sl2TOt8Bg/NYwrUBw/RWJ+sg/hhHp4WtvE1HDGHlkEz3y11Lkuh 8tSxS3qKTxXUGozyPGuE90sJfExhZlW4knIQ1wt/yWqM+33E9pN4hzPqLwyrdods q8FWEqPPUbSJXoMbRPw04S5jrLtZSsUWbRYjmJCHzlhSfFWW4eFd37uquIaLUBS0 rkC3Jrx7420jkIpgFcTI2s60uhSQLzgcCwdA2ukSYIRnjg/zDkj8+3h/GaROJ72x lZyI6HWixKJkWw8lE9aAOD9TmTW9sFJwcVAzmAuFX2kUreDUKMZduGcoRYGpD7E= =jpXa -----END PGP SIGNATURE----- -----BEGIN PGP SIGNATURE----- ... -----END PGP SIGNATURE-----

This example uses a signature to attest that mapping 04b871796dc0420f8e7561a895b52484b701d51a->a787a87b98a7s98798a798b7a98b798a7b98a7b987a9b87a9b87a98b79a87b98a7b98a7b987a987987a878a78a is correct. A more straightforward approach would be for the conversion process to produce an out-of-band signed mapping list to make the sha1tag usable without such a signature.

Summary:

* Git's properties depend on using a single hash function throughout a repository. I don't think we should change that.
* A safe and mostly painless migration to a stronger hash function is possible using a signed assertion (for example generated by the conversion process) of the mapping from old object names to new object names.
* Dealing with multiple such signed mappings (for example due to separate conversion of repositories based on linux.git) is left as an exercise to the reader.

Hope that helps, Jonathan
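A toy sketch of the conversion Jonathan outlines: rename every object with the new hash, rewrite embedded references as you go, and emit the old-to-new mapping list that the conversion process could sign out of band. It assumes references appear as hex text (true of commit and tag headers, not of the binary IDs in tree entries) and that objects are processed dependencies first. `convert` and the two commit-like objects are hypothetical.

```python
import hashlib


def object_id(algo: str, obj_type: bytes, body: bytes) -> str:
    # Git-style naming: hash b"<type> <size>\x00" followed by the body.
    header = obj_type + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.new(algo, header + body).hexdigest()


def convert(objects, order):
    """Rename objects with SHA-256 and return (old->new mapping, new objects).

    `objects` maps SHA-1 name -> (type, body); `order` lists SHA-1 names with
    every referenced object before the objects that refer to it.
    Toy model: references are hex text inside the body.
    """
    mapping, converted = {}, {}
    for old in order:
        obj_type, body = objects[old]
        for ref_old, ref_new in mapping.items():
            body = body.replace(ref_old.encode(), ref_new.encode())
        new = object_id("sha256", obj_type, body)
        mapping[old] = new
        converted[new] = (obj_type, body)
    return mapping, converted


# Two hypothetical commit-like objects; the second names the first as parent.
first = (b"commit", b"first\n")
first_id = object_id("sha1", *first)
second = (b"commit", b"parent " + first_id.encode() + b"\n\nsecond\n")
second_id = object_id("sha1", *second)

mapping, converted = convert({first_id: first, second_id: second},
                             [first_id, second_id])

# A mapping list like this is what the conversion could sign out of band.
mapping_list = "".join(f"{old} {new}\n" for old, new in sorted(mapping.items()))
print(mapping_list)
```

Because the second object's body is rewritten before it is hashed, its new name commits to the new name of its parent, so signing the mapping list once vouches for the whole converted history.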
Re: Git and SHA-1 security (again)
Ævar Arnfjörð Bjarmason writes: > The reality of the current situation is that it's largely mitigated in > practice because: > > a) it's hard to hand someone a crafted blob to begin with for reasons > that have nothing to do with SHA-1 (they'll go "wtf is this garbage?") > > b) even in that case it's *very* hard to come up with two colliding > blobs that are *useful* for some nefarious purpose, e.g. a program A > that looks normal being replaced by an evil program B with the same > SHA-1. Thanks. That's a nice rephrasing of http://public-inbox.org/git/Pine.LNX.4.58.0504291221250.18901%40ppc970.osdl.org/ where Linus explains SHA-1 is not the security, and the real security is in distribution.
Re: Git and SHA-1 security (again)
On Mon, 18 Jul 2016, Herczeg Zsolt wrote: In particular, as far as I know and as Theodore Ts'o's post describes better than I could[1], you seem to be confusing preimage attacks with collision attacks, and then concluding that because SHA1 is vulnerable to collision attacks that use-cases that would need a preimage attack to be compromised (which as far as I can tell, includes all your examples) are also "broken". I understand the differences between the collision and preimage attacks. A collision attack is not that bad for git in a typical use-case. But I think that it's important to note that there are many use-cases which do need a hash safe from collision attack. Some examples: You maintain a repository with gittorrent with signed commits. Others can use these signatures to verify it's original. Let's say you include some safe file (potentially binary) from a third-party contributor. That would be fine if the hash algo is safe. Currently there is the possibility that you received a (safe) file which was made to collide with another malicious one. Once you committed (and signed) that file, the attacker joins the gittorrent network and starts to distribute the malicious file. Suddenly most of your clients pulling are infected, even though your signature is correct. Or, you would like to make a continuous delivery system, where you use signed tags. The delivery happens only when the signature is right, and the signer is responsible for it. Your colleague makes a collision, pushes the good file. You make all the tests, everything is fine, sign and push and wait for the delivery to happen. Your colleague changes the file on the server. The delivery makes a huge mess, and you're fired. Or, let's say you use a service like github, which is nice enough to make a repository for you, with .gitignore, licenses and everything. Likely, you'll never change those files. Let's say that service made one of those initial files collide with something bad.
That means they can "infect" anyone who is pulling your repo. Do you need more hypothetical stories? There are a lot. Of course they need a lot of work, and they're unlikely to happen. But it's possible. If you need trust and GPG signatures, that means you need ultimate trust. What's the point in making GPG signatures anyway if you cannot ultimately trust them? You could just as well say: well, that repository is only reachable by trustworthy persons, everything here is just fine and really made by the person named in the "author field". All of your examples are actually preimage attacks. If the bad guy can tamper with both the 'safe' and 'malicious' versions of the file, they don't actually need the malicious version; they can attack you through the one you think is 'safe'. The 'collision' attack isn't that there is some increased chance of a random file colliding with your safe file, it's that if you are manipulating the contents of both files, you can create two that collide. This won't hurt a Git repository unless one of these manipulated files is able to be introduced as a legitimate part of the repo you are dealing with. David Lang
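David's distinction can be demonstrated on a deliberately weakened hash. The sketch truncates SHA-1 to 24 bits, an assumption made purely so it runs quickly. A birthday search, in which the attacker chooses both inputs, finds a collision after roughly 2^12 tries. Matching one fixed, honest file (a second preimage) would need on the order of 2^24 tries. For full-width SHA-1 the generic bounds are about 2^80 and 2^160, and published cryptanalysis lowers the collision cost below that generic figure.

```python
import hashlib
from itertools import count

BITS = 24  # toy width chosen for speed; real SHA-1 has 160 bits


def tiny_hash(data: bytes) -> int:
    """SHA-1 truncated to its top BITS bits: a deliberately weak stand-in."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - BITS)


def find_collision():
    """Birthday search: the attacker controls *both* colliding inputs."""
    seen = {}
    for tries in count(1):
        msg = b"crafted-%d" % tries
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, tries
        seen[h] = msg


a, b, tries = find_collision()
print(a, b, tries)  # typically a few thousand tries, i.e. about 2**(BITS/2)

# Matching one *fixed* honest file instead is a (second-)preimage search:
# about 2**BITS tries on average here, and about 2**160 for full SHA-1.
```

Both colliding inputs come out of the attacker's own generator, which is David's point: the attack only matters if one of those crafted files gets accepted into the repository.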
Re: Git and SHA-1 security (again)
On Mon, Jul 18, 2016 at 7:48 PM, Herczeg Zsolt wrote: >> In particular, as far as I know and as Theodore Ts'o's post describes >> better than I could[1], you seem to be confusing preimage attacks with >> collision attacks, and then concluding that because SHA1 is vulnerable >> to collision attacks that use-cases that would need a preimage attack >> to be compromised (which as far as I can tell, includes all your >> examples) are also "broken". > > I understand the differences between the collision and preimage > attacks. Fair enough. The rest of your E-Mail certainly shows that you do, and I didn't know anything about GitTorrent and this case where it's vulnerable to collision attacks. But I didn't get that impression from your initial E-Mail, which outright said: Git signed tags and signed commits are cryptographically insecure, they're useless at the moment. It's important that those of us who *do* understand the difference between collision and preimage attacks phrase things carefully, lest they turn into FUD. Your initial E-Mail does *not* make it sound like you're just talking about the cases where someone's provided you with a crafted blob that you've been tricked into signing, but rather makes it sound like signed tags & commits are just categorically broken, even for preimage attacks, which is not the case. The reality of the current situation is that it's largely mitigated in practice because: a) it's hard to hand someone a crafted blob to begin with for reasons that have nothing to do with SHA-1 (they'll go "wtf is this garbage?") b) even in that case it's *very* hard to come up with two colliding blobs that are *useful* for some nefarious purpose, e.g. a program A that looks normal being replaced by an evil program B with the same SHA-1.
Re: Git and SHA-1 security (again)
"brian m. carlson"writes: > I will say that the pack format will likely require some changes, > because it assumes ... > The reason is that we can't have an unambiguous parse of the current > objects if two hash algorithms are in use > So when we look at a new hash, we need to provide an unambiguous way to > know what hash is in use. The two choices are to either require all > object use the new hash, or to extend the objects to include the hash. > Until a couple days ago, I had planned to do the former. I had not even > considered using a multihash approach due to the complexity. Objects in Git identify themselves, but once you introduce the second hash function (as opposed to replacing the hash function to a new one), you would allow people to call the same object by two names. That has interesting implications. Let's say you have a blob at path F in a top-level tree object and create a commit. You have three objects in total, the tree knows the blob as one name based on SHA-1 and the commit knows the tree as one name based on SHA-1. The same contents of the blob and the tree could have different names based on SHA-256 in the future Git. Let's further say you have a future Git and clone from the above repository with three objects. You get a pack stream, containing the data for one commit, tree and blob each. These objects do not carry their own name as extra pieces of information. You only get their contents, and it is up to you to name them by hashing. .idx files are created by running index-pack while receiving the pack data stream. You _somehow_ need to know that these three objects need to be hashed with SHA-1, even though you are SHA-256 capable, because otherwise the object name recorded in the tree object for the blob would not match what your .idx file would call the blob data. Also the object name recorded in the ref to point at the commit would not match the commit object's object name, unless you hash with SHA-1. 
One possibility is to always hash these objects twice and record _both_ hashes in the updated .idx file; after all, .idx files are a strictly local matter. Now let's further say that you update the file F in the working tree, and do "git commit -a" with an updated version of Git. What should happen? Assuming that we are trying to migrate to a different hashing algorithm over time, we would want to create a new blob under an object name based on SHA-256, add that to the index and write a new tree out, named by hashing with SHA-256. We then record that longer-named tree in a commit whose parent commit is still named with a SHA-1 based hash, and the new commit in turn is named by hashing with SHA-256. Then you push the result back. Let's assume by now the place you cloned from is also SHA-256 capable. You look at the tips of refs at your clone-source and discover that you would need to only send the new commit, its tree and the updated blob. You send the data for these three objects. The receiving end would now need to do the same "magically choose hash to make sure the new blob gets the name that is recorded in the new tree (and the new tree the new commit)" thing. The same discussion applies if somebody else clones from you at this point. The objects introduced by the second commit all need to be hashed with the new hash to be named, while the other objects need to be hashed with the old hash. Continuing this thought process, I do not see a good way to allow us to wean ourselves off of the old hash, unless we _break_ the pack stream format so that each object in the pack carries not just the data but also the hash algorithm to be used to _name_ it, so that new objects will never be referred to using the old hash.
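The point above is that a receiver names every object itself by hashing what it receives. The minimal Python sketch below assumes Git's object framing (`<type> <size>`, a NUL byte, then the content) and uses only `hashlib`. The helper name `object_name` is ours, not Git's. It shows that identical bytes get unrelated names under SHA-1 and SHA-256, which is why index-pack would have to know which function to apply to each object.

```python
import hashlib

def object_name(obj_type: str, content: bytes, algo: str = "sha1") -> str:
    """Name an object Git-style: hash the '<type> <size>' header, a NUL, then the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).hexdigest()

if __name__ == "__main__":
    # The same (empty) blob, named by two different hash functions.
    print(object_name("blob", b"", "sha1"))    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(object_name("blob", b"", "sha256"))  # 473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813
```

Nothing in the received bytes says which function was used, so the receiver must learn it out of band.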
It matters performance-wise that the weaning process go as quickly as possible once the system becomes capable of the new hash algorithm. During the transition period we would have to suffer the full tree-diff becoming inefficient (note: don't limit your thinking to just "git diff" and "git log"; the same inefficiency hits "git checkout" switching branches and "git merge" walking three trees in parallel), because we can no longer rely on equal tree object names to skip descending into subdirectories. Equal names guarantee that everything under the hierarchy is equal, but the same hierarchy named with two different hashes has two different names.
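To make the tree-diff cost concrete, here is a toy Python sketch. The nested-dict trees, the `object_name` serialization and the `opened` counter are illustrative assumptions, not Git's tree format. Under one hash, an unchanged subdirectory is skipped because its tree names match. When the two sides are named with different hashes, equal content no longer yields equal names, so every subtree has to be opened.

```python
import hashlib

def object_name(tree, algo):
    """Name a toy tree by hashing '<entry>:<child name>' lines (not Git's real format)."""
    h = hashlib.new(algo)
    for entry in sorted(tree):
        child = tree[entry]
        if isinstance(child, dict):
            child_name = object_name(child, algo)
        else:
            child_name = hashlib.new(algo, child).hexdigest()
        h.update(f"{entry}:{child_name}\n".encode())
    return h.hexdigest()

def diff_trees(a, b, algo_a, algo_b, stats):
    """List changed paths; stats["opened"] counts the trees we had to descend into."""
    if object_name(a, algo_a) == object_name(b, algo_b):
        return []  # equal tree names guarantee equal hierarchies: skip entirely
    stats["opened"] += 1
    changed = []
    for entry in sorted(set(a) | set(b)):
        x, y = a.get(entry), b.get(entry)
        if isinstance(x, dict) and isinstance(y, dict):
            changed += [f"{entry}/{p}" for p in diff_trees(x, y, algo_a, algo_b, stats)]
        elif x != y:  # toy shortcut: compare blob bytes directly
            changed.append(entry)
    return changed

old = {"README": b"hello\n",
       "src": {"main.c": b"int main(void) { return 0; }\n",
               "lib": {"util.c": b"/* util */\n"}}}
new = dict(old, README=b"hello, world\n")

same = {"opened": 0}
print(diff_trees(old, new, "sha1", "sha1", same), same)      # ['README'] {'opened': 1}
mixed = {"opened": 0}
print(diff_trees(old, new, "sha1", "sha256", mixed), mixed)  # ['README'] {'opened': 3}
```

The answer is the same in both runs. Only the amount of work differs, and the gap grows with the depth and width of the unchanged part of the tree.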
Re: Git and SHA-1 security (again)
> In particular, as far as I know and as Theodore Ts'o's post describes > better than I could[1], you seem to be confusing preimage attacks with > collision attacks, and then concluding that because SHA1 is vulnerable > to collision attacks that use-cases that would need a preimage attack > to be compromised (which as far as I can tell, includes all your > examples) are also "broken". I understand the differences between collision and preimage attacks. A collision attack is not that bad for git in a typical use-case. But I think it's important to note that there are many use-cases which do need a hash that is safe from collision attacks. Some examples: You maintain a repository with GitTorrent with signed commits. Others can use these signatures to verify that it is the original. Let's say you include some safe file (potentially binary) from a third-party contributor. That would be fine if the hash algorithm were safe. Currently there is the possibility that you received a (safe) file which was made to collide with another, malicious one. Once you have committed (and signed) that file, the attacker joins the GitTorrent network and starts to distribute the malicious file. Suddenly most of your clients pulling are infected, yet your signature is correct. Or, you would like to build a continuous delivery system where you use signed tags. The delivery happens only when the signature is right, and the signer is responsible for it. Your colleague makes a collision and pushes the good file. You run all the tests, everything is fine, you sign and push and wait for the delivery to happen. Your colleague then swaps the file on the server. The delivery makes a huge mess, and you're fired. Or, let's say you use a service like GitHub, which is nice enough to create a repository for you, with .gitignore, licenses and everything. Likely, you'll never change those files. Let's say that service made one of those initial files to collide with something bad. That means they can "infect" anyone who is pulling your repo.
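The mechanics of the GitTorrent scenario can be sketched in Python with a deliberately weakened hash. This is an illustration under stated assumptions, not an attack on SHA-1. SHA-1 truncated to 16 bits stands in for a hash with affordable collisions, and an HMAC over the digest stands in for a GPG signature over a commit that names the blob. Because the signature only covers the name, it carries over to any other content with the same name.

```python
import hashlib
import hmac

DIGEST_BYTES = 2             # 16-bit toy hash so the search finishes instantly
KEY = b"maintainer-secret"   # assumption: HMAC stands in for the maintainer's GPG key

def weak_hash(data: bytes) -> bytes:
    """Truncated SHA-1: a stand-in for a hash whose collisions are affordable."""
    return hashlib.sha1(data).digest()[:DIGEST_BYTES]

def find_collision(prefix_good: bytes, prefix_evil: bytes):
    """Birthday search: grow two families of files until a good and an evil digest match."""
    seen = {}
    for i in range(1 << 20):
        good = prefix_good + str(i).encode()
        seen.setdefault(weak_hash(good), good)
        evil = prefix_evil + str(i).encode()
        match = seen.get(weak_hash(evil))
        if match is not None:
            return match, evil
    raise RuntimeError("no collision found")

def sign(blob: bytes) -> bytes:
    # The signature covers the *name* of the blob, as a signed commit covers object names.
    return hmac.new(KEY, weak_hash(blob), hashlib.sha256).digest()

def verify(blob: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(blob), signature)

if __name__ == "__main__":
    good, evil = find_collision(b"harmless file #", b"malicious file #")
    signature = sign(good)           # the maintainer reviews and signs the harmless file
    print(verify(evil, signature))   # True: the signature also "covers" the malicious one
```

Note that the attacker must control *both* files before signing. This is the precondition that distinguishes this case from a preimage attack on content someone else already signed.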
Do you need more hypothetical stories? There are a lot. Of course they need a lot of work, and they're unlikely to happen. But they're possible. If you need trust and GPG signatures, that means you need ultimate trust. What's the point in making GPG signatures anyway if you cannot ultimately trust them? You could just as well say: well, that repository is only reachable by trustworthy persons, everything here is just fine and really made by the person named in the "author" field.
Re: Git and SHA-1 security (again)
On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson wrote: > I'm going to end up having to do something similar because of the issue > of submodules. Submodules may still be SHA-1, while the main repo may > be a newer hash. Or even the other way around: the main repo uses sha1 while a submodule uses sha256. I wonder if we should address this separately (and even in parallel with sha256 support), making submodules work with any external VCS (that supports some basic operations we define). -- Duy
Re: Git and SHA-1 security (again)
On Sat, Jul 16, 2016 at 3:48 PM, Herczeg Zsolt wrote: > I would like to discuss an old topic from 2006. I understand it was > already discussed. The only reason I'm sending this e-mail is to talk > about a possible solution which didn't show up on this list before. You mention the 2006 discussion, but I wonder if you've read the more recent discussion from April on the subject. > I think we all understand that SHA-1 is broken. It still works perfectly > as a storage key, but it's not cryptographically secure anymore. Git > is not moving away from SHA-1 because it would break too many > projects, and cryptographic security is not needed by git if you have > your own repository. > > However I would like to show some big problems caused by SHA-1: > - Git signed tags and signed commits are cryptographically insecure, > they're useless at the moment. > - Git Torrent (https://github.com/cjb/GitTorrent) is also > cryptographically broken, however it would be an awesome experiment. > - Linus said: "You only need to know the SHA-1 of the top of your > tree, and if you know that, you can trust your tree." That's not true > anymore. You have to trust your computer, your servers, your git > provider in a way that no-one can maliciously modify your data. In particular, as far as I know and as Theodore Ts'o's post describes better than I could[1], you seem to be confusing preimage attacks with collision attacks, and then concluding that because SHA1 is vulnerable to collision attacks, use-cases that would need a preimage attack to be compromised (which, as far as I can tell, includes all your examples) are also "broken". 1. http://thread.gmane.org/gmane.comp.version-control.git/291305/focus=291511
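The collision/preimage distinction drawn above can be illustrated by comparing the two generic (brute-force) searches on a toy 16-bit hash in Python. A collision search may choose both inputs and typically succeeds after a few hundred tries. Matching a digest that was fixed in advance typically takes tens of thousands of tries. Strictly, this is a second preimage, which is what replacing an already-signed blob would require. For a full 160-bit digest the generic costs scale to roughly 2^80 and 2^160. Known cryptanalytic attacks on SHA-1 collisions are cheaper than that generic bound, but that does not change the preimage side.

```python
import hashlib
from itertools import count

DIGEST_BYTES = 2  # 16-bit toy hash; real SHA-1 digests are 20 bytes

def weak_hash(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()[:DIGEST_BYTES]

def collision_search():
    """Find ANY two distinct inputs with equal digests (birthday bound, ~2**8 tries here)."""
    seen = {}
    for tries in count(1):
        msg = b"msg-%d" % tries
        digest = weak_hash(msg)
        if digest in seen:
            return seen[digest], msg, tries
        seen[digest] = msg

def second_preimage_search(target: bytes):
    """Match the digest of a message fixed in advance (~2**16 tries here)."""
    goal = weak_hash(target)
    for tries in count(1):
        msg = b"forgery-%d" % tries
        if weak_hash(msg) == goal:
            return msg, tries

if __name__ == "__main__":
    a, b, n = collision_search()
    print("collision after", n, "tries")
    forged, m = second_preimage_search(b"blob the maintainer already signed")
    print("second preimage after", m, "tries")
```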
Re: Git and SHA-1 security (again)
Hi Johannes, >> My point is not to throw out old hashes and break signatures. My point >> is to convert the data storage, and use mapping to resolve problems >> with those old hashes and signatures. > > If you convert the data storage, then the SHA-1s listed in the commit > objects will have to be rewritten, and then the GPG signature will not > match anymore. > > Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e` > to see the anatomy of a gpg-signed commit object. > Yes and no. That's the reason you need the two-way lookup table. If you need to verify a commit which was signed as SHA-1, you must use the lookup table in reverse. This way you can reconstruct the original commit structure, which then can be verified. Of course it takes work, but you only need to develop the new signature verification algorithm. You save much more on the other side, where you don't have to rework all the other algorithms to be multi-hash. Another interesting point is that multi-hash storage actively hurts signature security! (Duy just mentioned that while I was writing.) A signed commit (or tag) is only as secure as the least secure hash it refers to (directly or indirectly). Let's imagine that you make a new commit, and there is an old file somewhere in the tree. That's a weak point: because it has a SHA-1 hash, someone can replace it (and thus change your commit's content). I would clearly mark every signature as SHA-1-based or SHA-2-based (or based on anything else), and strictly allow only that hash in all the trees and objects while verifying that commit. If it's not the same hash type as the storage key, then use the lookup table for conversion before checking. (This has some interesting side effects, but it's all about good implementation.)
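The weakest-link argument above can be phrased as a reachability question: a signature over a commit is only as strong as the weakest hash naming anything reachable from it. Below is a minimal Python sketch. It assumes a hypothetical in-memory graph where each object name maps to its naming algorithm and its references. The object names and the `STRENGTH` ordering are illustrative only.

```python
STRENGTH = {"sha1": 0, "sha256": 1}  # assumed ordering: higher means stronger

def weakest_hash(objects, root):
    """Weakest algorithm naming any object reachable from `root`.

    `objects` maps a (hypothetical) object name to (algorithm, [referenced names]).
    """
    weakest = objects[root][0]
    stack, seen = [root], {root}
    while stack:
        algo, refs = objects[stack.pop()]
        if STRENGTH[algo] < STRENGTH[weakest]:
            weakest = algo
        for ref in refs:
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return weakest

# A new commit whose tree still points at an old, SHA-1-named blob.
objects = {
    "commit-new": ("sha256", ["tree-new"]),
    "tree-new": ("sha256", ["blob-changed", "blob-untouched"]),
    "blob-changed": ("sha256", []),
    "blob-untouched": ("sha1", []),
}
print(weakest_hash(objects, "commit-new"))  # sha1
```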
Re: Git and SHA-1 security (again)
On Mon, Jul 18, 2016 at 5:57 PM, Johannes Schindelin wrote: > Hi Zsolt, > > On Mon, 18 Jul 2016, Herczeg Zsolt wrote: > >> >> I think converting is a much better option. Use a single-hash >> >> storage, and convert everything to that on import/clone/pull. >> > >> > That ignores two very important issues that I already had mentioned: >> >> That's not true. If you double-check the next part of my message, you I >> just showed that an automatic two-way mapping could solve these >> problems! (I even give briefs explanation how to handle referencing and >> signature verification in those cases.) >> >> My point is not to throw out old hashes and break signatures. My point >> is to convert the data storage, and use mapping to resolve problems >> with those old hashes and signatures. > > If you convert the data storage, then the SHA-1s listed in the commit > objects will have to be rewritten, and then the GPG signature will not > match anymore. But we can recreate SHA-1 from the same content and verify GPG, right? I know it's super expensive, but it feels safer not to carry SHA-1 around when it's not secure anymore (I recall something about exploiting the weakest link when you have both sha1 and sha256 in the object content). Rehashing would be done locally and is better controlled. -- Duy
Re: Git and SHA-1 security (again)
Hi Zsolt, On Mon, 18 Jul 2016, Herczeg Zsolt wrote: > >> I think converting is a much better option. Use a single-hash > >> storage, and convert everything to that on import/clone/pull. > > > > That ignores two very important issues that I already had mentioned: > > That's not true. If you double-check the next part of my message, you I > just showed that an automatic two-way mapping could solve these > problems! (I even give briefs explanation how to handle referencing and > signature verification in those cases.) > > My point is not to throw out old hashes and break signatures. My point > is to convert the data storage, and use mapping to resolve problems > with those old hashes and signatures. If you convert the data storage, then the SHA-1s listed in the commit objects will have to be rewritten, and then the GPG signature will not match anymore. Call e.g. `git cat-file commit 44cc742a8ca17b9c279be4cc195a93a6ef7a320e` to see the anatomy of a gpg-signed commit object. Ciao, Johannes
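To make the point about signed commits concrete, here is a Python sketch of the payload a commit signature covers. It assumes the usual layout of a signed commit object as printed by `git cat-file commit`: header lines, then a `gpgsig` header whose armored continuation lines start with a space, then a blank line, then the message. The hash values and signature text are placeholders. The signed bytes are the commit with the `gpgsig` header removed. They include the tree and parent names, so rewriting those names during a conversion changes what was signed.

```python
def signed_payload(raw: str) -> str:
    """Return what a commit signature covers: the commit minus its gpgsig header."""
    header, sep, message = raw.partition("\n\n")
    kept, in_sig = [], False
    for line in header.split("\n"):
        if line.startswith("gpgsig "):
            in_sig = True          # the signature header itself is not signed
            continue
        if in_sig and line.startswith(" "):
            continue               # continuation lines of the armored signature
        in_sig = False
        kept.append(line)
    return "\n".join(kept) + sep + message

OLD_TREE, OLD_PARENT = "1" * 40, "2" * 40   # placeholder SHA-1 names
RAW = (
    f"tree {OLD_TREE}\n"
    f"parent {OLD_PARENT}\n"
    "author A U Thor <author@example.com> 1468800000 +0200\n"
    "committer A U Thor <author@example.com> 1468800000 +0200\n"
    "gpgsig -----BEGIN PGP SIGNATURE-----\n"
    " \n"
    " (placeholder signature data)\n"
    " -----END PGP SIGNATURE-----\n"
    "\n"
    "Example commit\n"
)

# Converting storage rewrites the embedded names, so the signed bytes change.
converted = RAW.replace(OLD_TREE, "3" * 64).replace(OLD_PARENT, "4" * 64)
print(signed_payload(RAW) == signed_payload(converted))  # False
```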
Re: Git and SHA-1 security (again)
>> I think converting is a much better option. Use a single-hash storage, and >> convert everything to that on import/clone/pull. > > That ignores two very important issues that I already had mentioned: That's not true. If you double-check the next part of my message, you'll see that I showed that an automatic two-way mapping could solve these problems! (I even gave brief explanations of how to handle referencing and signature verification in those cases.) My point is not to throw out old hashes and break signatures. My point is to convert the data storage, and use mapping to resolve problems with those old hashes and signatures. A single-hash data storage is obviously way easier to handle than a multi-hash mess. (See Linus's old e-mail: multiple hashes [meaning database keys] for the same content are complete nonsense in git-speak.) > The "convert everything" strategy also ignores the problem of interacting > with servers and collaborators. Think of hosting repositories, > rediscovering forgotten work trees, and of the "D" in DSCM. That's not an issue when we're working with a single repository. It's reasonable to require all git clients of the same repository to support the same hash. Yes, you need to configure the hash algorithm on a per-repository basis, but that's all. For importing and co-working between different repositories, it's a somewhat harder problem, but it's possible to handle the conversions correctly.
Re: Git and SHA-1 security (again)
Hi Zsolt, On Mon, 18 Jul 2016, Herczeg Zsolt wrote: > I think converting is a much better option. Use a single-hash storage, and > convert everything to that on import/clone/pull. That ignores two very important issues that I already had mentioned: - existing references, both in-repository, e.g. in commit messages referring to earlier commits, as well as out-of-repository, e.g. referring to commits in mails, blog posts, etc. - GPG-signed commits. Those issues cannot just be hand-waved away. The "convert everything" strategy also ignores the problem of interacting with servers and collaborators. Think of hosting repositories, rediscovering forgotten work trees, and of the "D" in DSCM. Ciao, Johannes
Re: Git and SHA-1 security (again)
Hi Brian, On Sun, 17 Jul 2016, brian m. carlson wrote: > On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote: > > Out of curiosity: have you considered something like padding the SHA-1s > > with, say 0xa1, to the size of the new hash and using that padding to > > distinguish between old vs new hash? > > I'm going to end up having to do something similar because of the issue > of submodules. Submodules may still be SHA-1, while the main repo may > be a newer hash. I was going to zero-pad, however. I thought about zero-padding, but there are plenty of is_null_sha1()/is_null_oid() calls around. Of course, I assumed left-padding. But you may have thought of right-padding instead? That would make short name handling much easier, too. FWIW it never crossed my mind to allow different same-sized hash algorithms. So I never thought we'd need a way to distinguish, say, BLAKE2b-256 from SHA-256. Is there a good reason to add the maintenance burden of several 256-bit hash algorithms, apart from speed (which in my mind should decide which one to use, always, rather than letting the user choose)? It would also complicate transport even further, let alone subtree merges from differently-hashed repositories. Ciao, Dscho
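Below is a minimal Python sketch of the right-padding idea. It assumes 32-byte new hashes and the 0xa1 filler mentioned above. The function names are hypothetical, and `is_null` only mimics what an `is_null_oid()`-style check does. Right-padding leaves the leading hex digits, and therefore abbreviated names, unchanged. A distinctive filler also keeps a padded null SHA-1 distinct from an all-zero 256-bit OID, which zero padding does not.

```python
import hashlib

OLD_LEN, NEW_LEN = 20, 32   # SHA-1 vs. a 256-bit hash, in bytes
FILLER = b"\xa1"            # the example padding byte from the thread

def pad_sha1(oid: bytes, filler: bytes = FILLER) -> bytes:
    """Right-pad a SHA-1 to 32 bytes; the leading hex digits (short names) are unchanged."""
    if len(oid) != OLD_LEN:
        raise ValueError("expected a 20-byte SHA-1")
    return oid + filler * (NEW_LEN - OLD_LEN)

def is_padded_sha1(oid: bytes, filler: bytes = FILLER) -> bool:
    # A genuine 256-bit digest ends in twelve filler bytes with probability 2**-96.
    return len(oid) == NEW_LEN and oid[OLD_LEN:] == filler * (NEW_LEN - OLD_LEN)

def is_null(oid: bytes) -> bool:
    """Analogue of an is_null_oid()-style check: every byte is zero."""
    return not any(oid)

empty_blob = hashlib.sha1(b"blob 0\0").digest()
padded = pad_sha1(empty_blob)
print(padded.hex()[:7], is_padded_sha1(padded))   # e69de29 True
```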
Fwd: Git and SHA-1 security (again)
Do you think the multi-hash approach is worth the added complexity? It'll break a lot of things. I mean almost everything. All git algorithms rely on the "same hash => same content" and "same content => same hash" statements. I think converting is a much better option. Use a single-hash storage, and convert everything to that on import/clone/pull. I would only introduce a two-way mapping table (old-hash <=> new-hash) to help the users. In the normal workflow, everything can go with new hashes only. That leaves most algorithms and code intact, and introduces very few new cases: On import: If the imported data does not match your selected new-hash format, add its hash to the lookup table, then convert it to your selected format, and handle it as such. If a user references an old hash: We can look up that table forward, and find the referenced object in storage. We can handle it as normal from then on. On an old signature verify: Look up the table forward, find the object by its new key. Look up backwards for all referenced objects, reconstruct the "old format" and verify the hash on that. If you double-check your hashes as you build the mapping, you can even trust it, which makes the lookups and verifications very fast. You can introduce as many hash mapping tables as you want, so you can not only support an old-to-new hash transition, but there can be as many different hashes in the world as you want. Your only rule is "you reference your work in your current format", but you can look up any reference which was valid at the moment it was made. (There are slight issues with this approach if we "convert then convert". For example, when you import from repo A (sha1) to repo B (sha2), it works perfectly. But when you import the same commits from B to C (sha3), you might lose the sha1 references. That could be considered normal if we want to keep it simple, stupid: only support a few hashes, always going forward.
Or you may add an extra "list" field to objects, which could show what types of hashes you have to keep in lookup tables for that particular object. Or, you can even include a list of old hashes in the object itself, which should make it into the lookup table on import.) Anyway, I think a single-hash storage, and the ability to hide any old hashes from most of the internal algorithms, is a key point in making the transition. If we want to provide a multi-hash interface to users, then we should look for "wrapper" solutions that translate multi-hash user needs to a single-hash backend. Zsolt
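The conversion-plus-mapping proposal can be sketched in Python. Everything here is hypothetical: a single-hash store keyed by SHA-256, a two-way old/new table, and a toy object format in which references appear as hex text (Git's real trees store binary hashes). Verifying a legacy name walks the table backwards to rebuild the SHA-1-era bytes and rehashes them. That same check is the double-check that would make the mapping trustworthy.

```python
import hashlib

def name(kind: str, body: bytes, algo: str) -> str:
    """Git-style object name: hash of the '<kind> <size>' header, a NUL, then the body."""
    return hashlib.new(algo, f"{kind} {len(body)}\0".encode() + body).hexdigest()

class ConvertedStore:
    """Hypothetical SHA-256-only object store plus a two-way SHA-1 <=> SHA-256 table."""

    def __init__(self):
        self.objects = {}      # SHA-256 name -> (kind, body that references SHA-256 names)
        self.old_to_new = {}
        self.new_to_old = {}

    def import_old(self, kind: str, old_body: bytes) -> str:
        """Convert a SHA-1-era object; objects it references must be imported first."""
        old = name(kind, old_body, "sha1")
        new_body = old_body
        for old_ref, new_ref in self.old_to_new.items():
            # Toy: references are hex text here, unlike Git's binary tree entries.
            new_body = new_body.replace(old_ref.encode(), new_ref.encode())
        new = name(kind, new_body, "sha256")
        self.objects[new] = (kind, new_body)
        self.old_to_new[old] = new
        self.new_to_old[new] = old
        return new

    def reconstruct_old(self, new: str) -> bytes:
        """Walk the table backwards to rebuild the object's original SHA-1-era bytes."""
        kind, body = self.objects[new]
        for new_ref, old_ref in self.new_to_old.items():
            body = body.replace(new_ref.encode(), old_ref.encode())
        return body

    def verify_old_name(self, old: str) -> bool:
        """Re-derive an old SHA-1 name, e.g. before checking a legacy signature over it."""
        new = self.old_to_new[old]
        return name(self.objects[new][0], self.reconstruct_old(new), "sha1") == old

store = ConvertedStore()
blob = b"hello\n"
store.import_old("blob", blob)
tree = f"100644 blob {name('blob', blob, 'sha1')}\thello.txt\n".encode()
store.import_old("tree", tree)
commit = f"tree {name('tree', tree, 'sha1')}\n\ninitial commit\n".encode()
old_commit = name("commit", commit, "sha1")
store.import_old("commit", commit)
print(store.verify_old_name(old_commit))  # True
```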
Re: Git and SHA-1 security (again)
On Sun, Jul 17, 2016 at 12:23:49PM -0400, Theodore Ts'o wrote: > On Sun, Jul 17, 2016 at 03:42:34PM +, brian m. carlson wrote: > > As I said, I'm not planning on multiple hash support at first, but it > > doesn't appear impossible if we go this route. We might still have to > > rewrite objects, but we can verify signatures over the legacy SHA-1 > > objects by forcing them into the old-style object format. > > How hard would it be to make the on-disk format be multihash, even if > there is no support for anything other than a single hash, at least > for now? That way we won't have to rewrite the objects twice. Other than the amount of work to change reading from the on-disk format, nothing prevents us from doing that, although I would recommend storing the object database with the tag prefix if we do so (i.e., instead of .git/objects/17, writing .git/objects/111417). That future-proofs us for when we change the hash. I will say that the pack format will likely require some changes, because it assumes things are 4-byte aligned. It also assumes you can use the object ID in the mmaped pack directly (4-byte aligned), which you can no longer do. We have some cases where we cast that memory directly to struct object_id, which will no longer be valid, and even if we add the two prefix bytes to struct object_id, that doesn't guarantee that struct won't be aligned differently. We could require that the pack format have two NUL bytes before the hash, which would force it to be aligned. We'd still have to make the Git protocol negotiate the new extension and fail gracefully if the version is too old. We could do this by requiring a pack version 5, which would simply cause older Gits to report errors. It's a lot of work, and it's definitely a flag day. That's why I had planned to only do it with a new hash format: it would impact only people who were moving to the new hash. 
It also means that we get to work out any problems with the design at that point and not be committed to a design that might be inadequate. This is a place where I don't want to mess up. > Personally, so long as the newer versions of the tree are secured, I > wouldn't mind if the older commits stayed using SHA1 only. The newer > commits are the ones that are most important and security-critical > anyway. It seems like the main reason to rewrite all of the objects > is to simplify the initial rollout of a newer hash algorithm, no? The reason is that we can't have an unambiguous parse of the current objects if two hash algorithms are in use. tree objects don't use a hex encoding of hashes; they use a binary encoding. It's therefore possible to create an ambiguous tree representation. So when we look at a new hash, we need to provide an unambiguous way to know what hash is in use. The two choices are to either require all objects to use the new hash, or to extend the objects to include the hash. Until a couple days ago, I had planned to do the former. I had not even considered using a multihash approach due to the complexity. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204
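The ambiguity argument above can be demonstrated with a toy Python parser for Git-style tree entries: `<mode> <name>`, then a NUL, then a raw binary hash. The byte string below was hand-constructed for this illustration. It parses cleanly as three entries with 20-byte hashes and also as two entries with 32-byte hashes. Without a marker, a reader cannot tell which hash length was meant.

```python
def parse_tree(data: bytes, hash_len: int):
    """Toy parser for Git-style tree entries: '<mode> <name>', NUL, raw binary hash."""
    entries, i = [], 0
    while i < len(data):
        space = data.index(b" ", i)
        mode = data[i:space]
        if not mode.isdigit():
            raise ValueError(f"bad mode at offset {i}")
        nul = data.index(b"\0", space + 1)
        start, end = nul + 1, nul + 1 + hash_len
        if end > len(data):
            raise ValueError("truncated hash")
        entries.append((mode.decode(), data[space + 1:nul].decode(), data[start:end]))
        i = end
    return entries

# Hand-built bytes: the second 20-byte "hash" is itself a complete entry header.
hdr_d = b"100644 ddddddddd\0"                 # 17 bytes
tree = (b"100644 a\0" + bytes(range(1, 21))   # entry "a" with a 20-byte hash
        + b"100644 abcd\0"                    # 12-byte header of entry "abcd"
        + hdr_d + b"\xaa\xbb\xcc"             # its 20-byte hash, hiding hdr_d
        + b"100644 c\0" + b"\x33" * 20)       # entry "c" with a 20-byte hash

print([entry[1] for entry in parse_tree(tree, 20)])  # ['a', 'abcd', 'c']
print([entry[1] for entry in parse_tree(tree, 32)])  # ['a', 'ddddddddd']
```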
Re: Git and SHA-1 security (again)
On Sun, Jul 17, 2016 at 03:42:34PM +, brian m. carlson wrote: > As I said, I'm not planning on multiple hash support at first, but it > doesn't appear impossible if we go this route. We might still have to > rewrite objects, but we can verify signatures over the legacy SHA-1 > objects by forcing them into the old-style object format. How hard would it be to make the on-disk format be multihash, even if there is no support for anything other than a single hash, at least for now? That way we won't have to rewrite the objects twice. Personally, so long as the newer versions of the tree are secured, I wouldn't mind if the older commits stayed using SHA1 only. The newer commits are the ones that are most important and security-critical anyway. It seems like the main reason to rewrite all of the objects is to simplify the initial rollout of a newer hash algorithm, no? - Ted
Re: Git and SHA-1 security (again)
On Sun, Jul 17, 2016 at 05:19:02PM +0200, Duy Nguyen wrote: > On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson >wrote: > > On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote: > >> Out of curiosity: have you considered something like padding the SHA-1s > >> with, say 0xa1, to the size of the new hash and using that padding to > >> distinguish between old vs new hash? > > > > I'm going to end up having to do something similar because of the issue > > of submodules. Submodules may still be SHA-1, while the main repo may > > be a newer hash. I was going to zero-pad, however. I was also, at > > least at first, going to force a separate .git dir for those, to avoid > > having to try to store two separate types of objects in the same repo. > > If it's just the external hash representation, can we go with a prefix > to identify the hash algorithm? For example > sha256:1234... is SHA-256 while 1235... by default is SHA-1 (but we > could switch the default to SHA-256 via config file later SHA-1 is > dead and nobody wants to type sha256: every time). It catches > incorrect hash algorithm references. I'd make it such that the default is that of the repo. If the current repo is generating SHA-256, say, then 473a0f4 refers to the empty blob. If you want to refer to an SHA-1 object, then you write sha-1:e69de29. On disk, multihash[0] seems like the right way to go. We'd serialize references to the SHA-1 and SHA-256 empty blobs as 1114e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 and 1220473a0f4c3be8a93681a267e3b1e9a7dcda1185436fe141f7749120a303721813 respectively. This makes parsing significantly easier. On disk, we could write them into the object database as 1114e6/9de2… and 122047/3a0f…. We could implement the default hash algorithm as extensions.hash and the on-disk format (which would be a requirement for extensions.hash) as extensions.explicitHash. 
As I said, I'm not planning on multiple hash support at first, but it doesn't appear impossible if we go this route. We might still have to rewrite objects, but we can verify signatures over the legacy SHA-1 objects by forcing them into the old-style object format. [0] https://github.com/jbenet/multihash
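Below is a minimal Python sketch of the multihash serialization described above. It assumes the multihash codes 0x11 (SHA-1) and 0x12 (SHA2-256), each followed by a digest-length byte. It also assumes the `.git/objects` directory layout suggested in this message, which is a proposal in the thread and not how Git stores objects today. The helper names are hypothetical.

```python
import hashlib

# multihash function codes and digest lengths (values from the multihash table)
MULTIHASH = {"sha1": (0x11, 20), "sha256": (0x12, 32)}
BY_CODE = {code: algo for algo, (code, _) in MULTIHASH.items()}

def git_digest(algo: str, obj_type: str, content: bytes) -> bytes:
    """Binary Git object name: hash of '<type> <size>', a NUL, then the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.new(algo, header + content).digest()

def to_multihash_hex(algo: str, digest: bytes) -> str:
    code, length = MULTIHASH[algo]
    if len(digest) != length:
        raise ValueError("digest has the wrong length for " + algo)
    return bytes([code, length]).hex() + digest.hex()

def from_multihash_hex(text: str):
    """Self-describing parse: the first two bytes say which hash follows and how long it is."""
    raw = bytes.fromhex(text)
    algo = BY_CODE[raw[0]]
    if raw[1] != MULTIHASH[algo][1] or len(raw) != 2 + raw[1]:
        raise ValueError("length prefix does not match the digest")
    return algo, raw[2:]

def loose_object_path(text: str) -> str:
    # Directory = two prefix bytes + first digest byte, following the example layout above.
    return f".git/objects/{text[:6]}/{text[6:]}"

sha1_empty = to_multihash_hex("sha1", git_digest("sha1", "blob", b""))
print(sha1_empty)                     # 1114e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(sha1_empty))  # .git/objects/1114e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```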
Re: Git and SHA-1 security (again)
On Sun, Jul 17, 2016 at 4:21 PM, brian m. carlson wrote: > On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote: >> Out of curiosity: have you considered something like padding the SHA-1s >> with, say 0xa1, to the size of the new hash and using that padding to >> distinguish between old vs new hash? > > I'm going to end up having to do something similar because of the issue > of submodules. Submodules may still be SHA-1, while the main repo may > be a newer hash. I was going to zero-pad, however. I was also, at > least at first, going to force a separate .git dir for those, to avoid > having to try to store two separate types of objects in the same repo. If it's just the external hash representation, can we go with a prefix to identify the hash algorithm? For example, sha256:1234... is SHA-256 while 1235... is SHA-1 by default (but we could switch the default to SHA-256 via the config file later, when SHA-1 is dead and nobody wants to type sha256: every time). It catches incorrect hash algorithm references. -- Duy
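Below is a minimal Python sketch of the prefix idea. An optional `<algo>:` prefix selects the hash, and an unprefixed name falls back to the repository default, as suggested in the reply to this message. The alias spellings, the `repo_default` parameter and the validation rules are assumptions for illustration. Rejecting a name that is too long for the chosen algorithm is one way such a prefix can catch incorrect references.

```python
# Hypothetical spellings; the thread uses both "sha256:" and "sha-1:".
ALIASES = {"sha1": "sha1", "sha-1": "sha1", "sha256": "sha256", "sha-256": "sha256"}
HEX_LEN = {"sha1": 40, "sha256": 64}
HEX_DIGITS = set("0123456789abcdef")

def parse_object_ref(text: str, repo_default: str = "sha1"):
    """Split an optional '<algo>:' prefix from a (possibly abbreviated) hex object name."""
    prefix, sep, hexpart = text.partition(":")
    if sep:
        algo = ALIASES.get(prefix.lower())
        if algo is None:
            raise ValueError(f"unknown hash algorithm {prefix!r}")
    else:
        algo, hexpart = repo_default, text
    hexpart = hexpart.lower()
    if not hexpart or len(hexpart) > HEX_LEN[algo] or not set(hexpart) <= HEX_DIGITS:
        raise ValueError(f"{text!r} is not a valid {algo} object name")
    return algo, hexpart

print(parse_object_ref("e69de29"))                               # ('sha1', 'e69de29')
print(parse_object_ref("473a0f4", repo_default="sha256"))        # ('sha256', '473a0f4')
print(parse_object_ref("sha-1:e69de29", repo_default="sha256"))  # ('sha1', 'e69de29')
```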
Re: Git and SHA-1 security (again)
On Sun, Jul 17, 2016 at 10:01:38AM +0200, Johannes Schindelin wrote: > Out of curiosity: have you considered something like padding the SHA-1s > with, say 0xa1, to the size of the new hash and using that padding to > distinguish between old vs new hash? I'm going to end up having to do something similar because of the issue of submodules. Submodules may still be SHA-1, while the main repo may be a newer hash. I was going to zero-pad, however. I was also, at least at first, going to force a separate .git dir for those, to avoid having to try to store two separate types of objects in the same repo. The other limitation with this is that it isn't immediately obvious what hash is in use just because it has a certain length. For example, I plan on implementing SHA3-256, but it's also possible I might add BLAKE2b-256 for people for whom SHA3-256 is too slow. There's no way to distinguish between those two algorithms. Thus allowing multiple hashes in the same repo won't work without a format byte. What I might do, however, is add multihash-style format information to the on-disk format for non-SHA-1 repos. Then SHA-1 compatibility could come in a future iteration. That would be compatible with the existing refactor. > I guess that it would also possible to introduce an opt-in "legacy mapper" > which would generate a mapping locally of all objects' SHA-1 to whatever > new hash you choose. Generating it locally would side-step the security > issues of the SHA-1 algorithm. We would need to teach Git to pick that > mapping up if available and use it, of course. I think that might be easier. Considering the number of tests that hard-code object names, I might need that for the testsuite.
Re: Git and SHA-1 security (again)
Hi Brian, On Sat, 16 Jul 2016, brian m. carlson wrote: > My current plan is not to implement side-by-side data, for a couple > reasons. I am as guilty as the next person of having used the "deafbee (This is my commit, 2007-08-21)" format to refer to older commits. So I do have concerns about rewriting history when switching to a new hash. I understand the technical challenges, of course. Out of curiosity: have you considered something like padding the SHA-1s with, say 0xa1, to the size of the new hash and using that padding to distinguish between old vs new hash? I guess that it would also be possible to introduce an opt-in "legacy mapper" which would generate a mapping locally of all objects' SHA-1 to whatever new hash you choose. Generating it locally would side-step the security issues of the SHA-1 algorithm. We would need to teach Git to pick that mapping up if available and use it, of course. However, that latter solution would do nothing to address the problem of existing GPG-signed commits. Ciao, Dscho
Re: Git and SHA-1 security (again)
On Sat, Jul 16, 2016 at 11:46:06PM +0200, Herczeg Zsolt wrote: > Dear Brian, > > Thank you for your response. It very good to hear that changing the > hash is on the git project's list. I haven't found any official > communication on that topic since 2006. There's been some recent discussion on the list about it. It is less on the Git project's list and more on my personal list. It's my hope that Junio and other contributors will decide to accept my patches when they are ready. Also, the plan is to keep SHA-1 available, probably as the default, for backwards compatibility. > I'll look into the contributions guide and the source codes, to check > if I can contribute to this transition. If you have any documentation > or other related info, please point me towards it. The major work at this point is turning instances of unsigned char [20] into struct object_id, as well as converting hardcoded 20 and 40 (and derivative values) to GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ. This work allows us to make as little code as possible know about the size of the hash, as well as generally being easier to maintain. You can look at the bc/cocci branch, which was recently merged into next. (It doesn't exist independently outside of next, so you'll have to search through the history.) That work is what in my branches is called object-id-part4. I'm currently working on getting to the point of converting get_tree_entry to use struct object_id, which is what will become my object-id-part5. If you're planning on doing some of this work, I recommend that you try to avoid areas which other developers are working on, especially the refs code, which is undergoing massive changes. Other people will appreciate it.
Re: Git and SHA-1 security (again)
Dear Brian,

Thank you for your response. It is very good to hear that changing the hash is on the git project's list. I haven't found any official communication on that topic since 2006.

I'll look into the contribution guide and the source code to check if I can contribute to this transition. If you have any documentation or other related info, please point me towards it.

Thanks,
Zsolt Herczeg

2016-07-16 22:13 GMT+02:00 brian m. carlson:
> On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
>> But - and that's the main idea I'm writing about here - changing the
>> storage keys does not mean you should drop your old hashes. If you
>> change the git data structure so that it can keep multiple hashes for
>> the same "link" in each object (trees, commits, etc.), you can keep the
>> old ones right next to the new one. If you want to look up the
>> referenced object, you must use the newest hash - which is the key.
>> But if you want to verify some old hash, it's still possible! Just
>> look up the object by the new key, remove all the newer-generation
>> keys, and verify the old hash on that.
>>
>> A storage structure like this would allow great flexibility:
>> - You can change your hash algorithm in the future. If SHA-256
>>   becomes broken, it's not a problem. Just re-hash the storage, and
>>   append the new hashes to the git objects.
>> - You can still verify your old hashes after a hash change - removing
>>   the new hashes from the objects before hashing should give you back
>>   the old objects, thus giving you the same hash as before.
>> - That makes it possible for signed tags and commits to keep their
>>   validity after a hash change! With a clever-enough new format, you
>>   can even keep the validity of current hashes and signatures. (To be
>>   able to do that, you should be able to calculate back the current
>>   format from the new format.)
>>
>> Moving git forward to a format like this would solve the weak-key
>> problem in git forever. You would be able to configure your key
>> algorithm on a per-repository basis; you - and git - can do the daily
>> work on the newest hashes, while still carrying the old hashes and
>> signatures, in case you ever want to verify them. That would allow
>> repositories to gracefully change hashes in case they need to, and the
>> only compatibility limitation is that you must use a new enough git to
>> understand the new storage format.
>>
>> What are your thoughts on this approach? Will git ever reach a release
>> with an exchangeable hash algorithm? Or should someone look for
>> alternatives if there's a need for cryptographic security?
>
> I'm working on adding new hash algorithm support in Git. However, it
> requires a significant refactor of the code base. My current plan is
> not to implement side-by-side data, for a couple reasons.
>
> One is that it requires significantly more work to implement and
> complicates the code. It's also incompatible with all the refactoring
> I've done already.
>
> The second is that it requires that Git have the ability to store
> multiple hashes at once, which is very expensive in terms of memory.
> Moving from a 160-bit hash to a 256-bit hash (my current plan is
> SHA3-256) requires 1.6× the memory. Storing both requires 2.6× the
> memory. If you add a third hash, it requires even more. Memory is
> often a constraint when using Git.
>
> The current plan is to use git-fast-import and git-fast-export to handle
> that conversion process, and then maybe provide wrappers to make it more
> transparent.
>
> The refactor is currently ongoing, but it is a free-time activity for
> me.
>
> If you'd like to follow the progress roughly, you can do so by checking
> the output of the following commands:
>
>   git grep 'unsigned char.*20' | wc -l
>   git grep 'struct object_id' | wc -l
>
> You are also welcome to contribute, of course.
> --
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204
Re: Git and SHA-1 security (again)
On Sat, Jul 16, 2016 at 03:48:49PM +0200, Herczeg Zsolt wrote:
> But - and that's the main idea I'm writing about here - changing the
> storage keys does not mean you should drop your old hashes. If you
> change the git data structure so that it can keep multiple hashes for
> the same "link" in each object (trees, commits, etc.), you can keep the
> old ones right next to the new one. If you want to look up the
> referenced object, you must use the newest hash - which is the key.
> But if you want to verify some old hash, it's still possible! Just
> look up the object by the new key, remove all the newer-generation
> keys, and verify the old hash on that.
>
> A storage structure like this would allow great flexibility:
> - You can change your hash algorithm in the future. If SHA-256
>   becomes broken, it's not a problem. Just re-hash the storage, and
>   append the new hashes to the git objects.
> - You can still verify your old hashes after a hash change - removing
>   the new hashes from the objects before hashing should give you back
>   the old objects, thus giving you the same hash as before.
> - That makes it possible for signed tags and commits to keep their
>   validity after a hash change! With a clever-enough new format, you
>   can even keep the validity of current hashes and signatures. (To be
>   able to do that, you should be able to calculate back the current
>   format from the new format.)
>
> Moving git forward to a format like this would solve the weak-key
> problem in git forever. You would be able to configure your key
> algorithm on a per-repository basis; you - and git - can do the daily
> work on the newest hashes, while still carrying the old hashes and
> signatures, in case you ever want to verify them. That would allow
> repositories to gracefully change hashes in case they need to, and the
> only compatibility limitation is that you must use a new enough git to
> understand the new storage format.
>
> What are your thoughts on this approach? Will git ever reach a release
> with an exchangeable hash algorithm? Or should someone look for
> alternatives if there's a need for cryptographic security?

I'm working on adding new hash algorithm support in Git. However, it requires a significant refactor of the code base. My current plan is not to implement side-by-side data, for a couple reasons.

One is that it requires significantly more work to implement and complicates the code. It's also incompatible with all the refactoring I've done already.

The second is that it requires that Git have the ability to store multiple hashes at once, which is very expensive in terms of memory. Moving from a 160-bit hash to a 256-bit hash (my current plan is SHA3-256) requires 1.6× the memory. Storing both requires 2.6× the memory. If you add a third hash, it requires even more. Memory is often a constraint when using Git.

The current plan is to use git-fast-import and git-fast-export to handle that conversion process, and then maybe provide wrappers to make it more transparent.

The refactor is currently ongoing, but it is a free-time activity for me.

If you'd like to follow the progress roughly, you can do so by checking the output of the following commands:

  git grep 'unsigned char.*20' | wc -l
  git grep 'struct object_id' | wc -l

You are also welcome to contribute, of course.
--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
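The memory figures in brian's reply follow from the digest sizes alone, counting only the space taken by object names relative to SHA-1 by itself. The last line assumes, purely for illustration, that the third hash is another 256-bit one.

```latex
\begin{aligned}
\text{SHA-1} &: 160\ \text{bits} = 20\ \text{bytes} \\
\text{SHA3-256} &: 256\ \text{bits} = 32\ \text{bytes} \\
\text{new hash only} &: \tfrac{32}{20} = 1.6\times \\
\text{both side by side} &: \tfrac{20 + 32}{20} = \tfrac{52}{20} = 2.6\times \\
\text{plus a third 256-bit hash} &: \tfrac{20 + 32 + 32}{20} = \tfrac{84}{20} = 4.2\times
\end{aligned}
```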
Git and SHA-1 security (again)
Dear List Members, Git Developers,

I would like to discuss an old topic from 2006. I understand it was already discussed. The only reason I'm sending this e-mail is to talk about a possible solution which hasn't come up on this list before.

I think we all understand that SHA-1 is broken. It still works perfectly as a storage key, but it's not cryptographically secure anymore. Git is not moving away from SHA-1 because it would break too many projects, and cryptographic security is not needed in git if you have your own repository.

However, I would like to point out some big problems caused by SHA-1:
- Git signed tags and signed commits are cryptographically insecure; they're useless at the moment.
- Git Torrent (https://github.com/cjb/GitTorrent) is also cryptographically broken, even though it would be an awesome experiment.
- Linus said: "You only need to know the SHA-1 of the top of your tree, and if you know that, you can trust your tree." That's not true anymore. You have to trust your computer, your servers, and your git provider not to modify your data maliciously.

I understand that git is perfect for a workflow where you have your very own repository and you double-check any commits or diffs you accept into it. But that's not everybody's workflow. For example, if I wanted to trust my colleague blindly, I could just include all commits he signed without review. Currently I can't safely do that. There are workarounds, of course: signing the e-mail he sends me, or signing a tarball of the entire git repository, etc. But that's not the right way to do things.

As a final thought on this, I would like to say: Git is a great tool, but it can be so much better with a safe hash.

I would like to propose a solution for changing git's hash algorithm. It would be a breaking change, but I think it can be done fairly painlessly. (If you read the discussion from 2006, the problems of moving are clear.)
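Linus's statement relies on hash chaining: a commit names its tree and a tree names its blobs, so the id at the tip covers every byte beneath it, but only as long as nobody can produce a second object with the same hash. The Python sketch below builds a one-file tree and commit using Git's object framing to show that changing a file changes the tip id. The file name `README`, the author line and the message are placeholders, and `git_hash` and `tip_for` are hypothetical helpers.

```python
import hashlib


def git_hash(kind: str, body: bytes, algo: str = "sha1") -> bytes:
    """Raw id of a Git-style object: hash of '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.new(algo, header + body).digest()


def tip_for(content: bytes, algo: str = "sha1") -> str:
    """Build blob -> tree -> commit for a single file; return the commit id."""
    blob_id = git_hash("blob", content, algo)
    # A tree entry is '<mode> <name>' + NUL followed by the raw id of the entry.
    tree_id = git_hash("tree", b"100644 README\0" + blob_id, algo)
    # A commit names its tree by hex id; author and committer are placeholders.
    commit_body = (
        f"tree {tree_id.hex()}\n"
        "author A U Thor <author@example.com> 0 +0000\n"
        "committer A U Thor <author@example.com> 0 +0000\n"
        "\n"
        "initial\n"
    ).encode()
    return git_hash("commit", commit_body, algo).hex()


# Any change to the file propagates up to the tip id.
assert tip_for(b"hello\n") != tip_for(b"hello!\n")
```

The guarantee breaks exactly where the post says it does: if someone can craft a different blob with the same SHA-1, the tree, the commit and the tip id all stay the same.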
In git, every piece of data has to have one and only one key, so a hybrid hash is a no-go. That means changing the hash algorithm involves re-hashing all the data in a git repository, but that's not so bad: on a git clone, we already re-hash everything to check integrity. Changing all the keys shouldn't be worse than that.

But - and that's the main idea I'm writing about here - changing the storage keys does not mean you should drop your old hashes. If you change the git data structure so that it can keep multiple hashes for the same "link" in each object (trees, commits, etc.), you can keep the old ones right next to the new one. If you want to look up the referenced object, you must use the newest hash - which is the key. But if you want to verify some old hash, it's still possible! Just look up the object by the new key, remove all the newer-generation keys, and verify the old hash on that.

A storage structure like this would allow great flexibility:
- You can change your hash algorithm in the future. If SHA-256 becomes broken, it's not a problem. Just re-hash the storage, and append the new hashes to the git objects.
- You can still verify your old hashes after a hash change - removing the new hashes from the objects before hashing should give you back the old objects, thus giving you the same hash as before.
- That makes it possible for signed tags and commits to keep their validity after a hash change! With a clever-enough new format, you can even keep the validity of current hashes and signatures. (To be able to do that, you should be able to calculate back the current format from the new format.)

Moving git forward to a format like this would solve the weak-key problem in git forever. You would be able to configure your key algorithm on a per-repository basis; you - and git - can do the daily work on the newest hashes, while still carrying the old hashes and signatures, in case you ever want to verify them. That would allow repositories to gracefully change hashes in case they need to, and the only compatibility limitation is that you must use a new enough git to understand the new storage format.

What are your thoughts on this approach? Will git ever reach a release with an exchangeable hash algorithm? Or should someone look for alternatives if there's a need for cryptographic security?

Thank you for taking the time to read this.

References:
- SHA-256 discussion in 2006: http://www.gelato.unsw.edu.au/archives/git/0608/26446.html
- Discussion about git signatures in 2014: https://www.mail-archive.com/git%40vger.kernel.org/msg61087.html
- Linus's talk on git: https://www.youtube.com/watch?v=4XpnKHJAok8=56m20s

Kind regards,
Zsolt Herczeg
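Zsolt's proposal of keeping older link hashes next to the new storage key can be modelled in a short Python sketch. This is a toy, not Git's object format and not the plan brian describes: objects are just parent links plus a message, the `parent-sha256` line format and all helper names are hypothetical, and SHA-1 followed by SHA-256 stands in for any sequence of hash generations. The property the proposal depends on is that rendering an object with only the oldest generation of link hashes reproduces the original bytes exactly, so the old SHA-1, and any signature over those bytes, can still be checked.

```python
import hashlib

# Hash generations, oldest first (illustrative assumption).
GENERATIONS = ("sha1", "sha256")


def render(obj: dict, upto: int) -> bytes:
    """Serialize obj, keeping link hashes only for the first `upto` generations.

    The oldest generation keeps the original 'parent <hex>' line, so
    render(obj, 1) is byte-for-byte the pre-migration object.
    """
    lines = []
    for link in obj["parents"]:
        lines.append(f"parent {link[GENERATIONS[0]]}")
        for algo in GENERATIONS[1:upto]:
            lines.append(f"parent-{algo} {link[algo]}")
    return ("\n".join(lines) + "\n\n" + obj["message"]).encode()


def object_id(obj: dict, upto: int) -> str:
    """Id at generation `upto`: that generation's algorithm over that rendering."""
    return hashlib.new(GENERATIONS[upto - 1], render(obj, upto)).hexdigest()


def migrate(objects: dict, order: list):
    """One-step migration: re-key the store by the newest hash.

    `objects` maps old SHA-1 -> object; `order` lists SHA-1s parents first,
    so each parent's new id is known before its children are re-hashed.
    New link hashes are appended to the objects in place.
    """
    new_ids, store = {}, {}
    for old_id in order:
        obj = objects[old_id]
        for link in obj["parents"]:
            link[GENERATIONS[-1]] = new_ids[link[GENERATIONS[0]]]
        new_id = object_id(obj, len(GENERATIONS))
        new_ids[old_id] = new_id
        store[new_id] = obj
    return store, new_ids


def verify_legacy(obj: dict, old_id: str) -> bool:
    """Strip newer-generation link hashes and re-check the original SHA-1."""
    return object_id(obj, 1) == old_id


# Example: a two-commit history, migrated and then checked against old ids.
root = {"parents": [], "message": "initial\n"}
root_sha1 = object_id(root, 1)
child = {"parents": [{"sha1": root_sha1}], "message": "second\n"}
child_sha1 = object_id(child, 1)
legacy_bytes = render(child, 1)

store, new_ids = migrate({root_sha1: root, child_sha1: child}, [root_sha1, child_sha1])
migrated = store[new_ids[child_sha1]]
assert render(migrated, 1) == legacy_bytes   # old bytes are recoverable
assert verify_legacy(migrated, child_sha1)    # old id still checks out
```

Re-checking an old id this way is still only as strong as SHA-1 itself; what the scheme buys is that lookups and new signatures use the newer hash while old signatures stay checkable. Each extra generation also adds one more hash per link, which is the memory cost brian raises in his reply.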