RE: git bundle format
Left off a citation to an old thread.

-----Original Message-----
From: Pyeron, Jason J CTR (US)
Sent: Monday, November 26, 2012 2:25 PM

I may need to be nudged in a better direction, but please try to understand my intentions.

I am facing a situation where I would like to use git bundle but at the same time inspect the contents to prevent a spillage[1].

Given: we have a public repository which was cloned onto a secret development repository. The developers do some work which should not be sensitive in any way, and commit and push it to the secret repository. Now they want to release it to the public. The current process is to review the text files to ensure that there is no secret sauce in there and then approve the release. This process ignores the change tracking, and all non-content is lost.

In this situation we should assume that the bundle does not contain any content which is already in the public repository; that is, it has the minimum data needed to pass a git bundle verify from the public repository's point of view. We would then take the bundle and pipe it through the git-bundle2text program, which would produce a human-inspectable format[3] as opposed to the packed format[2]. The security reviewer would then see all the information being released and, with the help of the public repository, see how the data changes the repository.

Am I barking up the right tree?

1: http://en.wikipedia.org/wiki/Spillage_of_Classified_Information
2: http://git-scm.com/book/ch9-4.html
3: http://git.661346.n2.nabble.com/How-to-extract-files-out-of-a-quot-git-bundle-quot-no-matter-what-td1679188.html

smime.p7s
Description: S/MIME cryptographic signature
Re: git bundle format
Pyeron, Jason J CTR (US) <jason.j.pyeron@mail.mil> writes:

> In this situation we should assume that the bundle does not contain any content which is already in the public repository; that is, it has the minimum data needed to pass a git bundle verify from the public repository's point of view. We would then take the bundle and pipe it through the git-bundle2text program, which would produce a human-inspectable format as opposed to the packed format[2]. The security reviewer would then see all the information being released and, with the help of the public repository, see how the data changes the repository.

The bundle file is a thinly wrapped packfile, with extra information that tells which objects in the bundle are the tips of histories and which objects the repository into which the bundle gets unbundled has to have. So your git-bundle2text would likely involve fetching from the bundle and inspecting the resulting history and the working tree files.

--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
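To make the "thin wrapper" concrete, here is a minimal sketch of reading the text header that precedes the packfile. It assumes the version 2 bundle layout (a `# v2 git bundle` signature line, `-<object id> <comment>` lines for required objects, `<object id> <refname>` lines for tips, then an empty line followed by the pack data). The function names are hypothetical and this is not how git itself implements it; it only lists what the header declares and does not look inside the pack.

```python
import io


def parse_bundle_header(stream):
    """Read a v2 bundle header from a binary stream.

    Returns (prerequisites, tips, pack_offset), where prerequisites is a list
    of (object_id, comment) pairs and tips maps refname -> object_id.
    """
    if stream.readline() != b"# v2 git bundle\n":
        raise ValueError("not a v2 git bundle (assumption: only v2 is handled)")
    prerequisites = []
    tips = {}
    while True:
        line = stream.readline()
        if line == b"":
            raise ValueError("bundle header ended before the pack data")
        if line == b"\n":
            break  # the empty line separates the header from the packfile
        text = line.rstrip(b"\n").decode("utf-8", "replace")
        if text.startswith("-"):
            # Objects the receiving repository must already have.
            object_id, _, comment = text[1:].partition(" ")
            prerequisites.append((object_id, comment))
        else:
            # Tips of the histories carried by the bundle.
            object_id, _, refname = text.partition(" ")
            tips[refname] = object_id
    return prerequisites, tips, stream.tell()


def parse_bundle_bytes(data):
    """Convenience wrapper for data that is already in memory."""
    return parse_bundle_header(io.BytesIO(data))
```

Everything after the returned offset is ordinary pack data, which is why inspecting the actual content still means fetching from the bundle into a repository, as described above.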
RE: git bundle format
-----Original Message-----
From: Felipe Contreras
Sent: Monday, November 26, 2012 3:20 PM

> On Mon, Nov 26, 2012 at 8:24 PM, Pyeron, Jason J CTR (US) <jason.j.pyeron@mail.mil> wrote:
>> I may need to be nudged in a better direction, but please try to understand my intentions.
>>
>> I am facing a situation where I would like to use git bundle but at the same time inspect the contents to prevent a spillage[1].

<snip/>

>> Am I barking up the right tree?
>
> Have you tried 'git fast-export'? The output is definitely not human inspectable, but should be relatively easy to parse to generate such a format. And instead of 'git bundle unbundle' you could use 'git fast-import', or simply do the conversion in your script.

No. But I am going to read up on it today. It clearly says "You can use it as a human-readable bundle replacement"[4].

My initial question is: does it ever use deltas? The repositories I just tested it on only seem to output full blobs (which is really nice from this use case's point of view).

-Jason

4: http://www.kernel.org/pub/software/scm/git/docs/git-fast-export.html
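As a rough illustration of "relatively easy to parse", the sketch below walks a fast-export stream and separates command lines from `data` payloads. It assumes the stream only uses the exact byte count form (`data <n>` followed by n bytes), which is what the streams discussed here appear to contain; the delimited `data <<` form is not handled. The function name is hypothetical.

```python
import io


def parse_fast_export(data):
    """Split a fast-export stream (bytes) into (command, payload) pairs.

    For a 'data <n>' command the pair is ("data", <n bytes of payload>);
    for every other command line the payload is None.
    """
    stream = io.BytesIO(data)
    items = []
    while True:
        line = stream.readline()
        if not line:
            break  # end of stream
        line = line.rstrip(b"\n")
        if not line:
            continue  # blank separator line between commands
        if line.startswith(b"data "):
            size = int(line[5:])  # raises ValueError on the 'data <<' form
            items.append(("data", stream.read(size)))
        else:
            items.append((line.decode("utf-8", "replace"), None))
    return items
```

A reviewer-facing tool could print each command line as-is and render each payload (file contents after `blob`, log message after `commit`) in full, since in this form every payload is carried as complete bytes rather than as a difference against another object.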
RE: git bundle format
-----Original Message-----
From: Junio C Hamano
Sent: Monday, November 26, 2012 3:38 PM

> Pyeron, Jason J CTR (US) writes:
>
>> In this situation we should assume that the bundle does not contain any content which is already in the public repository; that is, it has the minimum data needed to pass a git bundle verify from the public repository's point of view. We would then take the bundle and pipe it through the git-bundle2text program, which would produce a human-inspectable format as opposed to the packed format[2]. The security reviewer would then see all the information being released and with the

*** Assumed that the inspector had a copy of the original public repo

>> help of the public repository see how the data changes the repository.
>
> The bundle file is a thinly wrapped packfile, with extra information that tells which objects in the bundle are the tips of histories and which objects the repository into which the bundle gets unbundled has to have. So your git-bundle2text would likely involve fetching from the bundle and inspecting the resulting history and the working tree files.

Yea, I knew the inspection tool was going to get messy.

-Jason
RE: git bundle format [OT]
-----Original Message-----
From: Stephen Bash
Sent: Monday, November 26, 2012 3:56 PM

> ----- Original Message -----
>> From: Jason J CTR Pyeron (US)
>> Sent: Monday, November 26, 2012 2:24:54 PM
>> Subject: git bundle format
>>
>> I am facing a situation where I would like to use git bundle but at the same time inspect the contents to prevent a spillage[1].
>
> As someone who faced a similar situation in a previous life, I'll offer my $0.02, but I'm certainly not the technical expert here.

Kind of what I am looking for as a side effect.

>> Given: we have a public repository which was cloned onto a secret development repository. The developers do some work which should not be sensitive in any way, and commit and push it to the secret repository. Now they want to release it to the public. The current process is to review the text files to ensure that there is no secret sauce in there and then approve the release. This process ignores the change tracking, and all non-content is lost.
>>
>> In this situation we should assume that the bundle does not contain any content which is already in the public repository; that is, it has the minimum data needed to pass a git bundle verify from the public repository's point of view. We would then take the bundle and pipe it through the git-bundle2text program, which would produce a human-inspectable format as opposed to the packed format[2]. The security reviewer would then see all the information being released and, with the help of the public repository, see how the data changes the repository.
>>
>> Am I barking up the right tree?
>
> First, a shot out of left field: how about a patch-based workflow? (Similar to the mailing list, just replace email with sneakernet.) Patches are plain text and simple to review (preferable to an opaque binary format?).

This is only to address accidental development on a high side. Using this or any process should come with shame or punishment for wasting resources/time by not developing on a low side to start with.

But accepting reality, there will be times when code and its metadata (commit logs, etc.) will be created on a high side and should be brought back to the low side.

> Second, thinking about your proposed bundle-based workflow, I have two questions I'd have to answer to be comfortable with the solution:
>
> 1) Does the binary bundle contain any sensitive information?

Potentially, hence the review. If the reviewer cannot verify the data he is looking at, then the presumption is yes.

> 2) Do the diffs applied to the public repo contain any sensitive data?

That is a great question. Can a change to the code imply or demonstrate the secret even when neither the original nor the result is secret? I think the answer is yes.

> Question 1 seems tricky to someone who knows *nothing* about the bundle format (e.g. me). Maybe some form of bundle2text can be vetted enough that everyone involved believes that there is no other information traveling with the bundle (if so, you're golden).

Here I have to trust other experts.

> On the flip side, even if the bundle itself is polluted (or considered to be lacking proof to the contrary), if (2) is considered safe, the patching of the public repo could potentially be done on a sacrificial hard drive before pushing.

The logistics are well established, and here and now is not the place to go into that. But the above is the crux of what I am trying to get at.

> Question 2 is relatively straightforward and led me to the patch idea. I would:
> - Bundle the public repository
> - Init a new repo in the secure space from the public bundle
> - Fetch from the to-be-sanitized bundle into the new repo
> - Examine commits (diffs) introduced by branches in the to-be-sanitized bundle
> - Perhaps get a list of all the objects in the to-be-sanitized bundle and do a git-cat-file on each of them (if the bundle is assembled correctly it shouldn't have any unreachable objects...). This step may be extraneous after the previous.

Here we would be missing the metadata that goes along with the commit. Especially the SHA sums.

Thanks.

-Jason
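The review steps listed above could be sketched roughly as follows. This is an assumption-laden illustration, not a vetted procedure: the file names, the `review` directory, and the `candidate` remote-tracking namespace are all hypothetical, and the function only builds the git command lines so they can be read before anything is run.

```python
import os
import subprocess


def review_commands(public_bundle, candidate_bundle, workdir="review"):
    """Return the git invocations for the review steps, in order."""
    public_bundle = os.path.abspath(public_bundle)
    candidate_bundle = os.path.abspath(candidate_bundle)
    in_repo = ["git", "-C", workdir]
    new_only = ["--remotes=candidate", "--not", "--remotes=origin"]
    return [
        # Init a new repo from the public bundle (its refs become origin/*).
        ["git", "clone", public_bundle, workdir],
        # Check that the candidate bundle applies on top of the public history.
        in_repo + ["bundle", "verify", candidate_bundle],
        # Fetch the to-be-sanitized bundle into its own ref namespace.
        in_repo + ["fetch", candidate_bundle,
                   "refs/heads/*:refs/remotes/candidate/*"],
        # Examine commits (with diffs) that the candidate adds.
        in_repo + ["log", "-p", "--stat"] + new_only,
        # List every object reachable only from the candidate branches.
        in_repo + ["rev-list", "--objects"] + new_only,
    ]


def run_review(public_bundle, candidate_bundle, workdir="review"):
    """Run each step, stopping at the first failure."""
    for command in review_commands(public_bundle, candidate_bundle, workdir):
        subprocess.run(command, check=True)
```

The `log` output includes commit ids, authors, dates, and messages alongside the diffs, so the reviewer sees that metadata in this repository even though a plain text file review would not show it.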
Re: git bundle format [OT]
----- Original Message -----
> From: Jason J CTR Pyeron (US) <jason.j.pyeron@mail.mil>
> Sent: Monday, November 26, 2012 4:06:59 PM
> Subject: RE: git bundle format [OT]
>
>> First, a shot out of left field: how about a patch-based workflow? (Similar to the mailing list, just replace email with sneakernet.) Patches are plain text and simple to review (preferable to an opaque binary format?).
>
> This is only to address accidental development on a high side. Using this or any process should come with shame or punishment for wasting resources/time by not developing on a low side to start with.

Ah, if only more of those I (previously) worked with thought as you do :)

> But accepting reality, there will be times when code and its metadata (commit logs, etc.) will be created on a high side and should be brought back to the low side.

Using git format-patch and git am it's possible to retain the commit messages (and other associated metadata). But again, I'm not the expert on this :) I've made it work a few times to test patches from this list, but so far I've avoided serious integration into the mailing list workflow.

>> 2) Do the diffs applied to the public repo contain any sensitive data?
>
> That is a great question. Can a change to the code imply or demonstrate the secret even when neither the original nor the result is secret? I think the answer is yes.

In actual fact I was thinking about the simple case where the result included an "Eek! 3.1415926 cannot show up in this code!" (sometimes that's easier to see in a diff than in a full text blob). Obviously the first line of defense should catch such mistakes. But yes, your point is also a good one. I'd be hard pressed to argue that a particular series of commits leaks information on its own, but it can certainly corroborate other available information.

>> Question 2 is relatively straightforward and led me to the patch idea. I would:
>> - Bundle the public repository
>> - Init a new repo in the secure space from the public bundle
>> - Fetch from the to-be-sanitized bundle into the new repo
>> - Examine commits (diffs) introduced by branches in the to-be-sanitized bundle
>> - Perhaps get a list of all the objects in the to-be-sanitized bundle and do a git-cat-file on each of them (if the bundle is assembled correctly it shouldn't have any unreachable objects...). This step may be extraneous after the previous.
>
> Here we would be missing the metadata that goes along with the commit. Especially the SHA sums.

Ah sorry, I guess I wasn't complete. Once that process has been done on the high side, one has to go back to question 1 and see if it's safe to move the bundle out to repeat the process on the low side.

Stephen
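For the simple "3.1415926 cannot show up in this code" case, a patch produced by git format-patch is plain text, so a first-pass scan of the added lines is easy to sketch. The function name and the pattern list below are hypothetical, and a scan like this would at best supplement a human review; it does not address changes that leak information only by implication.

```python
import re


def flag_added_lines(patch_text, patterns):
    """Return (line_number, file, text) for added lines matching any pattern.

    Only lines that start with '+' are checked, so commit messages and
    context lines are left for the human reviewer. An added line whose own
    content starts with '++ ' would be mistaken for a file header and skipped.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    hits = []
    current_file = None
    for number, line in enumerate(patch_text.splitlines(), start=1):
        if line.startswith("+++ "):
            current_file = line[4:]  # e.g. "b/path/to/file"
            continue
        if line.startswith("+"):
            added = line[1:]
            if any(regex.search(added) for regex in compiled):
                hits.append((number, current_file, added))
    return hits
```

Because the match is reported with its line number in the patch file, the reviewer can jump straight to the hunk and read it in context.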