RE: git bundle format

2012-11-26 Thread Pyeron, Jason J CTR (US)
Left off a citation to an old thread.

 -Original Message-
 From: Pyeron, Jason J CTR (US)
 Sent: Monday, November 26, 2012 2:25 PM
 
 I may need to be nudged in a better direction, but please try to
 understand my intentions.
 
 I am facing a situation where I would like to use git bundle but at the
 same time inspect the contents to prevent a spillage[1].
 
 Suppose we have a public repository that was cloned onto a secret
 development repository. The developers do some work, which should
 not be sensitive in any way, and commit and push it to the secret
 repository.
 
 Now they want to release it to the public. The current process is
 to review the text files to ensure that there is no secret sauce in
 there and then approve its release. This process ignores the change
 tracking, so everything other than the file contents (history, commit
 logs, etc.) is lost.
 
 
 In this situation we should assume that the bundle does not have any
 content which is already in the public repository; that is, it has the
 minimum data to make it pass a git bundle verify from the public
 repository's point of view. We would then take the bundle and pipe it
 through the git-bundle2text program, which would result in a human
 inspectable format[3] as opposed to the packed format[2]. The security
 reviewer would then see all the information being released and, with the
 help of the public repository, see how the data changes the repository.
 
 Am I barking up the right tree?
 
 
 1: http://en.wikipedia.org/wiki/Spillage_of_Classified_Information
 2: http://git-scm.com/book/ch9-4.html
 3: http://git.661346.n2.nabble.com/How-to-extract-files-out-of-a-quot-git-bundle-quot-no-matter-what-td1679188.html




Re: git bundle format

2012-11-26 Thread Junio C Hamano
Pyeron, Jason J CTR (US) jason.j.pyeron@mail.mil writes:

 In this situation we should assume that the bundle does not have
 any content which is already in the public repository, that is it
 has the minimum data to make it pass a git bundle verify from the
 public repository's point of view. We would then take the bundle
 and pipe it through the git-bundle2text program, which would
 result in a human inspectable format as opposed to the packed
 format[2]. The security reviewer would then see all the
 information being released and with the help of the public
 repository see how the data changes the repository.

The bundle file is a thinly wrapped packfile, with extra information
that tells what objects in the bundle are the tips of histories and
what objects the repository the bundle gets unbundled has to have.
So your git-bundle2text would likely involve fetching from the
bundle and inspecting the resulting history and the working tree
files.
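To make the "thinly wrapped packfile" description concrete, here is a rough Python sketch of the first thing a git-bundle2text would do: read the bundle header. It assumes the documented bundle layout: a signature line (`# v2 git bundle`, or `# v3 git bundle` in later Git releases, which may add `@capability` lines), prerequisite lines of the form `-<object id> <comment>`, tip lines of the form `<object id> <refname>`, an empty line, and then a raw packfile starting with `PACK`, a 4-byte version and a 4-byte object count. The names `BundleHeader` and `parse_bundle` are made up for illustration.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class BundleHeader:
    version: int
    prerequisites: list = field(default_factory=list)  # (object id, comment): must already exist
    refs: list = field(default_factory=list)           # (object id, ref name): tips of histories
    capabilities: list = field(default_factory=list)   # v3 "@key[=value]" lines
    pack_version: int = 0
    pack_object_count: int = 0


def parse_bundle(data: bytes) -> BundleHeader:
    """Split a bundle into its text header and the 12-byte pack header."""
    signatures = {b"# v2 git bundle": 2, b"# v3 git bundle": 3}
    end = data.index(b"\n\n")  # header lines are never empty, so this ends the header
    lines = data[:end].split(b"\n")
    if lines[0] not in signatures:
        raise ValueError("not a v2/v3 bundle: %r" % lines[0])
    header = BundleHeader(version=signatures[lines[0]])
    for line in lines[1:]:
        text = line.decode("utf-8")
        if text.startswith("@"):
            header.capabilities.append(text[1:])
        elif text.startswith("-"):
            oid, _, comment = text[1:].partition(" ")
            header.prerequisites.append((oid, comment))
        else:
            oid, _, ref = text.partition(" ")
            header.refs.append((oid, ref))
    # Everything after the empty line is an ordinary packfile.
    magic, version, count = struct.unpack(">4sII", data[end + 2:end + 14])
    if magic != b"PACK":
        raise ValueError("pack data does not start with PACK")
    header.pack_version = version
    header.pack_object_count = count
    return header
```

Something like `parse_bundle(open("candidate.bundle", "rb").read())` would list the tips and prerequisites. It deliberately stops at the pack header: the objects themselves are zlib-compressed and may be stored as deltas against other objects, so decoding them is the job of `git index-pack` (or simply `git fetch` from the bundle), which is why fetching and inspecting the result is the practical route.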
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: git bundle format

2012-11-26 Thread Pyeron, Jason J CTR (US)
 -Original Message-
 From: Felipe Contreras
 Sent: Monday, November 26, 2012 3:20 PM
 
 On Mon, Nov 26, 2012 at 8:24 PM, Pyeron, Jason J CTR (US)
 jason.j.pyeron@mail.mil wrote:
  I may need to be nudged in a better direction, but please try to
 understand my intentions.
 
  I am facing a situation where I would like to use git bundle but at
 the same time inspect the contents to prevent a spillage[1].
 
snip/
 
  Am I barking up the right tree?
 
 Have you tried 'git fast-export'? The output is definitely not human
 inspectable, but should be relatively easy to parse to generate such a
 format. And instead of 'git bundle unbundle' you could use 'git
 fast-import', or simply do the conversion in your script.

No, but I am going to read up on it today. It clearly says "You can use it as a 
human-readable bundle replacement"[4]. My initial question is: does it ever use 
deltas? The repositories I just tested it on only seem to output full blobs 
(which is really nice from this use case's point of view).
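On the delta question: in the fast-import stream format every blob is carried in full inside a `data <count>` section, and the format has no delta encoding, which matches the full blobs observed above. A minimal Python sketch of the "relatively easy to parse" step might group the stream into records a reviewer can skim. It assumes the exact-length `data <count>` form that `git fast-export` writes (the delimited `data <<EOF` form that fast-import also accepts is not handled); `parse_fast_export` and `describe` are hypothetical names.

```python
def parse_fast_export(stream: bytes) -> list:
    """Group a `git fast-export` stream into blob/commit/tag/reset records."""
    records, current, pos = [], None, 0
    while pos < len(stream):
        nl = stream.find(b"\n", pos)
        if nl == -1:
            nl = len(stream)
        line = stream[pos:nl].decode("utf-8", "replace")
        pos = nl + 1
        if line.startswith("data ") and current is not None:
            size = int(line[5:])
            # The payload is the complete blob or message, never a delta.
            current["data"] = stream[pos:pos + size]
            pos += size
            if stream[pos:pos + 1] == b"\n":  # optional LF after the payload
                pos += 1
        elif line.split(" ", 1)[0] in ("blob", "commit", "tag", "reset"):
            kind, _, name = line.partition(" ")
            current = {"type": kind, "name": name, "lines": []}
            records.append(current)
        elif line and current is not None:
            # mark, author, committer, from, merge, M/D file operations, ...
            current["lines"].append(line)
    return records


def describe(records: list) -> list:
    """One line per record for a reviewer to skim (payloads shown by size only)."""
    out = []
    for rec in records:
        head = f"{rec['type']} {rec['name']}".rstrip()
        size = len(rec.get("data", b""))
        out.append(f"{head} [{size} bytes] " + "; ".join(rec["lines"]))
    return out
```

The input could be the captured output of something like `git fast-export master ^origin/master`; the reviewer would still read the blob and message payloads themselves, the summary just makes sure nothing in the stream is skipped.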

-Jason


4: http://www.kernel.org/pub/software/scm/git/docs/git-fast-export.html




RE: git bundle format

2012-11-26 Thread Pyeron, Jason J CTR (US)
 -Original Message-
 From: Junio C Hamano
 Sent: Monday, November 26, 2012 3:38 PM
 
 Pyeron, Jason J CTR (US) writes:
 
  In this situation we should assume that the bundle does not have
  any content which is already in the public repository, that is it
  has the minimum data to make it pass a git bundle verify from the
  public repository's point of view. We would then take the bundle
  and pipe it through the git-bundle2text program, which would
  result in a human inspectable format as opposed to the packed
  format[2]. The security reviewer would then see all the
  information being released and with the 

*** This assumes that the inspector has a copy of the original public repo.

  help of the public
  repository see how the data changes the repository.

 
 The bundle file is a thinly wrapped packfile, with extra information
 that tells what objects in the bundle are the tips of histories and
 what objects the repository the bundle gets unbundled has to have.
 So your git-bundle2text would likely involve fetching from the
 bundle and inspecting the resulting history and the working tree
 files.

Yeah, I knew the inspection tool was going to get messy.

-Jason




RE: git bundle format [OT]

2012-11-26 Thread Pyeron, Jason J CTR (US)
 -Original Message-
 From: Stephen Bash
 Sent: Monday, November 26, 2012 3:56 PM
 
 - Original Message -
  From: Jason J CTR Pyeron (US) 
  Sent: Monday, November 26, 2012 2:24:54 PM
  Subject: git bundle format
 
  I am facing a situation where I would like to use git bundle but at
  the same time inspect the contents to prevent a spillage[1].
 
 As someone who faced a similar situation in a previous life, I'll offer
 my $0.02, but I'm certainly not the technical expert here.

Kind of what I am looking for as a side effect.

 
  Given we have a public repository which was cloned on to a secret
  development repository. Now the developers do some work which should
  not be sensitive in any way and commit and push it to the secret
  repository.
 
  Now they want to release it out to the public. The current process is
  to review the text files to ensure that there is no secret sauce
  in there and then approve its release. This current process ignores
  the change tracking and all non-content is lost.
 
  In this situation we should assume that the bundle does not have any
  content which is already in the public repository, that is it has
  the minimum data to make it pass a git bundle verify from the public
  repository's point of view. We would then take the bundle and pipe
  it through the git-bundle2text program, which would result in a
  human inspectable format as opposed to the packed format[2]. The
  security reviewer would then see all the information being released
  and with the help of the public repository see how the data changes
  the repository.
 
  Am I barking up the right tree?
 
 First, a shot out of left field: how about a patch based workflow?
 (similar to the mailing list, just replace email with sneakernet)
 Patches are plain text and simple to review (preferable to an opaque
 binary format?).

This is only meant to address accidental development on a high side. Using this 
or any such process should come with shame or punishment for wasting resources/time 
by not developing on a low side to start with. But, accepting reality, there will 
be times when code and its metadata (commit logs, etc.) will be created on a 
high side and should be brought back to the low side.


 Second, thinking about your proposed bundle-based workflow I have two
 questions I'd have to answer to be comfortable with the solution:
 
   1) Does the binary bundle contain any sensitive information?

Potentially, hence the review. If the reviewer cannot verify the data he is 
looking at, then the presumption is yes.

   2) Do the diffs applied to public repo contain any sensitive data?

That is a great question. Can a change to the code imply or demonstrate the 
secret even when neither the original nor the result is secret? I 
think the answer is yes.

 
 Question 1 seems tricky to someone who knows *nothing* about the bundle
 format (e.g. me).  Maybe some form of bundle2text can be vetted enough
 that everyone involved believes that there is no other information
 traveling with the bundle (if so, you're golden).  Here I have to trust
 other experts.  On the flip side, even if the bundle itself is polluted
 (or considered to be lacking proof to the contrary), if (2) is
 considered safe, the patching of the public repo could potentially be
 done on a sacrificial hard drive before pushing.

The logistics are well established, and this is not the place to go into 
that. But the above is the crux of what I am trying to get at.
 
 
 Question 2 is relatively straightforward and led me to the patch
 idea.  I would:
   - Bundle the public repository
   - Init a new repo in the secure space from the public bundle
   - Fetch from the to-be-sanitized bundle into the new repo
   - Examine commits (diffs) introduced by branches in the
 to-be-sanitized bundle
   - Perhaps get a list of all the objects in the to-be-sanitized bundle
 and do a git-cat-file on each of them (if the bundle is assembled
 correctly it shouldn't have any unreachable objects...).  This step may
 be extraneous after the previous.
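These steps map fairly directly onto stock git commands. A Python sketch, assuming the to-be-sanitized bundle carries its work as `refs/heads/*` refs and that `git -C` is available (Git 1.8.5 or newer); the function names and paths are hypothetical, and the `refs/remotes/candidate/*` namespace is just a choice made here to keep the candidate's branches apart from the public ones.

```python
import os
import subprocess

# Revisions reachable from the candidate branches but not from the public ones.
NEW_ONLY = ["--remotes=candidate", "--not", "--remotes=origin"]


def review_commands(public_repo: str, public_bundle: str,
                    candidate_bundle: str, workdir: str) -> list:
    """Git invocations for the review steps, in order."""
    public_bundle = os.path.abspath(public_bundle)
    candidate_bundle = os.path.abspath(candidate_bundle)
    return [
        # 1. Bundle the public repository (run on the public side).
        ["git", "-C", public_repo, "bundle", "create", public_bundle, "--all"],
        # 2. Init a new repo in the secure space from the public bundle.
        ["git", "clone", public_bundle, workdir],
        # 3. Fetch the candidate's branches into refs/remotes/candidate/*.
        ["git", "-C", workdir, "fetch", candidate_bundle,
         "refs/heads/*:refs/remotes/candidate/*"],
        # 4. Show each new commit with its full metadata and diff.
        ["git", "-C", workdir, "log", "-p", "--format=fuller"] + NEW_ONLY,
        # 5. List every new object (commits, trees, blobs).
        ["git", "-C", workdir, "rev-list", "--objects"] + NEW_ONLY,
    ]


def dump_new_objects(workdir: str) -> bytes:
    """Step 5 plus the cat-file pass: raw contents of every new object."""
    listing = subprocess.run(
        ["git", "-C", workdir, "rev-list", "--objects"] + NEW_ONLY,
        check=True, capture_output=True, text=True).stdout
    # rev-list --objects prints "<oid>" or "<oid> <path>"; cat-file wants the oid.
    oids = "".join(line.split(" ", 1)[0] + "\n" for line in listing.splitlines())
    return subprocess.run(["git", "-C", workdir, "cat-file", "--batch"],
                          input=oids.encode(), check=True,
                          capture_output=True).stdout
```

Each list can be handed to `subprocess.run(argv, check=True)`. Restricting steps 4 and 5 to "candidate but not origin" is what makes the object dump cover exactly what the bundle adds on top of the public history.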

Here we would be missing the metadata that goes along with the commit. 
Especially the SHA sums.
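The SHA concern comes down to how git names objects: an object's ID is the SHA-1 of a short `<type> <size>\0` header followed by the raw contents, and a commit's contents include its tree ID, parent IDs, author, committer and message. A small Python sketch (hypothetical helper names; it ignores optional commit headers such as `encoding` or signatures) shows why the original SHAs only survive if every one of those bytes is carried across unchanged.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object name as git computes it: hash of '<type> <size>\\0' + body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list, author: str, committer: str,
              message: str) -> str:
    """ID of a plain commit object built from its header lines and message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_id("commit", "\n".join(lines).encode())
```

For example, `git_object_id("blob", b"")` gives the well-known empty-blob ID `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`. Changing only the committer line of a commit yields a different commit ID, and every descendant's ID changes with it, since each one names its parent.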

Thanks.

-Jason




Re: git bundle format [OT]

2012-11-26 Thread Stephen Bash
- Original Message -
 From: Jason J CTR Pyeron (US) jason.j.pyeron@mail.mil
 Sent: Monday, November 26, 2012 4:06:59 PM
 Subject: RE: git bundle format [OT]
 
  First, a shot out of left field: how about a patch based workflow?
  (similar to the mailing list, just replace email with sneakernet)
  Patches are plain text and simple to review (preferable to an
  opaque binary format?).
 
 This is to only address the accidental development on a high side.
 Using this or any process should come with shame or punishment for
 wasting resources/time by not developing on a low side to start
 with.

Ah, if only more of those I (previously) worked with thought as you do :)

 But accepting reality there will be times where code and its
 metadata (commit logs, etc) will be created on a high side and
 should be brought back to the low side.

Using git format-patch and git am it's possible to retain the commit messages 
(and other associated metadata).  But again, I'm not the expert on this :)  
I've made it work a few times to test patches from this list, but so far I've 
avoided serious integration into the mailing list workflow.
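For the patch route, `git format-patch` writes each commit as an mbox-style message whose first line is `From <original commit id> Mon Sep 17 00:00:00 2001`, followed by `From:`, `Date:` and `Subject:` headers and the commit message; `git am` restores the author, author date and message from those. A Python sketch of a helper (the name `patch_metadata` is hypothetical) that pulls this metadata out for a reviewer:

```python
import email
from email import policy


def patch_metadata(patch_text: str) -> dict:
    """Metadata carried by one `git format-patch` output file."""
    first, _, rest = patch_text.partition("\n")
    if not first.startswith("From "):
        raise ValueError("not git format-patch output")
    msg = email.message_from_string(rest, policy=policy.default)
    body = msg.get_payload()
    return {
        "original_commit": first.split()[1],  # the high-side commit ID
        "author": str(msg["From"]),
        "date": str(msg["Date"]),
        "subject": str(msg["Subject"]),
        # The commit message body ends at the "---" line before the diffstat.
        "message": ("\n" + body).split("\n---\n", 1)[0].strip(),
    }
```

Note that the commit `git am` creates on the low side will generally have a different SHA, since it records whoever applies the patch as committer along with a new commit time; the original ID survives only in that first `From ` line.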

2) Do the diffs applied to public repo contain any sensitive
data?
 
 That is a great question. Can a change to the code imply or
 demonstrate the secret even when neither the original nor the
 result is secret? I think the answer is yes.

In actual fact I was thinking about the simple case where the result included 
an "Eek! 3.1415926 cannot show up in this code!" (sometimes that's easier to 
see in a diff than in a full text blob).  Obviously the first line of defense 
should catch such mistakes.  But yes, your point is also a good one.  I'd be 
hard-pressed to argue that a particular series of commits leaks information on 
its own, but it can certainly corroborate other available information.
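For the simple case, part of that first line of defense can be automated by scanning only the added lines of a unified diff (as printed by `git diff` or `git log -p`) for strings known to be forbidden. A Python sketch with a hypothetical `flag_added_lines` helper; it only catches what someone thought to list, so it does nothing about the inference problem raised above.

```python
import re

HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def flag_added_lines(diff_text: str, patterns: list) -> list:
    """Return (file, new line number, text) for added lines matching any pattern."""
    compiled = [re.compile(p) for p in patterns]
    hits, path, new_line = [], None, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # New-side file name; strip git's "b/" prefix.
            path = line[4:]
            if path.startswith("b/"):
                path = path[2:]
            continue
        m = HUNK.match(line)
        if m:
            new_line = int(m.group(1))  # first line number on the new side
        elif line.startswith("+"):
            text = line[1:]
            if any(rx.search(text) for rx in compiled):
                hits.append((path, new_line, text))
            new_line += 1
        elif line.startswith(" "):
            new_line += 1  # context lines exist on both sides
        # "-" lines, "--- a/..." and "diff --git" headers do not advance the count
    return hits
```

A pattern list like `[r"3\.1415926"]` would flag the "Eek!" case; because only `+` lines are scanned, a removal of a sensitive string is not reported, which is intended here.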

  Question 2 is relatively straightforward and led me to the patch
  idea.  I would:
- Bundle the public repository
- Init a new repo in the secure space from the public bundle
- Fetch from the to-be-sanitized bundle into the new repo
- Examine commits (diffs) introduced by branches in the
to-be-sanitized bundle
- Perhaps get a list of all the objects in the to-be-sanitized
bundle and do a git-cat-file on each of them (if the bundle is
assembled correctly it shouldn't have any unreachable objects...).
This step may be extraneous after the previous.
 
 Here we would be missing the metadata that goes along with the
 commit. Especially the SHA sums.

Ah sorry, I guess I wasn't complete.  Once that process has been done on the 
high side one has to go back to question 1 and see if it's safe to move the 
bundle out to repeat the process on the low side. 
 
Stephen