Re: [git-users] Synchronizing air gapped git repositories using bundles

2017-02-02 Philip Oakley
Hi Lowell,

An aside question: if all the machines were co-located, would they still be 
air-gapped across those few feet between them, or would they be linked? I ask 
because the answer helps clarify how you would serve data between the machines. 

The bundle is simply a compact version of the on-wire transfer, where the 
negotiation of 'wants' and 'haves' (which normally happens on the wire) is done 
by the user. When you fetch from the bundle it is just like fetching from a 
remote, and only the requested parts (the refspec) are extracted from the bundle.
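A toy model may make the 'wants'/'haves' idea concrete. The Python sketch below is not how git is implemented; it treats history as a hypothetical dict mapping each commit to its parents and computes what a bundle for the range haves..wants would carry. That is everything reachable from the wanted tips minus everything reachable from the 'have' boundary, which is what two-dot notation such as master..sidebranch means to rev-list.

```python
def reachable(graph: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Every commit reachable from `tips` by following parent links."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def bundle_contents(graph, wants, haves):
    """Toy model of `haves..wants`: reachable from wants but not from haves."""
    return reachable(graph, wants) - reachable(graph, haves)


# Hypothetical history: A <- B <- C on master, with D branching off B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

print(sorted(bundle_contents(history, wants=["C"], haves=["A"])))  # ['B', 'C']
```

The user's job is to pick the 'haves' well: too recent and the receiver lacks the boundary, too old and the bundle just carries some objects the receiver already has.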

It may help clarify discussions about what is being transferred.

My example would be:

git bundle create mc1-20161201-20170202.bndl --all --since="2016.12.01" master

so the bundle filename contains the full details of what, where and when.
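If a script builds both the filename and the command from the same inputs, the two cannot drift apart. A minimal Python sketch, assuming the naming pattern of the example above (server name, --since date, creation date, each as YYYYMMDD); the function names are hypothetical, and the argument list is only printed here, though it could be handed to subprocess.run.

```python
from datetime import date


def bundle_filename(server: str, since: date, created: date) -> str:
    """e.g. mc1-20161201-20170202.bndl: source server, --since cut, creation date."""
    return f"{server}-{since:%Y%m%d}-{created:%Y%m%d}.bndl"


def bundle_create_argv(server: str, since: date, created: date,
                       refs: list[str]) -> list[str]:
    """Argument list for `git bundle create`, e.g. for subprocess.run(argv, check=True)."""
    return [
        "git", "bundle", "create",
        bundle_filename(server, since, created),
        "--all",                      # lower case: git options are case-sensitive
        f"--since={since:%Y-%m-%d}",  # ISO 8601 date for the rev-list cut point
        *refs,
    ]


argv = bundle_create_argv("mc1", date(2016, 12, 1), date(2017, 2, 2), ["master"])
print(" ".join(argv))
# git bundle create mc1-20161201-20170202.bndl --all --since=2016-12-01 master
```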

I've not come up against a 'verify' scenario where the bundle has a partial 
orphan branch from the recipient's view, such that, if the refspec wanted every 
ref, that branch would be metaphorically shallow (and hence not connected).
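`git bundle verify` checks that the prerequisite commits a bundle records (the boundary it was cut at) already exist in the receiving repository. The toy model below, again using a hypothetical commit graph rather than git's real data structures, shows why a bundle cut too close to the tip is 'not connected' for a receiver that lacks the boundary commit.

```python
def reachable(graph: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Every commit reachable from `tips` by following parent links."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def prerequisites(graph, wants, haves):
    """Boundary commits: parents of bundled commits that the bundle leaves out."""
    included = reachable(graph, wants) - reachable(graph, haves)
    return {p for c in included for p in graph.get(c, []) if p not in included}


def verify(graph, wants, haves, receiver_has):
    """Toy stand-in for `git bundle verify`: prerequisites the receiver lacks."""
    return prerequisites(graph, wants, haves) - set(receiver_has)


# Hypothetical linear history A <- B <- C <- D; the bundle is cut at B (B..D).
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}

print(verify(history, ["D"], ["B"], receiver_has=["A"]))       # {'B'}: not connected
print(verify(history, ["D"], ["B"], receiver_has=["A", "B"]))  # set(): safe to fetch
```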

The bundles are usually quite small and compact (relative to, say, a zip file of 
every revision, e.g. right-click the top directory and choose Send to > 
Compressed (zipped) folder; been there, done that), so size shouldn't be an 
issue (try the alternative to compare!)

"Bundle stacking" was the second option, as you thought: fetch from Nov16.bndl, 
then Dec16.bndl, then Jan17.bndl, etc.
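With bundle names that carry both dates, a script can work out the stacking order and warn about gaps. This Python sketch assumes names shaped like mc1-20161201-20170202.bndl (with no hyphen in the server name) and flags a bundle whose --since date is later than the previous bundle's creation date as a possible gap. That is only a heuristic, since --since filters on commit dates rather than on when commits arrived.

```python
import re
from datetime import date, datetime

# Hypothetical convention: <server>-<since YYYYMMDD>-<created YYYYMMDD>.bndl
BUNDLE_NAME = re.compile(r"^(?P<server>[^-]+)-(?P<since>\d{8})-(?P<created>\d{8})\.bndl$")


def _day(text: str) -> date:
    return datetime.strptime(text, "%Y%m%d").date()


def parse_bundle_name(name: str) -> tuple[str, date, date]:
    match = BUNDLE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected bundle name: {name}")
    return match["server"], _day(match["since"]), _day(match["created"])


def stacking_order(names: list[str]) -> tuple[list[str], list[tuple[str, str]]]:
    """Sort bundles oldest first; list consecutive pairs that may leave a gap."""
    ordered = sorted(names, key=lambda n: parse_bundle_name(n)[2])
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        prev_created = parse_bundle_name(prev)[2]
        next_since = parse_bundle_name(nxt)[1]
        if next_since > prev_created:  # next cut starts after prev was made
            gaps.append((prev, nxt))
    return ordered, gaps


order, gaps = stacking_order([
    "mc1-20161201-20170202.bndl",
    "mc1-20161101-20161201.bndl",
    "mc1-20170215-20170301.bndl",
])
print(order)
print(gaps)
```

Each bundle in `order` would then be fetched in turn, just as in the Nov16/Dec16/Jan17 sequence; a reported gap is a prompt to re-cut that bundle with an earlier --since date.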

Using the --branches option is one way to go, but you may get later 
synchronisation problems with the other branches if you don't get sufficient 
depth into the bundle, which means choosing the right --since date, tag, or 
merge base ("master..sidebranch", two-dot notation).


The other point, which you probably already know, is that you can practice all 
of this on a single machine! Just init a test repo, add a few commits and 
branches (with faked commit dates, etc.), then bundle parts of it. Now init 
another repo and try fetching from that bundle, go back to the first repo, 
create another partial bundle, transfer that, and so on. This means you can 
test out all the issues easily from the comfort of your chair!
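As a starting point for that kind of practice, here is a Python sketch that drives the git command line through subprocess in a throwaway directory. It makes a full bundle, then an incremental one based on a tag, verifies the incremental bundle in the second repository, and fetches it into a mirror branch. The repository, tag, branch and file names (primary, secondary, last-export, mirror-primary, full.bndl, incr.bndl) are hypothetical, and it assumes git is on the PATH.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run git inside `repo` with a throwaway identity; return stripped stdout."""
    result = subprocess.run(
        ["git", "-c", "user.name=Test", "-c", "user.email=test@example.com",
         "-c", "commit.gpgsign=false", "-c", "tag.gpgsign=false", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def commit(repo: Path, filename: str, text: str) -> None:
    """Write a file and commit it."""
    (repo / filename).write_text(text)
    git(repo, "add", filename)
    git(repo, "commit", "-m", f"update {filename}")


def practice_round_trip(workdir: Path) -> tuple[str, str]:
    """Full bundle, then an incremental one; return (primary tip, mirror tip)."""
    primary, secondary = workdir / "primary", workdir / "secondary"
    for repo in (primary, secondary):
        repo.mkdir()
        git(repo, "init")
        # Pin the branch name so the sketch behaves the same on any git version.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/master")

    # First transfer: a full bundle has no prerequisites, so an empty repo can use it.
    commit(primary, "notes.txt", "first\n")
    full = str(workdir / "full.bndl")
    git(primary, "bundle", "create", full, "master")
    git(primary, "tag", "last-export", "master")  # remember the export point
    git(secondary, "fetch", full, "master:refs/heads/mirror-primary")

    # Second transfer: only what is new since the tagged export point.
    commit(primary, "notes.txt", "second\n")
    incr = str(workdir / "incr.bndl")
    git(primary, "bundle", "create", incr, "last-export..master")
    git(secondary, "bundle", "verify", incr)  # fails if a prerequisite is missing
    git(secondary, "fetch", incr, "master:refs/heads/mirror-primary")

    return git(primary, "rev-parse", "master"), git(secondary, "rev-parse", "mirror-primary")


if __name__ == "__main__":
    if shutil.which("git"):
        with tempfile.TemporaryDirectory() as tmp:
            src, dst = practice_round_trip(Path(tmp))
            print("mirror matches primary:", src == dst)
    else:
        print("git not found on PATH")
```

Skipping the first transfer (so the secondary never receives the tagged commit) is a quick way to watch `git bundle verify` fail.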

The final bit is that you don't (shouldn't) get any "unreferenced revisions in 
the destination repositories": all the revisions fetched from the bundle will be 
linked to a known ref. There may be revs in the bundle that aren't brought in, 
but that isn't quite the same question.
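The same kind of toy model (a hypothetical commit graph, not git internals) shows both halves of that point, and also bears on the question of fetching just master from a bundle. A fetch brings in everything reachable from the requested ref that the receiver lacks, so nothing it needs is skipped, while commits reachable only from other refs in the bundle simply stay behind.

```python
def reachable(graph: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Every commit reachable from `tips` by following parent links."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def fetch_from_bundle(graph, bundle_heads, requested, receiver_has):
    """Toy fetch: the commits the receiver gains by fetching `requested` refs."""
    tips = [bundle_heads[ref] for ref in requested]
    return reachable(graph, tips) - set(receiver_has)


# Hypothetical bundle with two heads; the receiver already has commit A.
history = {"A": [], "B": ["A"], "C": ["B"], "S": ["A"]}
heads = {"master": "C", "side": "S"}

print(sorted(fetch_from_bundle(history, heads, ["master"], receiver_has=["A"])))  # ['B', 'C']
```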

Hope that bit of rambling helps.

Philip
PS I understand enough of how I think git works to get me into trouble, too.
  - Original Message - 
  From: Lowell Alleman 
  To: Git for human beings 
  Cc: philipoak...@iee.org 
  Sent: Thursday, February 02, 2017 6:23 PM
  Subject: Re: [git-users] Synchronizing air gapped git repositories using bundles


  Philip,


  Thanks for the reply!


  The reason I'm looking at using a script is mostly for standardization: so 
that the file names are consistent, and to capture some bundle metadata 
necessary for the file transfer process (file name, size, checksum, ...). We 
also capture some metadata about the bundle such as the revision count and some 
delta details (specifically, the output of "git diff --stat" and 
"git log --stat"). This helps answer the question about what is being 
transferred in a given bundle. (To the best of my knowledge, there's no way to 
get this info from the bundle file itself.) Secondary reasons for the script 
come down to mixed levels of user fluency with git, a general mandate to 
automate tasks, and the fact that the script is currently responsible for 
tracking the "last export point" via tags. (Oh, and I found it easy to forget 
to include refs, like refs/heads/master, in the bundle, and then importing 
became super painful on the other side.)
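Collecting the transfer metadata is easy to script. A minimal Python sketch, assuming SHA-256 as the checksum and JSON as the manifest format (neither is specified in the thread); extra fields such as a revision count, which could come from `git rev-list --count` on the exported range, are passed in by the caller rather than computed here. The function name is hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def bundle_manifest(path: Path, **extra) -> dict:
    """Name, size and SHA-256 of a bundle file, plus any caller-supplied fields."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):  # read in 64 KiB chunks
            digest.update(chunk)
    return {
        "file": path.name,
        "size": path.stat().st_size,
        "sha256": digest.hexdigest(),
        **extra,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "mc1-20161201-20170202.bndl"
        demo.write_bytes(b"placeholder bytes, not a real bundle")
        print(json.dumps(bundle_manifest(demo), indent=2))
```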


  I was trying to stick with specific revisions and avoid overlapping exports, 
for a few reasons: (1) so that we could build a change "manifest" to go along 
with the bundle that would only include what's "new"; (2) so that if we need to 
release multiple fixes in a short period of time, like more than one a day, we 
don't end up just copying the same stuff around over and over again (we are 
looking at a scheduled monthly sync-up to keep divergence from becoming 
significant, but we may need to sync up multiple times a day on rare 
occasions); and (3) just to generally minimize file transfer size (not a huge 
deal; we're talking a few MBs).


  I fully agree on the file transfer rejections point. It hasn't happened yet, 
and policy work is ongoing.



  When you refer to bundle stacking, is there a way to specify multiple 
locations to pull from at once, or are you just referring to the fact that you 
can sequentially pull from multiple bundle files? (I'm assuming the second.)


  Yes, "recording what has been transferred" is exactly the core issue I'm 
facing.  I've noted above some of the reasons I was trying to use a tighter 
re

Re: [git-users] Synchronizing air gapped git repositories using bundles

2017-02-02 Lowell Alleman
at in the future (as the total 
repository size is becoming more significant.)

So I guess the more fundamental question is this: is it better to use the 
macro approach (and risk pushing around lots of extra stuff), which could 
result in unreferenced revisions in the destination repositories, or is it 
better to use the micro approach and be strategic about just the specific 
branches/revisions I want to synchronize?

Right now I've been grabbing just the branch I want from the bundle, and 
normally that's all it includes anyway (e.g. "git checkout mirror-REPO1; 
git fetch my.repo1.bundle master"). I'm now wondering whether, in doing so, 
I could ultimately end up missing necessary revisions that would be 
imported if I used "git fetch". Is that possible?


Okay, that's enough rambling. Thanks again for any help you can provide. 
As you may have gathered, I've been fighting with this process for quite 
some time now. I probably have an unbalanced knowledge of git that's 
currently working against me. I understand enough of how I think git works 
to get me into trouble, but not enough to get back out of it. ;-) And I'm 
working with a rather old version of git (1.7).

Thanks in advance!



Re: [git-users] Synchronizing air gapped git repositories using bundles

2017-02-01 Philip Oakley
Hi Lowell,

You can use all of the git rev-list options for selecting which commits are 
in the bundle (which is just a thin wrapper around the pack file that would be 
sent over the wire). 

You can include more commits in the bundle than you need [1], that is, have an 
overlap. One option is simply to use the --since= option as a way of 
ensuring you go far enough back in history, plus --all to get everything 
after that date [2].

I suspect that part of the problem is finding a way of recording what has been 
transferred in the three-way transfer. I'd suggest it's just as easy to use a 
small notebook (or formal admin log) to record the date of each transfer and 
use that to guide the bundle creation.
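If that log is kept in machine-readable form, picking the next --since date becomes a lookup. This Python sketch assumes a hypothetical log of (date, source, destination, bundle file) records and subtracts a safety margin, because --since filters on commit dates: a commit that reached the primary late but carries an older date would otherwise be left out. The repository names, file names and the seven-day default are made up for illustration.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional


class Transfer(NamedTuple):
    day: date
    source: str
    destination: str
    bundle: str


def next_since(log: list[Transfer], source: str, destination: str,
               margin: timedelta = timedelta(days=7)) -> Optional[date]:
    """--since date for the next source->destination bundle, or None if none yet.

    Goes back `margin` before the last recorded transfer on that route, so
    consecutive bundles overlap; objects the destination already has are ignored.
    """
    days = [t.day for t in log if t.source == source and t.destination == destination]
    return max(days) - margin if days else None


log = [
    Transfer(date(2016, 12, 1), "primary", "secondary-a", "primary-20161101-20161201.bndl"),
    Transfer(date(2017, 1, 5), "primary", "secondary-b", "primary-20161201-20170105.bndl"),
    Transfer(date(2017, 2, 2), "primary", "secondary-a", "primary-20161201-20170202.bndl"),
]
print(next_since(log, "primary", "secondary-a"))  # 2017-01-26
```

A `None` result means there is no earlier transfer on that route, so the first bundle for it should carry full history.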

Plus you can always stack up the bundles, so you can fetch first from the 
oldest bundle, then from the newer bundle, etc. 

I see you have the typical 'transfer review' process for the bundle exchange 
(implies a certain kind of environment ;-). Does it ever fail/reject the 
transfer, or is it simply making sure it is what you thought it was and that 
you have recorded the transfer correctly (I expect it's actually the latter)? 
If you get true rejections, you have more issues.

I don't really think you need a special 'script' (beyond satisfying some 
edict), as the bundle and fetch commands should be sufficient for doing the 
transfer.

Probably the biggest issue at that point is having a standardised naming 
convention for the bundle file, e.g. server--.bndl so 
that you know where it came from, where the --since cut point was, and when it 
was created.

Then it becomes fairly easy to import/fetch from the bundle according to the 
carefully mandated process. 

Philip

[1] https://git-scm.com/docs/git-bundle
It is okay to err on the side of caution, causing the bundle file to contain 
objects already in the destination, as these are ignored when unpacking at the 
destination.
[2] http://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo
  - Original Message - 
  From: Lowell Alleman 
  To: Git for human beings 
  Sent: Wednesday, February 01, 2017 9:58 PM
  Subject: [git-users] Synchronizing air gapped git repositories using bundles


  I have 3 separate air-gapped git repositories (hosted on local GitHub 
Enterprise) that I'm trying to keep in sync. Currently, I'm using "git 
bundle" to push revisions back and forth, which worked fairly well with just 2 
repositories, but I'm struggling a bit since the 3rd (and final) repository has 
been added to the mix. I was using a single tag to track the point of last 
export as noted in the "git bundle" docs, but I'm struggling to make that scale 
beyond 2 repositories. 
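One way to scale the single-tag approach from the git bundle documentation is to keep one 'last export' tag per destination, so exports to different repositories stop sharing a marker. The Python sketch below only builds the two commands for one export. The tag and file naming scheme is hypothetical, the very first export to a destination needs a full bundle because its tag does not exist yet, and moving the tag right after creating the bundle assumes the transfer succeeds (moving it only once the destination confirms the import is safer).

```python
from typing import Optional


def export_commands(destination: str, branch: str = "master",
                    bundle: Optional[str] = None) -> list[list[str]]:
    """Commands for an incremental export to one destination.

    Each destination has its own marker tag, so exports to different
    repositories no longer fight over a single 'last export' tag.
    """
    tag = f"last-export-{destination}"  # hypothetical naming scheme
    bundle = bundle or f"to-{destination}.bndl"
    return [
        ["git", "bundle", "create", bundle, f"{tag}..{branch}"],
        ["git", "tag", "-f", tag, branch],  # move the marker once delivered
    ]


for argv in export_commands("REPO2"):
    print(" ".join(argv))
```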


  In terms of information flow, we've designated one of the repositories as 
"primary" and the other two as "secondary" repositories. So in a sense we are 
using the "primary" repository as a development and merging area, so that all 
changes go through the primary repository and trickle down to the secondary 
repositories. Changes are always pushed upstream to primary, and then synced 
down to the other secondary repository. 


  Please note that our use of git is more like a "versioned file system" than 
the typical developer use case.  I go on to explain that a bit more later, but 
wanted to get to my main question before everyone gives up on reading this 
really long and complicated explanation of the mess I made. 


  Q:  Does anyone know of any existing scripts, documented methods, or best 
practices to follow when syncing a branch between multiple air-gapped 
repositories?


  How we are using git:  As noted above, this is NOT a typical 
development-centered use-case.  Branching is very infrequent, and most work is 
done on the "master" branch in each repository.  Unlike typical 
developer-centric approaches, each clone (working copy) ends up tied to a 
specific server, rather than a single developer.  So multiple users end up 
working in the same working copy and committing code from one place.  The team 
is small and the changes are infrequent enough that this works for us, despite 
the atypical and less-than-ideal use case.



  How we are using branches:   We treat each repository as if it has just one 
branch, a single "master".  However, because of the synchronization 
requirements, we create special purpose branches in each repository that 
essentially mirror the master branches of the other repositories.  So the 
primary repository has 2 mirrored branches, one for each of the secondary 
repositories.  And each secondary repository has a single mirrored branch that 
represents the primary (upstream) repository.  (By convention, we have agreed 
never to synchronize revisions directly between the two secondary 
repositories.)  Local changes are never applied to a mirrored repository 
branch, so that it should match the "master" branch of the mirrored reposito