Re: [git-users] Synchronizing air gapped git repositories using bundles
Hi Lowell,

An aside question: if all the machines were co-located, would they still be air-gapped across those few feet between them, or would they be linked? I ask because it helps clarify the way you would serve between the machines.

The bundle is simply a compact version of the on-wire transfer, where the negotiation of 'wants' and 'haves' (which normally happens on the wire) is done by the user. When you fetch from the bundle it is just like fetching from a remote, and only the requested parts (the refspec) are extracted from the bundle. It may help clarify discussions about what is being transferred.

My example would be:

    git bundle create mc1-20161201-20170202.bndl --all --since="2016.12.01" master

so the bundle filename contains the full details of what, where and when.

I've not come up against a 'verify' scenario where, from the recipient's view, the bundle has a partial orphan branch, such that if the refspec wanted every ref, that one would be metaphorically shallow (and hence not connected).

The bundles are usually quite small and compact (relative to, say, a zip file of every revision, e.g. right-click the top directory and "Send to" a "Compressed (zipped) folder"; been there, done that), so size shouldn't be an issue (try the alternative to compare!).

"Bundle stacking" was the second one, as you thought: fetch from Nov16.bndl, then Dec16.bndl, then Jan17.bndl, etc. Using the --branches option is one way to go, but you may get later synchronisation problems with the other branches if you don't get sufficient depth into the bundle, i.e. getting the right since date, or tag, or merge base "master..sidebranch" (two-dot notation).

The other point you probably already know is that you can practise all this on a single machine! Just init a test repo, add a few commits and branches (with faked commit dates etc.), then bundle parts of it; now init another repo and try fetching from that bundle, go back to the first repo and get another partial bundle, transfer that, etc. etc.
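That single-machine rehearsal might look like the minimal sketch below. It assumes a POSIX shell and a reasonably recent git; the repository paths, the `mirror-repo1` branch and the bundle file names are hypothetical. The first bundle is a full one because an empty destination lacks the prerequisite commits an incremental bundle needs; the second is cut with `--since`, and the destination "stacks" them by fetching the older bundle first. Each fetch names a destination (`master:mirror-repo1`) so that the mirror branch actually moves, rather than only FETCH_HEAD.

```shell
#!/bin/sh
# Rehearse bundle "stacking" between two throwaway repos on one machine.
# Directory, branch and bundle names are hypothetical.
set -e

# Fixed identity so the demo does not depend on the user's git config.
GIT_AUTHOR_NAME=tester GIT_AUTHOR_EMAIL=tester@example.com
GIT_COMMITTER_NAME=tester GIT_COMMITTER_EMAIL=tester@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

WORK=$(mktemp -d)
SRC="$WORK/src"
DST="$WORK/dst"

# commit_on DATE MESSAGE: append to file.txt and commit with a faked date.
commit_on() {
    echo "$2" >> file.txt
    git add file.txt
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m "$2"
}

# 1. Source repo with some November history.
git init -q "$SRC"
cd "$SRC"
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
commit_on "2016-11-10 12:00:00" "november work"
commit_on "2016-11-20 12:00:00" "more november work"

# 2. First bundle is a full one: an empty destination has no prerequisites.
git bundle create "$WORK/src-full.bndl" master

# 3. December work, then an incremental bundle cut at a --since date.
commit_on "2016-12-05 12:00:00" "december work"
commit_on "2016-12-15 12:00:00" "more december work"
git bundle create "$WORK/src-since-20161201.bndl" --since="2016-12-01" master

# 4. Destination: stack the bundles, oldest first.
git init -q "$DST"
cd "$DST"
for b in "$WORK/src-full.bndl" "$WORK/src-since-20161201.bndl"; do
    git bundle verify "$b" >/dev/null       # fails if prerequisite commits are missing
    git fetch -q "$b" master:mirror-repo1   # name a destination so the branch moves
done

echo "mirror-repo1 is at $(git rev-parse mirror-repo1)"
```

Replacing `--since` with a range such as `sometag..master` exercises the other cut-point styles the same way.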
This means you can test out all the issues easily from the comfort of the chair!

The final bit is that you don't (shouldn't) get anything that would "result in unreferenced revisions in the destination repositories": all the revisions fetched from the bundle will be linked to a known ref. There may be revs in the bundle that aren't brought in, which isn't quite the same question.

Hope that bit of rambling helps.

Philip

PS I understand enough of how I think git works to get me into trouble also.

- Original Message -
From: Lowell Alleman
To: Git for human beings
Cc: philipoak...@iee.org
Sent: Thursday, February 02, 2017 6:23 PM
Subject: Re: [git-users] Synchronizing air gapped git repositories using bundles

Philip,

Thanks for the reply!

The reason I'm looking at using a script is mostly standardization: so that the file names are consistent, and to capture some bundle metadata necessary for the file transfer process (file name, size, checksum, ...). We capture some metadata about the bundle, such as the revision count and some delta details (specifically, the output of "git diff --stat" and "git log --stat"). This helps answer the question about what is being transferred in a given bundle. (And to the best of my knowledge, there's no way to get this info from the bundle file itself.)

Secondary reasons for the script come down to mixed levels of user fluency with git, a general mandate to automate tasks, and, currently, the script is responsible for tracking the "last export point" via tags. (Oh, and I found it easy to forget to include refs, like refs/heads/master, in the bundle, and then importing became super painful on the other side.)
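An export script of the kind described here (consistent names, bundle metadata, last-export tags) could be sketched roughly as below. This is an illustration under assumptions, not the actual script: it assumes a POSIX shell, a `master` branch on the primary, `sha256sum` or `shasum` on the PATH, and `git rev-list --count` (very early 1.7.x releases may lack it; `git rev-list RANGE | wc -l` gives the same number). The per-destination tags (`last-export-repo2`, `last-export-repo3`), the file naming and the manifest layout are hypothetical. Keeping one tag per destination, instead of the single tag in the `git bundle` documentation, lets each secondary receive only what it has not yet been sent. `git bundle list-heads` records which refs actually went into the bundle, which is also a cheap check for a forgotten refs/heads/master.

```shell
#!/bin/sh
# Sketch of an export helper on the primary: one "last export" tag per
# destination, a predictable bundle name, and a manifest beside the bundle.
# Tag names, file names and the manifest layout are hypothetical.
set -e

GIT_AUTHOR_NAME=tester GIT_AUTHOR_EMAIL=tester@example.com
GIT_COMMITTER_NAME=tester GIT_COMMITTER_EMAIL=tester@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# checksum FILE: SHA-256 of FILE (GNU sha256sum, else BSD/macOS shasum).
checksum() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d' ' -f1
    else
        shasum -a 256 "$1" | cut -d' ' -f1
    fi
}

# export_for DEST OUTDIR: bundle whatever DEST has not been sent yet, write
# BUNDLE.manifest, and print the bundle path (nothing if there is nothing new).
export_for() {
    dest=$1 outdir=$2
    tag="refs/tags/last-export-$dest"
    if git rev-parse -q --verify "$tag" >/dev/null; then
        base=$(git rev-parse "$tag")
        range="$base..master"      # incremental: DEST already has $base
    else
        base=none
        range=master               # first export to DEST: full history
    fi
    revs=$(git rev-list --count "$range")
    [ "$revs" -gt 0 ] || return 0  # git bundle create refuses an empty bundle
    tip=$(git rev-parse master)
    out="$outdir/primary-to-$dest-$(date -u +%Y%m%d)-$(git rev-parse --short master).bndl"
    git bundle create "$out" "$range"   # "master" is recorded as refs/heads/master
    {
        echo "file: $(basename "$out")"
        echo "size-bytes: $(wc -c < "$out" | tr -d ' ')"
        echo "sha256: $(checksum "$out")"
        echo "destination: $dest"
        echo "base: $base"
        echo "tip: $tip"
        echo "revisions: $revs"
        echo "refs in bundle:"
        git bundle list-heads "$out"
        if [ "$base" != none ]; then
            echo "diff --stat:"
            git diff --stat "$base" "$tip"
        fi
        echo "log --stat:"
        git log --stat "$range"
    } > "$out.manifest"
    echo "$out"
}

# confirm_export DEST: move DEST's marker only after DEST has verified and
# fetched its bundle (the git-bundle docs move their single tag right away).
confirm_export() {
    git tag -f "last-export-$1" master >/dev/null
}

# Demo on a throwaway "primary" repository.
WORK=$(mktemp -d)
git init -q "$WORK/primary"
cd "$WORK/primary"
git symbolic-ref HEAD refs/heads/master
echo base > f.txt && git add f.txt && git commit -q -m base

b1=$(export_for repo2 "$WORK"); confirm_export repo2   # repo2: full history
echo fix1 >> f.txt && git commit -q -am fix1
b2=$(export_for repo3 "$WORK"); confirm_export repo3   # repo3: full history
echo fix2 >> f.txt && git commit -q -am fix2
b3=$(export_for repo2 "$WORK"); confirm_export repo2   # repo2: fix1 and fix2 only
b4=$(export_for repo3 "$WORK"); confirm_export repo3   # repo3: fix2 only
cat "$b3.manifest"
```

The tag only moves in `confirm_export`, on the assumption that a bundle can still be rejected in transfer review; if it is, the next export for that destination simply covers the same range again.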
I was trying to stick with specific revisions and avoid overlapping exports, for a few reasons: (1) so that we could build a change "manifest" to go along with the bundle that would only include what's "new"; (2) so that if we need to release multiple fixes in a short period of time, like more than one a day, we don't end up just copying the same stuff around over and over again (we are looking at a scheduled monthly sync-up to keep divergence from becoming significant, but we may need to sync up multiple times a day on rare occasions); and (3) just to generally minimize file transfer size (not a huge deal, we're talking a few MBs).

I fully agree on the file transfer rejections point. It hasn't happened yet. Policy work is ongoing.

When you refer to bundle stacking, is there a way to specify multiple locations to pull from at once, or are you just referring to the fact that you can sequentially pull from multiple bundle files? (I'm assuming the second.)

Yes, "recording what has been transferred" is exactly the core issue I'm facing. I've noted above some of the reasons I was trying to use a tighter re
Re: [git-users] Synchronizing air gapped git repositories using bundles
at in the future (as the total repository size is becoming more significant).

So I guess the more fundamental question is this: is it better to use the macro approach (and risk pushing around lots of extra stuff), which could result in unreferenced revisions in the destination repositories, or is it better to use the micro-mode and be strategic about just the specific branches/revisions I want to synchronize?

Right now I've been grabbing just the branch I want from the bundle, and normally that's all it includes anyway (e.g. "git checkout mirror-REPO1; git fetch my.repo1.bundle master"). And I'm now wondering if, in doing so, I could ultimately end up missing necessary revisions that would be imported if I used a bare "git fetch". Is that possible?

Okay, that's enough rambling. Thanks again for any help you can provide. As you may have gathered, I've been fighting with this process for quite some time now. I probably have an unbalanced knowledge of git that's currently working against me. I understand enough of how I think git works to get me into trouble, but not enough to get back out of it. ;-) And I'm working with a rather old version of git (1.7).

Thanks in advance!

On Wednesday, February 1, 2017 at 6:47:33 PM UTC-5, Philip Oakley wrote:
>
> Hi Lowell,
>
> You can use all of the options in the rev-list for selecting which commits are in the bundle (which is just a thin wrapper around the pack file that would be sent over the wire).
>
> You can include more commits in the bundle than you need [1], that is, have an overlap. One option is simply to use the --since= option as a way of ensuring you go far enough back in history. Plus the --all to get *everything* after that date [2].
>
> I suspect that part of the problem is finding a way of recording what has been transferred in the three-way transfer. I'd suggest it's just as easy to use a small notebook (or formal admin log) for recording the date of transfers and use that to guide the bundle creation.
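On the question above about `git fetch my.repo1.bundle master`, a small experiment with throwaway repositories (all names hypothetical) shows what a single-ref fetch does: the bundle carries both `master` and a side branch, only `master` is fetched into a mirror branch, and afterwards the mirror branch holds every commit reachable from `master` while no local ref points at the side branch. Two details worth knowing: when fetching from a bundle path, a refspec without a destination (plain `master`) only writes FETCH_HEAD, and git refuses to fetch into the branch that is currently checked out, so the mirror branch should not be checked out when it is named as the destination.

```shell
#!/bin/sh
# Does fetching only "master" from a bundle leave behind anything master needs?
# Throwaway repos; all names are hypothetical.
set -e

GIT_AUTHOR_NAME=tester GIT_AUTHOR_EMAIL=tester@example.com
GIT_COMMITTER_NAME=tester GIT_COMMITTER_EMAIL=tester@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

WORK=$(mktemp -d)

# Source repo: master has a, b, c; "side" has a commit master does not contain.
git init -q "$WORK/repo1"
cd "$WORK/repo1"
git symbolic-ref HEAD refs/heads/master
echo a > f.txt && git add f.txt && git commit -q -m a
echo b >> f.txt && git commit -q -am b
git checkout -q -b side
echo s > s.txt && git add s.txt && git commit -q -m side-only
git checkout -q master
echo c >> f.txt && git commit -q -am c

# A "macro" bundle carrying both branches.
git bundle create "$WORK/repo1.bndl" master side

# Receiving side: see what the bundle offers, then take only master.
git init -q "$WORK/primary"
cd "$WORK/primary"
git bundle list-heads "$WORK/repo1.bndl"   # lists refs/heads/master and refs/heads/side
# Naming a destination moves mirror-repo1; it must not be the checked-out branch.
git fetch -q "$WORK/repo1.bndl" master:mirror-repo1

# Everything reachable from repo1's master is reachable from mirror-repo1 ...
echo "commits on mirror-repo1: $(git rev-list --count mirror-repo1)"
# ... but no ref here points at the side branch.
git for-each-ref --format='%(refname)'
```

In other words, the micro approach cannot drop ancestors of the branch you ask for; what it can skip is other branches, which `list-heads` makes visible before you decide.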
>
> Plus you can always stack up the bundles, so you can fetch first from the oldest bundle, and then from the newer bundle, etc.
>
> I see you have the typical 'transfer review' process for the bundle exchange (implies a certain kind of environment ;-). Does it ever fail/reject the transfer, or is it simply making sure it is what you thought it was and that you have recorded the transfer correctly (I expect it's actually the latter)? If you get true rejection you have more issues.
>
> I don't really think you need a special 'script' (beyond satisfying some edict), as the bundle and fetch commands should be sufficient for doing the transfer.
>
> Probably the biggest issue at that point is having a standardised naming convention for the bundle file, e.g. server--.bndl so that you know where it came from, where the --since cut point was, and when it was created.
>
> Then it becomes fairly easy to import/fetch from the bundle according to the carefully mandated process.
>
> Philip
>
> [1] https://git-scm.com/docs/git-bundle
> It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.
> [2] http://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo
>
> - Original Message -
> From: Lowell Alleman
> To: Git for human beings
> Sent: Wednesday, February 01, 2017 9:58 PM
> Subject: [git-users] Synchronizing air gapped git repositories using bundles
>
> I have 3 separate air-gapped git repositories (hosted on local GitHub Enterprise) that I'm trying to keep in sync. Currently, I'm using "git bundle" to push revisions back and forth, which worked fairly well with just 2 repositories, but I'm struggling a bit since the 3rd (and final) repository has been added to the mix.
> I was using a single tag to track the point of last export as noted in the "git bundle" docs, but I'm struggling to make that scale with 2+ total repositories.
>
> In terms of information flow, we've deemed one of the repositories as "primary" and the other two as "secondary" repositories. So in a sense we are using the "primary" repository like a development and merging area so that all changes go through the primary repository and trickle down to the secondary repositories. Changes are always pushed upstream to primary, and then synced down to the other secondary repository.
>
> Please note that our use of git is more like a "versioned file system" than the typical developer use case. I go on to explain that a bit more
Re: [git-users] Synchronizing air gapped git repositories using bundles
Hi Lowell,

You can use all of the options in the rev-list for selecting which commits are in the bundle (which is just a thin wrapper around the pack file that would be sent over the wire).

You can include more commits in the bundle than you need [1], that is, have an overlap. One option is simply to use the --since= option as a way of ensuring you go far enough back in history. Plus the --all to get everything after that date [2].

I suspect that part of the problem is finding a way of recording what has been transferred in the three-way transfer. I'd suggest it's just as easy to use a small notebook (or formal admin log) for recording the date of transfers and use that to guide the bundle creation.

Plus you can always stack up the bundles, so you can fetch first from the oldest bundle, and then from the newer bundle, etc.

I see you have the typical 'transfer review' process for the bundle exchange (implies a certain kind of environment ;-). Does it ever fail/reject the transfer, or is it simply making sure it is what you thought it was and that you have recorded the transfer correctly (I expect it's actually the latter)? If you get true rejection you have more issues.

I don't really think you need a special 'script' (beyond satisfying some edict), as the bundle and fetch commands should be sufficient for doing the transfer.

Probably the biggest issue at that point is having a standardised naming convention for the bundle file, e.g. server--.bndl so that you know where it came from, where the --since cut point was, and when it was created.

Then it becomes fairly easy to import/fetch from the bundle according to the carefully mandated process.

Philip

[1] https://git-scm.com/docs/git-bundle
It is okay to err on the side of caution, causing the bundle file to contain objects already in the destination, as these are ignored when unpacking at the destination.
[2] http://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo

- Original Message -
From: Lowell Alleman
To: Git for human beings
Sent: Wednesday, February 01, 2017 9:58 PM
Subject: [git-users] Synchronizing air gapped git repositories using bundles

I have 3 separate air-gapped git repositories (hosted on local GitHub Enterprise) that I'm trying to keep in sync. Currently, I'm using "git bundle" to push revisions back and forth, which worked fairly well with just 2 repositories, but I'm struggling a bit since the 3rd (and final) repository has been added to the mix. I was using a single tag to track the point of last export as noted in the "git bundle" docs, but I'm struggling to make that scale with 2+ total repositories.

In terms of information flow, we've deemed one of the repositories as "primary" and the other two as "secondary" repositories. So in a sense we are using the "primary" repository like a development and merging area so that all changes go through the primary repository and trickle down to the secondary repositories. Changes are always pushed upstream to primary, and then synced down to the other secondary repository.

Please note that our use of git is more like a "versioned file system" than the typical developer use case. I go on to explain that a bit more later, but wanted to get to my main question before everyone gives up on reading this really long and complicated explanation of the mess I made.

Q: Does anyone know of any existing scripts, documented methods, or best practices to follow when syncing a branch between multiple air-gapped repositories?

How we are using git:

As noted above, this is NOT a typical development-centered use case. Branching is very infrequent, and most work is done on the "master" branch in each repository. Unlike typical developer-centric approaches, each clone (working copy) ends up tied to a specific server, rather than a single developer.
So multiple users end up working in the same working copy and committing code from one place. The team is small and the changes are infrequent enough that this works for us, despite the atypical and less-than-ideal use case.

How we are using branches:

We treat each repository as if it has just one branch, a single "master". However, because of the synchronization requirements, we create special purpose branches in each repository that essentially mirror the master branches of the other repositories. So the primary repository has 2 mirrored branches, one for each of the secondary repositories. And each secondary repository has a single mirrored branch that represents the primary (upstream) repository. (By convention, we have agreed never to synchronize revisions directly between the two secondary repositories.) Local changes are never applied to a mirrored repository branch, so that it should match the "master" branch of the mirrored reposito