On Sun, Aug 3, 2014 at 6:20 PM, Gerald Combs <ger...@wireshark.org> wrote:
> On 8/3/14, 11:34 AM, Evan Huus wrote:
> > On Mon, May 13, 2013 at 7:54 PM, Gerald Combs <ger...@wireshark.org>
> > wrote:
> >
> > > On 5/10/13 1:47 PM, Evan Huus wrote:
> > > > Hi Gerald
> > > >
> > > > I just cloned the Wireshark git mirror onto a new machine and was
> > > > surprised at how large it was to download. Running an aggressive
> > > > git gc on the finished clone reduced the disk usage on my machine
> > > > from ~500MB to ~150MB.
> > > >
> > > > I'm a bit surprised - git is supposed to automatically garbage
> > > > collect repositories when they get too cluttered, but perhaps its
> > > > threshold for automatic gc is just very high.
> > > >
> > > > I pinged Balint (CCed) about this and he suggested running gc on a
> > > > weekly basis and gc --aggressive on a monthly basis on the server.
> > > > It would probably save a non-trivial amount of bandwidth in the
> > > > long term as more people clone the repository.
> > >
> > > It might be due to our particular circumstances (a bare repository
> > > only updated via the mirror script) but git's automatic garbage
> > > collection doesn't seem to happen very often. The mirror script runs
> > > "git gc --auto" each time it synchronizes which keeps it from
> > > filling up the disk (which happened early on) but as you point out
> > > there is room for improvement. I added a cron job that runs "git gc
> > > --aggressive" each week. Here is the output from a manual run, which
> > > includes "git count-objects -v" before and after:
> > >
> > > 2013-05-13 14:38:12: Started.
> > > 2013-05-13 14:38:12: Synchronizing repository wireshark
> > > 2013-05-13 14:38:12: Object count start
> > > count: 0
> > > size: 0
> > > in-pack: 316591
> > > packs: 45
> > > size-pack: 567146
> > > prune-packable: 0
> > > garbage: 0
> > > 2013-05-13 14:38:12: Collecting garbage
> > > 2013-05-13 15:09:56: Object count start
> > > count: 0
> > > size: 0
> > > in-pack: 316596
> > > packs: 2
> > > size-pack: 127499
> > > prune-packable: 0
> > > garbage: 0
> > > 2013-05-13 15:09:56: Done
> >
> > So it's been over a year since this conversation and we have actually
> > migrated to Git/Gerrit so I have no idea what Gerrit is doing in this
> > regard (is there even a "real" git repository backing it, or is it all
> > internal magic?), but I recently came across [1] which suggests that
> > repeated use of --aggressive maybe wasn't such a good idea after all.
> >
> > It suggests just sticking to regular `git gc` except in cases of large
> > one-time imports (like we did on migration) at which point you should
> > run the apparently-very-slow `git repack -a -d --depth=250 --window=250`.
> >
> > FWIW, a fresh clone from Gerrit right now is 213MB - my local repo is
> > only 161MB, and my current desktop is actually not beefy enough to run
> > the recommended repack command so I have no idea what improvement that
> > would give.
>
> It's a "real" git repository but any operations performed by Gerrit are
> done using JGit. The weekly automatic number update script runs `gerrit
> gc --all`, which uses JGit's garbage collector. Many sites including
> Google appear to run it one or more times a day. We may want to do the
> same.
>
> I tried running `git repack -a -d --depth=250 --window=250` on the
> server. It ran successfully and shrank the repository from 248 MB to 208
> MB but now the OS X builders are timing out during `git fetch`...
>

Hmm, that's interesting, I would have expected a bigger improvement (given my local copy is still smaller than the one on the server).
Perhaps it is worth trying an --aggressive gc just once (or passing the -f and -F flags to the existing repack command, which is probably even *more* aggressive).

No idea why the buildbots would be timing out... I don't think the gc should have materially affected their ability to pull down deltas.
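
For anyone who wants to compare the effect of a gc or repack the same way the 2013 log above does, here is a rough Python sketch that parses `git count-objects -v` output and reports how much size-pack shrank. The function names (`parse_count_objects`, `pack_reduction`, `count_objects`) are made up for illustration and are not part of the mirror script; the only real command it calls is `git count-objects -v`, and it assumes a git new enough to support `-C`. The sample figures are copied from the quoted log, where size-pack is reported in KiB.

```python
import subprocess


def parse_count_objects(text):
    """Parse "key: value" lines as printed by `git count-objects -v`."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Keep only plain integer fields; skip anything else (e.g. warnings).
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def pack_reduction(before, after):
    """Fraction by which size-pack shrank between two snapshots."""
    return 1 - after["size-pack"] / before["size-pack"]


def count_objects(repo_path):
    """Run `git count-objects -v` in repo_path (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    )
    return parse_count_objects(result.stdout)


# Figures copied from the 2013-05-13 manual run quoted in this thread.
BEFORE = parse_count_objects(
    "count: 0\nsize: 0\nin-pack: 316591\npacks: 45\n"
    "size-pack: 567146\nprune-packable: 0\ngarbage: 0\n"
)
AFTER = parse_count_objects(
    "count: 0\nsize: 0\nin-pack: 316596\npacks: 2\n"
    "size-pack: 127499\nprune-packable: 0\ngarbage: 0\n"
)

if __name__ == "__main__":
    print(f"packs: {BEFORE['packs']} -> {AFTER['packs']}")
    print(f"size-pack reduction: {pack_reduction(BEFORE, AFTER):.1%}")  # 77.5%
```

Calling `count_objects(path)` before and after a repack on the server (or on a local clone) and passing both results to `pack_reduction` would give a comparable number for the 248 MB to 208 MB run.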
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe