On Sun, Aug 3, 2014 at 6:20 PM, Gerald Combs <ger...@wireshark.org> wrote:

> On 8/3/14, 11:34 AM, Evan Huus wrote:
> > On Mon, May 13, 2013 at 7:54 PM, Gerald Combs <ger...@wireshark.org
> > <mailto:ger...@wireshark.org>> wrote:
> >
> >     On 5/10/13 1:47 PM, Evan Huus wrote:
> >     > Hi Gerald
> >     >
> >     > I just cloned the Wireshark git mirror onto a new machine and was
> >     > surprised at how large it was to download. Running an aggressive
> >     > git gc on the finished clone reduced the disk usage on my machine
> >     > from ~500MB to ~150MB.
> >     >
> >     > I'm a bit surprised - git is supposed to automatically garbage
> >     > collect repositories when they get too cluttered, but perhaps its
> >     > threshold for automatic gc is just very high.
> >     >
> >     > I pinged Balint (CCed) about this and he suggested running gc on a
> >     > weekly basis and gc --aggressive on a monthly basis on the server.
> >     > It would probably save a non-trivial amount of bandwidth in the
> >     > long term as more people clone the repository.
> >
> >     It might be due to our particular circumstances (a bare repository
> >     only updated via the mirror script) but git's automatic garbage
> >     collection doesn't seem to happen very often. The mirror script runs
> >     "git gc --auto" each time it synchronizes which keeps it from filling
> >     up the disk (which happened early on) but as you point out there is
> >     room for improvement. I added a cron job that runs "git gc
> >     --aggressive" each week. Here is the output from a manual run, which
> >     includes "git count-objects -v" before and after:
> >
> >     2013-05-13 14:38:12: Started.
> >     2013-05-13 14:38:12: Synchronizing repository wireshark
> >     2013-05-13 14:38:12: Object count start
> >     count: 0
> >     size: 0
> >     in-pack: 316591
> >     packs: 45
> >     size-pack: 567146
> >     prune-packable: 0
> >     garbage: 0
> >     2013-05-13 14:38:12: Collecting garbage
> >     2013-05-13 15:09:56: Object count start
> >     count: 0
> >     size: 0
> >     in-pack: 316596
> >     packs: 2
> >     size-pack: 127499
> >     prune-packable: 0
> >     garbage: 0
> >     2013-05-13 15:09:56: Done
> >
> >
> > So it's been over a year since this conversation and we have actually
> > migrated to Git/Gerrit so I have no idea what Gerrit is doing in this
> > regard (is there even a "real" git repository backing it, or is it all
> > internal magic?), but I recently came across [1] which suggests that
> > repeated use of --aggressive maybe wasn't such a good idea after all.
> >
> > It suggests just sticking to regular `git gc` except in cases of large
> > one-time imports (like we did on migration) at which point you should
> > run the apparently-very-slow `git repack -a -d --depth=250 --window=250`.
> >
> > FWIW, a fresh clone from Gerrit right now is 213MB - my local repo is
> > only 161MB, and my current desktop is actually not beefy enough to run
> > the recommended repack command so I have no idea what improvement that
> > would give.
>
> It's a "real" git repository but any operations performed by Gerrit are
> done using JGit. The weekly automatic number update script runs `gerrit
> gc --all`, which uses JGit's garbage collector. Many sites including
> Google appear to run it one or more times a day. We may want to do the
> same.
>
> I tried running `git repack -a -d --depth=250 --window=250` on the
> server. It ran successfully and shrank the repository from 248 MB to 208
> MB but now the OS X builders are timing out during `git fetch`...
>

Hmm, that's interesting, I would have expected a bigger improvement (given
my local copy is still smaller than the one on the server). Perhaps it is
worth trying an --aggressive gc just once (or passing the -f and -F flags
to the existing repack command, which is probably even *more* aggressive).

No idea why the buildbots would be timing out... I don't think the gc should
have materially affected their ability to pull down deltas.
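
For comparing runs, here is a rough sketch of how the before/after
`git count-objects -v` numbers quoted above could be turned into a single
reduction figure. It is only an illustration: the function names
(`parse_count_objects`, `pack_reduction`) are made up for this example and
are not part of the mirror script, and the input is just the two blocks
copied from Gerald's 2013 log. `size-pack` is reported in KiB.

```python
# Hypothetical helper names; the sample data is copied from the 2013 log above.

BEFORE = """\
count: 0
size: 0
in-pack: 316591
packs: 45
size-pack: 567146
prune-packable: 0
garbage: 0
"""

AFTER = """\
count: 0
size: 0
in-pack: 316596
packs: 2
size-pack: 127499
prune-packable: 0
garbage: 0
"""


def parse_count_objects(text):
    """Parse "key: value" lines from `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip anything that is not a plain "key: <integer>" line.
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def pack_reduction(before, after):
    """Fraction by which size-pack (KiB) shrank between two runs."""
    return 1 - after["size-pack"] / before["size-pack"]


if __name__ == "__main__":
    before = parse_count_objects(BEFORE)
    after = parse_count_objects(AFTER)
    print("packs: %d -> %d" % (before["packs"], after["packs"]))
    print("size-pack reduction: %.1f%%" % (100 * pack_reduction(before, after)))
```

Feeding it the output captured before and after the recent repack on the
server would give a like-for-like number to set against the 2013 run.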
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
