Re: Multiple threads of compression

2012-11-26 Thread Thorsten Glaser
Brandon Casey dixit:

The number of threads that pack uses can be configured in the global
or system gitconfig file by setting pack.threads.
[…]
The other setting you should probably look at is pack.windowMemory
which should help you control the amount of memory git uses while
packing.  Also look at core.packedGitWindowSize and
core.packedGitLimit if your repository is really large.

OK, thanks a lot!

I can’t really say much about the repositories beforehand
because it’s a generic code hosting platform, several instances
of which we run at my employer’s place (I also run one privately
now), and which is also run by e.g. Debian. But I’ll try to figure
out some somewhat sensible defaults.

Running 'git gc' with --aggressive should be as safe as running it
without --aggressive.

OK, thanks.

But, you should think about whether you really need to run it more
than once, or at all.  When you use --aggressive, git will perform the
[…]

Great explanation!

I think that I’d want to run it once, after the repository has
been begun to be used (probably not correct English but you know
what I want to say), but have to figure out a way to do so… but
I’ll just leave out the --aggressive from the cronjob then.

Much appreciated,
//mirabilos
-- 
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font.   -- Rob Pike in Notes on Programming in C
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Multiple threads of compression

2012-11-25 Thread Thorsten Glaser
Hi,

I’m asking here informally first, because my information relates
to a quite old version (the one from lenny-backports). A tl;dr
is at the end.

On a multi-core machine, the garbage collection of git, as well
as pack compression on the server side when someone clones a
repository remotely, the compression is normally done automatically
using multiple threads of execution.

That may be fine for your typical setups, but in my cases, I have
two scenarios where it isn’t:

ⓐ The machine where I want it to use only, say, 2 of my 4 or 8 cores
  as I’m also running some VMs on the box which eat up a lot of CPU
  and which I don’t want to slow down.

ⓑ The server VM which has been given 2 or 3 VCPUs to cope with all
  the load done by clients, but which is RAM-constrained to only
  512 or, when lucky, 768 MiB. It previously served only http/https
  and *yuk* Subversion, but now, git comes into the play, and I’ve
  seen the one server box I think about go down *HARD* because git
  ate up all RAM *and* swap when someone wanted to update their clone
  of a repository after someone else committed… well, an ~100 MiB large
  binary file they shouldn’t. (It required manual intervention on the
  server to kill that revision and then the objects coupled with it,
  but even *that* didn’t work, read on for more.)

In both cases, I had to apply a quick hack. One I can reproduce
by now is, that, on the first box, I added a --threads=2 to the
line calling git pack-objects in /usr/lib/git-core/git-repack,
like this:

   83 args=$args $local ${GIT_QUIET:+-q} $no_reuse$extra
   84 names=$(git pack-objects --threads=2 --keep-true-parents --honor-pack-
keep --non-empty --all --reflog $arg
   85 exit 1

(By the way, wrapping source code at 80c is still way to go IMHO.)

On the second box, IIRC I added --threads=1, but that box got
subsequently upgraded from lenny to wheezy so any local modification
is lost (luckily, the problem didn’t occur again recently, or at
least I didn’t notice it, save for the VM load going up to 6-8
several times a day).

tl;dr: I would like to have a *global* option for git to restrict
the number of threads of execution it uses. Several subcommands,
like pack-objects, are already equipped with an optioin for this,
but unfortunately, these are seldom invoked by hand¹, so this can’t
work in my situations.

① automatic garbage collection, “git gc --aggressive --prune=now”,
  and cloning are the use cases I have at hand right now.

À propos, while here: is gc --aggressive safe to run on a live,
online-shared repository, or does it break other users accessing
the repository concurrently? (If it’s safe I’d very much like to do
that in a, say weekly, cronjob on FusionForge, our hosting system.)

Thanks in advance!
//mirabilos

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html