Re: [PATCH] Increase core.packedGitLimit

2017-06-21 Thread Jeff King
On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:

> When core.packedGitLimit is exceeded, git will close packs.  If there
> is a repack operation going on in parallel with a fetch, the fetch
> might open a pack, and then be forced to close it due to
> packedGitLimit being hit.  The repack could then delete the pack
> out from under the fetch, causing the fetch to fail.
> 
> Increase core.packedGitLimit's default value to prevent
> this.
> 
> On current 64-bit x86_64 machines, 48 bits of address space are
> available.  It appears that 64-bit ARM machines have no standard
> amount of address space (that is, it varies by manufacturer), and IA64
> and POWER machines have the full 64 bits.  So 48 bits is the only
> limit that we can reasonably care about.  We reserve a few bits of the
> 48-bit address space for the kernel's use (this is not strictly
> necessary, but it's better to be safe), and use up to the remaining
> 45.  No git repository will be anywhere near this large any time soon,
> so this should prevent the failure.

I happened to run into another case of this which failed fairly
reliably, and confirmed that bumping packedGitLimit fixed it. Having
forgotten completely that we did this bump here, I wrote a similar
patch. :-/

The content of the patch is basically the same, but for the sake of the
list archive, here's the bit from my commit message about the
reproduction recipe (in case we should ever need to revisit it again,
though I'm pretty sure that your patch should fix the problem for good).

-- >8 --

You can see this situation in practice with strace:

  # two objects, one big and one small; for the sake of
  # speed, we'll make our big object only 1.1MB and drop
  # core.packedGitLimit to 1MB. Note that we care about
  # on-disk compressed sizes here, so we need uncompressible
  # data.
  small=$(echo small | git hash-object -w --stdin)
  big=$(dd if=/dev/urandom bs=1k count=1100 | git hash-object -w --stdin)

  # Now put each in a pack, which will get mmap'd
  # separately.
  echo $small | git pack-objects objects/pack/pack
  echo $big | git pack-objects objects/pack/pack

  # Now we look in the packs alternately, and observe the
  # access patterns.
  {
  echo $small
  echo $big
  echo $small
  } | strace -e open,close,mmap,munmap \
  git -c core.packedgitlimit=1M \
  -c core.packedGitWindowSize=512K \
  cat-file --batch >/dev/null

The trace will look something like this (I've omitted the
uninteresting .idx bits):

  open("./objects/pack/pack-b2a8452664fe8848fd0a78be8b6f69ce65af3c57.pack", 
O_RDONLY|O_CLOEXEC) = 3
  mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb5890af000
  close(3)= 0

There we opened the small pack, mapped the whole thing, and
closed it.

  open("./objects/pack/pack-acfa5f8886b391d0b2475ef3f19bcc387ce19271.pack", 
O_RDONLY|O_CLOEXEC) = 3
  mmap(NULL, 524288, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb589011000
  mmap(NULL, 1130496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fb588efd000
  munmap(0x7fb5890af000, 47)  = 0
  mmap(NULL, 524288, PROT_READ, MAP_PRIVATE, 3, 0x8) = 0x7fb5881c
  munmap(0x7fb589011000, 524288)  = 0
  mmap(NULL, 78211, PROT_READ, MAP_PRIVATE, 3, 0x10) = 0x7fb58909a000
  munmap(0x7fb588efd000, 1130496) = 0

And here we open the larger pack. Note that we don't close
the descriptor because the file is bigger than our 512k
window.  And in fact we end up creating several maps as we
progress through the file. But before making the second one,
which would push us over our packedGitLimit, we close the
original small 47-byte map.

  open("./objects/pack/pack-b2a8452664fe8848fd0a78be8b6f69ce65af3c57.pack", 
O_RDONLY|O_CLOEXEC) = 4
  mmap(NULL, 47, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7fb5890af000
  close(4)= 0

And here we access the small pack again, but we have to
re-open and map it from scratch.  If there had been a
simultaneous repack and if this had been pack-objects, we
would now fail the operation.

These numbers are shrunk to make the test more tractable,
but you can see the same thing with the defaults when
fetching from a large repository.  The real-world situation
that would trigger this is something like:

  0. Repo has 8GB or larger packfile (this is the default
 core.packedGitLimit on 64-bit machines).

  1. Somebody pushes a "small" pack (smaller than the 1GB
 default window).

  2. Somebody tries to clone/fetch the result. The
 pack-objects process on the server starts by generating
 the list of objects to write, committing to the
 use of those packs. After accessing the recent objects
 in the small pack, it opens the big pack and closes the
 small one.

  3. Meanwhile, git-repack runs and folds the small pack
 into a larger one.

  4. The pack-objects process hits the write phase, but it
 can no longer access the recent objects from 

RE: [PATCH] Increase core.packedGitLimit

2017-04-21 Thread Johannes Schindelin
Hi Dave,

On Thu, 20 Apr 2017, David Turner wrote:
> 
> We might want to think about replacing git_config_ulong with 
> git_config_size_t in nearly all cases. "long" has ceased to be 
> useful.  More modern versions of C prefer uint64_t, but I
> think that we'll usually want size_t because these values will
> be used as memory limits of various sorts.

Indeed, `unsigned long` has ceased to be useful ever since the
introduction of inttypes.h. It is just too undefined.

We probably also need git_config_off_t(), though, and maybe even
git_config_ssize_t().

I would definitely welcome a change in direction where we use datatypes
also as a documentation of the variables' purpose.

Ciao,
Dscho


RE: [PATCH] Increase core.packedGitLimit

2017-04-20 Thread David Turner

> -Original Message-
> From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de]
> Sent: Thursday, April 20, 2017 5:58 PM
> To: Jeff King <p...@peff.net>
> Cc: David Turner <david.tur...@twosigma.com>; git@vger.kernel.org
> Subject: Re: [PATCH] Increase core.packedGitLimit
> 
> Hi Peff,
> 
> On Thu, 20 Apr 2017, Jeff King wrote:
> 
> > On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:
> >
> > > When core.packedGitLimit is exceeded, git will close packs.  If
> > > there is a repack operation going on in parallel with a fetch, the
> > > fetch might open a pack, and then be forced to close it due to
> > > packedGitLimit being hit.  The repack could then delete the pack out
> > > from under the fetch, causing the fetch to fail.
> > >
> > > Increase core.packedGitLimit's default value to prevent this.
> > >
> > > On current 64-bit x86_64 machines, 48 bits of address space are
> > > available.  It appears that 64-bit ARM machines have no standard
> > > amount of address space (that is, it varies by manufacturer), and
> > > IA64 and POWER machines have the full 64 bits.  So 48 bits is the
> > > only limit that we can reasonably care about.  We reserve a few bits
> > > of the 48-bit address space for the kernel's use (this is not
> > > strictly necessary, but it's better to be safe), and use up to the
> > > remaining 45.  No git repository will be anywhere near this large
> > > any time soon, so this should prevent the failure.
> >
> > Yep, I think this is a reasonable direction.
> >
> > > ---
> > >  git-compat-util.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > This probably needs an update to the core.packedGitLimit section of
> > Documentation/config.txt.
> >
> > > diff --git a/git-compat-util.h b/git-compat-util.h index
> > > 8a4a3f85e7..1c5de153a5 100644
> > > --- a/git-compat-util.h
> > > +++ b/git-compat-util.h
> > > @@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat
> > > *);  #endif
> > >
> > >  #define DEFAULT_PACKED_GIT_LIMIT \
> > > - ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
> > > + ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L *
> > > +1024L) : 256))
> >
> > I wondered if we would run afoul of integer sizes on 64-bit systems
> > where "long" is still only 32-bits (i.e., Windows). But I think it's
> > OK, because the values before we cast to size_t are in megabytes. So
> > your
> > 32*1024*1024 needs only 25 bits to store it. And then after we cast to
> > size_t, everything is in 64-bit.
> 
> Indeed, when I patch a local Git checkout accordingly, I see that
> packed_git_limit is set to 35184372088832.
> 
> The bigger problem in this regard is that users are allowed to override this 
> via
> core.packedgitlimit but that value is parsed as an unsigned long.

We might want to think about replacing git_config_ulong with 
git_config_size_t in nearly all cases. "long" has ceased to be 
useful.  More modern versions of C prefer uint64_t, but I
think that we'll usually want size_t because these values will
be used as memory limits of various sorts.



Re: [PATCH] Increase core.packedGitLimit

2017-04-20 Thread Johannes Schindelin
Hi Peff,

On Thu, 20 Apr 2017, Jeff King wrote:

> On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:
> 
> > When core.packedGitLimit is exceeded, git will close packs.  If there
> > is a repack operation going on in parallel with a fetch, the fetch
> > might open a pack, and then be forced to close it due to
> > packedGitLimit being hit.  The repack could then delete the pack
> > out from under the fetch, causing the fetch to fail.
> > 
> > Increase core.packedGitLimit's default value to prevent
> > this.
> > 
> > On current 64-bit x86_64 machines, 48 bits of address space are
> > available.  It appears that 64-bit ARM machines have no standard
> > amount of address space (that is, it varies by manufacturer), and IA64
> > and POWER machines have the full 64 bits.  So 48 bits is the only
> > limit that we can reasonably care about.  We reserve a few bits of the
> > 48-bit address space for the kernel's use (this is not strictly
> > necessary, but it's better to be safe), and use up to the remaining
> > 45.  No git repository will be anywhere near this large any time soon,
> > so this should prevent the failure.
> 
> Yep, I think this is a reasonable direction.
> 
> > ---
> >  git-compat-util.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> This probably needs an update to the core.packedGitLimit section of
> Documentation/config.txt.
> 
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index 8a4a3f85e7..1c5de153a5 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat *);
> >  #endif
> >  
> >  #define DEFAULT_PACKED_GIT_LIMIT \
> > -   ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
> > +   ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L * 1024L) : 
> > 256))
> 
> I wondered if we would run afoul of integer sizes on 64-bit systems where
> "long" is still only 32-bits (i.e., Windows). But I think it's OK,
> because the values before we cast to size_t are in megabytes. So your
> 32*1024*1024 needs only 25 bits to store it. And then after we cast to
> size_t, everything is in 64-bit.

Indeed, when I patch a local Git checkout accordingly, I see that
packed_git_limit is set to 35184372088832.

The bigger problem in this regard is that users are allowed to override
this via core.packedgitlimit but that value is parsed as an unsigned long.

Ciao,
Dscho


Re: [PATCH] Increase core.packedGitLimit

2017-04-20 Thread Jeff King
On Thu, Apr 20, 2017 at 04:41:18PM -0400, David Turner wrote:

> When core.packedGitLimit is exceeded, git will close packs.  If there
> is a repack operation going on in parallel with a fetch, the fetch
> might open a pack, and then be forced to close it due to
> packedGitLimit being hit.  The repack could then delete the pack
> out from under the fetch, causing the fetch to fail.
> 
> Increase core.packedGitLimit's default value to prevent
> this.
> 
> On current 64-bit x86_64 machines, 48 bits of address space are
> available.  It appears that 64-bit ARM machines have no standard
> amount of address space (that is, it varies by manufacturer), and IA64
> and POWER machines have the full 64 bits.  So 48 bits is the only
> limit that we can reasonably care about.  We reserve a few bits of the
> 48-bit address space for the kernel's use (this is not strictly
> necessary, but it's better to be safe), and use up to the remaining
> 45.  No git repository will be anywhere near this large any time soon,
> so this should prevent the failure.

Yep, I think this is a reasonable direction.

> ---
>  git-compat-util.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This probably needs an update to the core.packedGitLimit section of
Documentation/config.txt.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 8a4a3f85e7..1c5de153a5 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat *);
>  #endif
>  
>  #define DEFAULT_PACKED_GIT_LIMIT \
> - ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
> + ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L * 1024L) : 
> 256))

I wondered if we would run afoul of integer sizes on 64-bit systems where
"long" is still only 32-bits (i.e., Windows). But I think it's OK,
because the values before we cast to size_t are in megabytes. So your
32*1024*1024 needs only 25 bits to store it. And then after we cast to
size_t, everything is in 64-bit.

-Peff


[PATCH] Increase core.packedGitLimit

2017-04-20 Thread David Turner
When core.packedGitLimit is exceeded, git will close packs.  If there
is a repack operation going on in parallel with a fetch, the fetch
might open a pack, and then be forced to close it due to
packedGitLimit being hit.  The repack could then delete the pack
out from under the fetch, causing the fetch to fail.

Increase core.packedGitLimit's default value to prevent
this.

On current 64-bit x86_64 machines, 48 bits of address space are
available.  It appears that 64-bit ARM machines have no standard
amount of address space (that is, it varies by manufacturer), and IA64
and POWER machines have the full 64 bits.  So 48 bits is the only
limit that we can reasonably care about.  We reserve a few bits of the
48-bit address space for the kernel's use (this is not strictly
necessary, but it's better to be safe), and use up to the remaining
45.  No git repository will be anywhere near this large any time soon,
so this should prevent the failure.

Helped-by: Jeff King 
Signed-off-by: David Turner 
---
 git-compat-util.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 8a4a3f85e7..1c5de153a5 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -616,7 +616,7 @@ extern int git_lstat(const char *, struct stat *);
 #endif
 
 #define DEFAULT_PACKED_GIT_LIMIT \
-   ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? 8192 : 256))
+   ((1024L * 1024L) * (size_t)(sizeof(void*) >= 8 ? (32 * 1024L * 1024L) : 
256))
 
 #ifdef NO_PREAD
 #define pread git_pread
-- 
2.11.GIT