Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-21 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 Duy Nguyen pclo...@gmail.com writes:

 The only
 downside I see is large blobs will be packed & undeltified, which could
 increase pack size if you have lots of them.

 I think that is something that can be tweaked, unless the user tells
 us otherwise via command line override, when running the improved
 gc --aggressive ;-)

 deltaBaseCacheLimit is used for unpacking, not for packing.

 Hmm, doesn't packing need to read existing data?

Judging from the frequent out-of-memory conditions of git gc
--aggressive, packing is not restrained by deltaBaseCacheLimit.

-- 
David Kastrup


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-21 Thread David Kastrup
Jeff King p...@peff.net writes:

 On Wed, Mar 19, 2014 at 01:38:32PM +0100, David Kastrup wrote:

 The default of 16MiB causes serious thrashing for large delta chains
 combined with large files.

 Does it make much sense to bump this without also bumping
 MAX_DELTA_CACHE in sha1_file.c? In my measurements of linux.git, bumping
 the memory limit did not help much without also bumping the number of
 slots.

In the cases I checked, bumping MAX_DELTA_CACHE did not help much.
Bumping it once to 512 could be a slight improvement; larger values then
caused performance to regress.
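For readers following along: a minimal sketch of a direct-mapped,
slot-limited delta base cache (hypothetical names, not git's actual
sha1_file.c code) shows why the slot count can bind before the byte
budget does — two hot bases hashing to the same slot evict each other
no matter how large the byte budget is.

    #include <stdlib.h>
    #include <string.h>

    #define MAX_DELTA_CACHE 256                /* slot count */
    #define CACHE_LIMIT (16 * 1024 * 1024)     /* byte budget */

    struct cache_entry {
            unsigned char sha1[20];            /* key: object name */
            void *data;                        /* inflated base */
            size_t size;
    };

    static struct cache_entry cache[MAX_DELTA_CACHE];
    static size_t cached_bytes;

    /* Direct-mapped: every object name maps to exactly one slot. */
    static unsigned slot_for(const unsigned char *sha1)
    {
            unsigned hash;
            memcpy(&hash, sha1, sizeof(hash));
            return hash % MAX_DELTA_CACHE;
    }

    static void cache_put(const unsigned char *sha1, void *data,
                          size_t size)
    {
            struct cache_entry *e = &cache[slot_for(sha1)];

            if (e->data) {             /* collision: evict old entry */
                    cached_bytes -= e->size;
                    free(e->data);
            }
            memcpy(e->sha1, sha1, 20);
            e->data = data;
            e->size = size;
            cached_bytes += size;
            /* A real cache would additionally evict entries here
             * while cached_bytes > CACHE_LIMIT. */
    }

Under such a scheme, raising CACHE_LIMIT alone leaves the collision
behaviour of the 256 slots untouched, which would explain why bumping
the byte limit on linux.git does not help without also bumping the
slot count.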

 I guess that just bumping the memory limit would help with repos which
 have deltas on large-ish files (whereas the kernel just has a lot of
 deltas on a lot of little directories),

Well, those were the most pathological for git-blame so I was somewhat
focused on them.

 but I'd be curious how much.

http://repo.or.cz/r/wortliste.git

consists basically of adding lines to a single alphabetically sorted
file, wortliste, currently 15MB in size.
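(Why this is pathological: nearly every commit touches that one file,
so almost every revision's blob is stored as a delta against a
neighbouring revision, and blame has to reconstruct all of them.  If
the ~15MB inflated base keeps falling out of a 16MiB cache shared with
everything else, each reconstruction re-inflates it from the pack, and
the total cost grows roughly as revisions x 15MB instead of a single
inflation.)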

-- 
David Kastrup


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-21 Thread Duy Nguyen
On Fri, Mar 21, 2014 at 1:04 PM, David Kastrup d...@gnu.org wrote:
 Hmm, doesn't packing need to read existing data?

 Judging from the frequent out-of-memory conditions of git gc
 --aggressive, packing is not restrained by deltaBaseCacheLimit.

pack-objects memory usage is controlled more by pack.windowMemory,
which defaults to unlimited. Does it make sense to default to half of
physical memory or something?
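A rough sketch of how such a default could be derived (this relies on
the _SC_PHYS_PAGES sysconf extension available on Linux/glibc; a
hypothetical helper, not an actual pack-objects function):

    #include <stddef.h>
    #include <unistd.h>

    /* Half of physical memory as a pack.windowMemory default, with a
     * fixed fallback when the probe is unavailable. */
    static size_t default_window_memory(void)
    {
            long pages = sysconf(_SC_PHYS_PAGES);
            long page_size = sysconf(_SC_PAGE_SIZE);

            if (pages <= 0 || page_size <= 0)
                    return (size_t)256 * 1024 * 1024; /* fallback */
            return (size_t)pages / 2 * (size_t)page_size;
    }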
-- 
Duy


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-21 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 I know that the 512MiB default for the bigFileThreshold (aka "forget
 about delta compression") came out of thin air.  It was just "1GB is
 always too huge for anybody, so let's cut it in half and declare that
 value the initial version of a sane threshold", nothing more.

 So it might be that the problem is 512MiB is still too big, relative
 to the 16MiB of delta base cache, and the former may be what needs
 to be tweaked.

Well, the point of the 512MiB limit is basically that you can always
argue "if you are working with 500MiB files, you will not be using just
128MiB of main memory anyway".  The 16MiB limit, in contrast, will come
into play for basically every sufficiently deep history, even one of
small files.

 If a blob close to but below 512MiB is a problem for a 16MiB delta base
 cache, it would still be big enough to cause the same problem for a
 128MiB delta base cache---it would evict all the other objects and then
 still not fit within the limit itself, busting the limit immediately,
 no?

I think, though I may be mistaken, that the delta base limit decides
when to flush whole blobs.  So a 500MiB file would still be encoded
relative to a single newer version anyway and take memory accordingly.

But I am not sure about that at all.
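Junio's eviction scenario above can be made concrete with a toy model
(illustrative code only, nothing like git's actual bookkeeping):

    #include <stdio.h>

    #define MIB ((size_t)1024 * 1024)

    /* Insert a base of 'size' bytes into a cache already holding
     * 'used' bytes under 'budget': evict everything else first, then
     * check whether the new base busts the budget on its own. */
    static void insert(size_t used, size_t budget, size_t size)
    {
            if (used + size > budget)
                    used = 0;                  /* evict all others */
            used += size;
            printf("%3zu MiB budget: %zu MiB base %s\n",
                   budget / MIB, size / MIB,
                   used > budget ? "busts the limit by itself"
                                 : "fits");
    }

    int main(void)
    {
            insert(12 * MIB, 16 * MIB, 500 * MIB);   /* current */
            insert(100 * MIB, 128 * MIB, 500 * MIB); /* proposed */
            return 0;
    }

Both calls report that the base busts the limit by itself: for a
near-512MiB base, going from 16MiB to 128MiB of cache changes nothing,
so any relief there would have to come from lowering bigFileThreshold
instead.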

-- 
David Kastrup


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-21 Thread David Kastrup
Jeff King p...@peff.net writes:

 If you have before-and-after numbers for just this patch on some
 repository, that would be an interesting thing to put in the commit
 message.

It's a chicken-and-egg problem regarding the benchmarks.  The most
impressive benchmarks arise with the git-blame performance work in
place, and the most impressive benchmarks for the git-blame performance
work are when this or something similar is in place.  Of course, when
there are two really deficient things slowing operations down, fixing
only one is going to be much less impressive.

So I decided to tackle the low-hanging fruit here first.  But it would
appear that this amounts to far too much work, since it means I have to
search for and create some _other_ benchmarking scenario not hampered
by substandard code the way the current git-blame is.

I have enough on my plate as it is, so even though it puts the _real_
git-blame work in a worse light, I should rather get that finished
first (nobody will argue to keep its useless thrashing around).  Of
course, the person then creating the two-line change to
deltaBaseCacheLimit will be able to claim much larger performance
improvements in his commit message afterwards than what I can claim
regarding the git-blame work when going first, but that's life.

-- 
David Kastrup


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-20 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Thu, Mar 20, 2014 at 5:11 AM, Junio C Hamano gits...@pobox.com wrote:
 ...
 I know that the 512MiB default for the bigFileThreshold (aka "forget
 about delta compression") came out of thin air.  It was just "1GB is
 always too huge for anybody, so let's cut it in half and declare that
 value the initial version of a sane threshold", nothing more.

 So it might be that the problem is 512MiB is still too big, relative
 to the 16MiB of delta base cache, and the former may be what needs
 to be tweaked.  If a blob close to but below 512MiB is a problem for a
 16MiB delta base cache, it would still be big enough to cause the same
 problem for a 128MiB delta base cache---it would evict all the other
 objects and then still not fit within the limit itself, busting the
 limit immediately, no?

 I would understand if the change were to update the definition of
 deltaBaseCacheLimit and link it to the value of bigFileThreshold,
 for example.  With the presented discussion, I am still not sure if
 we can say that bumping deltaBaseCacheLimit is the right solution to
 "the description in combination with the current setting is clearly
 wrong" (which is a real issue).

 I vote to make big_file_threshold smaller. 512MB is already unfriendly
 for many smaller machines. I'm thinking somewhere around 32MB-64MB
 (and maybe increase the delta base cache limit to match).

These numbers match my gut feeling (e.g. 4k*4k*32-bit uncompressed
would be 64MB); a delta base cache that is sized the same as (or
perhaps twice as big as) that limit may be a good default.
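(Checking that arithmetic: 4096 x 4096 pixels x 4 bytes/pixel =
67,108,864 bytes, i.e. exactly 64MiB.)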

 The only
 downside I see is large blobs will be packed & undeltified, which could
 increase pack size if you have lots of them.

I think that is something that can be tweaked, unless the user tells
us otherwise via command line override, when running the improved
gc --aggressive ;-)


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-20 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 Duy Nguyen pclo...@gmail.com writes:

 The only
 downside I see is large blobs will be packed & undeltified, which could
 increase pack size if you have lots of them.

 I think that is something that can be tweaked, unless the user tells
 us otherwise via command line override, when running the improved
 gc --aggressive ;-)

deltaBaseCacheLimit is used for unpacking, not for packing.

-- 
David Kastrup


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-20 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 Duy Nguyen pclo...@gmail.com writes:

 The only
 downside I see is large blobs will be packed & undeltified, which could
 increase pack size if you have lots of them.

 I think that is something that can be tweaked, unless the user tells
 us otherwise via command line override, when running the improved
 gc --aggressive ;-)

 deltaBaseCacheLimit is used for unpacking, not for packing.

Hmm, doesn't packing need to read existing data?


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-20 Thread Jeff King
On Wed, Mar 19, 2014 at 01:38:32PM +0100, David Kastrup wrote:

 The default of 16MiB causes serious thrashing for large delta chains
 combined with large files.

Does it make much sense to bump this without also bumping
MAX_DELTA_CACHE in sha1_file.c? In my measurements of linux.git, bumping
the memory limit did not help much without also bumping the number of
slots.

I guess that just bumping the memory limit would help with repos which
have deltas on large-ish files (whereas the kernel just has a lot of
deltas on a lot of little directories), but I'd be curious how much.

If you have before-and-after numbers for just this patch on some
repository, that would be an interesting thing to put in the commit
message.

-Peff


[PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-19 Thread David Kastrup
The default of 16MiB causes serious thrashing for large delta chains
combined with large files.

Signed-off-by: David Kastrup d...@gnu.org
---

Forgot the signoff.  For the rationale of this patch and the 128MiB
choice, see the original patch.

 Documentation/config.txt | 2 +-
 environment.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 73c8973..1b6950a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -484,7 +484,7 @@ core.deltaBaseCacheLimit::
 	to avoid unpacking and decompressing frequently used base
 	objects multiple times.
 +
-Default is 16 MiB on all platforms.  This should be reasonable
+Default is 128 MiB on all platforms.  This should be reasonable
 for all users/operating systems, except on the largest projects.
 You probably do not need to adjust this value.
 +
diff --git a/environment.c b/environment.c
index c3c8606..73ed670 100644
--- a/environment.c
+++ b/environment.c
@@ -37,7 +37,7 @@ int core_compression_seen;
 int fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
-size_t delta_base_cache_limit = 16 * 1024 * 1024;
+size_t delta_base_cache_limit = 128 * 1024 * 1024;
 unsigned long big_file_threshold = 512 * 1024 * 1024;
 const char *pager_program;
 int pager_use_color = 1;
-- 
1.8.3.2
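For anyone who wants the larger value per repository without waiting
on a new default: "git config core.deltaBaseCacheLimit 128m" works
today, since git's config parser accepts k/m/g size suffixes.  A rough
sketch of that kind of suffix parsing (hypothetical code, not git's
actual parser):

    #include <stdlib.h>

    /* Parse "128", "128k", "128m" or "128g" into a byte count. */
    static unsigned long parse_size(const char *s)
    {
            char *end;
            unsigned long v = strtoul(s, &end, 10);

            switch (*end) {
            case 'g': case 'G': v *= 1024; /* fall through */
            case 'm': case 'M': v *= 1024; /* fall through */
            case 'k': case 'K': v *= 1024;
            }
            return v;
    }

    /* parse_size("128m") == 134217728 */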



Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-19 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 The default of 16MiB causes serious thrashing for large delta chains
 combined with large files.

 Signed-off-by: David Kastrup d...@gnu.org
 ---

Is that a good argument?  Wouldn't the default of 128MiB burden
smaller machines with bloated processes?

 Forgot the signoff.  For the rationale of this patch and the 128MiB
 choice, see the original patch.

"See the original patch", especially when written after the three-dash
line without a reference, will not help future readers of "git log" who
later bisect to find that this change hurt their usage and want to see
why it was done unconditionally (as opposed to encouraging those who
benefit from this change to configure their Git to use a larger value,
without hurting others).

While I can personally afford 128MiB, I do *not* think 16MiB was
chosen more scientifically than the 128MiB this change proposes to
make, and my gut feeling is that this would not have too big a
negative impact on anybody.  Still, I would prefer to have a reason
better than gut feeling before accepting a default change.

Thanks.


 Documentation/config.txt | 2 +-
  environment.c           | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

 diff --git a/Documentation/config.txt b/Documentation/config.txt
 index 73c8973..1b6950a 100644
 --- a/Documentation/config.txt
 +++ b/Documentation/config.txt
 @@ -484,7 +484,7 @@ core.deltaBaseCacheLimit::
   to avoid unpacking and decompressing frequently used base
   objects multiple times.
  +
 -Default is 16 MiB on all platforms.  This should be reasonable
 +Default is 128 MiB on all platforms.  This should be reasonable
  for all users/operating systems, except on the largest projects.
  You probably do not need to adjust this value.
  +
 diff --git a/environment.c b/environment.c
 index c3c8606..73ed670 100644
 --- a/environment.c
 +++ b/environment.c
 @@ -37,7 +37,7 @@ int core_compression_seen;
  int fsync_object_files;
  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 -size_t delta_base_cache_limit = 16 * 1024 * 1024;
 +size_t delta_base_cache_limit = 128 * 1024 * 1024;
  unsigned long big_file_threshold = 512 * 1024 * 1024;
  const char *pager_program;
  int pager_use_color = 1;


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-19 Thread David Kastrup
Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 The default of 16MiB causes serious thrashing for large delta chains
 combined with large files.

 Signed-off-by: David Kastrup d...@gnu.org
 ---

 Is that a good argument?  Wouldn't the default of 128MiB burden
 smaller machines with bloated processes?

The default file size before Git forgets about delta compression is
512MiB.  Unpacking 500MiB files with 16MiB of delta storage is going to
be uglier.
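For a purely hypothetical sense of scale: blaming through 1,000
revisions of a file whose ~500MiB inflated base can never stay in a
16MiB cache means re-inflating on the order of 1,000 x 500MiB, i.e.
roughly 500GB of decompression work, where a cache large enough to
hold the base would inflate it once.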

 Forgot the signoff.  For the rationale of this patch and the 128MiB
 choice, see the original patch.

 "See the original patch", especially when written after the three-dash
 line without a reference, will not help future readers of "git log" who
 later bisect to find that this change hurt their usage and want to see
 why it was done unconditionally (as opposed to encouraging those who
 benefit from this change to configure their Git to use a larger value,
 without hurting others).

Documentation/config.txt states:

core.deltaBaseCacheLimit::
	Maximum number of bytes to reserve for caching base objects
	that may be referenced by multiple deltified objects.  By storing the
	entire decompressed base objects in a cache Git is able
	to avoid unpacking and decompressing frequently used base
	objects multiple times.
+
Default is 16 MiB on all platforms.  This should be reasonable
for all users/operating systems, except on the largest projects.
You probably do not need to adjust this value.

I've seen this seriously screw up performance in several projects of
mine that don't really count as "the largest projects".

So the description in combination with the current setting is clearly wrong.

 While I can personally afford 128MiB, I do *not* think 16MiB was
 chosen more scientifically than the 128MiB this change proposes to
 make, and my gut feeling is that this would not have too big a
 negative impact on anybody.  Still, I would prefer to have a reason
 better than gut feeling before accepting a default change.

Shrug.  I've set my private default anyway, so I don't have anything to
gain from this change.  If anybody wants to pick this up and put more
science into the exact value: be my guest.

-- 
David Kastrup


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-19 Thread Junio C Hamano
David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 The default of 16MiB causes serious thrashing for large delta chains
 combined with large files.

 Signed-off-by: David Kastrup d...@gnu.org
 ---

 Is that a good argument?  Wouldn't the default of 128MiB burden
 smaller machines with bloated processes?

 The default file size before Git forgets about delta compression is
 512MiB.  Unpacking 500MiB files with 16MiB of delta storage is going to
 be uglier.

 ...

 Documentation/config.txt states:

 core.deltaBaseCacheLimit::
 	Maximum number of bytes to reserve for caching base objects
 	that may be referenced by multiple deltified objects.  By storing the
 	entire decompressed base objects in a cache Git is able
 	to avoid unpacking and decompressing frequently used base
 	objects multiple times.
 +
 Default is 16 MiB on all platforms.  This should be reasonable
 for all users/operating systems, except on the largest projects.
 You probably do not need to adjust this value.

 I've seen this seriously screw up performance in several projects of
 mine that don't really count as "the largest projects".

 So the description in combination with the current setting is clearly wrong.

That is good material for a proposed log message, and I think you
are onto something here.

I know that the 512MiB default for the bigFileThreshold (aka "forget
about delta compression") came out of thin air.  It was just "1GB is
always too huge for anybody, so let's cut it in half and declare that
value the initial version of a sane threshold", nothing more.

So it might be that the problem is 512MiB is still too big, relative
to the 16MiB of delta base cache, and the former may be what needs
to be tweaked.  If a blob close to but below 512MiB is a problem for a
16MiB delta base cache, it would still be big enough to cause the same
problem for a 128MiB delta base cache---it would evict all the other
objects and then still not fit within the limit itself, busting the
limit immediately, no?

I would understand if the change were to update the definition of
deltaBaseCacheLimit and link it to the value of bigFileThreshold,
for example.  With the presented discussion, I am still not sure if
we can say that bumping deltaBaseCacheLimit is the right solution to
"the description in combination with the current setting is clearly
wrong" (which is a real issue).


Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB

2014-03-19 Thread Duy Nguyen
On Thu, Mar 20, 2014 at 5:11 AM, Junio C Hamano gits...@pobox.com wrote:
 David Kastrup d...@gnu.org writes:

 Junio C Hamano gits...@pobox.com writes:

 David Kastrup d...@gnu.org writes:

 The default of 16MiB causes serious thrashing for large delta chains
 combined with large files.

 Signed-off-by: David Kastrup d...@gnu.org
 ---

 Is that a good argument?  Wouldn't the default of 128MiB burden
 smaller machines with bloated processes?

 The default file size before Git forgets about delta compression is
 512MiB.  Unpacking 500MiB files with 16MiB of delta storage is going to
 be uglier.

 ...

 Documentation/config.txt states:

 core.deltaBaseCacheLimit::
 	Maximum number of bytes to reserve for caching base objects
 	that may be referenced by multiple deltified objects.  By storing the
 	entire decompressed base objects in a cache Git is able
 	to avoid unpacking and decompressing frequently used base
 	objects multiple times.
 +
 Default is 16 MiB on all platforms.  This should be reasonable
 for all users/operating systems, except on the largest projects.
 You probably do not need to adjust this value.

 I've seen this seriously screw up performance in several projects of
 mine that don't really count as "the largest projects".

 So the description in combination with the current setting is clearly wrong.

 That is good material for a proposed log message, and I think you
 are onto something here.

 I know that the 512MiB default for the bigFileThreshold (aka "forget
 about delta compression") came out of thin air.  It was just "1GB is
 always too huge for anybody, so let's cut it in half and declare that
 value the initial version of a sane threshold", nothing more.

 So it might be that the problem is 512MiB is still too big, relative
 to the 16MiB of delta base cache, and the former may be what needs
 to be tweaked.  If a blob close to but below 512MiB is a problem for a
 16MiB delta base cache, it would still be big enough to cause the same
 problem for a 128MiB delta base cache---it would evict all the other
 objects and then still not fit within the limit itself, busting the
 limit immediately, no?

 I would understand if the change were to update the definition of
 deltaBaseCacheLimit and link it to the value of bigFileThreshold,
 for example.  With the presented discussion, I am still not sure if
 we can say that bumping deltaBaseCacheLimit is the right solution to
 "the description in combination with the current setting is clearly
 wrong" (which is a real issue).

I vote to make big_file_threshold smaller. 512MB is already unfriendly
for many smaller machines. I'm thinking somewhere around 32MB-64MB
(and maybe increase the delta base cache limit to match). The only
downside I see is large blobs will be packed & undeltified, which could
increase pack size if you have lots of them. But maybe we could
improve pack-objects/repack/gc to deltify large blobs anyway if
they're old enough.
-- 
Duy