Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Junio C Hamano gits...@pobox.com writes:

> David Kastrup d...@gnu.org writes:
>
>> Junio C Hamano gits...@pobox.com writes:
>>
>>> Duy Nguyen pclo...@gmail.com writes:
>>>
>>>> The only downside I see is large blobs will be packed undeltified,
>>>> which could increase pack size if you have lots of them.
>>>
>>> I think that is something that can be tweaked, unless the user tells
>>> us otherwise via a command line override, when running the improved
>>> "gc --aggressive" ;-)
>>
>> deltaBaseCacheLimit is used for unpacking, not for packing.
>
> Hmm, doesn't packing need to read existing data?

Judging from the frequent out-of-memory conditions of "git gc
--aggressive", packing is not restrained by deltaBaseCacheLimit.

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Jeff King p...@peff.net writes:

> On Wed, Mar 19, 2014 at 01:38:32PM +0100, David Kastrup wrote:
>
>> The default of 16MiB causes serious thrashing for large delta chains
>> combined with large files.
>
> Does it make much sense to bump this without also bumping
> MAX_DELTA_CACHE in sha1_file.c? In my measurements of linux.git,
> bumping the memory limit did not help much without also bumping the
> number of slots.

In the cases I checked, bumping MAX_DELTA_CACHE did not help much.
Bumping it once to 512 could be a slight improvement; larger values
then caused performance to regress.

> I guess that just bumping the memory limit would help with repos which
> have deltas on large-ish files (whereas the kernel just has a lot of
> deltas on a lot of little directories),

Well, those were the most pathological for git-blame, so I was somewhat
focused on them.

> but I'd be curious how much.

http://repo.or.cz/r/wortliste.git consists basically of adding lines to
a single alphabetically sorted file "wortliste", currently 15MB in
size.

-- 
David Kastrup
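The slots-versus-bytes interplay discussed above can be illustrated with a toy model. The following Python sketch is illustrative only, not git's actual sha1_file.c implementation: the class name, sizes, and eviction policy are assumptions made for the example. The point is that with a fixed slot count, raising the byte limit alone does nothing for a workload of many small delta bases.

```python
from collections import OrderedDict

class DeltaBaseCache:
    """Toy LRU cache bounded by both total bytes and slot count,
    loosely mirroring the interplay between core.deltaBaseCacheLimit
    and MAX_DELTA_CACHE described above.  Illustrative sketch only;
    not git's actual code."""

    def __init__(self, byte_limit, max_slots):
        self.byte_limit = byte_limit
        self.max_slots = max_slots
        self.entries = OrderedDict()  # key -> decompressed size in bytes
        self.used = 0

    def add(self, key, size):
        if key in self.entries:
            self.used -= self.entries.pop(key)
        self.entries[key] = size
        self.used += size
        # Evict least-recently-used entries until BOTH limits hold.
        while self.used > self.byte_limit or len(self.entries) > self.max_slots:
            _, evicted = self.entries.popitem(last=False)
            self.used -= evicted

    def hit(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # refresh LRU position
            return True
        return False

# With many small bases, a slot cap of 256 bounds the working set no
# matter how large the byte limit is:
small = DeltaBaseCache(byte_limit=128 * 1024 * 1024, max_slots=256)
for i in range(1000):
    small.add(i, 4096)
assert len(small.entries) == 256   # slot-bound, not byte-bound
assert small.used == 256 * 4096    # far below the 128MiB byte limit
```

Under these (assumed) semantics, a repository with thousands of small delta bases gains nothing from the byte-limit bump unless the slot count grows with it, which matches Peff's linux.git observation.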
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
On Fri, Mar 21, 2014 at 1:04 PM, David Kastrup d...@gnu.org wrote:
>> Hmm, doesn't packing need to read existing data?
>
> Judging from the frequent out-of-memory conditions of "git gc
> --aggressive", packing is not restrained by deltaBaseCacheLimit.

pack-objects memory usage is controlled more by pack.windowMemory,
which defaults to unlimited. Does it make sense to default it to half
of physical memory or something?

-- 
Duy
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Junio C Hamano gits...@pobox.com writes:

> I know that the 512MiB default for bigFileThreshold (aka "forget
> about delta compression") came out of thin air. It was just "1GB is
> always too huge for anybody, so let's cut it in half and declare that
> value the initial version of a sane threshold", nothing more. So it
> might be that the problem is that 512MiB is still too big relative to
> the 16MiB of delta base cache, and the former may be what needs to be
> tweaked.

Well, the point of the 512MiB limit is basically that you can always
argue "if you are working with 500MiB files, you will not be using
just 128MiB of main memory anyway". The 16MiB limit, in contrast, will
get utilized by basically every history, even one of small files, that
is deep enough.

> If a blob close to but below 512MiB is a problem for a 16MiB delta
> base cache, it would still be too big to cause the same problem for a
> 128MiB delta base cache---it would evict all the other objects and
> then end up not being able to fit within the limit itself, busting
> the limit immediately, no?

I think, but may be mistaken, that the delta base limit decides when
to flush whole blobs. So a 500MiB file would still be encoded relative
to a single newer version anyway and take memory accordingly. But I am
not sure about that at all.

-- 
David Kastrup
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Jeff King p...@peff.net writes:

> If you have before-and-after numbers for just this patch on some
> repository, that would be an interesting thing to put in the commit
> message.

It's a hen-and-egg problem regarding the benchmarks. The most
impressive benchmarks arise with the git-blame performance work in
place, and the most impressive benchmarks for the git-blame
performance work arise when this or something similar is in place. Of
course, when there are two really deficient things slowing operations
down, fixing only one is going to be much less impressive.

So I decided to tackle the low-hanging fruit here first. But it would
appear that this amounts to far too much work, since it means I have
to search for and create some _other_ benchmarking scenario not
hampered by substandard code like the current git-blame. I have enough
on my plate as it is, so even though it puts the _real_ git-blame work
in a worse light, I should rather get that finished first (nobody will
argue to keep its useless thrashing around).

Of course, the person then creating the two-line change to
deltaBaseCacheLimit will be able to claim much larger performance
improvements in his commit message than what I can claim for the
git-blame work when going first, but that's life.

-- 
David Kastrup
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Duy Nguyen pclo...@gmail.com writes:

> On Thu, Mar 20, 2014 at 5:11 AM, Junio C Hamano gits...@pobox.com wrote:
>> ... I know that the 512MiB default for bigFileThreshold (aka "forget
>> about delta compression") came out of thin air. It was just "1GB is
>> always too huge for anybody, so let's cut it in half and declare
>> that value the initial version of a sane threshold", nothing more.
>> So it might be that the problem is that 512MiB is still too big
>> relative to the 16MiB of delta base cache, and the former may be
>> what needs to be tweaked.
>>
>> If a blob close to but below 512MiB is a problem for a 16MiB delta
>> base cache, it would still be too big to cause the same problem for
>> a 128MiB delta base cache---it would evict all the other objects and
>> then end up not being able to fit within the limit itself, busting
>> the limit immediately, no?
>>
>> I would understand if the change were to update the definition of
>> deltaBaseCacheLimit and link it to the value of bigFileThreshold,
>> for example. With the presented discussion, I am still not sure if
>> we can say that bumping deltaBaseCacheLimit is the right solution to
>> "the description with the current setting is clearly wrong" (which
>> is a real issue).
>
> I vote make big_file_threshold smaller. 512MB is already unfriendly
> for many smaller machines. I'm thinking somewhere around 32MB-64MB
> (and maybe increase the delta base cache limit to match).

These numbers match my gut feeling (e.g. a 4k*4k*32-bit uncompressed
image would be 64MB); a delta base cache sized the same as (or perhaps
twice as big as) that limit may be a good default.

> The only downside I see is large blobs will be packed undeltified,
> which could increase pack size if you have lots of them.

I think that is something that can be tweaked, unless the user tells
us otherwise via a command line override, when running the improved
"gc --aggressive" ;-)
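The back-of-the-envelope figure above checks out: a 4k-by-4k image at 32 bits per pixel is exactly 64MiB uncompressed. As a quick worked check (illustrative arithmetic only):

```python
# A 4k x 4k image at 32 bits (4 bytes) per pixel, uncompressed:
width = height = 4096
bytes_per_pixel = 4  # 32-bit
size = width * height * bytes_per_pixel
assert size == 64 * 1024 * 1024  # exactly 64 MiB, matching the estimate
```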
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Junio C Hamano gits...@pobox.com writes:

> Duy Nguyen pclo...@gmail.com writes:
>
>> The only downside I see is large blobs will be packed undeltified,
>> which could increase pack size if you have lots of them.
>
> I think that is something that can be tweaked, unless the user tells
> us otherwise via a command line override, when running the improved
> "gc --aggressive" ;-)

deltaBaseCacheLimit is used for unpacking, not for packing.

-- 
David Kastrup
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
David Kastrup d...@gnu.org writes:

> Junio C Hamano gits...@pobox.com writes:
>
>> Duy Nguyen pclo...@gmail.com writes:
>>
>>> The only downside I see is large blobs will be packed undeltified,
>>> which could increase pack size if you have lots of them.
>>
>> I think that is something that can be tweaked, unless the user tells
>> us otherwise via a command line override, when running the improved
>> "gc --aggressive" ;-)
>
> deltaBaseCacheLimit is used for unpacking, not for packing.

Hmm, doesn't packing need to read existing data?
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
On Wed, Mar 19, 2014 at 01:38:32PM +0100, David Kastrup wrote:

> The default of 16MiB causes serious thrashing for large delta chains
> combined with large files.

Does it make much sense to bump this without also bumping
MAX_DELTA_CACHE in sha1_file.c? In my measurements of linux.git,
bumping the memory limit did not help much without also bumping the
number of slots.

I guess that just bumping the memory limit would help with repos which
have deltas on large-ish files (whereas the kernel just has a lot of
deltas on a lot of little directories), but I'd be curious how much.
If you have before-and-after numbers for just this patch on some
repository, that would be an interesting thing to put in the commit
message.

-Peff
[PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
The default of 16MiB causes serious thrashing for large delta chains
combined with large files.

Signed-off-by: David Kastrup d...@gnu.org
---
Forgot the signoff. For the rationale of this patch and the 128MiB
choice, see the original patch.

 Documentation/config.txt | 2 +-
 environment.c            | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 73c8973..1b6950a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -484,7 +484,7 @@ core.deltaBaseCacheLimit::
 	to avoid unpacking and decompressing frequently used base
 	objects multiple times.
 +
-Default is 16 MiB on all platforms. This should be reasonable
+Default is 128 MiB on all platforms. This should be reasonable
 for all users/operating systems, except on the largest projects.
 You probably do not need to adjust this value.
 +
diff --git a/environment.c b/environment.c
index c3c8606..73ed670 100644
--- a/environment.c
+++ b/environment.c
@@ -37,7 +37,7 @@ int core_compression_seen;
 int fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
-size_t delta_base_cache_limit = 16 * 1024 * 1024;
+size_t delta_base_cache_limit = 128 * 1024 * 1024;
 unsigned long big_file_threshold = 512 * 1024 * 1024;
 const char *pager_program;
 int pager_use_color = 1;
-- 
1.8.3.2
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
David Kastrup d...@gnu.org writes:

> The default of 16MiB causes serious thrashing for large delta chains
> combined with large files.
>
> Signed-off-by: David Kastrup d...@gnu.org
> ---

Is that a good argument? Wouldn't the default of 128MiB burden smaller
machines with bloated processes?

> Forgot the signoff. For the rationale of this patch and the 128MiB
> choice, see the original patch.

"See the original patch", especially written after the three-dash line
and without a reference, will not help future readers of "git log" who
later bisect to find that this change hurt their usage and want to see
why it was done unconditionally (as opposed to encouraging those who
benefit from this change to configure their Git to use a larger value
for themselves, without hurting others).

While I can personally afford 128MiB, do *not* think 16MiB was chosen
any more scientifically than the 128MiB this change proposes, and
suspect that this would not have too big a negative impact on anybody,
I would prefer to have a reason better than gut feeling before
accepting a default change.

Thanks.

> Documentation/config.txt | 2 +-
> environment.c            | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 73c8973..1b6950a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -484,7 +484,7 @@ core.deltaBaseCacheLimit::
>  	to avoid unpacking and decompressing frequently used base
>  	objects multiple times.
>  +
> -Default is 16 MiB on all platforms. This should be reasonable
> +Default is 128 MiB on all platforms. This should be reasonable
>  for all users/operating systems, except on the largest projects.
>  You probably do not need to adjust this value.
>  +
> diff --git a/environment.c b/environment.c
> index c3c8606..73ed670 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -37,7 +37,7 @@ int core_compression_seen;
>  int fsync_object_files;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
> -size_t delta_base_cache_limit = 16 * 1024 * 1024;
> +size_t delta_base_cache_limit = 128 * 1024 * 1024;
>  unsigned long big_file_threshold = 512 * 1024 * 1024;
>  const char *pager_program;
>  int pager_use_color = 1;
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
Junio C Hamano gits...@pobox.com writes:

> David Kastrup d...@gnu.org writes:
>
>> The default of 16MiB causes serious thrashing for large delta chains
>> combined with large files.
>>
>> Signed-off-by: David Kastrup d...@gnu.org
>> ---
>
> Is that a good argument? Wouldn't the default of 128MiB burden
> smaller machines with bloated processes?

The default file size before Git forgets about delta compression is
512MiB. Unpacking 500MiB files with 16MiB of delta storage is going to
be uglier.

>> Forgot the signoff. For the rationale of this patch and the 128MiB
>> choice, see the original patch.
>
> "See the original patch", especially written after the three-dash
> line and without a reference, will not help future readers of
> "git log" who later bisect to find that this change hurt their usage
> and want to see why it was done unconditionally (as opposed to
> encouraging those who benefit from this change to configure their Git
> to use a larger value for themselves, without hurting others).

Documentation/config.txt states:

    core.deltaBaseCacheLimit::
            Maximum number of bytes to reserve for caching base
            objects that may be referenced by multiple deltified
            objects.  By storing the entire decompressed base objects
            in a cache Git is able to avoid unpacking and
            decompressing frequently used base objects multiple times.
    +
    Default is 16 MiB on all platforms. This should be reasonable
    for all users/operating systems, except on the largest projects.
    You probably do not need to adjust this value.

I've seen this seriously screwing performance in several projects of
mine that don't really count as "largest projects". So the description
in combination with the current setting is clearly wrong.

> While I can personally afford 128MiB, do *not* think 16MiB was chosen
> any more scientifically than the 128MiB this change proposes, and
> suspect that this would not have too big a negative impact on
> anybody, I would prefer to have a reason better than gut feeling
> before accepting a default change.

Shrug. I've set my private default anyway, so I don't have anything to
gain from this change. If anybody wants to pick this up and put more
science into the exact value: be my guest.

-- 
David Kastrup
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
David Kastrup d...@gnu.org writes:

> Junio C Hamano gits...@pobox.com writes:
>
>> David Kastrup d...@gnu.org writes:
>>
>>> The default of 16MiB causes serious thrashing for large delta
>>> chains combined with large files.
>>>
>>> Signed-off-by: David Kastrup d...@gnu.org
>>> ---
>>
>> Is that a good argument? Wouldn't the default of 128MiB burden
>> smaller machines with bloated processes?
>
> The default file size before Git forgets about delta compression is
> 512MiB. Unpacking 500MiB files with 16MiB of delta storage is going
> to be uglier.
>
> [...]
>
> Documentation/config.txt states:
>
>     core.deltaBaseCacheLimit::
>             Maximum number of bytes to reserve for caching base
>             objects that may be referenced by multiple deltified
>             objects.  By storing the entire decompressed base objects
>             in a cache Git is able to avoid unpacking and
>             decompressing frequently used base objects multiple
>             times.
>     +
>     Default is 16 MiB on all platforms. This should be reasonable
>     for all users/operating systems, except on the largest projects.
>     You probably do not need to adjust this value.
>
> I've seen this seriously screwing performance in several projects of
> mine that don't really count as "largest projects". So the
> description in combination with the current setting is clearly wrong.

That is good material for the proposed log message, and I think you
are onto something here.

I know that the 512MiB default for bigFileThreshold (aka "forget about
delta compression") came out of thin air. It was just "1GB is always
too huge for anybody, so let's cut it in half and declare that value
the initial version of a sane threshold", nothing more. So it might be
that the problem is that 512MiB is still too big relative to the 16MiB
of delta base cache, and the former may be what needs to be tweaked.

If a blob close to but below 512MiB is a problem for a 16MiB delta
base cache, it would still be too big to cause the same problem for a
128MiB delta base cache---it would evict all the other objects and
then end up not being able to fit within the limit itself, busting the
limit immediately, no?

I would understand if the change were to update the definition of
deltaBaseCacheLimit and link it to the value of bigFileThreshold, for
example. With the presented discussion, I am still not sure if we can
say that bumping deltaBaseCacheLimit is the right solution to "the
description with the current setting is clearly wrong" (which is a
real issue).
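The eviction scenario described above, where a single blob near but below the 512MiB threshold busts the delta base cache regardless of whether the limit is 16MiB or 128MiB, can be sketched as a toy model. This Python illustration of the argument is a hedged sketch, not git's actual cache code; the helper name and the oldest-first eviction policy are assumptions made for the example.

```python
def cache_after_insert(entries, limit, new_size):
    """Toy model of the eviction argument: insert a decompressed base
    of new_size bytes into an LRU cache holding `entries` (list of
    entry sizes, oldest first) under a byte `limit`.  Evict oldest
    entries until the total fits; if the new entry alone exceeds the
    limit, it too ends up evicted.  Illustrative only -- not git's
    sha1_file.c logic."""
    entries = entries + [new_size]
    total = sum(entries)
    while entries and total > limit:
        total -= entries.pop(0)  # evict the least recently used entry
    return entries

MiB = 1024 * 1024

# A ~500MiB base against a 16MiB cache: it evicts everything already
# cached, then gets thrown out itself -- the cache ends up empty.
assert cache_after_insert([4 * MiB, 8 * MiB], 16 * MiB, 500 * MiB) == []

# The same blob against a 128MiB cache fares no better, which is the
# objection to fixing the problem by bumping the limit alone.
assert cache_after_insert([32 * MiB, 64 * MiB], 128 * MiB, 500 * MiB) == []
```

Under these assumed semantics, bumping deltaBaseCacheLimit helps only bases that actually fit within the new limit; blobs between the cache limit and bigFileThreshold thrash either way.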
Re: [PATCH v2] Bump core.deltaBaseCacheLimit to 128MiB
On Thu, Mar 20, 2014 at 5:11 AM, Junio C Hamano gits...@pobox.com wrote:
> I know that the 512MiB default for bigFileThreshold (aka "forget
> about delta compression") came out of thin air. It was just "1GB is
> always too huge for anybody, so let's cut it in half and declare that
> value the initial version of a sane threshold", nothing more. So it
> might be that the problem is that 512MiB is still too big relative to
> the 16MiB of delta base cache, and the former may be what needs to be
> tweaked.
>
> If a blob close to but below 512MiB is a problem for a 16MiB delta
> base cache, it would still be too big to cause the same problem for a
> 128MiB delta base cache---it would evict all the other objects and
> then end up not being able to fit within the limit itself, busting
> the limit immediately, no?
>
> I would understand if the change were to update the definition of
> deltaBaseCacheLimit and link it to the value of bigFileThreshold, for
> example. With the presented discussion, I am still not sure if we can
> say that bumping deltaBaseCacheLimit is the right solution to "the
> description with the current setting is clearly wrong" (which is a
> real issue).

I vote make big_file_threshold smaller. 512MB is already unfriendly
for many smaller machines. I'm thinking somewhere around 32MB-64MB
(and maybe increase the delta base cache limit to match). The only
downside I see is large blobs will be packed undeltified, which could
increase pack size if you have lots of them. But maybe we could
improve pack-objects/repack/gc to deltify large blobs anyway if
they're old enough.

-- 
Duy