Re: [PATCH 00/13] object store: alloc
Hi Junio,

On Mon, May 7, 2018 at 7:05 AM, Junio C Hamano wrote:
> Stefan Beller writes:
>
>> This applies on top of sb/oid-object-info and is the logical continuation of
>> the series that it builds on; this brings the object store into more of
>> Git's code, removing global state, such that reasoning about the state of
>> the in-memory representation of the repository is easier.
>
> I am not sure how well this topic is done, but I've queued the
> following patch at the tip of the topic to make it compile after
> getting merged to integration branches (curiously, the topic by
> itself compiled fine for whatever reason).

Thanks for the fixup; I will include it with the other fixes in a reroll.

The reason it would not compile is found not in alloc.c but in
1da1580e4c2 (Makefile: detect compiler and enable more warnings in
DEVELOPER=1, 2018-04-14), which enabled -Werror=missing-prototypes.
That warning requires each external function definition to have a
prototype in scope, which for these functions is found in alloc.h.

Thanks,
Stefan
Re: [PATCH 00/13] object store: alloc
Stefan Beller writes:

> This applies on top of sb/oid-object-info and is the logical continuation of
> the series that it builds on; this brings the object store into more of
> Git's code, removing global state, such that reasoning about the state of
> the in-memory representation of the repository is easier.

I am not sure how well this topic is done, but I've queued the
following patch at the tip of the topic to make it compile after
getting merged to integration branches (curiously, the topic by
itself compiled fine for whatever reason).

I think I haven't sent that fixup patch out, so here it is.

-- >8 --
From: Junio C Hamano
Date: Wed, 2 May 2018 19:09:50 +0900
Subject: [PATCH] alloc.c: include alloc.h

Otherwise the definition in alloc.c would not see the matching decl
in alloc.h, triggering a warning from the compiler.

Signed-off-by: Junio C Hamano
---
 alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/alloc.c b/alloc.c
index 66a3d07ba2..f47a67153b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -16,6 +16,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
-- 
2.17.0-391-g1f1cddd558
Re: [PATCH 00/13] object store: alloc
On Wed, May 2, 2018 at 11:22 AM, Duy Nguyen wrote:
> I think the two have quite different characteristics. alloc.c code is
> driven by overhead. struct blob is only 24 bytes each and about 1/3
> the repo is blobs, and each malloc has 16 bytes overhead or so if I
> remember correctly. struct cache_entry is at minimum 88 bytes so the
> relative overhead is not that big a deal (but sure, reducing it is
> still very nice).
>
> mem-pool is about allocation speed,

I don't think so, given that we do a linear search in each block
allocation.

> but I think that's not a concern
> for alloc.c because when we do full rev walk, I think I/O is always
> the bottleneck (maybe object lookup as well). I don't see a good way
> to have the one memory allocator that satisfies both, to be honest.

By changing the allocation size of a block to be larger than 1024
entries in alloc.c, we should lessen the impact of the management
overhead, and then using the mem pool becomes more than feasible.
RE: [PATCH 00/13] object store: alloc
> -----Original Message-----
> From: Duy Nguyen <pclo...@gmail.com>
> Sent: Wednesday, May 2, 2018 2:23 PM
> To: Jameson Miller <jam...@microsoft.com>
> Cc: Stefan Beller <sbel...@google.com>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: [PATCH 00/13] object store: alloc
>
> On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Duy Nguyen <pclo...@gmail.com>
> >> Sent: Wednesday, May 2, 2018 1:02 PM
> >> To: Stefan Beller <sbel...@google.com>
> >> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> >> <jam...@microsoft.com>
> >> Subject: Re: [PATCH 00/13] object store: alloc
> >>
> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
> >> > I also debated if it is worth converting alloc.c via this patch
> >> > series or if it might make more sense to use the new mem-pool by
> >> > Jameson[1].
> >> >
> >> > I vaguely wonder about the performance impact, as the object
> >> > allocation code seemed to be relevant in the past.
> >>
> >> If I remember correctly, alloc.c was added because malloc() has too
> >> high overhead per allocation (and we create like millions of them).
> >> As long as you keep allocation overhead low, it should be ok. Note
> >> that we allocate a lot more than the mem-pool's main target (cache
> >> entries if I remember correctly). We may have a couple thousand
> >> cache entries. We already deal with a couple million of struct object.
> >
> > The work to move cache entry allocation onto a memory pool was
> > motivated by the fact that we are looking at indexes with millions of
> > entries. If there is a scaling concern with the current version of
> > mem-pool, we would like to address it there as well. Or if there are
> > improvements that can be shared, that would be nice as well.
>
> I think the two have quite different characteristics. alloc.c code is
> driven by overhead. struct blob is only 24 bytes each and about 1/3
> the repo is blobs, and each malloc has 16 bytes overhead or so if I
> remember correctly. struct cache_entry is at minimum 88 bytes so the
> relative overhead is not that big a deal (but sure, reducing it is
> still very nice).
>
> mem-pool is about allocation speed, but I think that's not a concern
> for alloc.c because when we do full rev walk, I think I/O is always
> the bottleneck (maybe object lookup as well). I don't see a good way
> to have the one memory allocator that satisfies both, to be honest.
> If you could allocate fixed-size cache entries most of the time (e.g.
> larger entries will be allocated using malloc or something, and should
> be a small number), then perhaps we can unify. Or if mem-pool can have
> an option to allocate fixed-size pieces with no overhead, that would
> be great (sorry I still have not read that mem-pool series yet)
> --
> Duy

Thank you for the extra details - the extra context was helpful,
especially the motivations for each of the areas.

I agree with your general analysis, with the extra point that the
memory pool does allocate memory (variable sized) without any
per-allocation overhead, except for possible alignment considerations
and the bookkeeping for the larger "blocks" of memory from which the
small allocations are made - but I don't think this would be enough to
have a meaningful overall impact. The mem-pool only tracks the pointer
to the next available bit of memory and the end of the available
memory range. It has a similar constraint in that individual
allocations cannot be freed - you have to free the whole block.

It may be that the requirements are different enough (or the gains
worth it) to have another dedicated pooling allocator, but I think the
current design of the memory pool would satisfy both consumers, even
if the memory considerations are a bigger motivation for blob structs.

I would be interested in your thoughts if you get the opportunity to
read the mem-pool series.
Re: [PATCH 00/13] object store: alloc
On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com> wrote:
>
>
>> -----Original Message-----
>> From: Duy Nguyen <pclo...@gmail.com>
>> Sent: Wednesday, May 2, 2018 1:02 PM
>> To: Stefan Beller <sbel...@google.com>
>> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
>> <jam...@microsoft.com>
>> Subject: Re: [PATCH 00/13] object store: alloc
>>
>> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
>> > I also debated if it is worth converting alloc.c via this patch series
>> > or if it might make more sense to use the new mem-pool by Jameson[1].
>> >
>> > I vaguely wonder about the performance impact, as the object
>> > allocation code seemed to be relevant in the past.
>>
>> If I remember correctly, alloc.c was added because malloc() has too high
>> overhead per allocation (and we create like millions of them). As long
>> as you keep allocation overhead low, it should be ok. Note that we
>> allocate a lot more than the mem-pool's main target (cache entries if I
>> remember correctly). We may have a couple thousand cache entries. We
>> already deal with a couple million of struct object.
>
> The work to move cache entry allocation onto a memory pool was motivated
> by the fact that we are looking at indexes with millions of entries. If
> there is a scaling concern with the current version of mem-pool, we
> would like to address it there as well. Or if there are improvements
> that can be shared, that would be nice as well.

I think the two have quite different characteristics. alloc.c code is
driven by overhead. struct blob is only 24 bytes each and about 1/3 the
repo is blobs, and each malloc has 16 bytes overhead or so if I remember
correctly. struct cache_entry is at minimum 88 bytes so the relative
overhead is not that big a deal (but sure, reducing it is still very
nice).

mem-pool is about allocation speed, but I think that's not a concern for
alloc.c because when we do full rev walk, I think I/O is always the
bottleneck (maybe object lookup as well). I don't see a good way to have
the one memory allocator that satisfies both, to be honest. If you could
allocate fixed-size cache entries most of the time (e.g. larger entries
will be allocated using malloc or something, and should be a small
number), then perhaps we can unify. Or if mem-pool can have an option to
allocate fixed-size pieces with no overhead, that would be great (sorry
I still have not read that mem-pool series yet)
--
Duy
RE: [PATCH 00/13] object store: alloc
> -----Original Message-----
> From: Duy Nguyen <pclo...@gmail.com>
> Sent: Wednesday, May 2, 2018 1:02 PM
> To: Stefan Beller <sbel...@google.com>
> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> <jam...@microsoft.com>
> Subject: Re: [PATCH 00/13] object store: alloc
>
> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
> > I also debated if it is worth converting alloc.c via this patch series
> > or if it might make more sense to use the new mem-pool by Jameson[1].
> >
> > I vaguely wonder about the performance impact, as the object
> > allocation code seemed to be relevant in the past.
>
> If I remember correctly, alloc.c was added because malloc() has too high
> overhead per allocation (and we create like millions of them). As long
> as you keep allocation overhead low, it should be ok. Note that we
> allocate a lot more than the mem-pool's main target (cache entries if I
> remember correctly). We may have a couple thousand cache entries. We
> already deal with a couple million of struct object.

The work to move cache entry allocation onto a memory pool was motivated
by the fact that we are looking at indexes with millions of entries. If
there is a scaling concern with the current version of mem-pool, we
would like to address it there as well. Or if there are improvements
that can be shared, that would be nice as well.

> --
> Duy
Re: [PATCH 00/13] object store: alloc
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller wrote:
> I also debated if it is worth converting alloc.c via this patch series
> or if it might make more sense to use the new mem-pool by Jameson[1].
>
> I vaguely wonder about the performance impact, as the object allocation
> code seemed to be relevant in the past.

If I remember correctly, alloc.c was added because malloc() has too high
overhead per allocation (and we create like millions of them). As long
as you keep allocation overhead low, it should be ok. Note that we
allocate a lot more than the mem-pool's main target (cache entries, if I
remember correctly). We may have a couple thousand cache entries. We
already deal with a couple million of struct object.
--
Duy
[PATCH 00/13] object store: alloc
This applies on top of sb/oid-object-info and is the logical
continuation of the series that it builds on; this brings the object
store into more of Git's code, removing global state, such that
reasoning about the state of the in-memory representation of the
repository is easier.

My original plan was to convert lookup_commit_graft as the next series,
which would be similar to lookup_replace_object, as in
sb/object-store-replace. The grafts and shallow mechanisms are very
close to each other, such that they need to be converted at the same
time, both depending on the "parsed object store" that is introduced in
this series. The next series will then convert code in
{object/blob/tree/commit/tag}.c, hopefully finishing the lookup_*
functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.

[1] https://public-inbox.org/git/20180430153122.243976-1-jam...@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce object parser field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c           | 69 +++
 alloc.h           | 21 +++
 blame.c           |  3 +-
 blob.c            |  5 ++-
 cache.h           |  9 -
 commit.c          |  4 +-
 merge-recursive.c |  3 +-
 object.c          | 93 +-
 object.h          | 18 -
 repository.c      |  7
 repository.h      | 11 +-
 tag.c             |  4 +-
 tree.c            |  4 +-
 13 files changed, 182 insertions(+), 69 deletions(-)
 create mode 100644 alloc.h

-- 
2.17.0.441.gb46fe60e1d-goog