RE: [PATCH v3 0/7] allocate cache entries from memory pool
> > > We debated several approaches for what to do here
> >
> > it would be awesome if the list could participate in the discussion
> > even if only read-only.

A bit of delay in my response here, but I like the suggestion. Here is a
summary of the approaches I considered:

1) Do not include any information about where the cache_entry was
   allocated.

   Pros: No extra memory overhead
   Cons: Caller is responsible for freeing the cache entry correctly

   This was our initial approach. We were hoping that all cache entries
   could be allocated from a memory pool, so we would not have to worry
   about non-pool allocated entries. There are still a couple of places
   that need "temporary" cache entries at the moment, so we couldn't move
   completely to pool-allocated cache_entries only. This would have
   resulted in the code allocating many "temporary" cache_entries from a
   pool, whose memory would not be eligible for reclamation until the
   entire memory pool was freed - and this was a tradeoff we didn't want
   to make.

2) Include an extra field encoding whether the cache_entry was allocated
   from a memory pool.

   Pro: The discard function can now make a decision regarding how to
   free the cache_entry.
   Con: Each cache_entry grows by an extra int field.

   The bits in the existing ce_flags field are all claimed, so we need an
   extra field to track this bit of information. We could claim just a
   bit in the new field now, which would result in the cache_entry still
   growing by the same amount, but any future bits would not require
   extra space. This pushes the work for an actual bit field off into the
   future.

3) Encode whether the cache_entry was allocated from a memory pool as a
   bit in a field.

   Pro: Only a single bit is required to encode this information.
   Con: All the bits in the existing ce_flags field are claimed. We would
   need to add an extra bit field and take the same memory growth.

   I considered this approach (and am still open to it), but I went for a
   simpler initial approach to make sure the overall change is
   acceptable. There is no difference in the memory footprint between the
   two, so this would only serve to enable future changes more easily.

4) Include a pointer to the memory pool from which this cache_entry was
   allocated.

   Pro: Could (potentially) do some extra bookkeeping, such as
   automatically cleaning up the memory pool when all allocated
   cache_entries are freed.
   Con: Extra complexity, and larger growth of the cache_entry struct to
   accommodate the mem_pool pointer.

   In the end, we didn't see a tangible benefit to this option at this
   point.

Given the tradeoffs, I went with option #2 for now.
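To make option #2 concrete, here is a minimal, hedged sketch of the idea; the field name `mem_pool_allocated` and the function `discard_cache_entry` are placeholders for illustration, not necessarily the identifiers used in the actual patches:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Illustrative sketch of option #2: a simplified cache_entry with an
 * extra int field recording where it was allocated. Names here are
 * assumptions, not copied from the series.
 */
struct cache_entry {
	unsigned int ce_flags;
	unsigned int mem_pool_allocated; /* extra int field (option #2) */
	/* ... remaining cache_entry fields ... */
};

/*
 * Single discard function: pool-allocated entries are left alone (their
 * memory is reclaimed only when the whole pool is discarded), while
 * heap-allocated "temporary" entries are freed individually.
 */
void discard_cache_entry(struct cache_entry *ce)
{
	if (ce && !ce->mem_pool_allocated)
		free(ce);
}
```

The point of collapsing the logic into one function is that callers no longer need to know how an entry was allocated; they always call the same discard routine.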
Re: [PATCH v3 0/7] allocate cache entries from memory pool
> > The memory pool design makes some tradeoffs. It is not meant to
> > completely replace malloc / free as a general purpose allocator, but
> > rather to be used in scenarios where the benefit (faster allocations,
> > lower bookkeeping overhead) is worth the tradeoffs (not being able to
> > free individual allocations).

So this is the actual stated design goal of this memory pool? Fast
allocation with little overhead in exchange for giving up individual
frees?

> We debated several approaches for what to do here

it would be awesome if the list could participate in the discussion even
if only read-only.
Re: [PATCH v3 0/7] allocate cache entries from memory pool
On Wed, May 23, 2018 at 7:47 AM, Jameson Miller wrote:
> Changes from V2:
>
> - Tweak logic of finding available memory block for memory
>   allocation
>
>   - Only search head block
>
> - Tweaked handling of large memory allocations.
>
>   - Large blocks now tracked in same manner as "regular" blocks
>
>   - Large blocks are placed at end of linked list of memory blocks
>
> - Cache_entry type gains notion of whether it was allocated from
>   memory pool or not
>
> - Collapsed cache_entry discard logic into single function. This
>   should make code easier to maintain
>
> - Small tweaks based on V1 feedback
>
> Base Ref: master
> Web-Diff: g...@github.com:jamill/git.git/commit/d608515f9e

Unrelated to this series, but to the tool you use to generate the
cover letter: I think the Web-Diff only works when you give the
http[s] address; git@ won't work here?
RE: [PATCH v3 0/7] allocate cache entries from memory pool
> -----Original Message-----
> From: Junio C Hamano <jch2...@gmail.com> On Behalf Of Junio C Hamano
> Sent: Thursday, May 24, 2018 12:55 AM
> To: Jameson Miller <jam...@microsoft.com>
> Cc: git@vger.kernel.org; pclo...@gmail.com; jonathanta...@google.com;
> sbel...@google.com; peart...@gmail.com
> Subject: Re: [PATCH v3 0/7] allocate cache entries from memory pool
>
> Jameson Miller <jam...@microsoft.com> writes:
>
> > Changes from V2:
> >
> > - Tweak logic of finding available memory block for memory
> >   allocation
> >
> >   - Only search head block
>
> Hmph. Is that because we generally do not free() a lot, so once a
> block is filled, there is not much chance that we have reclaimed
> space in the block later?

The design of the memory pool is that once memory is claimed from the
pool, it is not reused until the containing pool is discarded.
Individual entries are not freed; only the entire memory pool is freed,
and only after we are sure that there are no references to any of the
entries in the pool.

The memory pool design makes some tradeoffs. It is not meant to
completely replace malloc / free as a general purpose allocator, but
rather to be used in scenarios where the benefit (faster allocations,
lower bookkeeping overhead) is worth the tradeoffs (not being able to
free individual allocations). The access patterns around cache entries
are well matched with the memory pool: the majority of cache entries
are allocated up front when reading the index from disk, and are then
discarded in bulk when the index is freed (if the index is freed at
all, rather than the process just exiting).

> > - Tweaked handling of large memory allocations.
> >
> >   - Large blocks now tracked in same manner as "regular" blocks
> >
> >   - Large blocks are placed at end of linked list of memory blocks
>
> If we are only carving out of the most recently allocated block, it
> seems that there is no point looking for "the end", no?

Right. If we are not searching the list, then there isn't any point in
appending odd large items to the end vs. sticking them immediately past
the head block. The tail pointer is not really leveraged here, so I
will remove its usage in the next version.

> > - Cache_entry type gains notion of whether it was allocated from
> >   memory pool or not
> >
> > - Collapsed cache_entry discard logic into single function. This
> >   should make code easier to maintain
>
> That certainly should be safer to have a back-pointer pointing to
> which pool each entry came from, but doesn't it result in memory
> bloat?

Currently, entries claimed from a memory pool are not freed, so we only
need to know whether the entry came from a memory pool or not. This has
less memory impact than a full pointer but is also a bit more
restrictive. We debated several approaches for what to do here and
landed on using a simple bit for this rather than the full pointer. In
the current code we use a full integer field for this, but we can
convert it into a bit or bit field. The current flags word is full, so
this would require a second flags field.
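To make the "only search head block" behavior discussed above concrete, here is a minimal, hedged sketch of how such a pool allocator could work; the struct layout and names are modeled loosely on the description in this thread and are not guaranteed to match what mem-pool.h actually does in the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hedged sketch of a pool that only carves space out of its head
 * block; names are assumptions, not copied from the series.
 */
struct mp_block {
	struct mp_block *next_block;
	char *next_free;
	char *end;
	char space[]; /* allocations are carved from here */
};

struct mem_pool {
	struct mp_block *mp_block; /* head of the block list */
	size_t block_alloc;        /* default size of a new block */
};

static struct mp_block *mem_pool_new_block(size_t size)
{
	struct mp_block *p = malloc(sizeof(struct mp_block) + size);
	p->next_block = NULL;
	p->next_free = p->space;
	p->end = p->space + size;
	return p;
}

void *mem_pool_alloc(struct mem_pool *pool, size_t len)
{
	struct mp_block *p = pool->mp_block;
	void *r;

	/* Only the head block is checked; older blocks are never
	 * revisited, because individual allocations are never freed. */
	if (!p || (size_t)(p->end - p->next_free) < len) {
		if (p && len >= pool->block_alloc) {
			/*
			 * Oversized request: give it a dedicated block
			 * linked in just past the head, so the head keeps
			 * its remaining free space for regular requests.
			 */
			struct mp_block *big = mem_pool_new_block(len);
			big->next_block = p->next_block;
			p->next_block = big;
			p = big;
		} else {
			size_t size = len > pool->block_alloc ?
				      len : pool->block_alloc;
			p = mem_pool_new_block(size);
			p->next_block = pool->mp_block;
			pool->mp_block = p; /* becomes the new head */
		}
	}
	r = p->next_free;
	p->next_free += len;
	return r;
}
```

This sketch also shows why a tail pointer is unnecessary: a large allocation can simply be linked in right after the head block, since nothing ever walks the list looking for free space.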
Re: [PATCH v3 0/7] allocate cache entries from memory pool
Jameson Miller writes:

> Changes from V2:
>
> - Tweak logic of finding available memory block for memory
>   allocation
>
>   - Only search head block

Hmph. Is that because we generally do not free() a lot, so once a
block is filled, there is not much chance that we have reclaimed space
in the block later?

> - Tweaked handling of large memory allocations.
>
>   - Large blocks now tracked in same manner as "regular" blocks
>
>   - Large blocks are placed at end of linked list of memory blocks

If we are only carving out of the most recently allocated block, it
seems that there is no point looking for "the end", no?

> - Cache_entry type gains notion of whether it was allocated from
>   memory pool or not
>
> - Collapsed cache_entry discard logic into single function. This
>   should make code easier to maintain

That certainly should be safer to have a back-pointer pointing to
which pool each entry came from, but doesn't it result in memory
bloat?
[PATCH v3 0/7] allocate cache entries from memory pool
Changes from V2:

- Tweak logic of finding available memory block for memory allocation

  - Only search head block

- Tweaked handling of large memory allocations.

  - Large blocks now tracked in same manner as "regular" blocks

  - Large blocks are placed at end of linked list of memory blocks

- Cache_entry type gains notion of whether it was allocated from memory
  pool or not

- Collapsed cache_entry discard logic into single function. This should
  make code easier to maintain

- Small tweaks based on V1 feedback

Base Ref: master
Web-Diff: g...@github.com:jamill/git.git/commit/d608515f9e
Checkout: git fetch g...@github.com:jamill/git.git users/jamill/block_allocation-v3 && git checkout d608515f9e

Jameson Miller (7):
  read-cache: teach refresh_cache_entry() to take istate
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: add lifecycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem_pool
  block alloc: add validations around cache_entry lifecyle

 apply.c                |  26 +++--
 blame.c                |   5 +-
 builtin/checkout.c     |   8 +-
 builtin/difftool.c     |   8 +-
 builtin/reset.c        |   6 +-
 builtin/update-index.c |  26 +++--
 cache.h                |  53 +-
 fast-import.c          |   2 +-
 git.c                  |   3 +
 mem-pool.c             | 116 --
 mem-pool.h             |  25 -
 merge-recursive.c      |   4 +-
 read-cache.c           | 261 -
 resolve-undo.c         |   6 +-
 split-index.c          |  58 ---
 tree.c                 |   4 +-
 unpack-trees.c         |  38 +++
 17 files changed, 514 insertions(+), 135 deletions(-)

base-commit: ccdcbd54c4475c2238b310f7113ab3075b5abc9c
--
2.14.3
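Since the series' contract is that pool memory is only reclaimed in bulk, the "mem-pool: add lifecycle management functions" patch could look roughly like the following hedged sketch; the struct layout and function names here are illustrative assumptions, not copied from the series:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Illustrative pool layout; names are assumptions modeled on the
 * discussion in this thread, not the actual mem-pool.h definitions.
 */
struct mp_block {
	struct mp_block *next_block;
	char *next_free;
	char *end;
	char space[]; /* allocations are carved from here */
};

struct mem_pool {
	struct mp_block *mp_block; /* head of the block list */
	size_t block_alloc;        /* default size of a new block */
};

void mem_pool_init(struct mem_pool *pool, size_t block_alloc)
{
	pool->mp_block = NULL;
	pool->block_alloc = block_alloc;
}

/*
 * Discarding the pool releases every block in one pass. This is the
 * only way memory leaves the pool, which is why callers must be sure
 * that no cache_entry in the pool is still referenced before calling.
 */
void mem_pool_discard(struct mem_pool *pool)
{
	struct mp_block *block = pool->mp_block;

	while (block) {
		struct mp_block *next = block->next_block;
		free(block);
		block = next;
	}
	pool->mp_block = NULL;
}
```

This bulk-free design is what makes per-entry bookkeeping unnecessary inside the pool: there is no free list to maintain, only a singly linked chain of blocks to walk once at the end.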