RE: [PATCH v3 0/7] allocate cache entries from memory pool

2018-06-20 Thread Jameson Miller
> 
> > We debated several approaches for what to do here
> 
> it would be awesome if the list could participate in the discussion,
> even if only read-only.

Apologies for the delay in my response, but I like the suggestion. Here is a
summary of the approaches I considered:

1) Do not include any information about where the cache_entry was allocated.
 Pro: No extra memory overhead
 Con: The caller is responsible for freeing the cache entry correctly

This was our initial approach. We were hoping that all cache
entries could be allocated from a memory pool, and we would not
have to worry about non-pool allocated entries. There are still a
couple of places that need "temporary" cache entries at the
moment, so we couldn't move completely to only memory pool
allocated cache_entries. This would have resulted in the code
allocating many "temporary" cache_entries from a pool, and the
memory would not be eligible to be reclaimed until the entire
memory pool was freed - and this was a tradeoff we didn't want to
make.

2) Include an extra field encoding whether the cache_entry was
   allocated from a memory pool

Pro: The discard function can now decide how to free the cache_entry
Con: Each cache entry grows by an extra int field

The bits in the existing ce_flags field are all claimed, so we
need an extra field to track this bit of information. We could
claim just a single bit of the new field now; the cache_entry
would still grow by the same amount, but any future bits would
not require extra space. This defers the work of introducing an
actual bit field to a future change.

3) Encode whether the cache_entry was allocated from a memory
   pool as a bit in a field.

Pro: Only a single bit is required to encode this information.
Con: All the bits in the existing ce_flags field are already
 claimed, so we would need to add an extra flags field and take
 the same memory growth.

I considered this approach (and am still open to it), but I went
with a simpler initial approach to make sure the overall change
is acceptable. There is no difference in memory footprint between
the two, so this option would only make future changes easier.

4) Include a pointer to the memory pool from which this
   cache_entry was allocated

Pro: Could (potentially) do some extra bookkeeping, such as
 automatically cleaning up the memory_pool when all
 allocated cache_entries are freed.
Con: Extra complexity, and larger growth of the cache_entry
 struct to accommodate the mem_pool pointer

In the end, we didn't see a tangible benefit to this option at this point.

Given the tradeoffs, I went with option #2 for now.


Re: [PATCH v3 0/7] allocate cache entries from memory pool

2018-05-25 Thread Stefan Beller
>
> The memory pool design makes some tradeoffs. It is not meant to
> completely replace malloc / free as a general purpose allocator,
> but rather to be used in scenarios where the benefit (faster
> allocations, lower bookkeeping overhead) is worth the
> tradeoffs (not being able to free individual allocations).

So this is the actual stated design goal of this memory pool?
Fast allocation with little overhead for giving up individual frees?

> We debated several approaches for what to do here

it would be awesome if the list could participate in the discussion
even if only read-only.


Re: [PATCH v3 0/7] allocate cache entries from memory pool

2018-05-25 Thread Stefan Beller
On Wed, May 23, 2018 at 7:47 AM, Jameson Miller  wrote:
> Changes from V2:
>
> - Tweak logic of finding available memory block for memory
>   allocation
>
>   - Only search head block
>
> - Tweaked handling of large memory allocations.
>
>   - Large blocks now tracked in same manner as "regular"
> blocks
>
>   - Large blocks are placed at end of linked list of memory
> blocks
>
> - Cache_entry type gains notion of whether it was allocated
>   from memory pool or not
>
>   - Collapsed cache_entry discard logic into single
> function. This should make code easier to maintain
>
> - Small tweaks based on V1 feedback
>
> Base Ref: master
> Web-Diff: g...@github.com:jamill/git.git/commit/d608515f9e

Unrelated to this series, but to the tool you use to generate the cover-letter:
I think the Web-Diff only works when you give the http[s] address, git@
won't work here?


RE: [PATCH v3 0/7] allocate cache entries from memory pool

2018-05-24 Thread Jameson Miller


> -Original Message-
> From: Junio C Hamano <jch2...@gmail.com> On Behalf Of Junio C Hamano
> Sent: Thursday, May 24, 2018 12:55 AM
> To: Jameson Miller <jam...@microsoft.com>
> Cc: git@vger.kernel.org; pclo...@gmail.com; jonathanta...@google.com;
> sbel...@google.com; peart...@gmail.com
> Subject: Re: [PATCH v3 0/7] allocate cache entries from memory pool
> 
> Jameson Miller <jam...@microsoft.com> writes:
> 
> > Changes from V2:
> >
> > - Tweak logic of finding available memory block for memory
> >   allocation
> >
> >   - Only search head block
> 
> Hmph.  Is that because we generally do not free() a lot so once a
> block is filled, there is not much chance that we have reclaimed
> space in the block later?
> 

The design of the memory pool is that once the memory is
claimed from the pool, it is not reused until the
containing pool is discarded. Individual entries are not
freed, only the entire memory pool is freed, and only after we
are sure that there are no references to any of the entries in the
pool.

The memory pool design makes some tradeoffs. It is not meant to
completely replace malloc / free as a general purpose allocator,
but rather to be used in scenarios where the benefit (faster
allocations, lower bookkeeping overhead) is worth the
tradeoffs (not being able to free individual allocations). The
access patterns around cache entries are well matched with the
memory pool: the majority of cache entries are allocated up front
when reading the index from disk, and are then discarded in bulk
when the index is freed (if the index is freed at all, rather
than the program simply exiting).

> > - Tweaked handling of large memory allocations.
> >
> >   - Large blocks now tracked in same manner as "regular"
> > blocks
> >
> >   - Large blocks are placed at end of linked list of memory
> > blocks
> 
> If we are only carving out of the most recently allocated block, it seems that
> there is no point looking for "the end", no?

Right. If we are not searching the list, then there isn't any
point in appending large items to the end rather than placing
them immediately after the head block. The tail pointer is not
really leveraged, so I will remove its usage in the next version.

> 
> 
> > - Cache_entry type gains notion of whether it was allocated
> >   from memory pool or not
> >
> >   - Collapsed cache_entry discard logic into single
> > function. This should make code easier to maintain
> 
> That certainly should be safer to have a back-pointer pointing to which pool
> each entry came from, but doesn't it result in memory bloat?

Currently, entries claimed from a memory pool are not freed
individually, so we only need to know whether the entry came from
a memory pool or not. This has less memory impact than a full
pointer, but is also a bit more restrictive.

We debated several approaches for what to do here and landed on
using a simple bit for this rather than a full pointer. In the
current code we use a full integer field, but we could convert
this into a bit in a bit field. The current flags word is full,
so this would require a second flags field.



Re: [PATCH v3 0/7] allocate cache entries from memory pool

2018-05-23 Thread Junio C Hamano
Jameson Miller  writes:

> Changes from V2:
>
>   - Tweak logic of finding available memory block for memory
>   allocation
>   
> - Only search head block

Hmph.  Is that because we generally do not free() a lot so once a
block is filled, there is not much chance that we have reclaimed
space in the block later?

>   - Tweaked handling of large memory allocations.
>   
> - Large blocks now tracked in same manner as "regular"
> blocks
> 
> - Large blocks are placed at end of linked list of memory
> blocks

If we are only carving out of the most recently allocated block, it
seems that there is no point looking for "the end", no?


>   - Cache_entry type gains notion of whether it was allocated
>   from memory pool or not
>   
> - Collapsed cache_entry discard logic into single
> function. This should make code easier to maintain

That certainly should be safer to have a back-pointer pointing to
which pool each entry came from, but doesn't it result in memory
bloat?


[PATCH v3 0/7] allocate cache entries from memory pool

2018-05-23 Thread Jameson Miller
Changes from V2:

- Tweak logic of finding available memory block for memory
  allocation

  - Only search head block
  
- Tweaked handling of large memory allocations.

  - Large blocks now tracked in same manner as "regular"
blocks
  
  - Large blocks are placed at end of linked list of memory
blocks

- Cache_entry type gains notion of whether it was allocated
  from memory pool or not

  - Collapsed cache_entry discard logic into single
function. This should make code easier to maintain

- Small tweaks based on V1 feedback

Base Ref: master
Web-Diff: g...@github.com:jamill/git.git/commit/d608515f9e
Checkout: git fetch g...@github.com:jamill/git.git users/jamill/block_allocation-v3 && git checkout d608515f9e


Jameson Miller (7):
  read-cache: teach refresh_cache_entry() to take istate
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: add lifecycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem_pool
  block alloc: add validations around cache_entry lifecyle

 apply.c                |  26 +++--
 blame.c                |   5 +-
 builtin/checkout.c     |   8 +-
 builtin/difftool.c     |   8 +-
 builtin/reset.c        |   6 +-
 builtin/update-index.c |  26 +++--
 cache.h                |  53 +-
 fast-import.c          |   2 +-
 git.c                  |   3 +
 mem-pool.c             | 116 --
 mem-pool.h             |  25 -
 merge-recursive.c      |   4 +-
 read-cache.c           | 261 -
 resolve-undo.c         |   6 +-
 split-index.c          |  58 ---
 tree.c                 |   4 +-
 unpack-trees.c         |  38 +++
 17 files changed, 514 insertions(+), 135 deletions(-)


base-commit: ccdcbd54c4475c2238b310f7113ab3075b5abc9c
-- 
2.14.3