Re: [PATCH 00/13] object store: alloc

2018-05-07 Thread Stefan Beller
Hi Junio,

On Mon, May 7, 2018 at 7:05 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This applies on top of sb/oid-object-info and is the logical continuum of
>> the series that it builds on; this brings the object store into more of
>> Gits code, removing global state, such that reasoning about the state of
>> the in-memory representation of the repository is easier.
>
> I am not sure how well this topic is done, but I've queued the
> following patch at the tip of the topic to make it compile after
> getting merged to integration branches (curiously, the topic by
> itself compiled file for whatever reason).

Thanks for the fixup; I will include it with other fixes in a reroll.

The investigation why it would not compile is not found in alloc.c
but in 1da1580e4c2 (Makefile: detect compiler and enable more
warnings in DEVELOPER=1, 2018-04-14), which enabled
-Werror=missing-prototypes, that requires a prototype which
is found in alloc.h

Thanks,
Stefan


Re: [PATCH 00/13] object store: alloc

2018-05-07 Thread Junio C Hamano
Stefan Beller  writes:

> This applies on top of sb/oid-object-info and is the logical continuum of
> the series that it builds on; this brings the object store into more of
> Gits code, removing global state, such that reasoning about the state of
> the in-memory representation of the repository is easier.

I am not sure how well this topic is done, but I've queued the
following patch at the tip of the topic to make it compile after
getting merged to integration branches (curiously, the topic by
itself compiled file for whatever reason). I think I haven't send
that fixup patch out, so here it is.

-- >8 --
From: Junio C Hamano 
Date: Wed, 2 May 2018 19:09:50 +0900
Subject: [PATCH] alloc.c: include alloc.h

Otherwise the definition in alloc.c would not see the matching decl
in alloc.h, triggering warning from compiler.

Signed-off-by: Junio C Hamano 
---
 alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/alloc.c b/alloc.c
index 66a3d07ba2..f47a67153b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -16,6 +16,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
-- 
2.17.0-391-g1f1cddd558



Re: [PATCH 00/13] object store: alloc

2018-05-03 Thread Stefan Beller
On Wed, May 2, 2018 at 11:22 AM, Duy Nguyen  wrote:

> I think the two have quite different characteristics. alloc.c code is
> driven by overhead. struct blob is only 24 bytes each and about 1/3
> the repo is blobs, and each malloc has 16 bytes overhead or so if I
> remember correctly. struct cache_entry at minimum in 88 bytes so
> relative overhead is not that a big deal (but sure reducing it is
> still very nice).
>
> mem-pool is about allocation speed,

I don't think so, given that we do a linear search in each block allocation.

> but I think that's not a concern
> for alloc.c because when we do full rev walk, I think I/O is always
> the bottleneck (maybe object lookup as well). I don't see a good way
> to have the one memory allocator that satisfyies both to be honest.

By changing the allocation size of a block to be larger than 1024 entries
in alloc. we should lessen the impact of management overhead, and then
the mem pool can be more than feasible.


RE: [PATCH 00/13] object store: alloc

2018-05-02 Thread Jameson Miller


> -Original Message-
> From: Duy Nguyen <pclo...@gmail.com>
> Sent: Wednesday, May 2, 2018 2:23 PM
> To: Jameson Miller <jam...@microsoft.com>
> Cc: Stefan Beller <sbel...@google.com>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: [PATCH 00/13] object store: alloc
> 
> On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com>
> wrote:
> >
> >
> >> -Original Message-
> >> From: Duy Nguyen <pclo...@gmail.com>
> >> Sent: Wednesday, May 2, 2018 1:02 PM
> >> To: Stefan Beller <sbel...@google.com>
> >> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> >> <jam...@microsoft.com>
> >> Subject: Re: [PATCH 00/13] object store: alloc
> >>
> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com>
> wrote:
> >> > I also debated if it is worth converting alloc.c via this patch
> >> > series or if it might make more sense to use the new mem-pool by
> Jameson[1].
> >> >
> >> > I vaguely wonder about the performance impact, as the object
> >> > allocation code seemed to be relevant in the past.
> >>
> >> If I remember correctly, alloc.c was added because malloc() has too
> >> high overhead per allocation (and we create like millions of them).
> >> As long as you keep allocation overhead low, it should be ok. Note
> >> that we allocate a lot more than the mem-pool's main target (cache
> >> entries if I remember correctly). We may have a couple thousands
> >> cache entries.  We already deal with a couple million of struct object.
> >
> > The work to move cache entry allocation onto a memory pool was
> > motivated by the fact that we are looking at indexes with millions of
> > entries. If there is scaling concern with the current version of
> > mem-pool, we would like to address it there as well. Or if there is
> improvements that can be shared, that would be nice as well.
> 
> I think the two have quite different characteristics. alloc.c code is driven 
> by
> overhead. struct blob is only 24 bytes each and about 1/3 the repo is blobs, 
> and
> each malloc has 16 bytes overhead or so if I remember correctly. struct
> cache_entry at minimum in 88 bytes so relative overhead is not that a big deal
> (but sure reducing it is still very nice).
> 
> mem-pool is about allocation speed, but I think that's not a concern for 
> alloc.c
> because when we do full rev walk, I think I/O is always the bottleneck (maybe
> object lookup as well). I don't see a good way to have the one memory 
> allocator
> that satisfyies both to be honest. If you could allocate fixed-size cache 
> entries
> most of the time (e.g.
> larger entries will be allocated using malloc or something, and should be a 
> small
> number), then perhaps we can unify. Or if mem-pool can have an option to
> allocated fixed size pieces with no overhead, that would be great (sorry I 
> still
> have not read that mem-pool series yet)
> --
> Duy

Thank you for the extra details - the extra context was helpful -
especially the motivations for each of the areas. I agree with
your general analysis, but with the extra point that the memory
pool does allocate memory (variable sized) without any overhead,
except for possible alignment considerations and differences in
bookkeeping the larger "blocks" of memory from which small
allocations are made from - but I don't think this would be
enough to have a meaningful overall impact.

The mem-pool only tracks the pointer to the next available bit of
memory, and the end of the available memory range. It has a
similar constraint in that individual allocations cannot be freed
- you have to free the whole block.

It may be that the requirements are different enough (or the
gains worth it) to have another dedicated pooling allocator, but
I think the current design of the memory pool would satisfy both
consumers, even if the memory considerations are a bigger
motivation for blob structs. I would be interested in your
thoughts if you get the opportunity to read the mem-pool series.


Re: [PATCH 00/13] object store: alloc

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com> wrote:
>
>
>> -Original Message-
>> From: Duy Nguyen <pclo...@gmail.com>
>> Sent: Wednesday, May 2, 2018 1:02 PM
>> To: Stefan Beller <sbel...@google.com>
>> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
>> <jam...@microsoft.com>
>> Subject: Re: [PATCH 00/13] object store: alloc
>>
>> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
>> > I also debated if it is worth converting alloc.c via this patch series
>> > or if it might make more sense to use the new mem-pool by Jameson[1].
>> >
>> > I vaguely wonder about the performance impact, as the object
>> > allocation code seemed to be relevant in the past.
>>
>> If I remember correctly, alloc.c was added because malloc() has too high
>> overhead per allocation (and we create like millions of them). As long as you
>> keep allocation overhead low, it should be ok. Note that we allocate a lot 
>> more
>> than the mem-pool's main target (cache entries if I remember correctly). We
>> may have a couple thousands cache entries.  We already deal with a couple
>> million of struct object.
>
> The work to move cache entry allocation onto a memory pool was motivated by
> the fact that we are looking at indexes with millions of entries. If there is 
> scaling
> concern with the current version of mem-pool, we would like to address it 
> there
> as well. Or if there is improvements that can be shared, that would be nice 
> as well.

I think the two have quite different characteristics. alloc.c code is
driven by overhead. struct blob is only 24 bytes each and about 1/3
the repo is blobs, and each malloc has 16 bytes overhead or so if I
remember correctly. struct cache_entry at minimum in 88 bytes so
relative overhead is not that a big deal (but sure reducing it is
still very nice).

mem-pool is about allocation speed, but I think that's not a concern
for alloc.c because when we do full rev walk, I think I/O is always
the bottleneck (maybe object lookup as well). I don't see a good way
to have the one memory allocator that satisfyies both to be honest. If
you could allocate fixed-size cache entries most of the time (e.g.
larger entries will be allocated using malloc or something, and should
be a small number), then perhaps we can unify. Or if mem-pool can have
an option to allocated fixed size pieces with no overhead, that would
be great (sorry I still have not read that mem-pool series yet)
-- 
Duy


RE: [PATCH 00/13] object store: alloc

2018-05-02 Thread Jameson Miller


> -Original Message-
> From: Duy Nguyen <pclo...@gmail.com>
> Sent: Wednesday, May 2, 2018 1:02 PM
> To: Stefan Beller <sbel...@google.com>
> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> <jam...@microsoft.com>
> Subject: Re: [PATCH 00/13] object store: alloc
> 
> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
> > I also debated if it is worth converting alloc.c via this patch series
> > or if it might make more sense to use the new mem-pool by Jameson[1].
> >
> > I vaguely wonder about the performance impact, as the object
> > allocation code seemed to be relevant in the past.
> 
> If I remember correctly, alloc.c was added because malloc() has too high
> overhead per allocation (and we create like millions of them). As long as you
> keep allocation overhead low, it should be ok. Note that we allocate a lot 
> more
> than the mem-pool's main target (cache entries if I remember correctly). We
> may have a couple thousands cache entries.  We already deal with a couple
> million of struct object.

The work to move cache entry allocation onto a memory pool was motivated by
the fact that we are looking at indexes with millions of entries. If there is 
scaling
concern with the current version of mem-pool, we would like to address it there
as well. Or if there is improvements that can be shared, that would be nice as 
well.

> --
> Duy


Re: [PATCH 00/13] object store: alloc

2018-05-02 Thread Duy Nguyen
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller  wrote:
> I also debated if it is worth converting alloc.c via this patch series
> or if it might make more sense to use the new mem-pool by Jameson[1].
>
> I vaguely wonder about the performance impact, as the object allocation
> code seemed to be relevant in the past.

If I remember correctly, alloc.c was added because malloc() has too
high overhead per allocation (and we create like millions of them). As
long as you keep allocation overhead low, it should be ok. Note that
we allocate a lot more than the mem-pool's main target (cache entries
if I remember correctly). We may have a couple thousands cache
entries.  We already deal with a couple million of struct object.
-- 
Duy


[PATCH 00/13] object store: alloc

2018-05-01 Thread Stefan Beller
This applies on top of sb/oid-object-info and is the logical continuum of
the series that it builds on; this brings the object store into more of
Gits code, removing global state, such that reasoning about the state of
the in-memory representation of the repository is easier.

My original plan was to convert lookup_commit_graft as the next series,
which would be similar to lookup_replace_object, as in sb/object-store-replace.
The grafts and shallow mechanism are very close to each other, such that
they need to be converted at the same time, both depending on the
"parsed object store" that is introduced in this commit.

The next series will then convert code in {object/blob/tree/commit/tag}.c
hopefully finishing the lookup_* functions.

I also debated if it is worth converting alloc.c via this patch series
or if it might make more sense to use the new mem-pool by Jameson[1].

I vaguely wonder about the performance impact, as the object allocation
code seemed to be relevant in the past.

[1] https://public-inbox.org/git/20180430153122.243976-1-jam...@microsoft.com/

Any comments welcome,
Thanks,
Stefan

Jonathan Nieder (1):
  object: add repository argument to grow_object_hash

Stefan Beller (12):
  repository: introduce object parser field
  object: add repository argument to create_object
  alloc: add repository argument to alloc_blob_node
  alloc: add repository argument to alloc_tree_node
  alloc: add repository argument to alloc_commit_node
  alloc: add repository argument to alloc_tag_node
  alloc: add repository argument to alloc_object_node
  alloc: add repository argument to alloc_report
  alloc: add repository argument to alloc_commit_index
  object: allow grow_object_hash to handle arbitrary repositories
  object: allow create_object to handle arbitrary repositories
  alloc: allow arbitrary repositories for alloc functions

 alloc.c   | 69 +++
 alloc.h   | 21 +++
 blame.c   |  3 +-
 blob.c|  5 ++-
 cache.h   |  9 -
 commit.c  |  4 +-
 merge-recursive.c |  3 +-
 object.c  | 93 +--
 object.h  | 18 -
 repository.c  |  7 
 repository.h  | 11 +-
 tag.c |  4 +-
 tree.c|  4 +-
 13 files changed, 182 insertions(+), 69 deletions(-)
 create mode 100644 alloc.h

-- 
2.17.0.441.gb46fe60e1d-goog