Re: [PATCH 00/13] object store: alloc

2018-05-07 Thread Stefan Beller
Hi Junio,

On Mon, May 7, 2018 at 7:05 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> This applies on top of sb/oid-object-info and is the logical continuation
>> of the series that it builds on; this brings the object store into more of
>> Git's code, removing global state, such that reasoning about the state of
>> the in-memory representation of the repository is easier.
>
> I am not sure how well this topic is done, but I've queued the
> following patch at the tip of the topic to make it compile after
> getting merged to integration branches (curiously, the topic by
> itself compiled fine for whatever reason).

Thanks for the fixup; I will include it with other fixes in a reroll.

The reason it would not compile lies not in alloc.c itself but in
1da1580e4c2 (Makefile: detect compiler and enable more warnings in
DEVELOPER=1, 2018-04-14), which enabled -Werror=missing-prototypes.
That warning requires a prototype to be in scope for every external
definition, and the prototype is found in alloc.h.
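
For illustration only (hypothetical file and function names, not the
real alloc.c interface), this is the pattern the warning catches:

/* example.c (hypothetical); build with
 *   cc -Werror=missing-prototypes -c example.c
 */

/* In the real code the prototype lives in alloc.h; remove this
 * declaration and gcc errors out with something like
 * "no previous prototype for 'example_alloc_node'".
 */
void *example_alloc_node(void);

void *example_alloc_node(void)
{
	return 0;	/* placeholder body, illustration only */
}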

Thanks,
Stefan


Re: [PATCH 00/13] object store: alloc

2018-05-07 Thread Junio C Hamano
Stefan Beller  writes:

> This applies on top of sb/oid-object-info and is the logical continuation
> of the series that it builds on; this brings the object store into more of
> Git's code, removing global state, such that reasoning about the state of
> the in-memory representation of the repository is easier.

I am not sure how well this topic is done, but I've queued the
following patch at the tip of the topic to make it compile after
getting merged to integration branches (curiously, the topic by
itself compiled fine for whatever reason). I think I haven't sent
that fixup patch out, so here it is.

-- >8 --
From: Junio C Hamano 
Date: Wed, 2 May 2018 19:09:50 +0900
Subject: [PATCH] alloc.c: include alloc.h

Otherwise the definition in alloc.c would not see the matching decl
in alloc.h, triggering a warning from the compiler.

Signed-off-by: Junio C Hamano 
---
 alloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/alloc.c b/alloc.c
index 66a3d07ba2..f47a67153b 100644
--- a/alloc.c
+++ b/alloc.c
@@ -16,6 +16,7 @@
 #include "tree.h"
 #include "commit.h"
 #include "tag.h"
+#include "alloc.h"
 
 #define BLOCKING 1024
 
-- 
2.17.0-391-g1f1cddd558



Re: [PATCH 00/13] object store: alloc

2018-05-03 Thread Stefan Beller
On Wed, May 2, 2018 at 11:22 AM, Duy Nguyen  wrote:

> I think the two have quite different characteristics. alloc.c code is
> driven by overhead. struct blob is only 24 bytes each and about 1/3
> the repo is blobs, and each malloc has 16 bytes overhead or so if I
> remember correctly. struct cache_entry is at minimum 88 bytes, so the
> relative overhead is not that big a deal (but sure, reducing it is
> still very nice).
>
> mem-pool is about allocation speed,

I don't think so, given that each allocation does a linear search over the blocks.
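
Roughly what I mean (a simplified sketch of the shape of it, not the
exact code from the series; names are made up):

#include <stddef.h>

struct example_block {
	struct example_block *next_block;
	char *next_free;
	char *end;
};

/*
 * Every allocation first walks the list of blocks looking for one
 * with enough room -- that walk is the linear search I mean.
 */
void *example_pool_alloc(struct example_block *head, size_t len)
{
	struct example_block *p;
	void *ret;

	for (p = head; p; p = p->next_block)
		if (len <= (size_t)(p->end - p->next_free))
			break;
	if (!p)
		return NULL;	/* the real code would chain a new block */

	ret = p->next_free;
	p->next_free += len;
	return ret;
}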

> but I think that's not a concern
> for alloc.c because when we do a full rev walk, I think I/O is always
> the bottleneck (maybe object lookup as well). I don't see a good way
> to have the one memory allocator that satisfies both, to be honest.

By changing the allocation size of a block in alloc.c to be larger than
1024 entries, we should lessen the impact of management overhead, and then
using the mem pool becomes more than feasible.
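
Concretely, something along these lines (a sketch only, not the actual
alloc.c code, with a hypothetical larger blocking factor):

#include <stdlib.h>

#define EXAMPLE_BLOCKING (16 * 1024)	/* hypothetical, vs. the current 1024 */

struct example_alloc_state {
	int count;	/* objects left in the current block */
	void *p;	/* first unused object in the current block */
};

void *example_blocked_alloc(struct example_alloc_state *s, size_t node_size)
{
	void *ret;

	if (!s->count) {
		/*
		 * One malloc (and its ~16 bytes of bookkeeping) per
		 * EXAMPLE_BLOCKING objects instead of one per object.
		 */
		s->p = malloc(EXAMPLE_BLOCKING * node_size);
		if (!s->p)
			return NULL;
		s->count = EXAMPLE_BLOCKING;
	}
	s->count--;
	ret = s->p;
	s->p = (char *)s->p + node_size;
	return ret;
}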


RE: [PATCH 00/13] object store: alloc

2018-05-02 Thread Jameson Miller


> -Original Message-
> From: Duy Nguyen <pclo...@gmail.com>
> Sent: Wednesday, May 2, 2018 2:23 PM
> To: Jameson Miller <jam...@microsoft.com>
> Cc: Stefan Beller <sbel...@google.com>; Git Mailing List <git@vger.kernel.org>
> Subject: Re: [PATCH 00/13] object store: alloc
> 
> On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com>
> wrote:
> >
> >
> >> -Original Message-
> >> From: Duy Nguyen <pclo...@gmail.com>
> >> Sent: Wednesday, May 2, 2018 1:02 PM
> >> To: Stefan Beller <sbel...@google.com>
> >> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> >> <jam...@microsoft.com>
> >> Subject: Re: [PATCH 00/13] object store: alloc
> >>
> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
> >> > I also debated if it is worth converting alloc.c via this patch
> >> > series or if it might make more sense to use the new mem-pool by
> >> > Jameson[1].
> >> >
> >> > I vaguely wonder about the performance impact, as the object
> >> > allocation code seemed to be relevant in the past.
> >>
> >> If I remember correctly, alloc.c was added because malloc() has too
> >> high overhead per allocation (and we create like millions of them).
> >> As long as you keep allocation overhead low, it should be ok. Note
> >> that we allocate a lot more than the mem-pool's main target (cache
> >> entries if I remember correctly). We may have a couple thousand
> >> cache entries.  We already deal with a couple million struct objects.
> >
> > The work to move cache entry allocation onto a memory pool was
> > motivated by the fact that we are looking at indexes with millions of
> > entries. If there is a scaling concern with the current version of
> > mem-pool, we would like to address it there as well. Or if there are
> > improvements that can be shared, that would be nice as well.
> 
> I think the two have quite different characteristics. alloc.c code is driven
> by overhead. struct blob is only 24 bytes each and about 1/3 the repo is
> blobs, and each malloc has 16 bytes overhead or so if I remember correctly.
> struct cache_entry is at minimum 88 bytes, so the relative overhead is not
> that big a deal (but sure, reducing it is still very nice).
> 
> mem-pool is about allocation speed, but I think that's not a concern for
> alloc.c because when we do a full rev walk, I think I/O is always the
> bottleneck (maybe object lookup as well). I don't see a good way to have
> the one memory allocator that satisfies both, to be honest. If you could
> allocate fixed-size cache entries most of the time (e.g. larger entries
> will be allocated using malloc or something, and should be a small number),
> then perhaps we can unify. Or if mem-pool can have an option to allocate
> fixed-size pieces with no overhead, that would be great (sorry, I still
> have not read that mem-pool series yet)
> --
> Duy

Thank you for the extra details - the extra context was helpful,
especially the motivations for each of the areas. I agree with
your general analysis, with the extra point that the memory pool
does allocate (variable-sized) memory without any per-allocation
overhead, except for possible alignment padding and the
bookkeeping for the larger "blocks" of memory from which the
small allocations are carved - but I don't think this would be
enough to have a meaningful overall impact.

The mem-pool only tracks the pointer to the next available bit of
memory, and the end of the available memory range. It has a
similar constraint in that individual allocations cannot be freed
- you have to free the whole block.
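
To make that concrete, the allocation fast path is essentially just a
pointer bump (a simplified sketch, not the exact code in the series):

#include <stddef.h>

struct example_mp_block {
	char *next_free;	/* next unused byte in the block */
	char *end;		/* one past the last usable byte */
};

void *example_mem_pool_alloc(struct example_mp_block *b, size_t len)
{
	void *ret;

	if (len > (size_t)(b->end - b->next_free))
		return NULL;	/* real code falls back to a fresh block */

	ret = b->next_free;
	/*
	 * No per-allocation header, hence no per-allocation overhead
	 * (modulo any alignment padding).
	 */
	b->next_free += len;
	return ret;
}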

It may be that the requirements are different enough (or the
gains worth it) to have another dedicated pooling allocator, but
I think the current design of the memory pool would satisfy both
consumers, even if the memory considerations are a bigger
motivation for blob structs. I would be interested in your
thoughts if you get the opportunity to read the mem-pool series.


Re: [PATCH 00/13] object store: alloc

2018-05-02 Thread Duy Nguyen
On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jam...@microsoft.com> wrote:
>
>
>> -Original Message-
>> From: Duy Nguyen <pclo...@gmail.com>
>> Sent: Wednesday, May 2, 2018 1:02 PM
>> To: Stefan Beller <sbel...@google.com>
>> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
>> <jam...@microsoft.com>
>> Subject: Re: [PATCH 00/13] object store: alloc
>>
>> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
>> > I also debated if it is worth converting alloc.c via this patch series
>> > or if it might make more sense to use the new mem-pool by Jameson[1].
>> >
>> > I vaguely wonder about the performance impact, as the object
>> > allocation code seemed to be relevant in the past.
>>
>> If I remember correctly, alloc.c was added because malloc() has too high
>> overhead per allocation (and we create like millions of them). As long as you
>> keep allocation overhead low, it should be ok. Note that we allocate a lot
>> more than the mem-pool's main target (cache entries if I remember correctly).
>> We may have a couple thousand cache entries.  We already deal with a couple
>> million struct objects.
>
> The work to move cache entry allocation onto a memory pool was motivated by
> the fact that we are looking at indexes with millions of entries. If there is
> a scaling concern with the current version of mem-pool, we would like to
> address it there as well. Or if there are improvements that can be shared,
> that would be nice as well.

I think the two have quite different characteristics. alloc.c code is
driven by overhead. struct blob is only 24 bytes each and about 1/3
the repo is blobs, and each malloc has 16 bytes overhead or so if I
remember correctly. struct cache_entry is at minimum 88 bytes, so the
relative overhead is not that big a deal (but sure, reducing it is
still very nice).
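
To put rough numbers on it (back of the envelope, assuming a
hypothetical 5 million blobs and the ~16 bytes of malloc bookkeeping
above):

  one malloc per blob:  5M * (24 + 16) bytes = ~200 MB, ~40% of it overhead
  1024-blob blocks:     5M * 24 bytes + (5M / 1024) * 16 bytes = ~120 MB,
                        with well under 0.1% overhead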

mem-pool is about allocation speed, but I think that's not a concern
for alloc.c because when we do a full rev walk, I think I/O is always
the bottleneck (maybe object lookup as well). I don't see a good way
to have the one memory allocator that satisfies both, to be honest. If
you could allocate fixed-size cache entries most of the time (e.g.
larger entries will be allocated using malloc or something, and should
be a small number), then perhaps we can unify. Or if mem-pool can have
an option to allocate fixed-size pieces with no overhead, that would
be great (sorry, I still have not read that mem-pool series yet)
-- 
Duy


RE: [PATCH 00/13] object store: alloc

2018-05-02 Thread Jameson Miller


> -Original Message-
> From: Duy Nguyen <pclo...@gmail.com>
> Sent: Wednesday, May 2, 2018 1:02 PM
> To: Stefan Beller <sbel...@google.com>
> Cc: Git Mailing List <git@vger.kernel.org>; Jameson Miller
> <jam...@microsoft.com>
> Subject: Re: [PATCH 00/13] object store: alloc
> 
> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbel...@google.com> wrote:
> > I also debated if it is worth converting alloc.c via this patch series
> > or if it might make more sense to use the new mem-pool by Jameson[1].
> >
> > I vaguely wonder about the performance impact, as the object
> > allocation code seemed to be relevant in the past.
> 
> If I remember correctly, alloc.c was added because malloc() has too high
> overhead per allocation (and we create like millions of them). As long as you
> keep allocation overhead low, it should be ok. Note that we allocate a lot
> more than the mem-pool's main target (cache entries if I remember correctly).
> We may have a couple thousand cache entries.  We already deal with a couple
> million struct objects.

The work to move cache entry allocation onto a memory pool was motivated by
the fact that we are looking at indexes with millions of entries. If there is
a scaling concern with the current version of mem-pool, we would like to
address it there as well. Or if there are improvements that can be shared,
that would be nice as well.

> --
> Duy


Re: [PATCH 00/13] object store: alloc

2018-05-02 Thread Duy Nguyen
On Tue, May 1, 2018 at 11:33 PM, Stefan Beller  wrote:
> I also debated if it is worth converting alloc.c via this patch series
> or if it might make more sense to use the new mem-pool by Jameson[1].
>
> I vaguely wonder about the performance impact, as the object allocation
> code seemed to be relevant in the past.

If I remember correctly, alloc.c was added because malloc() has too
high overhead per allocation (and we create like millions of them). As
long as you keep allocation overhead low, it should be ok. Note that
we allocate a lot more than the mem-pool's main target (cache entries
if I remember correctly). We may have a couple thousand cache
entries.  We already deal with a couple million struct objects.
-- 
Duy