Re: [PATCH 1/4] shallow.c: make paint_alloc slightly more robust

2016-12-05 Thread Duy Nguyen
On Sat, Dec 3, 2016 at 12:14 PM, Jeff King  wrote:
> On Fri, Dec 02, 2016 at 09:31:01PM +0100, Rasmus Villemoes wrote:
>
>> I have no idea if this is a real issue, but it's not obvious to me that
>> paint_alloc cannot be called with info->nr_bits greater than about
>> 4M (\approx 8*COMMIT_SLAB_SIZE). In that case the new slab would be too
>> small. So just round up the allocation to the maximum of
>> COMMIT_SLAB_SIZE and size.
>
> I had trouble understanding what the problem is from this description,
> but I think i figured it out from the code.
>
> Let me try to restate it to make sure I understand.
>
> The paint_alloc() may be asked to allocate a certain number of bits,
> which it does across a series of independently allocated slabs. Each
> slab holds a fixed size, but we only allocate a single slab. If the
> number we need to allocate is larger than fits in a single slab, then at
> the end we'll have under-allocated.

Each bit here represents a ref. This code walks the commit graph and
"paints" all commits reachable by the n-th ref with the n-th bit,
stored in the commit slab. But because the majority of commits will
have the same bitmap (e.g. when you exclude tag ABC and nothing else,
then all commits from ABC will have the same bitmap "1"), it's a waste
to allocate the same bitmap per commit (and it's also inefficient to
let malloc allocate 1 bit). I tried to reduce the memory usage: if the
a commit and its parent has the same bitmap, and the slab pointer of
the child commit points to the memory of the parent's, no extra
allocation is done. This manual memory management is pretty much like
alloc.c

The COMMIT_SLAB_SIZE here is really an arbitrary big number so that we
don't have to allocate often. It's basically allocating a new memory
pool. When we use all of that pool, we allocate a new one.. Yeah I
probably should define a new one instead of reusing COMMIT_SLAB_SIZE.
Tthe chances of under-allocation is super low, but still possible: you
need to send more than 4M "exclude" (or "shallow") requests to
upload-pack, to create a bitmap of over 512KiB. That's a lot of
traffic in git protocol.

> Your solution is to make the slab we allocate bigger. But that seems
> odd to me. Usually when we are using COMMIT_SLAB_SIZE, we are allocating
> a series of slabs that make up a virtual array, and we know that each
> slab has the same size. So if you need to find the k-th item, and each
> slab has length n, then you'd look at slab (k / n), and then at item (k
> % n) within that slab.
>
> In other words, I think the solution isn't to make the one slab bigger,
> but to allocate slabs until we have enough of them to meet the request.

If I still understand my code (it's been a long time since I wrote
this thing), then I think we just need to catch the problem and die().
Normal users should never ask the server to allocate this much.
-- 
Duy


Re: [PATCH 1/4] shallow.c: make paint_alloc slightly more robust

2016-12-02 Thread Jeff King
On Fri, Dec 02, 2016 at 09:31:01PM +0100, Rasmus Villemoes wrote:

> I have no idea if this is a real issue, but it's not obvious to me that
> paint_alloc cannot be called with info->nr_bits greater than about
> 4M (\approx 8*COMMIT_SLAB_SIZE). In that case the new slab would be too
> small. So just round up the allocation to the maximum of
> COMMIT_SLAB_SIZE and size.

I had trouble understanding what the problem is from this description,
but I think i figured it out from the code.

Let me try to restate it to make sure I understand.

The paint_alloc() may be asked to allocate a certain number of bits,
which it does across a series of independently allocated slabs. Each
slab holds a fixed size, but we only allocate a single slab. If the
number we need to allocate is larger than fits in a single slab, then at
the end we'll have under-allocated.

Your solution is to make the slab we allocate bigger. But that seems
odd to me. Usually when we are using COMMIT_SLAB_SIZE, we are allocating
a series of slabs that make up a virtual array, and we know that each
slab has the same size. So if you need to find the k-th item, and each
slab has length n, then you'd look at slab (k / n), and then at item (k
% n) within that slab.

In other words, I think the solution isn't to make the one slab bigger,
but to allocate slabs until we have enough of them to meet the request.

But I don't really know how this code is used, or why it is using
COMMIT_SLAB_SIZE in the first place. That's generally supposed to be an
internal detail of the commit-slab.h infrastructure. Why is it being
used directly, instead of just using the functions that commit-slab
defines?

-Peff


[PATCH 1/4] shallow.c: make paint_alloc slightly more robust

2016-12-02 Thread Rasmus Villemoes
I have no idea if this is a real issue, but it's not obvious to me that
paint_alloc cannot be called with info->nr_bits greater than about
4M (\approx 8*COMMIT_SLAB_SIZE). In that case the new slab would be too
small. So just round up the allocation to the maximum of
COMMIT_SLAB_SIZE and size.

Signed-off-by: Rasmus Villemoes 
---
 shallow.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/shallow.c b/shallow.c
index 4d0b005..e21534a 100644
--- a/shallow.c
+++ b/shallow.c
@@ -445,11 +445,13 @@ static uint32_t *paint_alloc(struct paint_info *info)
unsigned size = nr * sizeof(uint32_t);
void *p;
if (!info->slab_count || info->free + size > info->end) {
+   unsigned alloc_size = size < COMMIT_SLAB_SIZE ?
+   COMMIT_SLAB_SIZE : size;
info->slab_count++;
REALLOC_ARRAY(info->slab, info->slab_count);
-   info->free = xmalloc(COMMIT_SLAB_SIZE);
+   info->free = xmalloc(alloc_size);
info->slab[info->slab_count - 1] = info->free;
-   info->end = info->free + COMMIT_SLAB_SIZE;
+   info->end = info->free + alloc_size;
}
p = info->free;
info->free += size;
-- 
2.1.4