Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-22 Thread Jeff King
On Tue, Nov 20, 2018 at 05:37:18PM +0100, Duy Nguyen wrote:

> > But in (b), we use the number of stored objects, _not_ the allocated
> > size of the objects array. So we can run into a situation like this:
> >
> >   1. packlist_alloc() needs to store the Nth object, so it grows the
> >  objects array to M, where M > N.
> >
> >   2. oe_set_tree_depth() wants to store a depth, so it allocates an
> >  array of length N. Now we've violated our invariant.
> >
> >   3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
> >  grow the objects array, since N <= M still holds. We try to assign
> >  to tree_depth[N+1], which is out of bounds.
> 
> Do you think if this splitting data to packing_data is too fragile
> that we should just scrape the whole thing and move all data back to
> object_entry[]? We would use more memory of course but higher memory
> usage is still better than more bugs (if these are likely to show up
> again).

Certainly that thought crossed my mind while working on these patches. :)

Especially given the difficulties it introduced into the recent
bitmap-reuse topic, and the size fixes we had to deal with in v2.19.

Overall, though, I dunno. This fix, while subtle, turned out not to be
too complicated. And the memory savings are real. I consider 100M
objects to be on the large size of feasible for stock Git these days,
and I think we are talking about on the order of 4GB memory savings
there. You need a big machine to handle a repository of that size, but
4GB is still appreciable.

So I guess at this point, with all (known) bugs fixed, we should stick
with it for now. If it becomes a problem for development of a future
feature, then we can re-evaluate then.

-Peff


Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-20 Thread Junio C Hamano
Jeff King  writes:

> But in (b), we use the number of stored objects, _not_ the allocated
> size of the objects array. So we can run into a situation like this:
>
>   1. packlist_alloc() needs to store the Nth object, so it grows the
>  objects array to M, where M > N.
>
>   2. oe_set_tree_depth() wants to store a depth, so it allocates an
>  array of length N. Now we've violated our invariant.
>
>   3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
>  grow the objects array, since N <= M still holds. We try to assign
>  to tree_depth[N+1], which is out of bounds.

Ouch.  I see counting and allocationg is hard (I think I spotted a
bug in another area that comes from the same "count while filtering
and then allocate" pattern during this cycle).  Thanks for spotting.



Re: [PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-20 Thread Duy Nguyen
On Tue, Nov 20, 2018 at 11:04 AM Jeff King  wrote:
>
> Commit 108f530385 (pack-objects: move tree_depth into 'struct
> packing_data', 2018-08-16) dynamically manages a tree_depth array in
> packing_data that maintains one of these invariants:
>
>   1. tree_depth is NULL (i.e., the requested options don't require us to
>  track tree depths)
>
>   2. tree_depth is non-NULL and has as many entries as the "objects"
>  array
>
> We maintain (2) by:
>
>   a. When the objects array grows, grow tree_depth to the same size
>  (unless it's NULL, in which case we can leave it).
>
>   b. When a caller asks to set a depth via oe_set_tree_depth(), if
>  tree_depth is NULL we allocate it.
>
> But in (b), we use the number of stored objects, _not_ the allocated
> size of the objects array. So we can run into a situation like this:
>
>   1. packlist_alloc() needs to store the Nth object, so it grows the
>  objects array to M, where M > N.
>
>   2. oe_set_tree_depth() wants to store a depth, so it allocates an
>  array of length N. Now we've violated our invariant.
>
>   3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
>  grow the objects array, since N <= M still holds. We try to assign
>  to tree_depth[N+1], which is out of bounds.

Do you think if this splitting data to packing_data is too fragile
that we should just scrape the whole thing and move all data back to
object_entry[]? We would use more memory of course but higher memory
usage is still better than more bugs (if these are likely to show up
again).
-- 
Duy


[PATCH 1/3] pack-objects: fix tree_depth and layer invariants

2018-11-20 Thread Jeff King
Commit 108f530385 (pack-objects: move tree_depth into 'struct
packing_data', 2018-08-16) dynamically manages a tree_depth array in
packing_data that maintains one of these invariants:

  1. tree_depth is NULL (i.e., the requested options don't require us to
 track tree depths)

  2. tree_depth is non-NULL and has as many entries as the "objects"
 array

We maintain (2) by:

  a. When the objects array grows, grow tree_depth to the same size
 (unless it's NULL, in which case we can leave it).

  b. When a caller asks to set a depth via oe_set_tree_depth(), if
 tree_depth is NULL we allocate it.

But in (b), we use the number of stored objects, _not_ the allocated
size of the objects array. So we can run into a situation like this:

  1. packlist_alloc() needs to store the Nth object, so it grows the
 objects array to M, where M > N.

  2. oe_set_tree_depth() wants to store a depth, so it allocates an
 array of length N. Now we've violated our invariant.

  3. packlist_alloc() needs to store the N+1th object. But it _doesn't_
 grow the objects array, since N <= M still holds. We try to assign
 to tree_depth[N+1], which is out of bounds.

That doesn't happen in our test scripts, because the repositories they
use are so small, but it's easy to trigger by running:

  echo HEAD | git pack-objects --revs --delta-islands --stdout >/dev/null

in any reasonably-sized repo (like git.git).

We can fix it by always growing the array to match pack->nr_alloc, not
pack->nr_objects. Likewise for the "layer" array from fe0ac2fb7f
(pack-objects: move 'layer' into 'struct packing_data', 2018-08-16),
which has the same bug.

Signed-off-by: Jeff King 
---
 pack-objects.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pack-objects.h b/pack-objects.h
index feb6a6a05e..f31ac1c81c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -412,7 +412,7 @@ static inline void oe_set_tree_depth(struct packing_data 
*pack,
 unsigned int tree_depth)
 {
if (!pack->tree_depth)
-   ALLOC_ARRAY(pack->tree_depth, pack->nr_objects);
+   ALLOC_ARRAY(pack->tree_depth, pack->nr_alloc);
pack->tree_depth[e - pack->objects] = tree_depth;
 }
 
@@ -429,7 +429,7 @@ static inline void oe_set_layer(struct packing_data *pack,
unsigned char layer)
 {
if (!pack->layer)
-   ALLOC_ARRAY(pack->layer, pack->nr_objects);
+   ALLOC_ARRAY(pack->layer, pack->nr_alloc);
pack->layer[e - pack->objects] = layer;
 }
 
-- 
2.20.0.rc0.715.gf6b01ab3e1