Re: [PATCH] pack-objects: handle island check for "external" delta base

2018-09-19 Thread Jeff King
On Wed, Sep 19, 2018 at 08:34:05PM +0200, Martin Ågren wrote:

> > +   /*
> > +* First see if we're already sending the base (or it's explicitly in
> > +* our "excluded" list.
> > +*/
> 
> Missing ')'.

Oops, yes.

> > +   if (use_delta_islands) {
> > +   struct object_id base_oid;
> > +   hashcpy(base_oid.hash, base_sha1);
> > +   if (!in_same_island(&delta->idx.oid, &base_oid))
> > +   return 0;
> 
> This does some extra juggling to avoid using `base->idx.oid`, which
> would have been the moral equivalent of the original code, but which
> won't fly since `base` is NULL.

Yeah, this is the actual bug-fix.

I wasn't happy about having to write in_same_island() twice, but writing
it the other way ended up pretty nasty, too. Something like:

  struct object_id ext_oid;
  struct object_id *base_oid;

  base = packlist_find(&to_pack, base_sha1, NULL);
  if (base) {
	base_oid = &base->idx.oid;
	*base_out = base;
  }
  else if (thin && bitmap_has_sha1_in_uninteresting(bitmap_git, base_sha1)) {
	hashcpy(ext_oid.hash, base_sha1);
	base_oid = &ext_oid;
	*base_out = NULL;
  } else {
	return 0;
  }

  if (use_island_marks && !in_same_island(&delta->idx.oid, base_oid))
	return 0;

  return 1;

That's less repetitive, but I feel like it's harder to follow which
variables are valid when.

> > +   if (can_reuse_delta(base_ref, entry, &base_entry)) {
> > oe_set_type(entry, entry->in_pack_type);
> > SET_SIZE(entry, in_pack_size); /* delta size */
> > SET_DELTA_SIZE(entry, in_pack_size);
> 
> Without being at all familiar with this code, this looks sane to me.
> Just had a small nit about the missing closing ')'.

Thanks for the review!

-Peff


Re: [PATCH] pack-objects: handle island check for "external" delta base

2018-09-19 Thread Martin Ågren
On Wed, 19 Sep 2018 at 05:49, Jeff King wrote:
> This is tricky to do inside a single "if" statement. And
> after the merge in f3504ea3dd (Merge branch
> 'cc/delta-islands', 2018-09-17), that "if" condition is
> already getting pretty unwieldy. So this patch moves the
> logic into a helper function, where we can easily use
> multiple return paths. The result is a bit longer, but the
> logic should be much easier to follow.

> +static int can_reuse_delta(const unsigned char *base_sha1,
> +  struct object_entry *delta,
> +  struct object_entry **base_out)
> +{
> +   struct object_entry *base;
> +
> +   if (!base_sha1)
> +   return 0;

So this corresponds to "if (base_ref &&".

> +   /*
> +* First see if we're already sending the base (or it's explicitly in
> +* our "excluded" list.
> +*/

Missing ')'.

> +   base = packlist_find(&to_pack, base_sha1, NULL);
> +   if (base) {
> +   if (!in_same_island(&delta->idx.oid, &base->idx.oid))
> +   return 0;

This logic matches the removed code...

> +   *base_out = base;
> +   return 1;
> +   }
> +
> +   /*
> +* Otherwise, reachability bitmaps may tell us if the receiver has it,
> +* even if it was buried too deep in history to make it into the
> +* packing list.
> +*/
> +   if (thin && bitmap_has_sha1_in_uninteresting(bitmap_git, base_sha1)) {

This matches...

> +   if (use_delta_islands) {
> +   struct object_id base_oid;
> +   hashcpy(base_oid.hash, base_sha1);
> +   if (!in_same_island(&delta->idx.oid, &base_oid))
> +   return 0;

This does some extra juggling to avoid using `base->idx.oid`, which
would have been the moral equivalent of the original code, but which
won't fly since `base` is NULL.
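To spell out that juggling in isolation, here is a minimal sketch. It is not git's code: `struct oid`, `same_island()` and `external_base_ok()` are hypothetical stand-ins, and the "island" rule (compare the first hash byte) is invented purely so the example runs. The point is only that a raw hash gets copied into a temporary struct on the stack, so nothing has to reach through a NULL entry.

```c
#include <assert.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 size; an assumption of this sketch */

/* Hypothetical stand-in for git's struct object_id. */
struct oid {
	unsigned char hash[HASH_LEN];
};

/*
 * Hypothetical stand-in for in_same_island(). The rule used here
 * (first hash byte identifies the island) is made up for the demo.
 */
static int same_island(const struct oid *a, const struct oid *b)
{
	return a->hash[0] == b->hash[0];
}

/*
 * The base is known only by its raw hash, so wrap it in a
 * temporary struct oid instead of dereferencing a missing entry.
 */
static int external_base_ok(const struct oid *delta,
			    const unsigned char *base_hash)
{
	struct oid base_oid;

	assert(base_hash != NULL);
	memcpy(base_oid.hash, base_hash, HASH_LEN);
	return same_island(delta, &base_oid);
}
```

The copy costs a few bytes of stack per call, which seems a fair price for never forming a pointer into a NULL struct.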

> +   }
> +   *base_out = NULL;
> +   return 1;
> +   }
> +
> +   return 0;
> +}
> +
>  static void check_object(struct object_entry *entry)
>  {
> unsigned long canonical_size;
> @@ -1556,22 +1607,7 @@ static void check_object(struct object_entry *entry)
> break;
> }
>
> -   if (base_ref && (
> -   (base_entry = packlist_find(&to_pack, base_ref, NULL)) ||
> -   (thin &&
> -bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref))) &&
> -   in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {

Yeah, the new function looks much simpler than this. We have

  if (A && (B1 || B2) && C) {.

Knowing what to look for, it can be seen that we can -- under the right
circumstances -- have A and B2, but not B1, and try to evaluate C by
dereferencing `base_entry` which will be NULL.
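A toy model of that A / B1 / B2 / C shape, written with separate return paths, may make the hazard easier to see. Everything here is hypothetical (`struct entry`, `same_island()`, `can_reuse()` and the integer "island" field are not git's types or functions), and the island check for the B2 case is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct object_entry. */
struct entry {
	int island;
};

/* Hypothetical island check: C in the condition above. */
static int same_island(const struct entry *a, const struct entry *b)
{
	return a->island == b->island;
}

static int can_reuse(int have_ref,          /* A */
		     struct entry *in_list, /* B1: non-NULL if found */
		     int receiver_has_it,   /* B2 */
		     const struct entry *delta,
		     struct entry **base_out)
{
	assert(base_out != NULL);

	if (!have_ref)
		return 0;

	if (in_list) {
		/* B1 held: the entry exists, so C is safe to evaluate. */
		if (!same_island(delta, in_list))
			return 0;
		*base_out = in_list;
		return 1;
	}

	if (receiver_has_it) {
		/* B2 held without B1: there is no entry to look at. */
		*base_out = NULL;
		return 1;
	}

	return 0;
}
```

With one branch per case, the only place the base entry is dereferenced is the branch where it is known to be non-NULL.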

> +   if (can_reuse_delta(base_ref, entry, &base_entry)) {
> oe_set_type(entry, entry->in_pack_type);
> SET_SIZE(entry, in_pack_size); /* delta size */
> SET_DELTA_SIZE(entry, in_pack_size);

Without being at all familiar with this code, this looks sane to me.
Just had a small nit about the missing closing ')'.

Martin


[PATCH] pack-objects: handle island check for "external" delta base

2018-09-18 Thread Jeff King
On Fri, Sep 14, 2018 at 02:56:36PM -0700, Junio C Hamano wrote:

> * cc/delta-islands (2018-08-16) 7 commits
>   (merged to 'next' on 2018-08-27 at cf3d7bd93f)
>  + pack-objects: move 'layer' into 'struct packing_data'
>  + pack-objects: move tree_depth into 'struct packing_data'
>  + t5320: tests for delta islands
>  + repack: add delta-islands support
>  + pack-objects: add delta-islands support
>  + pack-objects: refactor code into compute_layer_order()
>  + Add delta-islands.{c,h}
> 
>  Lift code from GitHub to restrict delta computation so that an
>  object that exists in one fork is not made into a delta against
>  another object that does not appear in the same forked repository.
> 
>  Will merge to 'master'.

This needed some conflict resolution with my pack-bitmap-reuse-delta
topic, but there's a subtle bug in the result that went to 'master'.
Details and a fix below.

As a side note, I did this same resolution myself at least twice (for my
personal build and for porting the refreshed delta-reuse series to our
GitHub build), and I wrote the exact same resolution you did both times.
So I think it was an easy mistake to make. :)

-Peff

-- >8 --
Subject: pack-objects: handle island check for "external" delta base

Two recent topics, jk/pack-delta-reuse-with-bitmap and
cc/delta-islands, can have a funny interaction. When
checking if we can reuse an on-disk delta, the first topic
allows base_entry to be NULL when we find an object that's
not in the packing list. But the latter topic introduces a
call to in_same_island(), which needs to look at
base_entry->idx.oid. When these two features are used
together, we might try to dereference a NULL base_entry.

In practice, this doesn't really happen. We'd generally only
use delta islands when packing to disk, since the whole
point is to optimize the pack for serving fetches later. And
the new delta-reuse code relies on having used reachability
bitmaps to determine the set of objects, which we would
typically only do when serving an actual fetch.

However, it is technically possible to combine these
features. And even without doing so, building with
"SANITIZE=address,undefined" will cause t5310.46 to
complain.  Even though that test does not have delta islands
enabled, we still take the address of the NULL entry to pass
to in_same_island(). That function then promptly returns
without dereferencing the value when it sees that islands
are not enabled, but it's enough to trigger a sanitizer
error.

The solution is straight-forward: when both features are
used together, we should pass the oid of the found base to
in_same_island().

This is tricky to do inside a single "if" statement. And
after the merge in f3504ea3dd (Merge branch
'cc/delta-islands', 2018-09-17), that "if" condition is
already getting pretty unwieldy. So this patch moves the
logic into a helper function, where we can easily use
multiple return paths. The result is a bit longer, but the
logic should be much easier to follow.

Signed-off-by: Jeff King 
---
 builtin/pack-objects.c | 68 --
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5041818ddf..27cb674124 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1470,6 +1470,57 @@ static void cleanup_preferred_base(void)
done_pbase_paths_num = done_pbase_paths_alloc = 0;
 }
 
+/*
+ * Return 1 iff the object specified by "delta" can be sent
+ * literally as a delta against the base in "base_sha1". If
+ * so, then *base_out will point to the entry in our packing
+ * list, or NULL if we must use the external-base list.
+ *
+ * Depth value does not matter - find_deltas() will
+ * never consider reused delta as the base object to
+ * deltify other objects against, in order to avoid
+ * circular deltas.
+ */
+static int can_reuse_delta(const unsigned char *base_sha1,
+  struct object_entry *delta,
+  struct object_entry **base_out)
+{
+   struct object_entry *base;
+
+   if (!base_sha1)
+   return 0;
+
+   /*
+* First see if we're already sending the base (or it's explicitly in
+* our "excluded" list.
+*/
+   base = packlist_find(&to_pack, base_sha1, NULL);
+   if (base) {
+   if (!in_same_island(&delta->idx.oid, &base->idx.oid))
+   return 0;
+   *base_out = base;
+   return 1;
+   }
+
+   /*
+* Otherwise, reachability bitmaps may tell us if the receiver has it,
+* even if it was buried too deep in history to make it into the
+* packing list.
+*/
+   if (thin && bitmap_has_sha1_in_uninteresting(bitmap_git, base_sha1)) {
+   if (use_delta_islands) {
+   struct object_id base_oid;
+   hashcpy(base_oid.hash, base_sha1);
+   if (!in_same_island(&delta->idx.oid, &base_oid))
+