Re: [PATCH] pack-objects: handle island check for "external" delta base

2018-09-19 Thread Jeff King
On Wed, Sep 19, 2018 at 08:34:05PM +0200, Martin Ågren wrote:

> > +   /*
> > +* First see if we're already sending the base (or it's explicitly 
> > in
> > +* our "excluded" list.
> > +*/
> 
> Missing ')'.

Oops, yes.

> > +   if (use_delta_islands) {
> > +   struct object_id base_oid;
> > +   hashcpy(base_oid.hash, base_sha1);
> > +   if (!in_same_island(&delta->idx.oid, &base_oid))
> > +   return 0;
> 
> This does some extra juggling to avoid using `base->idx.oid`, which
> would have been the moral equivalent of the original code, but which
> won't fly since `base` is NULL.

Yeah, this is the actual bug-fix.

I wasn't happy about having to write in_same_island() twice, but writing
it the other way ended up pretty nasty, too. Something like:

  struct object_id ext_oid;
  struct object_id *base_oid;

  base = packlist_find(&to_pack, base_sha1, NULL);
  if (base) {
base_oid = &base->idx.oid;
*base_out = base;
  }
  else if (thin && bitmap_has_sha1_in_uninteresting(bitmap_git, base_sha1)) {
hashcpy(ext_oid.hash, base_sha1);
base_oid = &ext_oid;
*base_out = NULL;
  } else {
return 0;
  }

  if (use_island_marks && !in_same_island(&delta->idx.oid, base_oid))
return 0;

  return 1;

That's less repetitive, but I feel like it's harder to follow which
variables are valid when.

> > +   if (can_reuse_delta(base_ref, entry, &base_entry)) {
> > oe_set_type(entry, entry->in_pack_type);
> > SET_SIZE(entry, in_pack_size); /* delta size */
> > SET_DELTA_SIZE(entry, in_pack_size);
> 
> Without being at all familiar with this code, this looks sane to me.
> Just had a small nit about the missing closing ')'.

Thanks for the review!

-Peff


Re: [PATCH] pack-objects: handle island check for "external" delta base

2018-09-19 Thread Martin Ågren
On Wed, 19 Sep 2018 at 05:49, Jeff King  wrote:
> This is tricky to do inside a single "if" statement. And
> after the merge in f3504ea3dd (Merge branch
> 'cc/delta-islands', 2018-09-17), that "if" condition is
> already getting pretty unwieldy. So this patch moves the
> logic into a helper function, where we can easily use
> multiple return paths. The result is a bit longer, but the
> logic should be much easier to follow.

> +static int can_reuse_delta(const unsigned char *base_sha1,
> +  struct object_entry *delta,
> +  struct object_entry **base_out)
> +{
> +   struct object_entry *base;
> +
> +   if (!base_sha1)
> +   return 0;

So this corresponds to "if (base_ref &&".

> +   /*
> +* First see if we're already sending the base (or it's explicitly in
> +* our "excluded" list.
> +*/

Missing ')'.

> +   base = packlist_find(&to_pack, base_sha1, NULL);
> +   if (base) {
> +   if (!in_same_island(&delta->idx.oid, &base->idx.oid))
> +   return 0;

This logic matches the removed code...

> +   *base_out = base;
> +   return 1;
> +   }
> +
> +   /*
> +* Otherwise, reachability bitmaps may tell us if the receiver has it,
> +* even if it was buried too deep in history to make it into the
> +* packing list.
> +*/
> +   if (thin && bitmap_has_sha1_in_uninteresting(bitmap_git, base_sha1)) {

This matches...

> +   if (use_delta_islands) {
> +   struct object_id base_oid;
> +   hashcpy(base_oid.hash, base_sha1);
> +   if (!in_same_island(&delta->idx.oid, &base_oid))
> +   return 0;

This does some extra juggling to avoid using `base->idx.oid`, which
would have been the moral equivalent of the original code, but which
won't fly since `base` is NULL.

> +   }
> +   *base_out = NULL;
> +   return 1;
> +   }
> +
> +   return 0;
> +}
> +
>  static void check_object(struct object_entry *entry)
>  {
> unsigned long canonical_size;
> @@ -1556,22 +1607,7 @@ static void check_object(struct object_entry *entry)
> break;
> }
>
> -   if (base_ref && (
> -   (base_entry = packlist_find(&to_pack, base_ref, NULL)) ||
> -   (thin &&
> -bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref))) 
> &&
> -   in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {

Yeah, the new function looks much simpler than this. We have

  if (A && (B1 || B2) && C) {.

Knowing what to look for, it can be seen that we can -- under the right
circumstances -- have A and B2, but not B1, and try to evalute C by
dereferencing `base_entry` which will be NULL.

> +   if (can_reuse_delta(base_ref, entry, &base_entry)) {
> oe_set_type(entry, entry->in_pack_type);
> SET_SIZE(entry, in_pack_size); /* delta size */
> SET_DELTA_SIZE(entry, in_pack_size);

Without being at all familiar with this code, this looks sane to me.
Just had a small nit about the missing closing ')'.

Martin