Re: [PATCH v2] send-pack: never fetch when checking exclusions

2019-10-16 Thread Jeff King
On Fri, Oct 11, 2019 at 03:08:22PM -0700, Jonathan Tan wrote:

> > As a general rule (and why I'm raising this issue in reply to Jonathan's
> > patch), I think most or all sites that want OBJECT_INFO_QUICK will want
> > SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
> > the same:
> > 
> >   - it's OK to racily have a false negative (we'll still be correct, but
> >     possibly a little less optimal)
> > 
> >   - it's expected and normal to be missing the object, so spending time
> >     double-checking the pack store wastes measurable time in real-world
> >     cases
> 
> I took a look on "next" and it's true for these reasons in most cases
> but not all.

Thanks for digging into this.

> QUICK implies SKIP_FETCH_OBJECT:
> 
>   fetch-pack.c: Run with fetch_if_missing=0 (from builtin/fetch.c,
>   builtin/fetch-pack.c, or through a lazy fetch) so OK.
>   
>   builtin/index-pack.c: Run with fetch_if_missing=0, so OK.
>   
>   builtin/fetch.c: Run with fetch_if_missing=0, so OK.
>   
>   object-store.h, sha1-file.c: Definition and implementation of this
>   flag.

Right, I think going in this direction is pretty simple. Having been
marked with QUICK, they hit both of my points from above. And if we want
to avoid re-scanning the pack directory because of cost, we _definitely_
want to avoid making an expensive network call.

> Everything is OK here. Now, SKIP_FETCH_OBJECT implies QUICK:
> 
>   cache-tree.c: I added this recently in f981ec18cf ("cache-tree: do not
>   lazy-fetch tentative tree", 2019-09-09). No problem with a false
>   negative, since we know how to reconstruct the tree. OK.
> [...]
>   send-pack.c: This patch (which is already in "next"). If we have a
>   false negative, we might accidentally send more than we need. But that
>   is not too bad.

Yeah, I think both of these could be QUICK.

>   promisor-remote.c: This is the slightly tricky one. We use this
>   information to determine if we got our lazily-fetched object from the
>   most recent lazy fetch, or if we should continue attempting to fetch the
>   given object from other promisor remotes; so this information is
>   important. However, adding QUICK doesn't lose us anything because the
>   lack of QUICK only helps us when there is another process packing
>   loose objects: if we got our object, our object will be in a pack
>   (because of the way the fetch is implemented - in particular, we need
>   a pack because we need the ".promisor" file).
> 
> So everything is OK except for promisor-remote.c, but even that is OK
> for another reason.

Yeah, though I wouldn't be sad to see that use a separate flag, since it
really is about promisor logic.

That implies to me maybe we should be using QUICK more aggressively, and
QUICK should auto-imply SKIP_FETCH_OBJECT.

> Having said that, perhaps we should consider promisor-remote.c as
> low-level code and expect it to know that objects are fetched into a
> packfile (as opposed to loose objects), so it can safely use QUICK
> (which is documented as not retrying packed storage after checking
> packed and loose storage). If no one disagrees, I can make such a patch
> after jt/push-avoid-lazy-fetch is merged to master (as is the plan,
> according to What's Cooking [1]).

I think it's OK to continue leaving out QUICK there if it's not causing
problems. It really is a bit different than the other cases.

-Peff


Re: [PATCH v2] send-pack: never fetch when checking exclusions

2019-10-11 Thread Junio C Hamano
Jeff King writes:

> As a general rule (and why I'm raising this issue in reply to Jonathan's
> patch), I think most or all sites that want OBJECT_INFO_QUICK will want
> SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
> the same:
>
>   - it's OK to racily have a false negative (we'll still be correct, but
>     possibly a little less optimal)
>
>   - it's expected and normal to be missing the object, so spending time
>     double-checking the pack store wastes measurable time in real-world
>     cases

31f5256c ("sha1-file: split OBJECT_INFO_FOR_PREFETCH", 2019-05-28)
separated SKIP_FETCH_OBJECT out of FOR_PREFETCH, the latter of which
was and is SKIP_FETCH and QUICK combined.  Use of SKIP_FETCH_OBJECT
alone may need to be re-examined and discouraged?
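
For readers following the flag relationship described above, here is a
minimal C sketch; it is not code from the patch. The macro names come
from the thread, but the bit values are illustrative assumptions (the
authoritative definitions live in object-store.h), and
check_flag_layout() and skips_fetch_only() are hypothetical helpers.

```c
#include <assert.h>

/* Illustrative bit values; consult object-store.h for the real ones. */
#define OBJECT_INFO_QUICK             8
#define OBJECT_INFO_SKIP_FETCH_OBJECT 32
/* FOR_PREFETCH is simply the two flags combined. */
#define OBJECT_INFO_FOR_PREFETCH \
	(OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)

/* Hypothetical sanity check: the two flags are distinct bits. */
static void check_flag_layout(void)
{
	assert((OBJECT_INFO_QUICK & OBJECT_INFO_SKIP_FETCH_OBJECT) == 0);
	assert(OBJECT_INFO_FOR_PREFETCH ==
	       (OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT));
}

/*
 * Hypothetical helper: true when a caller passes SKIP_FETCH_OBJECT
 * without QUICK, which is the usage questioned above.
 */
static int skips_fetch_only(unsigned flags)
{
	return (flags & OBJECT_INFO_SKIP_FETCH_OBJECT) &&
	       !(flags & OBJECT_INFO_QUICK);
}
```

A check like skips_fetch_only() is only meant to make the audit
question concrete; whether such call sites should be discouraged is
what the thread is deciding.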



Re: [PATCH v2] send-pack: never fetch when checking exclusions

2019-10-11 Thread Jonathan Tan
> As a general rule (and why I'm raising this issue in reply to Jonathan's
> patch), I think most or all sites that want OBJECT_INFO_QUICK will want
> SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
> the same:
> 
>   - it's OK to racily have a false negative (we'll still be correct, but
>     possibly a little less optimal)
> 
>   - it's expected and normal to be missing the object, so spending time
>     double-checking the pack store wastes measurable time in real-world
>     cases

I took a look on "next" and it's true for these reasons in most cases
but not all.

QUICK implies SKIP_FETCH_OBJECT:

  fetch-pack.c: Run with fetch_if_missing=0 (from builtin/fetch.c,
  builtin/fetch-pack.c, or through a lazy fetch) so OK.
  
  builtin/index-pack.c: Run with fetch_if_missing=0, so OK.
  
  builtin/fetch.c: Run with fetch_if_missing=0, so OK.
  
  object-store.h, sha1-file.c: Definition and implementation of this
  flag.

Everything is OK here. Now, SKIP_FETCH_OBJECT implies QUICK:

  cache-tree.c: I added this recently in f981ec18cf ("cache-tree: do not
  lazy-fetch tentative tree", 2019-09-09). No problem with a false
  negative, since we know how to reconstruct the tree. OK.
  
  object-store.h, sha1-file.c: Definition and implementation of this
  flag.
  
  send-pack.c: This patch (which is already in "next"). If we have a
  false negative, we might accidentally send more than we need. But that
  is not too bad.
  
  promisor-remote.c: This is the slightly tricky one. We use this
  information to determine if we got our lazily-fetched object from the
  most recent lazy fetch, or if we should continue attempting to fetch the
  given object from other promisor remotes; so this information is
  important. However, adding QUICK doesn't lose us anything because the
  lack of QUICK only helps us when there is another process packing
  loose objects: if we got our object, our object will be in a pack
  (because of the way the fetch is implemented - in particular, we need
  a pack because we need the ".promisor" file).

So everything is OK except for promisor-remote.c, but even that is OK
for another reason.

Having said that, perhaps we should consider promisor-remote.c as
low-level code and expect it to know that objects are fetched into a
packfile (as opposed to loose objects), so it can safely use QUICK
(which is documented as not retrying packed storage after checking
packed and loose storage). If no one disagrees, I can make such a patch
after jt/push-avoid-lazy-fetch is merged to master (as is the plan,
according to What's Cooking [1]).

[1] https://public-inbox.org/git/xmqq8sprhgzc@gitster-ct.c.googlers.com/
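
To illustrate the promisor-remote.c case described above, here is a toy
C model; it is not the real implementation. After each lazy fetch it
does a local-only check to see which objects are still missing and asks
the next remote only for those. Every name here (toy_remote, toy_odb,
toy_in, toy_fetch, toy_get_direct) is hypothetical, and the toy assumes
fetched objects always land in packs.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 16

struct toy_remote {
	const char *has[TOY_MAX];    /* objects this remote can serve */
	int nr;
};

struct toy_odb {
	const char *packed[TOY_MAX]; /* objects that arrived in packs */
	int nr;
};

static int toy_in(const char **list, int nr, const char *oid)
{
	for (int i = 0; i < nr; i++)
		if (!strcmp(list[i], oid))
			return 1;
	return 0;
}

/* Toy fetch: every wanted object the remote has lands in the odb. */
static void toy_fetch(struct toy_remote *r, struct toy_odb *odb,
		      const char **want, int nr_want)
{
	for (int i = 0; i < nr_want; i++)
		if (toy_in(r->has, r->nr, want[i]) &&
		    !toy_in(odb->packed, odb->nr, want[i])) {
			assert(odb->nr < TOY_MAX);
			odb->packed[odb->nr++] = want[i];
		}
}

/*
 * Try each remote in turn, keeping only the still-missing objects.
 * Returns the number of objects that no remote could supply.
 */
static int toy_get_direct(struct toy_remote *remotes, int nr_remotes,
			  struct toy_odb *odb,
			  const char **want, int nr_want)
{
	const char *missing[TOY_MAX];
	int nr_missing = nr_want;

	assert(nr_want >= 0 && nr_want <= TOY_MAX);
	for (int i = 0; i < nr_want; i++)
		missing[i] = want[i];

	for (int r = 0; r < nr_remotes && nr_missing; r++) {
		int kept = 0;

		toy_fetch(&remotes[r], odb, missing, nr_missing);
		/* Local-only check: it must not trigger another fetch. */
		for (int i = 0; i < nr_missing; i++)
			if (!toy_in(odb->packed, odb->nr, missing[i]))
				missing[kept++] = missing[i];
		nr_missing = kept;
	}
	return nr_missing;
}
```

The local-only check in the loop is the spot the discussion is about:
its answer decides whether the next remote is consulted, so a false
negative there costs an extra fetch rather than correctness.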


Re: [PATCH v2] send-pack: never fetch when checking exclusions

2019-10-11 Thread Jeff King
On Fri, Oct 11, 2019 at 08:31:30AM -0400, Derrick Stolee wrote:

> >> Ensure that these lazy fetches do not occur.
> > 
> > That makes sense. For similar reasons, should we be using
> > OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that
> > we don't have, we'll end up re-scanning the pack directory over and over
> > (which is _usually_ pretty quick, but can be slow if you have a lot of
> > packs locally). And it's OK if we racily miss out on an exclusion due to
> > somebody else repacking simultaneously.
> 
> That's a good idea. We can hint to the object store that we don't expect
> misses to be due to a concurrent repack, so we don't want to reprepare
> pack-files.

As a general rule (and why I'm raising this issue in reply to Jonathan's
patch), I think most or all sites that want OBJECT_INFO_QUICK will want
SKIP_FETCH_OBJECT as well, and vice versa. The reasoning is generally
the same:

  - it's OK to racily have a false negative (we'll still be correct, but
    possibly a little less optimal)

  - it's expected and normal to be missing the object, so spending time
    double-checking the pack store wastes measurable time in real-world
    cases

-Peff
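
As a rough illustration of the two costs discussed above, here is a toy
C model of an object lookup; it is not Git's implementation. It counts
how often a miss leads to a pack re-scan or a lazy fetch. The names
toy_store, toy_find, toy_has_object, TOY_QUICK and
TOY_SKIP_FETCH_OBJECT are all hypothetical stand-ins, and the toy fetch
never succeeds.

```c
#include <assert.h>
#include <string.h>

#define TOY_QUICK             (1u << 0) /* do not re-scan packs on a miss */
#define TOY_SKIP_FETCH_OBJECT (1u << 1) /* do not lazy-fetch on a miss */

struct toy_store {
	const char *known[8]; /* object names present locally */
	int nr_known;
	int rescans;          /* times the pack directory was re-scanned */
	int fetches;          /* times a lazy fetch was attempted */
};

static int toy_find(const struct toy_store *s, const char *name)
{
	for (int i = 0; i < s->nr_known; i++)
		if (!strcmp(s->known[i], name))
			return 1;
	return 0;
}

/* Returns 1 if the object is found locally, 0 otherwise. */
static int toy_has_object(struct toy_store *s, const char *name,
			  unsigned flags)
{
	assert(s && name);
	if (toy_find(s, name))
		return 1;
	if (!(flags & TOY_QUICK)) {
		s->rescans++; /* stands in for re-preparing packs */
		if (toy_find(s, name))
			return 1;
	}
	if (!(flags & TOY_SKIP_FETCH_OBJECT))
		s->fetches++; /* stands in for a network round trip */
	return 0;
}
```

With both flags set, a miss costs one local lookup and nothing else,
which is the behavior wanted when missing objects are the normal case.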


Re: [PATCH v2] send-pack: never fetch when checking exclusions

2019-10-11 Thread Derrick Stolee
On 10/11/2019 2:12 AM, Jeff King wrote:
> On Tue, Oct 08, 2019 at 11:37:39AM -0700, Jonathan Tan wrote:
> 
>> When building the packfile to be sent, send_pack() is given a list of
>> remote refs to be used as exclusions. For each ref, it first checks if
>> the ref exists locally, and if it does, passes it with a "^" prefix to
>> pack-objects. However, in a partial clone, the check may trigger a lazy
>> fetch.
>>
>> The additional commit ancestry information obtained during such fetches
>> may show that certain objects that would have been sent are already
>> known to the server, resulting in a smaller pack being sent. But this is
>> at the cost of fetching from many possibly unrelated refs, and the lazy
>> fetches do not help at all in the typical case where the client is
>> up-to-date with the upstream of the branch being pushed.
>>
>> Ensure that these lazy fetches do not occur.
> 
> That makes sense. For similar reasons, should we be using
> OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that
> we don't have, we'll end up re-scanning the pack directory over and over
> (which is _usually_ pretty quick, but can be slow if you have a lot of
> packs locally). And it's OK if we racily miss out on an exclusion due to
> somebody else repacking simultaneously.

That's a good idea. We can hint to the object store that we don't expect
misses to be due to a concurrent repack, so we don't want to reprepare
pack-files.

-Stolee



Re: [PATCH v2] send-pack: never fetch when checking exclusions

2019-10-10 Thread Jeff King
On Tue, Oct 08, 2019 at 11:37:39AM -0700, Jonathan Tan wrote:

> When building the packfile to be sent, send_pack() is given a list of
> remote refs to be used as exclusions. For each ref, it first checks if
> the ref exists locally, and if it does, passes it with a "^" prefix to
> pack-objects. However, in a partial clone, the check may trigger a lazy
> fetch.
> 
> The additional commit ancestry information obtained during such fetches
> may show that certain objects that would have been sent are already
> known to the server, resulting in a smaller pack being sent. But this is
> at the cost of fetching from many possibly unrelated refs, and the lazy
> fetches do not help at all in the typical case where the client is
> up-to-date with the upstream of the branch being pushed.
> 
> Ensure that these lazy fetches do not occur.

That makes sense. For similar reasons, should we be using
OBJECT_INFO_QUICK here? If the other side has a bunch of ref tips that
we don't have, we'll end up re-scanning the pack directory over and over
(which is _usually_ pretty quick, but can be slow if you have a lot of
packs locally). And it's OK if we racily miss out on an exclusion due to
somebody else repacking simultaneously.

-Peff
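
For context, the exclusion step described in the quoted commit message
can be sketched in C as follows. This is a simplified model, not the
send-pack.c code: toy_have_locally() and toy_feed_exclusions() are
hypothetical, object names are plain strings, and the local check is a
linear search that by construction never fetches or re-scans anything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical local-only lookup: never fetches, never re-scans. */
static int toy_have_locally(const char **local, int nr, const char *oid)
{
	for (int i = 0; i < nr; i++)
		if (!strcmp(local[i], oid))
			return 1;
	return 0;
}

/*
 * Write "^<oid>\n" into out for each remote tip we already have and
 * skip the rest. Returns the number of exclusions written.
 */
static int toy_feed_exclusions(const char **remote, int nr_remote,
			       const char **local, int nr_local,
			       char *out, size_t outsz)
{
	int written = 0;
	size_t used = 0;

	assert(out && outsz > 0);
	out[0] = '\0';
	for (int i = 0; i < nr_remote; i++) {
		int n;

		if (!toy_have_locally(local, nr_local, remote[i]))
			continue; /* missing locally: no fetch, no exclusion */
		n = snprintf(out + used, outsz - used, "^%s\n", remote[i]);
		if (n < 0 || (size_t)n >= outsz - used) {
			out[used] = '\0'; /* drop the partial line */
			break;
		}
		used += (size_t)n;
		written++;
	}
	return written;
}
```

A remote tip that is missing locally is simply not excluded, so the
worst case is a somewhat larger pack, never an incorrect one.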