Re: [PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-11 Thread Jonathan Tan
On Fri, 8 Dec 2017 14:30:10 -0800
Brandon Williams  wrote:

> I just finished reading through parts 1-3.  Overall I like the series.
> There are a few points that I'm not a big fan of but I wasn't able to
> come up with a better alternative.  One of these being the need for a
> global variable to tell the fetch-object logic to not go to the server
> to try and fetch a missing object.

I didn't really like that approach either but I went with that because,
like you, I couldn't come up with a better one. The main issue is that
too many functions (e.g. parse_commit() in commit.c) indirectly read
objects, and I couldn't find a better way to control them all. Ideally,
we should have a "struct object_store" (or maybe "struct repository"
could do this too) on which we can set "fetch_if_missing", and have all
object-reading functions take a pointer to this struct. Or completely
separate the object-reading and object-parsing code (e.g. commit.c
should not be able to read objects at all). Or both.

Any of these would be major undertakings, though, and there are good
reasons why the same function does the reading and parsing (for
example, parse_commit() does not perform any reading if the object has
already been parsed).
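
To make the first alternative concrete, here is a minimal sketch of a
per-store flag that an object-reading function consults. It is a toy
model, not code from the series: "struct object_store" is only the name
floated above, and has_local(), fetch_from_promisor(), read_object() and
the fixed-size array are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed "struct object_store". */
struct object_store {
	bool fetch_if_missing;  /* per-store replacement for the global */
	const char *local[4];   /* names of objects present locally */
	size_t nr_local;
	int remote_requests;    /* counts simulated trips to the server */
};

static bool has_local(const struct object_store *store, const char *name)
{
	for (size_t i = 0; i < store->nr_local; i++)
		if (!strcmp(store->local[i], name))
			return true;
	return false;
}

/* Simulated lazy fetch: pretend the server always has the object. */
static void fetch_from_promisor(struct object_store *store, const char *name)
{
	assert(store->nr_local < 4);
	store->remote_requests++;
	store->local[store->nr_local++] = name;
}

/* Returns true if the object is available once the call returns. */
static bool read_object(struct object_store *store, const char *name)
{
	if (has_local(store, name))
		return true;
	if (!store->fetch_if_missing)
		return false;  /* caller asked us not to contact the server */
	fetch_from_promisor(store, name);
	return has_local(store, name);
}
```

With this shape, a caller that must not trigger a fetch clears the flag
on its own store instead of flipping process-wide state. The catch is
the one described above: every function that indirectly reads objects
would need to take the pointer.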

> One other thing I noticed was it looks like when you discover that you
> are missing a blob you'll try to fault it in from the server without
> first checking it's an object the server would even have.  Shouldn't you
> first do a check to verify that the object in question is a promised
> object before you go out to contact the server to request it?  You may
> have already ruled this out for some reason I'm not aware of (maybe it's
> too costly to compute?).

It is quite costly to compute: in the worst case, we would need to read
every object of the relevant type(s) in every promisor packfile (e.g. if
we know that we're fetching a blob, we need to read every tree) to find
out if the object we want is a promisor object.
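
As a rough illustration of why that check is linear in the number of
trees, here is a toy model. It is not Git's pack format or any function
from the series; the types and is_promised_blob() are invented for the
example, and each object holds at most two references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	const char *refs[2];  /* names this object points at; NULL if unused */
};

struct toy_pack {
	const struct toy_object *objects;
	size_t nr;
};

/*
 * Returns true if some tree in the given promisor packs references
 * `blob`. *objects_read is set to the number of trees that had to be
 * parsed to reach the answer.
 */
bool is_promised_blob(const struct toy_pack *packs, size_t nr_packs,
		      const char *blob, size_t *objects_read)
{
	assert(objects_read);
	*objects_read = 0;
	for (size_t p = 0; p < nr_packs; p++) {
		for (size_t i = 0; i < packs[p].nr; i++) {
			const struct toy_object *o = &packs[p].objects[i];

			if (o->type != TOY_TREE)
				continue;  /* only trees can reference blobs */
			(*objects_read)++;
			for (size_t r = 0; r < 2 && o->refs[r]; r++)
				if (!strcmp(o->refs[r], blob))
					return true;
		}
	}
	return false;
}
```

A miss is the worst case: every tree in every pack gets parsed before we
can say the blob was never promised.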

Such a check would be better at surfacing mistakes (e.g. the user giving
the wrong SHA-1) early, but beyond that, I don't think that having the
check is very important. Consider these two very common situations:

 (1) Fetching a single branch by its tip's SHA-1. A naive implementation
     will first check if we have that SHA-1, which triggers the dynamic
     fetch (since it is an object read), and assuming success, notice
     that we indeed have that tip, and not fetch anything else. The
     check you describe will avoid this situation.
 (2) Dynamically fetching a missing blob by its SHA-1. A naive
     implementation will first check if we have that SHA-1, which
     triggers the dynamic fetch, and that fetch will first check if we
     have that SHA-1, and so on (thus, an infinite loop). The check you
     describe will not avoid that situation.

The check solves (1), but we still need a solution to (2) - I used
"fetch_if_missing", as discussed in your previous question and my answer
to that. A solution to (2) is usually also a solution to (1), so the
check wouldn't help much here.
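
A minimal sketch of how a flag like "fetch_if_missing" breaks the loop
in (2). It is a single-object toy model: only the flag name comes from
the series, while has_object(), fetch_object() and the counters are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

static bool fetch_if_missing = true;  /* mirrors the global discussed above */
static bool object_present;           /* toy state: is the object stored locally? */
static int fetch_calls;               /* how many times the fetch path ran */

static bool has_object(void);

static void fetch_object(void)
{
	/* With the guard in place, the fetch path is never re-entered. */
	assert(fetch_if_missing);
	bool saved = fetch_if_missing;

	fetch_calls++;
	fetch_if_missing = false;       /* reads inside the fetch must not recurse */
	if (!has_object())              /* the "do we already have it?" check */
		object_present = true;  /* simulate receiving it from the server */
	fetch_if_missing = saved;
}

static bool has_object(void)
{
	if (object_present)
		return true;
	if (fetch_if_missing) {
		fetch_object();         /* dynamic fetch triggered by an object read */
		return object_present;
	}
	return false;
}
```

Without the line that clears the flag, the has_object() call inside
fetch_object() would call fetch_object() again and never return. The
same guard covers (1): the existence check on the tip runs with the flag
cleared, so it does not fault the object in by itself.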


Re: [PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Brandon Williams
On 12/08, Jeff Hostetler wrote:
> From: Jeff Hostetler 
> 
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).
> 
> This version adds additional tests, fixes test errors on the Mac version,
> and squashes some fixup commits.
> 
> It also restores functionality accidentally dropped from the V6 series
> for "git fetch" to automatically inherit the partial-clone filter-spec
> when appropriate.  This version extends the --no-filter argument to
> override this inheritance.
> 

I just finished reading through parts 1-3.  Overall I like the series.
There are a few points that I'm not a big fan of but I wasn't able to
come up with a better alternative.  One of these being the need for a
global variable to tell the fetch-object logic to not go to the server
to try and fetch a missing object.

One other thing I noticed was it looks like when you discover that you
are missing a blob you'll try to fault it in from the server without
first checking it's an object the server would even have.  Shouldn't you
first do a check to verify that the object in question is a promised
object before you go out to contact the server to request it?  You may
have already ruled this out for some reason I'm not aware of (maybe it's
too costly to compute?).


-- 
Brandon Williams


Re: [PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Junio C Hamano
Jeff Hostetler  writes:

> On 12/8/2017 12:58 PM, Junio C Hamano wrote:
>> Jeff Hostetler  writes:
>>
>>> From: Jeff Hostetler 
>>>
>>> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
>>> (which builds upon V6 of part 1).
>>
>> Aren't the three patches at the bottom sort-of duplicate from the
>> part 2 series?
>>
>
> oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
> rather than pc*7*_p2..pc7_p3.  sorry for the typo.
>
> and since the only changes in p2 were to squash those 2 commits near
> the tip of p2, only those 3 commits changed SHAs in v7 over v6.
>
> so, please disregard the duplicates.
>
> would you like me to send a corrected V8 for p3 ?

Nah.  I just wanted to make sure that I am discarding the right ones
(i.e. 1-3/16 of partial-clone, not 8-10/10 of fsck-promisors).

Thanks for an update.



Re: [PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Jeff Hostetler
On 12/8/2017 12:58 PM, Junio C Hamano wrote:
> Jeff Hostetler  writes:
>
>> From: Jeff Hostetler
>>
>> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
>> (which builds upon V6 of part 1).
>
> Aren't the three patches at the bottom sort-of duplicate from the
> part 2 series?

oops.  yes, you're right.  it looks like i selected pc*6*_p2..pc7_p3
rather than pc*7*_p2..pc7_p3.  sorry for the typo.

and since the only changes in p2 were to squash those 2 commits near
the tip of p2, only those 3 commits changed SHAs in v7 over v6.

so, please disregard the duplicates.

would you like me to send a corrected V8 for p3 ?

Jeff


Re: [PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Junio C Hamano
Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
> (which builds upon V6 of part 1).

Aren't the three patches at the bottom sort-of duplicate from the
part 2 series?



[PATCH v7 00/16] Partial clone part 3: clone, fetch, fetch-pack, upload-pack, and tests

2017-12-08 Thread Jeff Hostetler
From: Jeff Hostetler 

This is V7 of part 3 of partial clone.  It builds upon V7 of part 2
(which builds upon V6 of part 1).

This version adds additional tests, fixes test errors on the Mac version,
and squashes some fixup commits.

It also restores functionality accidentally dropped from the V6 series
for "git fetch" to automatically inherit the partial-clone filter-spec
when appropriate.  This version extends the --no-filter argument to
override this inheritance.

Jeff Hostetler (8):
  upload-pack: add object filtering for partial clone
  fetch-pack, index-pack, transport: partial clone
  fetch-pack: add --no-filter
  fetch: support filters
  partial-clone: define partial clone settings in config
  t5616: end-to-end tests for partial clone
  fetch: inherit filter-spec from partial clone
  t5616: test bulk prefetch after partial fetch

Jonathan Tan (8):
  sha1_file: support lazily fetching missing objects
  rev-list: support termination at promisor objects
  gc: do not repack promisor packfiles
  fetch-pack: test support excluding large blobs
  fetch: refactor calculation of remote list
  clone: partial clone
  unpack-trees: batch fetching of missing blobs
  fetch-pack: restore save_commit_buffer after use

 Documentation/config.txt                          |   4 +
 Documentation/git-pack-objects.txt                |  11 ++
 Documentation/rev-list-options.txt                |  11 ++
 Documentation/technical/pack-protocol.txt         |   8 +
 Documentation/technical/protocol-capabilities.txt |   8 +
 builtin/cat-file.c                                |   2 +
 builtin/clone.c                                   |  22 ++-
 builtin/fetch-pack.c                              |  10 ++
 builtin/fetch.c                                   |  83 -
 builtin/fsck.c                                    |   3 +
 builtin/gc.c                                      |   3 +
 builtin/index-pack.c                              |   6 +
 builtin/pack-objects.c                            |  37 +++-
 builtin/prune.c                                   |   7 +
 builtin/repack.c                                  |   8 +-
 builtin/rev-list.c                                |  73 +++-
 cache.h                                           |   9 +
 config.c                                          |   5 +
 connected.c                                       |   2 +
 environment.c                                     |   1 +
 fetch-object.c                                    |  29 ++-
 fetch-object.h                                    |   5 +
 fetch-pack.c                                      |  17 ++
 fetch-pack.h                                      |   2 +
 list-objects-filter-options.c                     |  92 --
 list-objects-filter-options.h                     |  18 ++
 list-objects.c                                    |  29 ++-
 object.c                                          |   2 +-
 remote-curl.c                                     |   6 +
 revision.c                                        |  33 +++-
 revision.h                                        |   5 +-
 sha1_file.c                                       |  32 +++-
 t/t0410-partial-clone.sh                          | 206 +-
 t/t5500-fetch-pack.sh                             |  63 +++
 t/t5601-clone.sh                                  | 101 +++
 t/t5616-partial-clone.sh                          | 146 +++
 transport-helper.c                                |   5 +
 transport.c                                       |   4 +
 transport.h                                       |   5 +
 unpack-trees.c                                    |  22 +++
 upload-pack.c                                     |  31 +++-
 41 files changed, 1110 insertions(+), 56 deletions(-)
 create mode 100755 t/t5616-partial-clone.sh

-- 
2.9.3