Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-30 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> As for how this patch set (excluding the partialclone patch) interacts
> with my fsck series, they are relatively independent, as far as I can
> tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
> the fetch and clone parts, which we plan to instead adapt from Jeff
> Hostetler's patches, as far as I know) on top of these and resend
> those out once discussion on this has settled.

Selfishly, I'll make a request here.  The only part of the series that
overlaps is the max-blob-bytes part, right?  Would you mind re-sending
the remainder of the series so it can go through the "next" ->
"master" -> etc process in the usual way?

My selfishness comes in because this would reduce the set of patches I
have to apply locally to just the max-blob-bytes part.  If I
understood correctly, the rest of the series was something everyone
agreed about, so there's no reason not to pursue including it in
"next".

I'd have the same request for this object filtering series, but I
think it's already happening: the patches in this thread so far do not
allow omitting some blobs from the local object store, so they should
be able to go through the "next" -> "master" -> etc process as well
without having to wait for the fsck patches.

Thanks,
Jonathan


Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Junio C Hamano
Jeff Hostetler  writes:

> A question of mailing-list etiquette: in patch 9, I took Jonathan's
> ideas for adding the "extensions.partialclone" setting and extended it
> with some helper functions.  His change was part of a larger change
> with other code (fsck, IIRC) that I wasn't ready for.  What is the
> preferred way to give credit for something like this?

I think the note you left in the proposed log message

This patch is part of a patch originally authored by:
Jonathan Tan 

was a bit misleading.  The phrasing makes it sound as if it is
more-or-less verbatim copy (either of the whole thing or just a
subset) of Jonathan's patch, in which case, keeping the authorship
intact, i.e.

From: Jonathan Tan 

... log message taken from the original, with additional
... note to describe any adjustment you did

Signed-off-by: Jonathan Tan 
Signed-off-by: Jeff Hostetler 

would have been more appropriate.  But if you just were inspired by
the idea in his patch and wrote a one that is similar to but
different from it that suits the need of your series better, then a
note left in the log that instead does s/is part of/was inspired by/
would have been perfectly fine.

Thanks.





Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Junio C Hamano
Jonathan Tan  writes:

> On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:
>> OK, thanks for working well together.  So does this (1) build on
>> Jonathan's fsck-squelching series, or (2) ignores that and builds
>> filtering first, potentially leaving the codebase to a broken state
>> where it can create fsck-unclean repository until Jonathan's series
>> is rebased on top of this, or (3) something else?  [*1*]
>
> Excluding the partialclone patch (patch 9), I think that the answer is
> (2), but I don't think that it leaves the codebase in a broken state.
> In particular, none of the code modifies the repo, so it can't create
> a fsck-unclean one.

OK.  It is not dangerous enough to matter until we start using the
updated features in repack->rev-list|pack-objects ;-)  As I said, I
was mostly interested in learning what the big-picture direction was
and also seeing you two were more-or-less in agreement.

> The above is relevant only if we can exclude the partialclone patch,
> but I think that we can and we should, as I wrote in my reply to Jeff
> Hostetler [1].

OK.

> As for how this patch set (excluding the partialclone patch) interacts
> with my fsck series, they are relatively independent, as far as I can
> tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
> the fetch and clone parts, which we plan to instead adapt from Jeff
> Hostetler's patches, as far as I know) on top of these and resend
> those out once discussion on this has settled.

OK.  Thanks, I think tht is a reasonable way forward.

> [1] 
> https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/
>
>> I also saw a patch marked as "this is from Jonathan's earlier work",
>> taking the authorship (which to me implies that the changes were
>> extensive enough), so I am a bit at loss envisioning how this piece
>> fits in the bigger picture together with the other piece.
>
> The patch you mentioned is the partialclone patch, which I think can
> be considered separately from the rest (as I said above).

Good, that lets us sidestep Jeff's question about "how should the
credits for this change attributed?", too.


Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-25 Thread Jeff Hostetler



On 10/25/2017 2:46 AM, Jonathan Tan wrote:

On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:

OK, thanks for working well together.  So does this (1) build on
Jonathan's fsck-squelching series, or (2) ignores that and builds
filtering first, potentially leaving the codebase to a broken state
where it can create fsck-unclean repository until Jonathan's series
is rebased on top of this, or (3) something else?  [*1*]


Excluding the partialclone patch (patch 9), I think that the answer is
(2), but I don't think that it leaves the codebase in a broken state.
In particular, none of the code modifies the repo, so it can't create
a fsck-unclean one.


My part 1 series starts with filtering, rev-list, and pack-objects.
So, yes, it add new features that no one will use yet.  But it is useful
by itself.  For example, you can use rev-list to ask for the set of
missing objects that you would need to do a checkout (assuming you had
commits and trees, but no blobs or no large blobs) *before* actually
starting the checkout.

I was then going to lay Jonathan's fsck/gc/dynamic object fetch
on top of that.  I started that here:
https://github.com/jeffhostetler/git/pull/7

Patch 9 just adds the "extensions.partialclone*" fields and is prep
for my rev-list and his fsck changes.  I included it sooner rather than
later so I can test rev-list on a repo with hand-deleted blobs.
But yes, it can be omitted from this series and included with the fsck
changes.




Maybe one could say that this leaves the codebase with features that
no one will ever use in the absence of partial clone, but I don't
think that's true - rev-list with blob-size/sparse-specification
filter seems independently useful, at least (for example, when
desiring to operate on a repo in a sparse way without going through a
workdir), and if we're planning to allow listing of objects, we
probably should allow packing as well (especially since this doesn't
add much code).

The above is relevant only if we can exclude the partialclone patch,
but I think that we can and we should, as I wrote in my reply to Jeff
Hostetler [1].

As for how this patch set (excluding the partialclone patch) interacts
with my fsck series, they are relatively independent, as far as I can
tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
the fetch and clone parts, which we plan to instead adapt from Jeff
Hostetler's patches, as far as I know) on top of these and resend
those out once discussion on this has settled.


Yes, I want to get Jonathan's fsck/gc/lazy changes built into part 2.
They came over easily and are independent of how/why there are missing
objects.

For part 3, I'd like to take my version of clone, fetch, fetch-pack,
and upload-pack (that talks with the filters from part 1) and incorporate
Jonathan's promisor concepts.  That merge is a little messier, so I'd
like to parts 1 and 2 a chance to get some feedback first.



[1] 
https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/


I also saw a patch marked as "this is from Jonathan's earlier work",
taking the authorship (which to me implies that the changes were
extensive enough), so I am a bit at loss envisioning how this piece
fits in the bigger picture together with the other piece.


The patch you mentioned is the partialclone patch, which I think can
be considered separately from the rest (as I said above).


A question of mailing-list etiquette: in patch 9, I took Jonathan's
ideas for adding the "extensions.partialclone" setting and extended it
with some helper functions.  His change was part of a larger change
with other code (fsck, IIRC) that I wasn't ready for.  What is the
preferred way to give credit for something like this?


Thanks
Jeff




Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano  wrote:
> OK, thanks for working well together.  So does this (1) build on
> Jonathan's fsck-squelching series, or (2) ignores that and builds
> filtering first, potentially leaving the codebase to a broken state
> where it can create fsck-unclean repository until Jonathan's series
> is rebased on top of this, or (3) something else?  [*1*]

Excluding the partialclone patch (patch 9), I think that the answer is
(2), but I don't think that it leaves the codebase in a broken state.
In particular, none of the code modifies the repo, so it can't create
a fsck-unclean one.

Maybe one could say that this leaves the codebase with features that
no one will ever use in the absence of partial clone, but I don't
think that's true - rev-list with blob-size/sparse-specification
filter seems independently useful, at least (for example, when
desiring to operate on a repo in a sparse way without going through a
workdir), and if we're planning to allow listing of objects, we
probably should allow packing as well (especially since this doesn't
add much code).

The above is relevant only if we can exclude the partialclone patch,
but I think that we can and we should, as I wrote in my reply to Jeff
Hostetler [1].

As for how this patch set (excluding the partialclone patch) interacts
with my fsck series, they are relatively independent, as far as I can
tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not
the fetch and clone parts, which we plan to instead adapt from Jeff
Hostetler's patches, as far as I know) on top of these and resend
those out once discussion on this has settled.

[1] 
https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/

> I also saw a patch marked as "this is from Jonathan's earlier work",
> taking the authorship (which to me implies that the changes were
> extensive enough), so I am a bit at loss envisioning how this piece
> fits in the bigger picture together with the other piece.

The patch you mentioned is the partialclone patch, which I think can
be considered separately from the rest (as I said above).


Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Junio C Hamano
Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> I've been working with Jonathan Tan to combine our partial clone
> proposals.  This patch series represents a first step in that effort
> and introduces an object filtering mechanism to select unwanted
> objects.
>
> [1] traverse_commit_list and list-objects is extended to allow
> various filters.
> [2] rev-list is extended to expose filtering.  This allows testing
> of the filtering options.  And can be used later to predict
> missing objects before commands like checkout or merge.
> [3] pack-objects is extended to use filtering parameters and build
> packfiles that omit unwanted objects.
>
> This patch series lays the ground work for subsequent parts which
> will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.

OK, thanks for working well together.  So does this (1) build on
Jonathan's fsck-squelching series, or (2) ignores that and builds
filtering first, potentially leaving the codebase to a broken state
where it can create fsck-unclean repository until Jonathan's series
is rebased on top of this, or (3) something else?  [*1*]

I also saw a patch marked as "this is from Jonathan's earlier work",
taking the authorship (which to me implies that the changes were
extensive enough), so I am a bit at loss envisioning how this piece
fits in the bigger picture together with the other piece.


[Footnote]

*1* Not having the answer to this question does bother me, but it is
perfectly fine if the answer is (2), especially while the series
is in a WIP state.



Re: [PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jonathan Tan
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler  wrote:
> From: Jeff Hostetler 
>
> I've been working with Jonathan Tan to combine our partial clone
> proposals.  This patch series represents a first step in that effort
> and introduces an object filtering mechanism to select unwanted
> objects.
>
> [1] traverse_commit_list and list-objects is extended to allow
> various filters.
> [2] rev-list is extended to expose filtering.  This allows testing
> of the filtering options.  And can be used later to predict
> missing objects before commands like checkout or merge.
> [3] pack-objects is extended to use filtering parameters and build
> packfiles that omit unwanted objects.
>
> This patch series lays the ground work for subsequent parts which
> will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.

Thanks - I've taken a look at all of them except for the partialclone
extension one, which I've only glanced over briefly. Apart from some
style issues (indentation and typedef-ing of enums) I think that they
generally look all right.

One possible issue with using a formatted filter string (e.g.
blob:none) directly passed to the server as opposed to actual
client-interpreted flags (e.g. --blob-byte-limit=0) is that a user may
be confused if the version of Git they're using supports fancier
filters, which will work if they're running rev-list locally, but not
when they're fetching from a not-so-fancy Git server. Maybe that is
fine, though - something we've discussed internally is an ability to
offload some calculations (e.g. git log -S) to Git servers, and if we
end up doing something similar to that, users will need to deal with
this problem anyway.

The reason why I only glanced over the partialclone patch is because I
think that that change needs more discussion than the rest, and it
would be good to get the others in first. I know that you included the
partialclone patch because it is needed by the rev-list one, but as I
commented in the rev-list one, I think that it does not need to be
aware of partial clones, at least for now. The overall ideas in the
partialclone patch seem good, though - in particular, one conceptual
difference from my patch [1] is that the filter setting is a property
of the repository as opposed to the remote, which does seem like an
improvement, because it makes it easier to make and explain e.g. a
"git log --use-repo-filter -S" command that uses the preconfigured
config.

[1] 
https://public-inbox.org/git/407a298b52a9e0a2ee4135fe844e35b9a14c0f7b.1506714999.git.jonathanta...@google.com/


[PATCH 00/13] WIP Partial clone part 1: object filtering

2017-10-24 Thread Jeff Hostetler
From: Jeff Hostetler 

I've been working with Jonathan Tan to combine our partial clone
proposals.  This patch series represents a first step in that effort
and introduces an object filtering mechanism to select unwanted
objects.

[1] traverse_commit_list and list-objects is extended to allow
various filters.
[2] rev-list is extended to expose filtering.  This allows testing
of the filtering options.  And can be used later to predict
missing objects before commands like checkout or merge.
[3] pack-objects is extended to use filtering parameters and build
packfiles that omit unwanted objects.

This patch series lays the ground work for subsequent parts which
will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.


Jeff Hostetler (13):
  dir: allow exclusions from blob in addition to file
  list-objects-filter-map: extend oidmap to collect omitted objects
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-blobs-none: add filter to omit all blobs
  list-objects-filter-blobs-limit: add large blob filtering
  list-objects-filter-sparse: add sparse filter
  list-objects-filter-options: common argument parsing
  list-objects: add traverse_commit_list_filtered method
  extension.partialclone: introduce partial clone extension
  rev-list: add list-objects filtering support
  t6112: rev-list object filtering test
  pack-objects: add list-objects filtering
  t5317: pack-objects object filtering test

 Documentation/git-pack-objects.txt |   8 +-
 Documentation/git-rev-list.txt |   5 +-
 Documentation/rev-list-options.txt |  30 ++
 Documentation/technical/repository-version.txt |  22 ++
 Makefile   |   6 +
 builtin/pack-objects.c |  18 +-
 builtin/rev-list.c |  84 +-
 cache.h|   4 +
 config.h   |   3 +
 dir.c  |  51 +++-
 dir.h  |   3 +
 environment.c  |   2 +
 list-objects-filter-blobs-limit.c  | 146 ++
 list-objects-filter-blobs-limit.h  |  18 ++
 list-objects-filter-blobs-none.c   |  83 ++
 list-objects-filter-blobs-none.h   |  18 ++
 list-objects-filter-map.c  |  63 
 list-objects-filter-map.h  |  26 ++
 list-objects-filter-options.c  | 101 +++
 list-objects-filter-options.h  |  50 
 list-objects-filter-sparse.c   | 241 
 list-objects-filter-sparse.h   |  30 ++
 list-objects.c | 111 +--
 list-objects.h |  43 ++-
 partial-clone-utils.c  |  99 +++
 partial-clone-utils.h  |  34 +++
 setup.c|  15 +
 t/t5317-pack-objects-filter-objects.sh | 384 +
 t/t6112-rev-list-filters-objects.sh| 223 ++
 29 files changed, 1897 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filter-blobs-limit.c
 create mode 100644 list-objects-filter-blobs-limit.h
 create mode 100644 list-objects-filter-blobs-none.c
 create mode 100644 list-objects-filter-blobs-none.h
 create mode 100644 list-objects-filter-map.c
 create mode 100644 list-objects-filter-map.h
 create mode 100644 list-objects-filter-options.c
 create mode 100644 list-objects-filter-options.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h
 create mode 100755 t/t5317-pack-objects-filter-objects.sh
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3