Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
Hi, Jonathan Tan wrote: > As for how this patch set (excluding the partialclone patch) interacts > with my fsck series, they are relatively independent, as far as I can > tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not > the fetch and clone parts, which we plan to instead adapt from Jeff > Hostetler's patches, as far as I know) on top of these and resend > those out once discussion on this has settled. Selfishly, I'll make a request here. The only part of the series that overlaps is the max-blob-bytes part, right? Would you mind re-sending the remainder of the series so it can go through the "next" -> "master" -> etc process in the usual way? My selfishness comes in because this would reduce the set of patches I have to apply locally to just the max-blob-bytes part. If I understood correctly, the rest of the series was something everyone agreed about, so there's no reason not to pursue including it in "next". I'd have the same request for this object filtering series, but I think it's already happening: the patches in this thread so far do not allow omitting some blobs from the local object store, so they should be able to go through the "next" -> "master" -> etc process as well without having to wait for the fsck patches. Thanks, Jonathan
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
Jeff Hostetler writes: > A question of mailing-list etiquette: in patch 9, I took Jonathan's > ideas for adding the "extensions.partialclone" setting and extended it > with some helper functions. His change was part of a larger change > with other code (fsck, IIRC) that I wasn't ready for. What is the > preferred way to give credit for something like this? I think the note you left in the proposed log message This patch is part of a patch originally authored by: Jonathan Tan was a bit misleading. The phrasing makes it sound as if it is more-or-less verbatim copy (either of the whole thing or just a subset) of Jonathan's patch, in which case, keeping the authorship intact, i.e. From: Jonathan Tan ... log message taken from the original, with additional ... note to describe any adjustment you did Signed-off-by: Jonathan Tan Signed-off-by: Jeff Hostetler would have been more appropriate. But if you just were inspired by the idea in his patch and wrote a one that is similar to but different from it that suits the need of your series better, then a note left in the log that instead does s/is part of/was inspired by/ would have been perfectly fine. Thanks.
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
Jonathan Tan writes: > On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano wrote: >> OK, thanks for working well together. So does this (1) build on >> Jonathan's fsck-squelching series, or (2) ignores that and builds >> filtering first, potentially leaving the codebase to a broken state >> where it can create fsck-unclean repository until Jonathan's series >> is rebased on top of this, or (3) something else? [*1*] > > Excluding the partialclone patch (patch 9), I think that the answer is > (2), but I don't think that it leaves the codebase in a broken state. > In particular, none of the code modifies the repo, so it can't create > a fsck-unclean one. OK. It is not dangerous enough to matter until we start using the updated features in repack->rev-list|pack-objects ;-) As I said, I was mostly interested in learning what the big-picture direction was and also seeing you two were more-or-less in agreement. > The above is relevant only if we can exclude the partialclone patch, > but I think that we can and we should, as I wrote in my reply to Jeff > Hostetler [1]. OK. > As for how this patch set (excluding the partialclone patch) interacts > with my fsck series, they are relatively independent, as far as I can > tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not > the fetch and clone parts, which we plan to instead adapt from Jeff > Hostetler's patches, as far as I know) on top of these and resend > those out once discussion on this has settled. OK. Thanks, I think tht is a reasonable way forward. > [1] > https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/ > >> I also saw a patch marked as "this is from Jonathan's earlier work", >> taking the authorship (which to me implies that the changes were >> extensive enough), so I am a bit at loss envisioning how this piece >> fits in the bigger picture together with the other piece. > > The patch you mentioned is the partialclone patch, which I think can > be considered separately from the rest (as I said above). Good, that lets us sidestep Jeff's question about "how should the credits for this change attributed?", too.
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
On 10/25/2017 2:46 AM, Jonathan Tan wrote: On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano wrote: OK, thanks for working well together. So does this (1) build on Jonathan's fsck-squelching series, or (2) ignores that and builds filtering first, potentially leaving the codebase to a broken state where it can create fsck-unclean repository until Jonathan's series is rebased on top of this, or (3) something else? [*1*] Excluding the partialclone patch (patch 9), I think that the answer is (2), but I don't think that it leaves the codebase in a broken state. In particular, none of the code modifies the repo, so it can't create a fsck-unclean one. My part 1 series starts with filtering, rev-list, and pack-objects. So, yes, it add new features that no one will use yet. But it is useful by itself. For example, you can use rev-list to ask for the set of missing objects that you would need to do a checkout (assuming you had commits and trees, but no blobs or no large blobs) *before* actually starting the checkout. I was then going to lay Jonathan's fsck/gc/dynamic object fetch on top of that. I started that here: https://github.com/jeffhostetler/git/pull/7 Patch 9 just adds the "extensions.partialclone*" fields and is prep for my rev-list and his fsck changes. I included it sooner rather than later so I can test rev-list on a repo with hand-deleted blobs. But yes, it can be omitted from this series and included with the fsck changes. Maybe one could say that this leaves the codebase with features that no one will ever use in the absence of partial clone, but I don't think that's true - rev-list with blob-size/sparse-specification filter seems independently useful, at least (for example, when desiring to operate on a repo in a sparse way without going through a workdir), and if we're planning to allow listing of objects, we probably should allow packing as well (especially since this doesn't add much code). The above is relevant only if we can exclude the partialclone patch, but I think that we can and we should, as I wrote in my reply to Jeff Hostetler [1]. As for how this patch set (excluding the partialclone patch) interacts with my fsck series, they are relatively independent, as far as I can tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not the fetch and clone parts, which we plan to instead adapt from Jeff Hostetler's patches, as far as I know) on top of these and resend those out once discussion on this has settled. Yes, I want to get Jonathan's fsck/gc/lazy changes built into part 2. They came over easily and are independent of how/why there are missing objects. For part 3, I'd like to take my version of clone, fetch, fetch-pack, and upload-pack (that talks with the filters from part 1) and incorporate Jonathan's promisor concepts. That merge is a little messier, so I'd like to parts 1 and 2 a chance to get some feedback first. [1] https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/ I also saw a patch marked as "this is from Jonathan's earlier work", taking the authorship (which to me implies that the changes were extensive enough), so I am a bit at loss envisioning how this piece fits in the bigger picture together with the other piece. The patch you mentioned is the partialclone patch, which I think can be considered separately from the rest (as I said above). A question of mailing-list etiquette: in patch 9, I took Jonathan's ideas for adding the "extensions.partialclone" setting and extended it with some helper functions. His change was part of a larger change with other code (fsck, IIRC) that I wasn't ready for. What is the preferred way to give credit for something like this? Thanks Jeff
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
On Tue, Oct 24, 2017 at 10:00 PM, Junio C Hamano wrote: > OK, thanks for working well together. So does this (1) build on > Jonathan's fsck-squelching series, or (2) ignores that and builds > filtering first, potentially leaving the codebase to a broken state > where it can create fsck-unclean repository until Jonathan's series > is rebased on top of this, or (3) something else? [*1*] Excluding the partialclone patch (patch 9), I think that the answer is (2), but I don't think that it leaves the codebase in a broken state. In particular, none of the code modifies the repo, so it can't create a fsck-unclean one. Maybe one could say that this leaves the codebase with features that no one will ever use in the absence of partial clone, but I don't think that's true - rev-list with blob-size/sparse-specification filter seems independently useful, at least (for example, when desiring to operate on a repo in a sparse way without going through a workdir), and if we're planning to allow listing of objects, we probably should allow packing as well (especially since this doesn't add much code). The above is relevant only if we can exclude the partialclone patch, but I think that we can and we should, as I wrote in my reply to Jeff Hostetler [1]. As for how this patch set (excluding the partialclone patch) interacts with my fsck series, they are relatively independent, as far as I can tell. I'll rebase my fsck, gc, and lazy object fetch patches (but not the fetch and clone parts, which we plan to instead adapt from Jeff Hostetler's patches, as far as I know) on top of these and resend those out once discussion on this has settled. [1] https://public-inbox.org/git/CAGf8dg+8AR=xfsv92odatktnjbnd1+ovzp9rs4y4otz_ezy...@mail.gmail.com/ > I also saw a patch marked as "this is from Jonathan's earlier work", > taking the authorship (which to me implies that the changes were > extensive enough), so I am a bit at loss envisioning how this piece > fits in the bigger picture together with the other piece. The patch you mentioned is the partialclone patch, which I think can be considered separately from the rest (as I said above).
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
Jeff Hostetler writes: > From: Jeff Hostetler > > I've been working with Jonathan Tan to combine our partial clone > proposals. This patch series represents a first step in that effort > and introduces an object filtering mechanism to select unwanted > objects. > > [1] traverse_commit_list and list-objects is extended to allow > various filters. > [2] rev-list is extended to expose filtering. This allows testing > of the filtering options. And can be used later to predict > missing objects before commands like checkout or merge. > [3] pack-objects is extended to use filtering parameters and build > packfiles that omit unwanted objects. > > This patch series lays the ground work for subsequent parts which > will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc. OK, thanks for working well together. So does this (1) build on Jonathan's fsck-squelching series, or (2) ignores that and builds filtering first, potentially leaving the codebase to a broken state where it can create fsck-unclean repository until Jonathan's series is rebased on top of this, or (3) something else? [*1*] I also saw a patch marked as "this is from Jonathan's earlier work", taking the authorship (which to me implies that the changes were extensive enough), so I am a bit at loss envisioning how this piece fits in the bigger picture together with the other piece. [Footnote] *1* Not having the answer to this question does bother me, but it is perfectly fine if the answer is (2), especially while the series is in a WIP state.
Re: [PATCH 00/13] WIP Partial clone part 1: object filtering
On Tue, Oct 24, 2017 at 11:53 AM, Jeff Hostetler wrote: > From: Jeff Hostetler > > I've been working with Jonathan Tan to combine our partial clone > proposals. This patch series represents a first step in that effort > and introduces an object filtering mechanism to select unwanted > objects. > > [1] traverse_commit_list and list-objects is extended to allow > various filters. > [2] rev-list is extended to expose filtering. This allows testing > of the filtering options. And can be used later to predict > missing objects before commands like checkout or merge. > [3] pack-objects is extended to use filtering parameters and build > packfiles that omit unwanted objects. > > This patch series lays the ground work for subsequent parts which > will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc. Thanks - I've taken a look at all of them except for the partialclone extension one, which I've only glanced over briefly. Apart from some style issues (indentation and typedef-ing of enums) I think that they generally look all right. One possible issue with using a formatted filter string (e.g. blob:none) directly passed to the server as opposed to actual client-interpreted flags (e.g. --blob-byte-limit=0) is that a user may be confused if the version of Git they're using supports fancier filters, which will work if they're running rev-list locally, but not when they're fetching from a not-so-fancy Git server. Maybe that is fine, though - something we've discussed internally is an ability to offload some calculations (e.g. git log -S) to Git servers, and if we end up doing something similar to that, users will need to deal with this problem anyway. The reason why I only glanced over the partialclone patch is because I think that that change needs more discussion than the rest, and it would be good to get the others in first. I know that you included the partialclone patch because it is needed by the rev-list one, but as I commented in the rev-list one, I think that it does not need to be aware of partial clones, at least for now. The overall ideas in the partialclone patch seem good, though - in particular, one conceptual difference from my patch [1] is that the filter setting is a property of the repository as opposed to the remote, which does seem like an improvement, because it makes it easier to make and explain e.g. a "git log --use-repo-filter -S" command that uses the preconfigured config. [1] https://public-inbox.org/git/407a298b52a9e0a2ee4135fe844e35b9a14c0f7b.1506714999.git.jonathanta...@google.com/
[PATCH 00/13] WIP Partial clone part 1: object filtering
From: Jeff Hostetler I've been working with Jonathan Tan to combine our partial clone proposals. This patch series represents a first step in that effort and introduces an object filtering mechanism to select unwanted objects. [1] traverse_commit_list and list-objects is extended to allow various filters. [2] rev-list is extended to expose filtering. This allows testing of the filtering options. And can be used later to predict missing objects before commands like checkout or merge. [3] pack-objects is extended to use filtering parameters and build packfiles that omit unwanted objects. This patch series lays the ground work for subsequent parts which will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc. Jeff Hostetler (13): dir: allow exclusions from blob in addition to file list-objects-filter-map: extend oidmap to collect omitted objects list-objects: filter objects in traverse_commit_list list-objects-filter-blobs-none: add filter to omit all blobs list-objects-filter-blobs-limit: add large blob filtering list-objects-filter-sparse: add sparse filter list-objects-filter-options: common argument parsing list-objects: add traverse_commit_list_filtered method extension.partialclone: introduce partial clone extension rev-list: add list-objects filtering support t6112: rev-list object filtering test pack-objects: add list-objects filtering t5317: pack-objects object filtering test Documentation/git-pack-objects.txt | 8 +- Documentation/git-rev-list.txt | 5 +- Documentation/rev-list-options.txt | 30 ++ Documentation/technical/repository-version.txt | 22 ++ Makefile | 6 + builtin/pack-objects.c | 18 +- builtin/rev-list.c | 84 +- cache.h| 4 + config.h | 3 + dir.c | 51 +++- dir.h | 3 + environment.c | 2 + list-objects-filter-blobs-limit.c | 146 ++ list-objects-filter-blobs-limit.h | 18 ++ list-objects-filter-blobs-none.c | 83 ++ list-objects-filter-blobs-none.h | 18 ++ list-objects-filter-map.c | 63 list-objects-filter-map.h | 26 ++ list-objects-filter-options.c | 101 +++ list-objects-filter-options.h | 50 list-objects-filter-sparse.c | 241 list-objects-filter-sparse.h | 30 ++ list-objects.c | 111 +-- list-objects.h | 43 ++- partial-clone-utils.c | 99 +++ partial-clone-utils.h | 34 +++ setup.c| 15 + t/t5317-pack-objects-filter-objects.sh | 384 + t/t6112-rev-list-filters-objects.sh| 223 ++ 29 files changed, 1897 insertions(+), 24 deletions(-) create mode 100644 list-objects-filter-blobs-limit.c create mode 100644 list-objects-filter-blobs-limit.h create mode 100644 list-objects-filter-blobs-none.c create mode 100644 list-objects-filter-blobs-none.h create mode 100644 list-objects-filter-map.c create mode 100644 list-objects-filter-map.h create mode 100644 list-objects-filter-options.c create mode 100644 list-objects-filter-options.h create mode 100644 list-objects-filter-sparse.c create mode 100644 list-objects-filter-sparse.h create mode 100644 partial-clone-utils.c create mode 100644 partial-clone-utils.h create mode 100755 t/t5317-pack-objects-filter-objects.sh create mode 100755 t/t6112-rev-list-filters-objects.sh -- 2.9.3