Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
On 11/2/2017 3:32 PM, Jonathan Tan wrote: On Thu, 2 Nov 2017 17:50:11 + Jeff Hostetlerwrote: + if (skip_prefix(v0, "oid=", )) { + filter_options->choice = LOFC_SPARSE_OID; + if (!get_oid_with_context(v1, GET_OID_BLOB, + _oid, )) { + /* +* We successfully converted the +* into an actual OID. Rewrite the raw_value +* in canonoical form with just the OID. +* (If we send this request to the server, we +* want an absolute expression rather than a +* local-ref-relative expression.) +*/ I think this would lead to confusing behavior - for example, a fetch with "--filter=oid=mybranch:sparseconfig" would have different results depending on whether "mybranch" refers to a valid object locally. The way I see it, this should either (i) only accept full 40-character OIDs or (ii) retain the raw string to be interpreted only when the filtering is done. (i) is simpler and safer, but is not so useful. In both cases, if the user really wants client-side interpretation, they can still use "$(git rev-parse foo)" to make it explicit. Good point. I'll change it to always pass the original expression so that it is evaluated wherever the filtering is actually performed. + free((char *)filter_options->raw_value); + filter_options->raw_value = + xstrfmt("sparse:oid=%s", + oid_to_hex(_oid)); + filter_options->sparse_oid_value = + oiddup(_oid); + } else { + /* +* We could not turn the into an +* OID. Leave the raw_value as is in case +* the server can parse it. (It may refer to +* a branch, commit, or blob we don't have.) +*/ + } + return 0; + } + + if (skip_prefix(v0, "path=", )) { + filter_options->choice = LOFC_SPARSE_PATH; + filter_options->sparse_path_value = strdup(v1); + return 0; + } + } + + die(_("invalid filter expression '%s'"), arg); + return 0; +} [snip] +void arg_format_list_objects_filter( + struct argv_array *argv_array, + const struct list_objects_filter_options *filter_options) Is this function used anywhere (in this patch or subsequent patches)? It is used in upload-pack.c in part 3. I'll remove it from part 1 and revisit in part 3. diff --git a/list-objects-filter.c b/list-objects-filter.c +/* See object.h and revision.h */ +#define FILTER_REVISIT (1<<25) Looking later in the code, this flag indicates that a tree has been SHOWN, so it might be better to just call this FILTER_SHOWN. I'll amend this. There are already several SHOWN bits that behave slightly differently. I'll update and document this better. Thanks. [snip] +struct frame { + int defval; Document this variable? + int child_prov_omit : 1; I think it's clearer if we use "unsigned" here. Also, document this (e.g. "1 if any descendant of this tree object was provisionally omitted"). got it. thanks. +enum list_objects_filter_type { + LOFT_BEGIN_TREE, + LOFT_END_TREE, + LOFT_BLOB +}; Optional: probably a better name would be list_objects_filter_situation. got it. thanks. +void traverse_commit_list_filtered( + struct list_objects_filter_options *filter_options, + struct rev_info *revs, + show_commit_fn show_commit, + show_object_fn show_object, + void *show_data, + struct oidset *omitted) +{ + filter_object_fn filter_fn = NULL; + filter_free_fn filter_free_fn = NULL; + void *filter_data = NULL; + + filter_data = list_objects_filter__init(omitted, filter_options, + _fn, _free_fn); + do_traverse(revs, show_commit, show_object, show_data, + filter_fn, filter_data); + if (filter_data && filter_free_fn) + filter_free_fn(filter_data); +} This function traverse_commit_list_filtered() is in list-objects.c but in list-objects-filter.h, if I'm reading the diff correctly? oops. thanks. Overall, this looks like a good change. Object traversal was upgraded with the behaviors of MARK_SEEN and SHOW independently controllable and with the ability
Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
On Mon, 6 Nov 2017 12:51:52 -0500 Jeff Hostetlerwrote: > Jonathan and I were talking off-list about the performance > effects of inspecting the pathnames to identify the ".git*" > special files. I added it in my first draft back in the spring, > thinking that even if you set the blob-limit to a small > number (or zero), you'd probably still always want the > .gitattribute and .gitignore files. But now with the addition > of the sparse filter and functional dynamic object fetching, > I'm not sure I see the need for this. > > Also, if the primary use of the blob-limit is to filter out > giant binary assets, it is unlikely anyone is going to have > a 1MB+ .git* file, so it is unlikely that the is_special_file > would include anything that wouldn't already be included by > the size criteria. > > So, if there's no objections, I think I'll remove this and > simplify the blob-limit filter function. (That would let me > get rid of the provisional omit code here.) This sounds like a good idea to me. (For the record, one of the performance impacts of checking the filename is that bitmaps can't be used to obtain a whitelist of what is to be packed - instead, a regular object walk must be used.)
Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
On 11/2/2017 1:50 PM, Jeff Hostetler wrote: From: Jeff HostetlerCreate traverse_commit_list_filtered() and add filtering interface to allow certain objects to be omitted from the traversal. ... diff --git a/list-objects-filter.c b/list-objects-filter.c new file mode 100644 index 000..7f28425 --- /dev/null +++ b/list-objects-filter.c ... +/* + * A filter for list-objects to omit large blobs, + * but always include ".git*" special files. + * And to OPTIONALLY collect a list of the omitted OIDs. + */ Jonathan and I were talking off-list about the performance effects of inspecting the pathnames to identify the ".git*" special files. I added it in my first draft back in the spring, thinking that even if you set the blob-limit to a small number (or zero), you'd probably still always want the .gitattribute and .gitignore files. But now with the addition of the sparse filter and functional dynamic object fetching, I'm not sure I see the need for this. Also, if the primary use of the blob-limit is to filter out giant binary assets, it is unlikely anyone is going to have a 1MB+ .git* file, so it is unlikely that the is_special_file would include anything that wouldn't already be included by the size criteria. So, if there's no objections, I think I'll remove this and simplify the blob-limit filter function. (That would let me get rid of the provisional omit code here.) Jeff
Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
On 11/3/2017 7:54 AM, Johannes Schindelin wrote: Hi Jonathan, On Thu, 2 Nov 2017, Jonathan Tan wrote: On Thu, 2 Nov 2017 17:50:11 + Jeff Hostetlerwrote: +int parse_list_objects_filter(struct list_objects_filter_options *filter_options, + const char *arg) Returning void is fine, I think. It seems that all your code paths either return 0 or die. Can we please start to encourage libified code, rather than discourage it? I did that so that I could call it from the opt_parse_... version below it that is used by the OPT_ macros. And Johannes is right, it bothers me that there doesn't seem to be a hard line where one should or should not call die() vs returning an error code. Jeff
Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
Hi Jonathan, On Thu, 2 Nov 2017, Jonathan Tan wrote: > On Thu, 2 Nov 2017 17:50:11 + > Jeff Hostetlerwrote: > > > +int parse_list_objects_filter(struct list_objects_filter_options > > *filter_options, > > + const char *arg) > > Returning void is fine, I think. It seems that all your code paths > either return 0 or die. Can we please start to encourage libified code, rather than discourage it? Thank you, Dscho
Re: [PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
On Thu, 2 Nov 2017 17:50:11 + Jeff Hostetlerwrote: > +int parse_list_objects_filter(struct list_objects_filter_options > *filter_options, > + const char *arg) Returning void is fine, I think. It seems that all your code paths either return 0 or die. > +{ > + struct object_context oc; > + struct object_id sparse_oid; > + const char *v0; > + const char *v1; > + > + if (filter_options->choice) > + die(_("multiple object filter types cannot be combined")); > + > + /* > + * TODO consider rejecting 'arg' if it contains any > + * TODO injection characters (since we might send this > + * TODO to a sub-command or to the server and we don't > + * TODO want to deal with legacy quoting/escaping for > + * TODO a new feature). > + */ > + > + filter_options->raw_value = strdup(arg); > + > + if (skip_prefix(arg, "blob:", ) || skip_prefix(arg, "blobs:", )) { > + if (!strcmp(v0, "none")) { > + filter_options->choice = LOFC_BLOB_NONE; > + return 0; > + } > + > + if (skip_prefix(v0, "limit=", ) && > + git_parse_ulong(v1, _options->blob_limit_value)) { > + filter_options->choice = LOFC_BLOB_LIMIT; > + return 0; > + } > + } > + else if (skip_prefix(arg, "sparse:", )) { Style: join the 2 lines above. > + if (skip_prefix(v0, "oid=", )) { > + filter_options->choice = LOFC_SPARSE_OID; > + if (!get_oid_with_context(v1, GET_OID_BLOB, > + _oid, )) { > + /* > + * We successfully converted the > + * into an actual OID. Rewrite the raw_value > + * in canonoical form with just the OID. > + * (If we send this request to the server, we > + * want an absolute expression rather than a > + * local-ref-relative expression.) > + */ I think this would lead to confusing behavior - for example, a fetch with "--filter=oid=mybranch:sparseconfig" would have different results depending on whether "mybranch" refers to a valid object locally. The way I see it, this should either (i) only accept full 40-character OIDs or (ii) retain the raw string to be interpreted only when the filtering is done. (i) is simpler and safer, but is not so useful. In both cases, if the user really wants client-side interpretation, they can still use "$(git rev-parse foo)" to make it explicit. > + free((char *)filter_options->raw_value); > + filter_options->raw_value = > + xstrfmt("sparse:oid=%s", > + oid_to_hex(_oid)); > + filter_options->sparse_oid_value = > + oiddup(_oid); > + } else { > + /* > + * We could not turn the into an > + * OID. Leave the raw_value as is in case > + * the server can parse it. (It may refer to > + * a branch, commit, or blob we don't have.) > + */ > + } > + return 0; > + } > + > + if (skip_prefix(v0, "path=", )) { > + filter_options->choice = LOFC_SPARSE_PATH; > + filter_options->sparse_path_value = strdup(v1); > + return 0; > + } > + } > + > + die(_("invalid filter expression '%s'"), arg); > + return 0; > +} [snip] > +void arg_format_list_objects_filter( > + struct argv_array *argv_array, > + const struct list_objects_filter_options *filter_options) Is this function used anywhere (in this patch or subsequent patches)? > diff --git a/list-objects-filter.c b/list-objects-filter.c > +/* See object.h and revision.h */ > +#define FILTER_REVISIT (1<<25) Looking later in the code, this flag indicates that a tree has been SHOWN, so it might be better to just call this FILTER_SHOWN. Also document this flag in object.h in the table above FLAG_BITS. > +static enum list_objects_filter_result filter_blobs_limit( > + enum list_objects_filter_type filter_type, > + struct object *obj, > + const char *pathname, > + const char *filename, > + void *filter_data_) > +{ > + struct filter_blobs_limit_data *filter_data = filter_data_; > + unsigned long object_length; > + enum object_type t; > + int is_special_filename; > + > + switch (filter_type) { > + default: > +
[PATCH v2 4/6] list-objects: filter objects in traverse_commit_list
From: Jeff HostetlerCreate traverse_commit_list_filtered() and add filtering interface to allow certain objects to be omitted from the traversal. Update traverse_commit_list() to be a wrapper for the above with a null filter to minimize the number of callers that needed to be changed. Object filtering will be used in a future commit by rev-list and pack-objects for partial clone and fetch to omit unwanted objects from the result. traverse_bitmap_commit_list() does not work with filtering. If a packfile bitmap is present, it will not be used. Signed-off-by: Jeff Hostetler --- Makefile | 2 + list-objects-filter-options.c | 119 list-objects-filter-options.h | 55 ++ list-objects-filter.c | 408 ++ list-objects-filter.h | 84 + list-objects.c| 95 -- list-objects.h| 2 +- 7 files changed, 748 insertions(+), 17 deletions(-) create mode 100644 list-objects-filter-options.c create mode 100644 list-objects-filter-options.h create mode 100644 list-objects-filter.c create mode 100644 list-objects-filter.h diff --git a/Makefile b/Makefile index cd75985..ca378a4 100644 --- a/Makefile +++ b/Makefile @@ -807,6 +807,8 @@ LIB_OBJS += levenshtein.o LIB_OBJS += line-log.o LIB_OBJS += line-range.o LIB_OBJS += list-objects.o +LIB_OBJS += list-objects-filter.o +LIB_OBJS += list-objects-filter-options.o LIB_OBJS += ll-merge.o LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c new file mode 100644 index 000..31255e7 --- /dev/null +++ b/list-objects-filter-options.c @@ -0,0 +1,119 @@ +#include "cache.h" +#include "commit.h" +#include "config.h" +#include "revision.h" +#include "argv-array.h" +#include "list-objects.h" +#include "list-objects-filter.h" +#include "list-objects-filter-options.h" + +/* + * Parse value of the argument to the "filter" keword. + * On the command line this looks like: + * --filter= + * and in the pack protocol as: + * "filter" SP + * + * ::= blob:none + * blob:limit=[kmg] + * sparse:oid= + * sparse:path= + */ +int parse_list_objects_filter(struct list_objects_filter_options *filter_options, + const char *arg) +{ + struct object_context oc; + struct object_id sparse_oid; + const char *v0; + const char *v1; + + if (filter_options->choice) + die(_("multiple object filter types cannot be combined")); + + /* +* TODO consider rejecting 'arg' if it contains any +* TODO injection characters (since we might send this +* TODO to a sub-command or to the server and we don't +* TODO want to deal with legacy quoting/escaping for +* TODO a new feature). +*/ + + filter_options->raw_value = strdup(arg); + + if (skip_prefix(arg, "blob:", ) || skip_prefix(arg, "blobs:", )) { + if (!strcmp(v0, "none")) { + filter_options->choice = LOFC_BLOB_NONE; + return 0; + } + + if (skip_prefix(v0, "limit=", ) && + git_parse_ulong(v1, _options->blob_limit_value)) { + filter_options->choice = LOFC_BLOB_LIMIT; + return 0; + } + } + else if (skip_prefix(arg, "sparse:", )) { + if (skip_prefix(v0, "oid=", )) { + filter_options->choice = LOFC_SPARSE_OID; + if (!get_oid_with_context(v1, GET_OID_BLOB, + _oid, )) { + /* +* We successfully converted the +* into an actual OID. Rewrite the raw_value +* in canonoical form with just the OID. +* (If we send this request to the server, we +* want an absolute expression rather than a +* local-ref-relative expression.) +*/ + free((char *)filter_options->raw_value); + filter_options->raw_value = + xstrfmt("sparse:oid=%s", + oid_to_hex(_oid)); + filter_options->sparse_oid_value = + oiddup(_oid); + } else { + /* +* We could not turn the into an +* OID. Leave the raw_value as is in case +* the server can parse it. (It may refer to +