Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On 02/26, Jonathan Nieder wrote: > Ævar Arnfjörð Bjarmason wrote: > > On Sat, Feb 24 2018, Jeff King jotted: > > >> I actually wonder if we should just specify that the patterns must > >> _always_ be fully-qualified, but may end with a single "/*" to iterate > >> over wildcards. Or even simpler, that "refs/heads/foo" would find that > >> ref itself, and anything under it. > > > > I agree that this is a very good trade-off for now, but I think having > > an escape hatch makes sense. It looks like the protocol is implicitly > > extendible since another parameter could be added, but maybe having such > > a parameter from the get-go would make sense: > > I prefer to rely on the implicit extensibility (following the general > YAGNI principle). > > In other words, we can introduce a pattern-type later and make the > current pattern-type the default. Yeah this is what I'm going to do for the next re-roll of the series, make the pattern matching simple and later we can extend it if we want since we already have the ability to add new features to commands (you can see how I added shallow to fetch for an example). > > Thanks for looking to the future. > > [...] > > E.g. if the refs were stored indexed using the method described at > > https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less > > efficient than prefix matching, but a function of how many trigrams in > > your index match the pattern given. > > I think the nearest planned change to ref storage is [1], which is > still optimized for prefix matching. Longer term, maybe some day > we'll want a secondary index that supports infix matching, or maybe > we'll never need it. :) > > Sincerely, > Jonathan > > [1] > https://public-inbox.org/git/CAJo=hjszcam9sipdvr7tmd-fd2v2w6_pvmq791egcdsdkq0...@mail.gmail.com/#t -- Brandon Williams
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
Ævar Arnfjörð Bjarmason wrote: > On Sat, Feb 24 2018, Jeff King jotted: >> I actually wonder if we should just specify that the patterns must >> _always_ be fully-qualified, but may end with a single "/*" to iterate >> over wildcards. Or even simpler, that "refs/heads/foo" would find that >> ref itself, and anything under it. > > I agree that this is a very good trade-off for now, but I think having > an escape hatch makes sense. It looks like the protocol is implicitly > extendible since another parameter could be added, but maybe having such > a parameter from the get-go would make sense: I prefer to rely on the implicit extensibility (following the general YAGNI principle). In other words, we can introduce a pattern-type later and make the current pattern-type the default. Thanks for looking to the future. [...] > E.g. if the refs were stored indexed using the method described at > https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less > efficient than prefix matching, but a function of how many trigrams in > your index match the pattern given. I think the nearest planned change to ref storage is [1], which is still optimized for prefix matching. Longer term, maybe some day we'll want a secondary index that supports infix matching, or maybe we'll never need it. :) Sincerely, Jonathan [1] https://public-inbox.org/git/CAJo=hjszcam9sipdvr7tmd-fd2v2w6_pvmq791egcdsdkq0...@mail.gmail.com/#t
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On Sat, Feb 24 2018, Jeff King jotted: > On Thu, Feb 22, 2018 at 04:45:14PM -0800, Brandon Williams wrote: >> > Does the client have to be aware that we're using wildmatch? I think >> > they'd need "refs/heads/**" to actually implement what we usually >> > specify in refspecs as "refs/heads/*". Or does the lack of WM_PATHNAME >> > make this work with just "*"? >> > >> > Do we anticipate that the client would left-anchor the refspec like >> > "/refs/heads/*" so that in theory the server could avoid looking outside >> > of /refs/heads/? >> >> Yeah we may want to anchor it by providing the leading '/' instead of >> just "refs/". > > I actually wonder if we should just specify that the patterns must > _always_ be fully-qualified, but may end with a single "/*" to iterate > over wildcards. Or even simpler, that "refs/heads/foo" would find that > ref itself, and anything under it. I agree that this is a very good trade-off for now, but I think having an escape hatch makes sense. It looks like the protocol is implicitly extendible since another parameter could be added, but maybe having such a parameter from the get-go would make sense: pattern-type [simple|wildmatch|pcre|...] ref-pattern E.g.: pattern-type simple ref-pattern refs/tags/* ref-pattern refs/pull/* pattern-type wildmatch ref-pattern refs/**/2018 pattern-type pcre ref-pattern ^refs/release/v-201[56789]-\d+$ I.e. each ref-pattern is typed by the pattern-type in play, with just "simple" (with the behavior being discussed here) for now, anything else (wildmatch, pcre etc.) would be an error. But it allows for adding more patterns down the line, and in e.g. in-house setups of git where you control both the server & clients to make the trade-off that we'd like a bit more work on the server (e.g. to match dated tags created in the last 3 months) by setting some config option. The discussion upthread about: > The other problem with tail-matching is that it's inefficient on the > server[...] Is also something that's only true in the current implementation, but doesn't need to be, so it would be unfortunate to not work in an escape hatch for that limtiation. E.g. if the refs were stored indexed using the method described at https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less efficient than prefix matching, but a function of how many trigrams in your index match the pattern given.
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
Jeff Kingwrites: > I actually wonder if we should just specify that the patterns must > _always_ be fully-qualified, but may end with a single "/*" to iterate > over wildcards. Or even simpler, that "refs/heads/foo" would find that > ref itself, and anything under it. > > That drops any question about how wildcards work (e.g., does "refs/foo*" > work to find "refs/foobar"?). Sounds quite sensible to me.
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On Fri, Feb 23, 2018 at 04:19:54PM -0800, Brandon Williams wrote: > > We always have the ability to extend the patterns accepted via a feature > > (or capability) to ls-refs, so maybe the best thing to do now would only > > support a few patterns with specific semantics. Something like if you > > say "master" only match against refs/heads/ and refs/tags/ and if you > > want something else you would need to specify "refs/pull/master"? > > > > That way we could only support globs at the end "master*" where * can > > match anything (including slashes) > > After some in-office discussion it seems like the best thing to do for > this (right now since if we change our mind we can just introduce a > capability which extends the patterns supported) would be to left-anchor > the ref-patterns and only allow for a single wildcard character '*' > which matches zero or more characters (and doesn't care about slashes > '/'). This wildcard character should only be supported at the end of > the ref pattern. This means that if a client wants 'master' then they > would need to specify 'refs/heads/master' (and the other > ref_rev_parse_rules expansions) as a ref pattern. But they could say > "refs/heads/*" for all refs under refs/heads. Heh, I just responded without having read this and came up with the same suggestion. So I agree that is the right path. Or the simplification I mentioned that "refs/heads/master" would return that ref or possibly "refs/heads/master/foo" if it exists. Remember that it's fine to be overly broad here. This is purely an optimization in the advertisement, as we'd still pick out the refs we care about in a separate step. -Peff
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On Thu, Feb 22, 2018 at 04:45:14PM -0800, Brandon Williams wrote: > > This kind of tail matching can't quite implement all of the current > > behavior. Because we actually do the normal dwim_ref() matching, which > > includes stuff like "refs/remotes/%s/HEAD". > > > > The other problem with tail-matching is that it's inefficient on the > > server. Ideally we could get a request for "master" and only look up > > refs/heads/master, refs/tags/master, etc. And if there are 50,000 refs > > in refs/pull, we wouldn't have to process those at all. Of course this > > is no worse than the current code, which not only looks at each ref but > > actually _sends_ it. But it would be nice if we could fix this. > > > > There's some more discussion in this old thread: > > > > > > https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/ > > Thanks for the pointer. I was told to be wary a while about about > performance implications on the server but no discussion ensued till now > about it :) > > We always have the ability to extend the patterns accepted via a feature > (or capability) to ls-refs, so maybe the best thing to do now would only > support a few patterns with specific semantics. Something like if you > say "master" only match against refs/heads/ and refs/tags/ and if you > want something else you would need to specify "refs/pull/master"? The big question is whether you want to break compatibility with the existing program behavior. If not, then I think you have to ask for every variant in ref_rev_parse_rules (of which there are 6 variants). Which sounds pretty gross, but it actually may not be _too_ bad. Most fetches tend to ask for either a single name, or they use left-anchored wildcards. So it would work to just have the client expand all of the possibilities itself into fully-qualified refs, and keep the server as dumb as possible. And then the server for now can just cull based on the pattern list, like you have here. But later, we could optimize it to look up the individual patterns, which should be cheaper, since we'd generally have many fewer patterns than total refs. > > Does the client have to be aware that we're using wildmatch? I think > > they'd need "refs/heads/**" to actually implement what we usually > > specify in refspecs as "refs/heads/*". Or does the lack of WM_PATHNAME > > make this work with just "*"? > > > > Do we anticipate that the client would left-anchor the refspec like > > "/refs/heads/*" so that in theory the server could avoid looking outside > > of /refs/heads/? > > Yeah we may want to anchor it by providing the leading '/' instead of > just "refs/". I actually wonder if we should just specify that the patterns must _always_ be fully-qualified, but may end with a single "/*" to iterate over wildcards. Or even simpler, that "refs/heads/foo" would find that ref itself, and anything under it. That drops any question about how wildcards work (e.g., does "refs/foo*" work to find "refs/foobar"?). > I need to read over the discussion you linked to more but what sort of > ref patterns do you believe we should support as part of the initial > release of v2? It seems like you wanted this at some point in the past > so I assume you have an idea of what sort of filtering would be > beneficial. My goals were just optimizing: 1. Don't send all the refs across the wire if we can avoid it. 2. Don't even iterate over all the refs internally if we can avoid it. Especially with the new binary-searching packed-refs code, we should be able to serve a request like "ls-refs refs/heads/*" without looking into "refs/pull" or "refs/changes" at all. -Peff
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On 02/22, Brandon Williams wrote: > On 02/22, Jeff King wrote: > > On Tue, Feb 06, 2018 at 05:12:50PM -0800, Brandon Williams wrote: > > > > > +ls-refs takes in the following parameters wrapped in packet-lines: > > > + > > > +symrefs > > > + In addition to the object pointed by it, show the underlying ref > > > + pointed by it when showing a symbolic ref. > > > +peel > > > + Show peeled tags. > > > +ref-pattern > > > + When specified, only references matching the one of the provided > > > + patterns are displayed. > > > > How do we match those patterns? That's probably an important thing to > > include in the spec. > > Yeah I thought about it when I first wrote it and was hoping that > someone who nudge me in the right direction :) > > > > > Looking at the code, I see: > > > > > +/* > > > + * Check if one of the patterns matches the tail part of the ref. > > > + * If no patterns were provided, all refs match. > > > + */ > > > +static int ref_match(const struct argv_array *patterns, const char > > > *refname) > > > > This kind of tail matching can't quite implement all of the current > > behavior. Because we actually do the normal dwim_ref() matching, which > > includes stuff like "refs/remotes/%s/HEAD". > > > > The other problem with tail-matching is that it's inefficient on the > > server. Ideally we could get a request for "master" and only look up > > refs/heads/master, refs/tags/master, etc. And if there are 50,000 refs > > in refs/pull, we wouldn't have to process those at all. Of course this > > is no worse than the current code, which not only looks at each ref but > > actually _sends_ it. But it would be nice if we could fix this. > > > > There's some more discussion in this old thread: > > > > > > https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/ > > Thanks for the pointer. I was told to be wary a while about about > performance implications on the server but no discussion ensued till now > about it :) > > We always have the ability to extend the patterns accepted via a feature > (or capability) to ls-refs, so maybe the best thing to do now would only > support a few patterns with specific semantics. Something like if you > say "master" only match against refs/heads/ and refs/tags/ and if you > want something else you would need to specify "refs/pull/master"? > > That way we could only support globs at the end "master*" where * can > match anything (including slashes) After some in-office discussion it seems like the best thing to do for this (right now since if we change our mind we can just introduce a capability which extends the patterns supported) would be to left-anchor the ref-patterns and only allow for a single wildcard character '*' which matches zero or more characters (and doesn't care about slashes '/'). This wildcard character should only be supported at the end of the ref pattern. This means that if a client wants 'master' then they would need to specify 'refs/heads/master' (and the other ref_rev_parse_rules expansions) as a ref pattern. But they could say "refs/heads/*" for all refs under refs/heads. > > > > > > +{ > > > + char *pathbuf; > > > + int i; > > > + > > > + if (!patterns->argc) > > > + return 1; /* no restriction */ > > > + > > > + pathbuf = xstrfmt("/%s", refname); > > > + for (i = 0; i < patterns->argc; i++) { > > > + if (!wildmatch(patterns->argv[i], pathbuf, 0)) { > > > + free(pathbuf); > > > + return 1; > > > + } > > > + } > > > + free(pathbuf); > > > + return 0; > > > +} > > > > Does the client have to be aware that we're using wildmatch? I think > > they'd need "refs/heads/**" to actually implement what we usually > > specify in refspecs as "refs/heads/*". Or does the lack of WM_PATHNAME > > make this work with just "*"? > > > > Do we anticipate that the client would left-anchor the refspec like > > "/refs/heads/*" so that in theory the server could avoid looking outside > > of /refs/heads/? > > Yeah we may want to anchor it by providing the leading '/' instead of > just "refs/". > > > > > -Peff > > I need to read over the discussion you linked to more but what sort of > ref patterns do you believe we should support as part of the initial > release of v2? It seems like you wanted this at some point in the past > so I assume you have an idea of what sort of filtering would be > beneficial. > > -- > Brandon Williams -- Brandon Williams
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On 02/22, Jeff King wrote: > On Tue, Feb 06, 2018 at 05:12:50PM -0800, Brandon Williams wrote: > > > +ls-refs takes in the following parameters wrapped in packet-lines: > > + > > +symrefs > > + In addition to the object pointed by it, show the underlying ref > > + pointed by it when showing a symbolic ref. > > +peel > > + Show peeled tags. > > +ref-pattern > > + When specified, only references matching the one of the provided > > + patterns are displayed. > > How do we match those patterns? That's probably an important thing to > include in the spec. Yeah I thought about it when I first wrote it and was hoping that someone who nudge me in the right direction :) > > Looking at the code, I see: > > > +/* > > + * Check if one of the patterns matches the tail part of the ref. > > + * If no patterns were provided, all refs match. > > + */ > > +static int ref_match(const struct argv_array *patterns, const char > > *refname) > > This kind of tail matching can't quite implement all of the current > behavior. Because we actually do the normal dwim_ref() matching, which > includes stuff like "refs/remotes/%s/HEAD". > > The other problem with tail-matching is that it's inefficient on the > server. Ideally we could get a request for "master" and only look up > refs/heads/master, refs/tags/master, etc. And if there are 50,000 refs > in refs/pull, we wouldn't have to process those at all. Of course this > is no worse than the current code, which not only looks at each ref but > actually _sends_ it. But it would be nice if we could fix this. > > There's some more discussion in this old thread: > > > https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/ Thanks for the pointer. I was told to be wary a while about about performance implications on the server but no discussion ensued till now about it :) We always have the ability to extend the patterns accepted via a feature (or capability) to ls-refs, so maybe the best thing to do now would only support a few patterns with specific semantics. Something like if you say "master" only match against refs/heads/ and refs/tags/ and if you want something else you would need to specify "refs/pull/master"? That way we could only support globs at the end "master*" where * can match anything (including slashes) > > > +{ > > + char *pathbuf; > > + int i; > > + > > + if (!patterns->argc) > > + return 1; /* no restriction */ > > + > > + pathbuf = xstrfmt("/%s", refname); > > + for (i = 0; i < patterns->argc; i++) { > > + if (!wildmatch(patterns->argv[i], pathbuf, 0)) { > > + free(pathbuf); > > + return 1; > > + } > > + } > > + free(pathbuf); > > + return 0; > > +} > > Does the client have to be aware that we're using wildmatch? I think > they'd need "refs/heads/**" to actually implement what we usually > specify in refspecs as "refs/heads/*". Or does the lack of WM_PATHNAME > make this work with just "*"? > > Do we anticipate that the client would left-anchor the refspec like > "/refs/heads/*" so that in theory the server could avoid looking outside > of /refs/heads/? Yeah we may want to anchor it by providing the leading '/' instead of just "refs/". > > -Peff I need to read over the discussion you linked to more but what sort of ref patterns do you believe we should support as part of the initial release of v2? It seems like you wanted this at some point in the past so I assume you have an idea of what sort of filtering would be beneficial. -- Brandon Williams
Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command
On Tue, Feb 06, 2018 at 05:12:50PM -0800, Brandon Williams wrote: > +ls-refs takes in the following parameters wrapped in packet-lines: > + > +symrefs > + In addition to the object pointed by it, show the underlying ref > + pointed by it when showing a symbolic ref. > +peel > + Show peeled tags. > +ref-pattern > + When specified, only references matching the one of the provided > + patterns are displayed. How do we match those patterns? That's probably an important thing to include in the spec. Looking at the code, I see: > +/* > + * Check if one of the patterns matches the tail part of the ref. > + * If no patterns were provided, all refs match. > + */ > +static int ref_match(const struct argv_array *patterns, const char *refname) This kind of tail matching can't quite implement all of the current behavior. Because we actually do the normal dwim_ref() matching, which includes stuff like "refs/remotes/%s/HEAD". The other problem with tail-matching is that it's inefficient on the server. Ideally we could get a request for "master" and only look up refs/heads/master, refs/tags/master, etc. And if there are 50,000 refs in refs/pull, we wouldn't have to process those at all. Of course this is no worse than the current code, which not only looks at each ref but actually _sends_ it. But it would be nice if we could fix this. There's some more discussion in this old thread: https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/ > +{ > + char *pathbuf; > + int i; > + > + if (!patterns->argc) > + return 1; /* no restriction */ > + > + pathbuf = xstrfmt("/%s", refname); > + for (i = 0; i < patterns->argc; i++) { > + if (!wildmatch(patterns->argv[i], pathbuf, 0)) { > + free(pathbuf); > + return 1; > + } > + } > + free(pathbuf); > + return 0; > +} Does the client have to be aware that we're using wildmatch? I think they'd need "refs/heads/**" to actually implement what we usually specify in refspecs as "refs/heads/*". Or does the lack of WM_PATHNAME make this work with just "*"? Do we anticipate that the client would left-anchor the refspec like "/refs/heads/*" so that in theory the server could avoid looking outside of /refs/heads/? -Peff
[PATCH v3 13/35] ls-refs: introduce ls-refs server command
Introduce the ls-refs server command. In protocol v2, the ls-refs command is used to request the ref advertisement from the server. Since it is a command which can be requested (as opposed to mandatory in v1), a client can sent a number of parameters in its request to limit the ref advertisement based on provided ref-patterns. Signed-off-by: Brandon Williams--- Documentation/technical/protocol-v2.txt | 32 + Makefile| 1 + ls-refs.c | 96 ++ ls-refs.h | 9 +++ serve.c | 8 +++ t/t5701-git-serve.sh| 115 6 files changed, 261 insertions(+) create mode 100644 ls-refs.c create mode 100644 ls-refs.h diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index f87372f9b..ef81df868 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -112,3 +112,35 @@ printable ASCII characters except space (i.e., the byte range 32 < x < "git/1.8.3.1"). The agent strings are purely informative for statistics and debugging purposes, and MUST NOT be used to programmatically assume the presence or absence of particular features. + + ls-refs +- + +`ls-refs` is the command used to request a reference advertisement in v2. +Unlike the current reference advertisement, ls-refs takes in parameters +which can be used to limit the refs sent from the server. + +Additional features not supported in the base command will be advertised +as the value of the command in the capability advertisement in the form +of a space separated list of features, e.g. "= +". + +ls-refs takes in the following parameters wrapped in packet-lines: + +symrefs + In addition to the object pointed by it, show the underlying ref + pointed by it when showing a symbolic ref. +peel + Show peeled tags. +ref-pattern + When specified, only references matching the one of the provided + patterns are displayed. + +The output of ls-refs is as follows: + +output = *ref +flush-pkt +ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF) +ref-attribute = (symref | peeled) +symref = "symref-target:" symref-target +peeled = "peeled:" obj-id diff --git a/Makefile b/Makefile index 18c255428..e50927cfb 100644 --- a/Makefile +++ b/Makefile @@ -825,6 +825,7 @@ LIB_OBJS += list-objects-filter-options.o LIB_OBJS += ll-merge.o LIB_OBJS += lockfile.o LIB_OBJS += log-tree.o +LIB_OBJS += ls-refs.o LIB_OBJS += mailinfo.o LIB_OBJS += mailmap.o LIB_OBJS += match-trees.o diff --git a/ls-refs.c b/ls-refs.c new file mode 100644 index 0..70682b4f7 --- /dev/null +++ b/ls-refs.c @@ -0,0 +1,96 @@ +#include "cache.h" +#include "repository.h" +#include "refs.h" +#include "remote.h" +#include "argv-array.h" +#include "ls-refs.h" +#include "pkt-line.h" + +struct ls_refs_data { + unsigned peel; + unsigned symrefs; + struct argv_array patterns; +}; + +/* + * Check if one of the patterns matches the tail part of the ref. + * If no patterns were provided, all refs match. + */ +static int ref_match(const struct argv_array *patterns, const char *refname) +{ + char *pathbuf; + int i; + + if (!patterns->argc) + return 1; /* no restriction */ + + pathbuf = xstrfmt("/%s", refname); + for (i = 0; i < patterns->argc; i++) { + if (!wildmatch(patterns->argv[i], pathbuf, 0)) { + free(pathbuf); + return 1; + } + } + free(pathbuf); + return 0; +} + +static int send_ref(const char *refname, const struct object_id *oid, + int flag, void *cb_data) +{ + struct ls_refs_data *data = cb_data; + const char *refname_nons = strip_namespace(refname); + struct strbuf refline = STRBUF_INIT; + + if (!ref_match(>patterns, refname)) + return 0; + + strbuf_addf(, "%s %s", oid_to_hex(oid), refname_nons); + if (data->symrefs && flag & REF_ISSYMREF) { + struct object_id unused; + const char *symref_target = resolve_ref_unsafe(refname, 0, + , + ); + + if (!symref_target) + die("'%s' is a symref but it is not?", refname); + + strbuf_addf(, " symref-target:%s", symref_target); + } + + if (data->peel) { + struct object_id peeled; + if (!peel_ref(refname, )) + strbuf_addf(, " peeled:%s", oid_to_hex()); + } + + strbuf_addch(, '\n'); + packet_write(1, refline.buf, refline.len); + + strbuf_release(); + return 0; +} +