Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Wed, Jun 14, 2017 at 2:48 AM, Jonathan Nieder wrote: >> diff --git a/remote.h b/remote.h >> index 924881169..9ad8c1085 100644 >> --- a/remote.h >> +++ b/remote.h >> @@ -164,6 +164,7 @@ struct ref *ref_remove_duplicates(struct ref *ref_map); >> >> int valid_fetch_refspec(const char *refspec); >> struct refspec *parse_fetch_refspec(int nr_refspec, const char **refspec); >> +void add_and_parse_fetch_refspec(struct remote *remote, const char >> *refspec); > I'm tempted to say that this one should be named add_fetch_refspec (or > something like remote_add_refspec) --- this is the only way to add a > fetch refspec in the public remote API, and the fact that it parses is > an implementation detail. The private add_fetch_refpsec that builds > the fetch_refspec as preparation for parsing them in a batch is not > part of the exported API. I kind of agree, but ... First, there is an add_push_refspec() function as well, which, just like its fetch counterpart, doesn't parse the given refspec, only appends it to remote->push_refspec. Changing add_fetch_refspec() to parse, too, would break this symmetry. Furthermore, at the moment we have both remote->fetch_refspec (for strings) and remote->fetch (for parsed refspecs), and parsing a refspec die()s if it's bogus, therefore I think that parsing is not an implementation detail that should be hidden. > The caller adds one refspec right after calling remote_get. I'm > starting to wonder if this could be done more simply by having a > variant of remote_get that allows naming an additional refspec, so > that remote->fetch could be immutable after construction like it was > before. What do you think? That's such a very specific and narrow use case that I don't think it justifies a dedicated function. I don't think remote->fetch should be immutable; I think remote->{fetch,push}_refspec and the lazy parsing of refspecs should go away. Cleaning up this corner of the remote API is beyond the scope of this patch series. > [...] >> + /* Not free_refspecs(), as we copied its pointers above */ >> + free(rs); > > Allocating an array to put the parsed refspec in and then freeing it > seems wasteful. Should parse_refspec_internal be changed to take an > output parameter so it can put the refspec into remote->fetch > directly? No, I found that extracting the huge body of its loop into a helper function that fills an output parameter is much more useful. > [...] >> +++ b/builtin/clone.c > [...] >> @@ -848,16 +853,13 @@ int cmd_clone(int argc, const char **argv, const char >> *prefix) >> const struct ref *our_head_points_at; >> struct ref *mapped_refs; >> const struct ref *ref; >> - struct strbuf key = STRBUF_INIT, value = STRBUF_INIT; >> + struct strbuf key = STRBUF_INIT, default_refspec = STRBUF_INIT; > > nit: since it's not part of a key, value pair like value, > default_refspec should probably go on its own line. Fun fact: they were never part of a key-value pair. While 'key' is indeed the name of a configuration variable, 'value' is not the value of that configuration variable, or any other configuration variable for that matter. Best, Gábor
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Tue, Jun 13, 2017 at 05:48:16PM -0700, Jonathan Nieder wrote: > > The reason for this is that the initial fetch is not a fully fledged > > 'git fetch' but a bunch of direct calls into the fetch/transport > > machinery with clone's own refs-to-refspec matching logic, which > > bypasses parts of 'git fetch' processing configured fetch refspecs. > > Agh, subtle. > > I'm hoping that longer term we can make fetch behave more like a > library and make the initial fetch into a fully fledged 'git fetch' > like thing again. But this smaller change is the logical fix in the > meantime. We talked about this earlier in the thread, but I doubt that will ever happen. Things like --single-branch means that clone has to actually look at what the remote has before it decides what to fetch. Of course we _could_ teach all that logic to fetch, too, but I don't think you'll ever get the nice simple: 1. configure refspecs 2. fetch 3. checkout So the best thing is probably to push any repeated logic down into the transport layer, where it can be easily used by both commands. > > + /* Not free_refspecs(), as we copied its pointers above */ > > + free(rs); > > Allocating an array to put the parsed refspec in and then freeing it > seems wasteful. Should parse_refspec_internal be changed to take an > output parameter so it can put the refspec into remote->fetch > directly? It's not quite trivial to make that change. parse_refspec_internal() actually handles an array of refspecs. So its callers would all have to start allocating the correctly-sized array. I doubt that one malloc per clone is worth caring about. I'd worry more about the trickiness that merited the comment above, but it's at least contained in this one function. -Peff
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
Hi, SZEDER Gábor wrote: > The initial fetch during a clone doesn't transfer refs matching > additional fetch refspecs given on the command line as configuration > variables. This contradicts the documentation stating that > configuration variables specified via 'git clone -c = ...' > "take effect immediately after the repository is initialized, but > before the remote history is fetched" and the given example [...] > The reason for this is that the initial fetch is not a fully fledged > 'git fetch' but a bunch of direct calls into the fetch/transport > machinery with clone's own refs-to-refspec matching logic, which > bypasses parts of 'git fetch' processing configured fetch refspecs. Agh, subtle. I'm hoping that longer term we can make fetch behave more like a library and make the initial fetch into a fully fledged 'git fetch' like thing again. But this smaller change is the logical fix in the meantime. [...] > diff --git a/remote.c b/remote.c > index ad6c5424e..b8fd09dc9 100644 > --- a/remote.c > +++ b/remote.c > @@ -626,6 +626,19 @@ struct refspec *parse_fetch_refspec(int nr_refspec, > const char **refspec) > return parse_refspec_internal(nr_refspec, refspec, 1, 0); > } > > +void add_and_parse_fetch_refspec(struct remote *remote, const char *refspec) > +{ > + struct refspec *rs; > + > + add_fetch_refspec(remote, refspec); > + rs = parse_fetch_refspec(1, &refspec); > + REALLOC_ARRAY(remote->fetch, remote->fetch_refspec_nr); > + remote->fetch[remote->fetch_refspec_nr - 1] = *rs; > + > + /* Not free_refspecs(), as we copied its pointers above */ > + free(rs); > +} > + > static struct refspec *parse_push_refspec(int nr_refspec, const char > **refspec) > { > return parse_refspec_internal(nr_refspec, refspec, 0, 0); > diff --git a/remote.h b/remote.h > index 924881169..9ad8c1085 100644 > --- a/remote.h > +++ b/remote.h > @@ -164,6 +164,7 @@ struct ref *ref_remove_duplicates(struct ref *ref_map); > > int valid_fetch_refspec(const char *refspec); > struct refspec *parse_fetch_refspec(int nr_refspec, const char **refspec); > +void add_and_parse_fetch_refspec(struct remote *remote, const char *refspec); > > void free_refspec(int nr_refspec, struct refspec *refspec); I realize its neighbors don't have this, but can this function have a brief comment explaining how it is meant to be used and what guarantees it makes? For example: /** Adds a refspec to remote->fetch_refspec and remote->fetch. */ void add_and_parse_fetch_refspec(struct remote *remote, const char *refspec); I'm tempted to say that this one should be named add_fetch_refspec (or something like remote_add_refspec) --- this is the only way to add a fetch refspec in the public remote API, and the fact that it parses is an implementation detail. The private add_fetch_refpsec that builds the fetch_refspec as preparation for parsing them in a batch is not part of the exported API. Also, now that the API is appending to remote->fetch instead of allocating it in one go, should it use the ALLOC_GROW heuristic / fetch_refspec_alloc size? The caller adds one refspec right after calling remote_get. I'm starting to wonder if this could be done more simply by having a variant of remote_get that allows naming an additional refspec, so that remote->fetch could be immutable after construction like it was before. What do you think? [...] > + /* Not free_refspecs(), as we copied its pointers above */ > + free(rs); Allocating an array to put the parsed refspec in and then freeing it seems wasteful. Should parse_refspec_internal be changed to take an output parameter so it can put the refspec into remote->fetch directly? [...] > +++ b/builtin/clone.c [...] > @@ -848,16 +853,13 @@ int cmd_clone(int argc, const char **argv, const char > *prefix) > const struct ref *our_head_points_at; > struct ref *mapped_refs; > const struct ref *ref; > - struct strbuf key = STRBUF_INIT, value = STRBUF_INIT; > + struct strbuf key = STRBUF_INIT, default_refspec = STRBUF_INIT; nit: since it's not part of a key, value pair like value, default_refspec should probably go on its own line. [...] > --- a/t/t5611-clone-config.sh > +++ b/t/t5611-clone-config.sh > @@ -37,6 +37,50 @@ test_expect_success 'clone -c config is available during > clone' ' > test_cmp expect child/file > ' > > +test_expect_success 'clone -c remote.origin.fetch= works' ' > + rm -rf child && > + git update-ref refs/grab/it refs/heads/master && > + git update-ref refs/leave/out refs/heads/master && > + git clone -c "remote.origin.fetch=+refs/grab/*:refs/grab/*" . child && > + git -C child for-each-ref --format="%(refname)" >actual && > + cat >expect <<-EOF && > + refs/grab/it > + refs/heads/master > + refs/remotes/origin/HEAD > + refs/remotes/origin/master > + EOF > + test_cmp expect actual > +' Can use <<-\EOF to save the reviewer from having to
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Tue, Jun 6, 2017 at 8:37 PM, Jeff King wrote: >> To put your worries to rest we should eliminate remote->fetch_refspec >> altogether and parse refspecs into remote->fetch right away, I'd >> think. After all, that's what's used in most places anyway, and it >> can be easily turned back to a single string where needed (I think in >> only 3 places in builtin/remote.c). > > I don't think we can parse right away without regressing the error > handling. If I have two remotes, one with a bogus refspec, like: > > [remote "one"] > url = ... > fetch = refs/heads/*:refs/remotes/one/* > [remote "two"] > url = ... > fetch = ***bogus*** > > and I do: > > git fetch one > > then read_config() will grab the data for _both_ of them, but only call > remote_get() on the first one. If we parsed the refspecs during > read_config(), we'd parse the bogus remote.two.fetch and die(). > > I guess that's a minor case, but as far as I can tell that's the > motivation for the lazy parsing. Yeah, I know, we'd need a parse_refspec_gently() or something. parse_refspec_internal() already has a 'verify' parameter which prevents it from die()ing while parsing a bogus refspec, but in its current form it doesn't tell us which refspec was bogus, so we'd need a bit more than that to let the user know what's wrong.
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Tue, Jun 06, 2017 at 08:19:09PM +0200, SZEDER Gábor wrote: > > if (!remote->fetch) > > BUG("cannot add refspec to an unparsed remote"); > > > > ? > > But as mentioned before, remote->fetch being NULL is not a bug in > itself, it's a perfectly valid value even in a fully parsed remote > when the remote has no fetch refspecs. > Therefore, I think, the condition should instead be: > > remote->fetch_refspec_nr && !remote->fetch Right, that would be a better check. > We could even try to be extra helpful by checking this condition and > calling parse_fetch_refspec() to initialize remote->fetch instead of > BUG()ing out. However, that would mask the real issue, namely not > using remote_get() to get the remote, so I don't actually think that's > a good thing to do. OK. > To put your worries to rest we should eliminate remote->fetch_refspec > altogether and parse refspecs into remote->fetch right away, I'd > think. After all, that's what's used in most places anyway, and it > can be easily turned back to a single string where needed (I think in > only 3 places in builtin/remote.c). I don't think we can parse right away without regressing the error handling. If I have two remotes, one with a bogus refspec, like: [remote "one"] url = ... fetch = refs/heads/*:refs/remotes/one/* [remote "two"] url = ... fetch = ***bogus*** and I do: git fetch one then read_config() will grab the data for _both_ of them, but only call remote_get() on the first one. If we parsed the refspecs during read_config(), we'd parse the bogus remote.two.fetch and die(). I guess that's a minor case, but as far as I can tell that's the motivation for the lazy parsing. -Peff
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Mon, Jun 5, 2017 at 10:18 AM, Jeff King wrote: > Yeah, I agree it is safe now. I'm just worried about some function in > remote.c later doing: > >read_config(); >add_and_parse_fetch_refspec(remotes[0], whatever); > > which leaves the struct in an inconsistent state (we realloc NULL which > allocates from scratch, and all of the other entries in remote->fetch > end up uninitialized). Can we at least add an assertion like: > > if (!remote->fetch) > BUG("cannot add refspec to an unparsed remote"); > > ? But as mentioned before, remote->fetch being NULL is not a bug in itself, it's a perfectly valid value even in a fully parsed remote when the remote has no fetch refspecs. Therefore, I think, the condition should instead be: remote->fetch_refspec_nr && !remote->fetch We could even try to be extra helpful by checking this condition and calling parse_fetch_refspec() to initialize remote->fetch instead of BUG()ing out. However, that would mask the real issue, namely not using remote_get() to get the remote, so I don't actually think that's a good thing to do. OTOH, having remote->fetch so closely related to, yet separate from remote->fetch_refspec{,_nr,_alloc} will always inherently be error prone. This assertion would catch one case where a less than careful dev could cause trouble, sure, but there will be still others left, e.g. he could still do: add_fetch_refspec(remote, ...);// this doesn't update remote->fetch for (i = 0; i < remote->fetch_refspec_nr; i++) func(remote->fetch[i]); and watch the array indexing blow up in the last iteration. Or a non-hypothetical one: when I first tried to use remote_get() for an earlier version of this patch, I ALLOC_GROW()-ed remote->fetch to create room for the default refspec, because in struct remote not **fetch_refspec but *fetch is listed right above fetch_refspec_{nr,alloc}, being way past my bedtime may be my only excuse... It didn't work :) [1] To put your worries to rest we should eliminate remote->fetch_refspec altogether and parse refspecs into remote->fetch right away, I'd think. After all, that's what's used in most places anyway, and it can be easily turned back to a single string where needed (I think in only 3 places in builtin/remote.c). [1] - Though in the end this could be considered beneficial, because commits 53c5de29 (pickaxe: fix segfault with '-S<...> --pickaxe-regex', 2017-03-18), 59210dd56 (tests: make the 'test_pause' helper work in non-verbose mode, 2017-03-18), and 4ecae3c8c (tests: create an interactive gdb session with the 'debug' helper, 2017-03-18) were all fallouts from the ensuing debugging session :)
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Wed, May 31, 2017 at 11:34:23AM +0200, SZEDER Gábor wrote: > >> +void add_and_parse_fetch_refspec(struct remote *remote, const char > >> *refspec) > >> +{ > >> + struct refspec *rs; > >> + > >> + add_fetch_refspec(remote, refspec); > >> + rs = parse_fetch_refspec(1, &refspec); > >> + REALLOC_ARRAY(remote->fetch, remote->fetch_refspec_nr); > >> + remote->fetch[remote->fetch_refspec_nr - 1] = *rs; > >> + > >> + /* Not free_refspecs(), as we copied its pointers above */ > >> + free(rs); > >> +} > > > > What happens here if remote->fetch isn't already initialized? I think > > we'd end up with a bunch of garbage values. That's what I was trying to > > protect against in my original suggestion. > > > > I'm not sure if that's possible or not. We seem to initialize it in both > > remote_get() and for_each_remote(), and I don't think there are any > > other ways to get a remote. > > The only place creating remotes is remote.c:make_remote(), which > calloc()s the required memory, making all of struct remote's fields > zero-initialized. In case of clone the common case is that the user > doesn't specify any additional fetch refspecs, so remote->fetch will > still be NULL after full initialization and when > add_and_parse_fetch_refspec() is called with the default fetch > refspec, meaning we can't 'if (remote->fetch) { parse ... }'. OTOH, > all functions involved can cope with the fetch-refspec-related fields > being 0/NULL, and at the time remote->fetch_refspec_nr-1 is used for > array indexing it's not 0 anymore. Yeah, I agree it is safe now. I'm just worried about some function in remote.c later doing: read_config(); add_and_parse_fetch_refspec(remotes[0], whatever); which leaves the struct in an inconsistent state (we realloc NULL which allocates from scratch, and all of the other entries in remote->fetch end up uninitialized). Can we at least add an assertion like: if (!remote->fetch) BUG("cannot add refspec to an unparsed remote"); ? -Peff
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Wed, May 31, 2017 at 6:23 AM, Jeff King wrote: > On Tue, May 30, 2017 at 09:12:43AM +0200, SZEDER Gábor wrote: > >> diff --git a/remote.c b/remote.c >> index ad6c5424e..b8fd09dc9 100644 >> --- a/remote.c >> +++ b/remote.c >> @@ -626,6 +626,19 @@ struct refspec *parse_fetch_refspec(int nr_refspec, >> const char **refspec) >> return parse_refspec_internal(nr_refspec, refspec, 1, 0); >> } >> >> +void add_and_parse_fetch_refspec(struct remote *remote, const char *refspec) >> +{ >> + struct refspec *rs; >> + >> + add_fetch_refspec(remote, refspec); >> + rs = parse_fetch_refspec(1, &refspec); >> + REALLOC_ARRAY(remote->fetch, remote->fetch_refspec_nr); >> + remote->fetch[remote->fetch_refspec_nr - 1] = *rs; >> + >> + /* Not free_refspecs(), as we copied its pointers above */ >> + free(rs); >> +} > > What happens here if remote->fetch isn't already initialized? I think > we'd end up with a bunch of garbage values. That's what I was trying to > protect against in my original suggestion. > > I'm not sure if that's possible or not. We seem to initialize it in both > remote_get() and for_each_remote(), and I don't think there are any > other ways to get a remote. The only place creating remotes is remote.c:make_remote(), which calloc()s the required memory, making all of struct remote's fields zero-initialized. In case of clone the common case is that the user doesn't specify any additional fetch refspecs, so remote->fetch will still be NULL after full initialization and when add_and_parse_fetch_refspec() is called with the default fetch refspec, meaning we can't 'if (remote->fetch) { parse ... }'. OTOH, all functions involved can cope with the fetch-refspec-related fields being 0/NULL, and at the time remote->fetch_refspec_nr-1 is used for array indexing it's not 0 anymore. > (In fact, I kind of wondered why we do this > parsing lazily at all, but I think it's so that misconfigured remotes > don't cause us to die() if we aren't accessing them at all). > > If we assume that the contract that remote.c provides is that the > fetch fields are always filled in before a "struct remote" is returned > to a caller, and that only such callers would use this function, it's > OK. It just seems more dangerous than it needs to be. > > -Peff
Re: [PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
On Tue, May 30, 2017 at 09:12:43AM +0200, SZEDER Gábor wrote: > diff --git a/remote.c b/remote.c > index ad6c5424e..b8fd09dc9 100644 > --- a/remote.c > +++ b/remote.c > @@ -626,6 +626,19 @@ struct refspec *parse_fetch_refspec(int nr_refspec, > const char **refspec) > return parse_refspec_internal(nr_refspec, refspec, 1, 0); > } > > +void add_and_parse_fetch_refspec(struct remote *remote, const char *refspec) > +{ > + struct refspec *rs; > + > + add_fetch_refspec(remote, refspec); > + rs = parse_fetch_refspec(1, &refspec); > + REALLOC_ARRAY(remote->fetch, remote->fetch_refspec_nr); > + remote->fetch[remote->fetch_refspec_nr - 1] = *rs; > + > + /* Not free_refspecs(), as we copied its pointers above */ > + free(rs); > +} What happens here if remote->fetch isn't already initialized? I think we'd end up with a bunch of garbage values. That's what I was trying to protect against in my original suggestion. I'm not sure if that's possible or not. We seem to initialize it in both remote_get() and for_each_remote(), and I don't think there are any other ways to get a remote. (In fact, I kind of wondered why we do this parsing lazily at all, but I think it's so that misconfigured remotes don't cause us to die() if we aren't accessing them at all). If we assume that the contract that remote.c provides is that the fetch fields are always filled in before a "struct remote" is returned to a caller, and that only such callers would use this function, it's OK. It just seems more dangerous than it needs to be. -Peff
[PATCHv4 1/2] clone: respect additional configured fetch refspecs during initial fetch
The initial fetch during a clone doesn't transfer refs matching additional fetch refspecs given on the command line as configuration variables. This contradicts the documentation stating that configuration variables specified via 'git clone -c = ...' "take effect immediately after the repository is initialized, but before the remote history is fetched" and the given example specifically mentions "adding additional fetch refspecs to the origin remote". Furthermore, one-shot configuration variables specified via 'git -c = clone ...', though not written to the newly created repository's config file, live during the lifetime of the 'clone' command, including the initial fetch. All this implies that any fetch refspecs specified this way should already be taken into account during the initial fetch. The reason for this is that the initial fetch is not a fully fledged 'git fetch' but a bunch of direct calls into the fetch/transport machinery with clone's own refs-to-refspec matching logic, which bypasses parts of 'git fetch' processing configured fetch refspecs. This logic only considers a single default refspec, potentially influenced by options like '--single-branch' and '--mirror'. The configured refspecs are, however, already read and parsed properly when clone calls remote.c:remote_get(), but it never looks at the parsed refspecs in the resulting 'struct remote'. Modify clone to take the configured fetch refspecs into account to retrieve all matching refs during the initial fetch. Note that the configuration at that point only includes the fetch refspecs specified by the user, but it doesn't include the default fetch refspec. Add a function to the remote API which encapsulates adding a fetch refspec to a 'struct remote' and parsing it, including all the necessary memory management housekeeping as well. Use this new function to add clone's default fetch refspec to the other refspecs the user might have specified. Add tests to check that refspecs given both via 'git clone -c ...' and 'git -c ... clone' retrieve all refs matching either the default or the additional refspecs, and that it works even when the user specifies an alternative remote name via '--origin='. Helped-by: Jeff King Signed-off-by: SZEDER Gábor --- builtin/clone.c | 34 +- remote.c| 13 + remote.h| 1 + t/t5611-clone-config.sh | 44 4 files changed, 75 insertions(+), 17 deletions(-) diff --git a/builtin/clone.c b/builtin/clone.c index a35d62293..4157922d8 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -520,7 +520,7 @@ static struct ref *find_remote_branch(const struct ref *refs, const char *branch } static struct ref *wanted_peer_refs(const struct ref *refs, - struct refspec *refspec) + struct refspec *refspec, unsigned int refspec_nr) { struct ref *head = copy_ref(find_ref_by_name(refs, "HEAD")); struct ref *local_refs = head; @@ -541,13 +541,18 @@ static struct ref *wanted_peer_refs(const struct ref *refs, warning(_("Could not find remote branch %s to clone."), option_branch); else { - get_fetch_map(remote_head, refspec, &tail, 0); + unsigned int i; + for (i = 0; i < refspec_nr; i++) + get_fetch_map(remote_head, &refspec[i], &tail, 0); /* if --branch=tag, pull the requested tag explicitly */ get_fetch_map(remote_head, tag_refspec, &tail, 0); } - } else - get_fetch_map(refs, refspec, &tail, 0); + } else { + unsigned int i; + for (i = 0; i < refspec_nr; i++) + get_fetch_map(refs, &refspec[i], &tail, 0); + } if (!option_mirror && !option_single_branch) get_fetch_map(refs, tag_refspec, &tail, 0); @@ -848,16 +853,13 @@ int cmd_clone(int argc, const char **argv, const char *prefix) const struct ref *our_head_points_at; struct ref *mapped_refs; const struct ref *ref; - struct strbuf key = STRBUF_INIT, value = STRBUF_INIT; + struct strbuf key = STRBUF_INIT, default_refspec = STRBUF_INIT; struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT; struct transport *transport = NULL; const char *src_ref_prefix = "refs/heads/"; struct remote *remote; int err = 0, complete_refs_before_fetch = 1; - struct refspec *refspec; - const char *fetch_pattern; - packet_trace_identity("clone"); argc = parse_options(argc, argv, prefix, builtin_clone_options, builtin_clone_usage, 0); @@ -975,7 +977,6 @@ int cmd_clone(int argc, const char **argv, const char *prefix) strbuf_a