Re: [PATCH 4/5] grep: optionally recurse into submodules
On Thu, Oct 27, 2016 at 3:38 PM, Brandon Williamswrote: > diff --git a/builtin/grep.c b/builtin/grep.c > index 8887b6add..f34f16df9 100644 > --- a/builtin/grep.c > +++ b/builtin/grep.c > @@ -18,12 +18,20 @@ > #include "quote.h" > #include "dir.h" > #include "pathspec.h" > +#include "submodule.h" > > static char const * const grep_usage[] = { > N_("git grep [] [-e] [...] [[--] ...]"), > NULL > }; > > +static const char *super_prefix; I think that the super_prefix changes could be in its own patch. > +static int recurse_submodules; > +static struct argv_array submodule_options = ARGV_ARRAY_INIT; I guess this has to be static because it is shared by multiple threads. > + > +static int grep_submodule_launch(struct grep_opt *opt, > +const struct grep_source *gs); > + > #define GREP_NUM_THREADS_DEFAULT 8 > static int num_threads; > > @@ -174,7 +182,10 @@ static void *run(void *arg) > break; > > opt->output_priv = w; > - hit |= grep_source(opt, >source); > + if (w->source.type == GREP_SOURCE_SUBMODULE) > + hit |= grep_submodule_launch(opt, >source); > + else > + hit |= grep_source(opt, >source); It seems to me that GREP_SOURCE_SUBMODULE is of a different nature than the other GREP_SOURCE_.* - in struct work_item, could we instead have another variable that distinguishes between submodules and "native" sources? This might also assuage Junio's concerns in about the nature of the sources. That variable could also be the discriminant for a tagged union, such that we have "struct grep_source" for the "native" sources and a new struct (holding only submodule-relevant information) for the submodule. > +/* > + * Prep grep structures for a submodule grep > + * sha1: the sha1 of the submodule or NULL if using the working tree > + * filename: name of the submodule including tree name of parent > + * path: location of the submodule > + */ > +static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1, > + const char *filename, const char *path) > +{ > + if (!(is_submodule_initialized(path) && > + is_submodule_checked_out(path))) { > + warning("skiping submodule '%s%s' since it is not initialized > and checked out", > + super_prefix ? super_prefix: "", > + path); > + return 0; > + } > + > +#ifndef NO_PTHREADS > + if (num_threads) { > + add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1); > + return 0; > + } else > +#endif > + { > + struct work_item w; > + int hit; > + > + grep_source_init(, GREP_SOURCE_SUBMODULE, > +filename, path, sha1); > + strbuf_init(, 0); > + opt->output_priv = > + hit = grep_submodule_launch(opt, ); > + > + write_or_die(1, w.out.buf, w.out.len); > + > + grep_source_clear(); > + strbuf_release(); > + return hit; > + } This is at least the third invocation of this "if pthreads, add work, otherwise do it now" pattern - could this be extracted into its own function (in another patch)? Ideally, there would also be exactly one function in which the grep_source.* functions are invoked, and both "run" and the non-pthread code path can use it. > +} > + > +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, > + int cached) This line isn't modified other than the line break, as far as I can tell, so I wouldn't break it. > diff --git a/t/t7814-grep-recurse-submodules.sh > b/t/t7814-grep-recurse-submodules.sh > new file mode 100755 > index 0..b670c70cb > --- /dev/null > +++ b/t/t7814-grep-recurse-submodules.sh > @@ -0,0 +1,99 @@ > +#!/bin/sh > + > +test_description='Test grep recurse-submodules feature > + > +This test verifies the recurse-submodules feature correctly greps across > +submodules. > +' > + > +. ./test-lib.sh > + Would it be possible to also test it while num_threads is zero? (Or, if num_threads is already zero, to test it while it is not zero?)
Re: Regarding "git log" on "git series" metadata
Junio C Hamanowrites: >> I think the main complication is that the reachability rules are used >> during object transfer. One should not type after spending 20+ waking hours on plane and airport. I missed it when I wrote my first response, but yes, the reachability that originates from inside a tree object indeed is a problem, as we do not dig into trees while doing the want/have/ack exhange.
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 09:41:06PM -0700, Junio C Hamano wrote: > > I think the main complication is that the reachability rules are used > > during object transfer. So you'd probably want to introduce some > > protocol extension to say "I understand gitrefs", so that when one side > > says "I have sha1 X and its reachable objects", we know whether they are > > including gitrefs there. And likewise receivers with > > transfer.fsckObjects may complain about the new gitref tree mode > > (fortunately a new object type shouldn't be needed). > > Quite honestly I do not think backward compatibility here matters. > When gitlinks were introduced, a repository that was created with > gitlink capable version of Git would have failed "git fsck" that is > not gitlink aware, and I think this new "link with reachability" is > the same deal. No existing implemention understands a tree entry > whose mode bits are 14 or whatever new bit pattern we would > assign to this thing. You have to wait until both ends understand > the new thing, and that is perfectly OK. I'm OK with saying "if you use the gitref feature, you cannot push or pull those objects with remotes that do not understand it". But unlike gitlink, if we fail to notice the situation, we run into a case where we might silently lose objects, which is bad. So I think we need to be a bit more careful. I don't think the problems are insurmountable. I just think that's where the real complexity is, not in the changes to teach a single git about gitrefs. I'm happy to stand back and let you or Josh figure out all the corner cases. :) -Peff
Re: Regarding "git log" on "git series" metadata
Christian Couderwrites: > Couldn't a RefTree be used to store refs that point to the base > commit, I think it is the other way around. With the new "gitref" thing that is a pointer to an in-repository commit, RefTree can be naturally implemented.
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 09:55:23PM -0600, Josh Triplett wrote: > >> Is it possible currently for a protocol extension to result in "oh > >the > >> server doesn't support this so I'm going to stop pushing"? > > > >Yes, it would be easy for the client to abort if the server fails to > >advertise a particular extension. > > And the reverse (old client, new server) should work as well? Yes. The server says "I know about gitrefs" and if the client does not respond with "I also know about gitrefs", then the server can act appropriately (e.g., for a fetch, bail if the fetched content includes gitrefs). -Peff
Re: Regarding "git log" on "git series" metadata
Jeff Kingwrites: > On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote: > >> I agree with your assessment here. The main difficulty in implementing >> gitrefs is to ensure that they actually do get picked up by >> reachability checks to prevent dropping commits. I'm not sure how easy >> this is, but I would much rather we go this route rather than >> continuing along with the hack. This seems like the ideal solution, >> since it solves the entire problem and doesn't need more hacks bolted >> on. > > I think the main complication is that the reachability rules are used > during object transfer. So you'd probably want to introduce some > protocol extension to say "I understand gitrefs", so that when one side > says "I have sha1 X and its reachable objects", we know whether they are > including gitrefs there. And likewise receivers with > transfer.fsckObjects may complain about the new gitref tree mode > (fortunately a new object type shouldn't be needed). Quite honestly I do not think backward compatibility here matters. When gitlinks were introduced, a repository that was created with gitlink capable version of Git would have failed "git fsck" that is not gitlink aware, and I think this new "link with reachability" is the same deal. No existing implemention understands a tree entry whose mode bits are 14 or whatever new bit pattern we would assign to this thing. You have to wait until both ends understand the new thing, and that is perfectly OK. Besides, I think the point of having this discussion is that Josh did a good prototyping work of "git series" to discover what he can do in that area of "keeping track of history of history" and what operations are useful, without wasting time on mucking with the object model and traversal machinery that are available to him. Now his prototyping is at the point where he knows at least one enhancement to the core that would help him to redo the prototype in the right way. And I do not mind helping him from the core side.
Re: Regarding "git log" on "git series" metadata
On November 4, 2016 7:48:17 PM MDT, Jeff Kingwrote: >On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote: > >> > You might also want fallback rules for storing gitrefs on "old" >servers >> > (e.g., backfilling gitrefs you need if the server didn't them in >the >> > initial fetch). But I guess storing any gitrefs on such a server is >> > inherently dangerous, because the server might prune them at any >time. >> >> Is it possible currently for a protocol extension to result in "oh >the >> server doesn't support this so I'm going to stop pushing"? > >Yes, it would be easy for the client to abort if the server fails to >advertise a particular extension. And the reverse (old client, new server) should work as well? - Josh Triplett
Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules
On 10/31/2016 03:38 PM, Brandon Williams wrote: + struct strbuf buf = STRBUF_INIT; + char *submodule_url = NULL; + + strbuf_addf(, "submodule.%s.url", module->name); + ret = !git_config_get_string(buf.buf, _url); + + free(submodule_url); + strbuf_release(); + } + + return ret; +} + +/* + * Determine if a submodule has been checked out at a given 'path' + */ +int is_submodule_checked_out(const char *path) +{ + int ret = 0; + struct strbuf buf = STRBUF_INIT; + + strbuf_addf(, "%s/.git", path); + ret = file_exists(buf.buf); + + strbuf_release(); In this and the previous function, you can use xstrfmt.
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote: > > You might also want fallback rules for storing gitrefs on "old" servers > > (e.g., backfilling gitrefs you need if the server didn't them in the > > initial fetch). But I guess storing any gitrefs on such a server is > > inherently dangerous, because the server might prune them at any time. > > Is it possible currently for a protocol extension to result in "oh the > server doesn't support this so I'm going to stop pushing"? Yes, it would be easy for the client to abort if the server fails to advertise a particular extension. What I would worry about more is that "somehow" an older client gets hold of history with a gitref, and then pushes it. It would be nice if even an old server said "nope, I don't understand this and I won't take it" rather than propagating the data to a server that will throw it away. > Right. I'm assuming tree objects don't get checked for invalid mode > already? If they do, we could just change the mode to something > unsupported currently. But... that seems like it might not be the case > because it requires checking every tree object coming in? > > I'm not familiar with what sort of checking already exists... Thoughts? If the server sets receive.fsckObjects, then fsck_tree() runs and will reject any non-standard mode. That option is not the default, though some big hosters set it (GitHub does, but I am actually not that worried about GitHub; if gitrefs support materialized I would probably ship it there fairly promptly). -Peff
Re: [PATCH v1 12/19] Documentation/config: add splitIndex.maxPercentChange
On Tue, Nov 1, 2016 at 8:19 PM, Junio C Hamanowrote: > Christian Couder writes: > >> +splitIndex.maxPercentChange:: >> + When the split index feature is used, this specifies the >> + percent of entries the split index can contain compared to the >> + whole number of entries in both the split index and the shared >> + index before a new shared index is written. >> + The value should be between 0 and 100. If the value is 0 then >> + a new shared index is always written, if it is 100 a new >> + shared index is never written. > > Hmph. The early part of the description implies this will kick in > only when some other conditions (i.e. the bit in the index or the > other configuration) are met, but if this disables the split index > when it is set to 0, would we even need the other configuration > variable? IOW, perhaps we can do without core.splitIndex? I think it is easier for user to be able to just set core.splitIndex to true to enable split-index. If we ask them to instead set splitIndex.maxPercentChange to a "reasonable value like 20" to enable it, they will start to wonder why 20 and not 50 or 100 and might be unnecessarily confused. Also it might be good to make it possible to have other values than "true" and "false" for core.splitIndex in the future. For example if we decide to first enable split-index only on $GIT_DIR/index, we might later want an "everywhere" or "always" value for core.splitIndex if we find a way to enable it on other indexes. We might also want an "auto" mode in the future when split-index will work really well, so that it can be turned on automatically on all the repos where it makes sense (like when the index contains more than 1 entries for example). >> + By default the value is 20, so a new shared index is written >> + if the number of entries in the split index would be greater >> + than 20 percent of the total number of entries. >> + See linkgit:git-update-index[1].
Re: [PATCH v1 09/19] config: add git_config_get_max_percent_split_change()
On Tue, Nov 1, 2016 at 8:13 PM, Junio C Hamanowrote: > >>> + return -1; > > Perhaps do the usual > > return error(_("...")); > > here? Ok, it will be in the next version.
Re: [PATCH v3] transport: add protocol policy config option
On 11/04, Jeff King wrote: > On Fri, Nov 04, 2016 at 02:35:57PM -0700, Stefan Beller wrote: > > > On Fri, Nov 4, 2016 at 1:58 PM, Brandon Williamswrote: > > > On 11/04, Brandon Williams wrote: > > >> Signed-off-by: Brandon Williams > > > > > > Is there an acceptable way to give credit to Jeff for helping with this > > > patch? > > > > What about: > > Helped-by: Jeff King > > That, or often I would write: > > Based on a patch by Jeff King > > in the commit message. Basically anything is OK _except_ forging > signed-off-by, because it has a very specific meaning. So let me also > say that I am happy to give my: > > Signed-off-by: Jeff King > > to the original (which you should add in, to make clear that the > copyright issues are OK). > > In some cases it makes sense to just roll somebody's patch into your > series, and then build on top. I'm fine with it all going into a single > patch here. > > -Peff Oh if it would be more clear I can easily break it up into two patches. -- Brandon Williams
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 04:37:34PM -0700, Jacob Keller wrote: > On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplettwrote: > > That said, I'd *love* to have gitrefs available, for a wide variety of > > applications, and I can see an argument for introducing them and waiting > > a few years for them to become universally available, similar to the > > process gitlinks went through. > > > > But I'd also love to have a backward-compatible solution. > > > > - Josh Triplett > > I think that you won't really find a backwards compatible solution > other than something like automatically generating refs for each point > of history. I know that gerrit does something like this by storing > each version in "refs/changes/id/version" or something along those > lines. I think this might actually be cleaner than your parent links > hack, and could be used as a fallback for when gitrefs don't work, > though you'd have to code exactly how to tell what to push to a > repository when pushing a series? I'm not sure what the advantage of that would be, and it would mean that if you ever have one branch without pushing the other(s), you'd get severe time-delated breakage due to pruning. (And if you pushed the series without the other ref(s), its history would look right but then you couldn't access the underlying versions of the patch series.) One of my design goals was to *not* need a special "git series push" or "git series pull"; you should just be able to use git push and git pull, and you can set up normal refspecs. That said, I could fairly easily generate the existing format with artificial parent refs for backward compatibility, and provide a way to use the new gitref-based storage format if you know that all your servers and clients can handle it. I'm also open to other suggestions for how to make such a transition while still working with every git server and git client that exists today.
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplettwrote: > That said, I'd *love* to have gitrefs available, for a wide variety of > applications, and I can see an argument for introducing them and waiting > a few years for them to become universally available, similar to the > process gitlinks went through. > > But I'd also love to have a backward-compatible solution. > > - Josh Triplett I think that you won't really find a backwards compatible solution other than something like automatically generating refs for each point of history. I know that gerrit does something like this by storing each version in "refs/changes/id/version" or something along those lines. I think this might actually be cleaner than your parent links hack, and could be used as a fallback for when gitrefs don't work, though you'd have to code exactly how to tell what to push to a repository when pushing a series? Thanks, Jake
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 4, 2016 at 12:49 PM, Jeff Kingwrote: > I think the main complication is that the reachability rules are used > during object transfer. So you'd probably want to introduce some > protocol extension to say "I understand gitrefs", so that when one side > says "I have sha1 X and its reachable objects", we know whether they are > including gitrefs there. And likewise receivers with > transfer.fsckObjects may complain about the new gitref tree mode > (fortunately a new object type shouldn't be needed). > > You might also want fallback rules for storing gitrefs on "old" servers > (e.g., backfilling gitrefs you need if the server didn't them in the > initial fetch). But I guess storing any gitrefs on such a server is > inherently dangerous, because the server might prune them at any time. > Is it possible currently for a protocol extension to result in "oh the server doesn't support this so I'm going to stop pushing"? This would be a rather hard transition, but it would at least ensure that pushing to a server which doesn't support gitrefs would fail rather than silently accept objects and then discard them later? I think this is the only real transition unless we can make a change that old servers object to already. > So perhaps a related question is: how can gitrefs be designed such that > existing servers reject them (rather than accepting the push and then > later throwing away half the data). It would be easy to notice in the > client during a push that we are sending gitrefs to a server which does > not claim that capability. But it seems more robust if it is the server > who decides "I will not accept these bogus objects". > > I haven't thought all that hard about this. That's just my initial > thoughts on what sound hard. Tweaking the reachability code doesn't seem > all that bad; we already know all of the spots that care about > S_ISGITLINK(). It may even be that some of those spots work out of the > box (because gitlinks are usually about telling the graph-walking code > that we _don't_ care about reachability; we do by default for trees and > blobs). Right. I'm assuming tree objects don't get checked for invalid mode already? If they do, we could just change the mode to something unsupported currently. But... that seems like it might not be the case because it requires checking every tree object coming in? I'm not familiar with what sort of checking already exists... Thoughts? > > I'd be surprised if all such sites work out of the box, though. Even if > they see "ah, sha1 X is referenced by tree Y and isn't a gitlink, and > therefore should be reachable", they need to also note that "X" is a > commit and recursively walk its objects. > They won't all work out of the box, but it shouldn't be much work to do this part. > -Peff
Re: [PATCH v3] transport: add protocol policy config option
On Fri, Nov 04, 2016 at 02:35:57PM -0700, Stefan Beller wrote: > On Fri, Nov 4, 2016 at 1:58 PM, Brandon Williamswrote: > > On 11/04, Brandon Williams wrote: > >> Signed-off-by: Brandon Williams > > > > Is there an acceptable way to give credit to Jeff for helping with this > > patch? > > What about: > Helped-by: Jeff King That, or often I would write: Based on a patch by Jeff King in the commit message. Basically anything is OK _except_ forging signed-off-by, because it has a very specific meaning. So let me also say that I am happy to give my: Signed-off-by: Jeff King to the original (which you should add in, to make clear that the copyright issues are OK). In some cases it makes sense to just roll somebody's patch into your series, and then build on top. I'm fine with it all going into a single patch here. -Peff
Re: [PATCH v3] transport: add protocol policy config option
On Fri, Nov 04, 2016 at 01:55:33PM -0700, Brandon Williams wrote: > Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to > specify a whitelist of protocols to be used in clone/fetch/pull > commands. This patch introduces new configuration options for more > fine-grained control for allowing/disallowing protocols. This also has > the added benefit of allowing easier construction of a protocol > whitelist on systems where setting an environment variable is > non-trivial. Good rationale. > Now users can specify a policy to be used for each type of protocol via > the 'protocol..allow' config option. A default policy for all > unknown protocols can be set with the 'protocol.allow' config option. I think "unconfigured" is a better word here than "unknown", as it would apply to known protocols like "https", etc. That made me wonder if "unknown" would be a better behavior, but I'm pretty sure it is not. It is harder to explain, and I think would be less convenient in practice. I.e., you really do want: git config protocol.allow never git config protocol.https.allow always to allow nothing but https. > If no user configured default is made git, by default, will allow > known-safe protocols (http, https, git, ssh, file), disallow > known-dangerous protocols (ext), and have a default poliy of `user` for > all other protocols. I think this is a good way of thinking about it. The order of enforcement becomes: - GIT_ALLOW_PROTOCOL; environment variables always take precedence over config, so this makes sense. And it also is nice to put the blunt hammer at the front for backwards-compatibility. - protocol-specific config - protocol-generic config - built-in defaults (known-safe, known-scary, unknown) which seems right. Also, s/poliy/policy/. > The supported policies are `always`, `never`, and `user`. The `user` > policy can be used to configure a protocol to be usable when explicitly > used by a user, while disallowing it for commands which run > clone/fetch/pull commands without direct user intervention (e.g. > recursive initialization of submodules). Commands which can potentially > clone/fetch/pull from untrusted repositories without user intervention > can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent > protocols configured to the `user` policy from being used. Makes sense. I know "user" came from me. I don't know if there is a better word to describe it. I originally called it "cmdline", but that seemed too obscure (especially when a tool external to git sets it). Something like "trusted" might make sense (we allow it only in a more-trusted setting), but it's kind of vague. And it also doesn't leave room for there to be more types of trust in the future. So "user" is probably reasonable (or perhaps "user-only" or similar). > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 27069ac..5d845c4 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2308,6 +2308,31 @@ pretty.:: > Note that an alias with the same name as a built-in format > will be silently ignored. > > +protocol.allow:: > + If set, provide a user defined default policy for all protocols which > + don't explicitly have a policy (protocol..allow). By default, > + if unset, known-safe protocols (http, https, git, ssh, file) have a > + default policy of `always`, known-dangerous protocols (ext) have a > + default policy of `never`, and all other protocols have a default policy > + of `user`. Supported policies: > ++ > +-- > + > +* `always` - protocol is always able to be used. > + > +* `never` - protocol is never able to be used. > + > +* `user` - protocol is only able to be used when `GIT_PROTOCOL_FROM_USER` is > + either unset or has a value of 1. This policy should be used when you > want a > + protocol to be usable by the user but don't want it used by commands which > + execute clone/fetch/pull commands without user input, e.g. recursive > + submodule initialization. Makes sense. I wonder if it would be good to emphasize _directly_ usable here. I.e., "...when you want a protocol to be directly usable by the user but don't want...". Should clone/fetch/pull also include push? > +protocol..allow:: > + Set a policy to be used by protocol with clone/fetch/pull > commands. > + Nice that this matches protocol.allow, so we don't need to re-explain that. Should the list of protocols be here? I know they're covered under GIT_ALLOW_PROTOCOL already, but if this is the preferred system, we should probably explain them here, and then just have GIT_ALLOW_PROTOCOL refer the user. > diff --git a/Documentation/git.txt b/Documentation/git.txt > index ab7215e..ab25580 100644 > --- a/Documentation/git.txt > +++ b/Documentation/git.txt > @@ -1150,13 +1150,13 @@ of clones and fetches. > cloning a repository to make a backup). > > `GIT_ALLOW_PROTOCOL`:: > - If set, provide a
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 4, 2016 at 10:19 PM, Josh Triplettwrote: > On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote: >> On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano wrote: >> > >> > Imagine we invent a new tree entry type, "gitref", that is similar >> > to "gitlink" in that it can record a commit object name in a tree, >> > but unlike "gitlink" it does imply reachability. And you do not add >> > phony parents to your commit object. A tree that has "gitref"s in >> > it is about annotating the commits in the same repository (e.g. the >> > tree references two commits, "base" and "tip", to point into a slice >> > of the main history). And it is perfectly sensible for such a >> > pointer to imply reachability---after all it serves different >> > purposes from "gitlink". >> >> The more I think about this (and also about how to limit ref >> advertisements as recently discussed in >> https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/), >> the more I think about Shawn's RefTree: >> >> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ >> >> Couldn't a RefTree be used to store refs that point to the base >> commit, the tip commit and the blob that contains the cover letter, >> and maybe also a ref pointing to the RefTree of the previous version >> of the series? > > That's really interesting! The Software Heritage project is working on > something similar, because they want to store all the refs as part of > their data model as well. I'll point them to the reftree work. Yeah, I know them :-) and I think I have already told Stefano Zacchiroli about this, but I am not sure anymore. Anyway I am CC'ing him. > If upstream git supported RefTree, I could potentially use that for > git-series. However, I do want a commit message and history for the > series itself, and using refs in the reftree to refer to the parents > seems like abusing reftree to recreate commits, in a reversal of the > hack of using commit parents as a reftree. :) Yeah, maybe :-) But the properties of the existing Git objects we already use wouldn't change at all. > What if, rather than storing a hash reference to a reftree as a single > reference and replacing it with no history, In what I suggest the history is kept because the new reftree has a ref that points to the old one it is replacing. Yeah, this reftree history maybe seen as "redundant" with the commit history, but in my opinion this can be seen as a "feature" that will prevent us from "mucking" too much with the commit object. > a reftree could be > referenced from a commit and have history? (That would also allow > tagging a version of the reftree.) I think that tags are already allowed to point to any kind of Git object, so tagging a reftree should be allowed anyway if we add a reftree object.
Re: gitk: avoid obscene memory consumption
On Fri, Nov 4, 2016 at 12:49 PM, Markus Hitterwrote: > > Hello all, +cc Paul Mackeras, who maintains gitk. > > after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu > 16.10, no swap to save the SSD) to its knees once more than I'm comfortable > with, I decided to investigate this issue. > > Result of this investigation is, my Git repo has a commit with a diff of some > 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 > GB of memory. > > The solution is to cut off diffs at 50'000 lines for the display. This > consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, > followed by a copyable message on how to view the full diff on the command > line. Diffs shorter than this limit are displayed as before. Bikeshedding: I'd argue to even lower the number to 5-10k lines. > > To test the waters whether such a change is welcome, here's the patch as I > currently use it. If this patch makes sense I'll happily apply change > requests and bring it more in line with Git's patch submission expectations. I have never contributed to gitk myself, which is hosted at git://ozlabs.org/~paulus/gitk though I'd expect these guide lines would roughly apply: https://github.com/git/git/blob/master/Documentation/SubmittingPatches
Re: [PATCH v3] transport: add protocol policy config option
On Fri, Nov 4, 2016 at 1:55 PM, Brandon Williamswrote: > Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to > specify a whitelist of protocols to be used in clone/fetch/pull > commands. This patch introduces new configuration options for more > fine-grained control for allowing/disallowing protocols. This also has > the added benefit of allowing easier construction of a protocol > whitelist on systems where setting an environment variable is > non-trivial. > > Now users can specify a policy to be used for each type of protocol via > the 'protocol..allow' config option. A default policy for all > unknown protocols can be set with the 'protocol.allow' config option. > If no user configured default is made git, by default, will allow > known-safe protocols (http, https, git, ssh, file), disallow > known-dangerous protocols (ext), and have a default poliy of `user` for > all other protocols. > > The supported policies are `always`, `never`, and `user`. The `user` > policy can be used to configure a protocol to be usable when explicitly > used by a user, while disallowing it for commands which run > clone/fetch/pull commands without direct user intervention (e.g. > recursive initialization of submodules). Commands which can potentially > clone/fetch/pull from untrusted repositories without user intervention > can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent > protocols configured to the `user` policy from being used. > > Signed-off-by: Brandon Williams > --- > Documentation/config.txt | 25 > Documentation/git.txt| 19 +++--- > git-submodule.sh | 12 ++-- > t/lib-proto-disable.sh | 132 > +++ > t/t5509-fetch-push-namespaces.sh | 1 + > t/t5802-connect-helper.sh| 1 + > transport.c | 73 +- > 7 files changed, 235 insertions(+), 28 deletions(-) > > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 27069ac..5d845c4 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -2308,6 +2308,31 @@ pretty.:: > Note that an alias with the same name as a built-in format > will be silently ignored. > > +protocol.allow:: > + If set, provide a user defined default policy for all protocols which > + don't explicitly have a policy (protocol..allow). By default, Use hyphens (`protocol..allow`) to highlight the config option. By default, if unset, ... have a default policy ... sounds strange. How about just dropping the first 4 words here: Known-safe protocols (http, https, git, ssh, file) have a default policy of `always`, known-dangerous protocols (ext) have a default policy of `never`, and all other protocols have a default policy of `user`. Supported policies: What happens if protocol.allow is set to true? > + if unset, known-safe protocols (http, https, git, ssh, file) have a > + default policy of `always`, known-dangerous protocols (ext) have a > + default policy of `never`, and all other protocols have a default > policy > + of `user`. Supported policies: > ++ > +-- > + > +* `always` - protocol is always able to be used. > + > +* `never` - protocol is never able to be used. > + > +* `user` - protocol is only able to be used when `GIT_PROTOCOL_FROM_USER` is > + either unset or has a value of 1. This policy should be used when you > want a > + protocol to be usable by the user but don't want it used by commands which > + execute clone/fetch/pull commands without user input, e.g. recursive > + submodule initialization. > + > +-- > + > +protocol..allow:: > + Set a policy to be used by protocol with clone/fetch/pull > commands. How does this interact with protocol.allow? When protocol.allow is set, this overrides the specific protocol. If protocol is not set, it overrides the specific protocol as well(?) > + > pull.ff:: > By default, Git does not create an extra merge commit when merging > a commit that is a descendant of the current commit. Instead, the > diff --git a/Documentation/git.txt b/Documentation/git.txt > index ab7215e..ab25580 100644 > --- a/Documentation/git.txt > +++ b/Documentation/git.txt > @@ -1150,13 +1150,13 @@ of clones and fetches. > cloning a repository to make a backup). > > `GIT_ALLOW_PROTOCOL`:: > - If set, provide a colon-separated list of protocols which are > - allowed to be used with fetch/push/clone. This is useful to > - restrict recursive submodule initialization from an untrusted > - repository. Any protocol not mentioned will be disallowed (i.e., > - this is a whitelist, not a blacklist). If the variable is not > - set at all, all protocols are enabled. The protocol names > - currently used by git are: > + The new way to configure allowed
gitk: avoid obscene memory consumption
Hello all, after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 16.10, no swap to save the SSD) to its knees once more than I'm comfortable with, I decided to investigate this issue. Result of this investigation is, my Git repo has a commit with a diff of some 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 GB of memory. The solution is to cut off diffs at 50'000 lines for the display. This consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, followed by a copyable message on how to view the full diff on the command line. Diffs shorter than this limit are displayed as before. To test the waters whether such a change is welcome, here's the patch as I currently use it. If this patch makes sense I'll happily apply change requests and bring it more in line with Git's patch submission expectations. The patch is made against git(k) version 2.9.3, the one coming with latest Ubuntu. Please also note that this is the first time I wrote some Tcl code, so the strategy used might not follow best Tcl practices. $ diff -uw /usr/bin/gitk.org /usr/bin/gitk --- /usr/bin/gitk.org 2016-08-16 22:32:47.0 +0200 +++ /usr/bin/gitk 2016-11-04 20:06:14.805920404 +0100 @@ -7,6 +7,15 @@ # and distributed under the terms of the GNU General Public Licence, # either version 2, or (at your option) any later version. +# Markus: trying to limit memory consumption. It happened that +# complex commits led to more than 1.5 GB of memory usage. +# +# The problem was identified to be caused by extremely long diffs. The +# commit leading to this research had some 365'000 lines of diff, consuming +# these 1.5 GB when drawn into the canvas. The solution is to limit diffs to +# 50'000 lines and skipping the rest. In case of a cutoff, a CLI command for +# getting the full diff is shown. + package require Tk proc hasworktree {} { @@ -7956,6 +7965,7 @@ proc getblobdiffs {ids} { global blobdifffd diffids env +global parseddifflines global treediffs global diffcontext global ignorespace @@ -7987,6 +7997,7 @@ } fconfigure $bdf -blocking 0 -encoding binary -eofchar {} set blobdifffd($ids) $bdf +set parseddifflines 0 initblobdiffvars filerun $bdf [list getblobdiffline $bdf $diffids] } @@ -8063,20 +8074,34 @@ proc getblobdiffline {bdf ids} { global diffids blobdifffd +global parseddifflines global ctext set nr 0 +set maxlines 5 $ctext conf -state normal while {[incr nr] <= 1000 && [gets $bdf line] >= 0} { +incr parseddifflines +if {$parseddifflines >= $maxlines} { +break +} if {$ids != $diffids || $bdf != $blobdifffd($ids)} { catch {close $bdf} return 0 } parseblobdiffline $ids $line } +if {$parseddifflines >= $maxlines} { +$ctext insert end "\n--" hunksep +$ctext insert end " Lines exceeding $maxlines skipped " hunksep +$ctext insert end "--\n\n" hunksep +$ctext insert end "To get a full diff, run\n\n" hunksep +$ctext insert end " git diff-tree -p -C --cc $ids\n\n" hunksep +$ctext insert end "on the command line.\n" hunksep +} $ctext conf -state disabled blobdiffmaybeseehere [eof $bdf] -if {[eof $bdf]} { +if {[eof $bdf] || $parseddifflines >= $maxlines} { catch {close $bdf} return 0 } @@ -9093,6 +9118,7 @@ proc diffcommits {a b} { global diffcontext diffids blobdifffd diffinhdr currdiffsubmod +global parseddifflines set tmpdir [gitknewtmpdir] set fna [file join $tmpdir "commit-[string range $a 0 7]"] @@ -9114,6 +9140,7 @@ set blobdifffd($diffids) $fd set diffinhdr 0 set currdiffsubmod "" +set parseddifflines 0 filerun $fd [list getblobdiffline $fd $diffids] } Cheers, Markus -- - - - - - - - - - - - - - - - - - - - Dipl. Ing. (FH) Markus Hitter http://www.jump-ing.de/
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 03:49:07PM -0400, Jeff King wrote: > On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote: > > > I agree with your assessment here. The main difficulty in implementing > > gitrefs is to ensure that they actually do get picked up by > > reachability checks to prevent dropping commits. I'm not sure how easy > > this is, but I would much rather we go this route rather than > > continuing along with the hack. This seems like the ideal solution, > > since it solves the entire problem and doesn't need more hacks bolted > > on. > > I think the main complication is that the reachability rules are used > during object transfer. So you'd probably want to introduce some > protocol extension to say "I understand gitrefs", so that when one side > says "I have sha1 X and its reachable objects", we know whether they are > including gitrefs there. And likewise receivers with > transfer.fsckObjects may complain about the new gitref tree mode > (fortunately a new object type shouldn't be needed). > > You might also want fallback rules for storing gitrefs on "old" servers > (e.g., backfilling gitrefs you need if the server didn't them in the > initial fetch). But I guess storing any gitrefs on such a server is > inherently dangerous, because the server might prune them at any time. > > So perhaps a related question is: how can gitrefs be designed such that > existing servers reject them (rather than accepting the push and then > later throwing away half the data). It would be easy to notice in the > client during a push that we are sending gitrefs to a server which does > not claim that capability. But it seems more robust if it is the server > who decides "I will not accept these bogus objects". This seems like the critical problem, here. The parent hack I used in git-series might be a hack, but it transparently works with old servers and clients. So, for instance, I can push a git-series ref to github, with no changes required on github's part. If git added gitrefs, and I started using them in git-series, then that'd eliminate parent hack and allow many standard git tools to work naturally on git-series commits and history, but it'd also mean that people couldn't push git-series commits to any server until that server updates git. That said, I'd *love* to have gitrefs available, for a wide variety of applications, and I can see an argument for introducing them and waiting a few years for them to become universally available, similar to the process gitlinks went through. But I'd also love to have a backward-compatible solution. - Josh Triplett
[PATCH] doc: fill in omitted word
Signed-off-by: Kristoffer Haugsbakk--- Documentation/gitcore-tutorial.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/gitcore-tutorial.txt b/Documentation/gitcore-tutorial.txt index 4546fa0..9860517 100644 --- a/Documentation/gitcore-tutorial.txt +++ b/Documentation/gitcore-tutorial.txt @@ -25,7 +25,7 @@ you want to understand Git's internals. The core Git is often called "plumbing", with the prettier user interfaces on top of it called "porcelain". You may not want to use the plumbing directly very often, but it can be good to know what the -plumbing does for when the porcelain isn't flushing. +plumbing does for you when the porcelain isn't flushing. Back when this document was originally written, many porcelain commands were shell scripts. For simplicity, it still uses them as -- 2.10.2
Re: [PATCH v3] transport: add protocol policy config option
On Fri, Nov 4, 2016 at 1:58 PM, Brandon Williamswrote: > On 11/04, Brandon Williams wrote: >> Signed-off-by: Brandon Williams > > Is there an acceptable way to give credit to Jeff for helping with this patch? What about: Helped-by: Jeff King > > -- > Brandon Williams
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote: > On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamanowrote: > > > > Imagine we invent a new tree entry type, "gitref", that is similar > > to "gitlink" in that it can record a commit object name in a tree, > > but unlike "gitlink" it does imply reachability. And you do not add > > phony parents to your commit object. A tree that has "gitref"s in > > it is about annotating the commits in the same repository (e.g. the > > tree references two commits, "base" and "tip", to point into a slice > > of the main history). And it is perfectly sensible for such a > > pointer to imply reachability---after all it serves different > > purposes from "gitlink". > > The more I think about this (and also about how to limit ref > advertisements as recently discussed in > https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/), > the more I think about Shawn's RefTree: > > https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ > > Couldn't a RefTree be used to store refs that point to the base > commit, the tip commit and the blob that contains the cover letter, > and maybe also a ref pointing to the RefTree of the previous version > of the series? That's really interesting! The Software Heritage project is working on something similar, because they want to store all the refs as part of their data model as well. I'll point them to the reftree work. If upstream git supported RefTree, I could potentially use that for git-series. However, I do want a commit message and history for the series itself, and using refs in the reftree to refer to the parents seems like abusing reftree to recreate commits, in a reversal of the hack of using commit parents as a reftree. :) What if, rather than storing a hash reference to a reftree as a single reference and replacing it with no history, a reftree could be referenced from a commit and have history? (That would also allow tagging a version of the reftree.)
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 10:57:09AM -0700, Junio C Hamano wrote: > After your talk at LPC2016, I was thinking about your proposal to > give an option to hide certain parents from "git log" traversal. > > While I do not think we would terribly mind a new feature in the > core to support third-party additions like "git series" better, I > think this particular one is a big mistake that we shouldn't take. [...] > I think this is backwards. The root cause of the issue you have > with "gitk" is because you added something that is *NOT* a parent to > your commit. We shouldn't have to add a mechanism to filter > something that shouldn't have been added there in the first place. > > I am wondering if an alternative approach would work better. > > Imagine we invent a new tree entry type, "gitref", that is similar > to "gitlink" in that it can record a commit object name in a tree, > but unlike "gitlink" it does imply reachability. And you do not add > phony parents to your commit object. A tree that has "gitref"s in > it is about annotating the commits in the same repository (e.g. the > tree references two commits, "base" and "tip", to point into a slice > of the main history). And it is perfectly sensible for such a > pointer to imply reachability---after all it serves different > purposes from "gitlink". I absolutely agree with this, and I'd love to have gitref or similar in core git. Given the availability of that mechanism, I'd love to use it in git-series. (And in git submodule, as well, for other projects.) The one critical issue there, though: that would break backward compatibility with old versions of git. No old version of git could push, pull, gc, repack, or otherwise touch a repository that used this feature. The advantages of the approach (viewing and manipulating the series with pure git) seem sufficiently high to make that worth considering, but it is a significant downside. > Another alternative that I am negative about (but is probably a > better hack than how you abused the "parent" link) might be to add a > new commit object header field that behaves similarly to "parent" > only in that it implies reachability. But recording the extra > parent in commit object was not something you wanted to do in the > first place (i.e. your series processing is done solely on the > contents of the tree, and you do not read this extra parent). If you > need to add an in-tree reference to another commit in your future > versions of "git series", with either this variant or your original > implementation, you would end up needing adding more "parent" (or > pseudo parent) only to preserve reachability. At that point, I > think it makes more sense to have entries in the tree to directly > ensure reachability, if you want these entries to always point at an > in-tree object. This would similarly break compatibility with old git, as old git wouldn't follow those reachability-only links from commits, so it could throw away the data. One approach compatible with old git would be to continue adding the relevant commits as artificial parents, but have a separate commit metadata field that says which parents to ignore; old git would then do the right thing, as long as it doesn't rewrite the commit entirely. That does have the same disadvantages of having to duplicate the information in both the tree and the parent list, though; it's the same class of hack, just with improved usability. I'd much rather use gitrefs. > I am afraid that I probably am two steps ahead of myself, because I > am reasonably sure that it is quite possible that I have overlooked > something trivially obvious that makes the "gitref" approach > unworkable. gitref seems like a good idea to me, as long as we can sort out the compatibility story.
Re: [PATCH v3] transport: add protocol policy config option
On 11/04, Brandon Williams wrote: > Signed-off-by: Brandon WilliamsIs there an acceptable way to give credit to Jeff for helping with this patch? -- Brandon Williams
[PATCH v3] transport: add protocol policy config option
Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to specify a whitelist of protocols to be used in clone/fetch/pull commands. This patch introduces new configuration options for more fine-grained control for allowing/disallowing protocols. This also has the added benefit of allowing easier construction of a protocol whitelist on systems where setting an environment variable is non-trivial. Now users can specify a policy to be used for each type of protocol via the 'protocol..allow' config option. A default policy for all unknown protocols can be set with the 'protocol.allow' config option. If no user configured default is made git, by default, will allow known-safe protocols (http, https, git, ssh, file), disallow known-dangerous protocols (ext), and have a default poliy of `user` for all other protocols. The supported policies are `always`, `never`, and `user`. The `user` policy can be used to configure a protocol to be usable when explicitly used by a user, while disallowing it for commands which run clone/fetch/pull commands without direct user intervention (e.g. recursive initialization of submodules). Commands which can potentially clone/fetch/pull from untrusted repositories without user intervention can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent protocols configured to the `user` policy from being used. Signed-off-by: Brandon Williams--- Documentation/config.txt | 25 Documentation/git.txt| 19 +++--- git-submodule.sh | 12 ++-- t/lib-proto-disable.sh | 132 +++ t/t5509-fetch-push-namespaces.sh | 1 + t/t5802-connect-helper.sh| 1 + transport.c | 73 +- 7 files changed, 235 insertions(+), 28 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 27069ac..5d845c4 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2308,6 +2308,31 @@ pretty.:: Note that an alias with the same name as a built-in format will be silently ignored. +protocol.allow:: + If set, provide a user defined default policy for all protocols which + don't explicitly have a policy (protocol..allow). By default, + if unset, known-safe protocols (http, https, git, ssh, file) have a + default policy of `always`, known-dangerous protocols (ext) have a + default policy of `never`, and all other protocols have a default policy + of `user`. Supported policies: ++ +-- + +* `always` - protocol is always able to be used. + +* `never` - protocol is never able to be used. + +* `user` - protocol is only able to be used when `GIT_PROTOCOL_FROM_USER` is + either unset or has a value of 1. This policy should be used when you want a + protocol to be usable by the user but don't want it used by commands which + execute clone/fetch/pull commands without user input, e.g. recursive + submodule initialization. + +-- + +protocol..allow:: + Set a policy to be used by protocol with clone/fetch/pull commands. + pull.ff:: By default, Git does not create an extra merge commit when merging a commit that is a descendant of the current commit. Instead, the diff --git a/Documentation/git.txt b/Documentation/git.txt index ab7215e..ab25580 100644 --- a/Documentation/git.txt +++ b/Documentation/git.txt @@ -1150,13 +1150,13 @@ of clones and fetches. cloning a repository to make a backup). `GIT_ALLOW_PROTOCOL`:: - If set, provide a colon-separated list of protocols which are - allowed to be used with fetch/push/clone. This is useful to - restrict recursive submodule initialization from an untrusted - repository. Any protocol not mentioned will be disallowed (i.e., - this is a whitelist, not a blacklist). If the variable is not - set at all, all protocols are enabled. The protocol names - currently used by git are: + The new way to configure allowed protocols is done through the config + interface, though this setting takes precedences. See + linkgit:git-config[1] for more details. If set, provide a + colon-separated list of protocols which are allowed to be used with + fetch/push/clone. Any protocol not mentioned will be disallowed (i.e., + this is a whitelist, not a blacklist). The protocol names currently + used by git are: - `file`: any local file-based path (including `file://` URLs, or local paths) @@ -1174,6 +1174,11 @@ of clones and fetches. - any external helpers are named by their protocol (e.g., use `hg` to allow the `git-remote-hg` helper) +`GIT_PROTOCOL_FROM_USER`:: + Set to 0 to prevent protocols used by fetch/push/clone which are + configured to the `user` state. This is useful to restrict recursive + submodule initialization from an untrusted
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote: > I agree with your assessment here. The main difficulty in implementing > gitrefs is to ensure that they actually do get picked up by > reachability checks to prevent dropping commits. I'm not sure how easy > this is, but I would much rather we go this route rather than > continuing along with the hack. This seems like the ideal solution, > since it solves the entire problem and doesn't need more hacks bolted > on. I think the main complication is that the reachability rules are used during object transfer. So you'd probably want to introduce some protocol extension to say "I understand gitrefs", so that when one side says "I have sha1 X and its reachable objects", we know whether they are including gitrefs there. And likewise receivers with transfer.fsckObjects may complain about the new gitref tree mode (fortunately a new object type shouldn't be needed). You might also want fallback rules for storing gitrefs on "old" servers (e.g., backfilling gitrefs you need if the server didn't them in the initial fetch). But I guess storing any gitrefs on such a server is inherently dangerous, because the server might prune them at any time. So perhaps a related question is: how can gitrefs be designed such that existing servers reject them (rather than accepting the push and then later throwing away half the data). It would be easy to notice in the client during a push that we are sending gitrefs to a server which does not claim that capability. But it seems more robust if it is the server who decides "I will not accept these bogus objects". I haven't thought all that hard about this. That's just my initial thoughts on what sound hard. Tweaking the reachability code doesn't seem all that bad; we already know all of the spots that care about S_ISGITLINK(). It may even be that some of those spots work out of the box (because gitlinks are usually about telling the graph-walking code that we _don't_ care about reachability; we do by default for trees and blobs). I'd be surprised if all such sites work out of the box, though. Even if they see "ah, sha1 X is referenced by tree Y and isn't a gitlink, and therefore should be reachable", they need to also note that "X" is a commit and recursively walk its objects. -Peff
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamanowrote: > > Imagine we invent a new tree entry type, "gitref", that is similar > to "gitlink" in that it can record a commit object name in a tree, > but unlike "gitlink" it does imply reachability. And you do not add > phony parents to your commit object. A tree that has "gitref"s in > it is about annotating the commits in the same repository (e.g. the > tree references two commits, "base" and "tip", to point into a slice > of the main history). And it is perfectly sensible for such a > pointer to imply reachability---after all it serves different > purposes from "gitlink". The more I think about this (and also about how to limit ref advertisements as recently discussed in https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/), the more I think about Shawn's RefTree: https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ Couldn't a RefTree be used to store refs that point to the base commit, the tip commit and the blob that contains the cover letter, and maybe also a ref pointing to the RefTree of the previous version of the series?
[PATCH 5/5] check-ref-format: New --stdin option
Signed-off-by: Ian Jackson--- Documentation/git-check-ref-format.txt | 10 -- builtin/check-ref-format.c | 34 +++--- 2 files changed, 39 insertions(+), 5 deletions(-) diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt index e9a2657..5a213ce 100644 --- a/Documentation/git-check-ref-format.txt +++ b/Documentation/git-check-ref-format.txt @@ -10,8 +10,9 @@ SYNOPSIS [verse] 'git check-ref-format' [--report-errors] [--normalize] [--[no-]allow-onelevel] [--refspec-pattern] - -'git check-ref-format' [--report-errors] --branch +| --stdin +'git check-ref-format' [--report-errors] --branch +| --stdin DESCRIPTION --- @@ -109,6 +110,11 @@ OPTIONS If any ref does not check OK, print a message to stderr. (By default, git check-ref-format is silent.) +--stdin:: + Instead of checking on ref supplied on the command line, + read refs, one per line, from stdin. The exit status is + 0 if all the refs were OK. + EXAMPLES diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index 559d5c2..87f52fa 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -76,6 +76,7 @@ static int check_one_ref_format(const char *refname) int cmd_check_ref_format(int argc, const char **argv, const char *prefix) { int i; + int use_stdin = 0; if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_check_ref_format_usage); @@ -93,6 +94,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) check_branch = 1; else if (!strcmp(argv[i], "--report-errors")) report_errors = 1; + else if (!strcmp(argv[i], "--stdin")) + use_stdin = 1; else usage(builtin_check_ref_format_usage); } @@ -100,8 +103,33 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) if (check_branch && (flags || normalize)) usage(builtin_check_ref_format_usage); - if (! (i == argc - 1)) - usage(builtin_check_ref_format_usage); + if (!use_stdin) { + if (! (i == argc - 1)) + usage(builtin_check_ref_format_usage); + + return check_one_ref_format(argv[i]); + } else { + char buffer[2048]; + int worst = 0; - return check_one_ref_format(argv[i]); + if (! (i == argc)) + usage(builtin_check_ref_format_usage); + + while (fgets(buffer, sizeof(buffer), stdin)) { + char *newline = strchr(buffer, '\n'); + if (!newline) { + fprintf(stderr, "%s --stdin: missing final newline or line too long\n", *argv); + exit(127); + } + *newline = 0; + int got = check_one_ref_format(buffer); + if (got > worst) + worst = got; + } + if (!feof(stdin)) { + perror("reading from stdin"); + exit(127); + } + return worst; + } } -- 2.10.1
[PATCH 4/5] check-ref-format: New --report-errors option
Signed-off-by: Ian Jackson--- Documentation/git-check-ref-format.txt | 8 ++-- builtin/check-ref-format.c | 10 -- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/Documentation/git-check-ref-format.txt b/Documentation/git-check-ref-format.txt index 8611a99..e9a2657 100644 --- a/Documentation/git-check-ref-format.txt +++ b/Documentation/git-check-ref-format.txt @@ -8,10 +8,10 @@ git-check-ref-format - Ensures that a reference name is well formed SYNOPSIS [verse] -'git check-ref-format' [--normalize] +'git check-ref-format' [--report-errors] [--normalize] [--[no-]allow-onelevel] [--refspec-pattern] -'git check-ref-format' --branch +'git check-ref-format' [--report-errors] --branch DESCRIPTION --- @@ -105,6 +105,10 @@ OPTIONS with a status of 0. (`--print` is a deprecated way to spell `--normalize`.) +--report-errors:: + If any ref does not check OK, print a message to stderr. +(By default, git check-ref-format is silent.) + EXAMPLES diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index 020ebe8..559d5c2 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -9,7 +9,7 @@ static const char builtin_check_ref_format_usage[] = "git check-ref-format [--normalize] [] \n" -" or: git check-ref-format --branch "; +" or: git check-ref-format [] --branch "; /* * Return a copy of refname but with leading slashes removed and runs @@ -51,6 +51,7 @@ static int check_ref_format_branch(const char *arg) static int normalize = 0; static int check_branch = 0; static int flags = 0; +static int report_errors = 0; static int check_one_ref_format(const char *refname) { @@ -61,8 +62,11 @@ static int check_one_ref_format(const char *refname) got = check_branch ? check_ref_format_branch(refname) : check_refname_format(refname, flags); - if (got) + if (got) { + if (report_errors) + fprintf(stderr, "bad ref format: %s\n", refname); return 1; + } if (normalize) { printf("%s\n", refname); free((void*)refname); @@ -87,6 +91,8 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) flags |= REFNAME_REFSPEC_PATTERN; else if (!strcmp(argv[i], "--branch")) check_branch = 1; + else if (!strcmp(argv[i], "--report-errors")) + report_errors = 1; else usage(builtin_check_ref_format_usage); } -- 2.10.1
[PATCH 0/5] git check-ref-format --stdin --report-errors
I wanted to be able to syntax check lots of proposed refs quickly (please don't ask why - it's complicated!) So I added a --stdin option to git-check-ref-format. Also it has --report-errors now too so you can get some kind of useful error message if it complains. It's still not really a good batch mode but it's good enough for my use case. To improve it would involve a new command line option to offer a suitable stdout output format. There are three small refactoring patches and the two patches with new options and corresponding docs. Thanks for your attention. FYI I am not likely to need this again in the near future: it's a one-off use case. So my effort for rework is probably limited. I thought I'd share what I'd done in what I hope is a useful form, anyway. Regards, Ian.
[PATCH 2/5] check-ref-format: Refactor to make --branch code more common
We are going to want to permit other options with --branch. So, replace the special case with just an entry for --branch in the parser for ordinary options, and check for option compatibility at the end. No overall functional change. Signed-off-by: Ian Jackson--- builtin/check-ref-format.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index 4d56caa..f12c19c 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -49,13 +49,19 @@ static int check_ref_format_branch(const char *arg) } static int normalize = 0; +static int check_branch = 0; static int flags = 0; static int check_one_ref_format(const char *refname) { + int got; + if (normalize) refname = collapse_slashes(refname); - if (check_refname_format(refname, flags)) + got = check_branch + ? check_ref_format_branch(refname) + : check_refname_format(refname, flags); + if (got) return 1; if (normalize) printf("%s\n", refname); @@ -68,9 +74,6 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_check_ref_format_usage); - if (argc == 3 && !strcmp(argv[1], "--branch")) - return check_ref_format_branch(argv[2]); - for (i = 1; i < argc && argv[i][0] == '-'; i++) { if (!strcmp(argv[i], "--normalize") || !strcmp(argv[i], "--print")) normalize = 1; @@ -80,9 +83,15 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) flags &= ~REFNAME_ALLOW_ONELEVEL; else if (!strcmp(argv[i], "--refspec-pattern")) flags |= REFNAME_REFSPEC_PATTERN; + else if (!strcmp(argv[i], "--branch")) + check_branch = 1; else usage(builtin_check_ref_format_usage); } + + if (check_branch && (flags || normalize)) + usage(builtin_check_ref_format_usage); + if (! (i == argc - 1)) usage(builtin_check_ref_format_usage); -- 2.10.1
[PATCH 1/5] check-ref-format: Refactor out check_one_ref_format
We are going to want to reuse this. No functional change right now. It currently has a hidden memory leak if --normalize is used. Signed-off-by: Ian Jackson--- builtin/check-ref-format.c | 26 ++ 1 file changed, 14 insertions(+), 12 deletions(-) diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index eac4994..4d56caa 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -48,12 +48,22 @@ static int check_ref_format_branch(const char *arg) return 0; } +static int normalize = 0; +static int flags = 0; + +static int check_one_ref_format(const char *refname) +{ + if (normalize) + refname = collapse_slashes(refname); + if (check_refname_format(refname, flags)) + return 1; + if (normalize) + printf("%s\n", refname); +} + int cmd_check_ref_format(int argc, const char **argv, const char *prefix) { int i; - int normalize = 0; - int flags = 0; - const char *refname; if (argc == 2 && !strcmp(argv[1], "-h")) usage(builtin_check_ref_format_usage); @@ -76,13 +86,5 @@ int cmd_check_ref_format(int argc, const char **argv, const char *prefix) if (! (i == argc - 1)) usage(builtin_check_ref_format_usage); - refname = argv[i]; - if (normalize) - refname = collapse_slashes(refname); - if (check_refname_format(refname, flags)) - return 1; - if (normalize) - printf("%s\n", refname); - - return 0; + return check_one_ref_format(argv[i]); } -- 2.10.1
[PATCH 3/5] check-ref-format: Abolish leak of collapsed refname
collapse_slashes always returns a value from xmallocz. Right now this leak is not very interesting, since we only call check_one_ref_format once. Signed-off-by: Ian Jackson--- builtin/check-ref-format.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c index f12c19c..020ebe8 100644 --- a/builtin/check-ref-format.c +++ b/builtin/check-ref-format.c @@ -63,8 +63,10 @@ static int check_one_ref_format(const char *refname) : check_refname_format(refname, flags); if (got) return 1; - if (normalize) + if (normalize) { printf("%s\n", refname); + free((void*)refname); + } } int cmd_check_ref_format(int argc, const char **argv, const char *prefix) -- 2.10.1
Re: Regarding "git log" on "git series" metadata
On Fri, Nov 4, 2016 at 10:57 AM, Junio C Hamanowrote: > I think this is backwards. The root cause of the issue you have > with "gitk" is because you added something that is *NOT* a parent to > your commit. We shouldn't have to add a mechanism to filter > something that shouldn't have been added there in the first place. > > I am wondering if an alternative approach would work better. > > Imagine we invent a new tree entry type, "gitref", that is similar > to "gitlink" in that it can record a commit object name in a tree, > but unlike "gitlink" it does imply reachability. And you do not add > phony parents to your commit object. A tree that has "gitref"s in > it is about annotating the commits in the same repository (e.g. the > tree references two commits, "base" and "tip", to point into a slice > of the main history). And it is perfectly sensible for such a > pointer to imply reachability---after all it serves different > purposes from "gitlink". > I agree with your assessment here. The main difficulty in implementing gitrefs is to ensure that they actually do get picked up by reachability checks to prevent dropping commits. I'm not sure how easy this is, but I would much rather we go this route rather than continuing along with the hack. This seems like the ideal solution, since it solves the entire problem and doesn't need more hacks bolted on. It would of course mean some work for people who previously used git series as you would want to re-write the commits to drop the parent links and become gitrefs instead of gitlinks. However, this can (probably?) be solved by some sort of use of the filter-branch code. I don't think you've hit upon any trivially obvious unworkable things. It is probably somewhat complex to make the reachability checks detect in-tree gitrefs but I don't think it would be impossible. Thanks, Jake
Regarding "git log" on "git series" metadata
After your talk at LPC2016, I was thinking about your proposal to give an option to hide certain parents from "git log" traversal. While I do not think we would terribly mind a new feature in the core to support third-party additions like "git series" better, I think this particular one is a big mistake that we shouldn't take. For those listening from sidelines, here is a version of my understanding of "git series": * "git series" wants to represent a patch series evolution. It is a history of history, and each element of this evolution is represented by: - a commit object, that is used to describe what this reroll of the topic is about, and its parent links point at previous rerolls (it could be a merge of two independent incarnations of a series). - the tree contained in the commit object records the base commit where the topic forks from the main history, and the tip commit where the topic ends. These are pointers into the main history DAG. - the tree may have other metadata, an example of which is the cover letter contents to be used when the topic becomes ready for re-submission. There may be more metadata you would want to add in the future versions of "git series". Needless to say, the commits that represent the history of a series record a tree that is completely differently shaped. The only relation between the series history and main history is that the former has pointers into the latter. * You chose to represent the base and tip commit object as gitlinks in the tree of a series commit, simply because it was a way that was already implemented to record a commit object name in a tree. * However, because gitlink is designed to be used for "external" things (the prominent example is submodule), recording these as gitlinks would guarantee that they will get GCed as a series progresses, the main history rewound and rewritten thereby making the base and tip recorded in the older part of the series history unreachable from the main history. Because you want to make sure that base and tip objects will stay in the repository even after the topic branch in the main history gets rewound, this is not what you want. * In order to workaround that reachability issue, the hack you invented is to add the tip commit as a "parent" of a commit that represents one step in the series. This may guarantee the reachability---as long as a commit in a series history is reachable from a ref, the tip and base commits will be reachable from there even if they are rebased away from the main history. But of course, there are downsides. * Due to this hack, feeding "gitk" (or "git log") a commit in the series history will give you nonsense results. You are not interested in traversing or viewing the commits in the main history. * Because of the above, you propose another hack to tell the revision traversal machinery to optionally omit a parent commit that appear as a gitlink in the tree. I think this is backwards. The root cause of the issue you have with "gitk" is because you added something that is *NOT* a parent to your commit. We shouldn't have to add a mechanism to filter something that shouldn't have been added there in the first place. I am wondering if an alternative approach would work better. Imagine we invent a new tree entry type, "gitref", that is similar to "gitlink" in that it can record a commit object name in a tree, but unlike "gitlink" it does imply reachability. And you do not add phony parents to your commit object. A tree that has "gitref"s in it is about annotating the commits in the same repository (e.g. the tree references two commits, "base" and "tip", to point into a slice of the main history). And it is perfectly sensible for such a pointer to imply reachability---after all it serves different purposes from "gitlink". Another alternative that I am negative about (but is probably a better hack than how you abused the "parent" link) might be to add a new commit object header field that behaves similarly to "parent" only in that it implies reachability. But recording the extra parent in commit object was not something you wanted to do in the first place (i.e. your series processing is done solely on the contents of the tree, and you do not read this extra parent). If you need to add an in-tree reference to another commit in your future versions of "git series", with either this variant or your original implementation, you would end up needing adding more "parent" (or pseudo parent) only to preserve reachability. At that point, I think it makes more sense to have entries in the tree to directly ensure reachability, if you want these entries to always point at an in-tree object. I am afraid that I probably am two steps ahead of myself, because I am reasonably sure that it is quite possible that I have overlooked something trivially
Re: [PATCH] diffcore-delta: remove unused parameter to diffcore_count_changes()
On 2016-11-04 at 17:37:14 +0100, Jeff Kingwrote: > On Fri, Nov 04, 2016 at 11:24:36AM +0100, Tobias Klauser wrote: > > > The delta_limit parameter to diffcore_count_changes() has been unused > > since commit c06c79667c95 ("diffcore-rename: somewhat optimized."). > > Remove the parameter and adjust all callers. > > Sounds like a good idea to get rid of an unused parameter, but I think > this went away in ba23bbc8e (diffcore-delta: make change counter to byte > oriented again., 2006-03-04). Ugh, I must have fat-fingered the commit id. Will update the description accordingly for v2. Thanks! Tobias
Re: [PATCH] branch: remove unused parameter to create_branch()
On 2016-11-04 at 17:30:12 +0100, Jeff Kingwrote: > On Fri, Nov 04, 2016 at 04:19:49PM +0100, Tobias Klauser wrote: > > > The name parameter to create_branch() has been unused since commit > > 55c4a673070f ("Prevent force-updating of the current branch"). Remove > > the parameter and adjust the callers accordingly. Also remove the > > parameter from the function's documentation comment. > > This seemed eerily familiar, and it turns out I wrote this as a > preparatory step for a different topic a while back, but never finished > it. > > So clearly a good change, though we might want to explain a bit more why > it's correct that the parameter is unused. Here's what I wrote: > > This function used to have the caller pass in the current value of > HEAD, in order to make sure we didn't clobber HEAD. In 55c4a6730, > that logic moved to validate_new_branchname(), which just resolves > HEAD itself. The parameter to create_branch is now unused. Ah, I didn't know about the history of this parameter. It clearly makes sense to explain this in the patch description. > I also ended up reformatting the documentation comment, but that's > purely optional. My patch is below for reference. Feel free to grab any > bits of it that you agree with. I like your documentation comment much better as IMO it's easier to read and to identify the individual parameters. Given these facts, I guess it's better if my patch is dropped and yours is applied instead :) Thanks a lot!
Re: [PATCH] diffcore-delta: remove unused parameter to diffcore_count_changes()
On Fri, Nov 04, 2016 at 11:24:36AM +0100, Tobias Klauser wrote: > The delta_limit parameter to diffcore_count_changes() has been unused > since commit c06c79667c95 ("diffcore-rename: somewhat optimized."). > Remove the parameter and adjust all callers. Sounds like a good idea to get rid of an unused parameter, but I think this went away in ba23bbc8e (diffcore-delta: make change counter to byte oriented again., 2006-03-04). The patch itself looks good. -Peff
Re: git -C has unexpected behaviour
Hi Felix, On Fri, 4 Nov 2016, Felix Nairz wrote: > Now, to the unexpected part, which I think is a bug: > > If the TestData folder is there, but empty (I deleted all the files), > then running > > git -C .\TestData reset --hard > > will NOT throw me an error but run > > git reset --hard > > on the git repository (not the submodule in the sub-directory!), > without warning, or error. I *think* that this is actually intended. Please note that -C is *not* a synonym of --git-dir. It is more of a synonym of cd .\TestData git reset --hard In other words, you probably expected -C to specify the top-level directory of a worktree, and you expected Git to error out when it is not. Instead, Git will treat -C as a directory *from where to start*; It *can* be a subdirectory of the top-level directory of the worktree. Please note that git --git-dir=.\TestData\.git reset --hard will not work, though, as it mistakes the current directory for the top-level directory of the worktree. What you want to do is probably git -C .\TestData --git-dir=.git reset --hard This will tell Git to change the current working directory to the top-level of your intended worktree, *and* state that the repository needs to be in .git (which can be a file containing "gitdir: ", which is the default in submodules). If the repository is *not* found, this command will exit with a failure. Ciao, Johannes
Re: git -C has unexpected behaviour
On Fri, Nov 4, 2016 at 7:28 AM, Felix Nairzwrote: > Hi guys, > > I ran into some really weird git behaviour today. > > My git --version is: git version 2.8.1.windows.1 > > We have a git repository with a submodule called TestData. The data in > there is modified and reset as part of our unit tests. > > The submodule is a sub-folder of the git repository called TestData. > So the relative path from the git repository to the submodule is > .\TestData > > If I delete the entire TestData folder and run > git -C .\TestData reset --hard > > I will get the following error: > git : fatal: Cannot change to '.\TestData': No such file or directory > This is as expected. > > > Now, to the unexpected part, which I think is a bug: > > If the TestData folder is there, but empty (I deleted all the files), And "all the files" includes the ".git" file, which git uses to find out if it is in a git repository. So it keeps walking up until it finds a .git file/directory, which is the parent project. Once a git directory is found, the main function of git initializes some data structures, e.g. the "path prefix" inside the repository which would be " .\TestData" in your case. then the actual command is found and run. So what it is doing is: "Suppose you are in the TestData directory of the parent project and then run the command ..." My gut reaction was to propose to check if any GITLINK (submodule) is a prefix of said "path prefix" and then rather initialize and operate on the submodule. However I do not think this is a good idea: * Git wants to be fast and checking if we are in any submodule slows down the common case. * Historically commands in un-initialized or deinitialized submodules behave as if in the parent project. I think if we'd fix this issue, other people would complain as their workflow is harmed. > > Because of this we have had losses of uncommitted changes a few times > now (loosing a few days of work, and getting a bit paranoid), * commit early, commit often such that the losses are less than a few days. * do not remove the submodule directory as a whole thing (make sure the .git file is there and not wiped) * instead use "git -C TestData clean -dffx && git -C TestData reset --hard" https://git-scm.com/docs/git-clean > could never find the root cause for this until today, where I found > out that it happens when the TestData directory is empty. I am undecided if it is a bug or a feature. Fixing this as a bug would be a performance penalty. Not fixing it may incur data losses. I dunno. > > Thank for looking into this, and I am looking forward to hear your > opinions about this. > > Best Regards, Felix Nairz
Re: [PATCH] branch: remove unused parameter to create_branch()
On Fri, Nov 04, 2016 at 04:19:49PM +0100, Tobias Klauser wrote: > The name parameter to create_branch() has been unused since commit > 55c4a673070f ("Prevent force-updating of the current branch"). Remove > the parameter and adjust the callers accordingly. Also remove the > parameter from the function's documentation comment. This seemed eerily familiar, and it turns out I wrote this as a preparatory step for a different topic a while back, but never finished it. So clearly a good change, though we might want to explain a bit more why it's correct that the parameter is unused. Here's what I wrote: This function used to have the caller pass in the current value of HEAD, in order to make sure we didn't clobber HEAD. In 55c4a6730, that logic moved to validate_new_branchname(), which just resolves HEAD itself. The parameter to create_branch is now unused. I also ended up reformatting the documentation comment, but that's purely optional. My patch is below for reference. Feel free to grab any bits of it that you agree with. -- >8 -- Subject: [PATCH] create_branch: drop unused "head" parameter This function used to have the caller pass in the current value of HEAD, in order to make sure we didn't clobber HEAD. In 55c4a6730, that logic moved to validate_new_branchname(), which just resolves HEAD itself. The parameter to create_branch is now unused. Since we have to update and re-wrap the docstring describing the parameters anyway, let's take this opportunity to break it out into a list, which makes it easier to find the parameters. Signed-off-by: Jeff King--- branch.c | 3 +-- branch.h | 22 ++ builtin/branch.c | 4 ++-- builtin/checkout.c | 2 +- 4 files changed, 18 insertions(+), 13 deletions(-) diff --git a/branch.c b/branch.c index a5a8dcbd0..0d459b3cf 100644 --- a/branch.c +++ b/branch.c @@ -228,8 +228,7 @@ N_("\n" "will track its remote counterpart, you may want to use\n" "\"git push -u\" to set the upstream config as you push."); -void create_branch(const char *head, - const char *name, const char *start_name, +void create_branch(const char *name, const char *start_name, int force, int reflog, int clobber_head, int quiet, enum branch_track track) { diff --git a/branch.h b/branch.h index b2f964933..3103eb9ad 100644 --- a/branch.h +++ b/branch.h @@ -4,15 +4,21 @@ /* Functions for acting on the information about branches. */ /* - * Creates a new branch, where head is the branch currently checked - * out, name is the new branch name, start_name is the name of the - * existing branch that the new branch should start from, force - * enables overwriting an existing (non-head) branch, reflog creates a - * reflog for the branch, and track causes the new branch to be - * configured to merge the remote branch that start_name is a tracking - * branch for (if any). + * Creates a new branch, where: + * + * - name is the new branch name + * + * - start_name is the name of the existing branch that the new branch should + * start from + * + * - force enables overwriting an existing (non-head) branch + * + * - reflog creates a reflog for the branch + * + * - track causes the new branch to be configured to merge the remote branch + * that start_name is a tracking branch for (if any). */ -void create_branch(const char *head, const char *name, const char *start_name, +void create_branch(const char *name, const char *start_name, int force, int reflog, int clobber_head, int quiet, enum branch_track track); diff --git a/builtin/branch.c b/builtin/branch.c index d5d93a8c0..60cc5c8e8 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -807,7 +807,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix) * create_branch takes care of setting up the tracking * info and making sure new_upstream is correct */ - create_branch(head, branch->name, new_upstream, 0, 0, 0, quiet, BRANCH_TRACK_OVERRIDE); + create_branch(branch->name, new_upstream, 0, 0, 0, quiet, BRANCH_TRACK_OVERRIDE); } else if (unset_upstream) { struct branch *branch = branch_get(argv[0]); struct strbuf buf = STRBUF_INIT; @@ -853,7 +853,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix) strbuf_release(); branch_existed = ref_exists(branch->refname); - create_branch(head, argv[0], (argc == 2) ? argv[1] : head, + create_branch(argv[0], (argc == 2) ? argv[1] : head, force, reflog, 0, quiet, track); /* diff --git a/builtin/checkout.c b/builtin/checkout.c index 9b2a5b31d..512492aad 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -630,7 +630,7 @@ static void update_refs_for_switch(const struct
[PATCH] branch: remove unused parameter to create_branch()
The name parameter to create_branch() has been unused since commit 55c4a673070f ("Prevent force-updating of the current branch"). Remove the parameter and adjust the callers accordingly. Also remove the parameter from the function's documentation comment. Signed-off-by: Tobias Klauser--- branch.c | 3 +-- branch.h | 15 +++ builtin/branch.c | 4 ++-- builtin/checkout.c | 2 +- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/branch.c b/branch.c index a5a8dcbd0ed9..0d459b3cfe50 100644 --- a/branch.c +++ b/branch.c @@ -228,8 +228,7 @@ N_("\n" "will track its remote counterpart, you may want to use\n" "\"git push -u\" to set the upstream config as you push."); -void create_branch(const char *head, - const char *name, const char *start_name, +void create_branch(const char *name, const char *start_name, int force, int reflog, int clobber_head, int quiet, enum branch_track track) { diff --git a/branch.h b/branch.h index b2f964933270..8e63d1b6f964 100644 --- a/branch.h +++ b/branch.h @@ -4,15 +4,14 @@ /* Functions for acting on the information about branches. */ /* - * Creates a new branch, where head is the branch currently checked - * out, name is the new branch name, start_name is the name of the - * existing branch that the new branch should start from, force - * enables overwriting an existing (non-head) branch, reflog creates a - * reflog for the branch, and track causes the new branch to be - * configured to merge the remote branch that start_name is a tracking - * branch for (if any). + * Creates a new branch, where name is the new branch name, start_name + * is the name of the existing branch that the new branch should start + * from, force enables overwriting an existing (non-head) branch, reflog + * creates a reflog for the branch, and track causes the new branch to + * be configured to merge the remote branch that start_name is a + * tracking branch for (if any). */ -void create_branch(const char *head, const char *name, const char *start_name, +void create_branch(const char *name, const char *start_name, int force, int reflog, int clobber_head, int quiet, enum branch_track track); diff --git a/builtin/branch.c b/builtin/branch.c index d5d93a8c03fe..60cc5c8e8da0 100644 --- a/builtin/branch.c +++ b/builtin/branch.c @@ -807,7 +807,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix) * create_branch takes care of setting up the tracking * info and making sure new_upstream is correct */ - create_branch(head, branch->name, new_upstream, 0, 0, 0, quiet, BRANCH_TRACK_OVERRIDE); + create_branch(branch->name, new_upstream, 0, 0, 0, quiet, BRANCH_TRACK_OVERRIDE); } else if (unset_upstream) { struct branch *branch = branch_get(argv[0]); struct strbuf buf = STRBUF_INIT; @@ -853,7 +853,7 @@ int cmd_branch(int argc, const char **argv, const char *prefix) strbuf_release(); branch_existed = ref_exists(branch->refname); - create_branch(head, argv[0], (argc == 2) ? argv[1] : head, + create_branch(argv[0], (argc == 2) ? argv[1] : head, force, reflog, 0, quiet, track); /* diff --git a/builtin/checkout.c b/builtin/checkout.c index 9b2a5b31d423..512492aad909 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -630,7 +630,7 @@ static void update_refs_for_switch(const struct checkout_opts *opts, } } else - create_branch(old->name, opts->new_branch, new->name, + create_branch(opts->new_branch, new->name, opts->new_branch_force ? 1 : 0, opts->new_branch_log, opts->new_branch_force ? 1 : 0, -- 2.9.0
git -C has unexpected behaviour
Hi guys, I ran into some really weird git behaviour today. My git --version is: git version 2.8.1.windows.1 We have a git repository with a submodule called TestData. The data in there is modified and reset as part of our unit tests. The submodule is a sub-folder of the git repository called TestData. So the relative path from the git repository to the submodule is .\TestData If I delete the entire TestData folder and run git -C .\TestData reset --hard I will get the following error: git : fatal: Cannot change to '.\TestData': No such file or directory This is as expected. Now, to the unexpected part, which I think is a bug: If the TestData folder is there, but empty (I deleted all the files), then running git -C .\TestData reset --hard will NOT throw me an error but run git reset --hard on the git repository (not the submodule in the sub-directory!), without warning, or error. This is easy to reproduce by having an empty .\TestData folder, and just changing any file in your git repository before running git -C .\TestData reset --hard and seeing the local file changes gone. Because of this we have had losses of uncommitted changes a few times now (loosing a few days of work, and getting a bit paranoid), but could never find the root cause for this until today, where I found out that it happens when the TestData directory is empty. Thank for looking into this, and I am looking forward to hear your opinions about this. Best Regards, Felix Nairz
[no subject]
Bug.private.program.
[PATCH] diffcore-delta: remove unused parameter to diffcore_count_changes()
The delta_limit parameter to diffcore_count_changes() has been unused since commit c06c79667c95 ("diffcore-rename: somewhat optimized."). Remove the parameter and adjust all callers. Signed-off-by: Tobias Klauser--- diff.c| 2 +- diffcore-break.c | 1 - diffcore-delta.c | 1 - diffcore-rename.c | 4 diffcore.h| 1 - 5 files changed, 1 insertion(+), 8 deletions(-) diff --git a/diff.c b/diff.c index 8981477c436d..ec8728362dae 100644 --- a/diff.c +++ b/diff.c @@ -2023,7 +2023,7 @@ static void show_dirstat(struct diff_options *options) if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) { diff_populate_filespec(p->one, 0); diff_populate_filespec(p->two, 0); - diffcore_count_changes(p->one, p->two, NULL, NULL, 0, + diffcore_count_changes(p->one, p->two, NULL, NULL, , ); diff_free_filespec_data(p->one); diff_free_filespec_data(p->two); diff --git a/diffcore-break.c b/diffcore-break.c index 881a74f29e4f..c64359f489c8 100644 --- a/diffcore-break.c +++ b/diffcore-break.c @@ -73,7 +73,6 @@ static int should_break(struct diff_filespec *src, if (diffcore_count_changes(src, dst, >cnt_data, >cnt_data, - 0, _copied, _added)) return 0; diff --git a/diffcore-delta.c b/diffcore-delta.c index 2ebedb32d18a..ebe70fb06851 100644 --- a/diffcore-delta.c +++ b/diffcore-delta.c @@ -166,7 +166,6 @@ int diffcore_count_changes(struct diff_filespec *src, struct diff_filespec *dst, void **src_count_p, void **dst_count_p, - unsigned long delta_limit, unsigned long *src_copied, unsigned long *literal_added) { diff --git a/diffcore-rename.c b/diffcore-rename.c index 54a2396653df..f7444c86bde3 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -145,7 +145,6 @@ static int estimate_similarity(struct diff_filespec *src, * call into this function in that case. */ unsigned long max_size, delta_size, base_size, src_copied, literal_added; - unsigned long delta_limit; int score; /* We deal only with regular files. Symlink renames are handled @@ -191,11 +190,8 @@ static int estimate_similarity(struct diff_filespec *src, if (!dst->cnt_data && diff_populate_filespec(dst, 0)) return 0; - delta_limit = (unsigned long) - (base_size * (MAX_SCORE-minimum_score) / MAX_SCORE); if (diffcore_count_changes(src, dst, >cnt_data, >cnt_data, - delta_limit, _copied, _added)) return 0; diff --git a/diffcore.h b/diffcore.h index c11b8465fc8e..623024135478 100644 --- a/diffcore.h +++ b/diffcore.h @@ -142,7 +142,6 @@ extern int diffcore_count_changes(struct diff_filespec *src, struct diff_filespec *dst, void **src_count_p, void **dst_count_p, - unsigned long delta_limit, unsigned long *src_copied, unsigned long *literal_added); -- 2.9.0