[PATCH] Make git log work for git CWD outside of work tree
Make git log's `--use-mailmap` argument works if the GIT_DIR & GIT_WORK_TREE env vars are set and git is run from outside of work tree. Without the NEED_WORK_TREE set on the log subcommand, .mailmap is silently not found. Signed-off-by: Danny Sauer--- Notes: I'm not entirely sure if this is the best way to fix it, as my git internals knowledge is pretty weak. But this /seems/ reasonable to me, and passes all of the current test cases. If there's a more appropriate way to make `--use-mailmap` work properly when `git log` is run outside of the tree, I'd be excited about the opportunity to learn. :) git.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/git.c b/git.c index 8ff44f0..e147f01 100644 --- a/git.c +++ b/git.c @@ -440,7 +440,7 @@ static struct cmd_struct commands[] = { { "init", cmd_init_db }, { "init-db", cmd_init_db }, { "interpret-trailers", cmd_interpret_trailers, RUN_SETUP_GENTLY }, - { "log", cmd_log, RUN_SETUP }, + { "log", cmd_log, RUN_SETUP | NEED_WORK_TREE }, { "ls-files", cmd_ls_files, RUN_SETUP | SUPPORT_SUPER_PREFIX }, { "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY }, { "ls-tree", cmd_ls_tree, RUN_SETUP }, -- 2.7.4
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 8, 2017 at 3:13 PM, Jeff Kingwrote: > On Sat, Apr 08, 2017 at 02:54:29PM -0700, Jacob Keller wrote: > >> > So the best way to use it is something like: >> > >> > git fetch ;# update 'master' from remote >> > git tag base master;# mark our base point >> > git rebase -i master ;# rewrite some commits >> > git push --force-with-lease=master:base master:master >> [...] >> >> What if we added a separate command something like: >> >> git create-lease >> >> which you're expected to run at the start of a rewind-y operation and >> it creates a tag (or some other ref like a tag but in a different >> namespace) which is used by force-with-lease? > > So then you replace that "git tag" command above with "git > create-lease". I think it's an incremental improvement because it would > probably record which branch you're leasing. But I think the more > important issue is that the user needs to remember to take the lease in > the first place. A create-lease command doesn't help that. > Well, if we don't mind backwards compatibility breaking, we could require that uesrs run create-lease, and refuse to accept anything that isn't a lease ref? That would force the user to have created a lease which should help? But that IS backwards incompatible... >> It might be possible to generate these lease tags prior to operations >> which modify history and then maybe having a way to list them so you >> can select which one you meant when you try to use force-with-lease.. > > So yeah, I think that is the more interesting direction. I hadn't > considered resolving the multiple-operation ambiguity at push time. But > I guess it would be something like "you did a rebase on sha1 X at time > T, and then one on Y at time T+N", and you pick which one you're > expecting. Or maybe it could even cull old leases that are still > ancestors of your push (so old, already-pushed rebases aren't > mentioned). I suspect that ends up being equivalent to "clear the leases > when you push". And I think that may be converging on the "integrate" > refs that Stefan is talking about elsewhere (or some isomorphism of it). > > -Peff Yea, I agree this sort of direction is nicer. Thanks, Jake
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 08, 2017 at 02:54:29PM -0700, Jacob Keller wrote: > > So the best way to use it is something like: > > > > git fetch ;# update 'master' from remote > > git tag base master;# mark our base point > > git rebase -i master ;# rewrite some commits > > git push --force-with-lease=master:base master:master > [...] > > What if we added a separate command something like: > > git create-lease > > which you're expected to run at the start of a rewind-y operation and > it creates a tag (or some other ref like a tag but in a different > namespace) which is used by force-with-lease? So then you replace that "git tag" command above with "git create-lease". I think it's an incremental improvement because it would probably record which branch you're leasing. But I think the more important issue is that the user needs to remember to take the lease in the first place. A create-lease command doesn't help that. > It might be possible to generate these lease tags prior to operations > which modify history and then maybe having a way to list them so you > can select which one you meant when you try to use force-with-lease.. So yeah, I think that is the more interesting direction. I hadn't considered resolving the multiple-operation ambiguity at push time. But I guess it would be something like "you did a rebase on sha1 X at time T, and then one on Y at time T+N", and you pick which one you're expecting. Or maybe it could even cull old leases that are still ancestors of your push (so old, already-pushed rebases aren't mentioned). I suspect that ends up being equivalent to "clear the leases when you push". And I think that may be converging on the "integrate" refs that Stefan is talking about elsewhere (or some isomorphism of it). -Peff
Re: broken text encoding in commit author name
Hi Michał, On 08/04/2017 22:30, Michał Walenciak wrote: > Hi > > since git 2.12 I'm having random problems with broken text enciding in author > name. > > As an example broken commit: > Author: MichaÅ Walenciak2017-04-07 21:38:23 > Committer: Michał Walenciak 2017-04-07 21:38:33 > > good commit: > Author: Michał Walenciak 2017-04-08 21:21:15 > Committer: Michał Walenciak 2017-04-08 21:21:15 > > Broken commits are quite rare and I cannot find any rule when they happen. > Git 2.12.2 here, Arch linux. For what it`s worth, looking at the timestamp of the broken commit example you`ve shown, could the problem be related to amending commits? If so, are you amending through the command line, or maybe using some tool (like "git-gui")...? There was a similar discussion recently[1], not sure if it`s related to your case. [1] https://public-inbox.org/git/CAEPqvoz8s=CVLABuXx-zOzryrXwr_cb39G2TYJvvF xryzpc...@mail.gmail.com/ Regards, Buga
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 08, 2017 at 05:03:06PM +0200, Stefan Haller wrote: > Jeff Kingwrote: > > > I think Matt's point is just that the default, to use origin/branch, is > > unsafe. It's convenient when you don't have extra fetches, but that > > convenience may not be worth the potential surprise. > > I don't think "surprise" is the right word here. The point of > --force-with-lease is to provide a guarantee, so if you can't trust the > guarantee, it makes the feature rather pointless. You can trust it under a certain set of conditions. If you break those conditions, it doesn't work. So that's why I said "surprise": most users aren't going to be aware of those conditions. But I also used the word "unsafe". That, or "dangerous", is certainly accurate. And I don't at all disagree that the situation should be improved. -Peff
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 8, 2017 at 2:29 AM, Jeff Kingwrote: > On Sat, Apr 08, 2017 at 09:35:04AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Is it correct that you'd essentially want something that works like: >> >> git push --force-with-lease=master:master origin master:master > > I don't think that would do anything useful. It would reject any push > where the remote "master" is not the same as your own master. And of > course if they _are_ the same, then the push is a noop. > >> I haven't used this feature but I'm surprised it works the way it >> does, as you point out just having your remote refs updated isn't a >> strong signal for wanting to clobber whatever that ref points to. > > The point of the --force-with-lease feature is that you would mark a > point in time where you started some rewind-y operation (like a rebase), > and at the end you would want to make sure nobody had moved the remote > ref when you push over it (which needs to be a force because of the > rewind). > > So the best way to use it is something like: > > git fetch ;# update 'master' from remote > git tag base master;# mark our base point > git rebase -i master ;# rewrite some commits > git push --force-with-lease=master:base master:master > > That final operation will fail if somebody else pushed in the meantime. > But obviously this workflow is a pain, because you have to manually mark > the start of the unsafe operation with a tag. > > If you haven't fetched in the meantime, then origin/master is a good > approximation of "base". But if you have fetched, then it is worthless. > > It would be nice if we could automatically deduce the real value of > base. I don't think we could do it in a foolproof way, but I wonder if > we could come close in some common circumstances. For instance, imagine > that unsafe operations like rebase would note that "master" has an > upstream of "origin/master", and would record a note saying "we took a > lease for origin/master at sha1 X". > > One trouble with that is that you may perform several unsafe operations. > For example, imagine it takes you multiple rebases to achieve your final > state: > > git fetch > git rebase -i master > git rebase -i master > git push --force-with-lease=master > > and that --force-with-lease now defaults to whatever lease-marker is > left by rebase. Which marker should it respect? If the second one, then > it's unsafe. But if the first one, then how do we deal with stale > markers? > > Perhaps it would be enough to reset the markers whenever the ref is > pushed. I haven't thought it through well enough to know whether that > just hits more corner cases. > > -Peff What if we added a separate command something like: git create-lease which you're expected to run at the start of a rewind-y operation and it creates a tag (or some other ref like a tag but in a different namespace) which is used by force-with-lease? However, I think using origin/master works fine as long as you don't auto-fetch. If you're doing it right, you can handle origin/master updates by checking that your rewind-y stuff is correct for the new origin/master RIGHT before you push. The tricky part here is that you have to remember to check after every fetch before you push. This is why I would always create my own tag or lease reference prior to the use. It might be possible to generate these lease tags prior to operations which modify history and then maybe having a way to list them so you can select which one you meant when you try to use force-with-lease.. Thanks, Jake
Re: [PATCH] submodule: prevent backslash expantion in submodule names
Am 08.04.2017 um 12:59 schrieb Jeff King: The reason I mentioned escaping earlier is I wondered what would happen when the submodule starts with a double-quote, or has a newline in the name. I have tested newlines within the name, these work fine. I also tested double and single quotes within the name, but not at beginning or end. So I think your patch is fine there. But it does raise a few concerns. It looks like git-submodule does not cope well with exotic filenames: $ git submodule add /some/repo "$(printf 'sub with\nnewline')" Cloning into '/home/peff/tmp/sub with newline'... done. error: invalid key (newline): submodule.sub with newline.url error: invalid key (newline): submodule.sub with newline.path Failed to register submodule 'sub with newline' Strange. I'm running essentially the same kind of request, and things work fine. Might be due to me using Python3 instead of bash, or maybe due to different versions of git. If anybody is interested, I can publish my test code on github, it was scheduled to land there anyway. I'm not too worried about that. It's a nonsense request, and our config format has no syntactic mechanism to represent that key. Oh. I've been thinking that the quoted format is exactly for that kind of stuff. Though it might be prone to eol conversion if a submodule name contains crlf sequences. Also, funny behavour. Experience has taught me that funny behaviour, if it isn't exploitable today, may combine with some new funny behaviour in a future version of the same software. So I'm worried even with that. This is starting to look like a can of worms to me... one way to "close the lid" would be if git * defined what's a valid submodule name, * rejected invalid submodule names, and * documented validity rules in the git-submodule docs. YMMV, just my 2 cents :-) Regards, Jo
broken text encoding in commit author name
Hi since git 2.12 I'm having random problems with broken text enciding in author name. As an example broken commit: Author: MichaÅ Walenciak2017-04-07 21:38:23 Committer: Michał Walenciak 2017-04-07 21:38:33 good commit: Author: Michał Walenciak 2017-04-08 21:21:15 Committer: Michał Walenciak 2017-04-08 21:21:15 Broken commits are quite rare and I cannot find any rule when they happen. Git 2.12.2 here, Arch linux. regards -- Michał Walenciak gmail.com kicer86 http://kicer.sileman.net.pl gg: 3729519
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Ævar Arnfjör? Bjarmasonwrote: > On Sat, Apr 8, 2017 at 5:03 PM, Stefan Haller wrote: > > > Here's a rough proposal for how I would imagine this to work. > > > > For every local branch that has a remote tracking branch, git maintains > > a new config entry branch.*.integrated, which records the sha1 of the > > last upstream commit that was integrated into the local branch. > > Can you elaborate on what "integrate" means in this context? > > In some ways the entire point of this feature is that you're trying to > push over history that you don't want to integrate. > > I.e. you're trying to force push your unrelated X over remote history > A, but not more recent arbitrary history B based on A which someone > may have just pushed. > > I'm having a hard time imagining how anything merge/rebase/whatever > would place in branch.*.integrated wouldn't just be a roundabout way > of recording the info we now record via the tracking branch, or in > cases where that's auto-updated for some reason having another > tracking branch as my "[PATCH] push: document & test > --force-with-lease with multiple remotes" suggests. It doesn't matter whether the history you are overwriting is arbitrary, or whether the new history you are pushing is related or unrelated to what you are overwriting. What matters is whether you are aware of what you are overwriting. I want to record all cases where the local branch is brought up to date with the tracking branch (or vice versa), i.e. mostly push and pull, because I know that after pushing or pulling, my local branch is up to date (in some way) with the tracking branch. If I then rewrite the local branch, I know it is safe to push it *if* the branch on the remote is still in the same state as what I recorded for last push or pull. If the tracking branch is updated by fetch though, then my local branch is not brought up to date with the remote branch, so I may be overwriting stuff that appeared on the remote without me being aware of it. It may well be that there are better names then "integrate"; suggestions welcome. Your suggestion to use a second remote doesn't seem like a satisfactory solution to me, firstly because it's extra work and complexity for the user, and second because it doesn't solve the problem of working with more than one local branch (pulling one branch amounts to a fetch for the other). -- Stefan Haller Berlin, Germany http://www.haller-berlin.de/
Re: Feature request: --format=json
On Sat, Apr 8, 2017 at 6:07 PM, Fred .Flintstonewrote: > $ git log --format=json > [{ > "commit": "64eabf050e315a4c7a11e0c05ca163be7cf9075e", > "tree": "b1e977800f40bbf6de906b1fe4f2de4b4b14f0fd", > "author": "Tux 1490981516 +0200", > "committer": "Tux 1490981516 +0200", > "message": "This is a test commit", > "long_message": "This explains in more details the commit" > }] > > This would make it easy to parse the output. The git-log command isn't plumbing that's meant for machines, but the git-for-each-ref command is what you're most likely looking for. It doesn't have JSON output, but you can make e.g. --format emit something even more easily parsable, e.g. a version of what you have with each field delimited by a custom delimiter, and then split on that. It does have --perl, --tcl etc. options to make it easy to quote the fields, however there's no logic to manage the state machine JSON would need to omit trailing commas, whereas emitting output for languages like Perl where trailing commas don't matter is much easier.
Feature request: --format=json
$ git log --format=json [{ "commit": "64eabf050e315a4c7a11e0c05ca163be7cf9075e", "tree": "b1e977800f40bbf6de906b1fe4f2de4b4b14f0fd", "author": "Tux1490981516 +0200", "committer": "Tux 1490981516 +0200", "message": "This is a test commit", "long_message": "This explains in more details the commit" }] This would make it easy to parse the output.
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 8, 2017 at 5:03 PM, Stefan Hallerwrote: > Matt McCutchen wrote: > >> When I'm rewriting history, "git push --force-with-lease" is a nice >> safeguard compared to "git push --force", but it still assumes the >> remote-tracking ref gives the old state the user wants to overwrite. >> Tools that do an implicit fetch, assuming it to be a safe operation, >> may break this assumption. In the worst case, Visual Studio Code does >> an automatic fetch every 3 minutes by default [1], making >> --force-with-lease pretty much reduce to --force. >> >> For a safer workflow, "git push" would check against a separate "old" >> ref that isn't updated by "git fetch", but is updated by "git push" the >> same way the remote-tracking ref is and maybe also by commands that >> update the local branch to take into account remote changes (I'm not >> sure what reasonable scenarios there are, if any). > > Here's a rough proposal for how I would imagine this to work. > > For every local branch that has a remote tracking branch, git maintains > a new config entry branch.*.integrated, which records the sha1 of the > last upstream commit that was integrated into the local branch. Can you elaborate on what "integrate" means in this context? In some ways the entire point of this feature is that you're trying to push over history that you don't want to integrate. I.e. you're trying to force push your unrelated X over remote history A, but not more recent arbitrary history B based on A which someone may have just pushed. I'm having a hard time imagining how anything merge/rebase/whatever would place in branch.*.integrated wouldn't just be a roundabout way of recording the info we now record via the tracking branch, or in cases where that's auto-updated for some reason having another tracking branch as my "[PATCH] push: document & test --force-with-lease with multiple remotes" suggests. But maybe I'm just missing something obvious...
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Jeff Kingwrote: > I think Matt's point is just that the default, to use origin/branch, is > unsafe. It's convenient when you don't have extra fetches, but that > convenience may not be worth the potential surprise. I don't think "surprise" is the right word here. The point of --force-with-lease is to provide a guarantee, so if you can't trust the guarantee, it makes the feature rather pointless. -- Stefan Haller Berlin, Germany http://www.haller-berlin.de/
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Matt McCutchenwrote: > When I'm rewriting history, "git push --force-with-lease" is a nice > safeguard compared to "git push --force", but it still assumes the > remote-tracking ref gives the old state the user wants to overwrite. > Tools that do an implicit fetch, assuming it to be a safe operation, > may break this assumption. In the worst case, Visual Studio Code does > an automatic fetch every 3 minutes by default [1], making > --force-with-lease pretty much reduce to --force. > > For a safer workflow, "git push" would check against a separate "old" > ref that isn't updated by "git fetch", but is updated by "git push" the > same way the remote-tracking ref is and maybe also by commands that > update the local branch to take into account remote changes (I'm not > sure what reasonable scenarios there are, if any). Here's a rough proposal for how I would imagine this to work. For every local branch that has a remote tracking branch, git maintains a new config entry branch.*.integrated, which records the sha1 of the last upstream commit that was integrated into the local branch. When --force-with-lease is used without an argument, it will use the values of "branch.*.remote:branch.*.integrated" as an argument. If either doesn't exist, the push fails (this is essential). Initially the "integrated" entry is created at the same time that branch.*.merge is, i.e. with commits like "git checkout -b name-of-remote-branch", or "git branch --set-upstream-to" and the like, and it will be set to the sha1 that the tip of the remote tracking branch has at that time. Then, every command that either integrates the remote tracking branch into the local branch, or updates the remote tracking branch to the local branch, will update the value of the "integrated" entry. The most obvious ones are "git pull" and "git push", or course; others that may have to be supported are "git rebase @{u}", "git rebase --onto @{u}", "git reset @{u}", and probably others. The nice thing about these is that initially they don't have to be supported for the feature to still be useful. After using one of them, push --force-with-lease will fail, and the user can then investigate the situation and either use push -f or manually update branch.*.integrated when they have convinced themselves that everything is fine. I find it essential that --force-with-lease might fail erroneously, but never succeed erroneously, and I think this proposal would guarantee that. -- Stefan Haller Berlin, Germany http://www.haller-berlin.de/
Re: [PATCH 4/6] receive-pack: quarantine objects until pre-receive accepts
On Sat, Oct 1, 2016 at 11:12 AM, Jeff Kingwrote: > On Fri, Sep 30, 2016 at 03:36:30PM -0400, Jeff King wrote: > >> @@ -1639,6 +1666,18 @@ static const char *unpack(int err_fd, struct >> shallow_info *si) >> argv_array_push(, alt_shallow_file); >> } >> >> + tmp_objdir = tmp_objdir_create(); >> + if (!tmp_objdir) >> + return "unable to create temporary object directory"; >> + child.env = tmp_objdir_env(tmp_objdir); > > One thing to note here: this new code kicks in all the time. My > reasoning was that there's basically no time you _wouldn't_ want it to, > and certainly that was the case for us when I wrote it. But I tried to > think of user-visible changes. Here's what I came up with: > > - we currently leave the tmp_pack_* for a failed push sitting around > (e.g., if the client hangs up halfway through, or index-pack rejects > the pack for some reason). But with this series, it would always be > cleaned up. That's a very good thing if you're running a git hosting > site. It might make things harder if you're debugging. > > I don't think it's a good reason not to enable this by default, but > it _could_ be a reason to have a config switch to turn it off > temporarily (or just leave the "incoming-*" directory in place). > > - the environment that pre-receive pack runs in has access to objects > that the rest of the repository doesn't. So if you were to do > something silly in your pre-receive like: > > # reject the push, but log a copy of the objects > git update-ref refs/rejected/$(date +%s) $new_sha1 > exit 1 > > Then your ref-update would succeed (you have $new_sha1), but the > objects would be deleted immediately afterwards. I find this a > somewhat questionable pattern, and I have no idea if anybody else > has thought of it. But it _does_ work today, and not with this > series. > > I don't think it would be too hard to put a config conditional around > this tmp_objdir_create(). And then all of the tmp_objdir_env() calls > would just return NULL, and effectively become noops. Very late reply, but I have a question about this. Is there anything you can do on the plumbing level to figure out which area an object is in (of course that's not mutually exclusive). The use-case for that is e.g. having a hook that rejects large binaries, but has an exception for a binary that's been added in the past, before your change there's no optimal way to find that out, you'd need to go over the whole history and list all blobs that have ever been added, with your change it would be trivial if something could give me "this object is in the quarantine area", but AFAICT there's no API that'll show me that. Also, I think this whole thing could really do with some documentation in githooks(5). E.g. what hooks does it apply for? The test is just for pre-receive but the patch changes run_update_hook(), does it also take effect for update? Ditto the caveat about update-ref.
Assalamu`Alaikum.
Dear Sir/Madam. Assalamu`Alaikum. I am Dr mohammad ouattara, I have ($10.6 Million us dollars) to transfer into your account, I will send you more details about this deal and the procedures to follow when I receive a positive response from you, Have a great day, Dr mohammad ouattara.
Re: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
Am 07.04.2017 um 17:53 schrieb g...@jeffhostetler.com: From: Jeff HostetlerTeach traverse_trees_recursive() to not do redundant ODB lookups when both directories refer to the same OID. In operations such as read-tree, checkout, and merge when the differences between the commits are relatively small, there will likely be many directories that have the same SHA-1. In these cases we can avoid hitting the ODB multiple times for the same SHA-1. This patch handles n=2 and n=3 cases and simply copies the data rather than repeating the fill_tree_descriptor(). On the Windows repo (500K trees, 3.1M files, 450MB index), this reduced the overall time by 0.75 seconds when cycling between 2 commits with a single file difference. (avg) before: 22.699 (avg) after: 21.955 === Using the p0004-read-tree test (posted earlier this week) with 1M files on Linux: before: $ ./p0004-read-tree.sh 0004.5: switch work1 work2 (1003037) 11.99(8.12+3.32) 0004.6: switch commit aliases (1003037)11.95(8.20+3.14) after: $ ./p0004-read-tree.sh 0004.5: switch work1 work2 (1003037) 11.02(7.71+2.78) 0004.6: switch commit aliases (1003037)10.95(7.57+2.82) Signed-off-by: Jeff Hostetler --- unpack-trees.c | 23 +++ 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/unpack-trees.c b/unpack-trees.c index 3a8ee19..143c5d9 100644 --- a/unpack-trees.c +++ b/unpack-trees.c @@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info) return ret; } +static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k) +{ + return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid); +} + static int traverse_trees_recursive(int n, unsigned long dirmask, unsigned long df_conflicts, struct name_entry *names, @@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask, newinfo.df_conflicts |= df_conflicts; for (i = 0; i < n; i++, dirmask >>= 1) { - const unsigned char *sha1 = NULL; - if (dirmask & 1) - sha1 = names[i].oid->hash; - buf[i] = fill_tree_descriptor(t+i, sha1); + if (i > 0 && are_same_oid([i], [i-1])) { + /* implicitly borrow buf[i-1] inside tree_desc[i] */ + memcpy([i], [i-1], sizeof(struct tree_desc)); An assignment would be simpler: t[i] = t[i - 1]; + buf[i] = NULL; + } else if (i > 1 && are_same_oid([i], [i-2])) { + /* implicitly borrow buf[i-2] inside tree_desc[i] */ + memcpy([i], [i-2], sizeof(struct tree_desc)); Similar case. + buf[i] = NULL; What makes the previous two entries special, or differently put: Why not just check *all* previous entries? MAX_UNPACK_TREES is 8; the number of comparisons would just about double in the worst case and stay the same for three trees or less. The order of four trees or more wouldn't matter anymore. + } else { + const unsigned char *sha1 = NULL; + if (dirmask & 1) + sha1 = names[i].oid->hash; + buf[i] = fill_tree_descriptor(t+i, sha1); + } } bottom = switch_cache_bottom();
[PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1
Change the internal USE_LIBPCRE define, & build options flag to use a naming convention ending in PCRE1, without changing the long-standing USE_LIBPCRE Makefile flag which enables this code. This is for preparation for libpcre2 support where having things like USE_LIBPCRE and USE_LIBPCRE2 in any more places than we absolutely need to for backwards compatibility with old Makefile arguments would be confusing. In some ways it would be better to change everything that now uses USE_LIBPCRE to use USE_LIBPCRE1, and to make specifying USE_LIBPCRE (or --with-pcre) an error. This would impose a one-time burden on packagers of git to s/USE_LIBPCRE/USE_LIBPCRE1/ in their build scripts. However I'd like to leave the door open to making USE_LIBPCRE=YesPlease eventually mean USE_LIBPCRE2=YesPlease, i.e. once PCRE v2 is ubiquitous enough that it makes sense to make it the default. This code and the USE_LIBPCRE Makefile argument was added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09). At the time there was no indication that the PCRE project would release an entirely new & incompatible API around 3 years later. Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 4 ++-- grep.c| 6 +++--- grep.h| 2 +- t/test-lib.sh | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/Makefile b/Makefile index 23945d87cf..c8a26087e3 100644 --- a/Makefile +++ b/Makefile @@ -1084,7 +1084,7 @@ ifdef NO_LIBGEN_H endif ifdef USE_LIBPCRE - BASIC_CFLAGS += -DUSE_LIBPCRE + BASIC_CFLAGS += -DUSE_LIBPCRE1 ifdef LIBPCREDIR BASIC_CFLAGS += -I$(LIBPCREDIR)/include EXTLIBS += -L$(LIBPCREDIR)/$(lib) $(CC_LD_DYNPATH)$(LIBPCREDIR)/$(lib) @@ -2235,7 +2235,7 @@ GIT-BUILD-OPTIONS: FORCE @echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@+ @echo NO_CURL=\''$(subst ','\'',$(subst ','\'',$(NO_CURL)))'\' >>$@+ @echo NO_EXPAT=\''$(subst ','\'',$(subst ','\'',$(NO_EXPAT)))'\' >>$@+ - @echo USE_LIBPCRE=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ + @echo USE_LIBPCRE1=\''$(subst ','\'',$(subst ','\'',$(USE_LIBPCRE)))'\' >>$@+ @echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+ @echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+ @echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+ diff --git a/grep.c b/grep.c index 1575f8f9ed..99b9e9447f 100644 --- a/grep.c +++ b/grep.c @@ -325,7 +325,7 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p, die("%s'%s': %s", where, p->pattern, error); } -#ifdef USE_LIBPCRE +#ifdef USE_LIBPCRE1 static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { const char *error; @@ -377,7 +377,7 @@ static void free_pcre_regexp(struct grep_pat *p) pcre_free(p->pcre_extra_info); pcre_free((void *)p->pcre_tables); } -#else /* !USE_LIBPCRE */ +#else /* !USE_LIBPCRE1 */ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) { die("cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE"); @@ -392,7 +392,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, static void free_pcre_regexp(struct grep_pat *p) { } -#endif /* !USE_LIBPCRE */ +#endif /* !USE_LIBPCRE1 */ static int is_fixed(const char *s, size_t len) { diff --git a/grep.h b/grep.h index 267534ca24..073b0e4c92 100644 --- a/grep.h +++ b/grep.h @@ -1,7 +1,7 @@ #ifndef GREP_H #define GREP_H #include "color.h" -#ifdef USE_LIBPCRE +#ifdef USE_LIBPCRE1 #include #else typedef int pcre; diff --git a/t/test-lib.sh b/t/test-lib.sh index 6abf1d1918..e5cfbcc36b 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -1010,7 +1010,7 @@ esac ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1 test -z "$NO_PERL" && test_set_prereq PERL test -z "$NO_PYTHON" && test_set_prereq PYTHON -test -n "$USE_LIBPCRE" && test_set_prereq PCRE +test -n "$USE_LIBPCRE1" && test_set_prereq PCRE test -z "$NO_GETTEXT" && test_set_prereq GETTEXT # Can we rely on git's output in the C locale? -- 2.11.0
[PATCH 12/12] grep: add support for PCRE v2
Add support for v2 of the PCRE API. This is a new major version of PCRE that came out in early 2015[1]. The regular expression syntax is the same, but while similar-ish requires a different codepath to support it. Git can now be compiled with any combination of USE_LIBPCRE=[YesPlease|] & USE_LIBPCRE2=[YesPlease|]. If both are provided the version of the PCRE library can be selected at runtime with grep.PatternType, but the default (for now) is v1. This table shows what the various combinations do depending on what libraries Git is compiled against: |--+-+-+--| | grep.PatternType | v1 | v2 | v1 && v2 | |--+-+-+--| | perl | v1 | v2 | v1 | | pcre | v1 | v2 | v1 | | pcre1| v1 | ERR | v1 | | pcre2| ERR | v2 | v2 | |--+-+-+--| When Git is only compiled with v2 grep.PatternType=perl, --perl-regexp & -P will use v2. All tests pass with this new PCRE version. When Git is compiled with both v1 & v2 most of the tests will only test v1, but there are some v2-specific tests that will be run. Originally I started work on this series because my ad-hoc patch to support v2 ("Very promising results with libpcre2",
[PATCH 06/12] log: add -P as a synonym for --perl-regexp
Add a short -P option as a synonym for the longer --perl-regexp, for consistency with the options the corresponding grep invocations accept. This was intentionally omitted in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified future use. Since nothing has come along in over 4 1/2 years that's wanted to use it, it's more valuable to make it consistent with "grep" than to keep it open for future use, and to avoid the confusion of -P meaning different things for grep & log, as is the case with the -G option. As noted in the aforementioned commit the --basic-regexp option can't have a corresponding -G argument, as the log command already uses that for -G. Signed-off-by: Ævar Arnfjörð Bjarmason--- Documentation/rev-list-options.txt | 1 + revision.c | 2 +- t/t4202-log.sh | 9 + 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index a02f7324c0..5b7fa49a7f 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -91,6 +91,7 @@ endif::git-rev-list[] Consider the limiting patterns to be fixed strings (don't interpret pattern as a regular expression). +-P:: --perl-regexp:: Consider the limiting patterns to be Perl-compatible regular expressions. Requires libpcre to be compiled in. diff --git a/revision.c b/revision.c index 7ff61ff5f7..03a3a012de 100644 --- a/revision.c +++ b/revision.c @@ -1995,7 +1995,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg DIFF_OPT_SET(>diffopt, PICKAXE_IGNORE_CASE); } else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_FIXED; - } else if (!strcmp(arg, "--perl-regexp")) { + } else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) { revs->grep_filter.pattern_type_option = GREP_PATTERN_TYPE_PCRE; } else if (!strcmp(arg, "--all-match")) { revs->grep_filter.all_match = 1; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 00d0585253..6f7c74e027 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -320,8 +320,17 @@ test_expect_success 'log with various grep.patternType configurations & command- --grep="(1|2)" >actual.fixed.short-arg && git log --pretty=tformat:%s -E \ --grep="\|2" >actual.extended.short-arg && + if test_have_prereq PCRE + then + git log --pretty=tformat:%s -P \ + --grep="\((?=1)" >actual.perl.short-arg + fi && test_cmp expect.fixed actual.fixed.short-arg && test_cmp expect.extended actual.extended.short-arg && + if test_have_prereq PCRE + then + test_cmp expect.perl actual.perl.short-arg + fi && git log --pretty=tformat:%s --fixed-strings \ --grep="(1|2)" >actual.fixed.long-arg && -- 2.11.0
[PATCH 04/12] grep: add a test for backreferences in PCRE patterns
Add a test for backreferences such as (.)\1 in PCRE patterns. This test ensures that the PCRE_NO_AUTO_CAPTURE option isn't turned on. Before this change turning it on would break these sort of patterns, but wouldn't break any tests. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t7810-grep.sh | 7 +++ 1 file changed, 7 insertions(+) diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 53c2ca05c4..83b0ee53be 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -1102,6 +1102,13 @@ test_expect_success LIBPCRE 'grep -P -w pattern' ' test_cmp expected actual ' +test_expect_success PCRE 'grep -P backreferences work (the PCRE NO_AUTO_CAPTURE flag is not set)' ' + git grep -P -h "(?P.)(?P=one)" hello_world >actual && + test_cmp hello_world actual && + git grep -P -h "(.)\1" hello_world >actual && + test_cmp hello_world actual +' + test_expect_success 'grep -G invalidpattern properly dies ' ' test_must_fail git grep -G "a[" ' -- 2.11.0
[PATCH 02/12] grep: remove redundant regflags assignment under PCRE
Remove a redundant assignment to the "regflags" variable. This variable is only used for POSIX regular expression matching, not when the PCRE library is used. This redundant assignment was added as a result of copy/paste programming in commit 84befcd0a4 ("grep: add a grep.patternType configuration setting", 2012-08-03). That commit modified already working code in commit cca2c172e0 ("git-grep: do not die upon -F/-P when grep.extendedRegexp is set.", 2011-05-09) which didn't assign to regflags when under PCRE. Revert back to that behavior, more to reduce "wait this is used under PCRE how?" confusion when reading the code, than to to save ourselves trivial CPU cycles by removing one assignment. Signed-off-by: Ævar Arnfjörð Bjarmason--- grep.c | 1 - 1 file changed, 1 deletion(-) diff --git a/grep.c b/grep.c index 56ef0ecbff..8564fe726d 100644 --- a/grep.c +++ b/grep.c @@ -197,7 +197,6 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_PCRE: opt->fixed = 0; opt->pcre = 1; - opt->regflags &= ~REG_EXTENDED; break; } } -- 2.11.0
[PATCH 05/12] log: add exhaustive tests for pattern style options & config
Add exhaustive tests for how the different grep.patternType options & the corresponding command-line options affect git-log. Before this change it was possible to patch revision.c so that the --basic-regexp option was synonymous with --extended-regexp, and --perl-regexp wasn't recognized at all, and still have 100% of the test suite pass. This was because the first test being modified here, added in commit 34a4ae55b2 ("log --grep: use the same helper to set -E/-F options as "git grep"", 2012-10-03), didn't actually check whether we'd enabled extended regular expressions as distinct from re-toggling non-fixed string support. Fix that by changing the pattern to a pattern that'll only match if --extended-regexp option is provided, but won't match under the default --basic-regexp option. Other potential regressions were possible since there were no tests for the rest of the combinations of grep.patternType configuration toggles & corresponding git-log command-line options. Add exhaustive tests for those. The patterns being passed to fixed/basic/extended/PCRE are carefully crafted to return the wrong thing if the grep engine were to pick any other matching method than the one it's told to use. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t4202-log.sh | 67 +- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 48b55bfd27..00d0585253 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -261,7 +261,13 @@ test_expect_success 'log --grep -i' ' test_expect_success 'log -F -E --grep= uses ere' ' echo second >expect && - git log -1 --pretty="tformat:%s" -F -E --grep=s.c.nd >actual && + git log -1 --pretty="tformat:%s" -F -E --grep="(s).c.nd" >actual && + test_cmp expect actual +' + +test_expect_success LIBPCRE 'log -F -E --perl-regexp --grep= uses PCRE' ' + echo second >expect && + git log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="s(?=ec)econd" >actual && test_cmp expect actual ' @@ -279,6 +285,65 @@ test_expect_success 'log with grep.patternType configuration and command line' ' test_cmp expect actual ' +test_expect_success 'log with various grep.patternType configurations & command-lines' ' + git init pattern-type && + ( + cd pattern-type && + test_commit 1 file A && + test_commit "(1|2)" file B && + + echo "(1|2)" >expect.fixed && + cp expect.fixed expect.basic && + cp expect.fixed expect.extended && + cp expect.fixed expect.perl && + + git -c grep.patternType=fixed log --pretty=tformat:%s \ + --grep="(1|2)" >actual.fixed && + git -c grep.patternType=basic log --pretty=tformat:%s \ + --grep="(.|.)" >actual.basic && + git -c grep.patternType=extended log --pretty=tformat:%s \ + --grep="\|2" >actual.extended && + if test_have_prereq LIBPCRE + then + git -c grep.patternType=perl log --pretty=tformat:%s \ + --grep="\((?=1)" >actual.perl + fi && + test_cmp expect.fixed actual.fixed && + test_cmp expect.basic actual.basic && + test_cmp expect.extended actual.extended && + if test_have_prereq LIBPCRE + then + test_cmp expect.perl actual.perl + fi && + + git log --pretty=tformat:%s -F \ + --grep="(1|2)" >actual.fixed.short-arg && + git log --pretty=tformat:%s -E \ + --grep="\|2" >actual.extended.short-arg && + test_cmp expect.fixed actual.fixed.short-arg && + test_cmp expect.extended actual.extended.short-arg && + + git log --pretty=tformat:%s --fixed-strings \ + --grep="(1|2)" >actual.fixed.long-arg && + git log --pretty=tformat:%s --basic-regexp \ + --grep="(.|.)" >actual.basic.long-arg && + git log --pretty=tformat:%s --extended-regexp \ + --grep="\|2" >actual.extended.long-arg && + if test_have_prereq LIBPCRE + then + git log --pretty=tformat:%s --perl-regexp \ + --grep="\((?=1)" >actual.perl.long-arg + fi && + test_cmp expect.fixed actual.fixed.long-arg && + test_cmp expect.basic actual.basic.long-arg && + test_cmp expect.extended actual.extended.long-arg && + if test_have_prereq LIBPCRE + then + test_cmp expect.perl actual.perl.long-arg + fi + ) +' + cat > expect <
[PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE
Rename the LIBPCRE prerequisite to PCRE. This is for preparation for libpcre2 support, where having just "LIBPCRE" would be confusing as it implies v1 of the library. None of these tests are incompatible between versions 1 & 2 of libpcre, it's less confusing to give them a more general name to make it clear that they work on both library versions. Signed-off-by: Ævar Arnfjörð Bjarmason--- t/README| 4 ++-- t/t4202-log.sh | 10 +- t/t7810-grep.sh | 30 +++--- t/t7812-grep-icase-non-ascii.sh | 4 ++-- t/t7813-grep-icase-iso.sh | 2 +- t/test-lib.sh | 2 +- 6 files changed, 26 insertions(+), 26 deletions(-) diff --git a/t/README b/t/README index ab386c3681..a90cb62583 100644 --- a/t/README +++ b/t/README @@ -803,9 +803,9 @@ use these, and "test_set_prereq" for how to define your own. Test is not run by root user, and an attempt to write to an unwritable file is expected to fail correctly. - - LIBPCRE + - PCRE - Git was compiled with USE_LIBPCRE=YesPlease. Wrap any tests + Git was compiled with support for PCRE. Wrap any tests that use git-grep --perl-regexp or git-grep -P in these. - CASE_INSENSITIVE_FS diff --git a/t/t4202-log.sh b/t/t4202-log.sh index 6f7c74e027..93bb8a 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -265,7 +265,7 @@ test_expect_success 'log -F -E --grep= uses ere' ' test_cmp expect actual ' -test_expect_success LIBPCRE 'log -F -E --perl-regexp --grep= uses PCRE' ' +test_expect_success PCRE 'log -F -E --perl-regexp --grep= uses PCRE' ' echo second >expect && git log -1 --pretty="tformat:%s" -F -E --perl-regexp --grep="s(?=ec)econd" >actual && test_cmp expect actual @@ -303,7 +303,7 @@ test_expect_success 'log with various grep.patternType configurations & command- --grep="(.|.)" >actual.basic && git -c grep.patternType=extended log --pretty=tformat:%s \ --grep="\|2" >actual.extended && - if test_have_prereq LIBPCRE + if test_have_prereq PCRE then git -c grep.patternType=perl log --pretty=tformat:%s \ --grep="\((?=1)" >actual.perl @@ -311,7 +311,7 @@ test_expect_success 'log with various grep.patternType configurations & command- test_cmp expect.fixed actual.fixed && test_cmp expect.basic actual.basic && test_cmp expect.extended actual.extended && - if test_have_prereq LIBPCRE + if test_have_prereq PCRE then test_cmp expect.perl actual.perl fi && @@ -338,7 +338,7 @@ test_expect_success 'log with various grep.patternType configurations & command- --grep="(.|.)" >actual.basic.long-arg && git log --pretty=tformat:%s --extended-regexp \ --grep="\|2" >actual.extended.long-arg && - if test_have_prereq LIBPCRE + if test_have_prereq PCRE then git log --pretty=tformat:%s --perl-regexp \ --grep="\((?=1)" >actual.perl.long-arg @@ -346,7 +346,7 @@ test_expect_success 'log with various grep.patternType configurations & command- test_cmp expect.fixed actual.fixed.long-arg && test_cmp expect.basic actual.basic.long-arg && test_cmp expect.extended actual.extended.long-arg && - if test_have_prereq LIBPCRE + if test_have_prereq PCRE then test_cmp expect.perl actual.perl.long-arg fi diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index b50f1dff43..46f528183d 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -275,7 +275,7 @@ do test_cmp expected actual ' - test_expect_success LIBPCRE "grep $L with grep.patterntype=perl" ' + test_expect_success PCRE "grep $L with grep.patterntype=perl" ' echo "${HC}ab:a+b*c" >expected && git -c grep.patterntype=perl grep "a\x{2b}b\x{2a}c" $H ab >actual && test_cmp expected actual @@ -1053,12 +1053,12 @@ hello.c:int main(int argc, const char **argv) hello.c: printf("Hello world.\n"); EOF -test_expect_success LIBPCRE 'grep --perl-regexp pattern' ' +test_expect_success PCRE 'grep --perl-regexp pattern' ' git grep --perl-regexp "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' -test_expect_success LIBPCRE 'grep -P pattern' ' +test_expect_success PCRE 'grep -P pattern' ' git grep -P "\p{Ps}.*?\p{Pe}" hello.c >actual && test_cmp expected actual ' @@ -1070,13 +1070,13 @@ test_expect_success 'grep pattern with grep.extendedRegexp=true' '
[PATCH 11/12] grep: change the internal PCRE code & header names to be PCRE1
Change the internal PCRE variable & function names to have a "1" suffix. This is for preparation for libpcre2 support, where having non-versioned names would be confusing. The earlier "grep: change the internal PCRE macro names to be PCRE1" change elaborates on the motivations behind this commit. Signed-off-by: Ævar Arnfjörð Bjarmason--- builtin/grep.c | 4 ++-- grep.c | 56 grep.h | 10 +- revision.c | 2 +- 4 files changed, 36 insertions(+), 36 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 9478ab5dff..dffb9743b8 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -490,7 +490,7 @@ static void compile_submodule_options(const struct grep_opt *opt, case GREP_PATTERN_TYPE_FIXED: argv_array_push(_options, "-F"); break; - case GREP_PATTERN_TYPE_PCRE: + case GREP_PATTERN_TYPE_PCRE1: argv_array_push(_options, "-P"); break; case GREP_PATTERN_TYPE_UNSPECIFIED: @@ -1036,7 +1036,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) GREP_PATTERN_TYPE_FIXED), OPT_SET_INT('P', "perl-regexp", _type_arg, N_("use Perl-compatible regular expressions"), - GREP_PATTERN_TYPE_PCRE), + GREP_PATTERN_TYPE_PCRE1), OPT_GROUP(""), OPT_BOOL('n', "line-number", , N_("show line numbers")), OPT_NEGBIT('h', NULL, , N_("don't show filenames"), 1), diff --git a/grep.c b/grep.c index 99b9e9447f..ac7d6f9bbf 100644 --- a/grep.c +++ b/grep.c @@ -63,7 +63,7 @@ static int parse_pattern_type_arg(const char *opt, const char *arg) else if (!strcmp(arg, "perl") || !strcmp(arg, "pcre") || !strcmp(arg, "pcre1")) - return GREP_PATTERN_TYPE_PCRE; + return GREP_PATTERN_TYPE_PCRE1; die("bad %s argument: %s", opt, arg); } @@ -180,25 +180,25 @@ static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, st case GREP_PATTERN_TYPE_BRE: opt->fixed = 0; - opt->pcre = 0; + opt->pcre1 = 0; opt->regflags &= ~REG_EXTENDED; break; case GREP_PATTERN_TYPE_ERE: opt->fixed = 0; - opt->pcre = 0; + opt->pcre1 = 0; opt->regflags |= REG_EXTENDED; break; case GREP_PATTERN_TYPE_FIXED: opt->fixed = 1; - opt->pcre = 0; + opt->pcre1 = 0; opt->regflags &= ~REG_EXTENDED; break; - case GREP_PATTERN_TYPE_PCRE: + case GREP_PATTERN_TYPE_PCRE1: opt->fixed = 0; - opt->pcre = 1; + opt->pcre1 = 1; break; } } @@ -326,7 +326,7 @@ static NORETURN void compile_regexp_failed(const struct grep_pat *p, } #ifdef USE_LIBPCRE1 -static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) +static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) { const char *error; int erroffset; @@ -334,23 +334,23 @@ static void compile_pcre_regexp(struct grep_pat *p, const struct grep_opt *opt) if (opt->ignore_case) { if (has_non_ascii(p->pattern)) - p->pcre_tables = pcre_maketables(); + p->pcre1_tables = pcre_maketables(); options |= PCRE_CASELESS; } if (is_utf8_locale() && has_non_ascii(p->pattern)) options |= PCRE_UTF8; - p->pcre_regexp = pcre_compile(p->pattern, options, , , - p->pcre_tables); - if (!p->pcre_regexp) + p->pcre1_regexp = pcre_compile(p->pattern, options, , , + p->pcre1_tables); + if (!p->pcre1_regexp) compile_regexp_failed(p, error); - p->pcre_extra_info = pcre_study(p->pcre_regexp, 0, ); - if (!p->pcre_extra_info && error) + p->pcre1_extra_info = pcre_study(p->pcre1_regexp, 0, ); + if (!p->pcre1_extra_info && error) die("%s", error); } -static int pcrematch(struct grep_pat *p, const char *line, const char *eol, +static int pcre1match(struct grep_pat *p, const char *line, const char *eol, regmatch_t *match, int eflags) { int ovector[30], ret, flags = 0; @@ -358,7 +358,7 @@ static int pcrematch(struct grep_pat *p, const char *line, const char *eol, if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; - ret = pcre_exec(p->pcre_regexp, p->pcre_extra_info, line, eol - line, + ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line, eol - line,
[PATCH 07/12] grep & rev-list doc: stop promising libpcre for --perl-regexp
Stop promising in our grep & rev-list options documentation that we're always going to be using libpcre when given the --perl-regexp option. Instead talk about using "Perl-compatible regular expressions" and using "the PCRE library". Saying "libpcre" strongly suggests that we might be talking about libpcre.so, which is always going to be v1. Not only do does the documentation now align more clearly with future plans, but it should be more easily readable to the layman. This is for preparation for libpcre2 support, which in the future may be powering the --perl-regexp option by default, depending on how Git is compiled. Signed-off-by: Ævar Arnfjörð Bjarmason--- Documentation/git-grep.txt | 4 ++-- Documentation/rev-list-options.txt | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 7b52e3fbc4..1c01154dbe 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -161,8 +161,8 @@ OPTIONS -P:: --perl-regexp:: - Use Perl-compatible regexp for patterns. Requires libpcre to be - compiled in. + Use Perl-compatible regular expressions for patterns. Uses the + PCRE library, which Git needs to be compiled against. -F:: --fixed-strings:: diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index 5b7fa49a7f..565cde636e 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -93,8 +93,9 @@ endif::git-rev-list[] -P:: --perl-regexp:: - Consider the limiting patterns to be Perl-compatible regular expressions. - Requires libpcre to be compiled in. + Consider the limiting patterns to be Perl-compatible regular + expressions. Uses the PCRE library, which Git needs to be + compiled against. --remove-empty:: Stop when a given path disappears from the tree. -- 2.11.0
[PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl"
Make the pattern types "pcre" & "pcre1" synonyms for long-standing "perl" grep.patternType. This change is part of a longer patch series to add pcre2 support to Git. It's nice to be able to performance test PCRE v1 v.s. v2 without having to recompile git, and doing that via grep.patternType makes sense. However, just adding "pcre2" when we only have "perl" would be confusing, so start by adding a "pcre" & "pcre1" synonym. In the future "perl" and "pcre" might be changed to default to "pcre2" instead of "pcre1", and depending on how Git is compiled the more specific "pcre1" or "pcre2" pattern types might produce an error. Signed-off-by: Ævar Arnfjörð Bjarmason--- Documentation/config.txt | 9 + grep.c | 4 +++- t/t7810-grep.sh | 10 ++ 3 files changed, 22 insertions(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 475e874d51..5ef12d0694 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1624,6 +1624,15 @@ grep.patternType:: 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, `--fixed-strings`, or `--perl-regexp` option accordingly, while the value 'default' will return to the default matching behavior. ++ +The 'pcre' and 'pcre1' values are synonyms for 'perl'. The other +values starting with 'pcre' are reserved for future use, e.g. if we'd +like to use 'pcre2' for the PCRE v2 library. ++ +In the future 'perl' and 'pcre' might become synonyms for some other +implementation or PCRE version, such as 'pcre2', while the more +specific 'pcre1' & 'pcre2' might throw errors depending on whether git +is compiled to include those libraries. grep.extendedRegexp:: If set to true, enable `--extended-regexp` option by default. This diff --git a/grep.c b/grep.c index 8564fe726d..1575f8f9ed 100644 --- a/grep.c +++ b/grep.c @@ -60,7 +60,9 @@ static int parse_pattern_type_arg(const char *opt, const char *arg) return GREP_PATTERN_TYPE_ERE; else if (!strcmp(arg, "fixed")) return GREP_PATTERN_TYPE_FIXED; - else if (!strcmp(arg, "perl")) + else if (!strcmp(arg, "perl") || +!strcmp(arg, "pcre") || +!strcmp(arg, "pcre1")) return GREP_PATTERN_TYPE_PCRE; die("bad %s argument: %s", opt, arg); } diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 83b0ee53be..b50f1dff43 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -1522,4 +1522,14 @@ test_expect_success 'grep with thread options' ' test_must_fail git -c grep.threads=1 grep --threads=-1 st.*dio ' +test_expect_success LIBPCRE "grep with grep.patternType synonyms perl/pcre/pcre1" ' + echo "#include " >expected && + git -c grep.patternType=perl grep -h --no-line-number "st(?=dio)" >actual && + test_cmp expected actual && + git -c grep.patternType=pcre grep -h --no-line-number "st(?=dio)" >actual && + test_cmp expected actual && + git -c grep.patternType=pcre1 grep -h --no-line-number "st(?=dio)" >actual && + test_cmp expected actual +' + test_done -- 2.11.0
[PATCH 03/12] Makefile & configure: reword outdated comment about PCRE
Reword an outdated comment which suggests that only git-grep can use PCRE. This comment was added back when PCRE support was initially added in commit 63e7e9d8b6 ("git-grep: Learn PCRE", 2011-05-09), and was true at the time. It hasn't been telling the full truth since git-log learned to use PCRE with --grep in commit 727b6fc3ed ("log --grep: accept --basic-regexp and --perl-regexp", 2012-10-03), and more importantly is likely to get more inaccurate over time as more use is made of PCRE in other areas. Reword it to be more future-proof, and to more clearly explain that this enables user-initiated runtime behavior. Signed-off-by: Ævar Arnfjörð Bjarmason--- Makefile | 6 -- configure.ac | 12 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index 9b36068ac5..23945d87cf 100644 --- a/Makefile +++ b/Makefile @@ -24,8 +24,10 @@ all:: # Define NO_OPENSSL environment variable if you do not have OpenSSL. # This also implies BLK_SHA1. # -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as like log, grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in # /foo/bar/include and /foo/bar/lib directories. diff --git a/configure.ac b/configure.ac index 128165529f..d09a204a7e 100644 --- a/configure.ac +++ b/configure.ac @@ -250,8 +250,10 @@ AS_HELP_STRING([--with-openssl],[use OpenSSL library (default is YES)]) AS_HELP_STRING([], [ARG can be prefix for openssl library and headers]), GIT_PARSE_WITH([openssl])) -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as like log, grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # # Define LIBPCREDIR=/foo/bar if your libpcre header and library files are in # /foo/bar/include and /foo/bar/lib directories. @@ -499,8 +501,10 @@ GIT_CONF_SUBST([NEEDS_SSL_WITH_CRYPTO]) GIT_CONF_SUBST([NO_OPENSSL]) # -# Define USE_LIBPCRE if you have and want to use libpcre. git-grep will be -# able to use Perl-compatible regular expressions. +# Define USE_LIBPCRE if you have and want to use libpcre. Various +# commands such as like log, grep offer runtime options to use +# Perl-compatible regular expressions instead of standard or extended +# POSIX regular expressions. # if test -n "$USE_LIBPCRE"; then -- 2.11.0
[PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
Add the ability to entirely disable threading by having grep.threads=0 in the config or --threads=0 on the command-line. This was made configurable in commit 89f09dd34e ("grep: add --threads= option and grep.threads configuration", 2015-12-15). Before that change there was no way to disable threaded grep other than to recompile Git. It's very useful for testing & debugging to be able to entirely disable threading without recompiling with NO_PTHREADS=YesPlease, so support setting the value to 0 to disable threading. There was no reason this wasn't the case already other than an implementation detail in how OPT_INTEGER() works. When it's used there's no way to tell the difference between an unset value & the default value. Use OPT_CALLBACK() instead using the same pattern as in commit b16a991c1b ("cherry-pick: detect bogus arguments to --mainline", 2017-03-15). Signed-off-by: Ævar Arnfjörð Bjarmason--- Documentation/git-grep.txt | 4 ++-- builtin/grep.c | 26 ++ t/t7810-grep.sh| 10 ++ 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 71f32f3508..7b52e3fbc4 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -56,8 +56,8 @@ grep.extendedRegexp:: other than 'default'. grep.threads:: - Number of grep worker threads to use. If unset (or set to 0), - 8 threads are used by default (for now). + Number of grep worker threads to use. If unset, 8 threads are + used by default (for now). Set to 0 to disable threading. grep.fullName:: If set to true, enable `--full-name` option by default. diff --git a/builtin/grep.c b/builtin/grep.c index 65070c52fc..9478ab5dff 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -35,7 +35,7 @@ static int grep_submodule_launch(struct grep_opt *opt, const struct grep_source *gs); #define GREP_NUM_THREADS_DEFAULT 8 -static int num_threads; +static int num_threads = -1; #ifndef NO_PTHREADS static pthread_t *threads; @@ -897,6 +897,24 @@ static int context_callback(const struct option *opt, const char *arg, return 0; } +static int thread_callback(const struct option *opt, + const char *arg, int unset) +{ + int *threads = (int*)opt->value; + char *end; + + if (unset) { + *threads = GREP_NUM_THREADS_DEFAULT; + return 0; + } + + *threads = strtol(arg, , 10); + if (*end || *threads < 0) + return opterror(opt, "invalid number of threads specified", 0); + + return 0; +} + static int file_callback(const struct option *opt, const char *arg, int unset) { struct grep_opt *grep_opt = opt->value; @@ -1049,8 +1067,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix) N_("show context lines before matches")), OPT_INTEGER('A', "after-context", _context, N_("show context lines after matches")), - OPT_INTEGER(0, "threads", _threads, - N_("use worker threads")), + OPT_CALLBACK(0, "threads", _threads, N_("n"), + N_("use worker threads"), thread_callback), OPT_NUMBER_CALLBACK(, N_("shortcut for -C NUM"), context_callback), OPT_BOOL('p', "show-function", , @@ -1222,7 +1240,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) #ifndef NO_PTHREADS if (list.nr || cached || show_in_pager) num_threads = 0; - else if (num_threads == 0) + else if (num_threads == -1) num_threads = GREP_NUM_THREADS_DEFAULT; else if (num_threads < 0) die(_("invalid number of threads specified (%d)"), num_threads); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index cee42097b0..53c2ca05c4 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -1505,4 +1505,14 @@ test_expect_success 'grep does not report i-t-a and assume unchanged with -L' ' test_cmp expected actual ' +test_expect_success 'grep with thread options' ' + git -c grep.threads=4 grep st.*dio && + git grep --threads=4 st.*dio && + git -c grep.threads=4 grep --threads=6 st.*dio && + test_must_fail git -c grep.threads=-1 grep st.*dio && + test_must_fail git -c grep.threads=-1 grep --threads=-1 st.*dio && + test_must_fail git -c grep.threads=-1 grep --threads=1 st.*dio && + test_must_fail git -c grep.threads=1 grep --threads=-1 st.*dio +' + test_done -- 2.11.0
[PATCH 00/12] PCREv2 & more
This adds PCRE v2 support, but as I was adding that I kept noticing other related problems to fix. It's all bundled up into the same series because much of it conflicts because it modifies the same test or other code. Notes on each patch below. Ævar Arnfjörð Bjarmason (12): grep: add ability to disable threading with --threads=0 or grep.threads=0 This really has nothing to do with the rest except I'm using it to test non-multithreaded & threaded PCRE more easily. grep: remove redundant regflags assignment under PCRE Makefile & configure: reword outdated comment about PCRE Just some trivial cleanups. grep: add a test for backreferences in PCRE patterns log: add exhaustive tests for pattern style options & config Yay, more tests! log: add -P as a synonym for --perl-regexp We've had --perl-regexp for years, but not -P like grep, add it. grep & rev-list doc: stop promising libpcre for --perl-regexp grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" test-lib: rename the LIBPCRE prerequisite to PCRE grep: change the internal PCRE macro names to be PCRE1 grep: change the internal PCRE code & header names to be PCRE1 grep: add support for PCRE v2 These combined add the support for PCRE 2. It's split up for ease of readability. The last one's still a bit big, and I could e.g. split up all the Makefile/autoconf stuff into a different patch (which wouldn't do anything without the code), but I thought on balance doing it this way made the most sense. Documentation/config.txt | 7 ++ Documentation/git-grep.txt | 8 +- Documentation/rev-list-options.txt | 6 +- Makefile | 28 +- builtin/grep.c | 26 +- configure.ac | 61 ++-- grep.c | 184 ++--- grep.h | 26 -- revision.c | 2 +- t/README | 16 +++- t/t4202-log.sh | 76 ++- t/t7810-grep.sh| 79 +--- t/t7812-grep-icase-non-ascii.sh| 4 +- t/t7813-grep-icase-iso.sh | 11 ++- t/test-lib.sh | 4 +- 15 files changed, 456 insertions(+), 82 deletions(-) -- 2.11.0
[PATCH] push: document & test --force-with-lease with multiple remotes
Document & test for cases where there are two remotes pointing to the same URL, and a background fetch & subsequent `git push --force-with-lease` shouldn't clobber un-updated references we haven't fetched. Some editors like Microsoft's VSC have a feature to auto-fetch in the background, this bypasses the protections offered by --force-with-lease as noted in the documentation being added here. See the 'Tools that do an automatic fetch defeat "git push --force-with-lease"' (<1491617750.2149.10.ca...@mattmccutchen.net>) git mailing list thread for more details. Jakub Narębski suggested this method of adding another remote to bypass this edge case, document that & add a test for it. Signed-off-by: Ævar Arnfjörð Bjarmason--- On Sat, Apr 8, 2017 at 11:29 AM, Jeff King wrote: > On Sat, Apr 08, 2017 at 09:35:04AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> Is it correct that you'd essentially want something that works like: >> >> git push --force-with-lease=master:master origin master:master > > I don't think that would do anything useful. It would reject any push > where the remote "master" is not the same as your own master. And of > course if they _are_ the same, then the push is a noop. > Yeah my whole suggestion is obviously dumb & useless. But I liked Jakub's suggestion to work around this, so here's docs & a test for that. According to my eyeballing of the MS VSC code this should work, i.e. it seems to do a 'fetch' here, not a 'fetch --all': https://github.com/Microsoft/vscode/blob/master/src/vs/workbench/parts/git/node/git.lib.ts#L505 Of course another way is to just disable autofetching: https://github.com/Microsoft/vscode/blob/535a3de60023c81d75d0eac22044284f07dbcddf/extensions/git/src/autofetch.ts#L27 But having two remotes allows you to have your cake & eat it too without all the hassle of tag creation, which I've added to the docs though for completeness. Documentation/git-push.txt | 37 + t/t5533-push-cas.sh| 29 + 2 files changed, 66 insertions(+) diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt index 1624a35888..2f2e9c078b 100644 --- a/Documentation/git-push.txt +++ b/Documentation/git-push.txt @@ -210,6 +210,43 @@ or we do not even have to have such a remote-tracking branch when this form is used). If `` is the empty string, then the named ref must not already exist. + +This option interacts very badly with anything that implicitly runs +`git fetch` on the remote to be pushed to in the background. The +protection it offers over `--force` is ensuring that subsequent +changes your work wasn't based on aren't clobbered, but this is +trivially defeated if some background process is updating refs in the +background. We don't have anything except the remote tracking info to +go by as a heuristic for refs you're expected to have seen & are +willing to clobber. ++ +If your editor or some other system is running `git fetch` in the +background for you a way to mitigate this is to simply set up another +remote: ++ + git remote add origin-push $(git config remote.origin.url) + git fetch origin-push ++ +Now when the background process runs `git fetch origin` the references +on `origin-push` won't be updated, and thus commands like: ++ + git push --force-with-lease origin ++ +Will fail unless you manually run `git fetch origin-push`. This method +is of course entirely defeated by something that runs `git fetch +--all`, in that case you'd need to either disable it or do something +more tedious like: ++ + git fetch ;# update 'master' from remote + git tag base master;# mark our base point + git rebase -i master ;# rewrite some commits + git push --force-with-lease=master:base master:master ++ +I.e. create a `base` tag for versions of the upstream code that you've +seen and are willing to overwrite, then rewrite history, and finally +force push changes to `master` if the remote version is still at +`base`, regardless of what your local `remotes/origin/master` has been +updated to in the background. ++ Note that all forms other than `--force-with-lease=:` that specifies the expected current value of the ref explicitly are still experimental and their semantics may change as we gain experience diff --git a/t/t5533-push-cas.sh b/t/t5533-push-cas.sh index a2c9e7439f..d38ecee217 100755 --- a/t/t5533-push-cas.sh +++ b/t/t5533-push-cas.sh @@ -229,4 +229,33 @@ test_expect_success 'new branch already exists' ' ) ' +test_expect_success 'background updates of REMOTE can be mitigated with a non-updated REMOTE-push' ' + rm -rf src dst && + git init --bare src.bare && + test_when_finished "rm -rf src.bare" && + git clone --no-local src.bare dst && + test_when_finished "rm -rf dst" && + ( + cd dst && + test_commit G && + git remote
Re: [PATCH] submodule: prevent backslash expantion in submodule names
On Fri, Apr 07, 2017 at 10:23:06AM -0700, Brandon Williams wrote: > When attempting to add a submodule with backslashes in its name 'git > submodule' fails in a funny way. We can see that some of the > backslashes are expanded resulting in a bogus path: > > git -C main submodule add ../sub\\with\\backslash > fatal: repository '/tmp/test/sub\witackslash' does not exist > fatal: clone of '/tmp/test/sub\witackslash' into submodule path > > To solve this, convert calls to 'read' to 'read -r' in git-submodule.sh > in order to prevent backslash expantion in submodule names. This looks sane overall, without digging into the individual read calls. The reason I mentioned escaping earlier is I wondered what would happen when the submodule starts with a double-quote, or has a newline in the name. Git's normal quoting would include backslash escape sequences, and I wondered if we might be relying on any of these "read" calls to interpret them. But I don't think so, for two reasons. One, because that quoting also puts double-quotes around the name. So plain "read" would not be sufficient to de-quote for us anyway. And two, because these are being fed from "submodule--helper", which does not seem to quote in the first place. So I think your patch is fine there. But it does raise a few concerns. It looks like git-submodule does not cope well with exotic filenames: $ git submodule add /some/repo "$(printf 'sub with\nnewline')" Cloning into '/home/peff/tmp/sub with newline'... done. error: invalid key (newline): submodule.sub with newline.url error: invalid key (newline): submodule.sub with newline.path Failed to register submodule 'sub with newline' I'm not too worried about that. It's a nonsense request, and our config format has no syntactic mechanism to represent that key. So tough luck. But what I am more worried about is: $ git submodule--helper list 16 576053ed5ad378490974fabe97e4bd59633d2d1e 0 sub with newline That's obviously nonsense that git-submodule.sh is going to choke on. But what happens when the filename is: foo\n16000 0\t../../escaped or something. Can a malicious repository provoke git-submodule.sh to look at or modify files outside the repository? -Peff
Re: [PATCH v6 0/3] read-cache: speed up add_index_entry
On Fri, Apr 07, 2017 at 02:27:24PM -0400, Jeff Hostetler wrote: > > Just thinking about this algorithmically for a moment. You're saving the > > binary search when the input is given in sorted order. But in other > > cases you're adding an extra strcmp() before the binary search begins. > > So it's a tradeoff. > > > > How often is the input sorted? You save O(log n) strcmps for a "hit" > > with your patch, and one for a "miss". So it's a net win if we expect at > > least 1/log(n) of additions to be sorted (I'm talking about individual > > calls, but it should scale linearly either way over a set of n calls). > > > > I have no clue if that's a reasonable assumption or not. > > I was seeing checkout call merge_working_tree to iterate over the > source index/trees and call add_index_entry() for each. For example, > in a "checkout -b" like operation where both sides are the same, this > calls keep_entry() which appends the entry to the new index array. > The append path should always be taken because the iteration is being > driven from a sorted list. > > I would think calls to add/stage individual files arrive in random > order, so I'm not suggesting replacing the code -- just checking the > end first. Right, what I was wondering is how much this costs in those random-order cases. We _know_ it speeds up the cases you care about, but I want to make sure that it is not making some other case worse. How often do the random-order cases come up, and how much are they slowed? I suspect in practice that calls here fall into one of two camps: feeding a small-ish (compared to the total number of entries) set of paths, or feeding _every_ path. And if you are feeding every path, you are likely to do so in sorted order, rather than a random jumble. So it helps in the big cases, and the small cases are presumably small enough that we don't care much. At least that seems like a plausible line of reasoning to me. ;) -Peff
Re: [PATCH v7 2/3] p0004-read-tree: perf test to time read-tree
On Fri, Apr 07, 2017 at 09:20:46PM +, g...@jeffhostetler.com wrote: > diff --git a/t/perf/p0004-read-tree.sh b/t/perf/p0004-read-tree.sh > new file mode 100755 > index 000..a70e969 I think p0004 is taken by your lazy-init-name-hash script already (which, btw, I think is missing its executable bit). > +## usage: dir depth width files > +fill_index() { > + awk -v arg_dir=$1 -v arg_depth=$2 -v arg_width=$3 -v arg_files=$4 ' > + function make_paths(dir, depth, width, files, f, w) { > + for (f = 1; f <= files; f++) { > + print dir "/file" f > + } > + if (depth > 0) { > + for (w = 1; w <= width; w++) { > + make_paths(dir "/dir" w, depth - 1, > width, files) > + } > + } > + } > + END { make_paths(arg_dir, arg_depth, arg_width, arg_files) } > +' + sed "s/^/100644 $EMPTY_BLOB /" | > + git update-index --index-info > + return 0 > +} I saw some discussion earlier on testing synthetic versus real repositories. The original point of the perf test suite was to find performance regressions between versions. So periodically you'd run: cd t/perf ./run v2.10.0 HEAD and make sure that nothing got inexplicably slower. And for that, using real repositories is nice, because it's showing real user-impacting performance changes, not synthetic benchmarks. In an ideal world, people run it against their own real repositories and can report back to the list when they see a problem. So running the whole suite against your monstrous Windows repo would be a nice benchmark to do periodically (I shudder to think how long it might take to run, though). Of course, perf scripts are also a nice way to show off your improvements. And synthetic repos can often exaggerate the improvement (which is sometimes good to see changes, but also sometimes bad if real-world repos don't show the improvement). And they also serve as a reproduction case for people who don't necessarily have access to the extreme repo that motivated the test in the first place. But one side effect of writing the perf test the way you have it here is that you _can't_ easily run it against a real repo. I think the perf suite could provide better tools for this. You can already say "run this test against repo X" with GIT_PERF_REPO. But there's no way to say "create a synthetic repo with parameters X,Y,Z, and run against that". IOW, I think rather than having the perf-scripts themselves create the synthetic repo, we should have a _library_ of synthetic repos that the test-runners can choose from. I'm imagining a world where your repo setup goes into t/perf/repos/many-files.sh (which could even take your depth, width, and files parameters to allow experimenting), and then you could run "GIT_PERF_REPO=many-files ./p0004-read-tree.sh". > +br_base=xxx_base_xxx > +br_work1=xxx_work1_xxx > +br_work2=xxx_work2_xxx > +br_work3=xxx_work3_xxx FWIW, I really dislike the extra level of indirection here. You still have to consistently say $br_base everywhere. Why not just call the branch "br_base" in the first place? My experience has been that debugging tests is much easier when as little state is carried in the environment as possible. Because it's quite often reasonable to do: ./t1234-whatever.sh -v -i cd trash* git cmd-that-failed where you can pick the final line out from the failed test output. When the command involves $br_base, I have to dig into the script to find out what's in that variable. I know that perf tests need less debugging than the regular regression tests, but I'd hate to see this pattern get applied liberally. -Peff
Re: [PATCH v7 1/3] read-cache: add strcmp_offset function
On Fri, Apr 07, 2017 at 09:20:45PM +, g...@jeffhostetler.com wrote: > diff --git a/t/helper/test-strcmp-offset.c b/t/helper/test-strcmp-offset.c > new file mode 100644 > index 000..03e1eef > --- /dev/null > +++ b/t/helper/test-strcmp-offset.c > @@ -0,0 +1,64 @@ > +#include "cache.h" > + > +struct test_data { > + const char *s1; > + const char *s2; > + size_t expected_first_change; /* or strlen() when equal */ > +}; > + > +static struct test_data data[] = { > + { "abc", "abc", 3 }, > + { "abc", "def", 0 }, > + > + { "abc", "abz", 2 }, > + > + { "abc", "abcdef", 3 }, > + > + { "abc\xF0zzz", "abc\xFFzzz", 3 }, > + > + { NULL, NULL, 0 } > +}; I've been thinking a bit on the comments on earlier rounds regarding C tests. I think in the early days, we generally had plumbing that exposed the C interfaces in a pretty transparent way. I.e., it was reasonable to unit-test update-index, because you pretty much know how input to it will map to the code and data structures used. These days we have more C infrastructure, and it's sometimes hard to tickle the exact inputs to those modules through plumbing commands. So I don't really object to adding module-specific helpers that make it easy to unit test these underlying modules. I'm not sure how I feel about sticking test data in the helpers, though. A higher level language like shell is actually pretty good for passing data around. Passing in the input makes it much easier to prod a helper with new test cases, see its output, run it in a debugger for a specific case, etc. It also integrates better with our test suite. For instance, here: > + if (r_tst_sign != r_exp_sign) { > + error("FAIL: '%s' vs '%s', result expect %d, observed %d\n", > + sa, sb, r_exp_sign, r_tst_sign); > + failed = 1; > + } > + > + if (offset != expected_first_change) { > + error("FAIL: '%s' vs '%s', offset expect %lu, observed %lu\n", > + sa, sb, expected_first_change, offset); > + failed = 1; > + } > + > + return failed; > +} You're essentially rebuilding a test harness just for this one function, and the regular test harness only knows "did anything fail", and nothing about specific tests. Perhaps something like: t/helper/test-strcmp-offset.c | 66 +++ t/t0065-strcmp-offset.sh | 17 +++-- 2 files changed, 26 insertions(+), 57 deletions(-) diff --git a/t/helper/test-strcmp-offset.c b/t/helper/test-strcmp-offset.c index 03e1eef8a..1fdf4d137 100644 --- a/t/helper/test-strcmp-offset.c +++ b/t/helper/test-strcmp-offset.c @@ -1,64 +1,22 @@ #include "cache.h" -struct test_data { - const char *s1; - const char *s2; - size_t expected_first_change; /* or strlen() when equal */ -}; - -static struct test_data data[] = { - { "abc", "abc", 3 }, - { "abc", "def", 0 }, - - { "abc", "abz", 2 }, - - { "abc", "abcdef", 3 }, - - { "abc\xF0zzz", "abc\xFFzzz", 3 }, - - { NULL, NULL, 0 } -}; - -int try_pair(const char *sa, const char *sb, size_t expected_first_change) +int cmd_main(int argc, const char **argv) { - int failed = 0; - int r_exp, r_tst, r_exp_sign, r_tst_sign; + int result; size_t offset; + if (!argv[1] || !argv[2]) + die("usage: %s ", argv[0]); + + result = strcmp_offset(argv[1], argv[2], ); + /* * Because differnt CRTs behave differently, only rely on signs * of the result values. */ - r_exp = strcmp(sa, sb); - r_exp_sign = ((r_exp < 0) ? -1 : ((r_exp == 0) ? 0 : 1)); - - r_tst = strcmp_offset(sa, sb, ); - r_tst_sign = ((r_tst < 0) ? -1 : ((r_tst == 0) ? 0 : 1)); - - if (r_tst_sign != r_exp_sign) { - error("FAIL: '%s' vs '%s', result expect %d, observed %d\n", - sa, sb, r_exp_sign, r_tst_sign); - failed = 1; - } - - if (offset != expected_first_change) { - error("FAIL: '%s' vs '%s', offset expect %lu, observed %lu\n", - sa, sb, expected_first_change, offset); - failed = 1; - } - - return failed; -} - -int cmd_main(int argc, const char **argv) -{ - int failed = 0; - int k; - - for (k=0; data[k].s1; k++) { - failed += try_pair(data[k].s1, data[k].s2, data[k].expected_first_change); - failed += try_pair(data[k].s2, data[k].s1, data[k].expected_first_change); - } - - return failed; + result = result < 0 ? -1 : +result > 0 ? 1 : +0; + printf("%d %"PRIuMAX"\n", result, (uintmax_t)offset); + return 0; } diff --git a/t/t0065-strcmp-offset.sh b/t/t0065-strcmp-offset.sh index 0176c8c92..8c167d24b 100755 --- a/t/t0065-strcmp-offset.sh +++ b/t/t0065-strcmp-offset.sh @@ -4,8 +4,19 @@
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
W dniu 08.04.2017 o 11:29, Jeff King pisze: [...] > Perhaps it would be enough to reset the markers whenever the ref is > pushed. I haven't thought it through well enough to know whether that > just hits more corner cases. I wonder if using a separate remote for pushing (with separate remote- -tracking branches) would be a good solution for this problem... -- Jakub Narębski
Re: [PATCH v7 3/3] read-cache: speed up add_index_entry during checkout
On Fri, Apr 07, 2017 at 09:20:47PM +, g...@jeffhostetler.com wrote: > This helps performance on very large repositories. > > > Before and after numbers on index with 1M files > ./p0004-read-tree.sh > 0004.2: read-tree work1 (1003037) 3.21(2.54+0.62) > 0004.3: switch base work1 (3038 1003037) 7.49(5.39+1.84) > 0004.5: switch work1 work2 (1003037) 11.91(8.38+3.00) > 0004.6: switch commit aliases (1003037)12.22(8.30+3.06) > > ./p0004-read-tree.sh > 0004.2: read-tree work1 (1003040) 2.40(1.65+0.73) > 0004.3: switch base work1 (3041 1003040) 6.07(4.12+1.66) > 0004.5: switch work1 work2 (1003040) 10.23(6.76+2.92) > 0004.6: switch commit aliases (1003040)10.53(6.97+2.83) > By the way, you may want to try: $ cd t/perf $ ./run HEAD^ HEAD p0004-read-tree.sh which gives you the before/after in a nice table, with percentage changes: Test HEAD^ HEAD --- 0004.2: read-tree work1 (1003065) 2.34(1.90+0.42) 1.91(1.51+0.38) -18.4% 0004.3: switch base work1 (3066 1003065) 5.12(4.14+0.96) 4.45(3.55+0.88) -13.1% 0004.5: switch work1 work2 (1003065) 8.55(6.63+1.87) 7.78(5.76+2.00) -9.0% 0004.6: switch commit aliases (1003065)8.59(6.75+1.80) 7.64(5.92+1.70) -11.1% The results are stored for each tested version, so you can re-run just a single test and then re-output the results with "./aggregate.perl HEAD^ HEAD p0004-read-tree.sh". The "run" script obviously builds each version behind the scenes, so you probably also want to set GIT_PERF_MAKE_OPTS as appropriate (at the very least "-j16" makes it more pleasant). -Peff
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 08, 2017 at 01:25:43AM -0700, Jacob Keller wrote: > On Fri, Apr 7, 2017 at 7:15 PM, Matt McCutchenwrote: > > When I'm rewriting history, "git push --force-with-lease" is a nice > > safeguard compared to "git push --force", but it still assumes the > > remote-tracking ref gives the old state the user wants to overwrite. > > Tools that do an implicit fetch, assuming it to be a safe operation, > > may break this assumption. In the worst case, Visual Studio Code does > > an automatic fetch every 3 minutes by default [1], making > > --force-with-lease pretty much reduce to --force. > > > > Isn't the point of force-with-lease to actually record a "commit" id, > and not pass it a branch name, but actually the sha1 you intend the > remote server to be at? Sure if you happen to pass it a branch or > remote name it will interpret it for yuou, but you should be able to > do something like > > current=$(git rev-parse origin/branch) > > git push --force-with-lease=$current > > and this will work regardless of when if if you fetch in between? That's definitely the _best way to do it (modulo using "branch:$current" in the final command). I think Matt's point is just that the default, to use origin/branch, is unsafe. It's convenient when you don't have extra fetches, but that convenience may not be worth the potential surprise. -Peff
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 08, 2017 at 09:35:04AM +0200, Ævar Arnfjörð Bjarmason wrote: > Is it correct that you'd essentially want something that works like: > > git push --force-with-lease=master:master origin master:master I don't think that would do anything useful. It would reject any push where the remote "master" is not the same as your own master. And of course if they _are_ the same, then the push is a noop. > I haven't used this feature but I'm surprised it works the way it > does, as you point out just having your remote refs updated isn't a > strong signal for wanting to clobber whatever that ref points to. The point of the --force-with-lease feature is that you would mark a point in time where you started some rewind-y operation (like a rebase), and at the end you would want to make sure nobody had moved the remote ref when you push over it (which needs to be a force because of the rewind). So the best way to use it is something like: git fetch ;# update 'master' from remote git tag base master;# mark our base point git rebase -i master ;# rewrite some commits git push --force-with-lease=master:base master:master That final operation will fail if somebody else pushed in the meantime. But obviously this workflow is a pain, because you have to manually mark the start of the unsafe operation with a tag. If you haven't fetched in the meantime, then origin/master is a good approximation of "base". But if you have fetched, then it is worthless. It would be nice if we could automatically deduce the real value of base. I don't think we could do it in a foolproof way, but I wonder if we could come close in some common circumstances. For instance, imagine that unsafe operations like rebase would note that "master" has an upstream of "origin/master", and would record a note saying "we took a lease for origin/master at sha1 X". One trouble with that is that you may perform several unsafe operations. For example, imagine it takes you multiple rebases to achieve your final state: git fetch git rebase -i master git rebase -i master git push --force-with-lease=master and that --force-with-lease now defaults to whatever lease-marker is left by rebase. Which marker should it respect? If the second one, then it's unsafe. But if the first one, then how do we deal with stale markers? Perhaps it would be enough to reset the markers whenever the ref is pushed. I haven't thought it through well enough to know whether that just hits more corner cases. -Peff
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Fri, Apr 7, 2017 at 7:15 PM, Matt McCutchenwrote: > When I'm rewriting history, "git push --force-with-lease" is a nice > safeguard compared to "git push --force", but it still assumes the > remote-tracking ref gives the old state the user wants to overwrite. > Tools that do an implicit fetch, assuming it to be a safe operation, > may break this assumption. In the worst case, Visual Studio Code does > an automatic fetch every 3 minutes by default [1], making > --force-with-lease pretty much reduce to --force. > Isn't the point of force-with-lease to actually record a "commit" id, and not pass it a branch name, but actually the sha1 you intend the remote server to be at? Sure if you happen to pass it a branch or remote name it will interpret it for yuou, but you should be able to do something like current=$(git rev-parse origin/branch) git push --force-with-lease=$current and this will work regardless of when if if you fetch in between? Thanks, Jake
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sat, Apr 8, 2017 at 4:15 AM, Matt McCutchenwrote: > When I'm rewriting history, "git push --force-with-lease" is a nice > safeguard compared to "git push --force", but it still assumes the > remote-tracking ref gives the old state the user wants to overwrite. > Tools that do an implicit fetch, assuming it to be a safe operation, > may break this assumption. In the worst case, Visual Studio Code does > an automatic fetch every 3 minutes by default [1], making > --force-with-lease pretty much reduce to --force. > > For a safer workflow, "git push" would check against a separate "old" > ref that isn't updated by "git fetch", but is updated by "git push" the > same way the remote-tracking ref is and maybe also by commands that > update the local branch to take into account remote changes (I'm not > sure what reasonable scenarios there are, if any). Has there been any > work on this? I can write a wrapper script for the simple case of "git > push" of a single branch to the same branch on a remote, which is > pretty much all I use right now, but a native implementation would > support all of the command-line usage forms, which would be nice. > > Thanks for reading! > > [1] > https://github.com/Microsoft/vscode/blob/535a3de60023c81d75d0eac22044284f07dbcddf/extensions/git/src/autofetch.ts Is it correct that you'd essentially want something that works like: git push --force-with-lease=master:master origin master:master As opposed to the current: git push --force-with-lease=master:origin/master origin master:master Which is how the default: git push --force-with-lease Works now, assuming you're on the master branch with correct tracking info. I haven't used this feature but I'm surprised it works the way it does, as you point out just having your remote refs updated isn't a strong signal for wanting to clobber whatever that ref points to. Junio implemented this in v1.8.3.2-736-g28f5d17611 & noted in that commit that the current semantics may not be a sensible default. I think looking at the local ref instead of the remote tracking ref would be a better default.
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Matt McCutchenwrote: > When I'm rewriting history, "git push --force-with-lease" is a nice > safeguard compared to "git push --force", but it still assumes the > remote-tracking ref gives the old state the user wants to overwrite. > Tools that do an implicit fetch, assuming it to be a safe operation, > may break this assumption. In the worst case, Visual Studio Code does > an automatic fetch every 3 minutes by default [1], making > --force-with-lease pretty much reduce to --force. That's a big problem, but even without such tools, I find --force-with-lease without an argument to be pretty limited in usefulness. I like to type "git fetch" myself regularly, just to see what's new upstream before integrating it; this already breaks it. But even avoiding "git fetch" doesn't help if you are working on more than one branch at a time, because doing "git pull" on one branch will do an implicit "git fetch" on the other. I like the idea of git maintaining a separate "last integrated" commit for each branch, I think this could solve it in a nice way. I'm probably not qualified enough to work on this myself though, but I'm happy to give input if someone else wants to. -- Stefan Haller Berlin, Germany http://www.haller-berlin.de/