[PATCH 1/4] submodule update: add regression test with old style setups
As f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07) was produced shortly before a release, nobody asked for a regression test to be included. Add a regression test that makes sure that the invocation of `git submodule update` on old setups doesn't produce errors as pointed out in f178c13fda. The place to add such a regression test may look odd in t7412, but that is the best place as there we setup old style submodule setups explicitly. Signed-off-by: Stefan Beller --- t/t7412-submodule-absorbgitdirs.sh | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/t/t7412-submodule-absorbgitdirs.sh b/t/t7412-submodule-absorbgitdirs.sh index ce74c12da2..1cfa150768 100755 --- a/t/t7412-submodule-absorbgitdirs.sh +++ b/t/t7412-submodule-absorbgitdirs.sh @@ -75,7 +75,12 @@ test_expect_success 're-setup nested submodule' ' GIT_WORK_TREE=../../../nested git -C sub1/.git/modules/nested config \ core.worktree "../../../nested" && # make sure this re-setup is correct - git status --ignore-submodules=none + git status --ignore-submodules=none && + + # also make sure this old setup does not regress + git submodule update --init --recursive >out 2>err && + test_must_be_empty out && + test_must_be_empty err ' test_expect_success 'absorb the git dir in a nested submodule' ' -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH 4/4] submodule deinit: unset core.worktree
This re-introduces 984cd77ddb (submodule deinit: unset core.worktree, 2018-06-18), which was reverted as part of f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07) The whole series was reverted as the offending commit e98317508c (submodule: ensure core.worktree is set after update, 2018-06-18) was relied on by other commits such as 984cd77ddb. Keep the offending commit reverted, but its functionality came back via 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17), such that we can reintroduce 984cd77ddb now. Signed-off-by: Stefan Beller --- builtin/submodule--helper.c | 2 ++ t/lib-submodule-update.sh | 2 +- t/t7400-submodule-basic.sh | 5 + 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 31ac30cf2f..672b74db89 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1131,6 +1131,8 @@ static void deinit_submodule(const char *path, const char *prefix, if (!(flags & OPT_QUIET)) printf(format, displaypath); + submodule_unset_core_worktree(sub); + strbuf_release(_rm); } diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh index 51d449..5b56b23166 100755 --- a/t/lib-submodule-update.sh +++ b/t/lib-submodule-update.sh @@ -235,7 +235,7 @@ reset_work_tree_to_interested () { then mkdir -p submodule_update/.git/modules/sub1/modules && cp -r submodule_update_repo/.git/modules/sub1/modules/sub2 submodule_update/.git/modules/sub1/modules/sub2 - GIT_WORK_TREE=. git -C submodule_update/.git/modules/sub1/modules/sub2 config --unset core.worktree + # core.worktree is unset for sub2 as it is not checked out fi && # indicate we are interested in the submodule: git -C submodule_update config submodule.sub1.url "bogus" && diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh index 76a7cb0af7..aba2d4d6ee 100755 --- a/t/t7400-submodule-basic.sh +++ b/t/t7400-submodule-basic.sh @@ -984,6 +984,11 @@ test_expect_success 'submodule deinit should remove the whole submodule section rmdir init ' +test_expect_success 'submodule deinit should unset core.worktree' ' + test_path_is_file .git/modules/example/config && + test_must_fail git config -f .git/modules/example/config core.worktree +' + test_expect_success 'submodule deinit from subdirectory' ' git submodule update --init && git config submodule.example.foo bar && -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH 0/4]
A couple days before the 2.19 release we had a bug report about broken submodules[1] and reverted[2] the commits leading up to them. The behavior of said bug fixed itself by taking a different approach[3], specifically by a weaker enforcement of having `core.worktree` set in a submodule [4]. The revert [2] was overly broad as we neared the release, such that we wanted to rather keep the known buggy behavior of always having `core.worktree` set, rather than figuring out how to fix the new bug of having 'git submodule update' not working in old style repository setups. This series re-introduces those reverted patches, with no changes in code, but with drastically changed commit messages, as those focus on why it is safe to re-introduce them instead of explaining the desire for the change. [1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight [2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07) [3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17) [4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13) Stefan Beller (4): submodule update: add regression test with old style setups submodule: unset core.worktree if no working tree is present submodule--helper: fix BUG message in ensure_core_worktree submodule deinit: unset core.worktree builtin/submodule--helper.c| 4 +++- submodule.c| 14 ++ submodule.h| 2 ++ t/lib-submodule-update.sh | 5 +++-- t/t7400-submodule-basic.sh | 5 + t/t7412-submodule-absorbgitdirs.sh | 7 ++- 6 files changed, 33 insertions(+), 4 deletions(-) -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH 2/4] submodule: unset core.worktree if no working tree is present
This reintroduces 4fa4f90ccd (submodule: unset core.worktree if no working tree is present, 2018-06-12), which was reverted as part of f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07). 4fa4f90ccd was reverted as its followup commit was faulty, but without the accompanying change of the followup, we'd have an incomplete workflow of setting `core.worktree` again, when it is needed such as checking out a revision that contains a submodule. So re-introduce that commit as-is, focusing on fixing up the followup Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano Signed-off-by: Stefan Beller --- submodule.c | 14 ++ submodule.h | 2 ++ t/lib-submodule-update.sh | 3 ++- 3 files changed, 18 insertions(+), 1 deletion(-) diff --git a/submodule.c b/submodule.c index 6415cc5580..d393e947e6 100644 --- a/submodule.c +++ b/submodule.c @@ -1561,6 +1561,18 @@ int bad_to_remove_submodule(const char *path, unsigned flags) return ret; } +void submodule_unset_core_worktree(const struct submodule *sub) +{ + char *config_path = xstrfmt("%s/modules/%s/config", + get_git_common_dir(), sub->name); + + if (git_config_set_in_file_gently(config_path, "core.worktree", NULL)) + warning(_("Could not unset core.worktree setting in submodule '%s'"), + sub->path); + + free(config_path); +} + static const char *get_super_prefix_or_empty(void) { const char *s = get_super_prefix(); @@ -1726,6 +1738,8 @@ int submodule_move_head(const char *path, if (is_empty_dir(path)) rmdir_or_warn(path); + + submodule_unset_core_worktree(sub); } } out: diff --git a/submodule.h b/submodule.h index a680214c01..9e18e9b807 100644 --- a/submodule.h +++ b/submodule.h @@ -131,6 +131,8 @@ int submodule_move_head(const char *path, const char *new_head, unsigned flags); +void submodule_unset_core_worktree(const struct submodule *sub); + /* * Prepare the "env_array" parameter of a "struct child_process" for executing * a submodule by clearing any repo-specific environment variables, but diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh index 016391723c..51d449 100755 --- a/t/lib-submodule-update.sh +++ b/t/lib-submodule-update.sh @@ -709,7 +709,8 @@ test_submodule_recursing_with_args_common() { git branch -t remove_sub1 origin/remove_sub1 && $command remove_sub1 && test_superproject_content origin/remove_sub1 && - ! test -e sub1 + ! test -e sub1 && + test_must_fail git config -f .git/modules/sub1/config core.worktree ) ' # ... absorbing a .git directory along the way. -- 2.20.0.rc2.403.gdbc3b29805-goog
[PATCH 3/4] submodule--helper: fix BUG message in ensure_core_worktree
Shortly after f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07), we had another series that implemented partially the same, ensuring that core.worktree was set in a checked out submodule, namely 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13) As the series 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17) has different goals than the reverted series 7e25437d35 (Merge branch 'sb/submodule-core-worktree', 2018-07-18), I'd wanted to replay the series on top of it to reach the goal of having `core.worktree` correctly set when the submodules worktree is present, and unset when the worktree is not present. The replay resulted in a strange merge conflict highlighting that the BUG message was not changed in 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13). Fix the error message. Signed-off-by: Stefan Beller --- builtin/submodule--helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index d38113a31a..31ac30cf2f 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -2045,7 +2045,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix) struct repository subrepo; if (argc != 2) - BUG("submodule--helper connect-gitdir-workingtree "); + BUG("submodule--helper ensure-core-worktree "); path = argv[1]; -- 2.20.0.rc2.403.gdbc3b29805-goog
Re: [PATCH] mailmap: update brandon williams's email address
On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder wrote: > > Brandon Williams wrote: > > > Signed-off-by: Brandon Williams > > --- > > .mailmap | 1 + > > 1 file changed, 1 insertion(+) > > I can confirm that this is indeed the same person. What would be more of interest is why we'd be interested in this patch as there is no commit/patch sent by Brandon with this email in gits history. Is that so you get cc'd on your private address and can follow things you worked on without being subscribed to the mailing list? (I'd be interested to see the use case in the commit message;) Thanks, Stefan
Re: [wishlist] git submodule update --reset-hard
On Thu, Dec 6, 2018 at 5:23 PM Yaroslav Halchenko wrote: > > There was a proposal to "re-attach HEAD" in the submodule, i.e. > > if the branch branch points at the same commit, we don't need > > a detached HEAD, but could go with the branch instead. > > if I got the idea right, if we are talking about any branch, it > would also non-deterministic since who knows what left over branch(es) > point to that commit. Not sure if I would have used that ;) I would think we'd rather want to have it deterministic, i.e. something like 1) record branch name of the submodule 2) update submodules HEAD to to superprojects gitlink 3) if recorded branch (1) matches the sha1 of detached HEAD, have HEAD point to the branch instead. You notice a small inefficiency here as we write HEAD twice, so it could be reworded as: 1) compare superprojects gitlink with the submodules branch 2a) if equal, set submodules HEAD to branch 2b) if unequal set HEAD to gitlink value, resulting in detached HEAD Note that this idea of reattaching reflects the idea (a) below. > > a) "stay on submodule branch (i.e. HEAD still points at $branch), and > > reset --hard" such that the submodule has a clean index and at that $branch > > or > > b) "stay on submodule branch (i.e. HEAD still points at $branch), but > > $branch is > >set to the gitlink from the superproject, and then a reset --hard > >will have the worktree set to it as well. > NB "gitlink" -- just now discovered the thing for me. Thought it would be > called a subproject echoing what git diff/log -p shows for submodule > commits. The terminology is messy: The internal representation in Gits object model is a "gitlink" entry in a tree object. Once we have a .gitmodules entry, we call it submodule. The term 'subproject' is a historic artifact and will likely not be changed in the diff output (or format-patch), because these diffs can be applied using git-am for example. That makes the diff output effectively a transport protocol, and changing protocols is hard if you have no versioning in them. More in https://git-scm.com/docs/gitsubmodules (a rather recent new write of a man page, going into concepts). > > > right -- I meant the local changes and indeed reset --recurse-submodules > > > indeed seems to recurse nicely. Then the undesired effect remaining only > > > the detached HEAD > > > For that we may want to revive discussions in > > https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/ > > well, isn't that one requires a branch to be specified in .gitmodules? Ah good point. > > git reset --hard --recursive=hard,keep-branch PREVIOUSPOINT > > 'keep-branch' (given aforementioned keeping the specified in .gitmodules > branch) might be confusing. Also what if a submodule already in a > detached HEAD? IMHO --recursive=hard, and just saying that it would do > "reset --hard", is imho sufficient. (that is why I like pure > --reset hard since it doesn't care and neither does anything to the > branch) For that we might want to first do the git submodule update --reset-hard which runs reset --hard inside the submodule, no matter which branch the submodule is on (if any) and resets to the given superproject sha1. See git-submodule.sh in git.git[1] in cmd_update. We'd need to add a command line flag (`--reset-hard` would be the obvious choice?) which would set the `update` variable, which then is evaluated to what needs to be done in the submodule, which in that case would be the hard reset. https://github.com/git/git/blob/master/git-submodule.sh#L606 Once that is done we'd want to add a test case, presumably in t/t7406-submodule-update.sh > > > I would have asked for > > > >git revert --recursive ... > > >git rebase --recursive [-i] ... > > > > which I also frequently desire (could elaborate on the use cases etc). > > > These would be nice to have. It would be nice if you'd elaborate on the > > use cases for future reference in the mailing list archive. :-) > > ok, will try to do so ;-) In summary: they are just a logical extension > of git support for submodules for anyone actively working with > submodules to keep entire tree in sync. Then quite often the need for > reverting a specific commit (which also has changes reflected in > submodules) arises. The same with rebase, especially to trim away some > no longer desired changes reflected in submodules. > > the initial "git submodule update --reset-hard" is pretty much a > crude workaround for some of those cases, so I would just go earlier in > the history, and redo some things, whenever I could just drop or revert > some selected set of commits. That makes sense. Do you want to give the implementation a try for the --reset-hard switch? > ah... so it is only submodule command which has --recursive, and the > rest have --recurse-submodules when talking about recursing into > submodules? I don't think we were that cautious in development as it was done by different
Re: git, monorepos, and access control
On Thu, Dec 6, 2018 at 12:09 PM Johannes Schindelin wrote: > > Hi, > > On Wed, 5 Dec 2018, Jeff King wrote: > > > The model that fits more naturally with how Git is implemented would be > > to use submodules. There you leak the hash of the commit from the > > private submodule, but that's probably obscure enough (and if you're > > really worried, you can add a random nonce to the commit messages in the > > submodule to make their hashes unguessable). > > I hear myself frequently saying: "Friends don't let friends use > submodules". It's almost like: "Some people think their problem is solved > by using submodules. Only now they have two problems." Blaming tools for their lack of evolution/development is not necessarily the right approach. I recall having patches rejected on this very mailing list that fixed obvious but minor good things like whitespaces and coding style, because it *might* produce merge conflicts. Would that situation warrant me to blame the lacks in the merge algorithm, or could you imagine a better way out? (No need to answer, it's purely to demonstrate that blaming tooling is not always the right approach; only sometimes it may be) > There are big reasons, after all, why some companies go for monorepos: it > is not for lack of trying to go with submodules, it is the problems that > were incurred by trying to treat entire repositories the same as single > files (or even trees): they are just too different. We could change that in more places. One example you might think of is the output of git-status that displays changed files. And in case of submodules it would just show "submodule changes", which we already differentiate into "dirty tree" and "different sha1 at HEAD". Instead we could have the output of all changed files recursively in the superprojects git-status output. Another example is the diff machinery, which already knows some basics such as embedding submodule logs or actual diffs. > In a previous life, I also tried to go for submodules, was burned, and had > to restart the whole thing. We ended up with something that might work in > this instance, too, although our use case was not need-to-know type of > encapsulation. What we went for was straight up modularization. So this is a "Fix the data instead of the tool", which seems to be a local optimization (i.e. you only have to do it once, such that it is cheaper to do than fixing the tool for that workgroup) ... and because everyone does that the tool never gets fixed. > What I mean is that we split the project up into over 100 individual > projects that are now all maintained in individual repositories, and they > are connected completely outside of Git, via a dependency management > system (in this case, Maven, although that is probably too Java-centric > for AMD's needs). Once you have the dependency management system in place, you will encounter the rare case of still wanting to change things across repository boundaries at the same time. Submodules offer that, which is why Android wants to migrate off of the repo tool, and there it seems natural to go for submodules. > I just wanted to throw that out here: if you can split up your project > into individual projects, it might make sense not to maintain them as > submodules but instead as individual repositories whose artifacts are > uploaded into a central, versioned artifact store (Maven, NuGet, etc). And > those artifacts would then be retrieved by the projects that need them. This is cool and industry standard. But once you happen to run in a bug that involves 2 new artifacts (but each of the new artifacts work fine on their own), then you'd wish for something like "git-bisect but across repositories". Submodules (in theory) allow for fine grained bisection across these repository boundaries, I would think. > I figure that that scheme might work for you better than submodules: I > could imagine that you need to make the build artifacts available even to > people who are not permitted to look at the corresponding source code, > anyway. This is a sensible suggestion, as they probably don't want to ramp up development on submodules. :-)
Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
On Tue, Dec 4, 2018 at 7:10 PM Junio C Hamano wrote: > > Stefan Beller writes: > > > This is a resend of sb/submodule-recursive-fetch-gets-the-tip, > > with all feedback addressed. As it took some time, I'll send it > > without range-diff, but would ask for full review. > > Is that a "resend" or reroll/update (or whatever word that does not > imply "just sending the same thing again")? As you noticed, it is an actual update. I started to use resend as DScho seems very unhappy about the word reroll claiming we'd be the only Software community that uses the term reroll for an iteration of a change. I see how resend could sound like retransmission without change. > child_process_init(cp); > - cp->dir = strbuf_detach(_path, NULL); > - prepare_submodule_repo_env(>env_array); > + cp->dir = xstrdup(repo->worktree); > + prepare_submodule_repo_env(>env_array); > > Hmph, I offhand do not see there would be any difference if you > assigned to cp->dir before or after preparing the repo env, but is > there a reason these two must be done in this updated order that I > am missing? Very similar changes appear multiple times in this > range-diff. Jonathan Tan asked for it to be "diff friendly". This -of course- is range-diff unfriendly. > [...] you seem to be OK with a lot of the changes, I did not find an actionable suggestion. Thanks for still queuing topics during -rc time, Stefan
Re: [wishlist] git submodule update --reset-hard
On Thu, Dec 6, 2018 at 1:25 PM Yaroslav Halchenko wrote: > > > On Thu, 06 Dec 2018, Stefan Beller wrote: > > > On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko > > wrote: > > > > Dear Git Gurus, > > > > I wondered what would be your take on my wishlist request to add > > > --reset-hard option, which would be very similar to regular "update" which > > > checks out necessary commit, but I want it to remain in the branch. > > > What if the branch differs from the sha1 recorded in the superproject? > > git reset --hard itself is an operation which should be done with some > level of competence in doing "the right thing" by calling it. You > can hop branches even in current (without any submodules in question) > repository with it and cause as much chaos as you desire. Right. git reset --hard would the branch (as well as the working tree) to the given sha1, which is confusing as submodules get involved. The Right Thing as of now is the sha1 as found in the superprojects gitlink. But as that can be different from any branch in the submodule, we'd rather detach the HEAD to make it deterministic. There was a proposal to "re-attach HEAD" in the submodule, i.e. if the branch branch points at the same commit, we don't need a detached HEAD, but could go with the branch instead. > If desired though, a number of prevention mechanisms could be in place (but > would require option(s) to overcome) to allow submodule to be reset --hard'ed > only when some conditions met (e.g. only to the commit which is among parent > commits path of the current branch). This way wild hops would be prevented, > although you might still end up in some feature branch. But since "reset > --hard" itself doesn't have any safe-guards, I do not really think they should > be implemented here either. So are you looking for a) "stay on submodule branch (i.e. HEAD still points at $branch), and reset --hard" such that the submodule has a clean index and at that $branch or b) "stay on submodule branch (i.e. HEAD still points at $branch), but $branch is set to the gitlink from the superproject, and then a reset --hard will have the worktree set to it as well. (a) is what the referenced submodule.repoLike option implements. I'd understand the desire for (b) as well, as it is a "real" hard reset on the superproject level, without detaching branches. > > git reset --hard --recurse-submodules HEAD > it does indeed some trick(s) but not all seems to be the ones I desire: > > 1. Seems to migrate submodule's .git directories into the top level > .git/modules Ah yes, that happens too. This will help once you want to git-rm a submodule and checkout states before and after. > > undesirable in the sense of still having local changes (that is what > > the above reset with `--recurse` would fix) or changed the branch > > state? (i.e. is detached but was on a branch before?) > > right -- I meant the local changes and indeed reset --recurse-submodules > indeed seems to recurse nicely. Then the undesired effect remaining only > the detached HEAD For that we may want to revive discussions in https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/ > > > git submodule update --recursive > > > > I would end up in the detached HEADs within submodules. > > > > What I want is to retain current branch they are at (or may be possible > > > "were in"? reflog records might have that information) > > > So something like > > > git submodule foreach --recursive git reset --hard > > > ? > > not quite -- this would just kill all local changes within each submodule, > not > to reset it to the desired state, which wouldn't be specified in such > invocation, and is only known to the repo containing it With this answer it sounds like you'd want (b) from above. > > You may be interested in > > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/ > > which introduces a switch `submodule.repoLike [ = true]`, which > > when set would not detach HEAD in submodules. > > Thanks! looks interesting -- was there more discussion/activity beyond those 5 > posts in the thread? Unfortunately there was not. > This feature might indeed come handy but if I got it right, it is somewhat > complimentary to just having submodule update --reset-hard . E.g. submodules > might be in different branches (if I am not tracking based on branch names), > so > I would not want a recursive checkout with -b|-B. But we would indeed benefit > from such functionality, since this difficulty of managing branches of > submodules I think would be elevated with it! (e.g. in one
[PATCH] fetch: ensure submodule objects fetched
Currently when git-fetch is asked to recurse into submodules, it dispatches a plain "git-fetch -C " (with some submodule related options such as prefix and recusing strategy, but) without any information of the remote or the tip that should be fetched. But this default fetch is not sufficient, as a newly fetched commit in the superproject could point to a commit in the submodule that is not in the default refspec. This is common in workflows like Gerrit's. When fetching a Gerrit change under review (from refs/changes/??), the commits in that change likely point to submodule commits that have not been merged to a branch yet. Fetch a submodule object by id if the object that the superproject points to, cannot be found. For now this object is fetched from the 'origin' remote as we defer getting the default remote to a later patch. A list of new submodule commits are already generated in certain conditions (by check_for_new_submodule_commits()); this new feature invokes that function in more situations. The submodule checks were done only when a ref in the superproject changed, these checks were extended to also be performed when fetching into FETCH_HEAD for completeness, and add a test for that too. Signed-off-by: Stefan Beller --- Thanks Jonathan for the review! So it looks like only the last patch needs some improvements, which is why I'd only resend the last patch here. Also note the test with interious superproject commits. All suggestions sounded sensible, addressing them all, here is a range-diff to the currently queued version: Range-diff: 1: 04eb06607b ! 1: ac6558cbc9 fetch: try fetching submodules if needed objects were not fetched @@ -1,6 +1,6 @@ Author: Stefan Beller -fetch: try fetching submodules if needed objects were not fetched +fetch: ensure submodule objects fetched Currently when git-fetch is asked to recurse into submodules, it dispatches a plain "git-fetch -C " (with some submodule related options @@ -14,22 +14,19 @@ commits in that change likely point to submodule commits that have not been merged to a branch yet. -Try fetching a submodule by object id if the object id that the -superproject points to, cannot be found. +Fetch a submodule object by id if the object that the superproject +points to, cannot be found. For now this object is fetched from the +'origin' remote as we defer getting the default remote to a later patch. -builtin/fetch used to only inspect submodules when they were fetched -"on-demand", as in either on/off case it was clear whether the submodule -needs to be fetched. However to know whether we need to try fetching the -object ids, we need to identify the object names, which is done in this -function check_for_new_submodule_commits(), so we'll also run that code -in case the submodule recursion is set to "on". +A list of new submodule commits are already generated in certain +conditions (by check_for_new_submodule_commits()); this new feature +invokes that function in more situations. The submodule checks were done only when a ref in the superproject changed, these checks were extended to also be performed when fetching into FETCH_HEAD for completeness, and add a test for that too. Signed-off-by: Stefan Beller -Signed-off-by: Junio C Hamano diff --git a/builtin/fetch.c b/builtin/fetch.c --- a/builtin/fetch.c @@ -82,7 +79,7 @@ struct string_list changed_submodule_names; + -+ /* The submodules to fetch in */ ++ /* Pending fetches by OIDs */ + struct fetch_task **oid_fetch_tasks; + int oid_fetch_tasks_nr, oid_fetch_tasks_alloc; }; @@ -97,13 +94,16 @@ return spf->default_option; } ++/* ++ * Fetch in progress (if callback data) or ++ * pending (if in oid_fetch_tasks in struct submodule_parallel_fetch) ++ */ +struct fetch_task { + struct repository *repo; + const struct submodule *sub; + unsigned free_sub : 1; /* Do we need to free the submodule? */ + -+ /* fetch specific oids if set, otherwise fetch default refspec */ -+ struct oid_array *commits; ++ struct oid_array *commits; /* Ensure these commits are fetched */ +}; + +/** @@ -176,7 +176,6 @@ for (; spf->count < spf->r->index->cache_nr; spf->count++) { - struct strbuf submodule_prefix = STRBUF_INIT; -+ int recurse_config; const struct cache_entry *ce = spf->r->index->cache[spf->count]; const char *default_argv; - const struct submodule *submodule; @@ -199,11 +198,9 @@ + task = fetch_task_create(spf-&
Re: [wishlist] git submodule update --reset-hard
On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko wrote: > > Dear Git Gurus, > > I wondered what would be your take on my wishlist request to add > --reset-hard option, which would be very similar to regular "update" which > checks out necessary commit, but I want it to remain in the branch. What if the branch differs from the sha1 recorded in the superproject? > Rationale: In DataLad we heavily rely on submodules, and we have established > easy ways to do some manipulations across full hierarchies of them. E.g. a > single command could introduce a good number of commits across deep hierarchy > of submodules, e.g. while committing changes within deep submodule, while also > doing all necessary commits in the repositories leading to that submodule so > the entire tree of them stays in a "clean" state. The difficulty comes when > there is a need to just "forget" some changes. The obvious way is to e.g. > >git reset --hard PREVIOUS_STATE git reset --hard --recurse-submodules HEAD would do the trick > in the top level repository. But that leaves all the submodules now in > the undesired state. If I do undesirable in the sense of still having local changes (that is what the above reset with `--recurse` would fix) or changed the branch state? (i.e. is detached but was on a branch before?) > git submodule update --recursive > > I would end up in the detached HEADs within submodules. > > What I want is to retain current branch they are at (or may be possible > "were in"? reflog records might have that information) So something like git submodule foreach --recursive git reset --hard ? You may be interested in https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/ which introduces a switch `submodule.repoLike [ = true]`, which when set would not detach HEAD in submodules. Can you say more about the first question above: Would you typically have situations where the submodule branch is out of sync with the superproject and how do you deal with that? Adding another mode to `git submodule update` sounds reasonable to me, too. Stefan
Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others
On Thu, Dec 6, 2018 at 6:58 AM Phillip Wood wrote: > > So is there some "must be at least two consecutive lines" condition for > > not-plain, or is something else going on here? > > To be considered a block has to have 20 alphanumeric characters - see > commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars", > 2017-08-15). This stops things like random '}' lines being marked as > moved on their own. This is spot on. All but the "plain" mode use the concept of "blocks" of code (there is even one mode called "blocks", which adds to the confusion). > It might be better to use some kind of frequency > information (a bit like python's difflib junk parameter) instead so that > (fairly) unique short lines also get marked properly. Yes that is what I was initially thinking about. However to have good information, you'd need to index a whole lot (the whole repository, i.e. all text blobs in existence?) to get an accurate picture of frequency information, which I'd prefer to call entropy as I come from a background familiar with https://en.wikipedia.org/wiki/Information_theory, I am not sure where 'frequency information' comes from -- it sounds like the same concept. Of course it is too expensive to run an operation O(repository size) just for this diff, so maybe we could get away with some smaller corpus to build up this information on what is sufficient for coloring. When only looking at the given diff, I would imagine that each line would not carry a whole lot of information as its characters occur rather frequently compared to the rest of the diff. Best, Stefan
Re: [ANNOUNCE] Git v2.20.0-rc2
-cc linux list > Perhaps we should note this more prominently, and since Brandon isn't at > Google anymore can some of you working there edit this post? It's the > first Google result for "git protocol v2", so it's going to be quite > confusing for people if after 2.20 the instructions in it no longer > work. Thanks for pointing this out, we missed the implications when that patch was discussed. Looking into it. > > 1. > https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html
Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
On Tue, Dec 4, 2018 at 2:42 PM Jonathan Tan wrote: > > When fetching into a repository, a connectivity check is first made by > check_exist_and_connected() in builtin/fetch.c that runs: > > git rev-list --objects --stdin --not --all --quiet <(list of objects) > > If the client repository has many refs, this command can be slow, > regardless of the nature of the server repository or what is being > fetched. A profiler reveals that most of the time is spent in > setup_revisions() (approx. 60/63), and of the time spent in > setup_revisions(), most of it is spent in parse_object() (approx. > 49/60). This is because setup_revisions() parses the target of every ref > (from "--all"), and parse_object() reads the buffer of the object. > > Reading the buffer is unnecessary if the repository has a commit graph > and if the ref points to a commit (which is typically the case). This > patch uses the commit graph wherever possible; on my computer, when I > run the above command with a list of 1 object on a many-ref repository, > I get a speedup from 1.8s to 1.0s. > > Another way to accomplish this effect would be to modify parse_object() > to use the commit graph if possible; however, I did not want to change > parse_object()'s current behavior of always checking the object > signature of the returned object. > > Signed-off-by: Jonathan Tan > --- > This is on sb/more-repo-in-api because I'm using the repo_parse_commit() > function. This is a mere nicety, not strictly required. Before we had parse_commit(struct commit *) which would accomplish the same, (and we'd still have that afterwards as a #define falling back onto the_repository). As the function get_reference() is not the_repository safe as it contains a call to is_promisor_object() that is repository agnostic, I think it would be fair game to not depend on that series. I am not complaining, though. > A colleague noticed this issue when handling a mirror clone. > > Looking at the bigger picture, the speed of the connectivity check > during a fetch might be further improved by passing only the negotiation > tips (obtained through --negotiation-tip) instead of "--all". This patch > just handles the low-hanging fruit first. > --- > revision.c | 15 ++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/revision.c b/revision.c > index b5108b75ab..e7da2c57ab 100644 > --- a/revision.c > +++ b/revision.c > @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info > *revs, const char *name, > { > struct object *object; > > - object = parse_object(revs->repo, oid); > + /* > +* If the repository has commit graphs, repo_parse_commit() avoids > +* reading the object buffer, so use it whenever possible. > +*/ > + if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) { > + struct commit *c = lookup_commit(revs->repo, oid); > + if (!repo_parse_commit(revs->repo, c)) > + object = (struct object *) c; > + else > + object = NULL; Would it make sense in this case to rely on parse_object below instead of assigning NULL? The reason for that would be that when lookup_commit returns NULL, we would try more broadly. AFAICT oid_object_info doesn't take advantage of the commit graph, but just looks up the object header, which is still less than completely parsing it. Then lookup_commit is overly strict, as it may return NULL as when there still is a type mismatch (I don't think a mismatch could happen here, as both rely on just the object store, and not the commit graph.), so this would be just defensive programming for the sake of it. I dunno. struct commit *c; if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT && (c = lookup_commit(revs->repo, oid)) && !repo_parse_commit(revs->repo, c)) object = (struct object *) c; else object = parse_object(revs->repo, oid); So with all that said, I still think this is a good patch. Thanks, Stefan > + } else { > + object = parse_object(revs->repo, oid); > + } > + > if (!object) { > if (revs->ignore_missing) > return object; > -- > 2.19.0.271.gfe8321ec05.dirty >
Re: [WIP RFC 5/5] upload-pack: send part of packfile response as uri
On Mon, Dec 3, 2018 at 3:38 PM Jonathan Tan wrote: > > This is a partial implementation of upload-pack sending part of its > packfile response as URIs. It does so by implementing a new flag `--exclude-configured-blobs` in pack-objects, which would change the output of pack-objects to output a list of URLs (of the excluded blobs) followed by the pack to be asked for. This design seems easy to implement as then upload-pack can just parse the output and only needs to insert "packfile" and "packfile-uris\n" at the appropriate places of the stream, otherwise it just passes through the information obtained from pack-objects. The design as-is would make for hard documentation of pack-objects (its output is not just a pack anymore when that flag is given, but a highly optimized byte stream). Initially I did not anticipate this to be one of the major design problems as I assumed we'd want to use this pack feature over broadly (e.g. eventually by offloading most of the objects into a base pack that is just always included as the likelihood for any object in there is very high on initial clone), but it makes total sense to only output the URIs that we actually need. An alternative that comes very close to the current situation would be to either pass a file path or file descriptor (that upload-pack listens to in parallel) to pack-objects as an argument of the new flag. Then we would not need to splice the protocol sections into the single output stream, but we could announce the sections, then flush the URIs and then flush the pack afterwards. I looked at this quickly, but that would need either extensions in run-command.c for setting up the new fd for us, as there we already have OS specific code for these setups, or we'd have to duplicate some of the logic here, which doesn't enthuse me either. So maybe we'd create a temp file via mkstemp and pass the file name to pack-objects for writing the URIs and then we'd just need to stream that file? > + # NEEDSWORK: "git clone" fails here because it ignores the URI > provided > + # instead of fetching it. > + test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" \ > + git -c protocol.version=2 clone \ > + "$HTTPD_URL/smart/http_parent" http_child 2>err && > + # Although "git clone" fails, we can still check that the server > + # provided the URI we requested and that the error message pinpoints > + # the object that is missing. > + grep "clone< uri https://example.com/a-uri; log && > + test_i18ngrep "did not receive expected object $(cat h)" err That is a nice test!
Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out
On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan wrote: > > Subsequent patches will change how the output of pack-objects is > processed, so extract that processing into its own function. > > Currently, at most 1 character can be buffered (in the "buffered" local > variable). One of those patches will require a larger buffer, so replace > that "buffered" local variable with a buffer array. This buffering sounds oddly similar to the pkt reader which can buffer at most one pkt, the difference being that we'd buffer bytes instead of pkts.
Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc
Thanks for bringing this design to the list! > diff --git a/Documentation/technical/protocol-v2.txt > b/Documentation/technical/protocol-v2.txt > index 345c00e08c..2cb1c41742 100644 > --- a/Documentation/technical/protocol-v2.txt > +++ b/Documentation/technical/protocol-v2.txt > @@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is > sent. > > output = acknowledgements flush-pkt | > [acknowledgments delim-pkt] [shallow-info delim-pkt] > -[wanted-refs delim-pkt] packfile flush-pkt > +[wanted-refs delim-pkt] [packfile-uris delim-pkt] > +packfile flush-pkt While this is an RFC and incomplete, we'd need to remember to add packfile-uris to the capabilities list above, stating that it requires thin-pack and ofs-delta to be sent, and what to expect from it. The mention of --no-packfile-urls in the Client design above seems to imply we'd want to turn it on by default, which I thought was not the usual stance how we introduce new things. An odd way of disabling it would be --no-thin-pack, hoping the client side implementation abides by the implied requirements. > acknowledgments = PKT-LINE("acknowledgments" LF) > (nak | *ack) > @@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is > sent. > *PKT-LINE(wanted-ref LF) > wanted-ref = obj-id SP refname > > +packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri > +packfile-uri = PKT-LINE("uri" SP *%x20-ff LF) Is the *%x20-ff a fancy way of saying obj-id? While the server is configured with pairs of (oid URL), we would not need to send the exact oid to the client as that is what the client can figure out on its own by reading the downloaded pack. Instead we could send an integrity hash (i.e. the packfile downloaded from "uri" is expected to hash to $oid here) Thanks, Stefan
Re: [WIP RFC 0/5] Design for offloading part of packfile response to CDN
On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan wrote: > > There is a potential issue: a server which produces both the URIs and > the packfile at roughly the same time (like the implementation in this > patch set) will not have sideband access until it has concluded sending > the URIs. Among other things, this means that the server cannot send > keepalive packets until quite late in the response. One solution to this > might be to add a feature that allows the server to use a sideband > throughout the whole response - and this has other benefits too like > allowing servers to inform the client throughout the whole fetch, not > just at the end. While side band sounds like the right thing to do, we could also sending (NULL)-URIs within this feature.
Re: [PATCH] pack-protocol.txt: accept error packets in any context
> diff --git a/pkt-line.c b/pkt-line.c > index 04d10bbd0..ce9e42d10 100644 > --- a/pkt-line.c > +++ b/pkt-line.c > @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, > char **src_buffer, > return PACKET_READ_EOF; > } > > + if (starts_with(buffer, "ERR ")) { > + die(_("remote error: %s"), buffer + 4); > + } > + Handling any ERR line in the pkt reader is okay, as * we do not pkt-ize pack files and * we do not have any other parts of the protocol where user data is in the first four bytes, which could randomly match this pattern and * the rest of the pkt-ized part of the protocol never sends ERR lines. Makes sense.
Re: [PATCH] sideband: color lines with keyword only
On Mon, Dec 3, 2018 at 3:23 PM Jonathan Nieder wrote: > I was curious about what versions of Gerrit this is designed to > support (or in other words whether it's a bug fix or a feature). > Looking at examples like [1], it seems that Gerrit historically always > used "ERROR:" so the 59a255aef0 logic would work for it. More > recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful > change updates, 2018-08-21) put SUCCESS on a line of its own. That > puts this squarely in the new-feature category. Ooops. From the internal bug, I assumed this to be long standing Gerrit behavior, which is why I sent it out in -rc to begin with. > > --- a/sideband.c > > +++ b/sideband.c > > @@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, > > const char *src, int n) > > struct keyword_entry *p = keywords + i; > > int len = strlen(p->keyword); > > > > - if (n <= len) > > + if (n < len) > > continue; > > In the old code, we would escape early if 'n == len', but we didn't > need to. If 'n == len', then > > src[len] == '\0' src[len] could also be one of "\n\r", see the caller recv_sideband for sidebase case 2. > src .. [len-1] is a valid buffer to read from > > so the strncasecmp and strbuf_add operations used in this function are > valid. Good. Yes, they are all valid... > > - if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) > > { > > + if (!strncasecmp(p->keyword, src, len) && > > + (len == n || !isalnum(src[len]))) { > > Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] == > GIT_CNTRL) so this part of the patch is unnecessary. That said, it's > good for clarity and defensive programming. ... but here we need to check for src[len] for validity. I made no assumptions about isalnum, but rather needed to shortcut the condition, as accessing src[len] would be out of bounds, no? > > > strbuf_addstr(dest, p->color); > > strbuf_add(dest, src, len); unlike here (or the rest of the block), where len is used correctly.
[PATCH] sideband: color lines with keyword only
When bf1a11f0a1 (sideband: highlight keywords in remote sideband output, 2018-08-07) was introduced, it was carefully considered which strings would be highlighted. However 59a255aef0 (sideband: do not read beyond the end of input, 2018-08-18) brought in a regression that the original did not test for. A line containing only the keyword and nothing else ("SUCCESS") should still be colored. Signed-off-by: Stefan Beller --- sideband.c | 5 +++-- t/t5409-colorize-remote-messages.sh | 2 ++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/sideband.c b/sideband.c index 368647acf8..7c3d33d3f8 100644 --- a/sideband.c +++ b/sideband.c @@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) struct keyword_entry *p = keywords + i; int len = strlen(p->keyword); - if (n <= len) + if (n < len) continue; /* * Match case insensitively, so we colorize output from existing @@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n) * messages. We only highlight the word precisely, so * "successful" stays uncolored. */ - if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) { + if (!strncasecmp(p->keyword, src, len) && + (len == n || !isalnum(src[len]))) { strbuf_addstr(dest, p->color); strbuf_add(dest, src, len); strbuf_addstr(dest, GIT_COLOR_RESET); diff --git a/t/t5409-colorize-remote-messages.sh b/t/t5409-colorize-remote-messages.sh index f81b6813c0..2a8c449661 100755 --- a/t/t5409-colorize-remote-messages.sh +++ b/t/t5409-colorize-remote-messages.sh @@ -17,6 +17,7 @@ test_expect_success 'setup' ' echo " " "error: leading space" echo "" echo Err + echo SUCCESS exit 0 EOF echo 1 >file && @@ -35,6 +36,7 @@ test_expect_success 'keywords' ' grep "error: error" decoded && grep "hint:" decoded && grep "success:" decoded && + grep "SUCCESS" decoded && grep "warning:" decoded ' -- 2.20.0.rc2.403.gdbc3b29805-goog
Re: [PATCH/RFC v2 0/7] Introduce new commands switch-branch and checkout-files
On Thu, Nov 29, 2018 at 7:33 AM Duy Nguyen wrote: > > On Wed, Nov 28, 2018 at 9:30 PM Stefan Beller wrote: > > > > On Wed, Nov 28, 2018 at 12:09 PM Duy Nguyen wrote: > > > > > > On Wed, Nov 28, 2018 at 9:01 PM Duy Nguyen wrote: > > > > should we do > > > > something about detached HEAD in this switch-branch command (or > > > > whatever its name will be)? > > > > > > > > This is usually a confusing concept to new users > > > > > > And it just occurred to me that perhaps we should call this "unnamed > > > branch" (at least at high UI level) instead of detached HEAD. It is > > > technically not as accurate, but much better to understand. > > > > or 'direct' branch? > > makes me think, what is an indirect branch? I drew the term from HEAD pointing to a branch pointing to a commit, i.e. HEAD indirectly points to a commit, but in 'direct' branch mode, HEAD contains the commit id. So indirect branch would work for our current 'real' branches. When asked out of context of this discussion, I might add yet another layer of abstraction to make an 'indirect branch', i.e. HEAD pointing to a symbolic ref that points at a branch that points to a commit. The term symref is what we currently use (Just looked into gitglossary, where we distinguish symbolic refs from pseudorefs) for hat I would have called an indirect branch as well. So maybe we need to measure the level of indirection ("How often do we need to dereference the ref/object to get a commit oid?") to come to terms in how to describe the use cases easily. Here is a fun-one: git checkout git checkout --detach Currently the --detach option detaches HEAD from branch pointing at the object id, i.e. it is the same as git checkout whereas when we focus on the levels of indirection it would also be reasonable to have git checkout as a reasonable alternative, where is the branch that is pointed at from the .
Re: [PATCH v2 6/7] checkout: split into switch-branch and checkout-files
> > Idea: > > If git checkout-files modifies the submodules file, it could also > > auto-update the submodules. (For example, with something like "git > > submodule update --init --recursive --progress"). > > This one is tricky because we should deal with submodule autoupdate > consistently across all porcelain commands (or at least common ones), > not just checkout-files. I'd prefer to delay this until later. Once we > figure out what to do with other commands, then we can still change > defaults for checkout-files. checkout/reset are respecting the submodule.recurse setting for this already, and as your patches only change the UX frontend git -c submodule.recurse checkout-files would also touch submodules. Given that deep down in the submodules it's all about files again, we could think checkout-files is still a good name. I think Stefan X. is asking for making submodule.recurse to default to true, which is indeed unrelated to this. Stefan
Re: [PATCH 09/10] fetch: try fetching submodules if needed objects were not fetched
On Fri, Oct 26, 2018 at 1:41 PM Jonathan Tan wrote: > > > But this default fetch is not sufficient, as a newly fetched commit in > > the superproject could point to a commit in the submodule that is not > > in the default refspec. This is common in workflows like Gerrit's. > > When fetching a Gerrit change under review (from refs/changes/??), the > > commits in that change likely point to submodule commits that have not > > been merged to a branch yet. > > > > Try fetching a submodule by object id if the object id that the > > superproject points to, cannot be found. > > I see that these suggestions of mine (from [1]) was implemented, but not > others. If you disagree, that's fine, but I think they should be > discussed. ok. > > > - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) && > > - (recurse_submodules != RECURSE_SUBMODULES_ON)) > > + if (recurse_submodules != RECURSE_SUBMODULES_OFF) > > I think the next patch should be squashed into this patch. Then you can > say that these are now redundant and can be removed. ok. > > > @@ -1218,8 +1218,12 @@ struct submodule_parallel_fetch { > > int result; > > > > struct string_list changed_submodule_names; > > + struct get_next_submodule_task **fetch_specific_oids; > > + int fetch_specific_oids_nr, fetch_specific_oids_alloc; > > }; > > Add documentation for fetch_specific_oids. Also, it might be better to > call these oid_fetch_tasks and the struct, "struct fetch_task". ok. > Here, struct get_next_submodule_task is used for 2 different things: > (1) After the first round fetch, fetch_finish() uses it to determine if > a second round is needed. > (2) In submodule_parallel_fetch.fetch_specific_oids, to tell the > parallel runner (through get_next_submodule_task()) to do this > fetch. > > I think that it's better to have 2 different structs for these 2 > different uses. (Note that task_cb can be NULL for the second round. > Unless we plan to recheck the OIDs? Currently we recheck them, but we > don't do anything either way.) I think it is easier to only have one struct until we have substantially more to communicate. (1) is a boolean answer, for which (non-)NULL is sufficient. > I think that this is best described as the submodule that has no entry > in .gitmodules? Maybe call it "get_non_gitmodules_submodule" and > document it with a similar comment as in get_submodule_repo_for(). done. > > > + > > +static struct get_next_submodule_task *get_next_submodule_task_create( > > + struct repository *r, const char *path) > > +{ > > + struct get_next_submodule_task *task = xmalloc(sizeof(*task)); > > + memset(task, 0, sizeof(*task)); > > + > > + task->sub = submodule_from_path(r, _oid, path); > > + if (!task->sub) { > > + task->sub = get_default_submodule(path); > > + task->free_sub = 1; > > + } > > + > > + return task; > > +} > > Clearer to me would be to make get_next_submodule_task_create() return > NULL if submodule_from_path() returns NULL. I doubled down on this one and return NULL when get_default_submodule (now renamed to get_non_gitmodules_submodule) returns NULL, as then we can move the free() from get_next_submodule here and there we'll just have task = fetch_task_create(spf->r, ce->name); if (!task) continue; which helps get_next_submodule to stay readable. > Same comment about "on-demand" as in my previous e-mail. I'd want to push back on refactoring and defer that to a later series. > Break lines to 80. [...] > Same comment about "s--h" as in my previous e-mail. done > > + /* Are there commits that do not exist? */ > > + if (commits->nr) { > > + /* We already tried fetching them, do not try again. */ > > + if (task->commits) > > + return 0; > > Same comment about "task->commits" as in my previous e-mail. Good call. I reordered the function read easier and added a comment on any early return how it could happen. > > > diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh > > index 6c2f9b2ba2..5a75b57852 100755 > > One more thing to test is the case where a submodule doesn't have a > .gitmodules entry. added a test. I just resent the series. Stefan
[PATCH 4/9] submodule.c: tighten scope of changed_submodule_names struct
The `changed_submodule_names` are only used for fetching, so let's make it part of the struct that is passed around for fetching submodules. Signed-off-by: Stefan Beller --- submodule.c | 19 +++ 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/submodule.c b/submodule.c index 3c388f85cc..f93f0aff82 100644 --- a/submodule.c +++ b/submodule.c @@ -25,7 +25,6 @@ #include "commit-reach.h" static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF; -static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP; static int initialized_fetch_ref_tips; static struct oid_array ref_tips_before_fetch; static struct oid_array ref_tips_after_fetch; @@ -1136,7 +1135,8 @@ void check_for_new_submodule_commits(struct object_id *oid) oid_array_append(_tips_after_fetch, oid); } -static void calculate_changed_submodule_paths(struct repository *r) +static void calculate_changed_submodule_paths(struct repository *r, + struct string_list *changed_submodule_names) { struct argv_array argv = ARGV_ARRAY_INIT; struct string_list changed_submodules = STRING_LIST_INIT_DUP; @@ -1174,7 +1174,8 @@ static void calculate_changed_submodule_paths(struct repository *r) continue; if (!submodule_has_commits(r, path, commits)) - string_list_append(_submodule_names, name->string); + string_list_append(changed_submodule_names, + name->string); } free_submodules_oids(_submodules); @@ -1221,8 +1222,10 @@ struct submodule_parallel_fetch { int default_option; int quiet; int result; + + struct string_list changed_submodule_names; }; -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0} +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP } static int get_fetch_recurse_config(const struct submodule *submodule, struct submodule_parallel_fetch *spf) @@ -1284,7 +1287,7 @@ static int get_next_submodule(struct child_process *cp, case RECURSE_SUBMODULES_ON_DEMAND: if (!submodule || !string_list_lookup( - _submodule_names, + >changed_submodule_names, submodule->name)) continue; default_argv = "on-demand"; @@ -1376,8 +1379,8 @@ int fetch_populated_submodules(struct repository *r, argv_array_push(, "--recurse-submodules-default"); /* default value, "--submodule-prefix" and its value are added later */ - calculate_changed_submodule_paths(r); - string_list_sort(_submodule_names); + calculate_changed_submodule_paths(r, _submodule_names); + string_list_sort(_submodule_names); run_processes_parallel(max_parallel_jobs, get_next_submodule, fetch_start_failure, @@ -1386,7 +1389,7 @@ int fetch_populated_submodules(struct repository *r, argv_array_clear(); out: - string_list_clear(_submodule_names, 1); + string_list_clear(_submodule_names, 1); return spf.result; } -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 9/9] fetch: try fetching submodules if needed objects were not fetched
Currently when git-fetch is asked to recurse into submodules, it dispatches a plain "git-fetch -C " (with some submodule related options such as prefix and recusing strategy, but) without any information of the remote or the tip that should be fetched. But this default fetch is not sufficient, as a newly fetched commit in the superproject could point to a commit in the submodule that is not in the default refspec. This is common in workflows like Gerrit's. When fetching a Gerrit change under review (from refs/changes/??), the commits in that change likely point to submodule commits that have not been merged to a branch yet. Try fetching a submodule by object id if the object id that the superproject points to, cannot be found. builtin/fetch used to only inspect submodules when they were fetched "on-demand", as in either on/off case it was clear whether the submodule needs to be fetched. However to know whether we need to try fetching the object ids, we need to identify the object names, which is done in this function check_for_new_submodule_commits(), so we'll also run that code in case the submodule recursion is set to "on". The submodule checks were done only when a ref in the superproject changed, these checks were extended to also be performed when fetching into FETCH_HEAD for completeness, and add a test for that too. Signed-off-by: Stefan Beller --- builtin/fetch.c | 11 +- submodule.c | 206 +++- t/t5526-fetch-submodules.sh | 86 +++ 3 files changed, 265 insertions(+), 38 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index e0140327aa..91f9b7d9c8 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -763,9 +763,6 @@ static int update_local_ref(struct ref *ref, what = _("[new ref]"); } - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) && - (recurse_submodules != RECURSE_SUBMODULES_ON)) - check_for_new_submodule_commits(>new_oid); r = s_update_ref(msg, ref, 0); format_display(display, r ? '!' : '*', what, r ? _("unable to update local ref") : NULL, @@ -779,9 +776,6 @@ static int update_local_ref(struct ref *ref, strbuf_add_unique_abbrev(, >object.oid, DEFAULT_ABBREV); strbuf_addstr(, ".."); strbuf_add_unique_abbrev(, >new_oid, DEFAULT_ABBREV); - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) && - (recurse_submodules != RECURSE_SUBMODULES_ON)) - check_for_new_submodule_commits(>new_oid); r = s_update_ref("fast-forward", ref, 1); format_display(display, r ? '!' : ' ', quickref.buf, r ? _("unable to update local ref") : NULL, @@ -794,9 +788,6 @@ static int update_local_ref(struct ref *ref, strbuf_add_unique_abbrev(, >object.oid, DEFAULT_ABBREV); strbuf_addstr(, "..."); strbuf_add_unique_abbrev(, >new_oid, DEFAULT_ABBREV); - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) && - (recurse_submodules != RECURSE_SUBMODULES_ON)) - check_for_new_submodule_commits(>new_oid); r = s_update_ref("forced-update", ref, 1); format_display(display, r ? '!' : '+', quickref.buf, r ? _("unable to update local ref") : _("forced update"), @@ -892,6 +883,8 @@ static int store_updated_refs(const char *raw_url, const char *remote_name, ref->force = rm->peer_ref->force; } + if (recurse_submodules != RECURSE_SUBMODULES_OFF) + check_for_new_submodule_commits(>old_oid); if (!strcmp(rm->name, "HEAD")) { kind = ""; diff --git a/submodule.c b/submodule.c index d1b6646f42..1ce944a737 100644 --- a/submodule.c +++ b/submodule.c @@ -1231,8 +1231,14 @@ struct submodule_parallel_fetch { int result; struct string_list changed_submodule_names; + + /* The submodules to fetch in */ + struct fetch_task **oid_fetch_tasks; + int oid_fetch_tasks_nr, oid_fetch_tasks_alloc; }; -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, STRING_LIST_INIT_DUP } +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \ + STRING_LIST_INIT_DUP, \ + NULL, 0, 0} static int get_fetch_recurse_config(const struct submodule *submodule, struct submodule_parallel_fetch *spf) @@ -1259,6 +1265,73
[PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip
This is a resend of sb/submodule-recursive-fetch-gets-the-tip, with all feedback addressed. As it took some time, I'll send it without range-diff, but would ask for full review. I plan on resending after the next release as this got delayed quite a bit, which is why I also rebased it to master. Thanks, Stefan Previous round: https://public-inbox.org/git/20181016181327.107186-1-sbel...@google.com/ Stefan Beller (9): sha1-array: provide oid_array_filter submodule.c: fix indentation submodule.c: sort changed_submodule_names before searching it submodule.c: tighten scope of changed_submodule_names struct submodule: store OIDs in changed_submodule_names repository: repo_submodule_init to take a submodule struct submodule: migrate get_next_submodule to use repository structs submodule.c: fetch in submodules git directory instead of in worktree fetch: try fetching submodules if needed objects were not fetched Documentation/technical/api-oid-array.txt| 5 + builtin/fetch.c | 11 +- builtin/grep.c | 17 +- builtin/ls-files.c | 12 +- builtin/submodule--helper.c | 2 +- repository.c | 27 +- repository.h | 12 +- sha1-array.c | 17 ++ sha1-array.h | 3 + submodule.c | 284 --- t/helper/test-submodule-nested-repo-config.c | 8 +- t/t5526-fetch-submodules.sh | 86 ++ 12 files changed, 395 insertions(+), 89 deletions(-) -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 5/9] submodule: store OIDs in changed_submodule_names
'calculate_changed_submodule_paths' uses a local list to compute the changed submodules, and then produces the result by copying appropriate items into the result list. Instead use the result list directly and prune items afterwards using string_list_remove_empty_items. By doing so we'll have access to the util pointer for longer that contains the commits that we need to fetch, which will be useful in a later patch. Signed-off-by: Stefan Beller Reviewed-by: Jonathan Tan --- submodule.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/submodule.c b/submodule.c index f93f0aff82..0c81aca6f2 100644 --- a/submodule.c +++ b/submodule.c @@ -1139,8 +1139,7 @@ static void calculate_changed_submodule_paths(struct repository *r, struct string_list *changed_submodule_names) { struct argv_array argv = ARGV_ARRAY_INIT; - struct string_list changed_submodules = STRING_LIST_INIT_DUP; - const struct string_list_item *name; + struct string_list_item *name; /* No need to check if there are no submodules configured */ if (!submodule_from_path(r, NULL, NULL)) @@ -1157,9 +1156,9 @@ static void calculate_changed_submodule_paths(struct repository *r, * Collect all submodules (whether checked out or not) for which new * commits have been recorded upstream in "changed_submodule_names". */ - collect_changed_submodules(r, _submodules, ); + collect_changed_submodules(r, changed_submodule_names, ); - for_each_string_list_item(name, _submodules) { + for_each_string_list_item(name, changed_submodule_names) { struct oid_array *commits = name->util; const struct submodule *submodule; const char *path = NULL; @@ -1173,12 +1172,14 @@ static void calculate_changed_submodule_paths(struct repository *r, if (!path) continue; - if (!submodule_has_commits(r, path, commits)) - string_list_append(changed_submodule_names, - name->string); + if (submodule_has_commits(r, path, commits)) { + oid_array_clear(commits); + *name->string = '\0'; + } } - free_submodules_oids(_submodules); + string_list_remove_empty_items(changed_submodule_names, 1); + argv_array_clear(); oid_array_clear(_tips_before_fetch); oid_array_clear(_tips_after_fetch); @@ -1389,7 +1390,7 @@ int fetch_populated_submodules(struct repository *r, argv_array_clear(); out: - string_list_clear(_submodule_names, 1); + free_submodules_oids(_submodule_names); return spf.result; } -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 3/9] submodule.c: sort changed_submodule_names before searching it
We can string_list_insert() to maintain sorted-ness of the list as we find new items, or we can string_list_append() to build an unsorted list and sort it at the end just once. As we do not rely on the sortedness while building the list, we pick the "append and sort at the end" as it has better worst case execution times. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- submodule.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/submodule.c b/submodule.c index bc48ea3b68..3c388f85cc 100644 --- a/submodule.c +++ b/submodule.c @@ -1283,7 +1283,7 @@ static int get_next_submodule(struct child_process *cp, case RECURSE_SUBMODULES_DEFAULT: case RECURSE_SUBMODULES_ON_DEMAND: if (!submodule || - !unsorted_string_list_lookup( + !string_list_lookup( _submodule_names, submodule->name)) continue; @@ -1377,6 +1377,7 @@ int fetch_populated_submodules(struct repository *r, /* default value, "--submodule-prefix" and its value are added later */ calculate_changed_submodule_paths(r); + string_list_sort(_submodule_names); run_processes_parallel(max_parallel_jobs, get_next_submodule, fetch_start_failure, -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 7/9] submodule: migrate get_next_submodule to use repository structs
We used to recurse into submodules, even if they were broken having only an objects directory. The child process executed in the submodule would fail though if the submodule was broken. This is tested via "fetching submodule into a broken repository" in t5526. This patch tightens the check upfront, such that we do not need to spawn a child process to find out if the submodule is broken. Signed-off-by: Stefan Beller --- submodule.c | 56 + 1 file changed, 44 insertions(+), 12 deletions(-) diff --git a/submodule.c b/submodule.c index 0c81aca6f2..77ace5e784 100644 --- a/submodule.c +++ b/submodule.c @@ -1253,6 +1253,30 @@ static int get_fetch_recurse_config(const struct submodule *submodule, return spf->default_option; } +static struct repository *get_submodule_repo_for(struct repository *r, +const struct submodule *sub) +{ + struct repository *ret = xmalloc(sizeof(*ret)); + + if (repo_submodule_init(ret, r, sub)) { + /* +* No entry in .gitmodules? Technically not a submodule, +* but historically we supported repositories that happen to be +* in-place where a gitlink is. Keep supporting them. +*/ + struct strbuf gitdir = STRBUF_INIT; + strbuf_repo_worktree_path(, r, "%s/.git", sub->path); + if (repo_init(ret, gitdir.buf, NULL)) { + strbuf_release(); + free(ret); + return NULL; + } + strbuf_release(); + } + + return ret; +} + static int get_next_submodule(struct child_process *cp, struct strbuf *err, void *data, void **task_cb) { @@ -1260,12 +1284,11 @@ static int get_next_submodule(struct child_process *cp, struct submodule_parallel_fetch *spf = data; for (; spf->count < spf->r->index->cache_nr; spf->count++) { - struct strbuf submodule_path = STRBUF_INIT; - struct strbuf submodule_git_dir = STRBUF_INIT; struct strbuf submodule_prefix = STRBUF_INIT; const struct cache_entry *ce = spf->r->index->cache[spf->count]; - const char *git_dir, *default_argv; + const char *default_argv; const struct submodule *submodule; + struct repository *repo; struct submodule default_submodule = SUBMODULE_INIT; if (!S_ISGITLINK(ce->ce_mode)) @@ -1300,15 +1323,11 @@ static int get_next_submodule(struct child_process *cp, continue; } - strbuf_repo_worktree_path(_path, spf->r, "%s", ce->name); - strbuf_addf(_git_dir, "%s/.git", submodule_path.buf); strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name); - git_dir = read_gitfile(submodule_git_dir.buf); - if (!git_dir) - git_dir = submodule_git_dir.buf; - if (is_directory(git_dir)) { + repo = get_submodule_repo_for(spf->r, submodule); + if (repo) { child_process_init(cp); - cp->dir = strbuf_detach(_path, NULL); + cp->dir = xstrdup(repo->worktree); prepare_submodule_repo_env(>env_array); cp->git_cmd = 1; if (!spf->quiet) @@ -1319,10 +1338,23 @@ static int get_next_submodule(struct child_process *cp, argv_array_push(>args, default_argv); argv_array_push(>args, "--submodule-prefix"); argv_array_push(>args, submodule_prefix.buf); + + repo_clear(repo); + free(repo); ret = 1; + } else { + /* +* An empty directory is normal, +* the submodule is not initialized +*/ + if (S_ISGITLINK(ce->ce_mode) && + !is_empty_dir(ce->name)) { + spf->result = 1; + strbuf_addf(err, + _("Could not access submodule '%s'"), + ce->name); + } } - strbuf_release(_path); - strbuf_release(_git_dir); strbuf_release(_prefix); if (ret) { spf->count++; -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 8/9] submodule.c: fetch in submodules git directory instead of in worktree
Keep the properties introduced in 10f5c52656 (submodule: avoid auto-discovery in prepare_submodule_repo_env(), 2016-09-01), by fixating the git directory of the submodule. Signed-off-by: Stefan Beller --- submodule.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/submodule.c b/submodule.c index 77ace5e784..d1b6646f42 100644 --- a/submodule.c +++ b/submodule.c @@ -494,6 +494,12 @@ void prepare_submodule_repo_env(struct argv_array *out) DEFAULT_GIT_DIR_ENVIRONMENT); } +static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out) +{ + prepare_submodule_repo_env_no_git_dir(out); + argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT); +} + /* Helper function to display the submodule header line prior to the full * summary output. If it can locate the submodule objects directory it will * attempt to lookup both the left and right commits and put them into the @@ -1327,8 +1333,8 @@ static int get_next_submodule(struct child_process *cp, repo = get_submodule_repo_for(spf->r, submodule); if (repo) { child_process_init(cp); - cp->dir = xstrdup(repo->worktree); - prepare_submodule_repo_env(>env_array); + cp->dir = xstrdup(repo->gitdir); + prepare_submodule_repo_env_in_gitdir(>env_array); cp->git_cmd = 1; if (!spf->quiet) strbuf_addf(err, "Fetching submodule %s%s\n", -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 6/9] repository: repo_submodule_init to take a submodule struct
When constructing a struct repository for a submodule for some revision of the superproject where the submodule is not contained in the index, it may not be present in the working tree currently either. In that situation giving a 'path' argument is not useful. Upgrade the repo_submodule_init function to take a struct submodule instead. The submodule struct can be obtained via submodule_from_{path, name} or an artificial submodule struct can be passed in. While we are at it, rename the repository struct in the repo_submodule_init function, which is to be initialized, to a name that is not confused with the struct submodule as easily. Perform such renames in similar functions as well. Also move its documentation into the header file. Reviewed-by: Jonathan Tan Signed-off-by: Stefan Beller --- builtin/grep.c | 17 +++- builtin/ls-files.c | 12 + builtin/submodule--helper.c | 2 +- repository.c | 27 repository.h | 12 +++-- t/helper/test-submodule-nested-repo-config.c | 8 +++--- 6 files changed, 43 insertions(+), 35 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index 71df52a333..d6bd887b2d 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -404,7 +404,10 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject, const struct object_id *oid, const char *filename, const char *path) { - struct repository submodule; + struct repository subrepo; + const struct submodule *sub = submodule_from_path(superproject, + _oid, path); + int hit; /* @@ -420,12 +423,12 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject, return 0; } - if (repo_submodule_init(, superproject, path)) { + if (repo_submodule_init(, superproject, sub)) { grep_read_unlock(); return 0; } - repo_read_gitmodules(); + repo_read_gitmodules(); /* * NEEDSWORK: This adds the submodule's object directory to the list of @@ -437,7 +440,7 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject, * store is no longer global and instead is a member of the repository * object. */ - add_to_alternates_memory(submodule.objects->objectdir); + add_to_alternates_memory(subrepo.objects->objectdir); grep_read_unlock(); if (oid) { @@ -462,14 +465,14 @@ static int grep_submodule(struct grep_opt *opt, struct repository *superproject, init_tree_desc(, data, size); hit = grep_tree(opt, pathspec, , , base.len, - object->type == OBJ_COMMIT, ); + object->type == OBJ_COMMIT, ); strbuf_release(); free(data); } else { - hit = grep_cache(opt, , pathspec, 1); + hit = grep_cache(opt, , pathspec, 1); } - repo_clear(); + repo_clear(); return hit; } diff --git a/builtin/ls-files.c b/builtin/ls-files.c index c70a9c7158..583a0e1ca2 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -206,17 +206,19 @@ static void show_files(struct repository *repo, struct dir_struct *dir); static void show_submodule(struct repository *superproject, struct dir_struct *dir, const char *path) { - struct repository submodule; + struct repository subrepo; + const struct submodule *sub = submodule_from_path(superproject, + _oid, path); - if (repo_submodule_init(, superproject, path)) + if (repo_submodule_init(, superproject, sub)) return; - if (repo_read_index() < 0) + if (repo_read_index() < 0) die("index file corrupt"); - show_files(, dir); + show_files(, dir); - repo_clear(); + repo_clear(); } static void show_ce(struct repository *repo, struct dir_struct *dir, diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index d38113a31a..4eceb8f040 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -2053,7 +2053,7 @@ static int ensure_core_worktree(int argc, const char **argv, const char *prefix) if (!sub) BUG("We could get the submodule handle before?"); - if (repo_submodule_init(, the_repository, path)) + if (repo_submodule_init(, the_repository, sub)) die(_("could not get a repository handle for submodule '%s'"), path); if (!repo_config_get_string(, "core.worktree&
[PATCH 1/9] sha1-array: provide oid_array_filter
Helped-by: Junio C Hamano Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- Documentation/technical/api-oid-array.txt | 5 + sha1-array.c | 17 + sha1-array.h | 3 +++ 3 files changed, 25 insertions(+) diff --git a/Documentation/technical/api-oid-array.txt b/Documentation/technical/api-oid-array.txt index 9febfb1d52..c97428c2c3 100644 --- a/Documentation/technical/api-oid-array.txt +++ b/Documentation/technical/api-oid-array.txt @@ -48,6 +48,11 @@ Functions is not sorted, this function has the side effect of sorting it. +`oid_array_filter`:: + Apply the callback function `want` to each entry in the array, + retaining only the entries for which the function returns true. + Preserve the order of the entries that are retained. + Examples diff --git a/sha1-array.c b/sha1-array.c index b94e0ec0f5..d922e94e3f 100644 --- a/sha1-array.c +++ b/sha1-array.c @@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array, } return 0; } + +void oid_array_filter(struct oid_array *array, + for_each_oid_fn want, + void *cb_data) +{ + unsigned nr = array->nr, src, dst; + struct object_id *oids = array->oid; + + for (src = dst = 0; src < nr; src++) { + if (want([src], cb_data)) { + if (src != dst) + oidcpy([dst], [src]); + dst++; + } + } + array->nr = dst; +} diff --git a/sha1-array.h b/sha1-array.h index 232bf95017..55d016c4bf 100644 --- a/sha1-array.h +++ b/sha1-array.h @@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array, int oid_array_for_each_unique(struct oid_array *array, for_each_oid_fn fn, void *data); +void oid_array_filter(struct oid_array *array, + for_each_oid_fn want, + void *cbdata); #endif /* SHA1_ARRAY_H */ -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH 2/9] submodule.c: fix indentation
The submodule subsystem is really bad at staying within 80 characters. Fix it while we are here. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- submodule.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/submodule.c b/submodule.c index 6415cc5580..bc48ea3b68 100644 --- a/submodule.c +++ b/submodule.c @@ -1271,7 +1271,8 @@ static int get_next_submodule(struct child_process *cp, if (!submodule) { const char *name = default_name_or_path(ce->name); if (name) { - default_submodule.path = default_submodule.name = name; + default_submodule.path = name; + default_submodule.name = name; submodule = _submodule; } } @@ -1281,8 +1282,10 @@ static int get_next_submodule(struct child_process *cp, default: case RECURSE_SUBMODULES_DEFAULT: case RECURSE_SUBMODULES_ON_DEMAND: - if (!submodule || !unsorted_string_list_lookup(_submodule_names, -submodule->name)) + if (!submodule || + !unsorted_string_list_lookup( + _submodule_names, + submodule->name)) continue; default_argv = "on-demand"; break; -- 2.20.0.rc1.387.gf8505762e3-goog
Re: [PATCH 3/5] pack-objects: add --sparse option
> +--sparse:: > + Use the "sparse" algorithm to determine which objects to include in > + the pack. This can have significant performance benefits when > computing > + a pack to send a small change. However, it is possible that extra > + objects are added to the pack-file if the included commits contain > + certain types of direct renames. As a user, where do I find a discussion of these walking algorithms? (i.e. how can I tell if I can really expect to gain performance benefits, what are the tradeoffs? "If it were strictly better than the original, it would be on by default" might be a thought of a user)
Re: [PATCH/RFC v2 0/7] Introduce new commands switch-branch and checkout-files
On Wed, Nov 28, 2018 at 12:09 PM Duy Nguyen wrote: > > On Wed, Nov 28, 2018 at 9:01 PM Duy Nguyen wrote: > > should we do > > something about detached HEAD in this switch-branch command (or > > whatever its name will be)? > > > > This is usually a confusing concept to new users > > And it just occurred to me that perhaps we should call this "unnamed > branch" (at least at high UI level) instead of detached HEAD. It is > technically not as accurate, but much better to understand. or 'direct' branch? I mean 'detached HEAD' itself is also not correct as the HEAD points to a valid commit/tag usually, so it is attached to that content. The detachment comes from the implicit "from a branch".
Re: Git pull confusing output
On Tue, Nov 27, 2018 at 10:31 PM Junio C Hamano wrote: > > Will writes: > > > I’m far from being a guru, but I consider myself a competent Git > > user. Yet, here’s my understanding of the output of one the most-used > > commands, `git push`: > >> Counting objects: 6, done. > > No idea what an “object” is. Apparently there’s 6 of them > > here. What does “counting” them means? Should I care? > > You vaguely recall that the last time you pushed you saw ~400 > objects counted there, so you get the feeling how active you were > since then. > > It is up to you if you are interested in such a feel of the level of > activity. "git fetch" (hence "git pull") would also give you a > similar "feel", e.g. "the last fetch was ~1200 objects and today's > is mere ~200, so it seems it is a relatively slow day". > > As to "what is an object?", there are plenty of Git tutorials and > books to learn the basics from. Again, it is up to you if you care. While this is very interesting to the experienced git user, the approximation of activity by object count is very coarse to say at least. As It approximates changes in the DAG object count and nothing about the deltas (which as we learn comes later and it comes with a progress meter in bytes), it only provides the basics. > > I think we have "--quiet" option for those who do not care. I think some users are not focused as much on the version control as much as they are focused on another problem that is solved with the content inside the repo. Which means they only care about 'actionable' output, such as * errors * information provided by remote (e.g. links to click to start a code review) * too long waiting time (so they can abort and inspect the problem) I would suggest we come up with a mode that is "not quiet", but cuts down to only the basic actionable items [and make that the default eventually]. Now these actionable items depend on the workflow used, for which I think an email based maintainers workflow is not the norm. The vast majority of people uses git-push to upload their change to a code review system instead. And for such a workflow the size (as proxied by object/delta count) is not as important as the target ref that you push to or potentially the diffstat output of a potential merge to a target branch. TLDR: I still think making git-push a bit more quiet is beneficial to the user base at large. Stefan
Re: [bug report] git-gui child windows are blank
On Wed, Nov 28, 2018 at 6:13 AM Kenn Sebesta wrote: > > v2.19.2, installed from brew on macOS Mojave 14.2.1. > > `git-gui` is my much beloved go-to tool for everything git. > Unfortunately, on my new Macbook Air it seems to have a bug. When I > first load the program, the parent window populates normally with the > stage/unstaged and diff panes. However, when I click Push, the child > window is completely blank. The frame is there, but there is no > content. > > This same behavior is seen if I do a `git gui blame`, where the > primary blame window opens normally but all the children windows are > empty. > > I have googled but was unsuccessful in finding a solution. Is this a > known issue? Looking through the mailing list archive, this seemed to be one of the last relevant messges https://public-inbox.org/git/20180724065120.7664-1-sunsh...@sunshineco.com/ > > > --Kenn
Re: [PATCH v2 6/7] checkout: split into switch-branch and checkout-files
On Wed, Nov 28, 2018 at 7:31 AM Duy Nguyen wrote: > > On Wed, Nov 28, 2018 at 7:03 AM Junio C Hamano wrote: > > > > Nguyễn Thái Ngọc Duy writes: > > > > > The good old "git checkout" command is still here and will be until > > > all (or most of users) are sick of it. > > > > Two comments on the goal (the implementation looked reasonable > > assuming the reader agrees with the gaol). > > > > At least to me, the verb "switch" needs two things to switch > > between, i.e. "switch A and B", unless it is "switch to X". > > Either "switch-to-branch" or simply "switch-to", perhaps? > > > > As I already hinted in my response to Stefan (?) about > > checkout-from-tree vs checkout-from-index, a command with multiple > > modes of operation is not confusing to people with the right mental > > model, and I suspect that having two separate commands for "checking > > out a branch" and "checking out paths" that is done by this step > > would help users to form the right mental model. > > Since the other one is already "checkout-files", maybe this one could > just be "checkout-branch". I dislike the checkout-* names, as we already have checkout-index as plumbing, so it would be confusing as to which checkout-* command should be used when and why as it seems the co-index moves content *from index* to the working tree, but the co-files moves content *to files*, whereas checkout-branch is neither 'moving' to or from a branch but rather 'switching' to that branch.
Re: [PATCH v2 4/7] checkout: move dwim_new_local_branch to checkout_opts
On Tue, Nov 27, 2018 at 8:53 AM Nguyễn Thái Ngọc Duy wrote: > > Signed-off-by: Nguyễn Thái Ngọc Duy I would not mind to have this squashed into the previous patch but keeping it separated is fine, too. (Reason for squashing: it makes it clearer that we do not care about one specific option, but have to treat all the loose options the same way.)
Re: [PATCH v2 3/7] checkout: move 'confict_style' to checkout_opts
On Tue, Nov 27, 2018 at 8:53 AM Nguyễn Thái Ngọc Duy wrote: > The last patches seemed self explanatory after the first RFC and their commit message. This one is harder to reason about, as --conflict is documented as "The same as --merge option above, but ..." and --merge is "When switching branches, ..." so ... ah my mind wandered off, expecting this to be a preparation for the separate commands already, but this is just about ensuring everything is in opts such that the split can be done later?
Re: [PATCH v2 1/7] parse-options: allow parse_options_concat(NULL, options)
On Tue, Nov 27, 2018 at 8:53 AM Nguyễn Thái Ngọc Duy wrote: > > There is currently no caller that calls this function with "a" being > NULL. But it will be introduced shortly. It is used to construct the > option array from scratch, e.g. > >struct parse_options opts = NULL; >opts = parse_options_concat(opts, opts_1); >opts = parse_options_concat(opts, opts_2); While this addresses the immediate needs, I'd prefer to think about the API exposure of parse_options_concat, (related: do we want to have docs in its header file?) and I'd recommend to make it symmetrical, i.e. allow the second argument to also be NULL? In the example given here, you'd just short it to struct parse_options opts = opts_1; opts = parse_options_concat(opts, opts_2); if not for this patch. Are opts_{1,2} ever be NULL?
Re: Git pull confusing output
On Tue, Nov 27, 2018 at 8:52 AM Will wrote: > And even them, do they need this info every time they push? I agree that we should make the output a bit more user friendly, which means we'd only want to output relevant data for the user. The different phases taking each one line takes up precious screen real estate, so another approach would be delete the line after one phase is finished, such that you'd only see the currently active phase (that can be useful for debugging as in "The phase of 'Writing objects' takes very long" -> slow network connection). > I feel like a less intimidating output would help, while showing info > about objects and deltas with the verbose flag: I agree that most information in pushing is not very useful and could be omitted. This helps in multiple ways: * it keeps the focus on the actually important information, see bf1a11f0a1 (sideband: highlight keywords in remote sideband output, 2018-08-07) * less space in a terminal wasted, such that you can scroll over it better > > Compressing… done After the push succeeded this information would not be useful any more, it is only useful during the compression phase (Does it progress quickly enough? or does it error out?) Slightly related (but applies mostly to fetch, for which this discussion can also be had): When fetching, these informations are generated on the remote side (as the server needs to create the packfile according to your local state that you negotiated with the server), which takes some time. Sending over this information also keeps the connection alive. This is only relevant in corner cases depending on the setup of the hosting provider/repository, but it led to commits such as https://eclipse.googlesource.com/jgit/jgit/+/a38531b21c7e2b0dc956e0ed1bfc9513f604273c in the java implementation of Git. > > Pushing to github.com:williamdclt/some-repo.git… done > > 1ca9aaa..4320d30 master -> master > > > I’d be more than happy to work on this (`git push` is an example > amongst so many other), but want the mailing list’s opinion on it. Am > I wrong in thinking that this output is not something users want, am I > fighting windmills or maybe just being ignorant? I think this would be a useful patch, but it could get complicated quickly: push uses other low level git commands to prepare the packfile to be sent to the server, currently it only needs to pipe through the output of the low level command (or even have the low level command directly write to the terminal). The output of those low level commands should not be changed when run on their own, I would assume. So maybe the best way to dive into understanding what happens under the hood in git-push is to run GIT_TRACE=1 git push ... and see what child processes are invoked (e.g. run_command: git pack-objects --all-progress-implied) and then we'd need to change the output of iff the specific progress flag is given. Stefan
Re: Confusing behavior with ignored submodules and `git commit -a`
On Thu, Nov 15, 2018 at 4:31 PM Michael Forney wrote: > > On 2018-11-15, Stefan Beller wrote: > > On Thu, Nov 15, 2018 at 1:33 PM Michael Forney wrote: > >> Well, currently the submodule config can be disabled in diff_flags by > >> setting override_submodule_config=1. However, I'm thinking it may be > >> simpler to selectively *enable* the submodule config in diff_flags > >> where it is needed instead of disabling it everywhere else (i.e. > >> use_submodule_config instead of override_submodule_config). > > > > This sounds like undoing the good(?) part of the series that introduced > > this regression, as before that we selectively loaded the submodule > > config, which lead to confusion when you forgot it. Selectively *enabling* > > the submodule config sounds like that state before? > > > > Or do we *only* talk about enabling the ignore flag, while loading the > > rest of the submodule config automatic? > > Yes, that is what I meant. I believe the automatic loading of > submodule config is the right thing to do, it just uncovered cases > where the current handling of override_submodule_config is not quite > sufficient. That sounds good. > > My suggestion of replacing override_submodule_config with > use_submodule_config is because it seems like there are fewer places > where we want to apply the ignore config than not. I think it should > only apply in diffs against the working tree and when staging changes > to the index (unless specified explicitly). The documentation just > mentions the "diff family", but all but one of the possible values for > submodule..ignore ("all") don't make sense unless comparing with > the working tree. This is also how show/log -p behaved in git <2.15. > So I think that clarifying that it is about modifications *to the > working tree* would be a good idea. > > >> I'm also starting to see why this is tricky. The only difference that > >> diff.c:run_diff_files sees between `git add inner` and `git add --all` > >> is whether the index entry matched the pathspec exactly or not. > > > > Unrelated to the trickiness, I think we'd need to document the behavior > > of the -a flag in git-add and git-commit better as adding the diff below > > will depart from the "all" rule again, which I thought was a strong > > motivator for Brandons series (IIRC). > > Can you explain what you mean by the "all" rule? -a as short for --all in git commit, and I presumed it to be '--all-content', but documentation tells me it's about files only, so there is no really an "all" rule, but rather me misunderstanding to what it applies.
Re: [RFC] Introduce two new commands, switch-branch and restore-paths
On Mon, Nov 26, 2018 at 8:01 AM Ævar Arnfjörð Bjarmason wrote: > > > On Tue, Nov 20 2018, Duy Nguyen wrote: > > > On Mon, Nov 19, 2018 at 04:19:53PM +0100, Duy Nguyen wrote: > >> I promise to come back with something better (at least it still > >> sounds better in my mind). If that idea does not work out, we can > >> come back and see if we can improve this. > > > > So this is it. The patch isn't pretty, mostly as a proof of > > concept. Just look at the three functions at the bottom of checkout.c, > > which is the main thing. > > > > This patch tries to split "git checkout" command in two new ones: > > > > - git switch-branch is all about switching branches > > Isn't this going to also take the other ref arguments 'git checkout' > takes now? I.e. tags, detached HEADs etc? I'm reminded of the discussion > about what "range-diff" should be called :) Heh, good call. :-) Note that the color of a bikeshed has fewer functional implications than coming up with proper names in your API exposed to millions of users, as cognitive associations playing mind tricks, can have a huge impact on their productivity. ;-) In a neighboring thread there is discussion of the concept of a 'change' (and evolving the change locally), which is yet another thing in the refs-space. 'switch-branch' sounds like a good name for a beginner who is just getting started, but as soon as they discover that there is more than branches (detached HEAD via commits, tags, remote tracking "branches"), this name may be confusing. So it would not be a good choice for the intermediate Git user. The old time power user would not care as they have 'checkout' in their muscle memory, aliased to 'co', but maybe they'd find it nice for explaining to new users. So I'd be thrilled to find a name that serves users on all levels. Maybe we need to step back and consider what the command does. And from that I would name it "rewire-HEAD-and-update-index" and then simplify from there. For the beginner user, the concept of HEAD might be overwhelming, such that we don't want to have that in there. So I'd be tempted to suggest to call it "switch-to-ref", but that would be wrong in the corner case as well: When using that with a remote tracking branch, you don't "switch to it" by putting it into your HEAD, but you merely checkout the commit that it's pointing at. > > > - git restore-paths (maybe restore-file is better) for checking out > > paths "content-to-path", maybe(?) as it moves the content (as given by commit or implicitly assuming the index when omitted) into that path(, again). (I am not enthused about this, as you can similarly argue for content-to-paths, content-to-worktree, which then could split up into "index-to-worktree [pathspec]" as well as "tree-to-worktree ". also the notion of X-to-Y seems a novel concept in our naming, so maybe verb-noun is better, hence restore-path or "fix-paths" may be better) > > Later on, we could start to add a bit more stuff in, e.g. some form of > > disambiguation is no longer needed when running as switch-branch, or > > restore-paths. > > > > So, what do you think? The patch looks interestingly small :-) > That "git checkout" does too many things is something that keeps coming > up in online discussions about Git's UI. Two things: > > a) It would really help to have some comparison of cases where these >split commands are much clearer or less ambiguous than >git-checkout. I can think of some (e.g. branch with the same name as >a file) but having some overall picture of what the new UI looks like >with solved / not solved cases would be nice. Also a comparison with >other SCMs people find less confusing (svn, hg, bzr, ...) How do other SCMs solve this issue? (What is their design space? How many commands do they have for what git-checkout does all-in-one?) > b) I think we really need to have some end-game where we'd actually >switch away from "checkout" (which we could still auto-route to new >commands in perpetuity, but print a warning or error). Otherwise >we'll just end up with https://xkcd.com/927/ and more UI confusion >for all. Heh, that situation is only avoided when the new command has clear advantages over the old, and ISTM that we can only compete on UX and better defaults, so maybe I'd push for making it more logical, maybe so: git tree-to-worktree # git checkout -- git index-to-worktree # git checkout -- git rev-to-ref # git checkout Just food for thought, specifically the last one would be hilarious if we'd end up with it. Stefan
Re: [PATCH 2/5] ieot: default to not writing IEOT section
On Mon, Nov 26, 2018 at 1:48 PM Ben Peart wrote: > > > > On 11/26/2018 2:59 PM, Stefan Beller wrote: > >>> +static int record_ieot(void) > >>> +{ > >>> + int val; > >>> + > >> > >> Initialize stack val to zero to ensure proper default. > > > > I don't think that is needed here, as we only use `val` when > > we first write to it via git_config_get_bool. > > > > Did you spot this via code review and thought of > > defensive programming or is there a tool that > > has a false positive here? > > > > Code review and defensive programming. I had to review the code in > git_config_get_bool() to see if it always initialized the val even if it > didn't find the requested config variable (esp since we don't pass in a > default value for this function like we do others). > Ah, sorry to have sent out this email, which I found as one of the earliest discussions in my mailbox. The later patches/discussions became a lot more heated from my cursory skimming and sorted out this as well. It is interesting to notice that, as I also had to lookup how the config machinery works (once? a couple times?) but now it is so hardcoded in my brain to assume that if functions like git_config_* take the branch, we can access the value that the config function was supposed to read into. Sorry for the noise, Stefan
Re: What's cooking in git.git (Nov 2018, #06; Wed, 21)
> > * sb/submodule-recursive-fetch-gets-the-tip (2018-10-31) 11 commits > [...] > > "git fetch --recurse-submodules" may not fetch the necessary commit > that is bound to the superproject, which is getting corrected. > > Is the discussion on this topic over? What was the outcome? Please don't merge down the topic in the current state. I have a local updated version sitting on my disk and plan to send it once I made sure it addressed all comments of various revisions.
Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
On Fri, Nov 23, 2018 at 3:17 AM Phillip Wood wrote: > > From: Phillip Wood > > Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in > response to those comments - see the range-diff below for details (the > patch numbers are off by one in the range diff, I think because the > first patch is unchanged and so it was used as the merge base by > --range-diff=. `git range-diff` accepts a three dotted "range" OLD...NEW as an easy abbreviation for the arguments "COMMON..OLD COMMON..NEW" and the common element is computed as the last common element. It doesn't have knowledge about where you started your topic branch. > For some reason the range-diff also includes > the notes even though I did not give --notes to format-patch) This is interesting. The existence of notes.rewrite. seems to work well with the range-diff then, as the config would trigger the copy-over of notes and then range-diff would diff the original notes to the new notes. > > When trying out the new --color-moved-ws=allow-indentation-change I > was disappointed to discover it did not work if the indentation > contains a mix of spaces and tabs. This series reworks it so that it > does. > The range-diff looks good to me. Thanks, Stefan
Re: [PATCH v1 1/2] log -G: Ignore binary files
On Wed, Nov 21, 2018 at 1:08 PM Thomas Braun wrote: > > The -G option of log looks for the differences whose patch text > contains added/removed lines that match regex. > > The concept of differences only makes sense for text files, therefore > we need to ignore binary files when searching with -G as well. What about partial text/partial binary files? I recall using text searching tools (not necessarily git machinery, my memory is fuzzy) to check for strings in pdf files, which are usually marked binary in context of git, such that we do not see their diffs in `log -p`. But I would expect a search with -G or -S to still work... until I find the exception in the docs, only to wonder if there is a switch to turn off this optimisation for this corner case. Stefan
Re: [PATCH 2/5] ieot: default to not writing IEOT section
> > +static int record_ieot(void) > > +{ > > + int val; > > + > > Initialize stack val to zero to ensure proper default. I don't think that is needed here, as we only use `val` when we first write to it via git_config_get_bool. Did you spot this via code review and thought of defensive programming or is there a tool that has a false positive here? > > > + if (!git_config_get_bool("index.recordoffsettable", )) > > + return val; > > + return 0; > > +}
Re: [PATCH] grep: use grep_opt->repo instead of explict repo argument
On Sun, Nov 18, 2018 at 8:38 AM Nguyễn Thái Ngọc Duy wrote: > > This command is probably the first one that operates on a repository > other than the_repository, in f9ee2fcdfa (grep: recurse in-process > using 'struct repository' - 2017-08-02). An explicit 'struct > repository *' was added in that commit to pass around the repository > that we're supposed to grep from. > > Since 38bbc2ea39 (grep.c: remove implicit dependency on the_index - > 2018-09-21). 'struct grep_opt *' carries in itself a repository > parameter for grepping. We should now be able to reuse grep_opt to > hold the submodule repo instead of a separate argument, which is just > room for mistakes. That makes sense. Assuming we did not make a mistake yet, the test suite would not need changes (further assuming we do test for it, but we do as we have explicit submodule grep tests). > > While at there, use the right reference instead of the_repository and > the_index in this code. I was a bit careless in my attempt to kick > the_repository / the_index out of library code. It's normally safe to > just stick the_repository / the_index in bultin/ code, but it's not > the case for grep. Reviewed-by: Stefan Beller > Signed-off-by: Nguyễn Thái Ngọc Duy
Re: [PATCH v1 9/9] diff --color-moved-ws: handle blank lines
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > When using --color-moved-ws=allow-indentation-change allow lines with > the same indentation change to be grouped across blank lines. For now > this only works if the blank lines have been moved as well, not for > blocks that have just had their indentation changed. > > This completes the changes to the implementation of > --color-moved=allow-indentation-change. Running > > git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0 > > now takes 5.0s. This is a saving of 41% from 8.5s for the optimized > version of the previous implementation and 66% from the original which > took 14.6s. > > Signed-off-by: Phillip Wood > --- > > Notes: > Changes since rfc: > - Split these changes into a separate commit. > - Detect blank lines when processing the indentation rather than >parsing each line twice. > - Tweaked the test to make it harder as suggested by Stefan. > - Added timing data to the commit message. > > diff.c | 34 --- > t/t4015-diff-whitespace.sh | 41 ++ > 2 files changed, 68 insertions(+), 7 deletions(-) > > diff --git a/diff.c b/diff.c > index 89559293e7..072b5bced6 100644 > --- a/diff.c > +++ b/diff.c > @@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b) > memset(b, 0, sizeof(*b)); > } > > +#define INDENT_BLANKLINE INT_MIN Answering my question from the previous patch: This is why we need to keep the indents signed. This patch looks quite nice to read along. The whole series looks good to me. Do we need to update the docs in any way? Thanks, Stefan
Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > Currently diff --color-moved-ws=allow-indentation-change does not > support indentation that contains a mix of tabs and spaces. For > example in commit 546f70f377 ("convert.h: drop 'extern' from function > declaration", 2018-06-30) the function parameters in the following > lines are not colored as moved [1]. > > -extern int stream_filter(struct stream_filter *, > -const char *input, size_t *isize_p, > -char *output, size_t *osize_p); > +int stream_filter(struct stream_filter *, > + const char *input, size_t *isize_p, > + char *output, size_t *osize_p); > > This commit changes the way the indentation is handled to track the > visual size of the indentation rather than the characters in the > indentation. This has they benefit that any whitespace errors do not s/they/the/ > interfer with the move detection (the whitespace errors will still be > highlighted according to --ws-error-highlight). During the discussion > of this feature there were concerns about the correct detection of > indentation for python. However those concerns apply whether or not > we're detecting moved lines so no attempt is made to determine if the > indentation is 'pythonic'. > > [1] Note that before the commit to fix the erroneous coloring of moved > lines each line was colored as a different block, since that commit > they are uncolored. > > Signed-off-by: Phillip Wood > --- > > Notes: > Changes since rfc: > - It now replaces the existing implementation rather than adding a new >mode. > - The indentation deltas are now calculated once for each line and >cached. > - Optimized the whitespace delta comparison to compare string lengths >before comparing the actual strings. > - Modified the calculation of tabs as suggested by Stefan. > - Split out the blank line handling into a separate commit as suggest >by Stefan. > - Fixed some comments pointed out by Stefan. > > diff.c | 130 + > t/t4015-diff-whitespace.sh | 56 > 2 files changed, 129 insertions(+), 57 deletions(-) > > diff --git a/diff.c b/diff.c > index c378ce3daf..89559293e7 100644 > --- a/diff.c > +++ b/diff.c > @@ -750,6 +750,8 @@ struct emitted_diff_symbol { > const char *line; > int len; > int flags; > + int indent_off; > + int indent_width; So this is the trick how we compute the ws related data only once per line. :-) On the other hand, we do not save memory by disabling the ws detection, but I guess that is not a problem for now. Would it make sense to have the new variables be unsigned? (Also a comment on what they are, I needed to read the code to understand off to be offset into the line, where the content starts, and width to be the visual width, as I did not recall the RFC.) > +static void fill_es_indent_data(struct emitted_diff_symbol *es) > [...] > + if (o->color_moved_ws_handling & > + COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) > + fill_es_indent_data(>emitted_symbols->buf[n]); Nice. By reducing the information kept around to ints, we also do not need to alloc/free memory for each line. > +++ b/t/t4015-diff-whitespace.sh > @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta > incompatible with other space opti > test_i18ngrep allow-indentation-change err > ' > > +test_expect_success 'compare mixed whitespace delta across moved blocks' ' Looks good, Thanks! Stefan
Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > When running > > git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0 > > cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7% > return after comparing a and b. By comparing the lengths first we can > return early in all but 0.03% of those cases without dereferencing the > string pointers. The comparison between a and c fails in 6.8% of > calls, by comparing the lengths first we reject all the failing calls > without dereferencing the string pointers. > > This reduces the time to run the command above by by 42% from 14.6s to > 8.5s. This is still much slower than the normal --color-moved which > takes ~0.6-0.7s to run but is a significant improvement. > > The next commits will replace the current implementation with one that > works with mixed tabs and spaces in the indentation. I think it is > worth optimizing the current implementation first to enable a fair > comparison between the two implementations. Up to here the series looks good and I think we could take it as a preparatory self-standing series. I'll read on. Thanks, Stefan
Re: [PATCH v1 2/9] diff: use whitespace consistently
On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood wrote: > > From: Phillip Wood > > Most of the documentation uses 'whitespace' rather than 'white space' > or 'white spaces' convert to latter two to the former for consistency. Makes sense; this doesn't touch docs, but also code. $ git grep "white space" yields some other places as well (Documentation/git-cat-file.txt and lots in t/) But I guess we keep it to this feature for now instead of a tree wide cleanup. Stefan
Re: Confusing behavior with ignored submodules and `git commit -a`
On Thu, Nov 15, 2018 at 1:33 PM Michael Forney wrote: > > On 2018-11-15, Stefan Beller wrote: > > On Wed, Nov 14, 2018 at 10:05 PM Michael Forney > > wrote: > >> Looking at ff6f1f564c, I don't really see anything that might be > >> related to git-add, git-reset, or git-diff, so I'm guessing that this > >> only worked before because the submodule config wasn't getting loaded > >> during `git add` or `git reset`. Now that the config is loaded > >> automatically, submodule..ignore started taking effect where it > >> shouldn't. > >> > >> Unfortunately, this doesn't really get me much closer to finding a fix. > > > > Maybe selectively unloading or overwriting the config? > > > > Or we can change is_submodule_ignored() in diff.c > > to be only applied selectively whether we are running the > > right command? For this approach we'd have to figure out the > > set of commands to which the ignore config should apply or > > not (and come up with a more concise documentation then) > > > > This approach sounds appealing to me as it would cover > > new commands as well and we'd only have a central point > > where the decision for ignoring is made. > > Well, currently the submodule config can be disabled in diff_flags by > setting override_submodule_config=1. However, I'm thinking it may be > simpler to selectively *enable* the submodule config in diff_flags > where it is needed instead of disabling it everywhere else (i.e. > use_submodule_config instead of override_submodule_config). This sounds like undoing the good(?) part of the series that introduced this regression, as before that we selectively loaded the submodule config, which lead to confusion when you forgot it. Selectively *enabling* the submodule config sounds like that state before? Or do we *only* talk about enabling the ignore flag, while loading the rest of the submodule config automatic? > I'm also starting to see why this is tricky. The only difference that > diff.c:run_diff_files sees between `git add inner` and `git add --all` > is whether the index entry matched the pathspec exactly or not. Unrelated to the trickiness, I think we'd need to document the behavior of the -a flag in git-add and git-commit better as adding the diff below will depart from the "all" rule again, which I thought was a strong motivator for Brandons series (IIRC). > Here is a work-in-progress diff that seems to have the correct > behavior in all cases I tried. Can you think of any cases that it > breaks? I'm not quite sure of the consequences of having diff_change > and diff_addremove always ignore the submodule config; git-diff and > git-status still seem to work correctly. > > diff --git a/builtin/add.c b/builtin/add.c > index f65c17229..9902f7742 100644 > --- a/builtin/add.c > +++ b/builtin/add.c > @@ -117,7 +117,6 @@ int add_files_to_cache(const char *prefix, > rev.diffopt.output_format = DIFF_FORMAT_CALLBACK; > rev.diffopt.format_callback = update_callback; > rev.diffopt.format_callback_data = > - rev.diffopt.flags.override_submodule_config = 1; This line partially reverts 5556808, taking 02f2f56bc377c28 into account. > diff --git a/diff-lib.c b/diff-lib.c > index 83fce5151..fbb048cca 100644 > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -68,12 +68,13 @@ static int check_removed(const struct cache_entry > *ce, struct stat *st) > static int match_stat_with_submodule(struct diff_options *diffopt, > const struct cache_entry *ce, > struct stat *st, unsigned ce_option, > -unsigned *dirty_submodule) > +unsigned *dirty_submodule, > +int exact) > [...]; This is an interesting take so far as it is all about *detecting* change here via stat information and not like the previous (before the regression) where it was about correcting output. match_stat_with_submodule would grow its documentation to be slightly more complicated as a result. > diff --git a/diff.c b/diff.c > index e38d1ecaf..73dc75286 100644 > --- a/diff.c > +++ b/diff.c > [...] > -static int is_submodule_ignored(const char *path, struct diff_options > *options) > -{ > [...] > - if (S_ISGITLINK(mode) && is_submodule_ignored(concatpath, options)) > + if (S_ISGITLINK(mode) && options->flags.ignore_submodules) > return; This basically inlines the function is_submodule_ignored, except for the part: if (!options->flags.override_submodule_config) set_diffopt_flags_from_submodule_config(options, path); but that was taken care off in match_stat_with_submodule in diff-lib? This WIP looks really promising, thanks for looking into this! Stefan
Re: [PATCH 18/23] submodule: use submodule repos for object lookup
On Thu, Nov 15, 2018 at 11:54 AM Jonathan Tan wrote: > > > +/* > > + * Initialize 'out' based on the provided submodule path. > > + * > > + * Unlike repo_submodule_init, this tolerates submodules not present > > + * in .gitmodules. This function exists only to preserve historical > > behavior, > > + * > > + * Returns 0 on success, -1 when the submodule is not present. > > */ > > -static void show_submodule_header(struct diff_options *o, const char *path, > > +static struct repository *open_submodule(const char *path) > > The function documentation needs to be reworded - there's no "out", and > the return value is now a possibly NULL pointer to struct repository. Noted. > > > +{ > > + struct strbuf sb = STRBUF_INIT; > > + struct repository *out = xmalloc(sizeof(*out)); > > + > > + if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) { > > + strbuf_release(); > > + free(out); > > + return NULL; > > + } > > + > > + out->submodule_prefix = xstrdup(path); > > I've discussed this submodule_prefix line before [1] - do we really need > this? Tests pass even if I remove this line. We might not need it yet as the tests indicate, but it's the right thing to do: /* * Path from the root of the top-level superproject down to this * repository. This is only non-NULL if the repository is initialized * as a submodule of another repository. */ We're not (yet) using this string in our error reporting, but I anticipate that we'll do eventually. > Other than that, this patch looks good. Thanks, Stefan
Re: Confusing behavior with ignored submodules and `git commit -a`
On Wed, Nov 14, 2018 at 10:05 PM Michael Forney wrote: > > +bmwill > > On 2018-11-14, Michael Forney wrote: > > On 2018-10-25, Stefan Beller wrote: > >> I guess reverting that commit is not a good idea now, as > >> I would expect something to break. > >> > >> Maybe looking through the series 614ea03a71 > >> (Merge branch 'bw/submodule-config-cleanup', 2017-08-26) > >> to understand why it happened in the context would be a good start. > > > > Thanks, that's a good idea. I'll take a look through that series. > > Interesting. If I build git from master after reverting 55568086, I do > indeed observe the issue it claims to fix (unable to add ignored > submodules). However, if I build from 9ef23f91fc (the immediate parent > of 55568086), I do not see the issue. > > Investigating this further, it seems that 55568086 addresses an issue > that does not appear until later on in the series in ff6f1f564c > (submodule-config: lazy-load a repository's .gitmodules file). Perhaps > this was a result of reordering commits during a rebase. In other > words, I get correct behavior until 55568086, and in > 55568086..ff6f1f564c^ if I revert 55568086. > > Looking at ff6f1f564c, I don't really see anything that might be > related to git-add, git-reset, or git-diff, so I'm guessing that this > only worked before because the submodule config wasn't getting loaded > during `git add` or `git reset`. Now that the config is loaded > automatically, submodule..ignore started taking effect where it > shouldn't. > > Unfortunately, this doesn't really get me much closer to finding a fix. Maybe selectively unloading or overwriting the config? Or we can change is_submodule_ignored() in diff.c to be only applied selectively whether we are running the right command? For this approach we'd have to figure out the set of commands to which the ignore config should apply or not (and come up with a more concise documentation then) This approach sounds appealing to me as it would cover new commands as well and we'd only have a central point where the decision for ignoring is made. Stefan
Re: Confusing behavior with ignored submodules and `git commit -a`
> I have a git repository which contains a number of submodules that > refer to external repositories. Some of these repositories need to > patched in some way, so patches are stored alongside the submodules, > and are applied when building. This mostly works fine, but causes > submodules to show up as modified in `git status` and get updated with > `git commit -a`. To resolve this, I've added `ignore = all` to > .gitmodules for all the submodules that need patches applied. This > way, I can explicitly `git add` the submodule when I want to update > the base commit, but otherwise pretend that they are clean. This has > worked pretty well for me, but less so since git 2.15 when this issue > was introduced. > > This is really bad. git-status and git-commit share some code, > > and we'll populate the commit message with a status output. > > So it seems reasonable to expect the status and the commit to match, > > i.e. if status tells me there is no change, then commit should not record > > the submodule update. > > I just checked and if I don't specify a message on the command-line, > the status output in the message template *does* mention that `inner` > is getting updated. That's good. > >> > There have been a couple occasions where I accidentally pushed local > >> > changes to ignored submodules because of this. Since they don't show > >> > up in the log output, it is difficult to figure out what actually has > >> > gone wrong. > > > > How was it prevented before? Just by git commit -a not picking up the > > submodule change? > > Yes. Previously, `git commit -a` would not pick up the change (unless > I added it explicitly with `git add`), and `git log` would still show > changes to ignored submodules (which is the behavior I want). and both are broken currently (commit -a will commit a submodule if it is changed, and it will also not show that in log, but it did show that it is committing it in the commit message template) > I just came across someone else affected by this issue: > https://github.com/git/git/commit/55568086#commitcomment-27137460 Point taken.
Re: [PATCH v2 1/1] bundle: cleanup lock files on error
On Wed, Nov 14, 2018 at 1:43 PM Martin Ågren wrote: > > On Wed, 14 Nov 2018 at 16:26, Gaël Lhez via GitGitGadget > wrote: > > However, the `.lock` file was still open and on Windows that means > > that it could not be deleted properly. This patch fixes that issue. > > Hmmm, doesn't the tempfile machinery remove the lock file when we die? On Windows this seems not to be the case. (Open files cannot be deleted as the open file is not kept by inode or similar but by the file path there?) Rewording your concern: Could the tempfile machinery be taught to work properly on Windows, e.g. by first closing all files and then deleting them afterwards? There was a refactoring of tempfiles merged in 89563ec379 (Merge branch 'jk/incore-lockfile-removal', 2017-09-19), which sounded promising at first, as it is dated after the original patch[1] date (June 2016), but it has no references for Windows being different, so we might still have the original issue; most of the lockfile infrastructure was done in 2015 via db86e61cbb (Merge branch 'mh/tempfile', 2015-08-25) [1] https://github.com/git-for-windows/git/pull/797
[PATCH 21/23] commit: prepare free_commit_buffer and release_commit_memory for any repo
Pass the object pool to free_commit_buffer and release_commit_memory, such that we can eliminate access to 'the_repository'. Also remove the TODO in release_commit_memory, as commit->util was removed in 9d2c97016f (commit.h: delete 'util' field in struct commit, 2018-05-19) Signed-off-by: Stefan Beller --- builtin/fsck.c | 3 ++- builtin/log.c | 6 -- builtin/rev-list.c | 3 ++- commit.c | 9 - commit.h | 4 ++-- object.c | 2 +- 6 files changed, 15 insertions(+), 12 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 06eb421720..c476ac6983 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -382,7 +382,8 @@ static int fsck_obj(struct object *obj, void *buffer, unsigned long size) if (obj->type == OBJ_TREE) free_tree_buffer((struct tree *)obj); if (obj->type == OBJ_COMMIT) - free_commit_buffer((struct commit *)obj); + free_commit_buffer(the_repository->parsed_objects, + (struct commit *)obj); return err; } diff --git a/builtin/log.c b/builtin/log.c index 061d4fd864..64c2649c7c 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -395,7 +395,8 @@ static int cmd_log_walk(struct rev_info *rev) * We may show a given commit multiple times when * walking the reflogs. */ - free_commit_buffer(commit); + free_commit_buffer(the_repository->parsed_objects, + commit); free_commit_list(commit->parents); commit->parents = NULL; } @@ -1922,7 +1923,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix) open_next_file(rev.numbered_files ? NULL : commit, NULL, , quiet)) die(_("Failed to create output files")); shown = log_tree_commit(, commit); - free_commit_buffer(commit); + free_commit_buffer(the_repository->parsed_objects, + commit); /* We put one extra blank line between formatted * patches and this flag is used by log-tree code diff --git a/builtin/rev-list.c b/builtin/rev-list.c index cc1b70522f..2b301fa315 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -196,7 +196,8 @@ static void finish_commit(struct commit *commit, void *data) free_commit_list(commit->parents); commit->parents = NULL; } - free_commit_buffer(commit); + free_commit_buffer(the_repository->parsed_objects, + commit); } static inline void finish_object__ma(struct object *obj) diff --git a/commit.c b/commit.c index 7d2f3a9a93..4fe74aa4bc 100644 --- a/commit.c +++ b/commit.c @@ -328,10 +328,10 @@ void repo_unuse_commit_buffer(struct repository *r, free((void *)buffer); } -void free_commit_buffer(struct commit *commit) +void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit) { struct commit_buffer *v = buffer_slab_peek( - the_repository->parsed_objects->buffer_slab, commit); + pool->buffer_slab, commit); if (v) { FREE_AND_NULL(v->buffer); v->size = 0; @@ -354,13 +354,12 @@ struct object_id *get_commit_tree_oid(const struct commit *commit) return _commit_tree(commit)->object.oid; } -void release_commit_memory(struct commit *c) +void release_commit_memory(struct parsed_object_pool *pool, struct commit *c) { c->maybe_tree = NULL; c->index = 0; - free_commit_buffer(c); + free_commit_buffer(pool, c); free_commit_list(c->parents); - /* TODO: what about commit->util? */ c->object.parsed = 0; } diff --git a/commit.h b/commit.h index 2e6b799b26..d2779a23f6 100644 --- a/commit.h +++ b/commit.h @@ -140,7 +140,7 @@ void repo_unuse_commit_buffer(struct repository *r, /* * Free any cached object buffer associated with the commit. */ -void free_commit_buffer(struct commit *); +void free_commit_buffer(struct parsed_object_pool *pool, struct commit *); struct tree *get_commit_tree(const struct commit *); struct object_id *get_commit_tree_oid(const struct commit *); @@ -149,7 +149,7 @@ struct object_id *get_commit_tree_oid(const struct commit *); * Release memory related to a commit, including the parent list and * any cached object buffer. */ -void release_commit_memory(struct commit *c); +void release_commit_memory(struct parsed_object_pool *pool, struct commit *c); /* * Disassociate any cached object buffer from the commit, but do not free it. diff --git a/object.c b/object.c index 003f870484..c4170d2d0c 100644 --- a/object.c +++ b/object.c
[PATCH 18/23] submodule: use submodule repos for object lookup
This converts the 'show_submodule_header' function to use the repository API properly, such that the submodule objects are not added to the main object store. Signed-off-by: Stefan Beller --- submodule.c | 73 ++--- 1 file changed, 58 insertions(+), 15 deletions(-) diff --git a/submodule.c b/submodule.c index d9d3046689..262f03f118 100644 --- a/submodule.c +++ b/submodule.c @@ -443,7 +443,7 @@ static int prepare_submodule_summary(struct rev_info *rev, const char *path, return prepare_revision_walk(rev); } -static void print_submodule_summary(struct rev_info *rev, struct diff_options *o) +static void print_submodule_summary(struct repository *r, struct rev_info *rev, struct diff_options *o) { static const char format[] = " %m %s"; struct strbuf sb = STRBUF_INIT; @@ -454,7 +454,8 @@ static void print_submodule_summary(struct rev_info *rev, struct diff_options *o ctx.date_mode = rev->date_mode; ctx.output_encoding = get_log_output_encoding(); strbuf_setlen(, 0); - format_commit_message(commit, format, , ); + repo_format_commit_message(r, commit, format, , + ); strbuf_addch(, '\n'); if (commit->object.flags & SYMMETRIC_LEFT) diff_emit_submodule_del(o, sb.buf); @@ -481,14 +482,44 @@ void prepare_submodule_repo_env(struct argv_array *out) DEFAULT_GIT_DIR_ENVIRONMENT); } -/* Helper function to display the submodule header line prior to the full - * summary output. If it can locate the submodule objects directory it will - * attempt to lookup both the left and right commits and put them into the - * left and right pointers. +/* + * Initialize 'out' based on the provided submodule path. + * + * Unlike repo_submodule_init, this tolerates submodules not present + * in .gitmodules. This function exists only to preserve historical behavior, + * + * Returns 0 on success, -1 when the submodule is not present. */ -static void show_submodule_header(struct diff_options *o, const char *path, +static struct repository *open_submodule(const char *path) +{ + struct strbuf sb = STRBUF_INIT; + struct repository *out = xmalloc(sizeof(*out)); + + if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) { + strbuf_release(); + free(out); + return NULL; + } + + out->submodule_prefix = xstrdup(path); + + strbuf_release(); + return out; +} + +/* + * Helper function to display the submodule header line prior to the full + * summary output. + * + * If it can locate the submodule git directory it will create a repository + * handle for the submodule and lookup both the left and right commits and + * put them into the left and right pointers. + */ +static void show_submodule_header(struct diff_options *o, + const char *path, struct object_id *one, struct object_id *two, unsigned dirty_submodule, + struct repository *sub, struct commit **left, struct commit **right, struct commit_list **merge_bases) { @@ -507,7 +538,7 @@ static void show_submodule_header(struct diff_options *o, const char *path, else if (is_null_oid(two)) message = "(submodule deleted)"; - if (add_submodule_odb(path)) { + if (!sub) { if (!message) message = "(commits not present)"; goto output_header; @@ -517,8 +548,8 @@ static void show_submodule_header(struct diff_options *o, const char *path, * Attempt to lookup the commit references, and determine if this is * a fast forward or fast backwards update. */ - *left = lookup_commit_reference(the_repository, one); - *right = lookup_commit_reference(the_repository, two); + *left = lookup_commit_reference(sub, one); + *right = lookup_commit_reference(sub, two); /* * Warn about missing commits in the submodule project, but only if @@ -528,7 +559,7 @@ static void show_submodule_header(struct diff_options *o, const char *path, (!is_null_oid(two) && !*right)) message = "(commits not present)"; - *merge_bases = get_merge_bases(*left, *right); + *merge_bases = repo_get_merge_bases(sub, *left, *right); if (*merge_bases) { if ((*merge_bases)->item == *left) fast_forward = 1; @@ -562,16 +593,18 @@ void show_submodule_summary(struct diff_options *o, const char *path, struct rev_info rev; struct commit *left = NULL, *right = NULL; struct commit_list *merge_bases = NULL; + struct repository *sub; + sub = open_
[PATCH 22/23] path.h: make REPO_GIT_PATH_FUNC repository agnostic
git_pathdup uses the_repository internally, but the macro REPO_GIT_PATH_FUNC is specifically made for arbitrary repositories. Switch to repo_git_path which works on arbitrary repositories. Signed-off-by: Stefan Beller --- path.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/path.h b/path.h index b654ea8ff5..651e6157fc 100644 --- a/path.h +++ b/path.h @@ -165,7 +165,7 @@ extern void report_linked_checkout_garbage(void); const char *git_path_##var(struct repository *r) \ { \ if (!r->cached_paths.var) \ - r->cached_paths.var = git_pathdup(filename); \ + r->cached_paths.var = repo_git_path(r, filename); \ return r->cached_paths.var; \ } -- 2.19.1.1215.g8438c0b245-goog
[PATCH 20/23] commit-graph: convert remaining functions to handle any repo
Convert all functions to handle arbitrary repositories in commit-graph.c that are used by functions taking a repository argument already. Notable exclusion is write_commit_graph and its local functions as that only works on the_repository. Signed-off-by: Stefan Beller --- commit-graph.c | 40 1 file changed, 24 insertions(+), 16 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 40c855f185..f78a8e96b5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -292,7 +292,8 @@ static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t g->chunk_oid_lookup, g->hash_len, pos); } -static struct commit_list **insert_parent_or_die(struct commit_graph *g, +static struct commit_list **insert_parent_or_die(struct repository *r, +struct commit_graph *g, uint64_t pos, struct commit_list **pptr) { @@ -303,7 +304,7 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g, die("invalid parent position %"PRIu64, pos); hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); - c = lookup_commit(the_repository, ); + c = lookup_commit(r, ); if (!c) die(_("could not find commit %s"), oid_to_hex()); c->graph_pos = pos; @@ -317,7 +318,9 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } -static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos) +static int fill_commit_in_graph(struct repository *r, + struct commit *item, + struct commit_graph *g, uint32_t pos) { uint32_t edge_value; uint32_t *parent_data_ptr; @@ -341,13 +344,13 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin edge_value = get_be32(commit_data + g->hash_len); if (edge_value == GRAPH_PARENT_NONE) return 1; - pptr = insert_parent_or_die(g, edge_value, pptr); + pptr = insert_parent_or_die(r, g, edge_value, pptr); edge_value = get_be32(commit_data + g->hash_len + 4); if (edge_value == GRAPH_PARENT_NONE) return 1; if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) { - pptr = insert_parent_or_die(g, edge_value, pptr); + pptr = insert_parent_or_die(r, g, edge_value, pptr); return 1; } @@ -355,7 +358,7 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin 4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK)); do { edge_value = get_be32(parent_data_ptr); - pptr = insert_parent_or_die(g, + pptr = insert_parent_or_die(r, g, edge_value & GRAPH_EDGE_LAST_MASK, pptr); parent_data_ptr++; @@ -374,7 +377,9 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin } } -static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item) +static int parse_commit_in_graph_one(struct repository *r, +struct commit_graph *g, +struct commit *item) { uint32_t pos; @@ -382,7 +387,7 @@ static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item return 1; if (find_commit_in_graph(item, g, )) - return fill_commit_in_graph(item, g, pos); + return fill_commit_in_graph(r, item, g, pos); return 0; } @@ -391,7 +396,7 @@ int parse_commit_in_graph(struct repository *r, struct commit *item) { if (!prepare_commit_graph(r)) return 0; - return parse_commit_in_graph_one(r->objects->commit_graph, item); + return parse_commit_in_graph_one(r, r->objects->commit_graph, item); } void load_commit_graph_info(struct repository *r, struct commit *item) @@ -403,19 +408,22 @@ void load_commit_graph_info(struct repository *r, struct commit *item) fill_commit_graph_info(item, r->objects->commit_graph, pos); } -static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *c) +static struct tree *load_tree_for_commit(struct repository *r, +struct commit_graph *g, +struct commit *c) { struct object_id oid; const unsigned char *commit_data = g->chunk_commit_data +
[PATCH 23/23] t/helper/test-repository: celebrate independence from the_repository
dade47c06c (commit-graph: add repo arg to graph readers, 2018-07-11) brought more independence from the_repository to the commit graph, however it was not completely independent of the_repository, as the previous patches show. To ensure we're not accessing the_repository by accident, we'd ideally assign NULL to the_repository to trigger a segfault on access. We currently have a temporary hack in cache.h, which relies on the_hash_algo (which is a short form of the_repository->hash_algo) to be set, so we cannot do that. The next best thing is to set all fields of the_repository to 0, so any accidental access is more likely to be found. Signed-off-by: Stefan Beller --- cache.h| 2 ++ t/helper/test-repository.c | 10 ++ 2 files changed, 12 insertions(+) diff --git a/cache.h b/cache.h index 59c8a93046..8864d7ec15 100644 --- a/cache.h +++ b/cache.h @@ -1033,6 +1033,8 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2) * * This will need to be extended or ripped out when we learn about * hashes of different sizes. +* +* When ripping this out, see TODO in test-repository.c. */ if (the_hash_algo->rawsz != 20) BUG("hash size not yet supported by hashcmp"); diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c index 6a84a53efb..f7f8618445 100644 --- a/t/helper/test-repository.c +++ b/t/helper/test-repository.c @@ -17,6 +17,11 @@ static void test_parse_commit_in_graph(const char *gitdir, const char *worktree, setup_git_env(gitdir); + memset(the_repository, 0, sizeof(*the_repository)); + + /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + if (repo_init(, gitdir, worktree)) die("Couldn't init repo"); @@ -43,6 +48,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir, setup_git_env(gitdir); + memset(the_repository, 0, sizeof(*the_repository)); + + /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */ + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + if (repo_init(, gitdir, worktree)) die("Couldn't init repo"); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 02/23] packfile: allow has_packed_and_bad to handle arbitrary repositories
has_packed_and_bad is not widely used, so just migrate it all at once. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- packfile.c | 5 +++-- packfile.h | 2 +- sha1-file.c | 2 +- 3 files changed, 5 insertions(+), 4 deletions(-) diff --git a/packfile.c b/packfile.c index 841b36182f..bc2e0f5043 100644 --- a/packfile.c +++ b/packfile.c @@ -1131,12 +1131,13 @@ void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1) p->num_bad_objects++; } -const struct packed_git *has_packed_and_bad(const unsigned char *sha1) +const struct packed_git *has_packed_and_bad(struct repository *r, + const unsigned char *sha1) { struct packed_git *p; unsigned i; - for (p = the_repository->objects->packed_git; p; p = p->next) + for (p = r->objects->packed_git; p; p = p->next) for (i = 0; i < p->num_bad_objects; i++) if (hasheq(sha1, p->bad_object_sha1 + the_hash_algo->rawsz * i)) diff --git a/packfile.h b/packfile.h index 442625723d..7a62d72231 100644 --- a/packfile.h +++ b/packfile.h @@ -146,7 +146,7 @@ extern int packed_object_info(struct repository *r, off_t offset, struct object_info *); extern void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); -extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1); +extern const struct packed_git *has_packed_and_bad(struct repository *r, const unsigned char *sha1); /* * Iff a pack file in the given repository contains the object named by sha1, diff --git a/sha1-file.c b/sha1-file.c index b8ce21cbaf..856e000ee1 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1432,7 +1432,7 @@ void *read_object_file_extended(const struct object_id *oid, die(_("loose object %s (stored in %s) is corrupt"), oid_to_hex(repl), path); - if ((p = has_packed_and_bad(repl->hash)) != NULL) + if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL) die(_("packed object %s (stored in %s) is corrupt"), oid_to_hex(repl), p->pack_name); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 17/23] pretty: prepare format_commit_message to handle arbitrary repositories
Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- contrib/coccinelle/the_repository.pending.cocci | 10 ++ pretty.c| 15 --- pretty.h| 7 ++- 3 files changed, 24 insertions(+), 8 deletions(-) diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index f5b42cfc62..2ee702ecf7 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -132,3 +132,13 @@ expression G; - logmsg_reencode( + repo_logmsg_reencode(the_repository, E, F, G); + +@@ +expression E; +expression F; +expression G; +expression H; +@@ +- format_commit_message( ++ repo_format_commit_message(the_repository, + E, F, G, H); diff --git a/pretty.c b/pretty.c index b359b68750..3240495308 100644 --- a/pretty.c +++ b/pretty.c @@ -1508,9 +1508,10 @@ void userformat_find_requirements(const char *fmt, struct userformat_want *w) strbuf_release(); } -void format_commit_message(const struct commit *commit, - const char *format, struct strbuf *sb, - const struct pretty_print_context *pretty_ctx) +void repo_format_commit_message(struct repository *r, + const struct commit *commit, + const char *format, struct strbuf *sb, + const struct pretty_print_context *pretty_ctx) { struct format_commit_context context; const char *output_enc = pretty_ctx->output_encoding; @@ -1524,9 +1525,9 @@ void format_commit_message(const struct commit *commit, * convert a commit message to UTF-8 first * as far as 'format_commit_item' assumes it in UTF-8 */ - context.message = logmsg_reencode(commit, - _encoding, - utf8); + context.message = repo_logmsg_reencode(r, commit, + _encoding, + utf8); strbuf_expand(sb, format, format_commit_item, ); rewrap_message_tail(sb, , 0, 0, 0); @@ -1550,7 +1551,7 @@ void format_commit_message(const struct commit *commit, } free(context.commit_encoding); - unuse_commit_buffer(commit, context.message); + repo_unuse_commit_buffer(r, commit, context.message); } static void pp_header(struct pretty_print_context *pp, diff --git a/pretty.h b/pretty.h index 7359d318a9..e6625269cf 100644 --- a/pretty.h +++ b/pretty.h @@ -103,9 +103,14 @@ void pp_remainder(struct pretty_print_context *pp, const char **msg_p, * Put the result to "sb". * Please use this function for custom formats. */ -void format_commit_message(const struct commit *commit, +void repo_format_commit_message(struct repository *r, + const struct commit *commit, const char *format, struct strbuf *sb, const struct pretty_print_context *context); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define format_commit_message(c, f, s, con) \ + repo_format_commit_message(the_repository, c, f, s, con) +#endif /* * Parse given arguments from "arg", check it for correctness and -- 2.19.1.1215.g8438c0b245-goog
[PATCH 16/23] commit: prepare logmsg_reencode to handle arbitrary repositories
Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- commit.h| 8 contrib/coccinelle/the_repository.pending.cocci | 9 + pretty.c| 13 +++-- 3 files changed, 24 insertions(+), 6 deletions(-) diff --git a/commit.h b/commit.h index 57375e3239..2e6b799b26 100644 --- a/commit.h +++ b/commit.h @@ -180,6 +180,14 @@ extern int has_non_ascii(const char *text); extern const char *logmsg_reencode(const struct commit *commit, char **commit_encoding, const char *output_encoding); +const char *repo_logmsg_reencode(struct repository *r, +const struct commit *commit, +char **commit_encoding, +const char *output_encoding); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, enc, out) +#endif + extern const char *skip_blank_lines(const char *msg); /** Removes the first commit from a list sorted by date, and adds all diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 516f19ffee..f5b42cfc62 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -123,3 +123,12 @@ expression F; - unuse_commit_buffer( + repo_unuse_commit_buffer(the_repository, E, F); + +@@ +expression E; +expression F; +expression G; +@@ +- logmsg_reencode( ++ repo_logmsg_reencode(the_repository, + E, F, G); diff --git a/pretty.c b/pretty.c index 8ca29e9281..b359b68750 100644 --- a/pretty.c +++ b/pretty.c @@ -595,14 +595,15 @@ static char *replace_encoding_header(char *buf, const char *encoding) return strbuf_detach(, NULL); } -const char *logmsg_reencode(const struct commit *commit, - char **commit_encoding, - const char *output_encoding) +const char *repo_logmsg_reencode(struct repository *r, +const struct commit *commit, +char **commit_encoding, +const char *output_encoding) { static const char *utf8 = "UTF-8"; const char *use_encoding; char *encoding; - const char *msg = get_commit_buffer(commit, NULL); + const char *msg = repo_get_commit_buffer(r, commit, NULL); char *out; if (!output_encoding || !*output_encoding) { @@ -630,7 +631,7 @@ const char *logmsg_reencode(const struct commit *commit, * the cached copy from get_commit_buffer, we need to duplicate it * to avoid munging the cached copy. */ - if (msg == get_cached_commit_buffer(the_repository, commit, NULL)) + if (msg == get_cached_commit_buffer(r, commit, NULL)) out = xstrdup(msg); else out = (char *)msg; @@ -644,7 +645,7 @@ const char *logmsg_reencode(const struct commit *commit, */ out = reencode_string(msg, output_encoding, use_encoding); if (out) - unuse_commit_buffer(commit, msg); + repo_unuse_commit_buffer(r, commit, msg); } /* -- 2.19.1.1215.g8438c0b245-goog
[PATCH 14/23] commit: prepare get_commit_buffer to handle any repo
Signed-off-by: Stefan Beller --- commit.c| 8 +--- commit.h| 7 ++- contrib/coccinelle/the_repository.pending.cocci | 8 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/commit.c b/commit.c index 7a931d7fd4..4034def16c 100644 --- a/commit.c +++ b/commit.c @@ -297,13 +297,15 @@ const void *get_cached_commit_buffer(struct repository *r, const struct commit * return v->buffer; } -const void *get_commit_buffer(const struct commit *commit, unsigned long *sizep) +const void *repo_get_commit_buffer(struct repository *r, + const struct commit *commit, + unsigned long *sizep) { - const void *ret = get_cached_commit_buffer(the_repository, commit, sizep); + const void *ret = get_cached_commit_buffer(r, commit, sizep); if (!ret) { enum object_type type; unsigned long size; - ret = read_object_file(>object.oid, , ); + ret = repo_read_object_file(r, >object.oid, , ); if (!ret) die("cannot read commit object %s", oid_to_hex(>object.oid)); diff --git a/commit.h b/commit.h index 08935f9a19..591a77a5bb 100644 --- a/commit.h +++ b/commit.h @@ -117,7 +117,12 @@ const void *get_cached_commit_buffer(struct repository *, const struct commit *, * from disk. The resulting memory should not be modified, and must be given * to unuse_commit_buffer when the caller is done. */ -const void *get_commit_buffer(const struct commit *, unsigned long *size); +const void *repo_get_commit_buffer(struct repository *r, + const struct commit *, + unsigned long *size); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s) +#endif /* * Tell the commit subsytem that we are done with a particular commit buffer. diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 8c6a71bf64..4018e6eaf7 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -107,3 +107,11 @@ expression G; - in_merge_bases_many( + repo_in_merge_bases_many(the_repository, E, F, G); + +@@ +expression E; +expression F; +@@ +- get_commit_buffer( ++ repo_get_commit_buffer(the_repository, + E, F); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 07/23] commit: allow parse_commit* to handle any repo
Just like the previous commit, parse_commit and friends are used a lot and are found in new patches, so we cannot change their signature easily. Re-introduce these function prefixed with 'repo_' that take a repository argument and keep the original as a shallow macro. Signed-off-by: Stefan Beller --- commit.c | 18 -- commit.h | 17 + .../coccinelle/the_repository.pending.cocci | 24 +++ 3 files changed, 48 insertions(+), 11 deletions(-) diff --git a/commit.c b/commit.c index dc8a39d52a..7a931d7fd4 100644 --- a/commit.c +++ b/commit.c @@ -443,7 +443,10 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b return 0; } -int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph) +int repo_parse_commit_internal(struct repository *r, + struct commit *item, + int quiet_on_missing, + int use_commit_graph) { enum object_type type; void *buffer; @@ -454,9 +457,9 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com return -1; if (item->object.parsed) return 0; - if (use_commit_graph && parse_commit_in_graph(the_repository, item)) + if (use_commit_graph && parse_commit_in_graph(r, item)) return 0; - buffer = read_object_file(>object.oid, , ); + buffer = repo_read_object_file(r, >object.oid, , ); if (!buffer) return quiet_on_missing ? -1 : error("Could not read %s", @@ -467,18 +470,19 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com oid_to_hex(>object.oid)); } - ret = parse_commit_buffer(the_repository, item, buffer, size, 0); + ret = parse_commit_buffer(r, item, buffer, size, 0); if (save_commit_buffer && !ret) { - set_commit_buffer(the_repository, item, buffer, size); + set_commit_buffer(r, item, buffer, size); return 0; } free(buffer); return ret; } -int parse_commit_gently(struct commit *item, int quiet_on_missing) +int repo_parse_commit_gently(struct repository *r, +struct commit *item, int quiet_on_missing) { - return parse_commit_internal(item, quiet_on_missing, 1); + return repo_parse_commit_internal(r, item, quiet_on_missing, 1); } void parse_commit_or_die(struct commit *item) diff --git a/commit.h b/commit.h index 1d260d62f5..08935f9a19 100644 --- a/commit.h +++ b/commit.h @@ -79,12 +79,21 @@ struct commit *lookup_commit_reference_by_name(const char *name); struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name); int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph); -int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_commit_graph); -int parse_commit_gently(struct commit *item, int quiet_on_missing); -static inline int parse_commit(struct commit *item) +int repo_parse_commit_internal(struct repository *r, struct commit *item, + int quiet_on_missing, int use_commit_graph); +int repo_parse_commit_gently(struct repository *r, +struct commit *item, +int quiet_on_missing); +static inline int repo_parse_commit(struct repository *r, struct commit *item) { - return parse_commit_gently(item, 0); + return repo_parse_commit_gently(r, item, 0); } +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define parse_commit_internal(item, quiet, use) repo_parse_commit_internal(the_repository, item, quiet, use) +#define parse_commit_gently(item, quiet) repo_parse_commit_gently(the_repository, item, quiet) +#define parse_commit(item) repo_parse_commit(the_repository, item) +#endif + void parse_commit_or_die(struct commit *item); struct buffer_slab; diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 46f3a1b23a..b185fe0a1d 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -40,3 +40,27 @@ expression F; - has_object_file_with_flags( + repo_has_object_file_with_flags(the_repository, E) + +@@ +expression E; +expression F; +expression G; +@@ +- parse_commit_internal( ++ repo_parse_commit_internal(the_repository, + E, F, G) + +@@ +expression E; +expression F; +@@ +- parse_commit_gently( ++ repo_parse_commit_gently(the_repository, + E, F) + +@@ +expression E; +@@ +- parse_commit( ++ repo_parse_commit(the_repository, + E) -- 2.19.1.1215.g8438c0b245-goog
[PATCH 19/23] submodule: don't add submodule as odb for push
In push_submodule(), because we do not actually need access to objects in the submodule, do not invoke add_submodule_odb(). (for_each_remote_ref_submodule() does not require access to those objects, and the actual push is done by spawning another process, which handles object access by itself.) This code of push_submodule() is exercised in t5531 and continues to work, showing that the submodule odbc is not needed. Signed-off-by: Stefan Beller --- submodule.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/submodule.c b/submodule.c index 262f03f118..5818088df2 100644 --- a/submodule.c +++ b/submodule.c @@ -1021,9 +1021,6 @@ static int push_submodule(const char *path, const struct string_list *push_options, int dry_run) { - if (add_submodule_odb(path)) - return 1; - if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) { struct child_process cp = CHILD_PROCESS_INIT; argv_array_push(, "push"); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 08/23] commit-reach.c: allow paint_down_to_common to handle any repo
As the function is file local and not widely used, migrate it all at once. Signed-off-by: Stefan Beller --- commit-reach.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 9f79ce0a22..67c2e43d5e 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -30,7 +30,8 @@ static int queue_has_nonstale(struct prio_queue *queue) } /* all input commits in one and twos[] must have been parsed! */ -static struct commit_list *paint_down_to_common(struct commit *one, int n, +static struct commit_list *paint_down_to_common(struct repository *r, + struct commit *one, int n, struct commit **twos, int min_generation) { @@ -83,7 +84,7 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, parents = parents->next; if ((p->object.flags & flags) == flags) continue; - if (parse_commit(p)) + if (repo_parse_commit(r, p)) return NULL; p->object.flags |= flags; prio_queue_put(, p); @@ -116,7 +117,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co return NULL; } - list = paint_down_to_common(one, n, twos, 0); + list = paint_down_to_common(the_repository, one, n, twos, 0); while (list) { struct commit *commit = pop_commit(); @@ -187,8 +188,8 @@ static int remove_redundant(struct commit **array, int cnt) if (array[j]->generation < min_generation) min_generation = array[j]->generation; } - common = paint_down_to_common(array[i], filled, work, - min_generation); + common = paint_down_to_common(the_repository, array[i], filled, + work, min_generation); if (array[i]->object.flags & PARENT2) redundant[i] = 1; for (j = 0; j < filled; j++) @@ -322,7 +323,9 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * if (commit->generation > min_generation) return ret; - bases = paint_down_to_common(commit, nr_reference, reference, commit->generation); + bases = paint_down_to_common(the_repository, commit, +nr_reference, reference, +commit->generation); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 05/23] object-store: prepare has_{sha1, object}_file to handle any repo
Signed-off-by: Stefan Beller --- .../coccinelle/the_repository.pending.cocci | 30 +++ object-store.h| 22 ++ sha1-file.c | 15 ++ 3 files changed, 56 insertions(+), 11 deletions(-) diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index a7ac9e0c46..46f3a1b23a 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -10,3 +10,33 @@ expression G; - read_object_file( + repo_read_object_file(the_repository, E, F, G) + +@@ +expression E; +@@ +- has_sha1_file( ++ repo_has_sha1_file(the_repository, + E) + +@@ +expression E; +expression F; +@@ +- has_sha1_file_with_flags( ++ repo_has_sha1_file_with_flags(the_repository, + E) + +@@ +expression E; +@@ +- has_object_file( ++ repo_has_object_file(the_repository, + E) + +@@ +expression E; +expression F; +@@ +- has_object_file_with_flags( ++ repo_has_object_file_with_flags(the_repository, + E) diff --git a/object-store.h b/object-store.h index 00a64622e6..2b5e6ff1ed 100644 --- a/object-store.h +++ b/object-store.h @@ -212,15 +212,27 @@ int read_loose_object(const char *path, * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass * nonzero flags to also set other flags. */ -extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags); -static inline int has_sha1_file(const unsigned char *sha1) +int repo_has_sha1_file_with_flags(struct repository *r, + const unsigned char *sha1, int flags); +static inline int repo_has_sha1_file(struct repository *r, +const unsigned char *sha1) { - return has_sha1_file_with_flags(sha1, 0); + return repo_has_sha1_file_with_flags(r, sha1, 0); } +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define has_sha1_file_with_flags(sha1, flags) repo_has_sha1_file_with_flags(the_repository, sha1, flags) +#define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1) +#endif + /* Same as the above, except for struct object_id. */ -extern int has_object_file(const struct object_id *oid); -extern int has_object_file_with_flags(const struct object_id *oid, int flags); +int repo_has_object_file(struct repository *r, const struct object_id *oid); +int repo_has_object_file_with_flags(struct repository *r, + const struct object_id *oid, int flags); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define has_object_file(oid) repo_has_object_file(the_repository, oid) +#define has_object_file_with_flags(oid, flags) repo_has_object_file_with_flags(the_repository, oid, flags) +#endif /* * Return true iff an alternate object database has a loose object diff --git a/sha1-file.c b/sha1-file.c index c5b704aec5..e77273ccfd 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1768,24 +1768,27 @@ int force_object_loose(const struct object_id *oid, time_t mtime) return ret; } -int has_sha1_file_with_flags(const unsigned char *sha1, int flags) +int repo_has_sha1_file_with_flags(struct repository *r, + const unsigned char *sha1, int flags) { struct object_id oid; if (!startup_info->have_repository) return 0; hashcpy(oid.hash, sha1); - return oid_object_info_extended(the_repository, , NULL, + return oid_object_info_extended(r, , NULL, flags | OBJECT_INFO_SKIP_CACHED) >= 0; } -int has_object_file(const struct object_id *oid) +int repo_has_object_file(struct repository *r, +const struct object_id *oid) { - return has_sha1_file(oid->hash); + return repo_has_sha1_file(r, oid->hash); } -int has_object_file_with_flags(const struct object_id *oid, int flags) +int repo_has_object_file_with_flags(struct repository *r, + const struct object_id *oid, int flags) { - return has_sha1_file_with_flags(oid->hash, flags); + return repo_has_sha1_file_with_flags(r, oid->hash, flags); } static void check_tree(const void *buf, size_t size) -- 2.19.1.1215.g8438c0b245-goog
[PATCH 15/23] commit: prepare repo_unuse_commit_buffer to handle any repo
Signed-off-by: Stefan Beller --- commit.c| 6 -- commit.h| 7 ++- contrib/coccinelle/the_repository.pending.cocci | 8 3 files changed, 18 insertions(+), 3 deletions(-) diff --git a/commit.c b/commit.c index 4034def16c..7d2f3a9a93 100644 --- a/commit.c +++ b/commit.c @@ -318,10 +318,12 @@ const void *repo_get_commit_buffer(struct repository *r, return ret; } -void unuse_commit_buffer(const struct commit *commit, const void *buffer) +void repo_unuse_commit_buffer(struct repository *r, + const struct commit *commit, + const void *buffer) { struct commit_buffer *v = buffer_slab_peek( - the_repository->parsed_objects->buffer_slab, commit); + r->parsed_objects->buffer_slab, commit); if (!(v && v->buffer == buffer)) free((void *)buffer); } diff --git a/commit.h b/commit.h index 591a77a5bb..57375e3239 100644 --- a/commit.h +++ b/commit.h @@ -130,7 +130,12 @@ const void *repo_get_commit_buffer(struct repository *r, * from an earlier call to get_commit_buffer. The buffer may or may not be * freed by this call; callers should not access the memory afterwards. */ -void unuse_commit_buffer(const struct commit *, const void *buffer); +void repo_unuse_commit_buffer(struct repository *r, + const struct commit *, + const void *buffer); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, b) +#endif /* * Free any cached object buffer associated with the commit. diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index 4018e6eaf7..516f19ffee 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -115,3 +115,11 @@ expression F; - get_commit_buffer( + repo_get_commit_buffer(the_repository, E, F); + +@@ +expression E; +expression F; +@@ +- unuse_commit_buffer( ++ repo_unuse_commit_buffer(the_repository, + E, F); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 06/23] object: parse_object to honor its repository argument
In 8e4b0b6047 (object.c: allow parse_object to handle arbitrary repositories, 2018-06-28), we forgot to pass the repository down to the read_object_file. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- object.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/object.c b/object.c index e54160550c..003f870484 100644 --- a/object.c +++ b/object.c @@ -259,8 +259,8 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) if (obj && obj->parsed) return obj; - if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) || - (!obj && has_object_file(oid) && + if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || + (!obj && repo_has_object_file(r, oid) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { if (check_object_signature(repl, NULL, 0, NULL) < 0) { error(_("sha1 mismatch %s"), oid_to_hex(oid)); @@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid) return lookup_object(r, oid->hash); } - buffer = read_object_file(oid, , ); + buffer = repo_read_object_file(r, oid, , ); if (buffer) { if (check_object_signature(repl, buffer, size, type_name(type)) < 0) { free(buffer); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 12/23] commit-reach: prepare get_merge_bases to handle any repo
Similarly to previous patches, the get_merge_base functions are used often in the code base, which makes migrating them hard. Implement the new functions, prefixed with 'repo_' and hide the old functions behind a wrapper macro. Signed-off-by: Stefan Beller --- commit-reach.c| 24 ++--- commit-reach.h| 26 --- .../coccinelle/the_repository.pending.cocci | 26 +++ 3 files changed, 56 insertions(+), 20 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index b3b1f62aba..657a4e9b5a 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -258,23 +258,27 @@ static struct commit_list *get_merge_bases_many_0(struct repository *r, return result; } -struct commit_list *get_merge_bases_many(struct commit *one, -int n, -struct commit **twos) +struct commit_list *repo_get_merge_bases_many(struct repository *r, + struct commit *one, + int n, + struct commit **twos) { - return get_merge_bases_many_0(the_repository, one, n, twos, 1); + return get_merge_bases_many_0(r, one, n, twos, 1); } -struct commit_list *get_merge_bases_many_dirty(struct commit *one, - int n, - struct commit **twos) +struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r, + struct commit *one, + int n, + struct commit **twos) { - return get_merge_bases_many_0(the_repository, one, n, twos, 0); + return get_merge_bases_many_0(r, one, n, twos, 0); } -struct commit_list *get_merge_bases(struct commit *one, struct commit *two) +struct commit_list *repo_get_merge_bases(struct repository *r, +struct commit *one, +struct commit *two) { - return get_merge_bases_many_0(the_repository, one, 1, , 1); + return get_merge_bases_many_0(r, one, 1, , 1); } /* diff --git a/commit-reach.h b/commit-reach.h index 7d313e2975..52667d64ac 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -8,17 +8,23 @@ struct commit_list; struct contains_cache; struct ref_filter; -struct commit_list *get_merge_bases_many(struct commit *one, -int n, -struct commit **twos); -struct commit_list *get_merge_bases_many_dirty(struct commit *one, - int n, - struct commit **twos); -struct commit_list *get_merge_bases(struct commit *one, struct commit *two); -struct commit_list *get_octopus_merge_bases(struct commit_list *in); - +struct commit_list *repo_get_merge_bases(struct repository *r, +struct commit *rev1, +struct commit *rev2); +struct commit_list *repo_get_merge_bases_many(struct repository *r, + struct commit *one, int n, + struct commit **twos); /* To be used only when object flags after this call no longer matter */ -struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos); +struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r, + struct commit *one, int n, + struct commit **twos); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define get_merge_bases(r1, r2) repo_get_merge_bases(the_repository, r1, r2) +#define get_merge_bases_many(one, n, two) repo_get_merge_bases_many(the_repository, one, n, two) +#define get_merge_bases_many_dirty(one, n, twos) repo_get_merge_bases_many_dirty(the_repository, one, n, twos) +#endif + +struct commit_list *get_octopus_merge_bases(struct commit_list *in); int is_descendant_of(struct commit *commit, struct commit_list *with_commit); int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference); diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index b185fe0a1d..f6c2915a4e 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -64,3 +64,29 @@ expression E; - parse_commit( + repo_parse_commit(the_repository, E) + +@@ +expression E; +expression F; +@@ +- get_merge_bases( ++ repo_get_merge_bases(the_repository, + E, F); + +@@ +expression E; +expression F; +expression G
[PATCH 04/23] object-store: prepare read_object_file to deal with any repo
As read_object_file is a widely used function (which is also regularly used in new code in flight between master..pu), changing its signature is painful is hard, as other series in flight rely on the original signature. It would burden the maintainer if we'd just change the signature. Introduce repo_read_object_file which takes the repository argument, and hide the original read_object_file as a macro behind NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21) Add a coccinelle patch to convert existing callers, but do not apply the resulting patch to keep the diff of this patch small. Signed-off-by: Stefan Beller --- contrib/coccinelle/the_repository.pending.cocci | 12 object-store.h | 10 -- 2 files changed, 20 insertions(+), 2 deletions(-) create mode 100644 contrib/coccinelle/the_repository.pending.cocci diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci new file mode 100644 index 00..a7ac9e0c46 --- /dev/null +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -0,0 +1,12 @@ +// This file is used for the ongoing refactoring of +// bringing the index or repository struct in all of +// our code base. + +@@ +expression E; +expression F; +expression G; +@@ +- read_object_file( ++ repo_read_object_file(the_repository, + E, F, G) diff --git a/object-store.h b/object-store.h index 3d98a682b2..00a64622e6 100644 --- a/object-store.h +++ b/object-store.h @@ -165,10 +165,16 @@ extern void *read_object_file_extended(struct repository *r, const struct object_id *oid, enum object_type *type, unsigned long *size, int lookup_replace); -static inline void *read_object_file(const struct object_id *oid, enum object_type *type, unsigned long *size) +static inline void *repo_read_object_file(struct repository *r, + const struct object_id *oid, + enum object_type *type, + unsigned long *size) { - return read_object_file_extended(the_repository, oid, type, size, 1); + return read_object_file_extended(r, oid, type, size, 1); } +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define read_object_file(oid, type, size) repo_read_object_file(the_repository, oid, type, size) +#endif /* Read and unpack an object file into memory, write memory to an object file */ int oid_object_info(struct repository *r, const struct object_id *, unsigned long *); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 13/23] commit-reach: prepare in_merge_bases[_many] to handle any repo
Signed-off-by: Stefan Beller --- commit-reach.c | 15 +-- commit-reach.h | 12 ++-- contrib/coccinelle/the_repository.pending.cocci | 17 + 3 files changed, 36 insertions(+), 8 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 657a4e9b5a..8715008fef 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -312,16 +312,17 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit) /* * Is "commit" an ancestor of one of the "references"? */ -int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference) +int repo_in_merge_bases_many(struct repository *r, struct commit *commit, +int nr_reference, struct commit **reference) { struct commit_list *bases; int ret = 0, i; uint32_t min_generation = GENERATION_NUMBER_INFINITY; - if (parse_commit(commit)) + if (repo_parse_commit(r, commit)) return ret; for (i = 0; i < nr_reference; i++) { - if (parse_commit(reference[i])) + if (repo_parse_commit(r, reference[i])) return ret; if (reference[i]->generation < min_generation) min_generation = reference[i]->generation; @@ -330,7 +331,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * if (commit->generation > min_generation) return ret; - bases = paint_down_to_common(the_repository, commit, + bases = paint_down_to_common(r, commit, nr_reference, reference, commit->generation); if (commit->object.flags & PARENT2) @@ -344,9 +345,11 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * /* * Is "commit" an ancestor of (i.e. reachable from) the "reference"? */ -int in_merge_bases(struct commit *commit, struct commit *reference) +int repo_in_merge_bases(struct repository *r, + struct commit *commit, + struct commit *reference) { - return in_merge_bases_many(commit, 1, ); + return repo_in_merge_bases_many(r, commit, 1, ); } struct commit_list *reduce_heads(struct commit_list *heads) diff --git a/commit-reach.h b/commit-reach.h index 52667d64ac..a0d4a29d25 100644 --- a/commit-reach.h +++ b/commit-reach.h @@ -27,8 +27,16 @@ struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r, struct commit_list *get_octopus_merge_bases(struct commit_list *in); int is_descendant_of(struct commit *commit, struct commit_list *with_commit); -int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit **reference); -int in_merge_bases(struct commit *commit, struct commit *reference); +int repo_in_merge_bases(struct repository *r, + struct commit *commit, + struct commit *reference); +int repo_in_merge_bases_many(struct repository *r, +struct commit *commit, +int nr_reference, struct commit **reference); +#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS +#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2) +#define in_merge_bases_many(c1, n, cs) repo_in_merge_bases_many(the_repository, c1, n, cs) +#endif /* * Takes a list of commits and returns a new list where those diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci index f6c2915a4e..8c6a71bf64 100644 --- a/contrib/coccinelle/the_repository.pending.cocci +++ b/contrib/coccinelle/the_repository.pending.cocci @@ -90,3 +90,20 @@ expression G; - get_merge_bases_many_dirty( + repo_get_merge_bases_many_dirty(the_repository, E, F, G); + +@@ +expression E; +expression F; +@@ +- in_merge_bases( ++ repo_in_merge_bases(the_repository, + E, F); + +@@ +expression E; +expression F; +expression G; +@@ +- in_merge_bases_many( ++ repo_in_merge_bases_many(the_repository, + E, F, G); -- 2.19.1.1215.g8438c0b245-goog
[PATCH 11/23] commit-reach.c: allow get_merge_bases_many_0 to handle any repo
Signed-off-by: Stefan Beller --- commit-reach.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 81015830cb..b3b1f62aba 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -216,7 +216,8 @@ static int remove_redundant(struct repository *r, struct commit **array, int cnt return filled; } -static struct commit_list *get_merge_bases_many_0(struct commit *one, +static struct commit_list *get_merge_bases_many_0(struct repository *r, + struct commit *one, int n, struct commit **twos, int cleanup) @@ -226,7 +227,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, struct commit_list *result; int cnt, i; - result = merge_bases_many(the_repository, one, n, twos); + result = merge_bases_many(r, one, n, twos); for (i = 0; i < n; i++) { if (one == twos[i]) return result; @@ -249,7 +250,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, clear_commit_marks(one, all_flags); clear_commit_marks_many(n, twos, all_flags); - cnt = remove_redundant(the_repository, rslt, cnt); + cnt = remove_redundant(r, rslt, cnt); result = NULL; for (i = 0; i < cnt; i++) commit_list_insert_by_date(rslt[i], ); @@ -261,19 +262,19 @@ struct commit_list *get_merge_bases_many(struct commit *one, int n, struct commit **twos) { - return get_merge_bases_many_0(one, n, twos, 1); + return get_merge_bases_many_0(the_repository, one, n, twos, 1); } struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, struct commit **twos) { - return get_merge_bases_many_0(one, n, twos, 0); + return get_merge_bases_many_0(the_repository, one, n, twos, 0); } struct commit_list *get_merge_bases(struct commit *one, struct commit *two) { - return get_merge_bases_many_0(one, 1, , 1); + return get_merge_bases_many_0(the_repository, one, 1, , 1); } /* -- 2.19.1.1215.g8438c0b245-goog
[PATCH 03/23] object-store: allow read_object_file_extended to read from any repo
read_object_file_extended is not widely used, so migrate it all at once. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- object-store.h | 5 +++-- sha1-file.c| 11 ++- streaming.c| 2 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/object-store.h b/object-store.h index 63b7605a3e..3d98a682b2 100644 --- a/object-store.h +++ b/object-store.h @@ -161,12 +161,13 @@ void sha1_file_name(struct repository *r, struct strbuf *buf, const unsigned cha void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned long *size); -extern void *read_object_file_extended(const struct object_id *oid, +extern void *read_object_file_extended(struct repository *r, + const struct object_id *oid, enum object_type *type, unsigned long *size, int lookup_replace); static inline void *read_object_file(const struct object_id *oid, enum object_type *type, unsigned long *size) { - return read_object_file_extended(oid, type, size, 1); + return read_object_file_extended(the_repository, oid, type, size, 1); } /* Read and unpack an object file into memory, write memory to an object file */ diff --git a/sha1-file.c b/sha1-file.c index 856e000ee1..c5b704aec5 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1403,7 +1403,8 @@ int pretend_object_file(void *buf, unsigned long len, enum object_type type, * deal with them should arrange to call read_object() and give error * messages themselves. */ -void *read_object_file_extended(const struct object_id *oid, +void *read_object_file_extended(struct repository *r, + const struct object_id *oid, enum object_type *type, unsigned long *size, int lookup_replace) @@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct object_id *oid, const char *path; struct stat st; const struct object_id *repl = lookup_replace ? - lookup_replace_object(the_repository, oid) : oid; + lookup_replace_object(r, oid) : oid; errno = 0; - data = read_object(the_repository, repl->hash, type, size); + data = read_object(r, repl->hash, type, size); if (data) return data; @@ -1428,11 +1429,11 @@ void *read_object_file_extended(const struct object_id *oid, die(_("replacement %s not found for %s"), oid_to_hex(repl), oid_to_hex(oid)); - if (!stat_sha1_file(the_repository, repl->hash, , )) + if (!stat_sha1_file(r, repl->hash, , )) die(_("loose object %s (stored in %s) is corrupt"), oid_to_hex(repl), path); - if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL) + if ((p = has_packed_and_bad(r, repl->hash)) != NULL) die(_("packed object %s (stored in %s) is corrupt"), oid_to_hex(repl), p->pack_name); diff --git a/streaming.c b/streaming.c index d1e6b2dce6..c843a1230f 100644 --- a/streaming.c +++ b/streaming.c @@ -490,7 +490,7 @@ static struct stream_vtbl incore_vtbl = { static open_method_decl(incore) { - st->u.incore.buf = read_object_file_extended(oid, type, >size, 0); + st->u.incore.buf = read_object_file_extended(the_repository, oid, type, >size, 0); st->u.incore.read_ptr = 0; st->vtbl = _vtbl; -- 2.19.1.1215.g8438c0b245-goog
[PATCH 09/23] commit-reach.c: allow merge_bases_many to handle any repo
Signed-off-by: Stefan Beller --- commit-reach.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index 67c2e43d5e..a53b31e6a2 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -95,7 +95,9 @@ static struct commit_list *paint_down_to_common(struct repository *r, return result; } -static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos) +static struct commit_list *merge_bases_many(struct repository *r, + struct commit *one, int n, + struct commit **twos) { struct commit_list *list = NULL; struct commit_list *result = NULL; @@ -110,14 +112,14 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co return commit_list_insert(one, ); } - if (parse_commit(one)) + if (repo_parse_commit(r, one)) return NULL; for (i = 0; i < n; i++) { - if (parse_commit(twos[i])) + if (repo_parse_commit(r, twos[i])) return NULL; } - list = paint_down_to_common(the_repository, one, n, twos, 0); + list = paint_down_to_common(r, one, n, twos, 0); while (list) { struct commit *commit = pop_commit(); @@ -224,7 +226,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, struct commit_list *result; int cnt, i; - result = merge_bases_many(one, n, twos); + result = merge_bases_many(the_repository, one, n, twos); for (i = 0; i < n; i++) { if (one == twos[i]) return result; -- 2.19.1.1215.g8438c0b245-goog
[PATCH 10/23] commit-reach.c: allow remove_redundant to handle any repo
Signed-off-by: Stefan Beller --- commit-reach.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/commit-reach.c b/commit-reach.c index a53b31e6a2..81015830cb 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -156,7 +156,7 @@ struct commit_list *get_octopus_merge_bases(struct commit_list *in) return ret; } -static int remove_redundant(struct commit **array, int cnt) +static int remove_redundant(struct repository *r, struct commit **array, int cnt) { /* * Some commit in the array may be an ancestor of @@ -174,7 +174,7 @@ static int remove_redundant(struct commit **array, int cnt) ALLOC_ARRAY(filled_index, cnt - 1); for (i = 0; i < cnt; i++) - parse_commit(array[i]); + repo_parse_commit(r, array[i]); for (i = 0; i < cnt; i++) { struct commit_list *common; uint32_t min_generation = array[i]->generation; @@ -190,7 +190,7 @@ static int remove_redundant(struct commit **array, int cnt) if (array[j]->generation < min_generation) min_generation = array[j]->generation; } - common = paint_down_to_common(the_repository, array[i], filled, + common = paint_down_to_common(r, array[i], filled, work, min_generation); if (array[i]->object.flags & PARENT2) redundant[i] = 1; @@ -249,7 +249,7 @@ static struct commit_list *get_merge_bases_many_0(struct commit *one, clear_commit_marks(one, all_flags); clear_commit_marks_many(n, twos, all_flags); - cnt = remove_redundant(rslt, cnt); + cnt = remove_redundant(the_repository, rslt, cnt); result = NULL; for (i = 0; i < cnt; i++) commit_list_insert_by_date(rslt[i], ); @@ -370,7 +370,7 @@ struct commit_list *reduce_heads(struct commit_list *heads) p->item->object.flags &= ~STALE; } } - num_head = remove_redundant(array, num_head); + num_head = remove_redundant(the_repository, array, num_head); for (i = 0; i < num_head; i++) tail = _list_insert(array[i], tail)->next; free(array); -- 2.19.1.1215.g8438c0b245-goog
[PATCHv3 00/23] Bring more repository handles into our code base
This resends origin/sb/more-repo-in-api. Unlike the previous resend (v2), this is not rebased to a newer base. This doesn't contain the patch for pending semantic changes, as that seems to go in its own topic branch. Please have a look at the last 4 patches specifically as they were new in the last iteration (but did not receive any comment), as they demonstrate and fix a problem that is only exposed when using GIT_TEST_COMMIT_GRAPH=1 for the test suite. Previous discussion at https://public-inbox.org/git/20181030220817.61691-1-sbel...@google.com/ Thanks, Stefan Stefan Beller (23): sha1_file: allow read_object to read objects in arbitrary repositories packfile: allow has_packed_and_bad to handle arbitrary repositories object-store: allow read_object_file_extended to read from any repo object-store: prepare read_object_file to deal with any repo object-store: prepare has_{sha1, object}_file to handle any repo object: parse_object to honor its repository argument commit: allow parse_commit* to handle any repo commit-reach.c: allow paint_down_to_common to handle any repo commit-reach.c: allow merge_bases_many to handle any repo commit-reach.c: allow remove_redundant to handle any repo commit-reach.c: allow get_merge_bases_many_0 to handle any repo commit-reach: prepare get_merge_bases to handle any repo commit-reach: prepare in_merge_bases[_many] to handle any repo commit: prepare get_commit_buffer to handle any repo commit: prepare repo_unuse_commit_buffer to handle any repo commit: prepare logmsg_reencode to handle arbitrary repositories pretty: prepare format_commit_message to handle arbitrary repositories submodule: use submodule repos for object lookup submodule: don't add submodule as odb for push commit-graph: convert remaining functions to handle any repo commit: prepare free_commit_buffer and release_commit_memory for any repo path.h: make REPO_GIT_PATH_FUNC repository agnostic t/helper/test-repository: celebrate independence from the_repository builtin/fsck.c| 3 +- builtin/log.c | 6 +- builtin/rev-list.c| 3 +- cache.h | 2 + commit-graph.c| 40 +++-- commit-reach.c| 73 + commit-reach.h| 38 +++-- commit.c | 41 ++--- commit.h | 43 +- .../coccinelle/the_repository.pending.cocci | 144 ++ object-store.h| 35 - object.c | 8 +- packfile.c| 5 +- packfile.h| 2 +- path.h| 2 +- pretty.c | 28 ++-- pretty.h | 7 +- sha1-file.c | 34 +++-- streaming.c | 2 +- submodule.c | 76 ++--- t/helper/test-repository.c| 10 ++ 21 files changed, 452 insertions(+), 150 deletions(-) create mode 100644 contrib/coccinelle/the_repository.pending.cocci Range-diff: 1: 1b9b5c695e = 1: ca9fece80e sha1_file: allow read_object to read objects in arbitrary repositories 2: 33b94066f2 = 2: 8eac25fe32 packfile: allow has_packed_and_bad to handle arbitrary repositories 3: 5217b6b1e1 ! 3: 06e5f83b66 object-store: allow read_object_file_extended to read from arbitrary repositories @@ -1,6 +1,6 @@ Author: Stefan Beller -object-store: allow read_object_file_extended to read from arbitrary repositories +object-store: allow read_object_file_extended to read from any repo read_object_file_extended is not widely used, so migrate it all at once. 4: 2b7239b55b ! 4: 6167722608 object-store: prepare read_object_file to deal with arbitrary repositories @@ -1,6 +1,6 @@ Author: Stefan Beller -object-store: prepare read_object_file to deal with arbitrary repositories +object-store: prepare read_object_file to deal with any repo As read_object_file is a widely used function (which is also regularly used in new code in flight between master..pu), changing its signature is painful @@ -13,16 +13,14 @@ e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21) Add a coccinelle patch to convert existing callers, but do not apply -the resulting patch from 'make coccicheck' to keep the diff of this -patch small. +the resulting patch to keep the diff of this patch small. Signed-off-by: Stefan Beller -Signed-off-by: Junio C Hamano
[PATCH 01/23] sha1_file: allow read_object to read objects in arbitrary repositories
Allow read_object (a file local functon in sha1_file) to handle arbitrary repositories by passing the repository down to oid_object_info_extended. Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- sha1-file.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/sha1-file.c b/sha1-file.c index dd0b6aa873..b8ce21cbaf 100644 --- a/sha1-file.c +++ b/sha1-file.c @@ -1361,7 +1361,9 @@ int oid_object_info(struct repository *r, return type; } -static void *read_object(const unsigned char *sha1, enum object_type *type, +static void *read_object(struct repository *r, +const unsigned char *sha1, +enum object_type *type, unsigned long *size) { struct object_id oid; @@ -1373,7 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type, hashcpy(oid.hash, sha1); - if (oid_object_info_extended(the_repository, , , 0) < 0) + if (oid_object_info_extended(r, , , 0) < 0) return NULL; return content; } @@ -1414,7 +1416,7 @@ void *read_object_file_extended(const struct object_id *oid, lookup_replace_object(the_repository, oid) : oid; errno = 0; - data = read_object(repl->hash, type, size); + data = read_object(the_repository, repl->hash, type, size); if (data) return data; @@ -1755,7 +1757,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime) if (has_loose_object(oid)) return 0; - buf = read_object(oid->hash, , ); + buf = read_object(the_repository, oid->hash, , ); if (!buf) return error(_("cannot read sha1_file for %s"), oid_to_hex(oid)); hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), len) + 1; -- 2.19.1.1215.g8438c0b245-goog
Re: rebase-in-C stability for 2.20
> But maybe I'm being overly paranoid. What do those more familiar with > this think? I am not too worried, * as rebase is a main porcelain, that is even hard to use in a script. so any failures are not deep down in some automation, but when found exposed quickly (and hopefully reported). * 5541bd5b8f was merged to next a month ago; internally we distribute the next branch to Googlers (on a weekly basis) and we have not had any bug reports regarding rebase. (Maybe our environment is too strict for the wide range of bugs reported) * Johannes reported that the rebase is used in GfW https://public-inbox.org/git/nycvar.qro.7.76.6.1808241320540...@tvgsbejvaqbjf.bet/ https://github.com/git-for-windows/git/pull/1800 and from my cursory reading it is part of 2.19.windows, which has a large user base. > (and would re-enable rebase.useBuiltin=true in > master right after 2.20 is out the door). That would be fine with me as well, but I'd rather document rebase.useBuiltin instead of flip-flopping the switch around the release. Have there been any fixes that are only in the C version (has the shell version already bitrotted)? Stefan
[PATCH] diff: align move detection error handling with other options
This changes the error handling for the options --color-moved-ws and --color-moved-ws to be like the rest of the options. Move the die() call out of parse_color_moved_ws into the parsing of command line options. As the function returns a bit field, change its signature to return an unsigned instead of an int; add a new bit to signal errors. Once the error is signaled, we discard the other bits, such that it doesn't matter if the error bit overlaps with any other bit. Signed-off-by: Stefan Beller --- c.f. ./git -c diff.colormovedws=bogus diff HEAD error: unknown color-moved-ws mode 'bogus' fatal: unable to parse 'diff.colormovedws' from command-line config ./git -c core.abbrev=41 diff error: abbrev length out of range: 41 fatal: unable to parse 'core.abbrev' from command-line config ./git diff --color=bogus error: option `color' expects "always", "auto", or "never" ./git -c diff.colormovedws=bogus diff HEAD error: unknown color-moved-ws mode 'bogus' fatal: unable to parse 'diff.colormovedws' from command-line config diff.c | 25 - diff.h | 3 ++- t/t4015-diff-whitespace.sh | 18 ++ 3 files changed, 36 insertions(+), 10 deletions(-) diff --git a/diff.c b/diff.c index 8647db3d30..d7d467b605 100644 --- a/diff.c +++ b/diff.c @@ -291,7 +291,7 @@ static int parse_color_moved(const char *arg) return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed-zebra', 'plain'")); } -static int parse_color_moved_ws(const char *arg) +static unsigned parse_color_moved_ws(const char *arg) { int ret = 0; struct string_list l = STRING_LIST_INIT_DUP; @@ -312,15 +312,19 @@ static int parse_color_moved_ws(const char *arg) ret |= XDF_IGNORE_WHITESPACE; else if (!strcmp(sb.buf, "allow-indentation-change")) ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE; - else - error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf); + else { + ret |= COLOR_MOVED_WS_ERROR; + error(_("unknown color-moved-ws mode '%s', possible values are 'ignore-space-change', 'ignore-space-at-eol', 'ignore-all-space', 'allow-indentation-change'"), sb.buf); + } strbuf_release(); } if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) && - (ret & XDF_WHITESPACE_FLAGS)) - die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes")); + (ret & XDF_WHITESPACE_FLAGS)) { + error(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes")); + ret |= COLOR_MOVED_WS_ERROR; + } string_list_clear(, 0); @@ -341,8 +345,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) return 0; } if (!strcmp(var, "diff.colormovedws")) { - int cm = parse_color_moved_ws(value); - if (cm < 0) + unsigned cm = parse_color_moved_ws(value); + if (cm & COLOR_MOVED_WS_ERROR) return -1; diff_color_moved_ws_default = cm; return 0; @@ -5032,10 +5036,13 @@ int diff_opt_parse(struct diff_options *options, else if (skip_prefix(arg, "--color-moved=", )) { int cm = parse_color_moved(arg); if (cm < 0) - die("bad --color-moved argument: %s", arg); + return error("bad --color-moved argument: %s", arg); options->color_moved = cm; } else if (skip_prefix(arg, "--color-moved-ws=", )) { - options->color_moved_ws_handling = parse_color_moved_ws(arg); + unsigned cm = parse_color_moved_ws(arg); + if (cm & COLOR_MOVED_WS_ERROR) + return -1; + options->color_moved_ws_handling = cm; } else if (skip_to_optional_arg_default(arg, "--color-words", >word_regex, NULL)) { options->use_color = 1; options->word_diff = DIFF_WORDS_COLOR; diff --git a/diff.h b/diff.h index ce5e8a8183..9e8061ca29 100644 --- a/diff.h +++ b/diff.h @@ -225,7 +225,8 @@ struct diff_options { /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */ #define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5) - int color_moved_ws_handling; + #define COLOR_MOVED_WS_ERROR (1<<0) + unsigned color_moved_ws_handling; struct repository *repo; }; diff --git a/t/t4015-diff-whitespace.sh b/t/t4
Re: [PATCH 1/1] bundle: refuse to create empty bundle
On Tue, Nov 13, 2018 at 12:33 PM Gaël Lhez wrote: > > Hello, > > I don't know why I receive these message (and especially now given the time > at which I pushed this) but I suppose someone (Johannes Schindelin ?) > probably pushed back my original commit from git for windows github to GIT > git repository. Yes that is pretty much what is happening. Johannes (GfW maintainer) tries to push a lot of patches upstream to git and cc's people who originally authored the patch. Thanks for taking a look, again, even after this long time! > > If you think "bundle: cleanup lock files on error" is better, then no problem > with me. I'm not a native english speaker and I simply expressed the reason > for my problem but - after reading back my commit - neither this mail' > subject and my original commit subject (see > https://github.com/git-for-windows/git/pull/797/commits/0ef742a1a92ae53188189238adbd16086fabf80a) > express the core problem. I am not a native speaker either, which makes it extra hard to understand some commits. ;-) So I propose a wording which would have helped me. > As I'm not accustomed to pushing on GIT 'git repository' , please let me know > if I have something else to do ? I don't know how Johannes handles pushing changes upstream, maybe he will take on the work of resending a reworded patch. Let's hear his thoughts on it. I would guess you're more than welcome to take your patches from GitForWindows into Git itself. Cheers, Stefan
Re: [PATCH 1/1] bundle: refuse to create empty bundle
On Tue, Nov 13, 2018 at 7:09 AM Gaël Lhez via GitGitGadget wrote: > > From: =?UTF-8?q?Ga=C3=ABl=20Lhez?= > > When an user tries to create an empty bundle via `git bundle create > ` where `` resolves to an empty list (for > example, like `master..master`), the command fails and warns the user > about how it does not want to create empty bundle. > > However, the `.lock` file was still open and on Windows that means > that it could not be deleted properly. This patch fixes that issue. > > This closes https://github.com/git-for-windows/git/issues/790 > > Signed-off-by: Gaël Lhez > Signed-off-by: Johannes Schindelin The code and the commit message make sense, but by reading the subject line I would have expected a different commit. Maybe "bundle: cleanup lock files on error"
Re: Re* [PATCH v3 1/1] protocol: advertise multiple supported versions
On Mon, Nov 12, 2018 at 7:24 PM Junio C Hamano wrote: > > Stefan Beller writes: > > >> + if (have_advertised_versions_already) > >> + BUG(_("attempting to register an allowed protocol version > >> after advertisement")); > > > > If this is a real BUG (due to wrong program flow) instead of bad user input, > > we would not want to burden the translators with this message. > > > > If it is a message that the user is intended to see upon bad input > > we'd rather go with die(_("translatable text here")); > > Yeah, good suggestion. > > Perhaps we should spell it out in docstrings for BUG/die with the > above rationale. A weatherbaloon patch follows. "Nobody reads documentation any more, we have too much of it." [1] [1] c.f. https://quoteinvestigator.com/2014/08/29/too-crowded/ I would have hoped to have a .cocci patch in the bad style category, i.e. disallowing the _() function inside the context of BUG(). That said, I like the patch below. Thanks for writing it. > git-compat-util.h | 22 +- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/git-compat-util.h b/git-compat-util.h > index 96a3f86d8e..a1cf68cbbc 100644 > --- a/git-compat-util.h > +++ b/git-compat-util.h > @@ -420,7 +420,16 @@ static inline char *git_find_last_dir_sep(const char > *path) > > struct strbuf; > > -/* General helper functions */ > +/* > + * General helper functions > + * > + * Note that these are to produce end-user facing messages, and most > + * of them should be marked for translation (the exception is > + * messages generated to be sent over the wire---as we currently do not > + * have a mechanism to notice and honor locale settings used on the > + * other end, leave them untranslated. We should *not* send messages > + * that are localized for our end). > + */ > extern void vreportf(const char *prefix, const char *err, va_list params); > extern NORETURN void usage(const char *err); > extern NORETURN void usagef(const char *err, ...) __attribute__((format > (printf, 1, 2))); > @@ -1142,6 +1151,17 @@ static inline int regexec_buf(const regex_t *preg, > const char *buf, size_t size, > /* usage.c: only to be used for testing BUG() implementation (see test-tool) > */ > extern int BUG_exit_code; > > +/* > + * BUG(fmt, ...) is to be used when the problem of reaching that point > + * in code can only be caused by a program error. To abort with a > + * message to report impossible-to-continue condition due to external > + * factors like end-user input, environment settings, data it was fed > + * over the wire, use die(_(fmt),...). > + * > + * Note that the message from BUG() should not be marked for > + * translation while the message from die() is in general end-user > + * facing and should be marked for translation. > + */ > #ifdef HAVE_VARIADIC_MACROS > __attribute__((format (printf, 3, 4))) NORETURN > void BUG_fl(const char *file, int line, const char *fmt, ...);
Re: [PATCH] remote-curl: die on server-side errors
On Mon, Nov 12, 2018 at 2:45 PM wrote: > > When a smart HTTP server sends an error message via pkt-line, > remote-curl will fail to detect the error (which usually results in > incorrectly falling back to dumb-HTTP mode). > > This patch adds a check in discover_refs() for server-side error > messages, as well as a test case for this issue. > > Signed-off-by: Josh Steadmon Reviewed-by: Stefan Beller
Re: [PATCH] Makefile: use CXXFLAGS for linking fuzzers
On Mon, Nov 12, 2018 at 2:03 PM wrote: > > OSS-Fuzz requires C++-specific flags to link fuzzers. Passing these in > CFLAGS causes lots of build warnings. Using separate CXXFLAGS avoids > this. > That makes sense in this context, > CFLAGS = -g -O2 -Wall > +CXXFLAGS ?= $(CFLAGS) ... but out of context, just by reading the relevant part of the Makefile, a user might mistakenly assume we do some C++ trickery for standard compilation of Git. (Is that bad or do we just not care?) I wonder if setting the CXXFLAGS near or in the fuzz target would be better. > LDFLAGS = > ALL_CFLAGS = $(CPPFLAGS) $(CFLAGS) > ALL_LDFLAGS = $(LDFLAGS) > @@ -3098,14 +3099,14 @@ cover_db_html: cover_db > # An example command to build against libFuzzer from LLVM 4.0.0: > # > # make CC=clang CXX=clang++ \ > -# CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ > +# CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \ > # LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \ > # fuzz-all > # > .PHONY: fuzz-all Maybe here? > > $(FUZZ_PROGRAMS): all > - $(QUIET_LINK)$(CXX) $(CFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \ > + $(QUIET_LINK)$(CXX) $(CXXFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \ > $(XDIFF_OBJS) $(EXTLIBS) git.o $@.o $(LIB_FUZZING_ENGINE) -o > $@ Thanks, Stefan
Re: [PATCH v3 1/1] protocol: advertise multiple supported versions
On Mon, Nov 12, 2018 at 1:49 PM wrote: > > Currently the client advertises that it supports the wire protocol > version set in the protocol.version config. However, not all services > support the same set of protocol versions. When connecting to > git-receive-pack, the client automatically downgrades to v0 if > config.protocol is set to v2, but this check is not performed for other > services. > > This patch creates a protocol version registry. Individual operations > register all the protocol versions they support prior to communicating > with a server. Versions should be listed in preference order; the > version specified in protocol.version will automatically be moved to the > front of the registry. > > The protocol version registry is passed to remote helpers via the > GIT_PROTOCOL environment variable. > > Clients now advertise the full list of registered versions. Servers > select the first recognized version from this advertisement. > > Signed-off-by: Josh Steadmon Thanks for resending this patch! +cc Jonathan Tan, who works a lot on the protocol side of things. > +void register_allowed_protocol_version(enum protocol_version version) > +{ > + if (have_advertised_versions_already) > + BUG(_("attempting to register an allowed protocol version > after advertisement")); If this is a real BUG (due to wrong program flow) instead of bad user input, we would not want to burden the translators with this message. If it is a message that the user is intended to see upon bad input we'd rather go with die(_("translatable text here")); > diff --git a/protocol.h b/protocol.h > index 2ad35e433c..b67b2259de 100644 > --- a/protocol.h > +++ b/protocol.h > @@ -16,6 +16,23 @@ enum protocol_version { > */ > extern enum protocol_version get_protocol_version_config(void); > > +/* > + * Register an allowable protocol version for a given operation. Registration > + * must occur before attempting to advertise a version to a server process. > + */ > +extern void register_allowed_protocol_version(enum protocol_version version); We keep extern here as to imitate the file-local style, although Documentation/CodingGuidelines would prefer to not have the extern keywords. Okay. Bonus points for converting the file to omit all extern keywords in a resend. :-) (I think there is no other series currently in flight touching this header file, so it is safe to convert it "while at it", `git log --grep "while at it"` is shorter than expected.) All that said, it's only nits that I contributed to this code review; the code/design looks good to me, and if I were maintainer I'd include it as-is, as it fixes a long(ish) standing protocol error. Thanks, Stefan
Re: [PATCH 0/9] caching loose objects
On Mon, Nov 12, 2018 at 8:02 AM Derrick Stolee wrote: > This cleanup is actually really valuable, and affects much more than > this application. I second this. I'd value this series more for the cleanup than its application. ;-)
Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
On Mon, Nov 12, 2018 at 8:09 AM Jeff King wrote: > > On Mon, Nov 12, 2018 at 10:48:36AM -0500, Derrick Stolee wrote: > > > > If the "the first one is the main store, the rest are alternates" bit is > > > too subtle, we could mark each "struct object_directory" with a bit for > > > "is_local". > > > > This is probably a good thing to do proactively. We have the equivalent in > > the packed_git struct, but that's also because they get out of order. At the > > moment, I can't think of a read-only action that needs to treat the local > > object directory more carefully. The closest I know about is 'git > > pack-objects --local', but that also writes a pack-file. > > > > I assume that when we write a pack-file to the "default location" we use > > get_object_directory() instead of referring to the default object_directory? > > Generally, yes, though that should eventually be going away in favor of > accessing it via a "struct repository". And after my series, > get_object_directory() is just returning the_repository->objects->odb->path > (i.e., using the "first one is main" rule). > > One thing that makes me nervous about a "local" flag in each struct is > that it implies that it's the source of truth for where to write to. So > what does git_object_directory() look like after that? Do we leave it > with the "first one is main" rule? Or does it become: s/git/get/ ;-) get_object_directory is very old and was introduced in e1b10391ea (Use git config file for committer name and email info, 2005-10-11) by Linus. I would argue that we might want to get rid of that function now, actually as it doesn't seem to add value to the code (assuming the BUG never triggers), and using a_repo->objects->objectdir or after this series a_repo->objects->odb->path; is just as short. $ git grep get_object_directory |wc -l 30 $ git grep -- "->objects->objectdir" |wc -l 10 Ah well, we're not there yet. > for (odb = the_repository->objects->odb; odb; odb = odb->next) { > if (odb->local) > return odb->path; > } > return NULL; /* yikes? */ > > ? That feels like it's making things more complicated, not less. It depends if the caller cares about the local flag. I'd think we can have more than one local, eventually? Just think of the partial clone stuff that may have a local set of promised stuff and another set of actual objects, which may be stored in different local odbs. If the caller cares about the distinction, they would need to write out this loop as above themselves. If they don't care, we could migrate them to not use this function, so we can get rid of it? > > > - for (odb = r->objects->alt_odb_list; odb; odb = odb->next) { > > > - prepare_multi_pack_index_one(r, odb->path, 0); > > > - prepare_packed_git_one(r, odb->path, 0); > > > + for (odb = r->objects->odb; odb; odb = odb->next) { > > > + int local = (odb == r->objects->odb); > > > > Here seems to be a place where `odb->is_local` would help. > > Yes, though I don't mind this spot in particular, as the check is pretty > straight-forward. > > I think an example that would benefit more is the check_and_freshen() > stuff. There we have two almost-the-same wrappers, one of which operates > on just the first element of the list, and the other of which operates > on all of the elements after the first. > > It could become: > > static int check_and_freshen_odb(struct object_directory *odb_list, >const struct object_id *oid, >int freshen, >int local) > { > struct object_directory *odb; > > for (odb = odb_list; odb; odb = odb->next) { > static struct strbuf path = STRBUF_INIT; > > if (odb->local != local) > continue; > > odb_loose_path(odb, , oid->hash); > return check_and_freshen_file(path.buf, freshen); > } > } > > int check_and_freshen_local(const struct object_id *oid, int freshen) > { > return check_and_freshen_odb(the_repository->objects->odb, oid, > freshen, 1); > } > > int check_and_freshen_nonlocal(const struct object_id *oid, int freshen) > { > return check_and_freshen_odb(the_repository->objects->odb, oid, > freshen, 0); > } > I am fine with (a maybe better documented) "first is local" rule, but the code above looks intriguing, except a little wasteful (we need two full loops in check_and_freshen, but ideally we can do by just one loop). What does the local flag mean anyway in a world where we have many odbs in a repository, that are not distinguishable except by their order? AFAICT it is actually to be used for differentiating how much we care in fsck/cat-file/packing, as it may be borrowed from an alternate, so maybe the flag is rather to be named after
Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir
On Mon, Nov 12, 2018 at 7:48 AM Derrick Stolee wrote: > [... lots of quoted text...] Some email readers are very good at recognizing unchanged quoted text and collapse it, not so at https://public-inbox.org/git/421d3b43-3425-72c9-218e-facd86e28...@gmail.com/ which I use to read through this series. It would help if you'd cut most of the (con)text that is not nearby to your reply, as I read the context email just before your reply. Thanks, Stefan
Re: [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with()
On Mon, Nov 12, 2018 at 6:47 AM Jeff King wrote: > > Using strip_suffix() lets us avoid repeating ourselves. It also makes > the handling of "/" a bit less subtle (we strip one less character than > we matched in order to leave it in place, but we can just as easily > include the "/" when we add more path components). > > Signed-off-by: Jeff King This makes sense. Thanks! (This patch caught my attention as it's a submodule thing, but now looking at the rest of the series)
[PATCH] coccicheck: introduce 'pending' semantic patches
From: SZEDER Gábor Teach `make coccicheck` to avoid patches named "*.pending.cocci" and handle them separately in a new `make coccicheck-pending` instead. This means that we can separate "critical" patches from "FYI" patches. The former target can continue causing Travis to fail its static analysis job, while the latter can let us keep an eye on ongoing (pending) transitions without them causing too much fallout. Document the intended use-cases around these two targets. As the process around the pending patches is not yet fully explored, leave that out. Based-on-work-by: SZEDER Gábor Signed-off-by: Stefan Beller Signed-off-by: Junio C Hamano --- I dialed back on the workflow, as we may want to explore it first before writing it down. Stefan Makefile | 7 +-- contrib/coccinelle/README | 41 +++ 2 files changed, 46 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index bbfbb4292d..6bc4654828 100644 --- a/Makefile +++ b/Makefile @@ -2740,9 +2740,12 @@ endif then \ echo '' SPATCH result: $@; \ fi -coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci)) +coccicheck: $(addsuffix .patch,$(filter-out %.pending.cocci,$(wildcard contrib/coccinelle/*.cocci))) -.PHONY: coccicheck +# See contrib/coccinelle/README +coccicheck-pending: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.pending.cocci)) + +.PHONY: coccicheck coccicheck-pending ### Installation rules diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README index 9c2f8879c2..f0e80bd7f0 100644 --- a/contrib/coccinelle/README +++ b/contrib/coccinelle/README @@ -1,2 +1,43 @@ This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/) semantic patches that might be useful to developers. + +There are two types of semantic patches: + + * Using the semantic transformation to check for bad patterns in the code; + The target 'make coccicheck' is designed to check for these patterns and + it is expected that any resulting patch indicates a regression. + The patches resulting from 'make coccicheck' are small and infrequent, + so once they are found, they can be sent to the mailing list as per usual. + + Example for introducing new patterns: + 67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28) + b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24) + + Example of fixes using this approach: + 248f66ed8e (run-command: use strbuf_addstr() for adding a string to + a strbuf, 2018-03-25) + f919ffebed (Use MOVE_ARRAY, 2018-01-22) + + These types of semantic patches are usually part of testing, c.f. + 0860a7641b (travis-ci: fail if Coccinelle static analysis found something + to transform, 2018-07-23) + + * Using semantic transformations in large scale refactorings throughout + the code base. + + When applying the semantic patch into a real patch, sending it to the + mailing list in the usual way, such a patch would be expected to have a + lot of textual and semantic conflicts as such large scale refactorings + change function signatures that are used widely in the code base. + A textual conflict would arise if surrounding code near any call of such + function changes. A semantic conflict arises when other patch series in + flight introduce calls to such functions. + + So to aid these large scale refactorings, semantic patches can be used. + However we do not want to store them in the same place as the checks for + bad patterns, as then automated builds would fail. + That is why semantic patches 'contrib/coccinelle/*.pending.cocci' + are ignored for checks, and can be applied using 'make coccicheck-pending'. + + This allows to expose plans of pending large scale refactorings without + impacting the bad pattern checks. -- 2.19.1.930.g4563a0d9d0-goog
Re: [PATCH] Makefile: add pending semantic patches
On Thu, Nov 8, 2018 at 9:18 PM Junio C Hamano wrote: > > Stefan Beller writes: > > > From: SZEDER Gábor > > > > Add a description and place on how to use coccinelle for large refactorings > > that happen only once. > > > > Based-on-work-by: SZEDER Gábor > > Signed-off-by: Stefan Beller > > --- > > > > I consider including this patch in a resend instead. > > It outlays the basics of such a new workflow, which we can refine later. > > Thanks for tying loose ends. > > > diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README > > index 9c2f8879c2..fa09d1abcc 100644 > > --- a/contrib/coccinelle/README > > +++ b/contrib/coccinelle/README > > @@ -1,2 +1,62 @@ > > This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/) > > semantic patches that might be useful to developers. > > + > > +There are two types of semantic patches: > > + > > + * Using the semantic transformation to check for bad patterns in the code; > > + This is what the original target 'make coccicheck' is designed to do and > > + it is expected that any resulting patch indicates a regression. > > + The patches resulting from 'make coccicheck' are small and infrequent, > > + so once they are found, they can be sent to the mailing list as per > > usual. > > + > > + Example for introducing new patterns: > > + 67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28) > > + b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24) > > + > > + Example of fixes using this approach: > > + 248f66ed8e (run-command: use strbuf_addstr() for adding a string to > > + a strbuf, 2018-03-25) > > + f919ffebed (Use MOVE_ARRAY, 2018-01-22) > > + > > + These types of semantic patches are usually part of testing, c.f. > > + 0860a7641b (travis-ci: fail if Coccinelle static analysis found > > something > > + to transform, 2018-07-23) > > Yup, and I think what we have in 'pu' (including your the_repository > stuff) falls into this category. My impression was that the_repository is strongly second category as the_repository.cocci doesn't fix bad smells of code, but proposes a refactoring that we agreed on doing. > I've been paying attention to "make coccicheck" produced patches for > the past few weeks, and so far, I found it _much_ _much_ more > pleasant, compared to having to worry about merge conflicts with the > topics in flight that changes day to day (not just because we add > new topics or update existing topics, but also the order of the > merge changes as topics mature at different rates and jumps over > each other in master..pu history), that "make coccicheck" after > topics are merged to integration branches fixes these issues up as > needed. So from your end we would not need the "pending" category as long as the follow ups come in a timely manner? > > > + 3) Apply the semantic patch only partially, where needed for the patch > > series > > + that motivates the large scale refactoring and then build that series > > + on top of it. > > It is not quite clear what "needed for the patch series" really > means in the context of this paragraph. What are the changes that > are not needed, that is still produced if we ran "make coccicheck"? An example for "needed" would be 3f21279f50..bd8737176b whereas "not needed" is what is in "treewide: apply cocci patch". The treewide conversion of e.g. unuse_commit_buffer to repo_unuse_commit_buffer is nice, but "needed" only in its followup patch that converts logmsg_reencode as that calls into the unuse_commit_buffer. > Are they wrong changes (e.g. a carelessly written read_cache() to > read_index(_index) conversion may munge the implementation of > read_cache(...) { return read_index(_index, ...); } and make > inifinite recursion)? Or are they "correct but not immediately > necessary" (e.g. because calling read_cache() does not hurt until > that function gets removed, so rewriting the callers to call > read_index() with _index may be correct but not immediately > necessary)? the latter. I assume correctness (of the semantic patch) to be a given, but this is all about timing, i.e. how can I send the series without breaking others in flight. > > > + By applying the semantic patch only partially where needed, the > > series > > + is less likely to conflict with other series in flight. > > That is correct. > > > + To make it possible to apply the semantic patch partially, there > > need
Re: [PATCH] Makefile: add pending semantic patches
On Thu, Nov 8, 2018 at 8:56 PM Martin Ågren wrote: > I haven't followed the original discussion too carefully, so I'll read > this like someone new to the topic probably would. Thanks! > A nit, perhaps, but I was genuinely confused at first. The subject is > "Makefile: add pending semantic patches", but this patch doesn't add > any. It adds Makefile-support for such patches though, and it defines > the entire concept of pending semantic patches. How about "coccicheck: > introduce 'pending' semantic patches"? > > > Add a description and place on how to use coccinelle for large refactorings > > that happen only once. > > A bit confused about "and place". Based on my understanding from reading > the remainder of this patch, maybe: > > Teach `make coccicheck` to avoid patches named "*.pending.cocci" and > handle them separately in a new `make coccicheck-pending` instead. > This means that we can separate "critical" patches from "FYI" patches. > The former target can continue causing Travis to fail its static > analysis job, while the latter can let us keep an eye on ongoing > (pending) transitions without them causing too much fallout. > > Document the intended use-cases and processes around these two > targets. Both suggested title and new commit message make sense. > > > This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/) > > semantic patches that might be useful to developers. > > + > > +There are two types of semantic patches: > > + > > + * Using the semantic transformation to check for bad patterns in the code; > > + This is what the original target 'make coccicheck' is designed to do and > > Is it relevant that this was the "original" target? (Genuine question.) No. I can omit that part. > > > + it is expected that any resulting patch indicates a regression. > > + The patches resulting from 'make coccicheck' are small and infrequent, > > + so once they are found, they can be sent to the mailing list as per > > usual. > > + > > + Example for introducing new patterns: > > + 67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28) > > + b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24) > > + > > + Example of fixes using this approach: > > + 248f66ed8e (run-command: use strbuf_addstr() for adding a string to > > + a strbuf, 2018-03-25) > > + f919ffebed (Use MOVE_ARRAY, 2018-01-22) > > + > > + These types of semantic patches are usually part of testing, c.f. > > + 0860a7641b (travis-ci: fail if Coccinelle static analysis found > > something > > + to transform, 2018-07-23) > > Very nicely described, nice with the examples to quickly give a feeling > about how/when to use this. Thanks. > > > + * Using semantic transformations in large scale refactorings throughout > > + the code base. > > + > > + When applying the semantic patch into a real patch, sending it to the > > + mailing list in the usual way, such a patch would be expected to have a > > + lot of textual and semantic conflicts as such large scale refactorings > > + change function signatures that are used widely in the code base. > > + A textual conflict would arise if surrounding code near any call of such > > + function changes. A semantic conflict arises when other patch series in > > + flight introduce calls to such functions. > > OK, I'm with you. > > > + So to aid these large scale refactorings, semantic patches can be used, > > + using the process as follows: > > + > > + 1) Figure out what kind of large scale refactoring we need > > + -> This is usually done by another unrelated series. > > "This"? The figuring out, or the refactoring? Also, "unrelated"? The need and type of what kind of large scale refactoring are usually determined by a patch series, that is not refactoring for the sake of refactoring, but it wants to achieve a specific goal, unrelated to large refactorings per se. The large refactoring is just another tool that a developer can use to make their original series happen much faster. So "unrelated" == "not the large scale refactoring, as that may come as an preparatory series, but to have a preparatory series it may be good to showcase why we need the preparatory series" > > > + 2) Create the sematic patch needed for the large scale refactoring > > s/sematic/semantic/ yup. > > > + and store it in contrib/coccinelle/*.pending.cocci > > + -> The suffix containing 'pending' is important to differentiate > > + this case from the other use case of checking for bad patterns. > > Good. > > > + 3) Apply the semantic patch only partially, where needed for the patch > > series > > + that motivates the large scale refactoring and then build that series > > + on top of it. > > + By applying the semantic patch only partially where needed, the > > series > > + is less likely to conflict with other series in flight. > > + To make it possible to apply the