from:"Stefan Beller"

[PATCH 1/4] submodule update: add regression test with old style setups

2018-12-07 Thread Stefan Beller

As f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'",
2018-09-07) was produced shortly before a release, nobody asked for
a regression test to be included. Add a regression test that makes sure
that the invocation of `git submodule update` on old setups doesn't
produce errors as pointed out in f178c13fda.

The place to add such a regression test may look odd in t7412, but
that is the best place as there we setup old style submodule setups
explicitly.

Signed-off-by: Stefan Beller 
---
 t/t7412-submodule-absorbgitdirs.sh | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/t/t7412-submodule-absorbgitdirs.sh 
b/t/t7412-submodule-absorbgitdirs.sh
index ce74c12da2..1cfa150768 100755
--- a/t/t7412-submodule-absorbgitdirs.sh
+++ b/t/t7412-submodule-absorbgitdirs.sh
@@ -75,7 +75,12 @@ test_expect_success 're-setup nested submodule' '
GIT_WORK_TREE=../../../nested git -C sub1/.git/modules/nested config \
core.worktree "../../../nested" &&
# make sure this re-setup is correct
-   git status --ignore-submodules=none
+   git status --ignore-submodules=none &&
+
+   # also make sure this old setup does not regress
+   git submodule update --init --recursive >out 2>err &&
+   test_must_be_empty out &&
+   test_must_be_empty err
 '
 
 test_expect_success 'absorb the git dir in a nested submodule' '
-- 
2.20.0.rc2.403.gdbc3b29805-goog

[PATCH 4/4] submodule deinit: unset core.worktree

2018-12-07 Thread Stefan Beller

This re-introduces 984cd77ddb (submodule deinit: unset core.worktree,
2018-06-18), which was reverted as part of f178c13fda (Revert "Merge
branch 'sb/submodule-core-worktree'", 2018-09-07)

The whole series was reverted as the offending commit e98317508c
(submodule: ensure core.worktree is set after update, 2018-06-18)
was relied on by other commits such as 984cd77ddb.

Keep the offending commit reverted, but its functionality came back via
4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17), such
that we can reintroduce 984cd77ddb now.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 2 ++
 t/lib-submodule-update.sh   | 2 +-
 t/t7400-submodule-basic.sh  | 5 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 31ac30cf2f..672b74db89 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1131,6 +1131,8 @@ static void deinit_submodule(const char *path, const char 
*prefix,
if (!(flags & OPT_QUIET))
printf(format, displaypath);
 
+   submodule_unset_core_worktree(sub);
+
strbuf_release(_rm);
}
 
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 51d449..5b56b23166 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -235,7 +235,7 @@ reset_work_tree_to_interested () {
then
mkdir -p submodule_update/.git/modules/sub1/modules &&
cp -r submodule_update_repo/.git/modules/sub1/modules/sub2 
submodule_update/.git/modules/sub1/modules/sub2
-   GIT_WORK_TREE=. git -C 
submodule_update/.git/modules/sub1/modules/sub2 config --unset core.worktree
+   # core.worktree is unset for sub2 as it is not checked out
fi &&
# indicate we are interested in the submodule:
git -C submodule_update config submodule.sub1.url "bogus" &&
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index 76a7cb0af7..aba2d4d6ee 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -984,6 +984,11 @@ test_expect_success 'submodule deinit should remove the 
whole submodule section
rmdir init
 '
 
+test_expect_success 'submodule deinit should unset core.worktree' '
+   test_path_is_file .git/modules/example/config &&
+   test_must_fail git config -f .git/modules/example/config core.worktree
+'
+
 test_expect_success 'submodule deinit from subdirectory' '
git submodule update --init &&
git config submodule.example.foo bar &&
-- 
2.20.0.rc2.403.gdbc3b29805-goog

[PATCH 0/4]

2018-12-07 Thread Stefan Beller

A couple days before the 2.19 release we had a bug report about
broken submodules[1] and reverted[2] the commits leading up to them.

The behavior of said bug fixed itself by taking a different approach[3],
specifically by a weaker enforcement of having `core.worktree` set in a
submodule [4].

The revert [2] was overly broad as we neared the release, such that we wanted
to rather keep the known buggy behavior of always having `core.worktree` set,
rather than figuring out how to fix the new bug of having 'git submodule update'
not working in old style repository setups.

This series re-introduces those reverted patches, with no changes in code,
but with drastically changed commit messages, as those focus on why it is safe
to re-introduce them instead of explaining the desire for the change.

[1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
[2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
[3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
[4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by 
ensure-core-worktree, 2018-08-13)

Stefan Beller (4):
  submodule update: add regression test with old style setups
  submodule: unset core.worktree if no working tree is present
  submodule--helper: fix BUG message in ensure_core_worktree
  submodule deinit: unset core.worktree

 builtin/submodule--helper.c|  4 +++-
 submodule.c| 14 ++
 submodule.h|  2 ++
 t/lib-submodule-update.sh  |  5 +++--
 t/t7400-submodule-basic.sh |  5 +
 t/t7412-submodule-absorbgitdirs.sh |  7 ++-
 6 files changed, 33 insertions(+), 4 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog

[PATCH 2/4] submodule: unset core.worktree if no working tree is present

2018-12-07 Thread Stefan Beller

This reintroduces 4fa4f90ccd (submodule: unset core.worktree if no working
tree is present, 2018-06-12), which was reverted as part of f178c13fda
(Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07).

4fa4f90ccd was reverted as its followup commit was faulty, but without
the accompanying change of the followup, we'd have an incomplete workflow
of setting `core.worktree` again, when it is needed such as checking out
a revision that contains a submodule.

So re-introduce that commit as-is, focusing on fixing up the followup

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
---
 submodule.c   | 14 ++
 submodule.h   |  2 ++
 t/lib-submodule-update.sh |  3 ++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 6415cc5580..d393e947e6 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1561,6 +1561,18 @@ int bad_to_remove_submodule(const char *path, unsigned 
flags)
return ret;
 }
 
+void submodule_unset_core_worktree(const struct submodule *sub)
+{
+   char *config_path = xstrfmt("%s/modules/%s/config",
+   get_git_common_dir(), sub->name);
+
+   if (git_config_set_in_file_gently(config_path, "core.worktree", NULL))
+   warning(_("Could not unset core.worktree setting in submodule 
'%s'"),
+ sub->path);
+
+   free(config_path);
+}
+
 static const char *get_super_prefix_or_empty(void)
 {
const char *s = get_super_prefix();
@@ -1726,6 +1738,8 @@ int submodule_move_head(const char *path,
 
if (is_empty_dir(path))
rmdir_or_warn(path);
+
+   submodule_unset_core_worktree(sub);
}
}
 out:
diff --git a/submodule.h b/submodule.h
index a680214c01..9e18e9b807 100644
--- a/submodule.h
+++ b/submodule.h
@@ -131,6 +131,8 @@ int submodule_move_head(const char *path,
const char *new_head,
unsigned flags);
 
+void submodule_unset_core_worktree(const struct submodule *sub);
+
 /*
  * Prepare the "env_array" parameter of a "struct child_process" for executing
  * a submodule by clearing any repo-specific environment variables, but
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 016391723c..51d449 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -709,7 +709,8 @@ test_submodule_recursing_with_args_common() {
git branch -t remove_sub1 origin/remove_sub1 &&
$command remove_sub1 &&
test_superproject_content origin/remove_sub1 &&
-   ! test -e sub1
+   ! test -e sub1 &&
+   test_must_fail git config -f .git/modules/sub1/config 
core.worktree
)
'
# ... absorbing a .git directory along the way.
-- 
2.20.0.rc2.403.gdbc3b29805-goog

[PATCH 3/4] submodule--helper: fix BUG message in ensure_core_worktree

2018-12-07 Thread Stefan Beller

Shortly after f178c13fda (Revert "Merge branch
'sb/submodule-core-worktree'", 2018-09-07), we had another series
that implemented partially the same, ensuring that core.worktree was
set in a checked out submodule, namely 74d4731da1 (submodule--helper:
replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)

As the series 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c',
2018-09-17) has different goals than the reverted series 7e25437d35
(Merge branch 'sb/submodule-core-worktree', 2018-07-18), I'd wanted to
replay the series on top of it to reach the goal of having `core.worktree`
correctly set when the submodules worktree is present, and unset when the
worktree is not present.

The replay resulted in a strange merge conflict highlighting that
the BUG message was not changed in 74d4731da1 (submodule--helper:
replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13).

Fix the error message.

Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d38113a31a..31ac30cf2f 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2045,7 +2045,7 @@ static int ensure_core_worktree(int argc, const char 
**argv, const char *prefix)
struct repository subrepo;
 
if (argc != 2)
-   BUG("submodule--helper connect-gitdir-workingtree  
");
+   BUG("submodule--helper ensure-core-worktree ");
 
path = argv[1];
 
-- 
2.20.0.rc2.403.gdbc3b29805-goog

Re: [PATCH] mailmap: update brandon williams's email address

2018-12-07 Thread Stefan Beller

On Fri, Dec 7, 2018 at 1:40 PM Jonathan Nieder  wrote:
>
> Brandon Williams wrote:
>
> > Signed-off-by: Brandon Williams 
> > ---
> >  .mailmap | 1 +
> >  1 file changed, 1 insertion(+)
>
> I can confirm that this is indeed the same person.

What would be more of interest is why we'd be interested in this patch
as there is no commit/patch sent by Brandon with this email in gits history.

Is that so you get cc'd on your private address and can follow
things you worked on without being subscribed to the mailing list?
(I'd be interested to see the use case in the commit message;)

Thanks,
Stefan

Re: [wishlist] git submodule update --reset-hard

2018-12-07 Thread Stefan Beller

On Thu, Dec 6, 2018 at 5:23 PM Yaroslav Halchenko  wrote:

> > There was a proposal to "re-attach HEAD" in the submodule, i.e.
> > if the branch branch points at the same commit, we don't need
> > a detached HEAD, but could go with the branch instead.
>
> if I got the idea right, if we are talking about any branch, it
> would also non-deterministic since who knows what left over branch(es)
> point to that commit.  Not sure if I would have used that ;)

I would think we'd rather want to have it deterministic, i.e. something like
1) record branch name of the submodule
2) update submodules HEAD to to superprojects gitlink
3) if recorded branch (1) matches the sha1 of detached HEAD,
  have HEAD point to the branch instead.

You notice a small inefficiency here as we write HEAD twice, so it
could be reworded as:
1) compare superprojects gitlink with the submodules branch
2a) if equal, set submodules HEAD to branch
2b) if unequal set HEAD to gitlink value, resulting in detached HEAD

Note that this idea of reattaching reflects the idea (a) below.

> > a) "stay on submodule branch (i.e. HEAD still points at $branch), and
> > reset --hard" such that the submodule has a clean index and at that $branch
> > or
> > b) "stay on submodule branch (i.e. HEAD still points at $branch), but 
> > $branch is
> >set to the gitlink from the superproject, and then a reset --hard
> >will have the worktree set to it as well.

> NB "gitlink" -- just now discovered the thing for me.  Thought it would be
> called a  subproject  echoing what git diff/log -p shows for submodule 
> commits.

The terminology is messy:
The internal representation in Gits object model is a "gitlink" entry in a tree
object. Once we have a .gitmodules entry, we call it submodule.

The term 'subproject' is a historic artifact and will likely not be changed
in the diff output (or format-patch), because these diffs can be applied using
git-am for example. That makes the diff output effectively a transport
protocol, and changing protocols is hard if you have no versioning in them.

More in https://git-scm.com/docs/gitsubmodules (a rather recent new write
of a man page, going into concepts).

> > > right -- I meant the local changes and indeed reset --recurse-submodules
> > > indeed seems to recurse nicely.  Then the undesired effect remaining only
> > > the detached HEAD
>
> > For that we may want to revive discussions in
> > https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/
>
> well, isn't that one requires a branch to be specified in .gitmodules?

Ah good point.

> >   git reset --hard --recursive=hard,keep-branch PREVIOUSPOINT
>
> 'keep-branch' (given aforementioned keeping the specified in .gitmodules
> branch) might be confusing.  Also what if a submodule already in a
> detached HEAD?  IMHO --recursive=hard, and just saying that it would do
> "reset --hard", is imho sufficient.  (that is why I like pure
> --reset hard   since it doesn't care and neither does anything to the
> branch)

For that we might want to first do the

  git submodule update --reset-hard

which runs reset --hard inside the submodule, no matter which
branch the submodule is on (if any) and resets to the given
superproject sha1.

See git-submodule.sh in git.git[1] in cmd_update.
We'd need to add a command line flag (`--reset-hard`
would be the obvious choice?) which would set the `update`
variable, which then is evaluated to what needs to be done in
the submodule, which in that case would be the hard reset.

https://github.com/git/git/blob/master/git-submodule.sh#L606

Once that is done we'd want to add a test case, presumably
in t/t7406-submodule-update.sh

> > > I would have asked for
>
> > >git revert --recursive ...
> > >git rebase --recursive [-i] ...
>
> > > which I also frequently desire (could elaborate on the use cases etc).
>
> > These would be nice to have. It would be nice if you'd elaborate on the
> > use cases for future reference in the mailing list archive. :-)
>
> ok, will try to do so ;-) In summary: they are just a logical extension
> of git support for submodules for anyone actively working with
> submodules to keep entire tree in sync.  Then quite often the need for
> reverting a specific commit (which also has changes reflected in
> submodules) arises.  The same with rebase, especially to trim away some
> no longer desired changes reflected in submodules.
>
> the initial "git submodule update --reset-hard" is pretty much a
> crude workaround for some of those cases, so I would just go earlier in
> the history, and redo some things, whenever I could just drop or revert
> some selected set of commits.

That makes sense.
Do you want to give the implementation a try for the --reset-hard switch?

> ah... so it is only   submodule  command which has --recursive, and the
> rest have --recurse-submodules   when talking about recursing into
> submodules?

I don't think we were that cautious in development as it was done by
different

Re: git, monorepos, and access control

2018-12-06 Thread Stefan Beller

On Thu, Dec 6, 2018 at 12:09 PM Johannes Schindelin
 wrote:
>
> Hi,
>
> On Wed, 5 Dec 2018, Jeff King wrote:
>
> > The model that fits more naturally with how Git is implemented would be
> > to use submodules. There you leak the hash of the commit from the
> > private submodule, but that's probably obscure enough (and if you're
> > really worried, you can add a random nonce to the commit messages in the
> > submodule to make their hashes unguessable).
>
> I hear myself frequently saying: "Friends don't let friends use
> submodules". It's almost like: "Some people think their problem is solved
> by using submodules. Only now they have two problems."

Blaming tools for their lack of evolution/development is not necessarily the
right approach. I recall having patches rejected on this very mailing list
that fixed obvious but minor good things like whitespaces and coding style,
because it *might* produce merge conflicts. Would that situation warrant me
to blame the lacks in the merge algorithm, or could you imagine a better
way out? (No need to answer, it's purely to demonstrate that blaming
tooling is not always the right approach; only sometimes it may be)

> There are big reasons, after all, why some companies go for monorepos: it
> is not for lack of trying to go with submodules, it is the problems that
> were incurred by trying to treat entire repositories the same as single
> files (or even trees): they are just too different.

We could change that in more places.

One example you might think of is the output of git-status that displays
changed files. And in case of submodules it would just show
"submodule changes", which we already differentiate into "dirty tree" and
"different sha1 at HEAD".
Instead we could have the output of all changed files recursively in the
superprojects git-status output.

Another example is the diff machinery, which already knows some
basics such as embedding submodule logs or actual diffs.

> In a previous life, I also tried to go for submodules, was burned, and had
> to restart the whole thing. We ended up with something that might work in
> this instance, too, although our use case was not need-to-know type of
> encapsulation. What we went for was straight up modularization.

So this is a "Fix the data instead of the tool", which seems to be a local
optimization (i.e. you only have to do it once, such that it is cheaper to
do than fixing the tool for that workgroup)
... and because everyone does that the tool never gets fixed.

> What I mean is that we split the project up into over 100 individual
> projects that are now all maintained in individual repositories, and they
> are connected completely outside of Git, via a dependency management
> system (in this case, Maven, although that is probably too Java-centric
> for AMD's needs).

Once you have the dependency management system in place, you
will encounter the rare case of still wanting to change things across
repository boundaries at the same time. Submodules offer that, which
is why Android wants to migrate off of the repo tool, and there it seems
natural to go for submodules.

> I just wanted to throw that out here: if you can split up your project
> into individual projects, it might make sense not to maintain them as
> submodules but instead as individual repositories whose artifacts are
> uploaded into a central, versioned artifact store (Maven, NuGet, etc). And
> those artifacts would then be retrieved by the projects that need them.

This is cool and industry standard. But once you happen to run in a bug
that involves 2 new artifacts (but each of the new artifacts work fine
on their own), then you'd wish for something like "git-bisect but across
repositories". Submodules (in theory) allow for fine grained bisection
across these repository boundaries, I would think.

> I figure that that scheme might work for you better than submodules: I
> could imagine that you need to make the build artifacts available even to
> people who are not permitted to look at the corresponding source code,
> anyway.

This is a sensible suggestion, as they probably don't want to ramp up
development on submodules. :-)

Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2018-12-06 Thread Stefan Beller

On Tue, Dec 4, 2018 at 7:10 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
> > with all feedback addressed. As it took some time, I'll send it
> > without range-diff, but would ask for full review.
>
> Is that a "resend" or reroll/update (or whatever word that does not
> imply "just sending the same thing again")?

As you noticed, it is an actual update. I started to use resend
as DScho seems very unhappy about the word reroll claiming we'd
be the only Software community that uses the term reroll for
an iteration of a change.

I see how resend could sound like retransmission without change.

> child_process_init(cp);
>  -  cp->dir = strbuf_detach(_path, NULL);
> -   prepare_submodule_repo_env(>env_array);
>  +  cp->dir = xstrdup(repo->worktree);
> +   prepare_submodule_repo_env(>env_array);
>
> Hmph, I offhand do not see there would be any difference if you
> assigned to cp->dir before or after preparing the repo env, but is
> there a reason these two must be done in this updated order that I
> am missing?  Very similar changes appear multiple times in this
> range-diff.

Jonathan Tan asked for it to be "diff friendly". This -of course- is
range-diff unfriendly.

> [...]

you seem to be OK with a lot of the changes, I did not find an
actionable suggestion.

Thanks for still queuing topics during -rc time,
Stefan

Re: [wishlist] git submodule update --reset-hard

2018-12-06 Thread Stefan Beller

On Thu, Dec 6, 2018 at 1:25 PM Yaroslav Halchenko  wrote:
>
>
> On Thu, 06 Dec 2018, Stefan Beller wrote:
>
> > On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko  
> > wrote:
>
> > > Dear Git Gurus,
>
> > > I wondered what would be your take on my wishlist request to add
> > > --reset-hard option, which would be very similar to regular "update" which
> > > checks out necessary commit, but I want it to remain in the branch.
>
> > What if the branch differs from the sha1 recorded in the superproject?
>
> git reset --hard  itself is an operation which should be done with some
> level of competence in doing "the right thing" by calling it.  You
> can hop branches even in current (without any submodules in question)
> repository with it and cause as much chaos as you desire.

Right.

git reset --hard would the branch (as well as the working tree) to the
given sha1, which is confusing as submodules get involved.

The Right Thing as of now is the sha1 as found in the
superprojects gitlink. But as that can be different from any branch
in the submodule, we'd rather detach the HEAD to make it
deterministic.

There was a proposal to "re-attach HEAD" in the submodule, i.e.
if the branch branch points at the same commit, we don't need
a detached HEAD, but could go with the branch instead.

> If desired though, a number of prevention mechanisms could be in place (but
> would require option(s) to overcome) to allow submodule to be reset --hard'ed
> only when some conditions met (e.g. only to the commit which is among parent
> commits path of the current branch).  This way wild hops would be prevented,
> although you might still end up in some feature branch.  But since "reset
> --hard" itself doesn't have any safe-guards, I do not really think they should
> be implemented here either.

So are you looking for
a) "stay on submodule branch (i.e. HEAD still points at $branch), and
reset --hard"
such that the submodule has a clean index and at that $branch or
b) "stay on submodule branch (i.e. HEAD still points at $branch), but $branch is
   set to the gitlink from the superproject, and then a reset --hard
will have the worktree
   set to it as well.

(a) is what the referenced submodule.repoLike option implements.

I'd understand the desire for (b) as well, as it is a "real" hard reset on
the superproject level, without detaching branches.

> >   git reset --hard --recurse-submodules HEAD

> it does indeed some trick(s) but not all seems to be the ones I desire:
>
> 1. Seems to migrate submodule's .git directories into the top level
> .git/modules

Ah yes, that happens too. This will help once you want to git-rm
a submodule and checkout states before and after.

> > undesirable in the sense of still having local changes (that is what
> > the above reset with `--recurse` would fix) or changed the branch
> > state? (i.e. is detached but was on a branch before?)
>
> right -- I meant the local changes and indeed reset --recurse-submodules
> indeed seems to recurse nicely.  Then the undesired effect remaining only
> the detached HEAD

For that we may want to revive discussions in
https://public-inbox.org/git/20170501180058.8063-5-sbel...@google.com/


> > >   git submodule update --recursive
>
> > > I would end up in the detached HEADs within submodules.
>
> > > What I want is to retain current branch they are at (or may be possible
> > > "were in"? reflog records might have that information)
>
> > So something like
>
> >   git submodule foreach --recursive git reset --hard
>
> > ?
>
> not quite  -- this would just kill all local changes within each submodule, 
> not
> to reset it to the desired state, which wouldn't be specified in such
> invocation, and is only known to the repo containing it

With this answer it sounds like you'd want (b) from above.

> > You may be interested in
> > https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/
> > which introduces a switch `submodule.repoLike [ = true]`, which
> > when set would not detach HEAD in submodules.
>
> Thanks! looks interesting -- was there more discussion/activity beyond those 5
> posts in the thread?

Unfortunately there was not.

> This feature might indeed come handy but if I got it right, it is somewhat
> complimentary to just having submodule update --reset-hard .  E.g.  submodules
> might be in different branches (if I am not tracking based on branch names), 
> so
> I would not want a recursive checkout with -b|-B.  But we would indeed benefit
> from such functionality, since this difficulty of managing branches of
> submodules I think would be elevated with it! (e.g. in one

[PATCH] fetch: ensure submodule objects fetched

2018-12-06 Thread Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C " (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

But this default fetch is not sufficient, as a newly fetched commit in
the superproject could point to a commit in the submodule that is not
in the default refspec. This is common in workflows like Gerrit's.
When fetching a Gerrit change under review (from refs/changes/??), the
commits in that change likely point to submodule commits that have not
been merged to a branch yet.

Fetch a submodule object by id if the object that the superproject
points to, cannot be found. For now this object is fetched from the
'origin' remote as we defer getting the default remote to a later patch.

A list of new submodule commits are already generated in certain
conditions (by check_for_new_submodule_commits()); this new feature
invokes that function in more situations.

The submodule checks were done only when a ref in the superproject
changed, these checks were extended to also be performed when fetching
into FETCH_HEAD for completeness, and add a test for that too.

Signed-off-by: Stefan Beller 
---

Thanks Jonathan for the review!
So it looks like only the last patch needs some improvements,
which is why I'd only resend the last patch here.
Also note the test with interious superproject commits.

All suggestions sounded sensible, addressing them all,
here is a range-diff to the currently queued version:

Range-diff:
1:  04eb06607b ! 1:  ac6558cbc9 fetch: try fetching submodules if needed 
objects were not fetched
@@ -1,6 +1,6 @@
     Author: Stefan Beller 
 
-fetch: try fetching submodules if needed objects were not fetched
+fetch: ensure submodule objects fetched
 
 Currently when git-fetch is asked to recurse into submodules, it 
dispatches
 a plain "git-fetch -C " (with some submodule related 
options
@@ -14,22 +14,19 @@
 commits in that change likely point to submodule commits that have not
 been merged to a branch yet.
 
-Try fetching a submodule by object id if the object id that the
-superproject points to, cannot be found.
+Fetch a submodule object by id if the object that the superproject
+points to, cannot be found. For now this object is fetched from the
+'origin' remote as we defer getting the default remote to a later 
patch.
 
-builtin/fetch used to only inspect submodules when they were fetched
-"on-demand", as in either on/off case it was clear whether the 
submodule
-needs to be fetched. However to know whether we need to try fetching 
the
-object ids, we need to identify the object names, which is done in this
-function check_for_new_submodule_commits(), so we'll also run that code
-in case the submodule recursion is set to "on".
+A list of new submodule commits are already generated in certain
+conditions (by check_for_new_submodule_commits()); this new feature
+invokes that function in more situations.
 
 The submodule checks were done only when a ref in the superproject
 changed, these checks were extended to also be performed when fetching
 into FETCH_HEAD for completeness, and add a test for that too.
     
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano 
 
  diff --git a/builtin/fetch.c b/builtin/fetch.c
  --- a/builtin/fetch.c
@@ -82,7 +79,7 @@
  
struct string_list changed_submodule_names;
 +
-+  /* The submodules to fetch in */
++  /* Pending fetches by OIDs */
 +  struct fetch_task **oid_fetch_tasks;
 +  int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
  };
@@ -97,13 +94,16 @@
return spf->default_option;
  }
  
++/*
++ * Fetch in progress (if callback data) or
++ * pending (if in oid_fetch_tasks in struct submodule_parallel_fetch)
++ */
 +struct fetch_task {
 +  struct repository *repo;
 +  const struct submodule *sub;
 +  unsigned free_sub : 1; /* Do we need to free the submodule? */
 +
-+  /* fetch specific oids if set, otherwise fetch default refspec */
-+  struct oid_array *commits;
++  struct oid_array *commits; /* Ensure these commits are fetched */
 +};
 +
 +/**
@@ -176,7 +176,6 @@
  
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
 -  struct strbuf submodule_prefix = STRBUF_INIT;
-+  int recurse_config;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
const char *default_argv;
 -  const struct submodule *submodule;
@@ -199,11 +198,9 @@
 +  task = fetch_task_create(spf-&

Re: [wishlist] git submodule update --reset-hard

2018-12-06 Thread Stefan Beller

On Thu, Dec 6, 2018 at 10:02 AM Yaroslav Halchenko  wrote:
>
> Dear Git Gurus,
>
> I wondered what would be your take on my wishlist request to add
> --reset-hard option, which would be very similar to regular "update" which
> checks out necessary commit, but I want it to remain in the branch.

What if the branch differs from the sha1 recorded in the superproject?

> Rationale: In DataLad we heavily rely on submodules, and we have established
> easy ways to do some manipulations across full hierarchies of them. E.g. a
> single command could introduce a good number of commits across deep hierarchy
> of submodules, e.g. while committing changes within deep submodule, while also
> doing all necessary commits in the repositories leading to that submodule so
> the entire tree of them stays in a "clean" state. The difficulty comes when
> there is a need to just "forget" some changes.  The obvious way is to e.g.
>
>git reset --hard PREVIOUS_STATE

  git reset --hard --recurse-submodules HEAD

would do the trick

> in the top level repository.  But that leaves all the submodules now in
> the undesired state.  If I do

undesirable in the sense of still having local changes (that is what
the above reset with `--recurse` would fix) or changed the branch
state? (i.e. is detached but was on a branch before?)

>   git submodule update --recursive
>
> I would end up in the detached HEADs within submodules.
>
> What I want is to retain current branch they are at (or may be possible
> "were in"? reflog records might have that information)

So something like

  git submodule foreach --recursive git reset --hard

?

You may be interested in
https://public-inbox.org/git/20180927221603.148025-1-sbel...@google.com/
which introduces a switch `submodule.repoLike [ = true]`, which
when set would not detach HEAD in submodules.

Can you say more about the first question above:
Would you typically have situations where the
submodule branch is out of sync with the superproject
and how do you deal with that?

Adding another mode to `git submodule update` sounds
reasonable to me, too.

Stefan

Re: A case where diff.colorMoved=plain is more sensible than diff.colorMoved=zebra & others

2018-12-06 Thread Stefan Beller

On Thu, Dec 6, 2018 at 6:58 AM Phillip Wood  wrote:

> > So is there some "must be at least two consecutive lines" condition for
> > not-plain, or is something else going on here?
>
> To be considered a block has to have 20 alphanumeric characters - see
> commit f0b8fb6e59 ("diff: define block by number of alphanumeric chars",
> 2017-08-15). This stops things like random '}' lines being marked as
> moved on their own.

This is spot on.

All but the "plain" mode use the concept of "blocks" of code
(there is even one mode called "blocks", which adds to the confusion).

> It might be better to use some kind of frequency
> information (a bit like python's difflib junk parameter) instead so that
> (fairly) unique short lines also get marked properly.

Yes that is what I was initially thinking about. However to have good
information, you'd need to index a whole lot (the whole repository,
i.e. all text blobs in existence?) to get an accurate picture of frequency
information, which I'd prefer to call entropy as I come from a background
familiar with https://en.wikipedia.org/wiki/Information_theory, I am not
sure where 'frequency information' comes from -- it sounds like the
same concept.

Of course it is too expensive to run an operation O(repository size)
just for this diff, so maybe we could get away with some smaller
corpus to build up this information on what is sufficient for coloring.

When only looking at the given diff, I would imagine that each line
would not carry a whole lot of information as its characters occur
rather frequently compared to the rest of the diff.

Best,
Stefan

Re: [ANNOUNCE] Git v2.20.0-rc2

2018-12-04 Thread Stefan Beller

-cc linux list


> Perhaps we should note this more prominently, and since Brandon isn't at
> Google anymore can some of you working there edit this post? It's the
> first Google result for "git protocol v2", so it's going to be quite
> confusing for people if after 2.20 the instructions in it no longer
> work.

Thanks for pointing this out, we missed the implications when that patch was
discussed. Looking into it.
>
> 1. 
> https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html

Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()

2018-12-04 Thread Stefan Beller

On Tue, Dec 4, 2018 at 2:42 PM Jonathan Tan  wrote:
>
> When fetching into a repository, a connectivity check is first made by
> check_exist_and_connected() in builtin/fetch.c that runs:
>
>   git rev-list --objects --stdin --not --all --quiet <(list of objects)
>
> If the client repository has many refs, this command can be slow,
> regardless of the nature of the server repository or what is being
> fetched. A profiler reveals that most of the time is spent in
> setup_revisions() (approx. 60/63), and of the time spent in
> setup_revisions(), most of it is spent in parse_object() (approx.
> 49/60). This is because setup_revisions() parses the target of every ref
> (from "--all"), and parse_object() reads the buffer of the object.
>
> Reading the buffer is unnecessary if the repository has a commit graph
> and if the ref points to a commit (which is typically the case). This
> patch uses the commit graph wherever possible; on my computer, when I
> run the above command with a list of 1 object on a many-ref repository,
> I get a speedup from 1.8s to 1.0s.
>
> Another way to accomplish this effect would be to modify parse_object()
> to use the commit graph if possible; however, I did not want to change
> parse_object()'s current behavior of always checking the object
> signature of the returned object.
>
> Signed-off-by: Jonathan Tan 
> ---
> This is on sb/more-repo-in-api because I'm using the repo_parse_commit()
> function.

This is a mere nicety, not strictly required.
Before we had parse_commit(struct commit *) which would accomplish the
same, (and we'd still have that afterwards as a #define falling back onto
the_repository). As the function get_reference() is not the_repository safe
as it contains a call to is_promisor_object() that is repository
agnostic, I think
it would be fair game to not depend on that series. I am not
complaining, though.

> A colleague noticed this issue when handling a mirror clone.
>
> Looking at the bigger picture, the speed of the connectivity check
> during a fetch might be further improved by passing only the negotiation
> tips (obtained through --negotiation-tip) instead of "--all". This patch
> just handles the low-hanging fruit first.
> ---
>  revision.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/revision.c b/revision.c
> index b5108b75ab..e7da2c57ab 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info 
> *revs, const char *name,
>  {
> struct object *object;
>
> -   object = parse_object(revs->repo, oid);
> +   /*
> +* If the repository has commit graphs, repo_parse_commit() avoids
> +* reading the object buffer, so use it whenever possible.
> +*/
> +   if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
> +   struct commit *c = lookup_commit(revs->repo, oid);
> +   if (!repo_parse_commit(revs->repo, c))
> +   object = (struct object *) c;
> +   else
> +   object = NULL;

Would it make sense in this case to rely on parse_object below
instead of assigning NULL? The reason for that would be that
when lookup_commit returns NULL, we would try more broadly.

AFAICT oid_object_info doesn't take advantage of the commit graph,
but just looks up the object header, which is still less than completely
parsing it. Then lookup_commit is overly strict, as it may return
NULL as when there still is a type mismatch (I don't think a mismatch
could happen here, as both rely on just the object store, and not the
commit graph.), so this would be just defensive programming for
the sake of it. I dunno.

struct commit *c;

if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
(c = lookup_commit(revs->repo, oid)) &&
!repo_parse_commit(revs->repo, c))
object = (struct object *) c;
else
object = parse_object(revs->repo, oid);


So with all that said, I still think this is a good patch.

Thanks,
Stefan

> +   } else {
> +   object = parse_object(revs->repo, oid);
> +   }
> +
> if (!object) {
> if (revs->ignore_missing)
> return object;
> --
> 2.19.0.271.gfe8321ec05.dirty
>

Re: [WIP RFC 5/5] upload-pack: send part of packfile response as uri

2018-12-04 Thread Stefan Beller

On Mon, Dec 3, 2018 at 3:38 PM Jonathan Tan  wrote:
>
> This is a partial implementation of upload-pack sending part of its
> packfile response as URIs.

It does so by implementing a new flag `--exclude-configured-blobs`
in pack-objects, which would change the output of pack-objects to
output a list of URLs (of the excluded blobs) followed by the
pack to be asked for.

This design seems easy to implement as then upload-pack
can just parse the output and only needs to insert
"packfile" and "packfile-uris\n" at the appropriate places
of the stream, otherwise it just passes through the information
obtained from pack-objects.

The design as-is would make for hard documentation of
pack-objects (its output is not just a pack anymore when that
flag is given, but a highly optimized byte stream).

Initially I did not anticipate this to be one of the major design problems
as I assumed we'd want to use this pack feature over broadly (e.g.
eventually by offloading most of the objects into a base pack that
is just always included as the likelihood for any object in there is
very high on initial clone), but it makes total sense to only
output the URIs that we actually need.

An alternative that comes very close to the current situation
would be to either pass a file path or file descriptor (that upload-pack
listens to in parallel) to pack-objects as an argument of the new flag.
Then we would not need to splice the protocol sections into the single
output stream, but we could announce the sections, then flush
the URIs and then flush the pack afterwards.

I looked at this quickly, but that would need either extensions in
run-command.c for setting up the new fd for us, as there we already
have OS specific code for these setups, or we'd have to duplicate
some of the logic here, which doesn't enthuse me either.

So maybe we'd create a temp file via mkstemp and pass
the file name to pack-objects for writing the URIs and then
we'd just need to stream that file?

> +   # NEEDSWORK: "git clone" fails here because it ignores the URI 
> provided
> +   # instead of fetching it.
> +   test_must_fail env GIT_TRACE_PACKET="$(pwd)/log" \
> +   git -c protocol.version=2 clone \
> +   "$HTTPD_URL/smart/http_parent" http_child 2>err &&
> +   # Although "git clone" fails, we can still check that the server
> +   # provided the URI we requested and that the error message pinpoints
> +   # the object that is missing.
> +   grep "clone< uri https://example.com/a-uri; log &&
> +   test_i18ngrep "did not receive expected object $(cat h)" err

That is a nice test!

Re: [WIP RFC 3/5] upload-pack: refactor reading of pack-objects out

2018-12-03 Thread Stefan Beller

On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan  wrote:
>
> Subsequent patches will change how the output of pack-objects is
> processed, so extract that processing into its own function.
>
> Currently, at most 1 character can be buffered (in the "buffered" local
> variable). One of those patches will require a larger buffer, so replace
> that "buffered" local variable with a buffer array.

This buffering sounds oddly similar to the pkt reader which can buffer
at most one pkt, the difference being that we'd buffer bytes
instead of pkts.

Re: [WIP RFC 2/5] Documentation: add Packfile URIs design doc

2018-12-03 Thread Stefan Beller

Thanks for bringing this design to the list!

> diff --git a/Documentation/technical/protocol-v2.txt 
> b/Documentation/technical/protocol-v2.txt
> index 345c00e08c..2cb1c41742 100644
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -313,7 +313,8 @@ header. Most sections are sent only when the packfile is 
> sent.
>
>  output = acknowledgements flush-pkt |
>  [acknowledgments delim-pkt] [shallow-info delim-pkt]
> -[wanted-refs delim-pkt] packfile flush-pkt
> +[wanted-refs delim-pkt] [packfile-uris delim-pkt]
> +packfile flush-pkt

While this is an RFC and incomplete, we'd need to remember to
add packfile-uris to the capabilities list above, stating that it requires
thin-pack and ofs-delta to be sent, and what to expect from it.

The mention of  --no-packfile-urls in the Client design above
seems to imply we'd want to turn it on by default, which I thought
was not the usual stance how we introduce new things.

An odd way of disabling it would be --no-thin-pack, hoping the
client side implementation abides by the implied requirements.

>  acknowledgments = PKT-LINE("acknowledgments" LF)
>   (nak | *ack)
> @@ -331,6 +332,9 @@ header. Most sections are sent only when the packfile is 
> sent.
>   *PKT-LINE(wanted-ref LF)
>  wanted-ref = obj-id SP refname
>
> +packfile-uris = PKT-LINE("packfile-uris" LF) *packfile-uri
> +packfile-uri = PKT-LINE("uri" SP *%x20-ff LF)

Is the *%x20-ff a fancy way of saying obj-id?

While the server is configured with pairs of (oid URL),
we would not need to send the exact oid to the client
as that is what the client can figure out on its own by reading
the downloaded pack.

Instead we could send an integrity hash (i.e. the packfile
downloaded from "uri" is expected to hash to $oid here)

Thanks,
Stefan

Re: [WIP RFC 0/5] Design for offloading part of packfile response to CDN

2018-12-03 Thread Stefan Beller

On Mon, Dec 3, 2018 at 3:37 PM Jonathan Tan  wrote:

>
> There is a potential issue: a server which produces both the URIs and
> the packfile at roughly the same time (like the implementation in this
> patch set) will not have sideband access until it has concluded sending
> the URIs. Among other things, this means that the server cannot send
> keepalive packets until quite late in the response. One solution to this
> might be to add a feature that allows the server to use a sideband
> throughout the whole response - and this has other benefits too like
> allowing servers to inform the client throughout the whole fetch, not
> just at the end.

While side band sounds like the right thing to do, we could also
sending (NULL)-URIs within this feature.

Re: [PATCH] pack-protocol.txt: accept error packets in any context

2018-12-03 Thread Stefan Beller

> diff --git a/pkt-line.c b/pkt-line.c
> index 04d10bbd0..ce9e42d10 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, 
> char **src_buffer,
> return PACKET_READ_EOF;
> }
>
> +   if (starts_with(buffer, "ERR ")) {
> +   die(_("remote error: %s"), buffer + 4);
> +   }
> +

Handling any ERR line in the pkt reader is okay, as
* we do not pkt-ize pack files and
* we do not have any other parts of the protocol where user data is in
  the first four bytes, which could randomly match this pattern and
* the rest of the pkt-ized part of the protocol never sends
  ERR lines.

Makes sense.

Re: [PATCH] sideband: color lines with keyword only

2018-12-03 Thread Stefan Beller

On Mon, Dec 3, 2018 at 3:23 PM Jonathan Nieder  wrote:

> I was curious about what versions of Gerrit this is designed to
> support (or in other words whether it's a bug fix or a feature).
> Looking at examples like [1], it seems that Gerrit historically always
> used "ERROR:" so the 59a255aef0 logic would work for it.  More
> recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful
> change updates, 2018-08-21) put SUCCESS on a line of its own.  That
> puts this squarely in the new-feature category.

Ooops. From the internal bug, I assumed this to be long standing Gerrit
behavior, which is why I sent it out in -rc to begin with.

> > --- a/sideband.c
> > +++ b/sideband.c
> > @@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
> > const char *src, int n)
> >   struct keyword_entry *p = keywords + i;
> >   int len = strlen(p->keyword);
> >
> > - if (n <= len)
> > + if (n < len)
> >   continue;
>
> In the old code, we would escape early if 'n == len', but we didn't
> need to.  If 'n == len', then
>
> src[len] == '\0'

src[len] could also be one of "\n\r", see the caller
recv_sideband for sidebase case 2.

> src .. [len-1] is a valid buffer to read from
>
> so the strncasecmp and strbuf_add operations used in this function are
> valid.  Good.

Yes, they are all valid...

> > - if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) 
> > {
> > + if (!strncasecmp(p->keyword, src, len) &&
> > + (len == n || !isalnum(src[len]))) {
>
> Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] ==
> GIT_CNTRL) so this part of the patch is unnecessary.  That said, it's
> good for clarity and defensive programming.

... but here we need to check for src[len] for validity.

I made no assumptions about isalnum, but rather needed to shortcut
the condition, as accessing src[len] would be out of bounds, no?

>
> >   strbuf_addstr(dest, p->color);
> >   strbuf_add(dest, src, len);

unlike here (or the rest of the block), where len is used correctly.

[PATCH] sideband: color lines with keyword only

2018-12-03 Thread Stefan Beller

When bf1a11f0a1 (sideband: highlight keywords in remote sideband output,
2018-08-07) was introduced, it was carefully considered which strings
would be highlighted. However 59a255aef0 (sideband: do not read beyond
the end of input, 2018-08-18) brought in a regression that the original
did not test for. A line containing only the keyword and nothing else
("SUCCESS") should still be colored.

Signed-off-by: Stefan Beller 
---
 sideband.c  | 5 +++--
 t/t5409-colorize-remote-messages.sh | 2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/sideband.c b/sideband.c
index 368647acf8..7c3d33d3f8 100644
--- a/sideband.c
+++ b/sideband.c
@@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
const char *src, int n)
struct keyword_entry *p = keywords + i;
int len = strlen(p->keyword);
 
-   if (n <= len)
+   if (n < len)
continue;
/*
 * Match case insensitively, so we colorize output from existing
@@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
const char *src, int n)
 * messages. We only highlight the word precisely, so
 * "successful" stays uncolored.
 */
-   if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
+   if (!strncasecmp(p->keyword, src, len) &&
+   (len == n || !isalnum(src[len]))) {
strbuf_addstr(dest, p->color);
strbuf_add(dest, src, len);
strbuf_addstr(dest, GIT_COLOR_RESET);
diff --git a/t/t5409-colorize-remote-messages.sh 
b/t/t5409-colorize-remote-messages.sh
index f81b6813c0..2a8c449661 100755
--- a/t/t5409-colorize-remote-messages.sh
+++ b/t/t5409-colorize-remote-messages.sh
@@ -17,6 +17,7 @@ test_expect_success 'setup' '
echo " " "error: leading space"
echo ""
echo Err
+   echo SUCCESS
exit 0
EOF
echo 1 >file &&
@@ -35,6 +36,7 @@ test_expect_success 'keywords' '
grep "error: error" decoded &&
grep "hint:" decoded &&
grep "success:" decoded &&
+   grep "SUCCESS" decoded &&
grep "warning:" decoded
 '
 
-- 
2.20.0.rc2.403.gdbc3b29805-goog

Re: [PATCH/RFC v2 0/7] Introduce new commands switch-branch and checkout-files

2018-12-03 Thread Stefan Beller

On Thu, Nov 29, 2018 at 7:33 AM Duy Nguyen  wrote:
>
> On Wed, Nov 28, 2018 at 9:30 PM Stefan Beller  wrote:
> >
> > On Wed, Nov 28, 2018 at 12:09 PM Duy Nguyen  wrote:
> > >
> > > On Wed, Nov 28, 2018 at 9:01 PM Duy Nguyen  wrote:
> > > > should we do
> > > > something about detached HEAD in this switch-branch command (or
> > > > whatever its name will be)?
> > > >
> > > > This is usually a confusing concept to new users
> > >
> > > And it just occurred to me that perhaps we should call this "unnamed
> > > branch" (at least at high UI level) instead of detached HEAD. It is
> > > technically not as accurate, but much better to understand.
> >
> > or 'direct' branch?
>
> makes me think, what is an indirect branch?

I drew the term from HEAD pointing to a branch pointing
to a commit, i.e. HEAD indirectly points to a commit, but
in 'direct' branch mode, HEAD contains the commit id.

So indirect branch would work for our current 'real' branches.

When asked out of context of this discussion, I might add
yet another layer of abstraction to make an 'indirect branch',
i.e. HEAD pointing to a symbolic ref that points at a branch
that points to a commit.

The term symref is what we currently use
(Just looked into gitglossary, where we distinguish
symbolic refs from pseudorefs) for hat I would have called
an indirect branch as well.

So maybe we need to measure the level of indirection
("How often do we need to dereference the ref/object to get
a commit oid?") to come to terms in how to describe
the use cases easily.

Here is a fun-one:
  git checkout 
  git checkout --detach

Currently the --detach option detaches HEAD from
branch pointing at the object id, i.e. it is the same as
  git checkout 

whereas when we focus on the levels of indirection
it would also be reasonable to have
  git checkout 
as a reasonable alternative, where  is the
branch that is pointed at from the .

Re: [PATCH v2 6/7] checkout: split into switch-branch and checkout-files

2018-11-29 Thread Stefan Beller

> > Idea:
> > If git checkout-files modifies the submodules file, it could also
> > auto-update the submodules. (For example, with something like "git
> > submodule update --init --recursive --progress").
>
> This one is tricky because we should deal with submodule autoupdate
> consistently across all porcelain commands (or at least common ones),
> not just checkout-files. I'd prefer to delay this until later. Once we
> figure out what to do with other commands, then we can still change
> defaults for checkout-files.

checkout/reset are respecting the submodule.recurse setting for this
already, and as your patches only change the UX frontend

git -c submodule.recurse checkout-files 

would also touch submodules. Given that deep down in
the submodules it's all about files again, we could think
checkout-files is still a good name.

I think Stefan X. is asking for making submodule.recurse
to default to true, which is indeed unrelated to this.

Stefan

Re: [PATCH 09/10] fetch: try fetching submodules if needed objects were not fetched

2018-11-28 Thread Stefan Beller

On Fri, Oct 26, 2018 at 1:41 PM Jonathan Tan  wrote:
>
> > But this default fetch is not sufficient, as a newly fetched commit in
> > the superproject could point to a commit in the submodule that is not
> > in the default refspec. This is common in workflows like Gerrit's.
> > When fetching a Gerrit change under review (from refs/changes/??), the
> > commits in that change likely point to submodule commits that have not
> > been merged to a branch yet.
> >
> > Try fetching a submodule by object id if the object id that the
> > superproject points to, cannot be found.
>
> I see that these suggestions of mine (from [1]) was implemented, but not
> others. If you disagree, that's fine, but I think they should be
> discussed.

ok.

>
> > - if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
> > - (recurse_submodules != RECURSE_SUBMODULES_ON))
> > + if (recurse_submodules != RECURSE_SUBMODULES_OFF)
>
> I think the next patch should be squashed into this patch. Then you can
> say that these are now redundant and can be removed.

ok.

>
> > @@ -1218,8 +1218,12 @@ struct submodule_parallel_fetch {
> >   int result;
> >
> >   struct string_list changed_submodule_names;
> > + struct get_next_submodule_task **fetch_specific_oids;
> > + int fetch_specific_oids_nr, fetch_specific_oids_alloc;
> >  };
>
> Add documentation for fetch_specific_oids. Also, it might be better to
> call these oid_fetch_tasks and the struct, "struct fetch_task".

ok.

> Here, struct get_next_submodule_task is used for 2 different things:
>  (1) After the first round fetch, fetch_finish() uses it to determine if
>  a second round is needed.
>  (2) In submodule_parallel_fetch.fetch_specific_oids, to tell the
>  parallel runner (through get_next_submodule_task()) to do this
>  fetch.
>
> I think that it's better to have 2 different structs for these 2
> different uses. (Note that task_cb can be NULL for the second round.
> Unless we plan to recheck the OIDs? Currently we recheck them, but we
> don't do anything either way.)

I think it is easier to only have one struct until we have substantially
more to communicate. (1) is a boolean answer, for which (non-)NULL
is sufficient.


> I think that this is best described as the submodule that has no entry
> in .gitmodules? Maybe call it "get_non_gitmodules_submodule" and
> document it with a similar comment as in get_submodule_repo_for().

done.

>
> > +
> > +static struct get_next_submodule_task *get_next_submodule_task_create(
> > + struct repository *r, const char *path)
> > +{
> > + struct get_next_submodule_task *task = xmalloc(sizeof(*task));
> > + memset(task, 0, sizeof(*task));
> > +
> > + task->sub = submodule_from_path(r, _oid, path);
> > + if (!task->sub) {
> > + task->sub = get_default_submodule(path);
> > + task->free_sub = 1;
> > + }
> > +
> > + return task;
> > +}
>
> Clearer to me would be to make get_next_submodule_task_create() return
> NULL if submodule_from_path() returns NULL.

I doubled down on this one and return NULL when get_default_submodule
(now renamed to get_non_gitmodules_submodule) returns NULL, as then we
can move the free() from get_next_submodule here and there we'll just have

task = fetch_task_create(spf->r, ce->name);
if (!task)
continue;

which helps get_next_submodule to stay readable.


> Same comment about "on-demand" as in my previous e-mail.

I'd want to push back on refactoring and defer that to a later series.

> Break lines to 80.
[...]
> Same comment about "s--h" as in my previous e-mail.

done

> > + /* Are there commits that do not exist? */
> > + if (commits->nr) {
> > + /* We already tried fetching them, do not try again. */
> > + if (task->commits)
> > + return 0;
>
> Same comment about "task->commits" as in my previous e-mail.

Good call. I reordered the function read easier and added a comment
on any early return how it could happen.

>
> > diff --git a/t/t5526-fetch-submodules.sh b/t/t5526-fetch-submodules.sh
> > index 6c2f9b2ba2..5a75b57852 100755
>
> One more thing to test is the case where a submodule doesn't have a
> .gitmodules entry.

added a test.

I just resent the series.

Stefan

[PATCH 4/9] submodule.c: tighten scope of changed_submodule_names struct

2018-11-28 Thread Stefan Beller

The `changed_submodule_names` are only used for fetching, so let's make it
part of the struct that is passed around for fetching submodules.

Signed-off-by: Stefan Beller 
---
 submodule.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/submodule.c b/submodule.c
index 3c388f85cc..f93f0aff82 100644
--- a/submodule.c
+++ b/submodule.c
@@ -25,7 +25,6 @@
 #include "commit-reach.h"
 
 static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
-static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
 static int initialized_fetch_ref_tips;
 static struct oid_array ref_tips_before_fetch;
 static struct oid_array ref_tips_after_fetch;
@@ -1136,7 +1135,8 @@ void check_for_new_submodule_commits(struct object_id 
*oid)
oid_array_append(_tips_after_fetch, oid);
 }
 
-static void calculate_changed_submodule_paths(struct repository *r)
+static void calculate_changed_submodule_paths(struct repository *r,
+   struct string_list *changed_submodule_names)
 {
struct argv_array argv = ARGV_ARRAY_INIT;
struct string_list changed_submodules = STRING_LIST_INIT_DUP;
@@ -1174,7 +1174,8 @@ static void calculate_changed_submodule_paths(struct 
repository *r)
continue;
 
if (!submodule_has_commits(r, path, commits))
-   string_list_append(_submodule_names, 
name->string);
+   string_list_append(changed_submodule_names,
+  name->string);
}
 
free_submodules_oids(_submodules);
@@ -1221,8 +1222,10 @@ struct submodule_parallel_fetch {
int default_option;
int quiet;
int result;
+
+   struct string_list changed_submodule_names;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
STRING_LIST_INIT_DUP }
 
 static int get_fetch_recurse_config(const struct submodule *submodule,
struct submodule_parallel_fetch *spf)
@@ -1284,7 +1287,7 @@ static int get_next_submodule(struct child_process *cp,
case RECURSE_SUBMODULES_ON_DEMAND:
if (!submodule ||
!string_list_lookup(
-   _submodule_names,
+   >changed_submodule_names,
submodule->name))
continue;
default_argv = "on-demand";
@@ -1376,8 +1379,8 @@ int fetch_populated_submodules(struct repository *r,
argv_array_push(, "--recurse-submodules-default");
/* default value, "--submodule-prefix" and its value are added later */
 
-   calculate_changed_submodule_paths(r);
-   string_list_sort(_submodule_names);
+   calculate_changed_submodule_paths(r, _submodule_names);
+   string_list_sort(_submodule_names);
run_processes_parallel(max_parallel_jobs,
   get_next_submodule,
   fetch_start_failure,
@@ -1386,7 +1389,7 @@ int fetch_populated_submodules(struct repository *r,
 
argv_array_clear();
 out:
-   string_list_clear(_submodule_names, 1);
+   string_list_clear(_submodule_names, 1);
return spf.result;
 }
 
-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 9/9] fetch: try fetching submodules if needed objects were not fetched

2018-11-28 Thread Stefan Beller

Currently when git-fetch is asked to recurse into submodules, it dispatches
a plain "git-fetch -C " (with some submodule related options
such as prefix and recusing strategy, but) without any information of the
remote or the tip that should be fetched.

But this default fetch is not sufficient, as a newly fetched commit in
the superproject could point to a commit in the submodule that is not
in the default refspec. This is common in workflows like Gerrit's.
When fetching a Gerrit change under review (from refs/changes/??), the
commits in that change likely point to submodule commits that have not
been merged to a branch yet.

Try fetching a submodule by object id if the object id that the
superproject points to, cannot be found.

builtin/fetch used to only inspect submodules when they were fetched
"on-demand", as in either on/off case it was clear whether the submodule
needs to be fetched. However to know whether we need to try fetching the
object ids, we need to identify the object names, which is done in this
function check_for_new_submodule_commits(), so we'll also run that code
in case the submodule recursion is set to "on".

The submodule checks were done only when a ref in the superproject
changed, these checks were extended to also be performed when fetching
into FETCH_HEAD for completeness, and add a test for that too.

Signed-off-by: Stefan Beller 
---
 builtin/fetch.c |  11 +-
 submodule.c | 206 +++-
 t/t5526-fetch-submodules.sh |  86 +++
 3 files changed, 265 insertions(+), 38 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e0140327aa..91f9b7d9c8 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -763,9 +763,6 @@ static int update_local_ref(struct ref *ref,
what = _("[new ref]");
}
 
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
-   check_for_new_submodule_commits(>new_oid);
r = s_update_ref(msg, ref, 0);
format_display(display, r ? '!' : '*', what,
   r ? _("unable to update local ref") : NULL,
@@ -779,9 +776,6 @@ static int update_local_ref(struct ref *ref,
strbuf_add_unique_abbrev(, >object.oid, 
DEFAULT_ABBREV);
strbuf_addstr(, "..");
strbuf_add_unique_abbrev(, >new_oid, 
DEFAULT_ABBREV);
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
-   check_for_new_submodule_commits(>new_oid);
r = s_update_ref("fast-forward", ref, 1);
format_display(display, r ? '!' : ' ', quickref.buf,
   r ? _("unable to update local ref") : NULL,
@@ -794,9 +788,6 @@ static int update_local_ref(struct ref *ref,
strbuf_add_unique_abbrev(, >object.oid, 
DEFAULT_ABBREV);
strbuf_addstr(, "...");
strbuf_add_unique_abbrev(, >new_oid, 
DEFAULT_ABBREV);
-   if ((recurse_submodules != RECURSE_SUBMODULES_OFF) &&
-   (recurse_submodules != RECURSE_SUBMODULES_ON))
-   check_for_new_submodule_commits(>new_oid);
r = s_update_ref("forced-update", ref, 1);
format_display(display, r ? '!' : '+', quickref.buf,
   r ? _("unable to update local ref") : _("forced 
update"),
@@ -892,6 +883,8 @@ static int store_updated_refs(const char *raw_url, const 
char *remote_name,
ref->force = rm->peer_ref->force;
}
 
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF)
+   check_for_new_submodule_commits(>old_oid);
 
if (!strcmp(rm->name, "HEAD")) {
kind = "";
diff --git a/submodule.c b/submodule.c
index d1b6646f42..1ce944a737 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1231,8 +1231,14 @@ struct submodule_parallel_fetch {
int result;
 
struct string_list changed_submodule_names;
+
+   /* The submodules to fetch in */
+   struct fetch_task **oid_fetch_tasks;
+   int oid_fetch_tasks_nr, oid_fetch_tasks_alloc;
 };
-#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
STRING_LIST_INIT_DUP }
+#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, \
+ STRING_LIST_INIT_DUP, \
+ NULL, 0, 0}
 
 static int get_fetch_recurse_config(const struct submodule *submodule,
struct submodule_parallel_fetch *spf)
@@ -1259,6 +1265,73

[PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2018-11-28 Thread Stefan Beller

This is a resend of sb/submodule-recursive-fetch-gets-the-tip,
with all feedback addressed. As it took some time, I'll send it
without range-diff, but would ask for full review.

I plan on resending after the next release as this got delayed quite a bit,
which is why I also rebased it to master.

Thanks,
Stefan

Previous round:
https://public-inbox.org/git/20181016181327.107186-1-sbel...@google.com/

Stefan Beller (9):
  sha1-array: provide oid_array_filter
  submodule.c: fix indentation
  submodule.c: sort changed_submodule_names before searching it
  submodule.c: tighten scope of changed_submodule_names struct
  submodule: store OIDs in changed_submodule_names
  repository: repo_submodule_init to take a submodule struct
  submodule: migrate get_next_submodule to use repository structs
  submodule.c: fetch in submodules git directory instead of in worktree
  fetch: try fetching submodules if needed objects were not fetched

 Documentation/technical/api-oid-array.txt|   5 +
 builtin/fetch.c  |  11 +-
 builtin/grep.c   |  17 +-
 builtin/ls-files.c   |  12 +-
 builtin/submodule--helper.c  |   2 +-
 repository.c |  27 +-
 repository.h |  12 +-
 sha1-array.c |  17 ++
 sha1-array.h |   3 +
 submodule.c  | 284 ---
 t/helper/test-submodule-nested-repo-config.c |   8 +-
 t/t5526-fetch-submodules.sh  |  86 ++
 12 files changed, 395 insertions(+), 89 deletions(-)

-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 5/9] submodule: store OIDs in changed_submodule_names

2018-11-28 Thread Stefan Beller

'calculate_changed_submodule_paths' uses a local list to compute the
changed submodules, and then produces the result by copying appropriate
items into the result list.

Instead use the result list directly and prune items afterwards
using string_list_remove_empty_items.

By doing so we'll have access to the util pointer for longer that
contains the commits that we need to fetch, which will be
useful in a later patch.

Signed-off-by: Stefan Beller 
Reviewed-by: Jonathan Tan 
---
 submodule.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/submodule.c b/submodule.c
index f93f0aff82..0c81aca6f2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1139,8 +1139,7 @@ static void calculate_changed_submodule_paths(struct 
repository *r,
struct string_list *changed_submodule_names)
 {
struct argv_array argv = ARGV_ARRAY_INIT;
-   struct string_list changed_submodules = STRING_LIST_INIT_DUP;
-   const struct string_list_item *name;
+   struct string_list_item *name;
 
/* No need to check if there are no submodules configured */
if (!submodule_from_path(r, NULL, NULL))
@@ -1157,9 +1156,9 @@ static void calculate_changed_submodule_paths(struct 
repository *r,
 * Collect all submodules (whether checked out or not) for which new
 * commits have been recorded upstream in "changed_submodule_names".
 */
-   collect_changed_submodules(r, _submodules, );
+   collect_changed_submodules(r, changed_submodule_names, );
 
-   for_each_string_list_item(name, _submodules) {
+   for_each_string_list_item(name, changed_submodule_names) {
struct oid_array *commits = name->util;
const struct submodule *submodule;
const char *path = NULL;
@@ -1173,12 +1172,14 @@ static void calculate_changed_submodule_paths(struct 
repository *r,
if (!path)
continue;
 
-   if (!submodule_has_commits(r, path, commits))
-   string_list_append(changed_submodule_names,
-  name->string);
+   if (submodule_has_commits(r, path, commits)) {
+   oid_array_clear(commits);
+   *name->string = '\0';
+   }
}
 
-   free_submodules_oids(_submodules);
+   string_list_remove_empty_items(changed_submodule_names, 1);
+
argv_array_clear();
oid_array_clear(_tips_before_fetch);
oid_array_clear(_tips_after_fetch);
@@ -1389,7 +1390,7 @@ int fetch_populated_submodules(struct repository *r,
 
argv_array_clear();
 out:
-   string_list_clear(_submodule_names, 1);
+   free_submodules_oids(_submodule_names);
return spf.result;
 }
 
-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 3/9] submodule.c: sort changed_submodule_names before searching it

2018-11-28 Thread Stefan Beller

We can string_list_insert() to maintain sorted-ness of the
list as we find new items, or we can string_list_append() to
build an unsorted list and sort it at the end just once.

As we do not rely on the sortedness while building the
list, we pick the "append and sort at the end" as it
has better worst case execution times.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index bc48ea3b68..3c388f85cc 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1283,7 +1283,7 @@ static int get_next_submodule(struct child_process *cp,
case RECURSE_SUBMODULES_DEFAULT:
case RECURSE_SUBMODULES_ON_DEMAND:
if (!submodule ||
-   !unsorted_string_list_lookup(
+   !string_list_lookup(
_submodule_names,
submodule->name))
continue;
@@ -1377,6 +1377,7 @@ int fetch_populated_submodules(struct repository *r,
/* default value, "--submodule-prefix" and its value are added later */
 
calculate_changed_submodule_paths(r);
+   string_list_sort(_submodule_names);
run_processes_parallel(max_parallel_jobs,
   get_next_submodule,
   fetch_start_failure,
-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 7/9] submodule: migrate get_next_submodule to use repository structs

2018-11-28 Thread Stefan Beller

We used to recurse into submodules, even if they were broken having
only an objects directory. The child process executed in the submodule
would fail though if the submodule was broken. This is tested via
"fetching submodule into a broken repository" in t5526.

This patch tightens the check upfront, such that we do not need
to spawn a child process to find out if the submodule is broken.

Signed-off-by: Stefan Beller 
---
 submodule.c | 56 +
 1 file changed, 44 insertions(+), 12 deletions(-)

diff --git a/submodule.c b/submodule.c
index 0c81aca6f2..77ace5e784 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1253,6 +1253,30 @@ static int get_fetch_recurse_config(const struct 
submodule *submodule,
return spf->default_option;
 }
 
+static struct repository *get_submodule_repo_for(struct repository *r,
+const struct submodule *sub)
+{
+   struct repository *ret = xmalloc(sizeof(*ret));
+
+   if (repo_submodule_init(ret, r, sub)) {
+   /*
+* No entry in .gitmodules? Technically not a submodule,
+* but historically we supported repositories that happen to be
+* in-place where a gitlink is. Keep supporting them.
+*/
+   struct strbuf gitdir = STRBUF_INIT;
+   strbuf_repo_worktree_path(, r, "%s/.git", sub->path);
+   if (repo_init(ret, gitdir.buf, NULL)) {
+   strbuf_release();
+   free(ret);
+   return NULL;
+   }
+   strbuf_release();
+   }
+
+   return ret;
+}
+
 static int get_next_submodule(struct child_process *cp,
  struct strbuf *err, void *data, void **task_cb)
 {
@@ -1260,12 +1284,11 @@ static int get_next_submodule(struct child_process *cp,
struct submodule_parallel_fetch *spf = data;
 
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
-   struct strbuf submodule_path = STRBUF_INIT;
-   struct strbuf submodule_git_dir = STRBUF_INIT;
struct strbuf submodule_prefix = STRBUF_INIT;
const struct cache_entry *ce = spf->r->index->cache[spf->count];
-   const char *git_dir, *default_argv;
+   const char *default_argv;
const struct submodule *submodule;
+   struct repository *repo;
struct submodule default_submodule = SUBMODULE_INIT;
 
if (!S_ISGITLINK(ce->ce_mode))
@@ -1300,15 +1323,11 @@ static int get_next_submodule(struct child_process *cp,
continue;
}
 
-   strbuf_repo_worktree_path(_path, spf->r, "%s", 
ce->name);
-   strbuf_addf(_git_dir, "%s/.git", submodule_path.buf);
strbuf_addf(_prefix, "%s%s/", spf->prefix, ce->name);
-   git_dir = read_gitfile(submodule_git_dir.buf);
-   if (!git_dir)
-   git_dir = submodule_git_dir.buf;
-   if (is_directory(git_dir)) {
+   repo = get_submodule_repo_for(spf->r, submodule);
+   if (repo) {
child_process_init(cp);
-   cp->dir = strbuf_detach(_path, NULL);
+   cp->dir = xstrdup(repo->worktree);
prepare_submodule_repo_env(>env_array);
cp->git_cmd = 1;
if (!spf->quiet)
@@ -1319,10 +1338,23 @@ static int get_next_submodule(struct child_process *cp,
argv_array_push(>args, default_argv);
argv_array_push(>args, "--submodule-prefix");
argv_array_push(>args, submodule_prefix.buf);
+
+   repo_clear(repo);
+   free(repo);
ret = 1;
+   } else {
+   /*
+* An empty directory is normal,
+* the submodule is not initialized
+*/
+   if (S_ISGITLINK(ce->ce_mode) &&
+   !is_empty_dir(ce->name)) {
+   spf->result = 1;
+   strbuf_addf(err,
+   _("Could not access submodule 
'%s'"),
+   ce->name);
+   }
}
-   strbuf_release(_path);
-   strbuf_release(_git_dir);
strbuf_release(_prefix);
if (ret) {
spf->count++;
-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 8/9] submodule.c: fetch in submodules git directory instead of in worktree

2018-11-28 Thread Stefan Beller

Keep the properties introduced in 10f5c52656 (submodule: avoid
auto-discovery in prepare_submodule_repo_env(), 2016-09-01), by fixating
the git directory of the submodule.

Signed-off-by: Stefan Beller 
---
 submodule.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 77ace5e784..d1b6646f42 100644
--- a/submodule.c
+++ b/submodule.c
@@ -494,6 +494,12 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
+static void prepare_submodule_repo_env_in_gitdir(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=.", GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1327,8 +1333,8 @@ static int get_next_submodule(struct child_process *cp,
repo = get_submodule_repo_for(spf->r, submodule);
if (repo) {
child_process_init(cp);
-   cp->dir = xstrdup(repo->worktree);
-   prepare_submodule_repo_env(>env_array);
+   cp->dir = xstrdup(repo->gitdir);
+   prepare_submodule_repo_env_in_gitdir(>env_array);
cp->git_cmd = 1;
if (!spf->quiet)
strbuf_addf(err, "Fetching submodule %s%s\n",
-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 6/9] repository: repo_submodule_init to take a submodule struct

2018-11-28 Thread Stefan Beller

When constructing a struct repository for a submodule for some revision
of the superproject where the submodule is not contained in the index,
it may not be present in the working tree currently either. In that
situation giving a 'path' argument is not useful. Upgrade the
repo_submodule_init function to take a struct submodule instead.
The submodule struct can be obtained via submodule_from_{path, name} or
an artificial submodule struct can be passed in.

While we are at it, rename the repository struct in the repo_submodule_init
function, which is to be initialized, to a name that is not confused with
the struct submodule as easily. Perform such renames in similar functions
as well.

Also move its documentation into the header file.

Reviewed-by: Jonathan Tan 
Signed-off-by: Stefan Beller 
---
 builtin/grep.c   | 17 +++-
 builtin/ls-files.c   | 12 +
 builtin/submodule--helper.c  |  2 +-
 repository.c | 27 
 repository.h | 12 +++--
 t/helper/test-submodule-nested-repo-config.c |  8 +++---
 6 files changed, 43 insertions(+), 35 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 71df52a333..d6bd887b2d 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -404,7 +404,10 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
  const struct object_id *oid,
  const char *filename, const char *path)
 {
-   struct repository submodule;
+   struct repository subrepo;
+   const struct submodule *sub = submodule_from_path(superproject,
+ _oid, path);
+
int hit;
 
/*
@@ -420,12 +423,12 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
return 0;
}
 
-   if (repo_submodule_init(, superproject, path)) {
+   if (repo_submodule_init(, superproject, sub)) {
grep_read_unlock();
return 0;
}
 
-   repo_read_gitmodules();
+   repo_read_gitmodules();
 
/*
 * NEEDSWORK: This adds the submodule's object directory to the list of
@@ -437,7 +440,7 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 * store is no longer global and instead is a member of the repository
 * object.
 */
-   add_to_alternates_memory(submodule.objects->objectdir);
+   add_to_alternates_memory(subrepo.objects->objectdir);
grep_read_unlock();
 
if (oid) {
@@ -462,14 +465,14 @@ static int grep_submodule(struct grep_opt *opt, struct 
repository *superproject,
 
init_tree_desc(, data, size);
hit = grep_tree(opt, pathspec, , , base.len,
-   object->type == OBJ_COMMIT, );
+   object->type == OBJ_COMMIT, );
strbuf_release();
free(data);
} else {
-   hit = grep_cache(opt, , pathspec, 1);
+   hit = grep_cache(opt, , pathspec, 1);
}
 
-   repo_clear();
+   repo_clear();
return hit;
 }
 
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c70a9c7158..583a0e1ca2 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -206,17 +206,19 @@ static void show_files(struct repository *repo, struct 
dir_struct *dir);
 static void show_submodule(struct repository *superproject,
   struct dir_struct *dir, const char *path)
 {
-   struct repository submodule;
+   struct repository subrepo;
+   const struct submodule *sub = submodule_from_path(superproject,
+ _oid, path);
 
-   if (repo_submodule_init(, superproject, path))
+   if (repo_submodule_init(, superproject, sub))
return;
 
-   if (repo_read_index() < 0)
+   if (repo_read_index() < 0)
die("index file corrupt");
 
-   show_files(, dir);
+   show_files(, dir);
 
-   repo_clear();
+   repo_clear();
 }
 
 static void show_ce(struct repository *repo, struct dir_struct *dir,
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index d38113a31a..4eceb8f040 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2053,7 +2053,7 @@ static int ensure_core_worktree(int argc, const char 
**argv, const char *prefix)
if (!sub)
BUG("We could get the submodule handle before?");
 
-   if (repo_submodule_init(, the_repository, path))
+   if (repo_submodule_init(, the_repository, sub))
die(_("could not get a repository handle for submodule '%s'"), 
path);
 
if (!repo_config_get_string(, "core.worktree&

[PATCH 1/9] sha1-array: provide oid_array_filter

2018-11-28 Thread Stefan Beller

Helped-by: Junio C Hamano 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 Documentation/technical/api-oid-array.txt |  5 +
 sha1-array.c  | 17 +
 sha1-array.h  |  3 +++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/technical/api-oid-array.txt 
b/Documentation/technical/api-oid-array.txt
index 9febfb1d52..c97428c2c3 100644
--- a/Documentation/technical/api-oid-array.txt
+++ b/Documentation/technical/api-oid-array.txt
@@ -48,6 +48,11 @@ Functions
is not sorted, this function has the side effect of sorting
it.
 
+`oid_array_filter`::
+   Apply the callback function `want` to each entry in the array,
+   retaining only the entries for which the function returns true.
+   Preserve the order of the entries that are retained.
+
 Examples
 
 
diff --git a/sha1-array.c b/sha1-array.c
index b94e0ec0f5..d922e94e3f 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
}
return 0;
 }
+
+void oid_array_filter(struct oid_array *array,
+ for_each_oid_fn want,
+ void *cb_data)
+{
+   unsigned nr = array->nr, src, dst;
+   struct object_id *oids = array->oid;
+
+   for (src = dst = 0; src < nr; src++) {
+   if (want([src], cb_data)) {
+   if (src != dst)
+   oidcpy([dst], [src]);
+   dst++;
+   }
+   }
+   array->nr = dst;
+}
diff --git a/sha1-array.h b/sha1-array.h
index 232bf95017..55d016c4bf 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
 int oid_array_for_each_unique(struct oid_array *array,
  for_each_oid_fn fn,
  void *data);
+void oid_array_filter(struct oid_array *array,
+ for_each_oid_fn want,
+ void *cbdata);
 
 #endif /* SHA1_ARRAY_H */
-- 
2.20.0.rc1.387.gf8505762e3-goog

[PATCH 2/9] submodule.c: fix indentation

2018-11-28 Thread Stefan Beller

The submodule subsystem is really bad at staying within 80 characters.
Fix it while we are here.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 submodule.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6415cc5580..bc48ea3b68 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1271,7 +1271,8 @@ static int get_next_submodule(struct child_process *cp,
if (!submodule) {
const char *name = default_name_or_path(ce->name);
if (name) {
-   default_submodule.path = default_submodule.name 
= name;
+   default_submodule.path = name;
+   default_submodule.name = name;
submodule = _submodule;
}
}
@@ -1281,8 +1282,10 @@ static int get_next_submodule(struct child_process *cp,
default:
case RECURSE_SUBMODULES_DEFAULT:
case RECURSE_SUBMODULES_ON_DEMAND:
-   if (!submodule || 
!unsorted_string_list_lookup(_submodule_names,
-submodule->name))
+   if (!submodule ||
+   !unsorted_string_list_lookup(
+   _submodule_names,
+   submodule->name))
continue;
default_argv = "on-demand";
break;
-- 
2.20.0.rc1.387.gf8505762e3-goog

Re: [PATCH 3/5] pack-objects: add --sparse option

2018-11-28 Thread Stefan Beller

> +--sparse::
> +   Use the "sparse" algorithm to determine which objects to include in
> +   the pack. This can have significant performance benefits when 
> computing
> +   a pack to send a small change. However, it is possible that extra
> +   objects are added to the pack-file if the included commits contain
> +   certain types of direct renames.

As a user, where do I find a discussion of these walking algorithms?
(i.e. how can I tell if I can really expect to gain performance benefits,
what are the tradeoffs? "If it were strictly better than the original,
it would be on by default" might be a thought of a user)

Re: [PATCH/RFC v2 0/7] Introduce new commands switch-branch and checkout-files

2018-11-28 Thread Stefan Beller

On Wed, Nov 28, 2018 at 12:09 PM Duy Nguyen  wrote:
>
> On Wed, Nov 28, 2018 at 9:01 PM Duy Nguyen  wrote:
> > should we do
> > something about detached HEAD in this switch-branch command (or
> > whatever its name will be)?
> >
> > This is usually a confusing concept to new users
>
> And it just occurred to me that perhaps we should call this "unnamed
> branch" (at least at high UI level) instead of detached HEAD. It is
> technically not as accurate, but much better to understand.

or 'direct' branch? I mean 'detached HEAD' itself is also not correct
as the HEAD points to a valid commit/tag usually, so it is attached to
that content. The detachment comes from the implicit "from a branch".

Re: Git pull confusing output

2018-11-28 Thread Stefan Beller

On Tue, Nov 27, 2018 at 10:31 PM Junio C Hamano  wrote:
>
> Will  writes:
>
> > I’m far from being a guru, but I consider myself a competent Git
> > user. Yet, here’s my understanding of the output of one the most-used
> > commands, `git push`:
> >> Counting objects: 6, done.
> > No idea what an “object” is. Apparently there’s 6 of them
> > here. What does “counting” them means? Should I care?
>
> You vaguely recall that the last time you pushed you saw ~400
> objects counted there, so you get the feeling how active you were
> since then.
>
> It is up to you if you are interested in such a feel of the level of
> activity.  "git fetch" (hence "git pull") would also give you a
> similar "feel", e.g. "the last fetch was ~1200 objects and today's
> is mere ~200, so it seems it is a relatively slow day".
>
> As to "what is an object?", there are plenty of Git tutorials and
> books to learn the basics from.  Again, it is up to you if you care.

While this is very interesting to the experienced git user, the
approximation of activity by object count is very coarse to say at least.

As It approximates changes in the DAG object count and nothing
about the deltas (which as we learn comes later and it comes with
a progress meter in bytes), it only provides the basics.

>
> I think we have "--quiet" option for those who do not care.

I think some users are not focused as much on the version control as
much as they are focused on another problem that is solved with
the content inside the repo.

Which means they only care about 'actionable' output, such as
* errors
* information provided by remote
  (e.g. links to click to start a code review)
* too long waiting time
  (so they can abort and inspect the problem)

I would suggest we come up with a mode that is "not quiet", but
cuts down to only the basic actionable items [and make that
the default eventually].

Now these actionable items depend on the workflow used,
for which I think an email based maintainers workflow is not
the norm. The vast majority of people uses git-push to
upload their change to a code review system instead.
And for such a workflow the size (as proxied by
object/delta count) is not as important as the target ref
that you push to or potentially the diffstat output of
a potential merge to a target branch.

TLDR: I still think making git-push a bit more quiet is
beneficial to the user base at large.

Stefan

Re: [bug report] git-gui child windows are blank

2018-11-28 Thread Stefan Beller

On Wed, Nov 28, 2018 at 6:13 AM Kenn Sebesta  wrote:
>
> v2.19.2, installed from brew on macOS Mojave 14.2.1.
>
> `git-gui` is my much beloved go-to tool for everything git.
> Unfortunately, on my new Macbook Air it seems to have a bug. When I
> first load the program, the parent window populates normally with the
> stage/unstaged and diff panes. However, when I click Push, the child
> window is completely blank. The frame is there, but there is no
> content.
>
> This same behavior is seen if I do a `git gui blame`, where the
> primary blame window opens normally but all the children windows are
> empty.
>
> I have googled but was unsuccessful in finding a solution. Is this a
> known issue?

Looking through the mailing list archive, this
seemed to be one of the last relevant messges
https://public-inbox.org/git/20180724065120.7664-1-sunsh...@sunshineco.com/

>
>
> --Kenn

Re: [PATCH v2 6/7] checkout: split into switch-branch and checkout-files

2018-11-28 Thread Stefan Beller

On Wed, Nov 28, 2018 at 7:31 AM Duy Nguyen  wrote:
>
> On Wed, Nov 28, 2018 at 7:03 AM Junio C Hamano  wrote:
> >
> > Nguyễn Thái Ngọc Duy   writes:
> >
> > > The good old "git checkout" command is still here and will be until
> > > all (or most of users) are sick of it.
> >
> > Two comments on the goal (the implementation looked reasonable
> > assuming the reader agrees with the gaol).
> >
> > At least to me, the verb "switch" needs two things to switch
> > between, i.e. "switch A and B", unless it is "switch to X".
> > Either "switch-to-branch" or simply "switch-to", perhaps?
> >
> > As I already hinted in my response to Stefan (?) about
> > checkout-from-tree vs checkout-from-index, a command with multiple
> > modes of operation is not confusing to people with the right mental
> > model, and I suspect that having two separate commands for "checking
> > out a branch" and "checking out paths" that is done by this step
> > would help users to form the right mental model.
>
> Since the other one is already "checkout-files", maybe this one could
> just be "checkout-branch".

I dislike the checkout-* names, as we already have checkout-index
as plumbing, so it would be confusing as to which checkout-* command
should be used when and why as it seems the co-index moves
content *from index* to the working tree, but the co-files moves content
*to files*, whereas checkout-branch is neither 'moving' to or from a branch
but rather 'switching' to that branch.

Re: [PATCH v2 4/7] checkout: move dwim_new_local_branch to checkout_opts

2018-11-27 Thread Stefan Beller

On Tue, Nov 27, 2018 at 8:53 AM Nguyễn Thái Ngọc Duy  wrote:
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 

I would not mind to have this squashed into the previous patch
but keeping it separated is fine, too.
(Reason for squashing: it makes it clearer that we do not
care about one specific option, but have to treat all the loose
options the same way.)

Re: [PATCH v2 3/7] checkout: move 'confict_style' to checkout_opts

2018-11-27 Thread Stefan Beller

On Tue, Nov 27, 2018 at 8:53 AM Nguyễn Thái Ngọc Duy  wrote:
>

The last patches seemed self explanatory after the first RFC
and their commit message. This one is harder to reason about,
as --conflict is documented as "The same as --merge option
above, but ..." and --merge is "When switching branches, ..."
so ... ah my mind wandered off, expecting this to be a preparation
for the separate commands already, but this is just about ensuring
everything is in opts such that the split can be done later?

Re: [PATCH v2 1/7] parse-options: allow parse_options_concat(NULL, options)

2018-11-27 Thread Stefan Beller

On Tue, Nov 27, 2018 at 8:53 AM Nguyễn Thái Ngọc Duy  wrote:
>
> There is currently no caller that calls this function with "a" being
> NULL. But it will be introduced shortly. It is used to construct the
> option array from scratch, e.g.
>
>struct parse_options opts = NULL;
>opts = parse_options_concat(opts, opts_1);
>opts = parse_options_concat(opts, opts_2);

While this addresses the immediate needs, I'd prefer to think
about the API exposure of parse_options_concat,
(related: do we want to have docs in its header file?)
and I'd recommend to make it symmetrical, i.e.
allow the second argument to also be NULL?

In the example given here, you'd just short it to

struct parse_options opts = opts_1;
opts = parse_options_concat(opts, opts_2);

if not for this patch. Are opts_{1,2} ever be NULL?

Re: Git pull confusing output

2018-11-27 Thread Stefan Beller

On Tue, Nov 27, 2018 at 8:52 AM Will  wrote:

> And even them, do they need this info every time they push?

I agree that we should make the output a bit more user friendly,
which means we'd only want to output relevant data for the user.

The different phases taking each one line takes up precious
screen real estate, so another approach would be delete the line
after one phase is finished, such that you'd only see the currently
active phase (that can be useful for debugging as in "The phase of
'Writing objects' takes very long" -> slow network connection).

> I feel like a less intimidating output would help, while showing info
> about objects and deltas with the verbose flag:

I agree that most information in pushing is not very useful
and could be omitted. This helps in multiple ways:
* it keeps the focus on the actually important information,
   see bf1a11f0a1 (sideband: highlight keywords in remote
   sideband output, 2018-08-07)
* less space in a terminal wasted, such that you can scroll over
   it better

> > Compressing… done

After the push succeeded this information would not be useful
any more, it is only useful during the compression phase
(Does it progress quickly enough? or does it error out?)

Slightly related (but applies mostly to fetch, for which this
discussion can also be had):
When fetching, these informations are generated on the
remote side (as the server needs to create the packfile
according to your local state that you negotiated with the
server), which takes some time. Sending over this
information also keeps the connection alive. This is only
relevant in corner cases depending on the setup of the
hosting provider/repository, but it led to commits such as
https://eclipse.googlesource.com/jgit/jgit/+/a38531b21c7e2b0dc956e0ed1bfc9513f604273c
in the java implementation of Git.

> > Pushing to github.com:williamdclt/some-repo.git… done
> > 1ca9aaa..4320d30  master -> master
>
>
> I’d be more than happy to work on this (`git push` is an example
> amongst so many other), but want the mailing list’s opinion on it. Am
> I wrong in thinking that this output is not something users want, am I
> fighting windmills or maybe just being ignorant?

I think this would be a useful patch, but it could get complicated
quickly: push uses other low level git commands to prepare the
packfile to be sent to the server, currently it only needs to pipe
through the output of the low level command (or even have the
low level command directly write to the terminal).

The output of those low level commands should not be changed
when run on their own, I would assume.

So maybe the best way to dive into understanding what happens
under the hood in git-push is to run

  GIT_TRACE=1 git push ...

and see what child processes are invoked (e.g.
run_command: git pack-objects --all-progress-implied)
and then we'd need to change the output of iff the
specific progress flag is given.

Stefan

Re: Confusing behavior with ignored submodules and `git commit -a`

2018-11-26 Thread Stefan Beller

On Thu, Nov 15, 2018 at 4:31 PM Michael Forney  wrote:
>
> On 2018-11-15, Stefan Beller  wrote:
> > On Thu, Nov 15, 2018 at 1:33 PM Michael Forney  wrote:
> >> Well, currently the submodule config can be disabled in diff_flags by
> >> setting override_submodule_config=1. However, I'm thinking it may be
> >> simpler to selectively *enable* the submodule config in diff_flags
> >> where it is needed instead of disabling it everywhere else (i.e.
> >> use_submodule_config instead of override_submodule_config).
> >
> > This sounds like undoing the good(?) part of the series that introduced
> > this regression, as before that we selectively loaded the submodule
> > config, which lead to confusion when you forgot it. Selectively *enabling*
> > the submodule config sounds like that state before?
> >
> > Or do we *only* talk about enabling the ignore flag, while loading the
> > rest of the submodule config automatic?
>
> Yes, that is what I meant. I believe the automatic loading of
> submodule config is the right thing to do, it just uncovered cases
> where the current handling of override_submodule_config is not quite
> sufficient.

That sounds good.

>
> My suggestion of replacing override_submodule_config with
> use_submodule_config is because it seems like there are fewer places
> where we want to apply the ignore config than not. I think it should
> only apply in diffs against the working tree and when staging changes
> to the index (unless specified explicitly). The documentation just
> mentions the "diff family", but all but one of the possible values for
> submodule..ignore ("all") don't make sense unless comparing with
> the working tree. This is also how show/log -p behaved in git <2.15.
> So I think that clarifying that it is about modifications *to the
> working tree* would be a good idea.
>
> >> I'm also starting to see why this is tricky. The only difference that
> >> diff.c:run_diff_files sees between `git add inner` and `git add --all`
> >> is whether the index entry matched the pathspec exactly or not.
> >
> > Unrelated to the trickiness, I think we'd need to document the behavior
> > of the -a flag in git-add and git-commit better as adding the diff below
> > will depart from the "all" rule again, which I thought was a strong
> > motivator for Brandons series (IIRC).
>
> Can you explain what you mean by the "all" rule?

-a as short for --all in git commit, and I presumed it to be
'--all-content', but documentation tells me it's about files
only, so there is no really an "all" rule, but rather me misunderstanding
to what it applies.

Re: [RFC] Introduce two new commands, switch-branch and restore-paths

2018-11-26 Thread Stefan Beller

On Mon, Nov 26, 2018 at 8:01 AM Ævar Arnfjörð Bjarmason
 wrote:
>
>
> On Tue, Nov 20 2018, Duy Nguyen wrote:
>
> > On Mon, Nov 19, 2018 at 04:19:53PM +0100, Duy Nguyen wrote:
> >> I promise to come back with something better (at least it still
> >> sounds better in my mind). If that idea does not work out, we can
> >> come back and see if we can improve this.
> >
> > So this is it. The patch isn't pretty, mostly as a proof of
> > concept. Just look at the three functions at the bottom of checkout.c,
> > which is the main thing.
> >
> > This patch tries to split "git checkout" command in two new ones:
> >
> > - git switch-branch is all about switching branches
>
> Isn't this going to also take the other ref arguments 'git checkout'
> takes now? I.e. tags, detached HEADs etc? I'm reminded of the discussion
> about what "range-diff" should be called :)

Heh, good call. :-)
Note that the color of a bikeshed has fewer functional implications
than coming up with proper names in your API exposed to millions
of users, as cognitive associations playing mind tricks, can have a
huge impact on their productivity. ;-)

In a neighboring thread there is discussion of the concept of a
'change' (and evolving the change locally), which is yet another
thing in the refs-space.

'switch-branch' sounds like a good name for a beginner who is just
getting started, but as soon as they discover that there is more than
branches (detached HEAD via commits, tags,
remote tracking "branches"), this name may be confusing.
So it would not be a good choice for the intermediate Git user.
The old time power user would not care as they have 'checkout'
in their muscle memory, aliased to 'co', but maybe they'd find
it nice for explaining to new users.

So I'd be thrilled to find a name that serves users on all levels.

Maybe we need to step back and consider what the command does.
And from that I would name it "rewire-HEAD-and-update-index"
and then simplify from there. For the beginner user, the concept of
HEAD might be overwhelming, such that we don't want to have that
in there.

So I'd be tempted to suggest to call it "switch-to-ref", but that would
be wrong in the corner case as well: When using that with a remote
tracking branch, you don't "switch to it" by putting it into your HEAD,
but you merely checkout the commit that it's pointing at.

>
> > - git restore-paths (maybe restore-file is better) for checking out
> >   paths

"content-to-path", maybe(?) as it moves the content (as given by commit
or implicitly assuming the index when omitted) into that path(, again).
(I am not enthused about this, as you can similarly argue for
content-to-paths, content-to-worktree, which then could split up into
"index-to-worktree [pathspec]" as well as "tree-to-worktree ".
also the notion of X-to-Y seems a novel concept in our naming, so maybe
verb-noun is better, hence restore-path or "fix-paths" may be better)

> > Later on, we could start to add a bit more stuff in, e.g. some form of
> > disambiguation is no longer needed when running as switch-branch, or
> > restore-paths.
> >
> > So, what do you think?

The patch looks interestingly small :-)

> That "git checkout" does too many things is something that keeps coming
> up in online discussions about Git's UI. Two things:
>
> a) It would really help to have some comparison of cases where these
>split commands are much clearer or less ambiguous than
>git-checkout. I can think of some (e.g. branch with the same name as
>a file) but having some overall picture of what the new UI looks like
>with solved / not solved cases would be nice. Also a comparison with
>other SCMs people find less confusing (svn, hg, bzr, ...)

How do other SCMs solve this issue? (What is their design space?
How many commands do they have for what git-checkout does
all-in-one?)

> b) I think we really need to have some end-game where we'd actually
>switch away from "checkout" (which we could still auto-route to new
>commands in perpetuity, but print a warning or error). Otherwise
>we'll just end up with https://xkcd.com/927/ and more UI confusion
>for all.

Heh, that situation is only avoided when the new command has clear
advantages over the old, and ISTM that we can only compete on
UX and better defaults, so maybe I'd push for making it more logical,
maybe so:

  git tree-to-worktree # git checkout  -- 
  git index-to-worktree # git checkout -- 
  git rev-to-ref # git checkout 

Just food for thought, specifically the last one would be
hilarious if we'd end up with it.

Stefan

Re: [PATCH 2/5] ieot: default to not writing IEOT section

2018-11-26 Thread Stefan Beller

On Mon, Nov 26, 2018 at 1:48 PM Ben Peart  wrote:
>
>
>
> On 11/26/2018 2:59 PM, Stefan Beller wrote:
> >>> +static int record_ieot(void)
> >>> +{
> >>> + int val;
> >>> +
> >>
> >> Initialize stack val to zero to ensure proper default.
> >
> > I don't think that is needed here, as we only use `val` when
> > we first write to it via git_config_get_bool.
> >
> > Did you spot this via code review and thought of
> > defensive programming or is there a tool that
> > has a false positive here?
> >
>
> Code review and defensive programming.  I had to review the code in
> git_config_get_bool() to see if it always initialized the val even if it
> didn't find the requested config variable (esp since we don't pass in a
> default value for this function like we do others).
>

Ah, sorry to have sent out this email, which I found as one of the
earliest discussions in my mailbox. The later patches/discussions
became a lot more heated from my cursory skimming and sorted
out this as well.

It is interesting to notice that, as I also had to lookup how the config
machinery works (once? a couple times?) but now it is so hardcoded
in my brain to assume that if functions like git_config_* take the
branch, we can access the value that the config function was supposed
to read into.

Sorry for the noise,
Stefan

Re: What's cooking in git.git (Nov 2018, #06; Wed, 21)

2018-11-26 Thread Stefan Beller

>
> * sb/submodule-recursive-fetch-gets-the-tip (2018-10-31) 11 commits
> [...]
>
>  "git fetch --recurse-submodules" may not fetch the necessary commit
>  that is bound to the superproject, which is getting corrected.
>
>  Is the discussion on this topic over?  What was the outcome?

Please don't merge down the topic in the current state.

I have a local updated version sitting on my disk and plan to send
it once I made sure it addressed all comments of various revisions.

Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment

2018-11-26 Thread Stefan Beller

On Fri, Nov 23, 2018 at 3:17 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
> response to those comments - see the range-diff below for details (the
> patch numbers are off by one in the range diff, I think because the
> first patch is unchanged and so it was used as the merge base by
> --range-diff=.

`git range-diff` accepts a three dotted "range" OLD...NEW
as an easy abbreviation for the arguments
"COMMON..OLD COMMON..NEW" and the common element is
computed as the last common element. It doesn't have knowledge
about where you started your topic branch.

> For some reason the range-diff also includes
> the notes even though I did not give --notes to format-patch)

This is interesting.
The existence of notes.rewrite. seems to work well
with the range-diff then, as the config would trigger the copy-over
of notes and then range-diff would diff the original notes to the new
notes.

>
> When trying out the new --color-moved-ws=allow-indentation-change I
> was disappointed to discover it did not work if the indentation
> contains a mix of spaces and tabs. This series reworks it so that it
> does.
>

The range-diff looks good to me.

Thanks,
Stefan

Re: [PATCH v1 1/2] log -G: Ignore binary files

2018-11-26 Thread Stefan Beller

On Wed, Nov 21, 2018 at 1:08 PM Thomas Braun
 wrote:
>
> The -G  option of log looks for the differences whose patch text
> contains added/removed lines that match regex.
>
> The concept of differences only makes sense for text files, therefore
> we need to ignore binary files when searching with -G  as well.

What about partial text/partial binary files?

I recall using text searching tools (not necessarily git machinery,
my memory is fuzzy) to check for strings in pdf files, which are
usually marked binary in context of git, such that we do not
see their diffs in `log -p`.

But I would expect a search with -G or -S to still work...
until I find the exception in the docs, only to wonder if
there is a switch to turn off this optimisation for this
corner case.

Stefan

Re: [PATCH 2/5] ieot: default to not writing IEOT section

2018-11-26 Thread Stefan Beller

> > +static int record_ieot(void)
> > +{
> > + int val;
> > +
>
> Initialize stack val to zero to ensure proper default.

I don't think that is needed here, as we only use `val` when
we first write to it via git_config_get_bool.

Did you spot this via code review and thought of
defensive programming or is there a tool that
has a false positive here?

>
> > + if (!git_config_get_bool("index.recordoffsettable", ))
> > + return val;
> > + return 0;
> > +}

Re: [PATCH] grep: use grep_opt->repo instead of explict repo argument

2018-11-26 Thread Stefan Beller

On Sun, Nov 18, 2018 at 8:38 AM Nguyễn Thái Ngọc Duy  wrote:
>
> This command is probably the first one that operates on a repository
> other than the_repository, in f9ee2fcdfa (grep: recurse in-process
> using 'struct repository' - 2017-08-02). An explicit 'struct
> repository *' was added in that commit to pass around the repository
> that we're supposed to grep from.
>
> Since 38bbc2ea39 (grep.c: remove implicit dependency on the_index -
> 2018-09-21). 'struct grep_opt *' carries in itself a repository
> parameter for grepping. We should now be able to reuse grep_opt to
> hold the submodule repo instead of a separate argument, which is just
> room for mistakes.

That makes sense. Assuming we did not make a mistake yet, the
test suite would not need changes (further assuming we do test for it,
but we do as we have explicit submodule grep tests).

>
> While at there, use the right reference instead of the_repository and
> the_index in this code. I was a bit careless in my attempt to kick
> the_repository / the_index out of library code. It's normally safe to
> just stick the_repository / the_index in bultin/ code, but it's not
> the case for grep.

Reviewed-by: Stefan Beller 

> Signed-off-by: Nguyễn Thái Ngọc Duy

Re: [PATCH v1 9/9] diff --color-moved-ws: handle blank lines

2018-11-20 Thread Stefan Beller

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> When using --color-moved-ws=allow-indentation-change allow lines with
> the same indentation change to be grouped across blank lines. For now
> this only works if the blank lines have been moved as well, not for
> blocks that have just had their indentation changed.
>
> This completes the changes to the implementation of
> --color-moved=allow-indentation-change. Running
>
>   git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0
>
> now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
> version of the previous implementation and 66% from the original which
> took 14.6s.
>
> Signed-off-by: Phillip Wood 
> ---
>
> Notes:
> Changes since rfc:
>  - Split these changes into a separate commit.
>  - Detect blank lines when processing the indentation rather than
>parsing each line twice.
>  - Tweaked the test to make it harder as suggested by Stefan.
>  - Added timing data to the commit message.
>
>  diff.c | 34 ---
>  t/t4015-diff-whitespace.sh | 41 ++
>  2 files changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 89559293e7..072b5bced6 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
> memset(b, 0, sizeof(*b));
>  }
>
> +#define INDENT_BLANKLINE INT_MIN

Answering my question from the previous patch:
This is why we need to keep the indents signed.

This patch looks quite nice to read along.

The whole series looks good to me.
Do we need to update the docs in any way?

Thanks,
Stefan

Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change

2018-11-16 Thread Stefan Beller

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Currently diff --color-moved-ws=allow-indentation-change does not
> support indentation that contains a mix of tabs and spaces. For
> example in commit 546f70f377 ("convert.h: drop 'extern' from function
> declaration", 2018-06-30) the function parameters in the following
> lines are not colored as moved [1].
>
> -extern int stream_filter(struct stream_filter *,
> -const char *input, size_t *isize_p,
> -char *output, size_t *osize_p);
> +int stream_filter(struct stream_filter *,
> + const char *input, size_t *isize_p,
> + char *output, size_t *osize_p);
>
> This commit changes the way the indentation is handled to track the
> visual size of the indentation rather than the characters in the
> indentation. This has they benefit that any whitespace errors do not

s/they/the/

> interfer with the move detection (the whitespace errors will still be
> highlighted according to --ws-error-highlight). During the discussion
> of this feature there were concerns about the correct detection of
> indentation for python. However those concerns apply whether or not
> we're detecting moved lines so no attempt is made to determine if the
> indentation is 'pythonic'.
>
> [1] Note that before the commit to fix the erroneous coloring of moved
> lines each line was colored as a different block, since that commit
> they are uncolored.
>
> Signed-off-by: Phillip Wood 
> ---
>
> Notes:
> Changes since rfc:
>  - It now replaces the existing implementation rather than adding a new
>mode.
>  - The indentation deltas are now calculated once for each line and
>cached.
>  - Optimized the whitespace delta comparison to compare string lengths
>before comparing the actual strings.
>  - Modified the calculation of tabs as suggested by Stefan.
>  - Split out the blank line handling into a separate commit as suggest
>by Stefan.
>  - Fixed some comments pointed out by Stefan.
>
>  diff.c | 130 +
>  t/t4015-diff-whitespace.sh |  56 
>  2 files changed, 129 insertions(+), 57 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c378ce3daf..89559293e7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -750,6 +750,8 @@ struct emitted_diff_symbol {
> const char *line;
> int len;
> int flags;
> +   int indent_off;
> +   int indent_width;

So this is the trick how we compute the ws related
data only once per line. :-)

On the other hand, we do not save memory by disabling
the ws detection, but I guess that is not a problem for now.

Would it make sense to have the new variables be
unsigned? (Also a comment on what they are, I
needed to read the code to understand off to be
offset into the line, where the content starts, and
width to be the visual width, as I did not recall
the RFC.)

> +static void fill_es_indent_data(struct emitted_diff_symbol *es)
> [...]

> +   if (o->color_moved_ws_handling &
> +   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +   fill_es_indent_data(>emitted_symbols->buf[n]);

Nice.

By reducing the information kept around to ints, we also do not need
to alloc/free
memory for each line.

> +++ b/t/t4015-diff-whitespace.sh
> @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta 
> incompatible with other space opti
> test_i18ngrep allow-indentation-change err
>  '
>
> +test_expect_success 'compare mixed whitespace delta across moved blocks' '

Looks good,

Thanks!
Stefan

Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change

2018-11-16 Thread Stefan Beller

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> When running
>
>   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>
> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
> return after comparing a and b. By comparing the lengths first we can
> return early in all but 0.03% of those cases without dereferencing the
> string pointers. The comparison between a and c fails in 6.8% of
> calls, by comparing the lengths first we reject all the failing calls
> without dereferencing the string pointers.
>
> This reduces the time to run the command above by by 42% from 14.6s to
> 8.5s. This is still much slower than the normal --color-moved which
> takes ~0.6-0.7s to run but is a significant improvement.
>
> The next commits will replace the current implementation with one that
> works with mixed tabs and spaces in the indentation. I think it is
> worth optimizing the current implementation first to enable a fair
> comparison between the two implementations.

Up to here the series looks good and I think we could take it
as a preparatory self-standing series.

I'll read on.
Thanks,
Stefan

Re: [PATCH v1 2/9] diff: use whitespace consistently

2018-11-16 Thread Stefan Beller

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood  wrote:
>
> From: Phillip Wood 
>
> Most of the documentation uses 'whitespace' rather than 'white space'
> or 'white spaces' convert to latter two to the former for consistency.

Makes sense; this doesn't touch docs, but also code.
$ git grep "white space" yields some other places
as well (Documentation/git-cat-file.txt and lots in t/)
But I guess we keep it to this feature for now instead
of a tree wide cleanup.

Stefan

Re: Confusing behavior with ignored submodules and `git commit -a`

2018-11-15 Thread Stefan Beller

On Thu, Nov 15, 2018 at 1:33 PM Michael Forney  wrote:
>
> On 2018-11-15, Stefan Beller  wrote:
> > On Wed, Nov 14, 2018 at 10:05 PM Michael Forney 
> > wrote:
> >> Looking at ff6f1f564c, I don't really see anything that might be
> >> related to git-add, git-reset, or git-diff, so I'm guessing that this
> >> only worked before because the submodule config wasn't getting loaded
> >> during `git add` or `git reset`. Now that the config is loaded
> >> automatically, submodule..ignore started taking effect where it
> >> shouldn't.
> >>
> >> Unfortunately, this doesn't really get me much closer to finding a fix.
> >
> > Maybe selectively unloading or overwriting the config?
> >
> > Or we can change is_submodule_ignored() in diff.c
> > to be only applied selectively whether we are running the
> > right command? For this approach we'd have to figure out the
> > set of commands to which the ignore config should apply or
> > not (and come up with a more concise documentation then)
> >
> > This approach sounds appealing to me as it would cover
> > new commands as well and we'd only have a central point
> > where the decision for ignoring is made.
>
> Well, currently the submodule config can be disabled in diff_flags by
> setting override_submodule_config=1. However, I'm thinking it may be
> simpler to selectively *enable* the submodule config in diff_flags
> where it is needed instead of disabling it everywhere else (i.e.
> use_submodule_config instead of override_submodule_config).

This sounds like undoing the good(?) part of the series that introduced
this regression, as before that we selectively loaded the submodule
config, which lead to confusion when you forgot it. Selectively *enabling*
the submodule config sounds like that state before?

Or do we *only* talk about enabling the ignore flag, while loading the
rest of the submodule config automatic?

> I'm also starting to see why this is tricky. The only difference that
> diff.c:run_diff_files sees between `git add inner` and `git add --all`
> is whether the index entry matched the pathspec exactly or not.

Unrelated to the trickiness, I think we'd need to document the behavior
of the -a flag in git-add and git-commit better as adding the diff below
will depart from the "all" rule again, which I thought was a strong
motivator for Brandons series (IIRC).

> Here is a work-in-progress diff that seems to have the correct
> behavior in all cases I tried. Can you think of any cases that it
> breaks? I'm not quite sure of the consequences of having diff_change
> and diff_addremove always ignore the submodule config; git-diff and
> git-status still seem to work correctly.
>
> diff --git a/builtin/add.c b/builtin/add.c
> index f65c17229..9902f7742 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -117,7 +117,6 @@ int add_files_to_cache(const char *prefix,
> rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> rev.diffopt.format_callback = update_callback;
> rev.diffopt.format_callback_data = 
> -   rev.diffopt.flags.override_submodule_config = 1;

This line partially reverts 5556808, taking 02f2f56bc377c28
into account.

> diff --git a/diff-lib.c b/diff-lib.c
> index 83fce5151..fbb048cca 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -68,12 +68,13 @@ static int check_removed(const struct cache_entry
> *ce, struct stat *st)
>  static int match_stat_with_submodule(struct diff_options *diffopt,
>  const struct cache_entry *ce,
>  struct stat *st, unsigned ce_option,
> -unsigned *dirty_submodule)
> +unsigned *dirty_submodule,
> +int exact)
> [...];

This is an interesting take so far as it is all about *detecting* change
here via stat information and not like the previous (before the regression)
where it was about correcting output.

match_stat_with_submodule would grow its documentation to be
slightly more complicated as a result.

> diff --git a/diff.c b/diff.c
> index e38d1ecaf..73dc75286 100644
> --- a/diff.c
> +++ b/diff.c
> [...]
> -static int is_submodule_ignored(const char *path, struct diff_options 
> *options)
> -{
> [...]
> -   if (S_ISGITLINK(mode) && is_submodule_ignored(concatpath, options))
> +   if (S_ISGITLINK(mode) && options->flags.ignore_submodules)
> return;

This basically inlines the function is_submodule_ignored,
except for the part:

if (!options->flags.override_submodule_config)
set_diffopt_flags_from_submodule_config(options, path);

but that was taken care off in match_stat_with_submodule in diff-lib?

This WIP looks really promising, thanks for looking into this!
Stefan

Re: [PATCH 18/23] submodule: use submodule repos for object lookup

2018-11-15 Thread Stefan Beller

On Thu, Nov 15, 2018 at 11:54 AM Jonathan Tan  wrote:
>
> > +/*
> > + * Initialize 'out' based on the provided submodule path.
> > + *
> > + * Unlike repo_submodule_init, this tolerates submodules not present
> > + * in .gitmodules. This function exists only to preserve historical 
> > behavior,
> > + *
> > + * Returns 0 on success, -1 when the submodule is not present.
> >   */
> > -static void show_submodule_header(struct diff_options *o, const char *path,
> > +static struct repository *open_submodule(const char *path)
>
> The function documentation needs to be reworded - there's no "out", and
> the return value is now a possibly NULL pointer to struct repository.

Noted.

>
> > +{
> > + struct strbuf sb = STRBUF_INIT;
> > + struct repository *out = xmalloc(sizeof(*out));
> > +
> > + if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
> > + strbuf_release();
> > + free(out);
> > + return NULL;
> > + }
> > +
> > + out->submodule_prefix = xstrdup(path);
>
> I've discussed this submodule_prefix line before [1] - do we really need
> this? Tests pass even if I remove this line.

We might not need it yet as the tests indicate, but it's the right thing to do:
/*
 * Path from the root of the top-level superproject down to this
 * repository.  This is only non-NULL if the repository is initialized
 * as a submodule of another repository.
 */

We're not (yet) using this string in our error reporting, but
I anticipate that we'll do eventually.

> Other than that, this patch looks good.

Thanks,
Stefan

Re: Confusing behavior with ignored submodules and `git commit -a`

2018-11-15 Thread Stefan Beller

On Wed, Nov 14, 2018 at 10:05 PM Michael Forney  wrote:
>
> +bmwill
>
> On 2018-11-14, Michael Forney  wrote:
> > On 2018-10-25, Stefan Beller  wrote:
> >> I guess reverting that commit is not a good idea now, as
> >> I would expect something to break.
> >>
> >> Maybe looking through the series 614ea03a71
> >> (Merge branch 'bw/submodule-config-cleanup', 2017-08-26)
> >> to understand why it happened in the context would be a good start.
> >
> > Thanks, that's a good idea. I'll take a look through that series.
>
> Interesting. If I build git from master after reverting 55568086, I do
> indeed observe the issue it claims to fix (unable to add ignored
> submodules). However, if I build from 9ef23f91fc (the immediate parent
> of 55568086), I do not see the issue.
>
> Investigating this further, it seems that 55568086 addresses an issue
> that does not appear until later on in the series in ff6f1f564c
> (submodule-config: lazy-load a repository's .gitmodules file). Perhaps
> this was a result of reordering commits during a rebase. In other
> words, I get correct behavior until 55568086, and in
> 55568086..ff6f1f564c^ if I revert 55568086.
>
> Looking at ff6f1f564c, I don't really see anything that might be
> related to git-add, git-reset, or git-diff, so I'm guessing that this
> only worked before because the submodule config wasn't getting loaded
> during `git add` or `git reset`. Now that the config is loaded
> automatically, submodule..ignore started taking effect where it
> shouldn't.
>
> Unfortunately, this doesn't really get me much closer to finding a fix.

Maybe selectively unloading or overwriting the config?

Or we can change is_submodule_ignored() in diff.c
to be only applied selectively whether we are running the
right command? For this approach we'd have to figure out the
set of commands to which the ignore config should apply or
not (and come up with a more concise documentation then)

This approach sounds appealing to me as it would cover
new commands as well and we'd only have a central point
where the decision for ignoring is made.

Stefan

Re: Confusing behavior with ignored submodules and `git commit -a`

2018-11-15 Thread Stefan Beller

> I have a git repository which contains a number of submodules that
> refer to external repositories. Some of these repositories need to
> patched in some way, so patches are stored alongside the submodules,
> and are applied when building. This mostly works fine, but causes
> submodules to show up as modified in `git status` and get updated with
> `git commit -a`. To resolve this, I've added `ignore = all` to
> .gitmodules for all the submodules that need patches applied. This
> way, I can explicitly `git add` the submodule when I want to update
> the base commit, but otherwise pretend that they are clean. This has
> worked pretty well for me, but less so since git 2.15 when this issue
> was introduced.

> > This is really bad. git-status and git-commit share some code,
> > and we'll populate the commit message with a status output.
> > So it seems reasonable to expect the status and the commit to match,
> > i.e. if status tells me there is no change, then commit should not record
> > the submodule update.
>
> I just checked and if I don't specify a message on the command-line,
> the status output in the message template *does* mention that `inner`
> is getting updated.

That's good.

> >> > There have been a couple occasions where I accidentally pushed local
> >> > changes to ignored submodules because of this. Since they don't show
> >> > up in the log output, it is difficult to figure out what actually has
> >> > gone wrong.
> >
> > How was it prevented before? Just by git commit -a not picking up the
> > submodule change?
>
> Yes. Previously, `git commit -a` would not pick up the change (unless
> I added it explicitly with `git add`), and `git log` would still show
> changes to ignored submodules (which is the behavior I want).

and both are broken currently (commit -a will commit a submodule if
it is changed, and it will also not show that in log, but it did show that
it is committing it in the commit message template)

> I just came across someone else affected by this issue:
> https://github.com/git/git/commit/55568086#commitcomment-27137460

Point taken.

Re: [PATCH v2 1/1] bundle: cleanup lock files on error

2018-11-14 Thread Stefan Beller

On Wed, Nov 14, 2018 at 1:43 PM Martin Ågren  wrote:
>
> On Wed, 14 Nov 2018 at 16:26, Gaël Lhez via GitGitGadget
>  wrote:
> > However, the `.lock` file was still open and on Windows that means
> > that it could not be deleted properly. This patch fixes that issue.
>
> Hmmm, doesn't the tempfile machinery remove the lock file when we die?

On Windows this seems not to be the case. (Open files cannot be deleted
as the open file is not kept by inode or similar but by the file path there?)

Rewording your concern: Could the tempfile machinery be taught to
work properly on Windows, e.g. by first closing all files and then deleting
them afterwards?

There was a refactoring of tempfiles merged in 89563ec379
(Merge branch 'jk/incore-lockfile-removal', 2017-09-19), which sounded
promising at first, as it is dated after the original patch[1] date
(June 2016), but it has no references for Windows being different,
so we might still have the original issue; most of the lockfile
infrastructure was done in 2015 via db86e61cbb (Merge branch
'mh/tempfile', 2015-08-25)

[1] https://github.com/git-for-windows/git/pull/797

[PATCH 21/23] commit: prepare free_commit_buffer and release_commit_memory for any repo

2018-11-13 Thread Stefan Beller

Pass the object pool to free_commit_buffer and release_commit_memory,
such that we can eliminate access to 'the_repository'.

Also remove the TODO in release_commit_memory, as commit->util was
removed in 9d2c97016f (commit.h: delete 'util' field in struct commit,
2018-05-19)

Signed-off-by: Stefan Beller 
---
 builtin/fsck.c | 3 ++-
 builtin/log.c  | 6 --
 builtin/rev-list.c | 3 ++-
 commit.c   | 9 -
 commit.h   | 4 ++--
 object.c   | 2 +-
 6 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 06eb421720..c476ac6983 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -382,7 +382,8 @@ static int fsck_obj(struct object *obj, void *buffer, 
unsigned long size)
if (obj->type == OBJ_TREE)
free_tree_buffer((struct tree *)obj);
if (obj->type == OBJ_COMMIT)
-   free_commit_buffer((struct commit *)obj);
+   free_commit_buffer(the_repository->parsed_objects,
+  (struct commit *)obj);
return err;
 }
 
diff --git a/builtin/log.c b/builtin/log.c
index 061d4fd864..64c2649c7c 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -395,7 +395,8 @@ static int cmd_log_walk(struct rev_info *rev)
 * We may show a given commit multiple times when
 * walking the reflogs.
 */
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
free_commit_list(commit->parents);
commit->parents = NULL;
}
@@ -1922,7 +1923,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
open_next_file(rev.numbered_files ? NULL : commit, NULL, 
, quiet))
die(_("Failed to create output files"));
shown = log_tree_commit(, commit);
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
 
/* We put one extra blank line between formatted
 * patches and this flag is used by log-tree code
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index cc1b70522f..2b301fa315 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -196,7 +196,8 @@ static void finish_commit(struct commit *commit, void *data)
free_commit_list(commit->parents);
commit->parents = NULL;
}
-   free_commit_buffer(commit);
+   free_commit_buffer(the_repository->parsed_objects,
+  commit);
 }
 
 static inline void finish_object__ma(struct object *obj)
diff --git a/commit.c b/commit.c
index 7d2f3a9a93..4fe74aa4bc 100644
--- a/commit.c
+++ b/commit.c
@@ -328,10 +328,10 @@ void repo_unuse_commit_buffer(struct repository *r,
free((void *)buffer);
 }
 
-void free_commit_buffer(struct commit *commit)
+void free_commit_buffer(struct parsed_object_pool *pool, struct commit *commit)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   pool->buffer_slab, commit);
if (v) {
FREE_AND_NULL(v->buffer);
v->size = 0;
@@ -354,13 +354,12 @@ struct object_id *get_commit_tree_oid(const struct commit 
*commit)
return _commit_tree(commit)->object.oid;
 }
 
-void release_commit_memory(struct commit *c)
+void release_commit_memory(struct parsed_object_pool *pool, struct commit *c)
 {
c->maybe_tree = NULL;
c->index = 0;
-   free_commit_buffer(c);
+   free_commit_buffer(pool, c);
free_commit_list(c->parents);
-   /* TODO: what about commit->util? */
 
c->object.parsed = 0;
 }
diff --git a/commit.h b/commit.h
index 2e6b799b26..d2779a23f6 100644
--- a/commit.h
+++ b/commit.h
@@ -140,7 +140,7 @@ void repo_unuse_commit_buffer(struct repository *r,
 /*
  * Free any cached object buffer associated with the commit.
  */
-void free_commit_buffer(struct commit *);
+void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
 
 struct tree *get_commit_tree(const struct commit *);
 struct object_id *get_commit_tree_oid(const struct commit *);
@@ -149,7 +149,7 @@ struct object_id *get_commit_tree_oid(const struct commit 
*);
  * Release memory related to a commit, including the parent list and
  * any cached object buffer.
  */
-void release_commit_memory(struct commit *c);
+void release_commit_memory(struct parsed_object_pool *pool, struct commit *c);
 
 /*
  * Disassociate any cached object buffer from the commit, but do not free it.
diff --git a/object.c b/object.c
index 003f870484..c4170d2d0c 100644
--- a/object.c
+++ b/object.c

[PATCH 18/23] submodule: use submodule repos for object lookup

2018-11-13 Thread Stefan Beller

This converts the 'show_submodule_header' function to use
the repository API properly, such that the submodule objects
are not added to the main object store.

Signed-off-by: Stefan Beller 
---
 submodule.c | 73 ++---
 1 file changed, 58 insertions(+), 15 deletions(-)

diff --git a/submodule.c b/submodule.c
index d9d3046689..262f03f118 100644
--- a/submodule.c
+++ b/submodule.c
@@ -443,7 +443,7 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
return prepare_revision_walk(rev);
 }
 
-static void print_submodule_summary(struct rev_info *rev, struct diff_options 
*o)
+static void print_submodule_summary(struct repository *r, struct rev_info 
*rev, struct diff_options *o)
 {
static const char format[] = "  %m %s";
struct strbuf sb = STRBUF_INIT;
@@ -454,7 +454,8 @@ static void print_submodule_summary(struct rev_info *rev, 
struct diff_options *o
ctx.date_mode = rev->date_mode;
ctx.output_encoding = get_log_output_encoding();
strbuf_setlen(, 0);
-   format_commit_message(commit, format, , );
+   repo_format_commit_message(r, commit, format, ,
+ );
strbuf_addch(, '\n');
if (commit->object.flags & SYMMETRIC_LEFT)
diff_emit_submodule_del(o, sb.buf);
@@ -481,14 +482,44 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
 }
 
-/* Helper function to display the submodule header line prior to the full
- * summary output. If it can locate the submodule objects directory it will
- * attempt to lookup both the left and right commits and put them into the
- * left and right pointers.
+/*
+ * Initialize 'out' based on the provided submodule path.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns 0 on success, -1 when the submodule is not present.
  */
-static void show_submodule_header(struct diff_options *o, const char *path,
+static struct repository *open_submodule(const char *path)
+{
+   struct strbuf sb = STRBUF_INIT;
+   struct repository *out = xmalloc(sizeof(*out));
+
+   if (submodule_to_gitdir(, path) || repo_init(out, sb.buf, NULL)) {
+   strbuf_release();
+   free(out);
+   return NULL;
+   }
+
+   out->submodule_prefix = xstrdup(path);
+
+   strbuf_release();
+   return out;
+}
+
+/*
+ * Helper function to display the submodule header line prior to the full
+ * summary output.
+ *
+ * If it can locate the submodule git directory it will create a repository
+ * handle for the submodule and lookup both the left and right commits and
+ * put them into the left and right pointers.
+ */
+static void show_submodule_header(struct diff_options *o,
+   const char *path,
struct object_id *one, struct object_id *two,
unsigned dirty_submodule,
+   struct repository *sub,
struct commit **left, struct commit **right,
struct commit_list **merge_bases)
 {
@@ -507,7 +538,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
else if (is_null_oid(two))
message = "(submodule deleted)";
 
-   if (add_submodule_odb(path)) {
+   if (!sub) {
if (!message)
message = "(commits not present)";
goto output_header;
@@ -517,8 +548,8 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 * Attempt to lookup the commit references, and determine if this is
 * a fast forward or fast backwards update.
 */
-   *left = lookup_commit_reference(the_repository, one);
-   *right = lookup_commit_reference(the_repository, two);
+   *left = lookup_commit_reference(sub, one);
+   *right = lookup_commit_reference(sub, two);
 
/*
 * Warn about missing commits in the submodule project, but only if
@@ -528,7 +559,7 @@ static void show_submodule_header(struct diff_options *o, 
const char *path,
 (!is_null_oid(two) && !*right))
message = "(commits not present)";
 
-   *merge_bases = get_merge_bases(*left, *right);
+   *merge_bases = repo_get_merge_bases(sub, *left, *right);
if (*merge_bases) {
if ((*merge_bases)->item == *left)
fast_forward = 1;
@@ -562,16 +593,18 @@ void show_submodule_summary(struct diff_options *o, const 
char *path,
struct rev_info rev;
struct commit *left = NULL, *right = NULL;
struct commit_list *merge_bases = NULL;
+   struct repository *sub;
 
+   sub = open_

[PATCH 22/23] path.h: make REPO_GIT_PATH_FUNC repository agnostic

2018-11-13 Thread Stefan Beller

git_pathdup uses the_repository internally, but the macro
REPO_GIT_PATH_FUNC is specifically made for arbitrary repositories.
Switch to repo_git_path which works on arbitrary repositories.

Signed-off-by: Stefan Beller 
---
 path.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/path.h b/path.h
index b654ea8ff5..651e6157fc 100644
--- a/path.h
+++ b/path.h
@@ -165,7 +165,7 @@ extern void report_linked_checkout_garbage(void);
const char *git_path_##var(struct repository *r) \
{ \
if (!r->cached_paths.var) \
-   r->cached_paths.var = git_pathdup(filename); \
+   r->cached_paths.var = repo_git_path(r, filename); \
return r->cached_paths.var; \
}
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 20/23] commit-graph: convert remaining functions to handle any repo

2018-11-13 Thread Stefan Beller

Convert all functions to handle arbitrary repositories in commit-graph.c
that are used by functions taking a repository argument already.

Notable exclusion is write_commit_graph and its local functions as that
only works on the_repository.

Signed-off-by: Stefan Beller 
---
 commit-graph.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..f78a8e96b5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -292,7 +292,8 @@ static int bsearch_graph(struct commit_graph *g, struct 
object_id *oid, uint32_t
g->chunk_oid_lookup, g->hash_len, pos);
 }
 
-static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+static struct commit_list **insert_parent_or_die(struct repository *r,
+struct commit_graph *g,
 uint64_t pos,
 struct commit_list **pptr)
 {
@@ -303,7 +304,7 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
die("invalid parent position %"PRIu64, pos);
 
hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
-   c = lookup_commit(the_repository, );
+   c = lookup_commit(r, );
if (!c)
die(_("could not find commit %s"), oid_to_hex());
c->graph_pos = pos;
@@ -317,7 +318,9 @@ static void fill_commit_graph_info(struct commit *item, 
struct commit_graph *g,
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
-static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
+static int fill_commit_in_graph(struct repository *r,
+   struct commit *item,
+   struct commit_graph *g, uint32_t pos)
 {
uint32_t edge_value;
uint32_t *parent_data_ptr;
@@ -341,13 +344,13 @@ static int fill_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
edge_value = get_be32(commit_data + g->hash_len);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
-   pptr = insert_parent_or_die(g, edge_value, pptr);
+   pptr = insert_parent_or_die(r, g, edge_value, pptr);
 
edge_value = get_be32(commit_data + g->hash_len + 4);
if (edge_value == GRAPH_PARENT_NONE)
return 1;
if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
-   pptr = insert_parent_or_die(g, edge_value, pptr);
+   pptr = insert_parent_or_die(r, g, edge_value, pptr);
return 1;
}
 
@@ -355,7 +358,7 @@ static int fill_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
do {
edge_value = get_be32(parent_data_ptr);
-   pptr = insert_parent_or_die(g,
+   pptr = insert_parent_or_die(r, g,
edge_value & GRAPH_EDGE_LAST_MASK,
pptr);
parent_data_ptr++;
@@ -374,7 +377,9 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
}
 }
 
-static int parse_commit_in_graph_one(struct commit_graph *g, struct commit 
*item)
+static int parse_commit_in_graph_one(struct repository *r,
+struct commit_graph *g,
+struct commit *item)
 {
uint32_t pos;
 
@@ -382,7 +387,7 @@ static int parse_commit_in_graph_one(struct commit_graph 
*g, struct commit *item
return 1;
 
if (find_commit_in_graph(item, g, ))
-   return fill_commit_in_graph(item, g, pos);
+   return fill_commit_in_graph(r, item, g, pos);
 
return 0;
 }
@@ -391,7 +396,7 @@ int parse_commit_in_graph(struct repository *r, struct 
commit *item)
 {
if (!prepare_commit_graph(r))
return 0;
-   return parse_commit_in_graph_one(r->objects->commit_graph, item);
+   return parse_commit_in_graph_one(r, r->objects->commit_graph, item);
 }
 
 void load_commit_graph_info(struct repository *r, struct commit *item)
@@ -403,19 +408,22 @@ void load_commit_graph_info(struct repository *r, struct 
commit *item)
fill_commit_graph_info(item, r->objects->commit_graph, pos);
 }
 
-static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit 
*c)
+static struct tree *load_tree_for_commit(struct repository *r,
+struct commit_graph *g,
+struct commit *c)
 {
struct object_id oid;
const unsigned char *commit_data = g->chunk_commit_data +

[PATCH 23/23] t/helper/test-repository: celebrate independence from the_repository

2018-11-13 Thread Stefan Beller

dade47c06c (commit-graph: add repo arg to graph readers, 2018-07-11)
brought more independence from the_repository to the commit graph, however
it was not completely independent of the_repository, as the previous
patches show.

To ensure we're not accessing the_repository by accident, we'd ideally
assign NULL to the_repository to trigger a segfault on access.

We currently have a temporary hack in cache.h, which relies on
the_hash_algo (which is a short form of the_repository->hash_algo) to
be set, so we cannot do that. The next best thing is to set all fields of
the_repository to 0, so any accidental access is more likely to be found.

Signed-off-by: Stefan Beller 
---
 cache.h|  2 ++
 t/helper/test-repository.c | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/cache.h b/cache.h
index 59c8a93046..8864d7ec15 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,6 +1033,8 @@ static inline int hashcmp(const unsigned char *sha1, 
const unsigned char *sha2)
 *
 * This will need to be extended or ripped out when we learn about
 * hashes of different sizes.
+*
+* When ripping this out, see TODO in test-repository.c.
 */
if (the_hash_algo->rawsz != 20)
BUG("hash size not yet supported by hashcmp");
diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index 6a84a53efb..f7f8618445 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -17,6 +17,11 @@ static void test_parse_commit_in_graph(const char *gitdir, 
const char *worktree,
 
setup_git_env(gitdir);
 
+   memset(the_repository, 0, sizeof(*the_repository));
+
+   /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (repo_init(, gitdir, worktree))
die("Couldn't init repo");
 
@@ -43,6 +48,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
 
setup_git_env(gitdir);
 
+   memset(the_repository, 0, sizeof(*the_repository));
+
+   /* TODO: Needed for temporary hack in hashcmp, see 183a638b7da. */
+   repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
if (repo_init(, gitdir, worktree))
die("Couldn't init repo");
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 02/23] packfile: allow has_packed_and_bad to handle arbitrary repositories

2018-11-13 Thread Stefan Beller

has_packed_and_bad is not widely used, so just migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 packfile.c  | 5 +++--
 packfile.h  | 2 +-
 sha1-file.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index 841b36182f..bc2e0f5043 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1131,12 +1131,13 @@ void mark_bad_packed_object(struct packed_git *p, const 
unsigned char *sha1)
p->num_bad_objects++;
 }
 
-const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+const struct packed_git *has_packed_and_bad(struct repository *r,
+   const unsigned char *sha1)
 {
struct packed_git *p;
unsigned i;
 
-   for (p = the_repository->objects->packed_git; p; p = p->next)
+   for (p = r->objects->packed_git; p; p = p->next)
for (i = 0; i < p->num_bad_objects; i++)
if (hasheq(sha1,
   p->bad_object_sha1 + the_hash_algo->rawsz * 
i))
diff --git a/packfile.h b/packfile.h
index 442625723d..7a62d72231 100644
--- a/packfile.h
+++ b/packfile.h
@@ -146,7 +146,7 @@ extern int packed_object_info(struct repository *r,
  off_t offset, struct object_info *);
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
-extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+extern const struct packed_git *has_packed_and_bad(struct repository *r, const 
unsigned char *sha1);
 
 /*
  * Iff a pack file in the given repository contains the object named by sha1,
diff --git a/sha1-file.c b/sha1-file.c
index b8ce21cbaf..856e000ee1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1432,7 +1432,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 17/23] pretty: prepare format_commit_message to handle arbitrary repositories

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 contrib/coccinelle/the_repository.pending.cocci | 10 ++
 pretty.c| 15 ---
 pretty.h|  7 ++-
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index f5b42cfc62..2ee702ecf7 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -132,3 +132,13 @@ expression G;
 - logmsg_reencode(
 + repo_logmsg_reencode(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+expression G;
+expression H;
+@@
+- format_commit_message(
++ repo_format_commit_message(the_repository,
+  E, F, G, H);
diff --git a/pretty.c b/pretty.c
index b359b68750..3240495308 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1508,9 +1508,10 @@ void userformat_find_requirements(const char *fmt, 
struct userformat_want *w)
strbuf_release();
 }
 
-void format_commit_message(const struct commit *commit,
-  const char *format, struct strbuf *sb,
-  const struct pretty_print_context *pretty_ctx)
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
+   const char *format, struct strbuf *sb,
+   const struct pretty_print_context *pretty_ctx)
 {
struct format_commit_context context;
const char *output_enc = pretty_ctx->output_encoding;
@@ -1524,9 +1525,9 @@ void format_commit_message(const struct commit *commit,
 * convert a commit message to UTF-8 first
 * as far as 'format_commit_item' assumes it in UTF-8
 */
-   context.message = logmsg_reencode(commit,
- _encoding,
- utf8);
+   context.message = repo_logmsg_reencode(r, commit,
+  _encoding,
+  utf8);
 
strbuf_expand(sb, format, format_commit_item, );
rewrap_message_tail(sb, , 0, 0, 0);
@@ -1550,7 +1551,7 @@ void format_commit_message(const struct commit *commit,
}
 
free(context.commit_encoding);
-   unuse_commit_buffer(commit, context.message);
+   repo_unuse_commit_buffer(r, commit, context.message);
 }
 
 static void pp_header(struct pretty_print_context *pp,
diff --git a/pretty.h b/pretty.h
index 7359d318a9..e6625269cf 100644
--- a/pretty.h
+++ b/pretty.h
@@ -103,9 +103,14 @@ void pp_remainder(struct pretty_print_context *pp, const 
char **msg_p,
  * Put the result to "sb".
  * Please use this function for custom formats.
  */
-void format_commit_message(const struct commit *commit,
+void repo_format_commit_message(struct repository *r,
+   const struct commit *commit,
const char *format, struct strbuf *sb,
const struct pretty_print_context *context);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define format_commit_message(c, f, s, con) \
+   repo_format_commit_message(the_repository, c, f, s, con)
+#endif
 
 /*
  * Parse given arguments from "arg", check it for correctness and
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 16/23] commit: prepare logmsg_reencode to handle arbitrary repositories

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 commit.h|  8 
 contrib/coccinelle/the_repository.pending.cocci |  9 +
 pretty.c| 13 +++--
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/commit.h b/commit.h
index 57375e3239..2e6b799b26 100644
--- a/commit.h
+++ b/commit.h
@@ -180,6 +180,14 @@ extern int has_non_ascii(const char *text);
 extern const char *logmsg_reencode(const struct commit *commit,
   char **commit_encoding,
   const char *output_encoding);
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, 
enc, out)
+#endif
+
 extern const char *skip_blank_lines(const char *msg);
 
 /** Removes the first commit from a list sorted by date, and adds all
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 516f19ffee..f5b42cfc62 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -123,3 +123,12 @@ expression F;
 - unuse_commit_buffer(
 + repo_unuse_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- logmsg_reencode(
++ repo_logmsg_reencode(the_repository,
+  E, F, G);
diff --git a/pretty.c b/pretty.c
index 8ca29e9281..b359b68750 100644
--- a/pretty.c
+++ b/pretty.c
@@ -595,14 +595,15 @@ static char *replace_encoding_header(char *buf, const 
char *encoding)
return strbuf_detach(, NULL);
 }
 
-const char *logmsg_reencode(const struct commit *commit,
-   char **commit_encoding,
-   const char *output_encoding)
+const char *repo_logmsg_reencode(struct repository *r,
+const struct commit *commit,
+char **commit_encoding,
+const char *output_encoding)
 {
static const char *utf8 = "UTF-8";
const char *use_encoding;
char *encoding;
-   const char *msg = get_commit_buffer(commit, NULL);
+   const char *msg = repo_get_commit_buffer(r, commit, NULL);
char *out;
 
if (!output_encoding || !*output_encoding) {
@@ -630,7 +631,7 @@ const char *logmsg_reencode(const struct commit *commit,
 * the cached copy from get_commit_buffer, we need to duplicate 
it
 * to avoid munging the cached copy.
 */
-   if (msg == get_cached_commit_buffer(the_repository, commit, 
NULL))
+   if (msg == get_cached_commit_buffer(r, commit, NULL))
out = xstrdup(msg);
else
out = (char *)msg;
@@ -644,7 +645,7 @@ const char *logmsg_reencode(const struct commit *commit,
 */
out = reencode_string(msg, output_encoding, use_encoding);
if (out)
-   unuse_commit_buffer(commit, msg);
+   repo_unuse_commit_buffer(r, commit, msg);
}
 
/*
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 14/23] commit: prepare get_commit_buffer to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit.c| 8 +---
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.pending.cocci | 8 
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 7a931d7fd4..4034def16c 100644
--- a/commit.c
+++ b/commit.c
@@ -297,13 +297,15 @@ const void *get_cached_commit_buffer(struct repository 
*r, const struct commit *
return v->buffer;
 }
 
-const void *get_commit_buffer(const struct commit *commit, unsigned long 
*sizep)
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *commit,
+  unsigned long *sizep)
 {
-   const void *ret = get_cached_commit_buffer(the_repository, commit, 
sizep);
+   const void *ret = get_cached_commit_buffer(r, commit, sizep);
if (!ret) {
enum object_type type;
unsigned long size;
-   ret = read_object_file(>object.oid, , );
+   ret = repo_read_object_file(r, >object.oid, , 
);
if (!ret)
die("cannot read commit object %s",
oid_to_hex(>object.oid));
diff --git a/commit.h b/commit.h
index 08935f9a19..591a77a5bb 100644
--- a/commit.h
+++ b/commit.h
@@ -117,7 +117,12 @@ const void *get_cached_commit_buffer(struct repository *, 
const struct commit *,
  * from disk. The resulting memory should not be modified, and must be given
  * to unuse_commit_buffer when the caller is done.
  */
-const void *get_commit_buffer(const struct commit *, unsigned long *size);
+const void *repo_get_commit_buffer(struct repository *r,
+  const struct commit *,
+  unsigned long *size);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s)
+#endif
 
 /*
  * Tell the commit subsytem that we are done with a particular commit buffer.
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 8c6a71bf64..4018e6eaf7 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -107,3 +107,11 @@ expression G;
 - in_merge_bases_many(
 + repo_in_merge_bases_many(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+@@
+- get_commit_buffer(
++ repo_get_commit_buffer(the_repository,
+  E, F);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 07/23] commit: allow parse_commit* to handle any repo

2018-11-13 Thread Stefan Beller

Just like the previous commit, parse_commit and friends are used a lot
and are found in new patches, so we cannot change their signature easily.

Re-introduce these function prefixed with 'repo_' that take a repository
argument and keep the original as a shallow macro.

Signed-off-by: Stefan Beller 
---
 commit.c  | 18 --
 commit.h  | 17 +
 .../coccinelle/the_repository.pending.cocci   | 24 +++
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/commit.c b/commit.c
index dc8a39d52a..7a931d7fd4 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,10 @@ int parse_commit_buffer(struct repository *r, struct 
commit *item, const void *b
return 0;
 }
 
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
+int repo_parse_commit_internal(struct repository *r,
+  struct commit *item,
+  int quiet_on_missing,
+  int use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -454,9 +457,9 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
return -1;
if (item->object.parsed)
return 0;
-   if (use_commit_graph && parse_commit_in_graph(the_repository, item))
+   if (use_commit_graph && parse_commit_in_graph(r, item))
return 0;
-   buffer = read_object_file(>object.oid, , );
+   buffer = repo_read_object_file(r, >object.oid, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
@@ -467,18 +470,19 @@ int parse_commit_internal(struct commit *item, int 
quiet_on_missing, int use_com
 oid_to_hex(>object.oid));
}
 
-   ret = parse_commit_buffer(the_repository, item, buffer, size, 0);
+   ret = parse_commit_buffer(r, item, buffer, size, 0);
if (save_commit_buffer && !ret) {
-   set_commit_buffer(the_repository, item, buffer, size);
+   set_commit_buffer(r, item, buffer, size);
return 0;
}
free(buffer);
return ret;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item, int quiet_on_missing)
 {
-   return parse_commit_internal(item, quiet_on_missing, 1);
+   return repo_parse_commit_internal(r, item, quiet_on_missing, 1);
 }
 
 void parse_commit_or_die(struct commit *item)
diff --git a/commit.h b/commit.h
index 1d260d62f5..08935f9a19 100644
--- a/commit.h
+++ b/commit.h
@@ -79,12 +79,21 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct repository *r, struct commit *item, const void 
*buffer, unsigned long size, int check_graph);
-int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
-int parse_commit_gently(struct commit *item, int quiet_on_missing);
-static inline int parse_commit(struct commit *item)
+int repo_parse_commit_internal(struct repository *r, struct commit *item,
+  int quiet_on_missing, int use_commit_graph);
+int repo_parse_commit_gently(struct repository *r,
+struct commit *item,
+int quiet_on_missing);
+static inline int repo_parse_commit(struct repository *r, struct commit *item)
 {
-   return parse_commit_gently(item, 0);
+   return repo_parse_commit_gently(r, item, 0);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define parse_commit_internal(item, quiet, use) 
repo_parse_commit_internal(the_repository, item, quiet, use)
+#define parse_commit_gently(item, quiet) 
repo_parse_commit_gently(the_repository, item, quiet)
+#define parse_commit(item) repo_parse_commit(the_repository, item)
+#endif
+
 void parse_commit_or_die(struct commit *item);
 
 struct buffer_slab;
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 46f3a1b23a..b185fe0a1d 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -40,3 +40,27 @@ expression F;
 - has_object_file_with_flags(
 + repo_has_object_file_with_flags(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- parse_commit_internal(
++ repo_parse_commit_internal(the_repository,
+  E, F, G)
+
+@@
+expression E;
+expression F;
+@@
+- parse_commit_gently(
++ repo_parse_commit_gently(the_repository,
+  E, F)
+
+@@
+expression E;
+@@
+- parse_commit(
++ repo_parse_commit(the_repository,
+  E)
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 19/23] submodule: don't add submodule as odb for push

2018-11-13 Thread Stefan Beller

In push_submodule(), because we do not actually need access to objects
in the submodule, do not invoke add_submodule_odb().
(for_each_remote_ref_submodule() does not require access to those
objects, and the actual push is done by spawning another process,
which handles object access by itself.)

This code of push_submodule() is exercised in t5531 and continues
to work, showing that the submodule odbc is not needed.

Signed-off-by: Stefan Beller 
---
 submodule.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 262f03f118..5818088df2 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1021,9 +1021,6 @@ static int push_submodule(const char *path,
  const struct string_list *push_options,
  int dry_run)
 {
-   if (add_submodule_odb(path))
-   return 1;
-
if (for_each_remote_ref_submodule(path, has_remote, NULL) > 0) {
struct child_process cp = CHILD_PROCESS_INIT;
argv_array_push(, "push");
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 08/23] commit-reach.c: allow paint_down_to_common to handle any repo

2018-11-13 Thread Stefan Beller

As the function is file local and not widely used, migrate it all at once.

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 9f79ce0a22..67c2e43d5e 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -30,7 +30,8 @@ static int queue_has_nonstale(struct prio_queue *queue)
 }
 
 /* all input commits in one and twos[] must have been parsed! */
-static struct commit_list *paint_down_to_common(struct commit *one, int n,
+static struct commit_list *paint_down_to_common(struct repository *r,
+   struct commit *one, int n,
struct commit **twos,
int min_generation)
 {
@@ -83,7 +84,7 @@ static struct commit_list *paint_down_to_common(struct commit 
*one, int n,
parents = parents->next;
if ((p->object.flags & flags) == flags)
continue;
-   if (parse_commit(p))
+   if (repo_parse_commit(r, p))
return NULL;
p->object.flags |= flags;
prio_queue_put(, p);
@@ -116,7 +117,7 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return NULL;
}
 
-   list = paint_down_to_common(one, n, twos, 0);
+   list = paint_down_to_common(the_repository, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -187,8 +188,8 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(array[i], filled, work,
- min_generation);
+   common = paint_down_to_common(the_repository, array[i], filled,
+ work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
for (j = 0; j < filled; j++)
@@ -322,7 +323,9 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(commit, nr_reference, reference, 
commit->generation);
+   bases = paint_down_to_common(the_repository, commit,
+nr_reference, reference,
+commit->generation);
if (commit->object.flags & PARENT2)
ret = 1;
clear_commit_marks(commit, all_flags);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 05/23] object-store: prepare has_{sha1, object}_file to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 .../coccinelle/the_repository.pending.cocci   | 30 +++
 object-store.h| 22 ++
 sha1-file.c   | 15 ++
 3 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index a7ac9e0c46..46f3a1b23a 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -10,3 +10,33 @@ expression G;
 - read_object_file(
 + repo_read_object_file(the_repository,
   E, F, G)
+
+@@
+expression E;
+@@
+- has_sha1_file(
++ repo_has_sha1_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_sha1_file_with_flags(
++ repo_has_sha1_file_with_flags(the_repository,
+  E)
+
+@@
+expression E;
+@@
+- has_object_file(
++ repo_has_object_file(the_repository,
+  E)
+
+@@
+expression E;
+expression F;
+@@
+- has_object_file_with_flags(
++ repo_has_object_file_with_flags(the_repository,
+  E)
diff --git a/object-store.h b/object-store.h
index 00a64622e6..2b5e6ff1ed 100644
--- a/object-store.h
+++ b/object-store.h
@@ -212,15 +212,27 @@ int read_loose_object(const char *path,
  * object_info. OBJECT_INFO_SKIP_CACHED is automatically set; pass
  * nonzero flags to also set other flags.
  */
-extern int has_sha1_file_with_flags(const unsigned char *sha1, int flags);
-static inline int has_sha1_file(const unsigned char *sha1)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags);
+static inline int repo_has_sha1_file(struct repository *r,
+const unsigned char *sha1)
 {
-   return has_sha1_file_with_flags(sha1, 0);
+   return repo_has_sha1_file_with_flags(r, sha1, 0);
 }
 
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_sha1_file_with_flags(sha1, flags) 
repo_has_sha1_file_with_flags(the_repository, sha1, flags)
+#define has_sha1_file(sha1) repo_has_sha1_file(the_repository, sha1)
+#endif
+
 /* Same as the above, except for struct object_id. */
-extern int has_object_file(const struct object_id *oid);
-extern int has_object_file_with_flags(const struct object_id *oid, int flags);
+int repo_has_object_file(struct repository *r, const struct object_id *oid);
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define has_object_file(oid) repo_has_object_file(the_repository, oid)
+#define has_object_file_with_flags(oid, flags) 
repo_has_object_file_with_flags(the_repository, oid, flags)
+#endif
 
 /*
  * Return true iff an alternate object database has a loose object
diff --git a/sha1-file.c b/sha1-file.c
index c5b704aec5..e77273ccfd 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1768,24 +1768,27 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
return ret;
 }
 
-int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
+int repo_has_sha1_file_with_flags(struct repository *r,
+ const unsigned char *sha1, int flags)
 {
struct object_id oid;
if (!startup_info->have_repository)
return 0;
hashcpy(oid.hash, sha1);
-   return oid_object_info_extended(the_repository, , NULL,
+   return oid_object_info_extended(r, , NULL,
flags | OBJECT_INFO_SKIP_CACHED) >= 0;
 }
 
-int has_object_file(const struct object_id *oid)
+int repo_has_object_file(struct repository *r,
+const struct object_id *oid)
 {
-   return has_sha1_file(oid->hash);
+   return repo_has_sha1_file(r, oid->hash);
 }
 
-int has_object_file_with_flags(const struct object_id *oid, int flags)
+int repo_has_object_file_with_flags(struct repository *r,
+   const struct object_id *oid, int flags)
 {
-   return has_sha1_file_with_flags(oid->hash, flags);
+   return repo_has_sha1_file_with_flags(r, oid->hash, flags);
 }
 
 static void check_tree(const void *buf, size_t size)
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 15/23] commit: prepare repo_unuse_commit_buffer to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit.c| 6 --
 commit.h| 7 ++-
 contrib/coccinelle/the_repository.pending.cocci | 8 
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index 4034def16c..7d2f3a9a93 100644
--- a/commit.c
+++ b/commit.c
@@ -318,10 +318,12 @@ const void *repo_get_commit_buffer(struct repository *r,
return ret;
 }
 
-void unuse_commit_buffer(const struct commit *commit, const void *buffer)
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *commit,
+ const void *buffer)
 {
struct commit_buffer *v = buffer_slab_peek(
-   the_repository->parsed_objects->buffer_slab, commit);
+   r->parsed_objects->buffer_slab, commit);
if (!(v && v->buffer == buffer))
free((void *)buffer);
 }
diff --git a/commit.h b/commit.h
index 591a77a5bb..57375e3239 100644
--- a/commit.h
+++ b/commit.h
@@ -130,7 +130,12 @@ const void *repo_get_commit_buffer(struct repository *r,
  * from an earlier call to get_commit_buffer.  The buffer may or may not be
  * freed by this call; callers should not access the memory afterwards.
  */
-void unuse_commit_buffer(const struct commit *, const void *buffer);
+void repo_unuse_commit_buffer(struct repository *r,
+ const struct commit *,
+ const void *buffer);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, 
b)
+#endif
 
 /*
  * Free any cached object buffer associated with the commit.
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index 4018e6eaf7..516f19ffee 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -115,3 +115,11 @@ expression F;
 - get_commit_buffer(
 + repo_get_commit_buffer(the_repository,
   E, F);
+
+@@
+expression E;
+expression F;
+@@
+- unuse_commit_buffer(
++ repo_unuse_commit_buffer(the_repository,
+  E, F);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 06/23] object: parse_object to honor its repository argument

2018-11-13 Thread Stefan Beller

In 8e4b0b6047 (object.c: allow parse_object to handle
arbitrary repositories, 2018-06-28), we forgot to pass the
repository down to the read_object_file.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index e54160550c..003f870484 100644
--- a/object.c
+++ b/object.c
@@ -259,8 +259,8 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
if (obj && obj->parsed)
return obj;
 
-   if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
-   (!obj && has_object_file(oid) &&
+   if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
+   (!obj && repo_has_object_file(r, oid) &&
 oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
if (check_object_signature(repl, NULL, 0, NULL) < 0) {
error(_("sha1 mismatch %s"), oid_to_hex(oid));
@@ -270,7 +270,7 @@ struct object *parse_object(struct repository *r, const 
struct object_id *oid)
return lookup_object(r, oid->hash);
}
 
-   buffer = read_object_file(oid, , );
+   buffer = repo_read_object_file(r, oid, , );
if (buffer) {
if (check_object_signature(repl, buffer, size, type_name(type)) 
< 0) {
free(buffer);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 12/23] commit-reach: prepare get_merge_bases to handle any repo

2018-11-13 Thread Stefan Beller

Similarly to previous patches, the get_merge_base functions are used
often in the code base, which makes migrating them hard.

Implement the new functions, prefixed with 'repo_' and hide the old
functions behind a wrapper macro.

Signed-off-by: Stefan Beller 
---
 commit-reach.c| 24 ++---
 commit-reach.h| 26 ---
 .../coccinelle/the_repository.pending.cocci   | 26 +++
 3 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index b3b1f62aba..657a4e9b5a 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -258,23 +258,27 @@ static struct commit_list *get_merge_bases_many_0(struct 
repository *r,
return result;
 }
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos)
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one,
+ int n,
+ struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
+   return get_merge_bases_many_0(r, one, n, twos, 1);
 }
 
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos)
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one,
+   int n,
+   struct commit **twos)
 {
-   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
+   return get_merge_bases_many_0(r, one, n, twos, 0);
 }
 
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *one,
+struct commit *two)
 {
-   return get_merge_bases_many_0(the_repository, one, 1, , 1);
+   return get_merge_bases_many_0(r, one, 1, , 1);
 }
 
 /*
diff --git a/commit-reach.h b/commit-reach.h
index 7d313e2975..52667d64ac 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -8,17 +8,23 @@ struct commit_list;
 struct contains_cache;
 struct ref_filter;
 
-struct commit_list *get_merge_bases_many(struct commit *one,
-int n,
-struct commit **twos);
-struct commit_list *get_merge_bases_many_dirty(struct commit *one,
-  int n,
-  struct commit **twos);
-struct commit_list *get_merge_bases(struct commit *one, struct commit *two);
-struct commit_list *get_octopus_merge_bases(struct commit_list *in);
-
+struct commit_list *repo_get_merge_bases(struct repository *r,
+struct commit *rev1,
+struct commit *rev2);
+struct commit_list *repo_get_merge_bases_many(struct repository *r,
+ struct commit *one, int n,
+ struct commit **twos);
 /* To be used only when object flags after this call no longer matter */
-struct commit_list *get_merge_bases_many_dirty(struct commit *one, int n, 
struct commit **twos);
+struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define get_merge_bases(r1, r2)   repo_get_merge_bases(the_repository, 
r1, r2)
+#define get_merge_bases_many(one, n, two) 
repo_get_merge_bases_many(the_repository, one, n, two)
+#define get_merge_bases_many_dirty(one, n, twos) 
repo_get_merge_bases_many_dirty(the_repository, one, n, twos)
+#endif
+
+struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
 int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index b185fe0a1d..f6c2915a4e 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -64,3 +64,29 @@ expression E;
 - parse_commit(
 + repo_parse_commit(the_repository,
   E)
+
+@@
+expression E;
+expression F;
+@@
+- get_merge_bases(
++ repo_get_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G

[PATCH 04/23] object-store: prepare read_object_file to deal with any repo

2018-11-13 Thread Stefan Beller

As read_object_file is a widely used function (which is also regularly used
in new code in flight between master..pu), changing its signature is painful
is hard, as other series in flight rely on the original signature. It would
burden the maintainer if we'd just change the signature.

Introduce repo_read_object_file which takes the repository argument, and
hide the original read_object_file as a macro behind
NO_THE_REPOSITORY_COMPATIBILITY_MACROS, similar to
e675765235 (diff.c: remove implicit dependency on the_index, 2018-09-21)

Add a coccinelle patch to convert existing callers, but do not apply
the resulting patch to keep the diff of this patch small.

Signed-off-by: Stefan Beller 
---
 contrib/coccinelle/the_repository.pending.cocci | 12 
 object-store.h  | 10 --
 2 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.pending.cocci

diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
new file mode 100644
index 00..a7ac9e0c46
--- /dev/null
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -0,0 +1,12 @@
+// This file is used for the ongoing refactoring of
+// bringing the index or repository struct in all of
+// our code base.
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- read_object_file(
++ repo_read_object_file(the_repository,
+  E, F, G)
diff --git a/object-store.h b/object-store.h
index 3d98a682b2..00a64622e6 100644
--- a/object-store.h
+++ b/object-store.h
@@ -165,10 +165,16 @@ extern void *read_object_file_extended(struct repository 
*r,
   const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
-static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
+static inline void *repo_read_object_file(struct repository *r,
+ const struct object_id *oid,
+ enum object_type *type,
+ unsigned long *size)
 {
-   return read_object_file_extended(the_repository, oid, type, size, 1);
+   return read_object_file_extended(r, oid, type, size, 1);
 }
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define read_object_file(oid, type, size) 
repo_read_object_file(the_repository, oid, type, size)
+#endif
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
 int oid_object_info(struct repository *r, const struct object_id *, unsigned 
long *);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 13/23] commit-reach: prepare in_merge_bases[_many] to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c  | 15 +--
 commit-reach.h  | 12 ++--
 contrib/coccinelle/the_repository.pending.cocci | 17 +
 3 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 657a4e9b5a..8715008fef 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -312,16 +312,17 @@ int is_descendant_of(struct commit *commit, struct 
commit_list *with_commit)
 /*
  * Is "commit" an ancestor of one of the "references"?
  */
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference)
+int repo_in_merge_bases_many(struct repository *r, struct commit *commit,
+int nr_reference, struct commit **reference)
 {
struct commit_list *bases;
int ret = 0, i;
uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
-   if (parse_commit(commit))
+   if (repo_parse_commit(r, commit))
return ret;
for (i = 0; i < nr_reference; i++) {
-   if (parse_commit(reference[i]))
+   if (repo_parse_commit(r, reference[i]))
return ret;
if (reference[i]->generation < min_generation)
min_generation = reference[i]->generation;
@@ -330,7 +331,7 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return ret;
 
-   bases = paint_down_to_common(the_repository, commit,
+   bases = paint_down_to_common(r, commit,
 nr_reference, reference,
 commit->generation);
if (commit->object.flags & PARENT2)
@@ -344,9 +345,11 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
 /*
  * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
  */
-int in_merge_bases(struct commit *commit, struct commit *reference)
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference)
 {
-   return in_merge_bases_many(commit, 1, );
+   return repo_in_merge_bases_many(r, commit, 1, );
 }
 
 struct commit_list *reduce_heads(struct commit_list *heads)
diff --git a/commit-reach.h b/commit-reach.h
index 52667d64ac..a0d4a29d25 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -27,8 +27,16 @@ struct commit_list *repo_get_merge_bases_many_dirty(struct 
repository *r,
 struct commit_list *get_octopus_merge_bases(struct commit_list *in);
 
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit);
-int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit 
**reference);
-int in_merge_bases(struct commit *commit, struct commit *reference);
+int repo_in_merge_bases(struct repository *r,
+   struct commit *commit,
+   struct commit *reference);
+int repo_in_merge_bases_many(struct repository *r,
+struct commit *commit,
+int nr_reference, struct commit **reference);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2)
+#define in_merge_bases_many(c1, n, cs) 
repo_in_merge_bases_many(the_repository, c1, n, cs)
+#endif
 
 /*
  * Takes a list of commits and returns a new list where those
diff --git a/contrib/coccinelle/the_repository.pending.cocci 
b/contrib/coccinelle/the_repository.pending.cocci
index f6c2915a4e..8c6a71bf64 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -90,3 +90,20 @@ expression G;
 - get_merge_bases_many_dirty(
 + repo_get_merge_bases_many_dirty(the_repository,
   E, F, G);
+
+@@
+expression E;
+expression F;
+@@
+- in_merge_bases(
++ repo_in_merge_bases(the_repository,
+  E, F);
+
+@@
+expression E;
+expression F;
+expression G;
+@@
+- in_merge_bases_many(
++ repo_in_merge_bases_many(the_repository,
+  E, F, G);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 11/23] commit-reach.c: allow get_merge_bases_many_0 to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 81015830cb..b3b1f62aba 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -216,7 +216,8 @@ static int remove_redundant(struct repository *r, struct 
commit **array, int cnt
return filled;
 }
 
-static struct commit_list *get_merge_bases_many_0(struct commit *one,
+static struct commit_list *get_merge_bases_many_0(struct repository *r,
+ struct commit *one,
  int n,
  struct commit **twos,
  int cleanup)
@@ -226,7 +227,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(the_repository, one, n, twos);
+   result = merge_bases_many(r, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
@@ -249,7 +250,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(the_repository, rslt, cnt);
+   cnt = remove_redundant(r, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -261,19 +262,19 @@ struct commit_list *get_merge_bases_many(struct commit 
*one,
 int n,
 struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 1);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 1);
 }
 
 struct commit_list *get_merge_bases_many_dirty(struct commit *one,
   int n,
   struct commit **twos)
 {
-   return get_merge_bases_many_0(one, n, twos, 0);
+   return get_merge_bases_many_0(the_repository, one, n, twos, 0);
 }
 
 struct commit_list *get_merge_bases(struct commit *one, struct commit *two)
 {
-   return get_merge_bases_many_0(one, 1, , 1);
+   return get_merge_bases_many_0(the_repository, one, 1, , 1);
 }
 
 /*
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 03/23] object-store: allow read_object_file_extended to read from any repo

2018-11-13 Thread Stefan Beller

read_object_file_extended is not widely used, so migrate it all at once.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 object-store.h |  5 +++--
 sha1-file.c| 11 ++-
 streaming.c|  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/object-store.h b/object-store.h
index 63b7605a3e..3d98a682b2 100644
--- a/object-store.h
+++ b/object-store.h
@@ -161,12 +161,13 @@ void sha1_file_name(struct repository *r, struct strbuf 
*buf, const unsigned cha
 
 void *map_sha1_file(struct repository *r, const unsigned char *sha1, unsigned 
long *size);
 
-extern void *read_object_file_extended(const struct object_id *oid,
+extern void *read_object_file_extended(struct repository *r,
+  const struct object_id *oid,
   enum object_type *type,
   unsigned long *size, int lookup_replace);
 static inline void *read_object_file(const struct object_id *oid, enum 
object_type *type, unsigned long *size)
 {
-   return read_object_file_extended(oid, type, size, 1);
+   return read_object_file_extended(the_repository, oid, type, size, 1);
 }
 
 /* Read and unpack an object file into memory, write memory to an object file 
*/
diff --git a/sha1-file.c b/sha1-file.c
index 856e000ee1..c5b704aec5 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1403,7 +1403,8 @@ int pretend_object_file(void *buf, unsigned long len, 
enum object_type type,
  * deal with them should arrange to call read_object() and give error
  * messages themselves.
  */
-void *read_object_file_extended(const struct object_id *oid,
+void *read_object_file_extended(struct repository *r,
+   const struct object_id *oid,
enum object_type *type,
unsigned long *size,
int lookup_replace)
@@ -1413,10 +1414,10 @@ void *read_object_file_extended(const struct object_id 
*oid,
const char *path;
struct stat st;
const struct object_id *repl = lookup_replace ?
-   lookup_replace_object(the_repository, oid) : oid;
+   lookup_replace_object(r, oid) : oid;
 
errno = 0;
-   data = read_object(the_repository, repl->hash, type, size);
+   data = read_object(r, repl->hash, type, size);
if (data)
return data;
 
@@ -1428,11 +1429,11 @@ void *read_object_file_extended(const struct object_id 
*oid,
die(_("replacement %s not found for %s"),
oid_to_hex(repl), oid_to_hex(oid));
 
-   if (!stat_sha1_file(the_repository, repl->hash, , ))
+   if (!stat_sha1_file(r, repl->hash, , ))
die(_("loose object %s (stored in %s) is corrupt"),
oid_to_hex(repl), path);
 
-   if ((p = has_packed_and_bad(the_repository, repl->hash)) != NULL)
+   if ((p = has_packed_and_bad(r, repl->hash)) != NULL)
die(_("packed object %s (stored in %s) is corrupt"),
oid_to_hex(repl), p->pack_name);
 
diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..c843a1230f 100644
--- a/streaming.c
+++ b/streaming.c
@@ -490,7 +490,7 @@ static struct stream_vtbl incore_vtbl = {
 
 static open_method_decl(incore)
 {
-   st->u.incore.buf = read_object_file_extended(oid, type, >size, 0);
+   st->u.incore.buf = read_object_file_extended(the_repository, oid, type, 
>size, 0);
st->u.incore.read_ptr = 0;
st->vtbl = _vtbl;
 
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 09/23] commit-reach.c: allow merge_bases_many to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 67c2e43d5e..a53b31e6a2 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -95,7 +95,9 @@ static struct commit_list *paint_down_to_common(struct 
repository *r,
return result;
 }
 
-static struct commit_list *merge_bases_many(struct commit *one, int n, struct 
commit **twos)
+static struct commit_list *merge_bases_many(struct repository *r,
+   struct commit *one, int n,
+   struct commit **twos)
 {
struct commit_list *list = NULL;
struct commit_list *result = NULL;
@@ -110,14 +112,14 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return commit_list_insert(one, );
}
 
-   if (parse_commit(one))
+   if (repo_parse_commit(r, one))
return NULL;
for (i = 0; i < n; i++) {
-   if (parse_commit(twos[i]))
+   if (repo_parse_commit(r, twos[i]))
return NULL;
}
 
-   list = paint_down_to_common(the_repository, one, n, twos, 0);
+   list = paint_down_to_common(r, one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -224,7 +226,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
struct commit_list *result;
int cnt, i;
 
-   result = merge_bases_many(one, n, twos);
+   result = merge_bases_many(the_repository, one, n, twos);
for (i = 0; i < n; i++) {
if (one == twos[i])
return result;
-- 
2.19.1.1215.g8438c0b245-goog

[PATCH 10/23] commit-reach.c: allow remove_redundant to handle any repo

2018-11-13 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 commit-reach.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index a53b31e6a2..81015830cb 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -156,7 +156,7 @@ struct commit_list *get_octopus_merge_bases(struct 
commit_list *in)
return ret;
 }
 
-static int remove_redundant(struct commit **array, int cnt)
+static int remove_redundant(struct repository *r, struct commit **array, int 
cnt)
 {
/*
 * Some commit in the array may be an ancestor of
@@ -174,7 +174,7 @@ static int remove_redundant(struct commit **array, int cnt)
ALLOC_ARRAY(filled_index, cnt - 1);
 
for (i = 0; i < cnt; i++)
-   parse_commit(array[i]);
+   repo_parse_commit(r, array[i]);
for (i = 0; i < cnt; i++) {
struct commit_list *common;
uint32_t min_generation = array[i]->generation;
@@ -190,7 +190,7 @@ static int remove_redundant(struct commit **array, int cnt)
if (array[j]->generation < min_generation)
min_generation = array[j]->generation;
}
-   common = paint_down_to_common(the_repository, array[i], filled,
+   common = paint_down_to_common(r, array[i], filled,
  work, min_generation);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
@@ -249,7 +249,7 @@ static struct commit_list *get_merge_bases_many_0(struct 
commit *one,
clear_commit_marks(one, all_flags);
clear_commit_marks_many(n, twos, all_flags);
 
-   cnt = remove_redundant(rslt, cnt);
+   cnt = remove_redundant(the_repository, rslt, cnt);
result = NULL;
for (i = 0; i < cnt; i++)
commit_list_insert_by_date(rslt[i], );
@@ -370,7 +370,7 @@ struct commit_list *reduce_heads(struct commit_list *heads)
p->item->object.flags &= ~STALE;
}
}
-   num_head = remove_redundant(array, num_head);
+   num_head = remove_redundant(the_repository, array, num_head);
for (i = 0; i < num_head; i++)
tail = _list_insert(array[i], tail)->next;
free(array);
-- 
2.19.1.1215.g8438c0b245-goog

[PATCHv3 00/23] Bring more repository handles into our code base

2018-11-13 Thread Stefan Beller

This resends origin/sb/more-repo-in-api.

Unlike the previous resend (v2), this is not rebased to a newer base.
This doesn't contain the patch for pending semantic changes, as that
seems to go in its own topic branch.

Please have a look at the last 4 patches specifically as they were new in
the last iteration (but did not receive any comment), as they demonstrate
and fix a problem that is only exposed when using GIT_TEST_COMMIT_GRAPH=1
for the test suite.

Previous discussion at
https://public-inbox.org/git/20181030220817.61691-1-sbel...@google.com/

Thanks,
Stefan

Stefan Beller (23):
  sha1_file: allow read_object to read objects in arbitrary repositories
  packfile: allow has_packed_and_bad to handle arbitrary repositories
  object-store: allow read_object_file_extended to read from any repo
  object-store: prepare read_object_file to deal with any repo
  object-store: prepare has_{sha1, object}_file to handle any repo
  object: parse_object to honor its repository argument
  commit: allow parse_commit* to handle any repo
  commit-reach.c: allow paint_down_to_common to handle any repo
  commit-reach.c: allow merge_bases_many to handle any repo
  commit-reach.c: allow remove_redundant to handle any repo
  commit-reach.c: allow get_merge_bases_many_0 to handle any repo
  commit-reach: prepare get_merge_bases to handle any repo
  commit-reach: prepare in_merge_bases[_many] to handle any repo
  commit: prepare get_commit_buffer to handle any repo
  commit: prepare repo_unuse_commit_buffer to handle any repo
  commit: prepare logmsg_reencode to handle arbitrary repositories
  pretty: prepare format_commit_message to handle arbitrary repositories
  submodule: use submodule repos for object lookup
  submodule: don't add submodule as odb for push
  commit-graph: convert remaining functions to handle any repo
  commit: prepare free_commit_buffer and release_commit_memory for any
repo
  path.h: make REPO_GIT_PATH_FUNC repository agnostic
  t/helper/test-repository: celebrate independence from the_repository

 builtin/fsck.c|   3 +-
 builtin/log.c |   6 +-
 builtin/rev-list.c|   3 +-
 cache.h   |   2 +
 commit-graph.c|  40 +++--
 commit-reach.c|  73 +
 commit-reach.h|  38 +++--
 commit.c  |  41 ++---
 commit.h  |  43 +-
 .../coccinelle/the_repository.pending.cocci   | 144 ++
 object-store.h|  35 -
 object.c  |   8 +-
 packfile.c|   5 +-
 packfile.h|   2 +-
 path.h|   2 +-
 pretty.c  |  28 ++--
 pretty.h  |   7 +-
 sha1-file.c   |  34 +++--
 streaming.c   |   2 +-
 submodule.c   |  76 ++---
 t/helper/test-repository.c|  10 ++
 21 files changed, 452 insertions(+), 150 deletions(-)
 create mode 100644 contrib/coccinelle/the_repository.pending.cocci

Range-diff:
 1:  1b9b5c695e =  1:  ca9fece80e sha1_file: allow read_object to read objects 
in arbitrary repositories
 2:  33b94066f2 =  2:  8eac25fe32 packfile: allow has_packed_and_bad to handle 
arbitrary repositories
 3:  5217b6b1e1 !  3:  06e5f83b66 object-store: allow read_object_file_extended 
to read from arbitrary repositories
@@ -1,6 +1,6 @@
 Author: Stefan Beller 
 
-object-store: allow read_object_file_extended to read from arbitrary 
repositories
+object-store: allow read_object_file_extended to read from any repo
 
 read_object_file_extended is not widely used, so migrate it all at 
once.
 
 4:  2b7239b55b !  4:  6167722608 object-store: prepare read_object_file to 
deal with arbitrary repositories
@@ -1,6 +1,6 @@
 Author: Stefan Beller 
 
-object-store: prepare read_object_file to deal with arbitrary 
repositories
+object-store: prepare read_object_file to deal with any repo
 
 As read_object_file is a widely used function (which is also regularly 
used
 in new code in flight between master..pu), changing its signature is 
painful
@@ -13,16 +13,14 @@
 e675765235 (diff.c: remove implicit dependency on the_index, 
2018-09-21)
 
 Add a coccinelle patch to convert existing callers, but do not apply
-the resulting patch from 'make coccicheck' to keep the diff of this
-patch small.
+the resulting patch to keep the diff of this patch small.
 
 Signed-off-by: Stefan Beller 
-Signed-off-by: Junio C Hamano

[PATCH 01/23] sha1_file: allow read_object to read objects in arbitrary repositories

2018-11-13 Thread Stefan Beller

Allow read_object (a file local functon in sha1_file) to
handle arbitrary repositories by passing the repository down
to oid_object_info_extended.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 sha1-file.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..b8ce21cbaf 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1361,7 +1361,9 @@ int oid_object_info(struct repository *r,
return type;
 }
 
-static void *read_object(const unsigned char *sha1, enum object_type *type,
+static void *read_object(struct repository *r,
+const unsigned char *sha1,
+enum object_type *type,
 unsigned long *size)
 {
struct object_id oid;
@@ -1373,7 +1375,7 @@ static void *read_object(const unsigned char *sha1, enum 
object_type *type,
 
hashcpy(oid.hash, sha1);
 
-   if (oid_object_info_extended(the_repository, , , 0) < 0)
+   if (oid_object_info_extended(r, , , 0) < 0)
return NULL;
return content;
 }
@@ -1414,7 +1416,7 @@ void *read_object_file_extended(const struct object_id 
*oid,
lookup_replace_object(the_repository, oid) : oid;
 
errno = 0;
-   data = read_object(repl->hash, type, size);
+   data = read_object(the_repository, repl->hash, type, size);
if (data)
return data;
 
@@ -1755,7 +1757,7 @@ int force_object_loose(const struct object_id *oid, 
time_t mtime)
 
if (has_loose_object(oid))
return 0;
-   buf = read_object(oid->hash, , );
+   buf = read_object(the_repository, oid->hash, , );
if (!buf)
return error(_("cannot read sha1_file for %s"), 
oid_to_hex(oid));
hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", type_name(type), len) + 
1;
-- 
2.19.1.1215.g8438c0b245-goog

Re: rebase-in-C stability for 2.20

2018-11-13 Thread Stefan Beller

> But maybe I'm being overly paranoid. What do those more familiar with
> this think?

I am not too worried,
* as rebase is a main porcelain, that is even hard to use in a script.
  so any failures are not deep down in some automation,
  but when found exposed quickly (and hopefully reported).
* 5541bd5b8f was merged to next a month ago; internally we
   distribute the next branch to Googlers (on a weekly basis)
   and we have not had any bug reports regarding rebase.
   (Maybe our environment is too strict for the wide range
of bugs reported)
* Johannes reported that the rebase is used in GfW
   
https://public-inbox.org/git/nycvar.qro.7.76.6.1808241320540...@tvgsbejvaqbjf.bet/
   https://github.com/git-for-windows/git/pull/1800
   and from my cursory reading it is part of
   2.19.windows, which has a large user base.

> (and would re-enable rebase.useBuiltin=true in
> master right after 2.20 is out the door).

That would be fine with me as well, but I'd rather
document rebase.useBuiltin instead of flip-flopping
the switch around the release.

Have there been any fixes that are only in
the C version (has the shell version already bitrotted)?

Stefan

[PATCH] diff: align move detection error handling with other options

2018-11-13 Thread Stefan Beller

This changes the error handling for the options --color-moved-ws
and --color-moved-ws to be like the rest of the options.

Move the die() call out of parse_color_moved_ws into the parsing
of command line options. As the function returns a bit field, change
its signature to return an unsigned instead of an int; add a new bit
to signal errors. Once the error is signaled, we discard the other
bits, such that it doesn't matter if the error bit overlaps with any
other bit.

Signed-off-by: Stefan Beller 
---

c.f.
./git -c diff.colormovedws=bogus diff HEAD
error: unknown color-moved-ws mode 'bogus'
fatal: unable to parse 'diff.colormovedws' from command-line config
./git -c core.abbrev=41 diff
error: abbrev length out of range: 41
fatal: unable to parse 'core.abbrev' from command-line config

./git diff --color=bogus
error: option `color' expects "always", "auto", or "never"
./git -c diff.colormovedws=bogus diff HEAD
error: unknown color-moved-ws mode 'bogus'
fatal: unable to parse 'diff.colormovedws' from command-line config


 diff.c | 25 -
 diff.h |  3 ++-
 t/t4015-diff-whitespace.sh | 18 ++
 3 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/diff.c b/diff.c
index 8647db3d30..d7d467b605 100644
--- a/diff.c
+++ b/diff.c
@@ -291,7 +291,7 @@ static int parse_color_moved(const char *arg)
return error(_("color moved setting must be one of 'no', 
'default', 'blocks', 'zebra', 'dimmed-zebra', 'plain'"));
 }
 
-static int parse_color_moved_ws(const char *arg)
+static unsigned parse_color_moved_ws(const char *arg)
 {
int ret = 0;
struct string_list l = STRING_LIST_INIT_DUP;
@@ -312,15 +312,19 @@ static int parse_color_moved_ws(const char *arg)
ret |= XDF_IGNORE_WHITESPACE;
else if (!strcmp(sb.buf, "allow-indentation-change"))
ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
-   else
-   error(_("ignoring unknown color-moved-ws mode '%s'"), 
sb.buf);
+   else {
+   ret |= COLOR_MOVED_WS_ERROR;
+   error(_("unknown color-moved-ws mode '%s', possible 
values are 'ignore-space-change', 'ignore-space-at-eol', 'ignore-all-space', 
'allow-indentation-change'"), sb.buf);
+   }
 
strbuf_release();
}
 
if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
-   (ret & XDF_WHITESPACE_FLAGS))
-   die(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   (ret & XDF_WHITESPACE_FLAGS)) {
+   error(_("color-moved-ws: allow-indentation-change cannot be 
combined with other white space modes"));
+   ret |= COLOR_MOVED_WS_ERROR;
+   }
 
string_list_clear(, 0);
 
@@ -341,8 +345,8 @@ int git_diff_ui_config(const char *var, const char *value, 
void *cb)
return 0;
}
if (!strcmp(var, "diff.colormovedws")) {
-   int cm = parse_color_moved_ws(value);
-   if (cm < 0)
+   unsigned cm = parse_color_moved_ws(value);
+   if (cm & COLOR_MOVED_WS_ERROR)
return -1;
diff_color_moved_ws_default = cm;
return 0;
@@ -5032,10 +5036,13 @@ int diff_opt_parse(struct diff_options *options,
else if (skip_prefix(arg, "--color-moved=", )) {
int cm = parse_color_moved(arg);
if (cm < 0)
-   die("bad --color-moved argument: %s", arg);
+   return error("bad --color-moved argument: %s", arg);
options->color_moved = cm;
} else if (skip_prefix(arg, "--color-moved-ws=", )) {
-   options->color_moved_ws_handling = parse_color_moved_ws(arg);
+   unsigned cm = parse_color_moved_ws(arg);
+   if (cm & COLOR_MOVED_WS_ERROR)
+   return -1;
+   options->color_moved_ws_handling = cm;
} else if (skip_to_optional_arg_default(arg, "--color-words", 
>word_regex, NULL)) {
options->use_color = 1;
options->word_diff = DIFF_WORDS_COLOR;
diff --git a/diff.h b/diff.h
index ce5e8a8183..9e8061ca29 100644
--- a/diff.h
+++ b/diff.h
@@ -225,7 +225,8 @@ struct diff_options {
 
/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
#define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
-   int color_moved_ws_handling;
+   #define COLOR_MOVED_WS_ERROR (1<<0)
+   unsigned color_moved_ws_handling;
 
struct repository *repo;
 };
diff --git a/t/t4015-diff-whitespace.sh b/t/t4

Re: [PATCH 1/1] bundle: refuse to create empty bundle

2018-11-13 Thread Stefan Beller

On Tue, Nov 13, 2018 at 12:33 PM Gaël Lhez  wrote:
>
> Hello,
>
> I don't know why I receive these message (and especially now given the time 
> at which I pushed this) but I suppose someone (Johannes Schindelin ?) 
> probably pushed back my original commit from git for windows github to GIT 
> git repository.

Yes that is pretty much what is happening. Johannes (GfW maintainer)
tries to push a lot of patches upstream to git and cc's people who
originally authored the patch.
Thanks for taking a look, again, even after this long time!

>
> If you think "bundle: cleanup lock files on error" is better, then no problem 
> with me. I'm not a native english speaker and I simply expressed the reason 
> for my problem but - after reading back my commit - neither this mail' 
> subject and my original commit subject (see 
> https://github.com/git-for-windows/git/pull/797/commits/0ef742a1a92ae53188189238adbd16086fabf80a)
>  express the core problem.

I am not a native speaker either, which makes it extra hard to
understand some commits. ;-) So I propose a wording which would have
helped me.

> As I'm not accustomed to pushing on GIT 'git repository' , please let me know 
> if I have something else to do ?

I don't know how Johannes handles pushing changes upstream, maybe he
will take on the work of resending a reworded patch.
Let's hear his thoughts on it. I would guess you're more than welcome
to take your patches from GitForWindows into Git itself.

Cheers,
Stefan

Re: [PATCH 1/1] bundle: refuse to create empty bundle

2018-11-13 Thread Stefan Beller

On Tue, Nov 13, 2018 at 7:09 AM Gaël Lhez via GitGitGadget
 wrote:
>
> From: =?UTF-8?q?Ga=C3=ABl=20Lhez?= 
>
> When an user tries to create an empty bundle via `git bundle create
>  ` where `` resolves to an empty list (for
> example, like `master..master`), the command fails and warns the user
> about how it does not want to create empty bundle.
>
> However, the `.lock` file was still open and on Windows that means
> that it could not be deleted properly. This patch fixes that issue.
>
> This closes https://github.com/git-for-windows/git/issues/790
>
> Signed-off-by: Gaël Lhez 
> Signed-off-by: Johannes Schindelin 

The code and the commit message make sense, but by reading the subject line

I would have expected a different commit. Maybe
"bundle: cleanup lock files on error"

Re: Re* [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-13 Thread Stefan Beller

On Mon, Nov 12, 2018 at 7:24 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> >> +   if (have_advertised_versions_already)
> >> +   BUG(_("attempting to register an allowed protocol version 
> >> after advertisement"));
> >
> > If this is a real BUG (due to wrong program flow) instead of bad user input,
> > we would not want to burden the translators with this message.
> >
> > If it is a message that the user is intended to see upon bad input
> > we'd rather go with die(_("translatable text here"));
>
> Yeah, good suggestion.
>
> Perhaps we should spell it out in docstrings for BUG/die with the
> above rationale.  A weatherbaloon patch follows.

"Nobody reads documentation any more, we have too much of it." [1]
[1] c.f. https://quoteinvestigator.com/2014/08/29/too-crowded/

I would have hoped to have a .cocci patch in the bad style category,
i.e. disallowing the _() function inside the context of BUG().

That said, I like the patch below. Thanks for writing it.

>  git-compat-util.h | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 96a3f86d8e..a1cf68cbbc 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -420,7 +420,16 @@ static inline char *git_find_last_dir_sep(const char 
> *path)
>
>  struct strbuf;
>
> -/* General helper functions */
> +/*
> + * General helper functions
> + *
> + * Note that these are to produce end-user facing messages, and most
> + * of them should be marked for translation (the exception is
> + * messages generated to be sent over the wire---as we currently do not
> + * have a mechanism to notice and honor locale settings used on the
> + * other end, leave them untranslated.  We should *not* send messages
> + * that are localized for our end).
> + */
>  extern void vreportf(const char *prefix, const char *err, va_list params);
>  extern NORETURN void usage(const char *err);
>  extern NORETURN void usagef(const char *err, ...) __attribute__((format 
> (printf, 1, 2)));
> @@ -1142,6 +1151,17 @@ static inline int regexec_buf(const regex_t *preg, 
> const char *buf, size_t size,
>  /* usage.c: only to be used for testing BUG() implementation (see test-tool) 
> */
>  extern int BUG_exit_code;
>
> +/*
> + * BUG(fmt, ...) is to be used when the problem of reaching that point
> + * in code can only be caused by a program error.  To abort with a
> + * message to report impossible-to-continue condition due to external
> + * factors like end-user input, environment settings, data it was fed
> + * over the wire, use die(_(fmt),...).
> + *
> + * Note that the message from BUG() should not be marked for
> + * translation while the message from die() is in general end-user
> + * facing and should be marked for translation.
> + */
>  #ifdef HAVE_VARIADIC_MACROS
>  __attribute__((format (printf, 3, 4))) NORETURN
>  void BUG_fl(const char *file, int line, const char *fmt, ...);

Re: [PATCH] remote-curl: die on server-side errors

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 2:45 PM  wrote:
>
> When a smart HTTP server sends an error message via pkt-line,
> remote-curl will fail to detect the error (which usually results in
> incorrectly falling back to dumb-HTTP mode).
>
> This patch adds a check in discover_refs() for server-side error
> messages, as well as a test case for this issue.
>
> Signed-off-by: Josh Steadmon 

Reviewed-by: Stefan Beller

Re: [PATCH] Makefile: use CXXFLAGS for linking fuzzers

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 2:03 PM  wrote:
>
> OSS-Fuzz requires C++-specific flags to link fuzzers. Passing these in
> CFLAGS causes lots of build warnings. Using separate CXXFLAGS avoids
> this.
>

That makes sense in this context, 

>  CFLAGS = -g -O2 -Wall
> +CXXFLAGS ?= $(CFLAGS)

... but out of context, just by reading the relevant part of the Makefile,
a user might mistakenly assume we do some C++ trickery for standard
compilation of Git. (Is that bad or do we just not care?)

I wonder if setting the CXXFLAGS near or in the fuzz target
would be better.

>  LDFLAGS =
>  ALL_CFLAGS = $(CPPFLAGS) $(CFLAGS)
>  ALL_LDFLAGS = $(LDFLAGS)
> @@ -3098,14 +3099,14 @@ cover_db_html: cover_db
>  # An example command to build against libFuzzer from LLVM 4.0.0:
>  #
>  # make CC=clang CXX=clang++ \
> -#  CFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
> +#  CXXFLAGS="-fsanitize-coverage=trace-pc-guard -fsanitize=address" \
>  #  LIB_FUZZING_ENGINE=/usr/lib/llvm-4.0/lib/libFuzzer.a \
>  #  fuzz-all
>  #
>  .PHONY: fuzz-all

Maybe here?

>
>  $(FUZZ_PROGRAMS): all
> -   $(QUIET_LINK)$(CXX) $(CFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \
> +   $(QUIET_LINK)$(CXX) $(CXXFLAGS) $(LIB_OBJS) $(BUILTIN_OBJS) \
> $(XDIFF_OBJS) $(EXTLIBS) git.o $@.o $(LIB_FUZZING_ENGINE) -o 
> $@

Thanks,
Stefan

Re: [PATCH v3 1/1] protocol: advertise multiple supported versions

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 1:49 PM  wrote:
>
> Currently the client advertises that it supports the wire protocol
> version set in the protocol.version config. However, not all services
> support the same set of protocol versions. When connecting to
> git-receive-pack, the client automatically downgrades to v0 if
> config.protocol is set to v2, but this check is not performed for other
> services.
>
> This patch creates a protocol version registry. Individual operations
> register all the protocol versions they support prior to communicating
> with a server. Versions should be listed in preference order; the
> version specified in protocol.version will automatically be moved to the
> front of the registry.
>
> The protocol version registry is passed to remote helpers via the
> GIT_PROTOCOL environment variable.
>
> Clients now advertise the full list of registered versions. Servers
> select the first recognized version from this advertisement.
>
> Signed-off-by: Josh Steadmon 

Thanks for resending this patch!

+cc Jonathan Tan, who works a lot on the protocol side of things.

> +void register_allowed_protocol_version(enum protocol_version version)
> +{
> +   if (have_advertised_versions_already)
> +   BUG(_("attempting to register an allowed protocol version 
> after advertisement"));

If this is a real BUG (due to wrong program flow) instead of bad user input,
we would not want to burden the translators with this message.

If it is a message that the user is intended to see upon bad input
we'd rather go with die(_("translatable text here"));


> diff --git a/protocol.h b/protocol.h
> index 2ad35e433c..b67b2259de 100644
> --- a/protocol.h
> +++ b/protocol.h
> @@ -16,6 +16,23 @@ enum protocol_version {
>   */
>  extern enum protocol_version get_protocol_version_config(void);
>
> +/*
> + * Register an allowable protocol version for a given operation. Registration
> + * must occur before attempting to advertise a version to a server process.
> + */
> +extern void register_allowed_protocol_version(enum protocol_version version);

We keep extern here as to imitate the file-local style, although
Documentation/CodingGuidelines would prefer to not have the extern keywords.
Okay.

Bonus points for converting the file to omit all extern keywords in a
resend. :-)
(I think there is no other series currently in flight touching this header file,
so it is safe to convert it "while at it", `git log --grep "while at
it"` is shorter
than expected.)

All that said, it's only nits that I contributed to this code review;
the code/design
looks good to me, and if I were maintainer I'd include it as-is, as it
fixes a long(ish)
standing protocol error.

Thanks,
Stefan

Re: [PATCH 0/9] caching loose objects

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 8:02 AM Derrick Stolee  wrote:

> This cleanup is actually really valuable, and affects much more than
> this application.

I second this. I'd value this series more for the cleanup than its
application. ;-)

Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 8:09 AM Jeff King  wrote:
>
> On Mon, Nov 12, 2018 at 10:48:36AM -0500, Derrick Stolee wrote:
>
> > > If the "the first one is the main store, the rest are alternates" bit is
> > > too subtle, we could mark each "struct object_directory" with a bit for
> > > "is_local".
> >
> > This is probably a good thing to do proactively. We have the equivalent in
> > the packed_git struct, but that's also because they get out of order. At the
> > moment, I can't think of a read-only action that needs to treat the local
> > object directory more carefully. The closest I know about is 'git
> > pack-objects --local', but that also writes a pack-file.
> >
> > I assume that when we write a pack-file to the "default location" we use
> > get_object_directory() instead of referring to the default object_directory?
>
> Generally, yes, though that should eventually be going away in favor of
> accessing it via a "struct repository". And after my series,
> get_object_directory() is just returning the_repository->objects->odb->path
> (i.e., using the "first one is main" rule).
>
> One thing that makes me nervous about a "local" flag in each struct is
> that it implies that it's the source of truth for where to write to. So
> what does git_object_directory() look like after that? Do we leave it
> with the "first one is main" rule? Or does it become:

s/git/get/ ;-)  get_object_directory is very old and was introduced in
e1b10391ea (Use git config file for committer name and email info,
2005-10-11) by Linus.

I would argue that we might want to get rid of that function now,
actually as it doesn't seem to add value to the code (assuming the
BUG never triggers), and using a_repo->objects->objectdir
or after this series a_repo->objects->odb->path; is just as short.

$ git grep get_object_directory |wc -l
30
$ git grep -- "->objects->objectdir"  |wc -l
10

Ah well, we're not there yet.

>   for (odb = the_repository->objects->odb; odb; odb = odb->next) {
> if (odb->local)
> return odb->path;
>   }
>   return NULL; /* yikes? */
>
> ? That feels like it's making things more complicated, not less.

It depends if the caller cares about the local flag.

I'd think we can have more than one local, eventually?
Just think of the partial clone stuff that may have a local
set of promised stuff and another set of actual objects,
which may be stored in different local odbs.

If the caller cares about the distinction, they would need
to write out this loop as above themselves.
If they don't care, we could migrate them to not
use this function, so we can get rid of it?

> > > -   for (odb = r->objects->alt_odb_list; odb; odb = odb->next) {
> > > -   prepare_multi_pack_index_one(r, odb->path, 0);
> > > -   prepare_packed_git_one(r, odb->path, 0);
> > > +   for (odb = r->objects->odb; odb; odb = odb->next) {
> > > +   int local = (odb == r->objects->odb);
> >
> > Here seems to be a place where `odb->is_local` would help.
>
> Yes, though I don't mind this spot in particular, as the check is pretty
> straight-forward.
>
> I think an example that would benefit more is the check_and_freshen()
> stuff. There we have two almost-the-same wrappers, one of which operates
> on just the first element of the list, and the other of which operates
> on all of the elements after the first.
>
> It could become:
>
>   static int check_and_freshen_odb(struct object_directory *odb_list,
>const struct object_id *oid,
>int freshen,
>int local)
>   {
> struct object_directory *odb;
>
> for (odb = odb_list; odb; odb = odb->next) {
> static struct strbuf path = STRBUF_INIT;
>
> if (odb->local != local)
> continue;
>
> odb_loose_path(odb, , oid->hash);
> return check_and_freshen_file(path.buf, freshen);
> }
>   }
>
>   int check_and_freshen_local(const struct object_id *oid, int freshen)
>   {
> return check_and_freshen_odb(the_repository->objects->odb, oid,
>  freshen, 1);
>   }
>
>   int check_and_freshen_nonlocal(const struct object_id *oid, int freshen)
>   {
> return check_and_freshen_odb(the_repository->objects->odb, oid,
>  freshen, 0);
>   }
>

I am fine with (a maybe better documented) "first is local" rule, but
the code above looks intriguing, except a little wasteful
(we need two full loops in check_and_freshen, but ideally we
can do by just one loop).

What does the local flag mean anyway in a world where we
have many odbs in a repository, that are not distinguishable
except by their order? AFAICT it is actually to be used for differentiating
how much we care in fsck/cat-file/packing, as it may be borrowed
from an alternate, so maybe the flag is rather to be named
after

Re: [PATCH 6/9] sha1-file: use an object_directory for the main object dir

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 7:48 AM Derrick Stolee  wrote:
>
[... lots of quoted text...]

Some email readers are very good at recognizing unchanged quoted
text and collapse it, not so at
https://public-inbox.org/git/421d3b43-3425-72c9-218e-facd86e28...@gmail.com/
which I use to read through this series. It would help if you'd cut most
of the (con)text that is not nearby to your reply, as I read the context
email just before your reply.

Thanks,
Stefan

Re: [PATCH 2/9] submodule--helper: prefer strip_suffix() to ends_with()

2018-11-12 Thread Stefan Beller

On Mon, Nov 12, 2018 at 6:47 AM Jeff King  wrote:
>
> Using strip_suffix() lets us avoid repeating ourselves. It also makes
> the handling of "/" a bit less subtle (we strip one less character than
> we matched in order to leave it in place, but we can just as easily
> include the "/" when we add more path components).
>
> Signed-off-by: Jeff King 

This makes sense. Thanks!

(This patch caught my attention as it's a submodule thing,
but now looking at the rest of the series)

[PATCH] coccicheck: introduce 'pending' semantic patches

2018-11-09 Thread Stefan Beller

From: SZEDER Gábor 

Teach `make coccicheck` to avoid patches named "*.pending.cocci" and
handle them separately in a new `make coccicheck-pending` instead.
This means that we can separate "critical" patches from "FYI" patches.
The former target can continue causing Travis to fail its static
analysis job, while the latter can let us keep an eye on ongoing
(pending) transitions without them causing too much fallout.

Document the intended use-cases around these two targets.
As the process around the pending patches is not yet fully explored,
leave that out.

Based-on-work-by: SZEDER Gábor 
Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---

I dialed back on the workflow, as we may want to explore it first
before writing it down.

Stefan

 Makefile  |  7 +--
 contrib/coccinelle/README | 41 +++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index bbfbb4292d..6bc4654828 100644
--- a/Makefile
+++ b/Makefile
@@ -2740,9 +2740,12 @@ endif
then \
echo '' SPATCH result: $@; \
fi
-coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
+coccicheck: $(addsuffix .patch,$(filter-out %.pending.cocci,$(wildcard 
contrib/coccinelle/*.cocci)))
 
-.PHONY: coccicheck
+# See contrib/coccinelle/README
+coccicheck-pending: $(addsuffix .patch,$(wildcard 
contrib/coccinelle/*.pending.cocci))
+
+.PHONY: coccicheck coccicheck-pending
 
 ### Installation rules
 
diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README
index 9c2f8879c2..f0e80bd7f0 100644
--- a/contrib/coccinelle/README
+++ b/contrib/coccinelle/README
@@ -1,2 +1,43 @@
 This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
 semantic patches that might be useful to developers.
+
+There are two types of semantic patches:
+
+ * Using the semantic transformation to check for bad patterns in the code;
+   The target 'make coccicheck' is designed to check for these patterns and
+   it is expected that any resulting patch indicates a regression.
+   The patches resulting from 'make coccicheck' are small and infrequent,
+   so once they are found, they can be sent to the mailing list as per usual.
+
+   Example for introducing new patterns:
+   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
+   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
+
+   Example of fixes using this approach:
+   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
+   a strbuf, 2018-03-25)
+   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
+
+   These types of semantic patches are usually part of testing, c.f.
+   0860a7641b (travis-ci: fail if Coccinelle static analysis found something
+   to transform, 2018-07-23)
+
+ * Using semantic transformations in large scale refactorings throughout
+   the code base.
+
+   When applying the semantic patch into a real patch, sending it to the
+   mailing list in the usual way, such a patch would be expected to have a
+   lot of textual and semantic conflicts as such large scale refactorings
+   change function signatures that are used widely in the code base.
+   A textual conflict would arise if surrounding code near any call of such
+   function changes. A semantic conflict arises when other patch series in
+   flight introduce calls to such functions.
+
+   So to aid these large scale refactorings, semantic patches can be used.
+   However we do not want to store them in the same place as the checks for
+   bad patterns, as then automated builds would fail.
+   That is why semantic patches 'contrib/coccinelle/*.pending.cocci'
+   are ignored for checks, and can be applied using 'make coccicheck-pending'.
+
+   This allows to expose plans of pending large scale refactorings without
+   impacting the bad pattern checks.
-- 
2.19.1.930.g4563a0d9d0-goog

Re: [PATCH] Makefile: add pending semantic patches

2018-11-09 Thread Stefan Beller

On Thu, Nov 8, 2018 at 9:18 PM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > From: SZEDER Gábor 
> >
> > Add a description and place on how to use coccinelle for large refactorings
> > that happen only once.
> >
> > Based-on-work-by: SZEDER Gábor 
> > Signed-off-by: Stefan Beller 
> > ---
> >
> > I consider including this patch in a resend instead.
> > It outlays the basics of such a new workflow, which we can refine later.
>
> Thanks for tying loose ends.
>
> > diff --git a/contrib/coccinelle/README b/contrib/coccinelle/README
> > index 9c2f8879c2..fa09d1abcc 100644
> > --- a/contrib/coccinelle/README
> > +++ b/contrib/coccinelle/README
> > @@ -1,2 +1,62 @@
> >  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
> >  semantic patches that might be useful to developers.
> > +
> > +There are two types of semantic patches:
> > +
> > + * Using the semantic transformation to check for bad patterns in the code;
> > +   This is what the original target 'make coccicheck' is designed to do and
> > +   it is expected that any resulting patch indicates a regression.
> > +   The patches resulting from 'make coccicheck' are small and infrequent,
> > +   so once they are found, they can be sent to the mailing list as per 
> > usual.
> > +
> > +   Example for introducing new patterns:
> > +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> > +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> > +
> > +   Example of fixes using this approach:
> > +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> > +   a strbuf, 2018-03-25)
> > +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> > +
> > +   These types of semantic patches are usually part of testing, c.f.
> > +   0860a7641b (travis-ci: fail if Coccinelle static analysis found 
> > something
> > +   to transform, 2018-07-23)
>
> Yup, and I think what we have in 'pu' (including your the_repository
> stuff) falls into this category.

My impression was that the_repository is strongly second category
as the_repository.cocci doesn't fix bad smells of code, but proposes
a refactoring that we agreed on doing.

> I've been paying attention to "make coccicheck" produced patches for
> the past few weeks, and so far, I found it _much_ _much_ more
> pleasant, compared to having to worry about merge conflicts with the
> topics in flight that changes day to day (not just because we add
> new topics or update existing topics, but also the order of the
> merge changes as topics mature at different rates and jumps over
> each other in master..pu history), that "make coccicheck" after
> topics are merged to integration branches fixes these issues up as
> needed.

So from your end we would not need the "pending" category as long
as the follow ups come in a timely manner?

>
> > +   3) Apply the semantic patch only partially, where needed for the patch 
> > series
> > +  that motivates the large scale refactoring and then build that series
> > +  on top of it.
>
> It is not quite clear what "needed for the patch series" really
> means in the context of this paragraph.  What are the changes that
> are not needed, that is still produced if we ran "make coccicheck"?

An example for "needed" would be
3f21279f50..bd8737176b
whereas "not needed" is what is in
"treewide: apply cocci patch".

The treewide conversion of e.g. unuse_commit_buffer to
repo_unuse_commit_buffer is nice, but "needed" only in
its followup patch that converts logmsg_reencode as that
calls into the unuse_commit_buffer.

> Are they wrong changes (e.g. a carelessly written read_cache() to
> read_index(_index) conversion may munge the implementation of
> read_cache(...) { return read_index(_index, ...); } and make
> inifinite recursion)?  Or are they "correct but not immediately
> necessary" (e.g. because calling read_cache() does not hurt until
> that function gets removed, so rewriting the callers to call
> read_index() with _index may be correct but not immediately
> necessary)?

the latter. I assume correctness (of the semantic patch) to be a given,
but this is all about timing, i.e. how can I send the series without breaking
others in flight.

>
> > +  By applying the semantic patch only partially where needed, the 
> > series
> > +  is less likely to conflict with other series in flight.
>
> That is correct.
>
> > +  To make it possible to apply the semantic patch partially, there 
> > need

Re: [PATCH] Makefile: add pending semantic patches

2018-11-09 Thread Stefan Beller

On Thu, Nov 8, 2018 at 8:56 PM Martin Ågren  wrote:
> I haven't followed the original discussion too carefully, so I'll read
> this like someone new to the topic probably would.

Thanks!

> A nit, perhaps, but I was genuinely confused at first. The subject is
> "Makefile: add pending semantic patches", but this patch doesn't add
> any. It adds Makefile-support for such patches though, and it defines
> the entire concept of pending semantic patches. How about "coccicheck:
> introduce 'pending' semantic patches"?
>
> > Add a description and place on how to use coccinelle for large refactorings
> > that happen only once.
>
> A bit confused about "and place". Based on my understanding from reading
> the remainder of this patch, maybe:
>
>   Teach `make coccicheck` to avoid patches named "*.pending.cocci" and
>   handle them separately in a new `make coccicheck-pending` instead.
>   This means that we can separate "critical" patches from "FYI" patches.
>   The former target can continue causing Travis to fail its static
>   analysis job, while the latter can let us keep an eye on ongoing
>   (pending) transitions without them causing too much fallout.
>
>   Document the intended use-cases and processes around these two
>   targets.

Both suggested title and new commit message make sense.

>
> >  This directory provides examples of Coccinelle (http://coccinelle.lip6.fr/)
> >  semantic patches that might be useful to developers.
> > +
> > +There are two types of semantic patches:
> > +
> > + * Using the semantic transformation to check for bad patterns in the code;
> > +   This is what the original target 'make coccicheck' is designed to do and
>
> Is it relevant that this was the "original" target? (Genuine question.)

No. I can omit that part.

>
> > +   it is expected that any resulting patch indicates a regression.
> > +   The patches resulting from 'make coccicheck' are small and infrequent,
> > +   so once they are found, they can be sent to the mailing list as per 
> > usual.
> > +
> > +   Example for introducing new patterns:
> > +   67947c34ae (convert "hashcmp() != 0" to "!hasheq()", 2018-08-28)
> > +   b84c783882 (fsck: s/++i > 1/i++/, 2018-10-24)
> > +
> > +   Example of fixes using this approach:
> > +   248f66ed8e (run-command: use strbuf_addstr() for adding a string to
> > +   a strbuf, 2018-03-25)
> > +   f919ffebed (Use MOVE_ARRAY, 2018-01-22)
> > +
> > +   These types of semantic patches are usually part of testing, c.f.
> > +   0860a7641b (travis-ci: fail if Coccinelle static analysis found 
> > something
> > +   to transform, 2018-07-23)
>
> Very nicely described, nice with the examples to quickly give a feeling
> about how/when to use this.

Thanks.


>
> > + * Using semantic transformations in large scale refactorings throughout
> > +   the code base.
> > +
> > +   When applying the semantic patch into a real patch, sending it to the
> > +   mailing list in the usual way, such a patch would be expected to have a
> > +   lot of textual and semantic conflicts as such large scale refactorings
> > +   change function signatures that are used widely in the code base.
> > +   A textual conflict would arise if surrounding code near any call of such
> > +   function changes. A semantic conflict arises when other patch series in
> > +   flight introduce calls to such functions.
>
> OK, I'm with you.
>
> > +   So to aid these large scale refactorings, semantic patches can be used,
> > +   using the process as follows:
> > +
> > +   1) Figure out what kind of large scale refactoring we need
> > +  -> This is usually done by another unrelated series.
>
> "This"? The figuring out, or the refactoring? Also, "unrelated"?

The need and type of what kind of large scale refactoring are
usually determined by a patch series, that is not refactoring for the
sake of refactoring, but it wants to achieve a specific goal, unrelated
to large refactorings per se.

The large refactoring is just another tool that a developer can use
to make their original series happen much faster.

So "unrelated" == "not the large scale refactoring, as that may
come as an preparatory series, but to have a preparatory series
it may be good to showcase why we need the preparatory series"

>
> > +   2) Create the sematic patch needed for the large scale refactoring
>
> s/sematic/semantic/

yup.

>
> > +  and store it in contrib/coccinelle/*.pending.cocci
> > +  -> The suffix containing 'pending' is important to differentiate
> > +  this case from the other use case of checking for bad patterns.
>
> Good.
>
> > +   3) Apply the semantic patch only partially, where needed for the patch 
> > series
> > +  that motivates the large scale refactoring and then build that series
> > +  on top of it.
> > +  By applying the semantic patch only partially where needed, the 
> > series
> > +  is less likely to conflict with other series in flight.
> > +  To make it possible to apply the

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 7730 matches

Mail list logo