Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-02-07 Thread Junio C Hamano
Junio C Hamano  writes:

>>> This was a bit painful change, given that some changes in flight do
>>> add new callsites to read_index_from() and they got the function
>>> changed under their feet.
>>
>> Sorry about that.  Is there any way to make such a change less painful
>> in the future?
>
> One way is to do for read_index_from() what you did for the existing
> users of read_cache_from().  Introduce a _new_ helper that will not
> be known for any existing topics in flight, and use that to make the
> existing API a thin wrapper around it.

This continues to cause pain simply because read_index_from() is
something new topics want to stay stable X-<.  

I'll be merging this to 'next' hopefully as part of today's
integration run, so will need to endure the pain only until it (and
other topics that conflict with it) all graduate to 'master'.

Next time a patch with an internal API change like this appears,
please remind me to push it back a lot stronger.  I was too lenient
and ended up slowing down the progress of other topics this time.
Thanks.




Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-27 Thread Thomas Gummerer
On 01/21, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > On 01/19, Junio C Hamano wrote:
> >> Thomas Gummerer  writes:
> >> 
> >> > read_cache_from() defaults to using the gitdir of the_repository.  As it
> >> > is mostly a convenience macro, having to pass get_git_dir() for every
> >> > call seems overkill, and if necessary users can have more control by
> >> > using read_index_from().
> >> 
> >> This was a bit painful change, given that some changes in flight do
> >> add new callsites to read_index_from() and they got the function
> >> changed under their feet.
> >
> > Sorry about that.  Is there any way to make such a change less painful
> > in the future?
> 
> One way is to do for read_index_from() what you did for the existing
> users of read_cache_from().  Introduce a _new_ helper that will not
> be known for any existing topics in flight, and use that to make the
> existing API a thin wrapper around it.

I'll do that next time.  My worries were just that the
'read_index_from()' API is broken with split index, which is what this
series fixes.  It would be easy to introduce new breakages if we're
not careful.

> I _think_ I got it right with evil merge, so unless this fix needs
> to be delayed for extended period of time for whatever reason while
> any more new callers of the function appears (which is unlikely), we
> should be OK ;-)

Thanks!  As mentioned there's one call that was added that the evil
merge didn't get quite right, for which I sent the diff below in my
previous email.  But I'm happy to fix that on top once this series
goes to master or next if that's preferred.

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6a49f9e628..4d86a3574f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -612,7 +612,8 @@ static void validate_no_submodules(const struct worktree 
*wt)
struct index_state istate = {0};
int i, found_submodules = 0;
 
-   if (read_index_from(, worktree_git_path(wt, "index"), 
get_git_dir()) > 0) {
+   if (read_index_from(, worktree_git_path(wt, "index"),
+   get_worktree_git_dir(wt)) > 0) {
for (i = 0; i < istate.cache_nr; i++) {
struct cache_entry *ce = istate.cache[i];
 
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index b3105eaaed..8faf61bbf5 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -90,6 +90,16 @@ test_expect_success 'move main worktree' '
test_must_fail git worktree move . def
 '
 
+test_expect_success 'move worktree with split index' '
+   git worktree add test &&
+   (
+   cd test &&
+   test_commit file &&
+   git update-index --split-index
+   ) &&
+   git worktree move test test-destination
+'
+
 test_expect_success 'remove main worktree' '
test_must_fail git worktree remove .
 '

> Thanks.


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-21 Thread Junio C Hamano
Thomas Gummerer  writes:

> On 01/19, Junio C Hamano wrote:
>> Thomas Gummerer  writes:
>> 
>> > read_cache_from() defaults to using the gitdir of the_repository.  As it
>> > is mostly a convenience macro, having to pass get_git_dir() for every
>> > call seems overkill, and if necessary users can have more control by
>> > using read_index_from().
>> 
>> This was a bit painful change, given that some changes in flight do
>> add new callsites to read_index_from() and they got the function
>> changed under their feet.
>
> Sorry about that.  Is there any way to make such a change less painful
> in the future?

One way is to do for read_index_from() what you did for the existing
users of read_cache_from().  Introduce a _new_ helper that will not
be known for any existing topics in flight, and use that to make the
existing API a thin wrapper around it.

I _think_ I got it right with evil merge, so unless this fix needs
to be delayed for extended period of time for whatever reason while
any more new callers of the function appears (which is unlikely), we
should be OK ;-)

Thanks.


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-20 Thread Thomas Gummerer
On 01/19, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > read_cache_from() defaults to using the gitdir of the_repository.  As it
> > is mostly a convenience macro, having to pass get_git_dir() for every
> > call seems overkill, and if necessary users can have more control by
> > using read_index_from().
> 
> This was a bit painful change, given that some changes in flight do
> add new callsites to read_index_from() and they got the function
> changed under their feet.

Sorry about that.  Is there any way to make such a change less painful
in the future?

> Please double check if I made the right git-dir to be passed to them
> when I push out 'pu' in a few hours.

I think one conversion was not quite correct, even though all tests
still pass with GIT_TEST_SPLIT_INDEX set.

The following diff fixes that conversation and has a test showing the
breakage:

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6a49f9e628..4d86a3574f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -612,7 +612,8 @@ static void validate_no_submodules(const struct worktree 
*wt)
struct index_state istate = {0};
int i, found_submodules = 0;
 
-   if (read_index_from(, worktree_git_path(wt, "index"), 
get_git_dir()) > 0) {
+   if (read_index_from(, worktree_git_path(wt, "index"),
+   get_worktree_git_dir(wt)) > 0) {
for (i = 0; i < istate.cache_nr; i++) {
struct cache_entry *ce = istate.cache[i];
 
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index b3105eaaed..8faf61bbf5 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -90,6 +90,16 @@ test_expect_success 'move main worktree' '
test_must_fail git worktree move . def
 '
 
+test_expect_success 'move worktree with split index' '
+   git worktree add test &&
+   (
+   cd test &&
+   test_commit file &&
+   git update-index --split-index
+   ) &&
+   git worktree move test test-destination
+'
+
 test_expect_success 'remove main worktree' '
test_must_fail git worktree remove .
 '

With this applied what you have in pu looks good to me.  Does the
above help, or should I send this in another for for you to apply?

> Thanks.


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-19 Thread Junio C Hamano
Thomas Gummerer  writes:

> read_cache_from() defaults to using the gitdir of the_repository.  As it
> is mostly a convenience macro, having to pass get_git_dir() for every
> call seems overkill, and if necessary users can have more control by
> using read_index_from().

This was a bit painful change, given that some changes in flight do
add new callsites to read_index_from() and they got the function
changed under their feet.

Please double check if I made the right git-dir to be passed to them
when I push out 'pu' in a few hours.

Thanks.


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-18 Thread Duy Nguyen
On Thu, Jan 18, 2018 at 1:16 AM, Jonathan Nieder  wrote:
> Hi,
>
> Duy Nguyen wrote:
>> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams  wrote:
>
>>>  IIUC Split index is an index extension
>>> that can be enabled to limit the size of the index file that is written
>>> when making changes to the index.  It breaks the index into two pieces,
>>> index (which contains only changes) and sharedindex.X (which
>>> contains unchanged information) where 'X' is a value found in the
>>> index file.  If we don't do anything fancy then these two files live
>>> next to one another in a repository's git directory at $GIT_DIR/index
>>> and $GIT_DIR/sharedindex.X.  This seems to work all well and fine
>>> except that this isn't always the case and the read_index_from function
>>> takes this into account by enabling a caller to specify a path to where
>>> the index file is located.  We can do this by specifying the index file
>>> we want to use by setting GIT_INDEX_FILE.
> [...]
>>> In this case if i were to specify a location of an
>>> index file in my home directory '~/index' and be using the split index
>>> feature then the corresponding sharedindex file would live in my
>>> repository's git directory '~/project/.git/sharedindex.X'.  So the
>>> sharedindex file is always located relative to the project's git
>>> directory and not the index file itself, which is kind of confusing.
>>> Maybe a better design would be to have the sharedindex file located
>>> relative to the index file.
>>
>> That adds more problems. Now when you move the index file around you
>> have to move the shared index file too (think about atomic rename
>> which we use in plenty of places, we can't achieve that by moving two
>> files). A new dependency to $GIT_DIR is not that confusing to me, the
>> index file is useless anyway if you don't have access to
>> $GIT_DIR/objects. There was always the option to _not_ split the index
>> when $GIT_INDEX_FILE is specified, I think I did consider that but I
>> dropped it because we'd lose the performance gain by splitting.
>
> Can you elaborate a little more on this?
>
> At first glance, it seems simpler to say "paths in index extensions
> named in the index file are relative to the location of the index
> file" and to make moving the index file also require moving the shared
> index file, exactly as you say.  So at least from a "principle of
> least surprise" perspective I would be tempted to go that way.
>
> It's true that we rely on atomic rename in plenty of places, but only
> within a directory.  (Filesystem boundaries, NFS, etc mean that atomic
> renames across directories are a lost cause.)
>
> Fortunately index files (including temp index files used by scripts)
> tend to only be in $GIT_DIR, for exactly that reason.  So I am
> wondering if switching to index-file-relative semantics would be an
> invasive move and what the pros and cons of such a move are.

I think it gets messier. Now you have to move two files. If the first
move succeeds but the second one fails, recovery may involve un-move
the first file, but its old content is already gone. We probably can
get around that. But since the shared index is assumed big and heavy,
I just went with "store it in the place it's going to be and never
move it anywhere ever (until nobody uses it then it's deleted)"
-- 
Duy


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-17 Thread Jonathan Nieder
Hi,

Duy Nguyen wrote:
> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams  wrote:

>>  IIUC Split index is an index extension
>> that can be enabled to limit the size of the index file that is written
>> when making changes to the index.  It breaks the index into two pieces,
>> index (which contains only changes) and sharedindex.X (which
>> contains unchanged information) where 'X' is a value found in the
>> index file.  If we don't do anything fancy then these two files live
>> next to one another in a repository's git directory at $GIT_DIR/index
>> and $GIT_DIR/sharedindex.X.  This seems to work all well and fine
>> except that this isn't always the case and the read_index_from function
>> takes this into account by enabling a caller to specify a path to where
>> the index file is located.  We can do this by specifying the index file
>> we want to use by setting GIT_INDEX_FILE.
[...]
>> In this case if i were to specify a location of an
>> index file in my home directory '~/index' and be using the split index
>> feature then the corresponding sharedindex file would live in my
>> repository's git directory '~/project/.git/sharedindex.X'.  So the
>> sharedindex file is always located relative to the project's git
>> directory and not the index file itself, which is kind of confusing.
>> Maybe a better design would be to have the sharedindex file located
>> relative to the index file.
>
> That adds more problems. Now when you move the index file around you
> have to move the shared index file too (think about atomic rename
> which we use in plenty of places, we can't achieve that by moving two
> files). A new dependency to $GIT_DIR is not that confusing to me, the
> index file is useless anyway if you don't have access to
> $GIT_DIR/objects. There was always the option to _not_ split the index
> when $GIT_INDEX_FILE is specified, I think I did consider that but I
> dropped it because we'd lose the performance gain by splitting.

Can you elaborate a little more on this?

At first glance, it seems simpler to say "paths in index extensions
named in the index file are relative to the location of the index
file" and to make moving the index file also require moving the shared
index file, exactly as you say.  So at least from a "principle of
least surprise" perspective I would be tempted to go that way.

It's true that we rely on atomic rename in plenty of places, but only
within a directory.  (Filesystem boundaries, NFS, etc mean that atomic
renames across directories are a lost cause.)

Fortunately index files (including temp index files used by scripts)
tend to only be in $GIT_DIR, for exactly that reason.  So I am
wondering if switching to index-file-relative semantics would be an
invasive move and what the pros and cons of such a move are.

Thanks,
Jonathan


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-16 Thread Brandon Williams
On 01/17, Duy Nguyen wrote:
> On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams  wrote:
> > On 01/07, Thomas Gummerer wrote:
> >> read_index_from() takes a path argument for the location of the index
> >> file.  For reading the shared index in split index mode however it just
> >> ignores that path argument, and reads it from the gitdir of the current
> >> repository.
> >>
> >> This works as long as an index in the_repository is read.  Once that
> >> changes, such as when we read the index of a submodule, or of a
> >> different working tree than the current one, the gitdir of
> >> the_repository will no longer contain the appropriate shared index,
> >> and git will fail to read it.
> >>
> >> For example t3007-ls-files-recurse-submodules.sh was broken with
> >> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> >> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> >> broken in a similar manner, probably by introducing struct repository
> >> there, although I didn't track down the exact commit for that.
> >>
> >> be489d02d2 ("revision.c: --indexed-objects add objects from all
> >> worktrees", 2017-08-23) breaks with split index mode in a similar
> >> manner, not erroring out when it can't read the index, but instead
> >> carrying on with pruning, without taking the index of the worktree into
> >> account.
> >>
> >> Fix this by passing an additional gitdir parameter to read_index_from,
> >> to indicate where it should look for and read the shared index from.
> >>
> >> read_cache_from() defaults to using the gitdir of the_repository.  As it
> >> is mostly a convenience macro, having to pass get_git_dir() for every
> >> call seems overkill, and if necessary users can have more control by
> >> using read_index_from().
> >
> > I'm not saying we need to change this again but I got to thinking about
> > what the root cause for this bug is and I think it's a design flaw in
> > how split index is implemented.  IIUC Split index is an index extension
> > that can be enabled to limit the size of the index file that is written
> > when making changes to the index.  It breaks the index into two pieces,
> > index (which contains only changes) and sharedindex.X (which
> > contains unchanged information) where 'X' is a value found in the
> > index file.  If we don't do anything fancy then these two files live
> > next to one another in a repository's git directory at $GIT_DIR/index
> > and $GIT_DIR/sharedindex.X.  This seems to work all well and fine
> > except that this isn't always the case and the read_index_from function
> > takes this into account by enabling a caller to specify a path to where
> > the index file is located.  We can do this by specifying the index file
> > we want to use by setting GIT_INDEX_FILE.
> >
> > Being able to specify an index file via the environment is a feature
> > that has existed for a very long time (one that I personally think makes
> > things difficult because of things like additions like the sharedindex)
> > and I don't think was taken into account when introducing the split
> > index extension.
> 
> It was (partly because I did use GIT_INDEX_FILE on occasions).
> 
> > In this case if i were to specify a location of an
> > index file in my home directory '~/index' and be using the split index
> > feature then the corresponding sharedindex file would live in my
> > repository's git directory '~/project/.git/sharedindex.X'.  So the
> > sharedindex file is always located relative to the project's git
> > directory and not the index file itself, which is kind of confusing.
> > Maybe a better design would be to have the sharedindex file located
> > relative to the index file.
> 
> That adds more problems. Now when you move the index file around you
> have to move the shared index file too (think about atomic rename
> which we use in plenty of places, we can't achieve that by moving two
> files). A new dependency to $GIT_DIR is not that confusing to me, the
> index file is useless anyway if you don't have access to
> $GIT_DIR/objects. There was always the option to _not_ split the index
> when $GIT_INDEX_FILE is specified, I think I did consider that but I
> dropped it because we'd lose the performance gain by splitting.
> 
> > Anyway, some food for thought.
> >
> > --
> > Brandon Williams
> 
> 
> 
> -- 
> Duy

Thanks for giving me some more context :) makes me feel more confident
in the solution that's already been proposed.

-- 
Brandon Williams


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-16 Thread Duy Nguyen
On Wed, Jan 17, 2018 at 4:42 AM, Brandon Williams  wrote:
> On 01/07, Thomas Gummerer wrote:
>> read_index_from() takes a path argument for the location of the index
>> file.  For reading the shared index in split index mode however it just
>> ignores that path argument, and reads it from the gitdir of the current
>> repository.
>>
>> This works as long as an index in the_repository is read.  Once that
>> changes, such as when we read the index of a submodule, or of a
>> different working tree than the current one, the gitdir of
>> the_repository will no longer contain the appropriate shared index,
>> and git will fail to read it.
>>
>> For example t3007-ls-files-recurse-submodules.sh was broken with
>> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
>> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
>> broken in a similar manner, probably by introducing struct repository
>> there, although I didn't track down the exact commit for that.
>>
>> be489d02d2 ("revision.c: --indexed-objects add objects from all
>> worktrees", 2017-08-23) breaks with split index mode in a similar
>> manner, not erroring out when it can't read the index, but instead
>> carrying on with pruning, without taking the index of the worktree into
>> account.
>>
>> Fix this by passing an additional gitdir parameter to read_index_from,
>> to indicate where it should look for and read the shared index from.
>>
>> read_cache_from() defaults to using the gitdir of the_repository.  As it
>> is mostly a convenience macro, having to pass get_git_dir() for every
>> call seems overkill, and if necessary users can have more control by
>> using read_index_from().
>
> I'm not saying we need to change this again but I got to thinking about
> what the root cause for this bug is and I think it's a design flaw in
> how split index is implemented.  IIUC Split index is an index extension
> that can be enabled to limit the size of the index file that is written
> when making changes to the index.  It breaks the index into two pieces,
> index (which contains only changes) and sharedindex.X (which
> contains unchanged information) where 'X' is a value found in the
> index file.  If we don't do anything fancy then these two files live
> next to one another in a repository's git directory at $GIT_DIR/index
> and $GIT_DIR/sharedindex.X.  This seems to work all well and fine
> except that this isn't always the case and the read_index_from function
> takes this into account by enabling a caller to specify a path to where
> the index file is located.  We can do this by specifying the index file
> we want to use by setting GIT_INDEX_FILE.
>
> Being able to specify an index file via the environment is a feature
> that has existed for a very long time (one that I personally think makes
> things difficult because of things like additions like the sharedindex)
> and I don't think was taken into account when introducing the split
> index extension.

It was (partly because I did use GIT_INDEX_FILE on occasions).

> In this case if i were to specify a location of an
> index file in my home directory '~/index' and be using the split index
> feature then the corresponding sharedindex file would live in my
> repository's git directory '~/project/.git/sharedindex.X'.  So the
> sharedindex file is always located relative to the project's git
> directory and not the index file itself, which is kind of confusing.
> Maybe a better design would be to have the sharedindex file located
> relative to the index file.

That adds more problems. Now when you move the index file around you
have to move the shared index file too (think about atomic rename
which we use in plenty of places, we can't achieve that by moving two
files). A new dependency to $GIT_DIR is not that confusing to me, the
index file is useless anyway if you don't have access to
$GIT_DIR/objects. There was always the option to _not_ split the index
when $GIT_INDEX_FILE is specified, I think I did consider that but I
dropped it because we'd lose the performance gain by splitting.

> Anyway, some food for thought.
>
> --
> Brandon Williams



-- 
Duy


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-16 Thread Brandon Williams
On 01/07, Thomas Gummerer wrote:
> read_index_from() takes a path argument for the location of the index
> file.  For reading the shared index in split index mode however it just
> ignores that path argument, and reads it from the gitdir of the current
> repository.
> 
> This works as long as an index in the_repository is read.  Once that
> changes, such as when we read the index of a submodule, or of a
> different working tree than the current one, the gitdir of
> the_repository will no longer contain the appropriate shared index,
> and git will fail to read it.
> 
> For example t3007-ls-files-recurse-submodules.sh was broken with
> GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
> object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
> broken in a similar manner, probably by introducing struct repository
> there, although I didn't track down the exact commit for that.
> 
> be489d02d2 ("revision.c: --indexed-objects add objects from all
> worktrees", 2017-08-23) breaks with split index mode in a similar
> manner, not erroring out when it can't read the index, but instead
> carrying on with pruning, without taking the index of the worktree into
> account.
> 
> Fix this by passing an additional gitdir parameter to read_index_from,
> to indicate where it should look for and read the shared index from.
> 
> read_cache_from() defaults to using the gitdir of the_repository.  As it
> is mostly a convenience macro, having to pass get_git_dir() for every
> call seems overkill, and if necessary users can have more control by
> using read_index_from().

I'm not saying we need to change this again but I got to thinking about
what the root cause for this bug is and I think it's a design flaw in
how split index is implemented.  IIUC Split index is an index extension
that can be enabled to limit the size of the index file that is written
when making changes to the index.  It breaks the index into two pieces,
index (which contains only changes) and sharedindex.X (which
contains unchanged information) where 'X' is a value found in the
index file.  If we don't do anything fancy then these two files live
next to one another in a repository's git directory at $GIT_DIR/index
and $GIT_DIR/sharedindex.X.  This seems to work all well and fine
except that this isn't always the case and the read_index_from function
takes this into account by enabling a caller to specify a path to where
the index file is located.  We can do this by specifying the index file
we want to use by setting GIT_INDEX_FILE.

Being able to specify an index file via the environment is a feature
that has existed for a very long time (one that I personally think makes
things difficult because of things like additions like the sharedindex)
and I don't think was taken into account when introducing the split
index extension.  In this case if i were to specify a location of an
index file in my home directory '~/index' and be using the split index
feature then the corresponding sharedindex file would live in my
repository's git directory '~/project/.git/sharedindex.X'.  So the
sharedindex file is always located relative to the project's git
directory and not the index file itself, which is kind of confusing.
Maybe a better design would be to have the sharedindex file located
relative to the index file.

Anyway, some food for thought.

-- 
Brandon Williams


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-13 Thread Thomas Gummerer
On 01/08, Thomas Gummerer wrote:
> On 01/08, Duy Nguyen wrote:
> > On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer  
> > wrote:
> > > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, 
> > > const char *path)
> > > split_index->base = xcalloc(1, 
> > > sizeof(*split_index->base));
> > >
> > > base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > > -   base_path = git_path("sharedindex.%s", base_sha1_hex);
> > > +   base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
> > 
> > Personally I prefer the repo_git_path() from v2 (sorry I was away and
> > could not comment anything).
> 
> It felt slightly nicer to me as well, but it threw up a bunch of
> questions about how worktrees will fit together with struct
> repository.  As I don't feel very strongly about this either way I
> decided to go with what Brandon suggested as an alternative, which
> allows us to defer that decision.  I'd be happy to revert this to the
> way I had it in v2, but I don't want to drag the discussion on too
> long either, as this series does fix some real regressions.
> 
> > The thing is, git_path() and friends
> > could do some path translation underneath to support multiple
> > worktrees. Think of the given path here as a "virtual path" that may
> > be translated to something else, not exactly  + "/" +
> > "sharedindex.%s". But in practice, we're not breaking the relationship
> > between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
> > manual path transformation here is fine.
> >
> > What about the other git_path() in this file? With patch applied I still get
> > 
> > > ~/w/git/temp $ git grep git_path read-cache.c
> > read-cache.c:   shared_index_path = git_path("%s", de->d_name);
> > read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XX"));
> > read-cache.c: git_path("sharedindex.%s",
> > sha1_to_hex(si->base->sha1)));
> > read-cache.c:   const char *shared_index = 
> > git_path("sharedindex.%s",
> 
> Good point, I hadn't looked at these, I only looked at the current
> test failures.  I'm going to be away for the rest of the week, but
> I'll have a look at them when I come back.
> 
> > I suppose submodule has not triggered any of these code paths yet. Not
> > sure if we should deal with them now or wait until later.

Having had a look at these now, they are all in the write_locked_index
codepath.  We should probably have something like
'repo_write_locked_index()' for this.  But that probably requires a
bit more work/discussion to see what this should look like.  I'd
rather keep this patch series focused on the current breakages, and
deal with that in a separate patch series.

While looking at this, I did find another breakage in the split index
code, which I'll send as 4/3.

> > Perhaps if we add a "struct repository *" pointer inside index_state,
> > we could retrieve back the_repository (or others) and call
> > repo_git_path() everywhere without changing index api too much. I
> > don't know. I like the  'struct repository' concept but couldn't
> > follow its development so I don't if this is what it should become.
> 
> Interesting.  I didn't follow the development of  struct repository
> too closely either, so I'm not sure.  Brandon might have more of an
> opinion on that? :)
> 
> > -- 
> > Duy


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-08 Thread Duy Nguyen
On Tue, Jan 9, 2018 at 6:38 AM, Brandon Williams  wrote:
> On 01/08, Duy Nguyen wrote:
>> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer  wrote:
>> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, 
>> > const char *path)
>> > split_index->base = xcalloc(1, sizeof(*split_index->base));
>> >
>> > base_sha1_hex = sha1_to_hex(split_index->base_sha1);
>> > -   base_path = git_path("sharedindex.%s", base_sha1_hex);
>> > +   base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
>>
>> Personally I prefer the repo_git_path() from v2 (sorry I was away and
>> could not comment anything). The thing is, git_path() and friends
>> could do some path translation underneath to support multiple
>> worktrees. Think of the given path here as a "virtual path" that may
>> be translated to something else, not exactly  + "/" +
>> "sharedindex.%s". But in practice, we're not breaking the relationship
>> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
>> manual path transformation here is fine.
>
> My biggest complaint about v2 is that we still don't quite know the best
> way to integrate worktrees and struct repository yet so I was very
> reluctant to start having them interact in the way v2 was using them
> together.

This will be on my todo list (after I have finished with
nd/worktree-move which has been on 'pu' for too long). I'm ok with v3
then.

> I'm very much in favor of this version (v3) as each worktree
> can explicitly provide their gitdir to be used to determine where to
> read the shared index file without having to replicate a struct
> repository for each.

Isn't that the end goal though (I vaguely recall this discussion when
'struct repository' was an idea), that we will pass struct repository
around [1] rather than relying on global objects.

[1] or in this case struct worktree-something because the index
belongs more to a worktree than the object/refs database.

>> What about the other git_path() in this file? With patch applied I still get
>>
>> > ~/w/git/temp $ git grep git_path read-cache.c
>> read-cache.c:   shared_index_path = git_path("%s", de->d_name);
>> read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XX"));
>> read-cache.c: git_path("sharedindex.%s",
>> sha1_to_hex(si->base->sha1)));
>> read-cache.c:   const char *shared_index = git_path("sharedindex.%s",
>>
>> I suppose submodule has not triggered any of these code paths yet. Not
>> sure if we should deal with them now or wait until later.
>>
>> Perhaps if we add a "struct repository *" pointer inside index_state,
>> we could retrieve back the_repository (or others) and call
>> repo_git_path() everywhere without changing index api too much. I
>> don't know. I like the  'struct repository' concept but couldn't
>> follow its development so I don't if this is what it should become.
>
> I'm not too keen on having an index_state struct contain a back pointer
> to a repository struct.  I do think that we may want to have worktree
> structs contain a back pointer to the struct repository they correspond
> to.  That way there is only one instance of the repository (and
> object-store once that gets integrated) yet multiple worktrees.

Yeah back pointer from worktree struct makes sense.
-- 
Duy


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-08 Thread Brandon Williams
On 01/08, Duy Nguyen wrote:
> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer  wrote:
> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, 
> > const char *path)
> > split_index->base = xcalloc(1, sizeof(*split_index->base));
> >
> > base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > -   base_path = git_path("sharedindex.%s", base_sha1_hex);
> > +   base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
> 
> Personally I prefer the repo_git_path() from v2 (sorry I was away and
> could not comment anything). The thing is, git_path() and friends
> could do some path translation underneath to support multiple
> worktrees. Think of the given path here as a "virtual path" that may
> be translated to something else, not exactly  + "/" +
> "sharedindex.%s". But in practice, we're not breaking the relationship
> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
> manual path transformation here is fine.

My biggest complaint about v2 is that we still don't quite know the best
way to integrate worktrees and struct repository yet so I was very
reluctant to start having them interact in the way v2 was using them
together.  I'm very much in favor of this version (v3) as each worktree
can explicitly provide their gitdir to be used to determine where to
read the shared index file without having to replicate a struct
repository for each.

> 
> What about the other git_path() in this file? With patch applied I still get
> 
> > ~/w/git/temp $ git grep git_path read-cache.c
> read-cache.c:   shared_index_path = git_path("%s", de->d_name);
> read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XX"));
> read-cache.c: git_path("sharedindex.%s",
> sha1_to_hex(si->base->sha1)));
> read-cache.c:   const char *shared_index = git_path("sharedindex.%s",
> 
> I suppose submodule has not triggered any of these code paths yet. Not
> sure if we should deal with them now or wait until later.
> 
> Perhaps if we add a "struct repository *" pointer inside index_state,
> we could retrieve back the_repository (or others) and call
> repo_git_path() everywhere without changing index api too much. I
> don't know. I like the  'struct repository' concept but couldn't
> follow its development so I don't if this is what it should become.

I'm not too keen on having an index_state struct contain a back pointer
to a repository struct.  I do think that we may want to have worktree
structs contain a back pointer to the struct repository they correspond
to.  That way there is only one instance of the repository (and
object-store once that gets integrated) yet multiple worktrees.

-- 
Brandon Williams


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-08 Thread Thomas Gummerer
On 01/08, Duy Nguyen wrote:
> On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer  wrote:
> > @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, 
> > const char *path)
> > split_index->base = xcalloc(1, sizeof(*split_index->base));
> >
> > base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> > -   base_path = git_path("sharedindex.%s", base_sha1_hex);
> > +   base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);
> 
> Personally I prefer the repo_git_path() from v2 (sorry I was away and
> could not comment anything).

It felt slightly nicer to me as well, but it threw up a bunch of
questions about how worktrees will fit together with struct
repository.  As I don't feel very strongly about this either way I
decided to go with what Brandon suggested as an alternative, which
allows us to defer that decision.  I'd be happy to revert this to the
way I had it in v2, but I don't want to drag the discussion on too
long either, as this series does fix some real regressions.

> The thing is, git_path() and friends
> could do some path translation underneath to support multiple
> worktrees. Think of the given path here as a "virtual path" that may
> be translated to something else, not exactly  + "/" +
> "sharedindex.%s". But in practice, we're not breaking the relationship
> between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
> manual path transformation here is fine.
>
> What about the other git_path() in this file? With patch applied I still get
> 
> > ~/w/git/temp $ git grep git_path read-cache.c
> read-cache.c:   shared_index_path = git_path("%s", de->d_name);
> read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XX"));
> read-cache.c: git_path("sharedindex.%s",
> sha1_to_hex(si->base->sha1)));
> read-cache.c:   const char *shared_index = git_path("sharedindex.%s",

Good point, I hadn't looked at these, I only looked at the current
test failures.  I'm going to be away for the rest of the week, but
I'll have a look at them when I come back.

> I suppose submodule has not triggered any of these code paths yet. Not
> sure if we should deal with them now or wait until later.
> 
> Perhaps if we add a "struct repository *" pointer inside index_state,
> we could retrieve back the_repository (or others) and call
> repo_git_path() everywhere without changing index api too much. I
> don't know. I like the  'struct repository' concept but couldn't
> follow its development so I don't if this is what it should become.

Interesting.  I didn't follow the development of  struct repository
too closely either, so I'm not sure.  Brandon might have more of an
opinion on that? :)

> -- 
> Duy


Re: [PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-08 Thread Duy Nguyen
On Mon, Jan 8, 2018 at 5:30 AM, Thomas Gummerer  wrote:
> @@ -1896,16 +1895,17 @@ int read_index_from(struct index_state *istate, const 
> char *path)
> split_index->base = xcalloc(1, sizeof(*split_index->base));
>
> base_sha1_hex = sha1_to_hex(split_index->base_sha1);
> -   base_path = git_path("sharedindex.%s", base_sha1_hex);
> +   base_path = xstrfmt("%s/sharedindex.%s", gitdir, base_sha1_hex);

Personally I prefer the repo_git_path() from v2 (sorry I was away and
could not comment anything). The thing is, git_path() and friends
could do some path translation underneath to support multiple
worktrees. Think of the given path here as a "virtual path" that may
be translated to something else, not exactly  + "/" +
"sharedindex.%s". But in practice, we're not breaking the relationship
between $GIT_DIR/index and $GIT_DIR/sharedindex.* any time soon, doing
manual path transformation here is fine.

What about the other git_path() in this file? With patch applied I still get

> ~/w/git/temp $ git grep git_path read-cache.c
read-cache.c:   shared_index_path = git_path("%s", de->d_name);
read-cache.c:   temp = mks_tempfile(git_path("sharedindex_XX"));
read-cache.c: git_path("sharedindex.%s",
sha1_to_hex(si->base->sha1)));
read-cache.c:   const char *shared_index = git_path("sharedindex.%s",

I suppose submodule has not triggered any of these code paths yet. Not
sure if we should deal with them now or wait until later.

Perhaps if we add a "struct repository *" pointer inside index_state,
we could retrieve back the_repository (or others) and call
repo_git_path() everywhere without changing index api too much. I
don't know. I like the  'struct repository' concept but couldn't
follow its development so I don't if this is what it should become.
-- 
Duy


[PATCH v3 1/3] read-cache: fix reading the shared index for other repos

2018-01-07 Thread Thomas Gummerer
read_index_from() takes a path argument for the location of the index
file.  For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.

This works as long as an index in the_repository is read.  Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.

Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.

read_cache_from() defaults to using the gitdir of the_repository.  As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().

Helped-by: Brandon Williams 
Signed-off-by: Thomas Gummerer 
---
 cache-tree.c |  2 +-
 cache.h  |  5 +++--
 read-cache.c | 23 +--
 repository.c |  2 +-
 revision.c   |  3 ++-
 5 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index e03e72c34a..0dd6292a94 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -608,7 +608,7 @@ int write_index_as_tree(unsigned char *sha1, struct 
index_state *index_state, co
 
hold_lock_file_for_update(_file, index_path, LOCK_DIE_ON_ERROR);
 
-   entries = read_index_from(index_state, index_path);
+   entries = read_index_from(index_state, index_path, get_git_dir());
if (entries < 0) {
ret = WRITE_TREE_UNREADABLE_INDEX;
goto out;
diff --git a/cache.h b/cache.h
index d8b975a571..d5a7d1d7f1 100644
--- a/cache.h
+++ b/cache.h
@@ -371,7 +371,7 @@ extern void free_name_hash(struct index_state *istate);
 #define active_cache_tree (the_index.cache_tree)
 
 #define read_cache() read_index(_index)
-#define read_cache_from(path) read_index_from(_index, (path))
+#define read_cache_from(path) read_index_from(_index, (path), 
(get_git_dir()))
 #define read_cache_preload(pathspec) read_index_preload(_index, (pathspec))
 #define is_cache_unborn() is_index_unborn(_index)
 #define read_cache_unmerged() read_index_unmerged(_index)
@@ -616,7 +616,8 @@ extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec 
*pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
 int must_exist); /* for testting only! */
-extern int read_index_from(struct index_state *, const char *path);
+extern int read_index_from(struct index_state *, const char *path,
+  const char *gitdir);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 
diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b9..3f5b4afa36 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1603,7 +1603,7 @@ int hold_locked_index(struct lock_file *lk, int 
lock_flags)
 
 int read_index(struct index_state *istate)
 {
-   return read_index_from(istate, get_index_file());
+   return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
 static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry 
*ondisk,
@@ -1863,20 +1863,19 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
  * This way, shared index can be removed if they have not been used
  * for some time.
  */
-static void freshen_shared_index(char *base_sha1_hex, int warn)
+static void freshen_shared_index(const char *shared_index, int warn)
 {
-   char *shared_index = git_pathdup("sharedindex.%s", base_sha1_hex);
if (!check_and_freshen_file(shared_index, 1) && warn)
warning("could not freshen shared index '%s'", shared_index);
-   free(shared_index);
 }
 
-int read_index_from(struct index_state *istate, const char *path)
+int read_index_from(struct index_state *istate, const char *path,
+   const char *gitdir)
 {
struct split_index *split_index;
int ret;
char *base_sha1_hex;
-   const char *base_path;
+   char *base_path;
 
/* istate->initialized covers both .git/index and .git/sharedindex.xxx 
*/
if