Re: [PATCH] git: add --no-optional-locks option

2017-09-27 Thread Jeff King
On Wed, Sep 27, 2017 at 07:20:05PM +0530, Kaartic Sivaraam wrote:

> On Wed, 2017-09-27 at 02:40 -0400, Jeff King wrote:
> > On Sun, Sep 24, 2017 at 01:08:41PM +0530, Kaartic Sivaraam wrote:
> > 
> > > 
> > > So, if I get that correctly "git status --no-optional-locks" is a way to 
> > > get
> > > the "current" status without updating the on disk index file?
> > 
> > It's actually "git --no-optional-locks status", since it's a git-wide
> > option (it's just that "status" is the only one who respects it for
> > now).
> 
> Thanks for correcting the command. Now let me ask my (corrected)
> question again! So, if I get that correctly "git --no-optional-
> locks status" is a way to get the "current" status *without updating*
> the on disk index file?

Yes.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-27 Thread Kaartic Sivaraam
On Wed, 2017-09-27 at 02:40 -0400, Jeff King wrote:
> On Sun, Sep 24, 2017 at 01:08:41PM +0530, Kaartic Sivaraam wrote:
> 
> > 
> > So, if I get that correctly "git status --no-optional-locks" is a way to get
> > the "current" status without updating the on disk index file?
> 
> It's actually "git --no-optional-locks status", since it's a git-wide
> option (it's just that "status" is the only one who respects it for
> now).

Thanks for correcting the command. Now let me ask my (corrected)
question again! So, if I get that correctly "git --no-optional-
locks status" is a way to get the "current" status *without updating*
the on disk index file?

---
Kaartic


Re: [PATCH] git: add --no-optional-locks option

2017-09-27 Thread Jeff King
On Mon, Sep 25, 2017 at 11:51:31AM -0700, Stefan Beller wrote:

> > diff --git a/read-cache.c b/read-cache.c
> > index 65f4fe8375..fc1ba122a3 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state 
> > *istate,
> >
> >  int hold_locked_index(struct lock_file *lk, int lock_flags)
> >  {
> > -   return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
> > +   return hold_lock_file_for_update_timeout(lk, get_index_file(),
> > +lock_flags, 500);
> >  }
> >
> >  int read_index(struct index_state *istate)
> >
> > though I think there are a few sites which manually call
> > hold_lock_file_for_update() on the index that would need similar
> > treatment.
> 
> uh, too bad. The patch above looks really promising, though. :)

There are probably only a handful of other callers, and they'd just need
to swap out s/update/&_timeout/. So it really is pretty trivial.

> > I suspect it would work OK in practice, unless your index is so big that
> > 500ms isn't enough. The user may also see minor stalls when there's lock
> > contention. I'm not sure how annoying that would be.
> 
> There is only one way to find out. Though we don't want to volunteer
> all users into this experiment, I'd presume.

Yes. One of the nice things about the optional-locks approach is that it
only affects callers who specify the option. And the general idea has
gotten a year of testing in Visual Studio, which makes me feel good
about it.

> Regarding larger indexes, I wonder if we can adapt the 500ms
> to the repo size. At first I thought the abbreviation length could be
> a good proxy to determine the maximum waiting time, but now I am
> not so sure any more.

I think madness that way lies.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-27 Thread Jeff King
On Sun, Sep 24, 2017 at 01:08:41PM +0530, Kaartic Sivaraam wrote:

> On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> > Some tools like IDEs or fancy editors may periodically run
> > commands like "git status" in the background to keep track
> > of the state of the repository.
> 
> I might be missing something, shouldn't the IDEs be encouraged to use
> libgit2 instead? I thought it was meant for these use cases.

GitHub Desktop, at least, has actually moved _away_ from libgit2. I
think there were a number of reasons. Some match what Dscho said
regarding Visual Studio. But I think a main one is just that libgit2
doesn't quite match regular git for feature parity or pace of
development. So you're stuck shelling out to regular Git half the time
anyway, and now you get to handle dependencies on _two_ systems. :)

> Note: I assume getting the status through libgit2 doesn't create an
> index.lock file.

I don't know if that's the case or not.

> > This patch implements the option 3.
> 
> So, if I get that correctly "git status --no-optional-locks" is a way to get
> the "current" status without updating the on disk index file?

It's actually "git --no-optional-locks status", since it's a git-wide
option (it's just that "status" is the only one who respects it for
now).

> > +`GIT_OPTIONAL_LOCKS`::
> > +   If set to `0`, Git will avoid performing any operations which
> > +   require taking a lock and which are not required to complete the
> > +   requested operation.
> 
> The above sentence seems to be a little confusing. How about,
> 
>If set to `0`, Git will complete the requested operation without
>performing the optional sub-operations that require taking a lock.

Yeah, my original is pretty clunky. I ended up with:

  If set to `0`, Git will complete any requested operation without
  performing any optional sub-operations that require taking a lock.

which swaps out "the" for "any".

Thanks.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-26 Thread Jeff Hostetler



On 9/25/2017 12:17 PM, Johannes Schindelin wrote:

Hi Kaartic,

On Sun, 24 Sep 2017, Kaartic Sivaraam wrote:


On Thursday 21 September 2017 10:02 AM, Jeff King wrote:

Some tools like IDEs or fancy editors may periodically run commands
like "git status" in the background to keep track of the state of the
repository.


I might be missing something, shouldn't the IDEs be encouraged to use
libgit2 instead? I thought it was meant for these use cases.


There are pros and cons. Visual Studio moved away from libgit2 e.g. to
support SSH (back then, libgit2 did not support spawning processes, I have
no idea whether that changed in the meantime).


There were other issues besides feature parity.  The big one for VS
was that it moved the git computations into a separate process and
address space.  You can't easily read/modify/write a 500MB .git/index
file into memory (with however many copies of the data that that
creates) in a 32-bit process.



Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Stefan Beller
>> There might be another option to cope with the situation:
>>
>>  4. Teach all commands to spinlock / busywait shortly for important
>>  locks instead of giving up. In that case git-status rewriting
>>  the index ought to be fine?
>
> We do have all the infrastructure in place to do a reasonable busywait
> with backoff. I think the patch is roughly just:
>
> diff --git a/read-cache.c b/read-cache.c
> index 65f4fe8375..fc1ba122a3 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state 
> *istate,
>
>  int hold_locked_index(struct lock_file *lk, int lock_flags)
>  {
> -   return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
> +   return hold_lock_file_for_update_timeout(lk, get_index_file(),
> +lock_flags, 500);
>  }
>
>  int read_index(struct index_state *istate)
>
> though I think there are a few sites which manually call
> hold_lock_file_for_update() on the index that would need similar
> treatment.

uh, too bad. The patch above looks really promising, though. :)

>
> I suspect it would work OK in practice, unless your index is so big that
> 500ms isn't enough. The user may also see minor stalls when there's lock
> contention. I'm not sure how annoying that would be.

There is only one way to find out. Though we don't want to volunteer
all users into this experiment, I'd presume.

Regarding larger indexes, I wonder if we can adapt the 500ms
to the repo size. At first I thought the abbreviation length could be
a good proxy to determine the maximum waiting time, but now I am
not so sure any more.

Stefan


Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Jeff King
On Mon, Sep 25, 2017 at 06:10:12PM +0200, Johannes Schindelin wrote:

> On Thu, 21 Sep 2017, Jeff King wrote:
> 
> > Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> > to lock the index and update it, 2016-08-12). Folks working on GitHub
> > Desktop complained to me that it's only available on Windows. :)
> 
> Okay, so this is trying to help me by upstreaming a patch from Git for
> Windows?
> 
> If so: thanks!

Yes, in a round-about way. :)

> The changes, in particular the different semantics, will guarantee that I
> have to work on the consumer's side here, though, leaving me even less
> time for the Git mailing list, so I will need a lot more help with
> upstreaming patches.

I think you can get away with doing nothing for the time being. The two
patches can co-exist in the GfW codebase[1], and people using the
existing option can switch over at their leisure. Eventually you may
decide to revert 67e5ce7f63.

-Peff

[1] Modulo some trivial conflict resolution when the two are merged.


Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Johannes Schindelin
Hi Kaartic,

On Sun, 24 Sep 2017, Kaartic Sivaraam wrote:

> On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> > Some tools like IDEs or fancy editors may periodically run commands
> > like "git status" in the background to keep track of the state of the
> > repository.
> 
> I might be missing something, shouldn't the IDEs be encouraged to use
> libgit2 instead? I thought it was meant for these use cases.

There are pros and cons. Visual Studio moved away from libgit2 e.g. to
support SSH (back then, libgit2 did not support spawning processes, I have
no idea whether that changed in the meantime).

Ciao,
Johannes


Re: [PATCH] git: add --no-optional-locks option

2017-09-25 Thread Johannes Schindelin
Hi Peff,

On Thu, 21 Sep 2017, Jeff King wrote:

> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)

Okay, so this is trying to help me by upstreaming a patch from Git for
Windows?

If so: thanks!

The changes, in particular the different semantics, will guarantee that I
have to work on the consumer's side here, though, leaving me even less
time for the Git mailing list, so I will need a lot more help with
upstreaming patches.

Ciao,
Dscho


Re: [PATCH] git: add --no-optional-locks option

2017-09-24 Thread Kaartic Sivaraam
On Thursday 21 September 2017 10:02 AM, Jeff King wrote:
> Some tools like IDEs or fancy editors may periodically run
> commands like "git status" in the background to keep track
> of the state of the repository.

I might be missing something, shouldn't the IDEs be encouraged to use
libgit2 instead? I thought it was meant for these use cases.

Note: I assume getting the status through libgit2 doesn't create an 
index.lock file.

>3. Ask "status" not to lock or write the index.
> 
>   This is easy to implement. The big downside is that any
>   work done in refreshing the index for such a call is
>   lost when the process exits. So a background process
>   may end up re-hashing a changed file multiple times
>   until the user runs a command that does an index
>   refresh themselves.
> 
> This patch implements the option 3.

So, if I get that correctly "git status --no-optional-locks" is a way
to get the "current" status without updating the on disk index file?

> +`GIT_OPTIONAL_LOCKS`::
> + If set to `0`, Git will avoid performing any operations which
> + require taking a lock and which are not required to complete the
> + requested operation.

The above sentence seems to be a little hard to interpret. How about,

If set to `0`, Git will complete the requested operation without
performing the optional sub-operations that require taking a lock.


---
Kaartic


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Jeff King
On Fri, Sep 22, 2017 at 02:41:06PM -0700, Stefan Beller wrote:

> > Ah, thanks. I _thought_ we could already do that but when I went looking
> > for the standard --recursive option I couldn't find it.
> 
> Thanks for checking for submodules there.
> 
> I personally prefer --recurse-submodules despite the longer name,
> because a plain recursive option can mean anything in a sufficiently
> complex program such as Git (recurse into the tree (c.f. ls-tree), or
> for the algorithm used (c.f. merge, diff) or yet another dimension
> I did not think of).

Yeah, I agree that's a better name (mentioning --recursive is probably
just showing my general ignorance of submodules).

> > So yes, I would think we would want this option to apply recursively in
> > that case, even when we cross repository boundaries.
> 
> Regarding the actual patch which is heavily aimed at coping with IDEs
> despite the command line being used, I wonder how many IDEs pass
> --ignore-submodules and recurse themselves (if needed). Reason for
> my suspicion is [1] which does pay attention to submodules:
> 
> >Our application calls status including the following flags:
> >--porcelain=v2 --ignored --untracked-files=all --ignore-submodules=none
> 
> [1] 
> https://public-inbox.org/git/2bbb1d0f-ae06-1878-d185-112bd51f7...@gmail.com/

Yeah, I think that's probably Visual Studio (which is what Johannes
presumably wrote the original patch for). GitHub Desktop does not
currently pass it, though perhaps should consider doing so (though of
course the user could tweak their config, too).

> There might be another option to cope with the situation:
> 
>  4. Teach all commands to spinlock / busywait shortly for important
>  locks instead of giving up. In that case git-status rewriting
>  the index ought to be fine?

We do have all the infrastructure in place to do a reasonable busywait
with backoff. I think the patch is roughly just:

diff --git a/read-cache.c b/read-cache.c
index 65f4fe8375..fc1ba122a3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1563,7 +1563,8 @@ static int read_index_extension(struct index_state 
*istate,
 
 int hold_locked_index(struct lock_file *lk, int lock_flags)
 {
-   return hold_lock_file_for_update(lk, get_index_file(), lock_flags);
+   return hold_lock_file_for_update_timeout(lk, get_index_file(),
+lock_flags, 500);
 }
 
 int read_index(struct index_state *istate)

though I think there are a few sites which manually call
hold_lock_file_for_update() on the index that would need similar
treatment.

I suspect it would work OK in practice, unless your index is so big that
500ms isn't enough. The user may also see minor stalls when there's lock
contention. I'm not sure how annoying that would be.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Stefan Beller
On Fri, Sep 22, 2017 at 2:25 PM, Jeff King  wrote:
> On Fri, Sep 22, 2017 at 01:09:32PM -0700, Stefan Beller wrote:
>
>> On Thu, Sep 21, 2017 at 9:25 PM, Jeff King  wrote:
>>
>> >
>> > But imagine that "git status" learns to recurse into submodules and run
>> > "git status" inside them. Surely we would want the submodule repos to
>> > also avoid taking any unnecessary locks?
>>
>> You can teach Git to recurse into submodules already at home,
>> just 'git config status.submoduleSummary none'. ;)
>>
>> It occurs to me that the config name is badly choosen, as it stores
>> an argument for git status --ignore-submodules[=mode]
>
> Ah, thanks. I _thought_ we could already do that but when I went looking
> for the standard --recursive option I couldn't find it.

Thanks for checking for submodules there.

I personally prefer --recurse-submodules despite the longer name,
because a plain recursive option can mean anything in a sufficiently
complex program such as Git (recurse into the tree (c.f. ls-tree), or
for the algorithm used (c.f. merge, diff) or yet another dimension
I did not think of).

> So yes, I would think we would want this option to apply recursively in
> that case, even when we cross repository boundaries.

Regarding the actual patch which is heavily aimed at coping with IDEs
despite the command line being used, I wonder how many IDEs pass
--ignore-submodules and recurse themselves (if needed). Reason for
my suspicion is [1] which does pay attention to submodules:

>Our application calls status including the following flags:
>--porcelain=v2 --ignored --untracked-files=all --ignore-submodules=none

[1] https://public-inbox.org/git/2bbb1d0f-ae06-1878-d185-112bd51f7...@gmail.com/

There might be another option to cope with the situation:

 4. Teach all commands to spinlock / busywait shortly for important
 locks instead of giving up. In that case git-status rewriting
 the index ought to be fine?

Thanks,
Stefan


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Jeff King
On Fri, Sep 22, 2017 at 01:09:32PM -0700, Stefan Beller wrote:

> On Thu, Sep 21, 2017 at 9:25 PM, Jeff King  wrote:
> 
> >
> > But imagine that "git status" learns to recurse into submodules and run
> > "git status" inside them. Surely we would want the submodule repos to
> > also avoid taking any unnecessary locks?
> 
> You can teach Git to recurse into submodules already at home,
> just 'git config status.submoduleSummary none'. ;)
> 
> It occurs to me that the config name is badly choosen, as it stores
> an argument for git status --ignore-submodules[=mode]

Ah, thanks. I _thought_ we could already do that but when I went looking
for the standard --recursive option I couldn't find it.

So yes, I would think we would want this option to apply recursively in
that case, even when we cross repository boundaries.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Stefan Beller
On Thu, Sep 21, 2017 at 9:25 PM, Jeff King  wrote:

>
> But imagine that "git status" learns to recurse into submodules and run
> "git status" inside them. Surely we would want the submodule repos to
> also avoid taking any unnecessary locks?

You can teach Git to recurse into submodules already at home,
just 'git config status.submoduleSummary none'. ;)

It occurs to me that the config name is badly choosen, as it stores
an argument for git status --ignore-submodules[=mode]


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Jeff King
On Fri, Sep 22, 2017 at 01:42:10AM -0500, Daniel Santos wrote:

> > But taking the index lock may conflict with other operations
> > in the repository. Especially ones that the user is doing
> > themselves, which _aren't_ opportunistic. In other words,
> > "git status" knows how to back off when somebody else is
> > holding the lock, but other commands don't know that status
> > would be happy to drop the lock if somebody else wanted it.
> 
> Interestingly, this usually slaps me when performing an _interactive_
> rebase.  It occurred to me that if I'm performing an interaction
> operation, it doesn't seem unreasonable for git wait up to 125ms or so
> for the lock and then prompting the user to ask if they want to continue
> waiting for the lock.

Yes, lock timeouts would help with this situation, though not eliminate
it entirely (and we've been adding some lock timeouts in a few places
for related situations).

We generally avoid prompting, especially when a command can just be
reissued. It sounds like "git rebase" gets into a funny state if it
cannot grab the index lock. That's something that should be fixed. Even
if it bails, you should be able to move forward with "rebase --continue"
or similar.

> That is not necessarily the case.  I don't actually know git on the
> inside, but I would ask you to consider a read-write lock and a hybrid
> of one and three.
> 
> I don't know what dotlocks are, but I'm certain that you can implement a
> rw lock using lock files and no other IPC, although it does increase the
> complexity.  The way this works is that `git status' acquires a read
> lock and does its thing.  If it has real changes, instead of discarding
> them it attempts to upgrade to a write lock.  If that fails, you throw
> it away, otherwise you write them and release.

By dotlocks, I just mean our strategy of creating O_EXCL "index.lock"
files.

You're right that it can be done using two such locks, and communicating
via the lockfile contents. So my "impossible" was overstating it. I do
stand by my contention that it's much more complex than the existing
scheme. :)

More importantly, though, it changes the locking contract completely
between old and new versions (and between other implementations).
There's probably only a small minority of users who might use two
implementations to simultaneously access the same repository, but it is
a use case we try to support. I think we'd have to require a
repository-version bump to tell programs to use the new scheme.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Jeff King
On Fri, Sep 22, 2017 at 07:22:28AM -0400, Jeff Hostetler wrote:

> > > I don't think we should pass this environment variable to remote
> > > repositories. It should be listed in local_repo_env[] in environment.c.
> > 
> > I'm not sure I agree. This is really about the context in which the
> > command is executing, not anything about the particular repository
> > you're operating on.
> > 
> > For fetch/push operations that touch a remote, I doubt it would matter
> > either way (and anyway, those often cross network boundaries that don't
> > propagate environment variables anyway).
> > 
> > But imagine that "git status" learns to recurse into submodules and run
> > "git status" inside them. Surely we would want the submodule repos to
> > also avoid taking any unnecessary locks?
> > 
> > -Peff
> > 
> 
> https://github.com/git-for-windows/git/commit/ff63b51c22389139a864eb2e565c6cdc5a30f061
> 
> https://github.com/git-for-windows/git/pull/1004/commits/45bad66192352481acbc826f11d90c8928b39a7a
> 
> We should compare this with what we did in Git for Windows last fall.
> I guess those commits didn't get pushed upstream.

Right. I think you missed the initial message in the thread that
explains how this is an expanded version of ff63b51c22. :)

I didn't know about the environment thing in 45bad66192, though[1]. That
makes me even more confident that this is the right approach.

-Peff

[1] Sorry for not doing my homework more carefully on the existing
solution.  GitHub Desktop ran into the same situation and pointed me
at ff63b51c22. I extrapolated the rest of it on my own. ;)


Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Jeff Hostetler



On 9/22/2017 12:25 AM, Jeff King wrote:

On Thu, Sep 21, 2017 at 08:25:50PM +0200, Johannes Sixt wrote:


+`GIT_OPTIONAL_LOCKS`::
+   If set to `0`, Git will avoid performing any operations which
+   require taking a lock and which are not required to complete the
+   requested operation. For example, this will prevent `git status`
+   from refreshing the index as a side effect. This is useful for
+   processes running in the background which do not want to cause
+   lock contention with other operations on the repository.
+   Defaults to `1`. >>

I don't think we should pass this environment variable to remote
repositories. It should be listed in local_repo_env[] in environment.c.


I'm not sure I agree. This is really about the context in which the
command is executing, not anything about the particular repository
you're operating on.

For fetch/push operations that touch a remote, I doubt it would matter
either way (and anyway, those often cross network boundaries that don't
propagate environment variables anyway).

But imagine that "git status" learns to recurse into submodules and run
"git status" inside them. Surely we would want the submodule repos to
also avoid taking any unnecessary locks?

-Peff



https://github.com/git-for-windows/git/commit/ff63b51c22389139a864eb2e565c6cdc5a30f061

https://github.com/git-for-windows/git/pull/1004/commits/45bad66192352481acbc826f11d90c8928b39a7a

We should compare this with what we did in Git for Windows last fall.
I guess those commits didn't get pushed upstream.

We added '--no-lock-index' to keep status from locking the index
during status and effectively being read-only.  This helped with
problems with Visual Studio similar to the ones being described
for KDevelop.

Jeff



Re: [PATCH] git: add --no-optional-locks option

2017-09-22 Thread Daniel Santos
On 09/20/2017 11:32 PM, Jeff King wrote:
> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)
>
> I expanded the scope a bit to let us give the same treatment to more
> commands in the long run.  I'd also be OK with just cherry-picking your
> patch to non-Windows Git if you don't find my reasoning below
> compelling. But I think we need _something_ like this, as the other
> solutions I could come up with don't seem very promising.
>
> -Peff
>
> -- >8 --
> Some tools like IDEs or fancy editors may periodically run
> commands like "git status" in the background to keep track
> of the state of the repository. Some of these commands may
> refresh the index and write out the result in an
> opportunistic way: if they can get the index lock, then they
> update the on-disk index with any updates they find. And if
> not, then their in-core refresh is lost and just has to be
> recomputed by the next caller.
>
> But taking the index lock may conflict with other operations
> in the repository. Especially ones that the user is doing
> themselves, which _aren't_ opportunistic. In other words,
> "git status" knows how to back off when somebody else is
> holding the lock, but other commands don't know that status
> would be happy to drop the lock if somebody else wanted it.

Interestingly, this usually slaps me when performing an _interactive_
rebase.  It occurred to me that if I'm performing an interaction
operation, it doesn't seem unreasonable for git wait up to 125ms or so
for the lock and then prompting the user to ask if they want to continue
waiting for the lock.

> There are a couple possible solutions:
>
>   1. Have some kind of "pseudo-lock" that allows other
>  commands to tell status that they want the lock.
>
>  This is likely to be complicated and error-prone to
>  implement (and maybe even impossible with just
>  dotlocks to work from, as it requires some
>  inter-process communication).
>
>   2. Avoid background runs of commands like "git status"
>  that want to do opportunistic updates, preferring
>  instead plumbing like diff-files, etc.
>
>  This is awkward for a couple of reasons. One is that
>  "status --porcelain" reports a lot more about the
>  repository state than is available from individual
>  plumbing commands. And two is that we actually _do_
>  want to see the refreshed index. We just don't want to
>  take a lock or write out the result. Whereas commands
>  like diff-files expect us to refresh the index
>  separately and write it to disk so that they can depend
>  on the result. But that write is exactly what we're
>  trying to avoid.
>
>   3. Ask "status" not to lock or write the index.
>
>  This is easy to implement. The big downside is that any
>  work done in refreshing the index for such a call is
>  lost when the process exits. So a background process
>  may end up re-hashing a changed file multiple times
>  until the user runs a command that does an index
>  refresh themselves.

That is not necessarily the case.  I don't actually know git on the
inside, but I would ask you to consider a read-write lock and a hybrid
of one and three.

I don't know what dotlocks are, but I'm certain that you can implement a
rw lock using lock files and no other IPC, although it does increase the
complexity.  The way this works is that `git status' acquires a read
lock and does its thing.  If it has real changes, instead of discarding
them it attempts to upgrade to a write lock.  If that fails, you throw
it away, otherwise you write them and release.

In order to implement rw locks with only lock files, "off the cuff" I
say you have a single "lock list" file that should never be deleted and
a "lock lock" file that is held in order to read or modify the list. 
The format of the lock list would have a pair of 32-bit wrapping
modification counts (or versions) at the top -- one for modifications to
the lock list its self and another for modifications to the underlying
data (i.e., the number of times a write lock has been acquired).  This
header is followed by entries something like this:

   

  'r' if waiting for a read lock
 'R' if actively reading
 'w' if waiting for write lock
 'W' if actively writing
The pid
If active, the version of the data at the time lock acquired or 
zero.
  Time began waiting or time lock acquired


An operation of 'r' or 'w' means that you are waiting and upper case
means that you are active.   is the version of the data at the
time the lock became active and writers increment it when they acquire a
lock.  You wait with file alteration notification on the lock list (if
there is any doubt based upon timestamp precision then you can examine
the lock list version).  When you want to read 

Re: [PATCH] git: add --no-optional-locks option

2017-09-21 Thread Jeff King
On Thu, Sep 21, 2017 at 08:25:50PM +0200, Johannes Sixt wrote:

> > +`GIT_OPTIONAL_LOCKS`::
> > +   If set to `0`, Git will avoid performing any operations which
> > +   require taking a lock and which are not required to complete the
> > +   requested operation. For example, this will prevent `git status`
> > +   from refreshing the index as a side effect. This is useful for
> > +   processes running in the background which do not want to cause
> > +   lock contention with other operations on the repository.
> > +   Defaults to `1`.
> 
> I don't think we should pass this environment variable to remote
> repositories. It should be listed in local_repo_env[] in environment.c.

I'm not sure I agree. This is really about the context in which the
command is executing, not anything about the particular repository
you're operating on.

For fetch/push operations that touch a remote, I doubt it would matter
either way (and anyway, those often cross network boundaries that don't
propagate environment variables anyway).

But imagine that "git status" learns to recurse into submodules and run
"git status" inside them. Surely we would want the submodule repos to
also avoid taking any unnecessary locks?

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-21 Thread Johannes Sixt

Am 21.09.2017 um 06:32 schrieb Jeff King:

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e5..8dd3ae05ae 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` 
git config
Add "icase" magic to all pathspec. This is equivalent to setting
the `GIT_ICASE_PATHSPECS` environment variable to `1`.
  
+--no-optional-locks::

+   Do not perform optional operations that require locks. This is
+   equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
+
  GIT COMMANDS
  
  
@@ -697,6 +701,15 @@ of clones and fetches.

which feed potentially-untrusted URLS to git commands.  See
linkgit:git-config[1] for more details.
  
+`GIT_OPTIONAL_LOCKS`::

+   If set to `0`, Git will avoid performing any operations which
+   require taking a lock and which are not required to complete the
+   requested operation. For example, this will prevent `git status`
+   from refreshing the index as a side effect. This is useful for
+   processes running in the background which do not want to cause
+   lock contention with other operations on the repository.
+   Defaults to `1`.


I don't think we should pass this environment variable to remote 
repositories. It should be listed in local_repo_env[] in environment.c.


-- Hannes


Re: [PATCH] git: add --no-optional-locks option

2017-09-20 Thread Junio C Hamano
Jeff King  writes:

> I admit that's just adding more hand-waving to the pile. But I don't
> think it really _hurts_ to leave that door open (aside from making the
> documentation a bit wishy-washy). And it helps because callers would
> pick up the new features automatically, without having to learn about a
> new option.

Oh, I do agree it is a good idea to leave that door open, so that
scripts that rely on today's --no-optional-locks option will not
have to be updated when a similar feature (like the "ref compaction"
example you mentioned) against which "let's disable when we are not
the primary process, in order to keep the interference to the
minimum" would be something we would want to say.  The option being
added here should cover these future needs.

> And I think that's really what this option is. It is less about the
> caller asking for some specific behavior, and more about them telling
> Git about the context in which it's executing so it can make intelligent
> decisions.

Yes, indeed.


Re: [PATCH] git: add --no-optional-locks option

2017-09-20 Thread Jeff King
On Thu, Sep 21, 2017 at 01:55:58PM +0900, Junio C Hamano wrote:

> The phrase 'optional lock' does not answer this question clearly,
> though: does it make sense to extend the coverage of this option in
> the future to things more than the "opportunistic update to the
> index file"?
> 
> If the answer is no, then having 'index' instead of 'lock' in the
> name of the option would make more sense (and 'opportunistic' over
> 'optional', too), because what the change is about is to allow other
> processes that are directly interacting with the user to update the
> index, and 'lock' being hindrance is merely an implementation
> detail.  The comment on the "test" in the log message mentions as if
> it were a short-coming that it does not check the lock but checks
> if the index is written, but I think that is testing what matters
> and preferable than testing "did we lock and then unlock it?"
> 
> On the other hand, if the answer is yes, then I am curious what
> other things this may extend to, and if these other things are also
> opportunistic optimizations.

I left it intentionally vague exactly because I thought we might want to
leave room for the answer to change to "yes" eventually.  For instance,
imagine that we had a ref storage format that required periodic
compaction, and readers might sometimes choose to compact in order to
save future readers from repeating some work they've done. If that
compaction means holding a lock even for a brief period, I think it
would fall under this option.

I admit that's just adding more hand-waving to the pile. But I don't
think it really _hurts_ to leave that door open (aside from making the
documentation a bit wishy-washy). And it helps because callers would
pick up the new features automatically, without having to learn about a
new option.

And I think that's really what this option is. It is less about the
caller asking for some specific behavior, and more about them telling
Git about the context in which it's executing so it can make intelligent
decisions.

And in that sense, something descriptive like --background-process
perhaps would be a better name. Except that I couldn't come up with a
name that isn't confusing (certainly --background-process implies to me
that Git would itself run in the background, which makes no sense here).

I also considered something like "--read-only" to tell Git that we
should avoid writing to the repository. But that's really not what this
does. It just avoids writes that may cause contention, not all writes.

I also considered using the word "opportunistic" in the option name, but
decided it was too long and hard to spell.

So there. I am open to a better name, but I could not come up with one.

> Thanks (and sorry for not being Johannes ;-).

You lack his rugged good looks, but your review was still welcome.

-Peff


Re: [PATCH] git: add --no-optional-locks option

2017-09-20 Thread Junio C Hamano
Jeff King  writes:

> Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
> to lock the index and update it, 2016-08-12). Folks working on GitHub
> Desktop complained to me that it's only available on Windows. :)
>
> I expanded the scope a bit to let us give the same treatment to more
> commands in the long run.  I'd also be OK with just cherry-picking your
> patch to non-Windows Git if you don't find my reasoning below
> compelling. But I think we need _something_ like this, as the other
> solutions I could come up with don't seem very promising.

The phrase 'optional lock' does not answer this question clearly,
though: does it make sense to extend the coverage of this option in
the future to things more than the "opportunistic update to the
index file"?

If the answer is no, then having 'index' instead of 'lock' in the
name of the option would make more sense (and 'opportunistic' over
'optional', too), because what the change is about is to allow other
processes that are directly interacting with the user to update the
index, and 'lock' being hindrance is merely an implementation
detail.  The comment on the "test" in the log message mentions as if
it were a short-coming that it does not check the lock but checks
if the index is written, but I think that is testing what matters
and preferable than testing "did we lock and then unlock it?"

On the other hand, if the answer is yes, then I am curious what
other things this may extend to, and if these other things are also
opportunistic optimizations.

Other than that, I think this change (including the part that this
is done globally and down to subprocesses as needed) makes sense.

Thanks (and sorry for not being Johannes ;-).




[PATCH] git: add --no-optional-locks option

2017-09-20 Thread Jeff King
Johannes, this is an adaptation of your 67e5ce7f63 (status: offer *not*
to lock the index and update it, 2016-08-12). Folks working on GitHub
Desktop complained to me that it's only available on Windows. :)

I expanded the scope a bit to let us give the same treatment to more
commands in the long run.  I'd also be OK with just cherry-picking your
patch to non-Windows Git if you don't find my reasoning below
compelling. But I think we need _something_ like this, as the other
solutions I could come up with don't seem very promising.

-Peff

-- >8 --
Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.

But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.

There are a couple possible solutions:

  1. Have some kind of "pseudo-lock" that allows other
 commands to tell status that they want the lock.

 This is likely to be complicated and error-prone to
 implement (and maybe even impossible with just
 dotlocks to work from, as it requires some
 inter-process communication).

  2. Avoid background runs of commands like "git status"
 that want to do opportunistic updates, preferring
 instead plumbing like diff-files, etc.

 This is awkward for a couple of reasons. One is that
 "status --porcelain" reports a lot more about the
 repository state than is available from individual
 plumbing commands. And two is that we actually _do_
 want to see the refreshed index. We just don't want to
 take a lock or write out the result. Whereas commands
 like diff-files expect us to refresh the index
 separately and write it to disk so that they can depend
 on the result. But that write is exactly what we're
 trying to avoid.

  3. Ask "status" not to lock or write the index.

 This is easy to implement. The big downside is that any
 work done in refreshing the index for such a call is
 lost when the process exits. So a background process
 may end up re-hashing a changed file multiple times
 until the user runs a command that does an index
 refresh themselves.

This patch implements the option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.

The reason there is two-fold:

  1. An environment variable is carried through to
 sub-processes. And whether an invocation is a
 background process or not should apply to the whole
 process tree. So you could do "git --no-optional-locks
 foo", and if "foo" is a script or alias that calls
 "status", you'll still get the effect.

  2. There may be other programs that want the same
 treatment.

 I've punted here on finding more callers to convert,
 since "status" is the obvious one to call as a repeated
 background job. But "git diff"'s opportunistic refresh
 of the index may be a good candidate.

The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:

  Note that the regression test added in this commit does
  not *really* verify that no index.lock file was written;
  that test is not possible in a portable way. Instead, we
  verify that .git/index is rewritten *only* when `git
  status` is run without `--no-optional-locks`.

Signed-off-by: Jeff King 
---
 Documentation/git.txt | 13 +
 builtin/commit.c  |  5 -
 cache.h   |  6 ++
 environment.c |  5 +
 git.c |  4 
 t/t7508-status.sh | 10 ++
 6 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e5..8dd3ae05ae 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -159,6 +159,10 @@ foo.bar= ...`) sets `foo.bar` to the empty string which ` 
git config
Add "icase" magic to all pathspec. This is equivalent to setting
the `GIT_ICASE_PATHSPECS` environment variable to `1`.
 
+--no-optional-locks::
+   Do not perform optional operations that require locks. This is
+   equivalent to setting the `GIT_OPTIONAL_LOCKS` to `0`.
+
 GIT COMMANDS
 
 
@@ -697,6