Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-28 Thread Michael Haggerty
On Thu, Jul 27, 2017 at 12:40 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> On Thu, Jul 27, 2017 at 10:19:48AM -0700, Junio C Hamano wrote:
>>
>>> Makes sense.  Makes me wonder why we need a separate .new file
>>> (instead of writing into the .lock instead), but that is a different
>>> issue.
>>
>> It comes from 42dfa7ece (commit_packed_refs(): use a staging file
>> separate from the lockfile, 2017-06-23). That commit explains that we
>> want to be able to put the new contents into service before we release
>> the lock. But it doesn't say why that's useful.
>
> By being able to hold the lock on packed-refs longer, I guess
> something like this becomes possible:
>
>  * hold the lock on packed-refs
>  * hold the lock on loose ref A, B, C, ...
>  * update packed-refs to include the freshest values of these refs
>  * start serving packed-refs without releasing the lock
>  * for X in A, B, C...: delete the loose ref X and unlock X
>  * unlock the packed-refs

Hmmm, I thought that I explained somewhere why this is useful, but I
can't find it now.

The code even after this patch series is

Prepare:
1. lock loose refs
Finish:
2. perform creates and updates of loose refs, releasing their locks as we go
3. perform deletes of loose refs
4. lock packed-refs file
5. repack_without_refs() (delete packed versions of refs to be deleted)
6. commit new packed-refs file (which also unlocks it)
7. release locks on loose refs that were deleted

This is problematic because after a loose reference is deleted, it
briefly (between steps 3 and 5) reveals any packed version of the
reference, which might even point at a commit that has been GCed. It
is even worse if the attempt to acquire the packed refs lock in step 4
fails, because then the repo could be left in that broken state.

Clearly we can avoid the second problem by acquiring the packed-refs
lock during the prepare phase. You might also attempt to fix the first
problem by deleting the references from the packed-refs file before we
delete the corresponding loose references:

Prepare:
1. lock loose refs
2. lock packed-refs file
Finish:
3. perform creates and updates of loose refs, releasing their locks as we go
4. repack_without_refs() (delete packed versions of refs to be deleted)
5. commit new packed-refs file (which also unlocks it)
6. perform deletes of loose refs
7. release locks on loose refs that were deleted

But now we have a different problem. pack-refs does the following:

A. lock packed-refs file
B. read loose refs
C. commit new packed-refs file (which also unlocks it)
Then, for each loose reference that should be pruned, delete it in a
transaction with REF_ISPRUNING, which means:
D. lock the loose reference
E. lock packed-refs file
F. check that the reference's value is the same as the value just
written to packed-refs
G. delete the loose ref file if it still exists
H. unlock the loose reference

If a reference deletion is running at the same time as a pack-refs,
you could have a race where steps A and B of a pack-refs run between
steps 5 and 6 of a reference deletion. The pack-refs would see the
loose reference and would pack it into the packed-refs file, leaving
the reference at its old value even though the user was told that the
reference was deleted.

If a new packed-refs file can be activated while retaining the lock on
the file, then a reference update could keep the packed-refs file
locked during step 6 of the second algorithm, which would block step A
of pack-refs from beginning.

I have the feeling that there might be other, more improbably races
lurking here, too.

The eternal problems with races in the files backend is a main reason
that I'm enthusiastic about the reftable proposal.

Michael


Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-27 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Jul 27, 2017 at 10:19:48AM -0700, Junio C Hamano wrote:
>
>> Makes sense.  Makes me wonder why we need a separate .new file
>> (instead of writing into the .lock instead), but that is a different
>> issue.
>
> It comes from 42dfa7ece (commit_packed_refs(): use a staging file
> separate from the lockfile, 2017-06-23). That commit explains that we
> want to be able to put the new contents into service before we release
> the lock. But it doesn't say why that's useful.

By being able to hold the lock on packed-refs longer, I guess
something like this becomes possible:

 * hold the lock on packed-refs
 * hold the lock on loose ref A, B, C, ...
 * update packed-refs to include the freshest values of these refs
 * start serving packed-refs without releasing the lock
 * for X in A, B, C...: delete the loose ref X and unlock X
 * unlock the packed-refs

Other people racing with the sequence to recreate a loose ref that
is even fresher than the resulting packed-refs file, while we still
hold the lock on packed-refs, is perfectly OK.

But we must make sure our packed-refs is visible to others before
starting to delete and unlock the loose refs.

Hmph, but that is not a sufficient explanation.  I am not seeing
anything bad to happen if we unlock the packed-refs before deleting
loose refs that we have locks on, so there must be something else
that needs "new packed-refs is made visible way before we unlock it".

> I recall from past discussions that this will help close some races,
> and e5cc7d7d2 (repack_without_refs(): don't lock or unlock the packed
> refs, 2017-07-01) alludes to this. I think the races in question have to
> do with holding the packed-refs lock while pruning the just-packed
> files, but I'm having trouble digging up specifics in the archive.
>
> -Peff


Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-27 Thread Jeff King
On Thu, Jul 27, 2017 at 10:19:48AM -0700, Junio C Hamano wrote:

> Michael Haggerty  writes:
> 
> > Change `commit_packed_refs()` to use `get_locked_file_path()` to find
> > the path of the file that it should overwrite. Since that path was
> > properly resolved when the lockfile was created, this restores the
> > pre-42dfa7ecef behavior.
> 
> Because when we take a lock hold_lock_file() eventually calls into
> lock_file() which by default takes the lock on the target of the
> symbolic link (which is the sensible default, which is triggered in
> this codepath), so this change to use the name of that file is all
> that is needed.
> 
> Makes sense.  Makes me wonder why we need a separate .new file
> (instead of writing into the .lock instead), but that is a different
> issue.

It comes from 42dfa7ece (commit_packed_refs(): use a staging file
separate from the lockfile, 2017-06-23). That commit explains that we
want to be able to put the new contents into service before we release
the lock. But it doesn't say why that's useful.

I recall from past discussions that this will help close some races,
and e5cc7d7d2 (repack_without_refs(): don't lock or unlock the packed
refs, 2017-07-01) alludes to this. I think the races in question have to
do with holding the packed-refs lock while pruning the just-packed
files, but I'm having trouble digging up specifics in the archive.

-Peff


Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-27 Thread Junio C Hamano
Michael Haggerty  writes:

> Change `commit_packed_refs()` to use `get_locked_file_path()` to find
> the path of the file that it should overwrite. Since that path was
> properly resolved when the lockfile was created, this restores the
> pre-42dfa7ecef behavior.

Because when we take a lock hold_lock_file() eventually calls into
lock_file() which by default takes the lock on the target of the
symbolic link (which is the sensible default, which is triggered in
this codepath), so this change to use the name of that file is all
that is needed.

Makes sense.  Makes me wonder why we need a separate .new file
(instead of writing into the .lock instead), but that is a different
issue.

Thanks.  I'll do the SYMLINKS thing while queuing.

> diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
> index 2bb4b25ed9..0d8a03e2a9 100755
> --- a/t/t3210-pack-refs.sh
> +++ b/t/t3210-pack-refs.sh
> @@ -238,4 +238,19 @@ test_expect_success 'retry acquiring packed-refs.lock' '
>   git -c core.packedrefstimeout=3000 pack-refs --all --prune
>  '
>  
> +test_expect_success 'pack symlinked packed-refs' '
> + # First make sure that symlinking works when reading:
> + git update-ref refs/heads/loosy refs/heads/master &&
> + git for-each-ref >all-refs-before &&
> + mv .git/packed-refs .git/my-deviant-packed-refs &&
> + ln -s my-deviant-packed-refs .git/packed-refs &&
> + git for-each-ref >all-refs-linked &&
> + test_cmp all-refs-before all-refs-linked &&
> + git pack-refs --all --prune &&
> + git for-each-ref >all-refs-packed &&
> + test_cmp all-refs-before all-refs-packed &&
> + test -h .git/packed-refs &&
> + test "$(readlink .git/packed-refs)" = "my-deviant-packed-refs"
> +'
> +
>  test_done


Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-27 Thread Michael Haggerty
Stefan, Jonathan: you're both right, the new test needs the SYMLINKS
prerequisite. Junio, would you mind adding that flag?

Thanks,
Michael


Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-26 Thread Jonathan Nieder
Hi,

Michael Haggerty wrote:

> One of the tricks that `contrib/workdir/git-new-workdir` plays is to
> making `packed-refs` in the new workdir a symlink to the `packed-refs`
> file in the original repository. Before
> 42dfa7ecef ("commit_packed_refs(): use a staging file separate from
> the lockfile", 2017-06-23), a lockfile was used as the staging file,
> and because the `LOCK_NO_DEREF` was not used, the pointed-to file was
> locked and modified.
>
> But after that commit, the staging file was created using a tempfile,
> with the end result that rewriting the `packed-refs` file in the
> workdir overwrote the symlink rather than the original `packed-refs`
> file.
>
> Change `commit_packed_refs()` to use `get_locked_file_path()` to find
> the path of the file that it should overwrite. Since that path was
> properly resolved when the lockfile was created, this restores the
> pre-42dfa7ecef behavior.
>
> Also add a test case to document this use case and prevent a
> regression like this from recurring.
>
> Signed-off-by: Michael Haggerty 

Reported-by: Dave Walker 

[...]
>  refs/packed-backend.c | 24 ++--
>  t/t3210-pack-refs.sh  | 15 +++
>  2 files changed, 33 insertions(+), 6 deletions(-)

The patch looks good, except for one nit marked below (*).  I'll apply
it locally and ask people to test it, probably tomorrow morning.

Reviewed-by: Jonathan Nieder 

Thanks for writing it.  Patch left unsnipped for reference.

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index a28befbfa3..59e7d1a509 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -610,19 +610,27 @@ int commit_packed_refs(struct ref_store *ref_store, 
> struct strbuf *err)
>   struct packed_ref_cache *packed_ref_cache =
>   get_packed_ref_cache(refs);
>   int ok;
> + int ret = -1;
>   struct strbuf sb = STRBUF_INIT;
>   FILE *out;
>   struct ref_iterator *iter;
> + char *packed_refs_path;
>  
>   if (!is_lock_file_locked(>lock))
>   die("BUG: commit_packed_refs() called when unlocked");
>  
> - strbuf_addf(, "%s.new", refs->path);
> + /*
> +  * If packed-refs is a symlink, we want to overwrite the
> +  * symlinked-to file, not the symlink itself. Also, put the
> +  * staging file next to it:
> +  */
> + packed_refs_path = get_locked_file_path(>lock);
> + strbuf_addf(, "%s.new", packed_refs_path);
>   if (create_tempfile(>tempfile, sb.buf) < 0) {
>   strbuf_addf(err, "unable to create file %s: %s",
>   sb.buf, strerror(errno));
>   strbuf_release();
> - return -1;
> + goto out;
>   }
>   strbuf_release();
>  
> @@ -660,17 +668,21 @@ int commit_packed_refs(struct ref_store *ref_store, 
> struct strbuf *err)
>   goto error;
>   }
>  
> - if (rename_tempfile(>tempfile, refs->path)) {
> + if (rename_tempfile(>tempfile, packed_refs_path)) {
>   strbuf_addf(err, "error replacing %s: %s",
>   refs->path, strerror(errno));
> - return -1;
> + goto out;
>   }
>  
> - return 0;
> + ret = 0;
> + goto out;
>  
>  error:
>   delete_tempfile(>tempfile);
> - return -1;
> +
> +out:
> + free(packed_refs_path);
> + return ret;
>  }
>  
>  /*
> diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
> index 2bb4b25ed9..0d8a03e2a9 100755
> --- a/t/t3210-pack-refs.sh
> +++ b/t/t3210-pack-refs.sh
> @@ -238,4 +238,19 @@ test_expect_success 'retry acquiring packed-refs.lock' '
>   git -c core.packedrefstimeout=3000 pack-refs --all --prune
>  '
>  
> +test_expect_success 'pack symlinked packed-refs' '

(*) Does this need a SYMLINKS prereq to avoid trouble on Windows?

> + # First make sure that symlinking works when reading:
> + git update-ref refs/heads/loosy refs/heads/master &&
> + git for-each-ref >all-refs-before &&
> + mv .git/packed-refs .git/my-deviant-packed-refs &&
> + ln -s my-deviant-packed-refs .git/packed-refs &&
> + git for-each-ref >all-refs-linked &&
> + test_cmp all-refs-before all-refs-linked &&
> + git pack-refs --all --prune &&
> + git for-each-ref >all-refs-packed &&
> + test_cmp all-refs-before all-refs-packed &&
> + test -h .git/packed-refs &&
> + test "$(readlink .git/packed-refs)" = "my-deviant-packed-refs"
> +'
> +
>  test_done


Re: [PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-26 Thread Stefan Beller
On Wed, Jul 26, 2017 at 4:39 PM, Michael Haggerty  wrote:
> One of the tricks that `contrib/workdir/git-new-workdir` plays is to
> making `packed-refs` in the new workdir a symlink to the `packed-refs`
> file in the original repository. Before
> 42dfa7ecef ("commit_packed_refs(): use a staging file separate from
> the lockfile", 2017-06-23), a lockfile was used as the staging file,
> and because the `LOCK_NO_DEREF` was not used, the pointed-to file was
> locked and modified.
>
> But after that commit, the staging file was created using a tempfile,
> with the end result that rewriting the `packed-refs` file in the
> workdir overwrote the symlink rather than the original `packed-refs`
> file.
>
> Change `commit_packed_refs()` to use `get_locked_file_path()` to find
> the path of the file that it should overwrite. Since that path was
> properly resolved when the lockfile was created, this restores the
> pre-42dfa7ecef behavior.
>
> Also add a test case to document this use case and prevent a
> regression like this from recurring.
>
> Signed-off-by: Michael Haggerty 
> ---
> Sorry for the slow response; I've been traveling and very busy.
>
> I didn't even know that a symlinked `packed-refs` file is a thing. The
> attached patch should fix the problem. It applies on top of v3 of
> mh/packed-ref-store [1] (which is already in next).

Thanks for providing a fix.
Code looks good to me, I wonder if the test needs SYMLINKS
special treatment though?

Stefan


[PATCH] packed_ref_store: handle a packed-refs file that is a symlink

2017-07-26 Thread Michael Haggerty
One of the tricks that `contrib/workdir/git-new-workdir` plays is to
making `packed-refs` in the new workdir a symlink to the `packed-refs`
file in the original repository. Before
42dfa7ecef ("commit_packed_refs(): use a staging file separate from
the lockfile", 2017-06-23), a lockfile was used as the staging file,
and because the `LOCK_NO_DEREF` was not used, the pointed-to file was
locked and modified.

But after that commit, the staging file was created using a tempfile,
with the end result that rewriting the `packed-refs` file in the
workdir overwrote the symlink rather than the original `packed-refs`
file.

Change `commit_packed_refs()` to use `get_locked_file_path()` to find
the path of the file that it should overwrite. Since that path was
properly resolved when the lockfile was created, this restores the
pre-42dfa7ecef behavior.

Also add a test case to document this use case and prevent a
regression like this from recurring.

Signed-off-by: Michael Haggerty 
---
Sorry for the slow response; I've been traveling and very busy.

I didn't even know that a symlinked `packed-refs` file is a thing. The
attached patch should fix the problem. It applies on top of v3 of
mh/packed-ref-store [1] (which is already in next).

Michael

[1] http://public-inbox.org/git/cover.1498933362.git.mhag...@alum.mit.edu/

 refs/packed-backend.c | 24 ++--
 t/t3210-pack-refs.sh  | 15 +++
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a28befbfa3..59e7d1a509 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -610,19 +610,27 @@ int commit_packed_refs(struct ref_store *ref_store, 
struct strbuf *err)
struct packed_ref_cache *packed_ref_cache =
get_packed_ref_cache(refs);
int ok;
+   int ret = -1;
struct strbuf sb = STRBUF_INIT;
FILE *out;
struct ref_iterator *iter;
+   char *packed_refs_path;
 
if (!is_lock_file_locked(>lock))
die("BUG: commit_packed_refs() called when unlocked");
 
-   strbuf_addf(, "%s.new", refs->path);
+   /*
+* If packed-refs is a symlink, we want to overwrite the
+* symlinked-to file, not the symlink itself. Also, put the
+* staging file next to it:
+*/
+   packed_refs_path = get_locked_file_path(>lock);
+   strbuf_addf(, "%s.new", packed_refs_path);
if (create_tempfile(>tempfile, sb.buf) < 0) {
strbuf_addf(err, "unable to create file %s: %s",
sb.buf, strerror(errno));
strbuf_release();
-   return -1;
+   goto out;
}
strbuf_release();
 
@@ -660,17 +668,21 @@ int commit_packed_refs(struct ref_store *ref_store, 
struct strbuf *err)
goto error;
}
 
-   if (rename_tempfile(>tempfile, refs->path)) {
+   if (rename_tempfile(>tempfile, packed_refs_path)) {
strbuf_addf(err, "error replacing %s: %s",
refs->path, strerror(errno));
-   return -1;
+   goto out;
}
 
-   return 0;
+   ret = 0;
+   goto out;
 
 error:
delete_tempfile(>tempfile);
-   return -1;
+
+out:
+   free(packed_refs_path);
+   return ret;
 }
 
 /*
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index 2bb4b25ed9..0d8a03e2a9 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -238,4 +238,19 @@ test_expect_success 'retry acquiring packed-refs.lock' '
git -c core.packedrefstimeout=3000 pack-refs --all --prune
 '
 
+test_expect_success 'pack symlinked packed-refs' '
+   # First make sure that symlinking works when reading:
+   git update-ref refs/heads/loosy refs/heads/master &&
+   git for-each-ref >all-refs-before &&
+   mv .git/packed-refs .git/my-deviant-packed-refs &&
+   ln -s my-deviant-packed-refs .git/packed-refs &&
+   git for-each-ref >all-refs-linked &&
+   test_cmp all-refs-before all-refs-linked &&
+   git pack-refs --all --prune &&
+   git for-each-ref >all-refs-packed &&
+   test_cmp all-refs-before all-refs-packed &&
+   test -h .git/packed-refs &&
+   test "$(readlink .git/packed-refs)" = "my-deviant-packed-refs"
+'
+
 test_done
-- 
2.11.0