Re: [PATCH 1/4] resolve_ref: close race condition for packed refs

2013-05-15 Thread Michael Haggerty
On 05/16/2013 05:47 AM, Jeff King wrote:
>> I probably would have separated the rest of the patch, which is a pure
>> refactoring, from this last chunk, which is a functional change.  But
>> that's just me.
> 
> Yeah, I go back and forth on whether it is better to have strict
> refactors followed by changes or not. Sometimes it is hard to understand
> the motivation for the refactor without seeing the change, and you end
> up explaining it twice.

A pure refactoring doesn't need much justification.  Something like
"extract function foo()" plus maybe "this function will soon have
multiple callers" is IMO usually adequate, especially if the function is
well-named and documented in the patch itself.

> My usual rule of thumb is:
> 
>   - If you are factoring out some code, and then are going to change
> that code, make it two separate changes. That keeps the diffs
> readable (the first one is pure movement and you do not need to look
> closely, and the second shows a sane diff of the change).
> 
>   - If you are factoring out some code, and then adding more callers,
> it's OK to do it together. The new caller provides the motivation
> for the refactor.
> 
> This is the latter case. But I'm open to arguments that the rule is not
> a good one. :)

Yes, I see how keeping the changes together makes the justification of
the refactoring more obvious.  On the other hand, splitting has the
following benefits:

1. Reviewers have a single thing to check in each patch: "Did he
   transcribe the code correctly into a function and choose a good
   API?" vs. "Does it make sense to call the function from this new
   place?"  The threads of feedback emails about each patch are
   similarly separated.

   On the other hand, of course these two changes are not completely
   independent, because having an idea what new callers want to do
   with the function affects what its API should be.

2. If the patch series needs to be revised, it is quite possible that
   the revisions affect only one patch or the other.  Therefore, the
   unaffected patch can be carried along through interactive rebases
   etc. intact, or might serve as a building block for an alternative
   solution.

3. If there's a problem, bisect can figure out which half of the change
   was to blame.

That being said, this case is very much in the gray area where it is a
matter of personal preference and I don't mind at all if you leave it as
a single patch.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] resolve_ref: close race condition for packed refs

2013-05-15 Thread Jeff King
On Mon, May 13, 2013 at 12:56:05AM +0200, Michael Haggerty wrote:

> > + * This should be called from resolve_ref_unsafe when a loose ref cannot be
> > + * accessed; err must represent the errno from the last attempt to access 
> > the
> > + * loose ref, and the other options are forwarded from 
> > resolve_safe_unsaefe.
> 
> s/resolve_ref_unsaefe/resolve_ref_unsafe/

Oops, thanks.

> > -   return NULL;
> > +   return handle_loose_ref_failure(errno, refname, sha1,
> > +   reading, flag);
> 
> I probably would have separated the rest of the patch, which is a pure
> refactoring, from this last chunk, which is a functional change.  But
> that's just me.

Yeah, I go back and forth on whether it is better to have strict
refactors followed by changes or not. Sometimes it is hard to understand
the motivation for the refactor without seeing the change, and you end
up explaining it twice.

My usual rule of thumb is:

  - If you are factoring out some code, and then are going to change
that code, make it two separate changes. That keeps the diffs
readable (the first one is pure movement and you do not need to look
closely, and the second shows a sane diff of the change).

  - If you are factoring out some code, and then adding more callers,
it's OK to do it together. The new caller provides the motivation
for the refactor.

This is the latter case. But I'm open to arguments that the rule is not
a good one. :)

> I suggest adding a comment here mentioning briefly the race condition
> that the call to handle_loose_ref_failure() is meant to address;
> otherwise somebody not thinking of race conditions might have the clever
> idea of "inlining" the call to "return NULL" because it seems redundant
> with the check of ENOENT following the lstat() call above.

Yeah, I thought I had mentioned that at the top of
handle_loose_ref_failure, but I see that I didn't. Probably something
like this squashed on top makes sense:

diff --git a/refs.c b/refs.c
index c127baf..1a7e4ef 100644
--- a/refs.c
+++ b/refs.c
@@ -,7 +,7 @@ static int get_packed_ref(const char *refname, unsigned 
char *sha1)
 /*
  * This should be called from resolve_ref_unsafe when a loose ref cannot be
  * accessed; err must represent the errno from the last attempt to access the
- * loose ref, and the other options are forwarded from resolve_safe_unsaefe.
+ * loose ref, and the other options are forwarded from resolve_safe_unsafe.
  */
 static const char *handle_loose_ref_failure(int err,
const char *refname,
@@ -1200,9 +1200,16 @@ const char *resolve_ref_unsafe(const char *refname, 
unsigned char *sha1, int rea
 * a ref
 */
fd = open(path, O_RDONLY);
-   if (fd < 0)
+   if (fd < 0) {
+   /*
+* We need to check again here for ENOENT and fall back
+* to the packed-refs file to avoid a race condition in
+* which the ref is packed and pruned between our
+* lstat() and open() calls.
+*/
return handle_loose_ref_failure(errno, refname, sha1,
reading, flag);
+   }
len = read_in_full(fd, buffer, sizeof(buffer)-1);
close(fd);
if (len < 0)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] resolve_ref: close race condition for packed refs

2013-05-12 Thread Michael Haggerty
On 05/07/2013 04:38 AM, Jeff King wrote:
> [...]
> This patch solves it by factoring out the fallback code from
> the lstat() case and calling it from both places. We do not
> need to do the same thing for the symlink/readlink code
> path, even though it might receive ENOENT, too, because
> symlinks cannot be packed (so if it disappears after the
> lstat, it is truly being deleted).

To be really pedantic, we should handle all kinds of changes that
another process might make simultaneously:

* symlink -> deleted

  Was already OK, as you explained above.

* regular file -> deleted

  Handled by your patch

* symlink -> regular file

  Could come from

  update-ref --no-deref

  if the old version of the reference were a symlink-based symbolic
  reference.  Oops, wait, that's broken anyway:

  $ ln -sf master .git/refs/heads/foo
  $ git for-each-ref
  4204906e2ee75f9b7224860c40df712d112c004b commit   refs/heads/foo
  4204906e2ee75f9b7224860c40df712d112c004b commit   refs/heads/master
  $ git update-ref --no-deref refs/heads/foo master
  $ dir .git/refs/heads/foo
  lrwxrwxrwx 1 mhagger mhagger 6 May 13 01:07 .git/refs/heads/foo ->
master

  But supposing that were fixed and it occurred between the call to
  lstat() and the call to readlink(), then readlink() would fail with
  errno == EINVAL and we should respond by repeating the code starting
  with the lstat().

* regular file -> symlink

  Could come from overwriting a reference with a symlink-based symbolic
  reference.  But this could only happen if the parallel process were
  an old version of Git

I think it's been quite a while since Git has used symlinks for symbolic
references, so I doubt that it is worth fixing the second two cases.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] resolve_ref: close race condition for packed refs

2013-05-12 Thread Michael Haggerty
On 05/07/2013 04:38 AM, Jeff King wrote:
> When we attempt to resolve a ref, the first thing we do is
> call lstat() to see if it is a symlink or a real ref. If we
> find that the ref is missing, we fall back to looking it up
> in the packed-refs file. If we find the loose ref does exist
> (and is a regular file), we continue with trying to open it.
> 
> However, we do not do the same fallback if our open() call
> fails; we just report the ref as missing.  A "git pack-refs
> --prune" process which is simultaneously running may remove
> the loose ref between our lstat() and open().  In this case,
> we would erroneously report the ref as missing, even though
> we could find it if we checked the packed-refs file.
> 
> This patch solves it by factoring out the fallback code from
> the lstat() case and calling it from both places. We do not
> need to do the same thing for the symlink/readlink code
> path, even though it might receive ENOENT, too, because
> symlinks cannot be packed (so if it disappears after the
> lstat, it is truly being deleted).
> 
> Note that this solves only the part of the race within
> resolve_ref_unsafe. In the situation described above, we may
> still be depending on a cached view of the packed-refs file;
> that race will be dealt with in a future patch.
> 
> Signed-off-by: Jeff King 

+1, with trivial suggestions below.

> ---
> 
>  refs.c | 63 ++-
>  1 file changed, 42 insertions(+), 21 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index de2d8eb..5a14703 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1083,6 +1083,43 @@ static int get_packed_ref(const char *refname, 
> unsigned char *sha1)
>   return -1;
>  }
>  
> +/*
> + * This should be called from resolve_ref_unsafe when a loose ref cannot be
> + * accessed; err must represent the errno from the last attempt to access the
> + * loose ref, and the other options are forwarded from resolve_safe_unsaefe.

s/resolve_ref_unsaefe/resolve_ref_unsafe/

> + */
> +static const char *handle_loose_ref_failure(int err,
> + const char *refname,
> + unsigned char *sha1,
> + int reading,
> + int *flag)
> +{
> + /*
> +  * If we didn't get ENOENT, something is broken
> +  * with the loose ref, and we should not fallback,
> +  * but instead pretend it doesn't exist.
> +  */
> + if (err != ENOENT)
> + return NULL;
> +
> + /*
> +  * The loose reference file does not exist;
> +  * check for a packed reference.
> +  */
> + if (!get_packed_ref(refname, sha1)) {
> + if (flag)
> + *flag |= REF_ISPACKED;
> + return refname;
> + }
> +
> + /* The reference is not a packed reference, either. */
> + if (reading)
> + return NULL;
> +
> + hashclr(sha1);
> + return refname;
> +}
> +
>  const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int 
> reading, int *flag)
>  {
>   int depth = MAXDEPTH;
> @@ -1107,26 +1144,9 @@ const char *resolve_ref_unsafe(const char *refname, 
> unsigned char *sha1, int rea
>  
>   git_snpath(path, sizeof(path), "%s", refname);
>  
> - if (lstat(path, &st) < 0) {
> - if (errno != ENOENT)
> - return NULL;
> - /*
> -  * The loose reference file does not exist;
> -  * check for a packed reference.
> -  */
> - if (!get_packed_ref(refname, sha1)) {
> - if (flag)
> - *flag |= REF_ISPACKED;
> - return refname;
> - }
> - /* The reference is not a packed reference, either. */
> - if (reading) {
> - return NULL;
> - } else {
> - hashclr(sha1);
> - return refname;
> - }
> - }
> + if (lstat(path, &st) < 0)
> + return handle_loose_ref_failure(errno, refname, sha1,
> + reading, flag);
>  
>   /* Follow "normalized" - ie "refs/.." symlinks by hand */
>   if (S_ISLNK(st.st_mode)) {
> @@ -1156,7 +1176,8 @@ const char *resolve_ref_unsafe(const char *refname, 
> unsigned char *sha1, int rea
>*/
>   fd = open(path, O_RDONLY);
>   if (fd < 0)
> - return NULL;
> + return handle_loose_ref_failure(errno, refname, sha1,
> + reading, flag);

I probably would have separated the rest of the patch, which is 

[PATCH 1/4] resolve_ref: close race condition for packed refs

2013-05-06 Thread Jeff King
When we attempt to resolve a ref, the first thing we do is
call lstat() to see if it is a symlink or a real ref. If we
find that the ref is missing, we fall back to looking it up
in the packed-refs file. If we find the loose ref does exist
(and is a regular file), we continue with trying to open it.

However, we do not do the same fallback if our open() call
fails; we just report the ref as missing.  A "git pack-refs
--prune" process which is simultaneously running may remove
the loose ref between our lstat() and open().  In this case,
we would erroneously report the ref as missing, even though
we could find it if we checked the packed-refs file.

This patch solves it by factoring out the fallback code from
the lstat() case and calling it from both places. We do not
need to do the same thing for the symlink/readlink code
path, even though it might receive ENOENT, too, because
symlinks cannot be packed (so if it disappears after the
lstat, it is truly being deleted).

Note that this solves only the part of the race within
resolve_ref_unsafe. In the situation described above, we may
still be depending on a cached view of the packed-refs file;
that race will be dealt with in a future patch.

Signed-off-by: Jeff King 
---

 refs.c | 63 ++-
 1 file changed, 42 insertions(+), 21 deletions(-)

diff --git a/refs.c b/refs.c
index de2d8eb..5a14703 100644
--- a/refs.c
+++ b/refs.c
@@ -1083,6 +1083,43 @@ static int get_packed_ref(const char *refname, unsigned 
char *sha1)
return -1;
 }
 
+/*
+ * This should be called from resolve_ref_unsafe when a loose ref cannot be
+ * accessed; err must represent the errno from the last attempt to access the
+ * loose ref, and the other options are forwarded from resolve_safe_unsaefe.
+ */
+static const char *handle_loose_ref_failure(int err,
+   const char *refname,
+   unsigned char *sha1,
+   int reading,
+   int *flag)
+{
+   /*
+* If we didn't get ENOENT, something is broken
+* with the loose ref, and we should not fallback,
+* but instead pretend it doesn't exist.
+*/
+   if (err != ENOENT)
+   return NULL;
+
+   /*
+* The loose reference file does not exist;
+* check for a packed reference.
+*/
+   if (!get_packed_ref(refname, sha1)) {
+   if (flag)
+   *flag |= REF_ISPACKED;
+   return refname;
+   }
+
+   /* The reference is not a packed reference, either. */
+   if (reading)
+   return NULL;
+
+   hashclr(sha1);
+   return refname;
+}
+
 const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int 
reading, int *flag)
 {
int depth = MAXDEPTH;
@@ -1107,26 +1144,9 @@ const char *resolve_ref_unsafe(const char *refname, 
unsigned char *sha1, int rea
 
git_snpath(path, sizeof(path), "%s", refname);
 
-   if (lstat(path, &st) < 0) {
-   if (errno != ENOENT)
-   return NULL;
-   /*
-* The loose reference file does not exist;
-* check for a packed reference.
-*/
-   if (!get_packed_ref(refname, sha1)) {
-   if (flag)
-   *flag |= REF_ISPACKED;
-   return refname;
-   }
-   /* The reference is not a packed reference, either. */
-   if (reading) {
-   return NULL;
-   } else {
-   hashclr(sha1);
-   return refname;
-   }
-   }
+   if (lstat(path, &st) < 0)
+   return handle_loose_ref_failure(errno, refname, sha1,
+   reading, flag);
 
/* Follow "normalized" - ie "refs/.." symlinks by hand */
if (S_ISLNK(st.st_mode)) {
@@ -1156,7 +1176,8 @@ const char *resolve_ref_unsafe(const char *refname, 
unsigned char *sha1, int rea
 */
fd = open(path, O_RDONLY);
if (fd < 0)
-   return NULL;
+   return handle_loose_ref_failure(errno, refname, sha1,
+   reading, flag);
len = read_in_full(fd, buffer, sizeof(buffer)-1);
close(fd);
if (len < 0)
-- 
1.8.3.rc1.2.g12db477

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel