Re: on Amazon EFS (NFS): "Reference directory conflict: refs/heads/" with status code 128

2016-08-25 Thread Michael Haggerty
On 08/25/2016 06:01 PM, Alex Nauda wrote:
> On Thu, Aug 25, 2016 at 2:28 AM, Michael Haggerty wrote:
>> On 08/24/2016 11:39 PM, Jeff King wrote:
>>> On Wed, Aug 24, 2016 at 04:52:33PM -0400, Alex Nauda wrote:
>>>
>>>> Elastic File System (EFS) is Amazon's scalable filesystem product that
>>>> is exposed to the OS as an NFS mount. We're using EFS to host the
>>>> filesystem used by a Jenkins CI server. Sometimes when Jenkins tries
>>>> to git fetch, we get this error:
>>>> $ git -c core.askpass=true fetch --tags --progress
>>>> g...@github.com:mediasilo/dodo.git
>>>> +refs/pull/*:refs/remotes/origin/pr/*
>>>> fatal: Reference directory conflict: refs/heads/
>>>> $ echo $?
>>>> 128
>>>>
>>>> Has anyone seen anything like this before? Any tips on how to troubleshoot it?
>>>
>>> No, I haven't seen it before. That's an internal assertion in the refs
>>> code that shouldn't ever happen. It looks like it happens when the loose
>>> refs end up with duplicate directory entries. While a bug in git is an
>>> obvious culprit, I wonder if it's possible that your filesystem might
>>> expose the same name twice in one set of readdir() results.
>>>
>>> +cc Michael, who added this assertion long ago (and since this is the
>>> first report in all these years, it does make me suspect that the
>>> filesystem is a critical part of reproducing).
>>
>> Thanks for the CC.
>>
>> I've never heard of this problem before.
>>
>> What Git version are you using?
> Git client 2.7.4 against GitHub (Git 2.6.5)
> 
>>
>> I tried to provoke the problem by hand-corrupting the packed-refs file,
>> but wasn't successful.
>>
>> So Peff's suggestion that the problem originates in your filesystem
>> seems to me to be the most likely cause. A quick Google search found,
>> for example,
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=739222
>>
>> http://superuser.com/questions/640419/how-can-i-have-two-files-with-the-same-name-in-a-directory-when-mounted-with-nfs
>>
>> though these reports seem connected with having lots of files in the
>> directory, which seems unlikely for `$GIT_DIR/refs/`. But I didn't do a
>> more careful search, and it is easily possible that there are other bugs
>> in NFS (or EFS) that could be affecting you.
>>
>> If this were repeatable, you could run Git under strace to test Peff's
>> hypothesis. But I suppose it only happens rarely, right?
> Actually it seems to be reproducible. Here's the last portion of an strace:
> 
> [...]
> stat(".git/refs/remotes/origin/pr/7/head", {st_mode=S_IFREG|0644,
> st_size=41, ...}) = 0
> lstat(".git/refs/remotes/origin/pr/7/head", {st_mode=S_IFREG|0644,
> st_size=41, ...}) = 0
> open(".git/refs/remotes/origin/pr/7/head", O_RDONLY) = 4
> read(4, "5d82811a248900efd8e201c6d9232de5"..., 256) = 41
> read(4, "", 215)= 0
> close(4)= 0
> getdents(3, /* 0 entries */, 32768) = 0
> close(3)= 0
> open(".git/refs/remotes/origin/pr/16/",
> O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> getdents(3, /* 3 entries */, 32768) = 72
> stat(".git/refs/remotes/origin/pr/16/head", {st_mode=S_IFREG|0644,
> st_size=41, ...}) = 0
> lstat(".git/refs/remotes/origin/pr/16/head", {st_mode=S_IFREG|0644,
> st_size=41, ...}) = 0
> open(".git/refs/remotes/origin/pr/16/head", O_RDONLY) = 4
> read(4, "2886c4f3ba8c3b5c2306029f6e39498d"..., 256) = 41
> read(4, "", 215)= 0
> close(4)= 0
> getdents(3, /* 0 entries */, 32768) = 0
> close(3)= 0
> open(".git/refs/tags/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> getdents(3, /* 2 entries */, 32768) = 48
> getdents(3, /* 0 entries */, 32768) = 0
> close(3)= 0
> open(".git/refs/bisect/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) =
> -1 ENOENT (No such file or directory)
> open(".git/packed-refs", O_RDONLY)  = -1 ENOENT (No such file or 
> directory)
> fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
> write(2, "fatal: Reference directory confl"..., 58fatal: Reference
> directory conflict: refs/remotes/origin/
> ) = 58
> exit_group(128) = ?
> +++ exited with 128 +++

Thanks for the additional information.

From the strace output it is clear that there is no packed-refs file at
the time of the problem, so the problem must be among the loose refs.

The error is a "Reference directory conflict", which suggests that
"refs/remotes/origin/" appears in two entries; once as a reference
directory and once as a reference. But in fact it could also mean that
"refs/remotes/origin/" appears twice, both as directories. Neither one
should happen in normal operation.
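
To make the two cases concrete, here is a minimal sketch of the kind of check involved. It is a toy model, not Git's actual refs code: the function name `find_conflicts` and the `(name, is_dir)` entry format are assumptions made only for illustration.

```python
def find_conflicts(entries):
    """Report names that occur more than once in a list of ref-cache entries.

    `entries` is an iterable of (name, is_dir) pairs. This format is a
    hypothetical stand-in for illustration, not Git's data structure.
    """
    seen = {}        # name without trailing slash -> is_dir of first occurrence
    conflicts = []
    for name, is_dir in entries:
        key = name.rstrip("/")
        if key not in seen:
            seen[key] = is_dir
            continue
        if seen[key] and is_dir:
            kind = "listed twice as a directory"
        elif seen[key] or is_dir:
            kind = "listed as both a directory and a reference"
        else:
            # Not one of the two cases described above; reported for completeness.
            kind = "listed twice as a reference"
        conflicts.append((key, kind))
    return conflicts
```

Feeding it `[("refs/remotes/origin/", True), ("refs/remotes/origin/", True)]` reports the second case above, the same name listed twice as a directory, which is what a filesystem returning a duplicate entry would produce.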

Unfortunately there is not enough strace output to see whether (in this
case) path `refs/remotes/origin` was 

Re: on Amazon EFS (NFS): "Reference directory conflict: refs/heads/" with status code 128

2016-08-25 Thread Alex Nauda
On Thu, Aug 25, 2016 at 2:28 AM, Michael Haggerty wrote:
> On 08/24/2016 11:39 PM, Jeff King wrote:
>> On Wed, Aug 24, 2016 at 04:52:33PM -0400, Alex Nauda wrote:
>>
>>> Elastic File System (EFS) is Amazon's scalable filesystem product that
>>> is exposed to the OS as an NFS mount. We're using EFS to host the
>>> filesystem used by a Jenkins CI server. Sometimes when Jenkins tries
>>> to git fetch, we get this error:
>>> $ git -c core.askpass=true fetch --tags --progress
>>> g...@github.com:mediasilo/dodo.git
>>> +refs/pull/*:refs/remotes/origin/pr/*
>>> fatal: Reference directory conflict: refs/heads/
>>> $ echo $?
>>> 128
>>>
>>> Has anyone seen anything like this before? Any tips on how to troubleshoot 
>>> it?
>>
>> No, I haven't seen it before. That's an internal assertion in the refs
>> code that shouldn't ever happen. It looks like it happens when the loose
>> refs end up with duplicate directory entries. While a bug in git is an
>> obvious culprit, I wonder if it's possible that your filesystem might
>> expose the same name twice in one set of readdir() results.
>>
>> +cc Michael, who added this assertion long ago (and since this is the
>> first report in all these years, it does make me suspect that the
>> filesystem is a critical part of reproducing).
>
> Thanks for the CC.
>
> I've never heard of this problem before.
>
> What Git version are you using?
Git client 2.7.4 against GitHub (Git 2.6.5)

>
> I tried to provoke the problem by hand-corrupting the packed-refs file,
> but wasn't successful.
>
> So Peff's suggestion that the problem originates in your filesystem
> seems to me to be the most likely cause. A quick Google search found,
> for example,
>
> https://bugzilla.redhat.com/show_bug.cgi?id=739222
>
> http://superuser.com/questions/640419/how-can-i-have-two-files-with-the-same-name-in-a-directory-when-mounted-with-nfs
>
> though these reports seem connected with having lots of files in the
> directory, which seems unlikely for `$GIT_DIR/refs/`. But I didn't do a
> more careful search, and it is easily possible that there are other bugs
> in NFS (or EFS) that could be affecting you.
>
> If this were repeatable, you could run Git under strace to test Peff's
> hypothesis. But I suppose it only happens rarely, right?
Actually it seems to be reproducible. Here's the last portion of an strace:

[...]
stat(".git/refs/remotes/origin/pr/7/head", {st_mode=S_IFREG|0644,
st_size=41, ...}) = 0
lstat(".git/refs/remotes/origin/pr/7/head", {st_mode=S_IFREG|0644,
st_size=41, ...}) = 0
open(".git/refs/remotes/origin/pr/7/head", O_RDONLY) = 4
read(4, "5d82811a248900efd8e201c6d9232de5"..., 256) = 41
read(4, "", 215)= 0
close(4)= 0
getdents(3, /* 0 entries */, 32768) = 0
close(3)= 0
open(".git/refs/remotes/origin/pr/16/",
O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 3 entries */, 32768) = 72
stat(".git/refs/remotes/origin/pr/16/head", {st_mode=S_IFREG|0644,
st_size=41, ...}) = 0
lstat(".git/refs/remotes/origin/pr/16/head", {st_mode=S_IFREG|0644,
st_size=41, ...}) = 0
open(".git/refs/remotes/origin/pr/16/head", O_RDONLY) = 4
read(4, "2886c4f3ba8c3b5c2306029f6e39498d"..., 256) = 41
read(4, "", 215)= 0
close(4)= 0
getdents(3, /* 0 entries */, 32768) = 0
close(3)= 0
open(".git/refs/tags/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3)= 0
open(".git/refs/bisect/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) =
-1 ENOENT (No such file or directory)
open(".git/packed-refs", O_RDONLY)  = -1 ENOENT (No such file or directory)
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
write(2, "fatal: Reference directory confl"..., 58fatal: Reference
directory conflict: refs/remotes/origin/
) = 58
exit_group(128) = ?
+++ exited with 128 +++

>
> Is it possible that multiple clients have the same NFS filesystem
> mounted while Git is running? That would seem like an especially bad
> idea and I could imagine it leading to problems like this.
>
> It's surprising that you are seeing this problem in directory `refs`,
> because (1) that directory is unlikely to have very many entries, and
> (2) as far as I remember, Git will never delete the directories
> `refs/heads` and `refs/tags`.
Seems like sometimes it happens on other directories:
refs/remotes/origin/ or refs/remotes/origin/pr/1
Then as I was stracing it again, suddenly it succeeded. Some kind of
race condition?
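
If the failure really is intermittent, one way to get a feel for how often it occurs is to list the affected directory many times and record which listings contain a repeated name. The following is only a sketch under that assumption; `sample_listings` and its parameters are hypothetical names, and it has not been run against EFS.

```python
import os
import time
from collections import Counter


def list_names(directory):
    """Return the entry names of one directory listing, in the order returned."""
    with os.scandir(directory) as entries:
        return [entry.name for entry in entries]


def sample_listings(directory, attempts=100, delay=0.05, lister=list_names):
    """List `directory` repeatedly and collect the listings that contain duplicates.

    Returns a list of (attempt_index, duplicated_names) tuples. `lister` can
    be replaced with a fake for testing without a misbehaving filesystem.
    """
    bad = []
    for attempt in range(attempts):
        counts = Counter(lister(directory))
        dups = sorted(name for name, n in counts.items() if n > 1)
        if dups:
            bad.append((attempt, dups))
        time.sleep(delay)
    return bad
```

For example, `sample_listings(".git/refs/remotes/origin")` returns an empty list when no listing ever repeats a name. A non-empty result for only some attempts would be consistent with a race.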

>
> Michael
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: on Amazon EFS (NFS): "Reference directory conflict: refs/heads/" with status code 128

2016-08-25 Thread Michael Haggerty
On 08/24/2016 11:39 PM, Jeff King wrote:
> On Wed, Aug 24, 2016 at 04:52:33PM -0400, Alex Nauda wrote:
> 
>> Elastic File System (EFS) is Amazon's scalable filesystem product that
>> is exposed to the OS as an NFS mount. We're using EFS to host the
>> filesystem used by a Jenkins CI server. Sometimes when Jenkins tries
>> to git fetch, we get this error:
>> $ git -c core.askpass=true fetch --tags --progress
>> g...@github.com:mediasilo/dodo.git
>> +refs/pull/*:refs/remotes/origin/pr/*
>> fatal: Reference directory conflict: refs/heads/
>> $ echo $?
>> 128
>>
>> Has anyone seen anything like this before? Any tips on how to troubleshoot 
>> it?
> 
> No, I haven't seen it before. That's an internal assertion in the refs
> code that shouldn't ever happen. It looks like it happens when the loose
> refs end up with duplicate directory entries. While a bug in git is an
> obvious culprit, I wonder if it's possible that your filesystem might
> expose the same name twice in one set of readdir() results.
> 
> +cc Michael, who added this assertion long ago (and since this is the
> first report in all these years, it does make me suspect that the
> filesystem is a critical part of reproducing).

Thanks for the CC.

I've never heard of this problem before.

What Git version are you using?

I tried to provoke the problem by hand-corrupting the packed-refs file,
but wasn't successful.

So Peff's suggestion that the problem originates in your filesystem
seems to me to be the most likely cause. A quick Google search found,
for example,

https://bugzilla.redhat.com/show_bug.cgi?id=739222

http://superuser.com/questions/640419/how-can-i-have-two-files-with-the-same-name-in-a-directory-when-mounted-with-nfs

though these reports seem connected with having lots of files in the
directory, which seems unlikely for `$GIT_DIR/refs/`. But I didn't do a
more careful search, and it is easily possible that there are other bugs
in NFS (or EFS) that could be affecting you.

If this were repeatable, you could run Git under strace to test Peff's
hypothesis. But I suppose it only happens rarely, right?
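
Abbreviated strace output only shows how many entries each getdents call returned, not their names. Assuming strace is run with `-v` so that each call prints `d_name="..."` fields for its entries (the exact line format varies between strace versions, so treat the patterns below as assumptions), a small script along these lines could pick out repeated names. The function name `find_duplicates` is made up for this sketch.

```python
import re
from collections import Counter

# Assumed shape of a verbose line:
#   getdents(3, [{d_ino=1, ..., d_name="heads"}, ...], 32768) = 72
GETDENTS_RE = re.compile(r'\bgetdents(?:64)?\((\d+),')
D_NAME_RE = re.compile(r'd_name="((?:[^"\\]|\\.)*)"')


def find_duplicates(strace_lines):
    """Return (fd, name) pairs for names seen more than once while reading one directory."""
    seen = {}    # fd -> Counter of names for the directory currently being read
    found = []

    def flush(fd):
        counts = seen.pop(fd, Counter())
        found.extend((fd, name) for name, n in sorted(counts.items()) if n > 1)

    for line in strace_lines:
        match = GETDENTS_RE.search(line)
        if not match:
            continue
        fd = match.group(1)
        names = D_NAME_RE.findall(line)
        if names:
            seen.setdefault(fd, Counter()).update(names)
        else:
            # A call that returns no entries marks the end of this directory.
            flush(fd)
    # Directories whose listing was still in progress when the log ended.
    for fd in list(seen):
        flush(fd)
    return found
```

A log for this could be produced with something like `strace -v -f -o trace.log git fetch` and then read with `find_duplicates(open("trace.log"))`. Names are accumulated per file descriptor across successive getdents calls because a single directory listing can span several of them.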

Is it possible that multiple clients have the same NFS filesystem
mounted while Git is running? That would seem like an especially bad
idea and I could imagine it leading to problems like this.

It's surprising that you are seeing this problem in directory `refs`,
because (1) that directory is unlikely to have very many entries, and
(2) as far as I remember, Git will never delete the directories
`refs/heads` and `refs/tags`.

Michael



Re: on Amazon EFS (NFS): "Reference directory conflict: refs/heads/" with status code 128

2016-08-24 Thread Jeff King
On Wed, Aug 24, 2016 at 04:52:33PM -0400, Alex Nauda wrote:

> Elastic File System (EFS) is Amazon's scalable filesystem product that
> is exposed to the OS as an NFS mount. We're using EFS to host the
> filesystem used by a Jenkins CI server. Sometimes when Jenkins tries
> to git fetch, we get this error:
> $ git -c core.askpass=true fetch --tags --progress
> g...@github.com:mediasilo/dodo.git
> +refs/pull/*:refs/remotes/origin/pr/*
> fatal: Reference directory conflict: refs/heads/
> $ echo $?
> 128
> 
> Has anyone seen anything like this before? Any tips on how to troubleshoot it?

No, I haven't seen it before. That's an internal assertion in the refs
code that shouldn't ever happen. It looks like it happens when the loose
refs end up with duplicate directory entries. While a bug in git is an
obvious culprit, I wonder if it's possible that your filesystem might
expose the same name twice in one set of readdir() results.

+cc Michael, who added this assertion long ago (and since this is the
first report in all these years, it does make me suspect that the
filesystem is a critical part of reproducing).
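
As a rough way to check for this outside of Git, a short script can list each directory under `refs` once and report any name that a single listing returns more than once. This is only a sketch: the function names are made up for illustration, and `os.scandir()` is assumed to reflect what the underlying directory read returns on the mount in question.

```python
import os
from collections import Counter


def duplicate_names(directory):
    """Return names that appear more than once in a single listing of `directory`."""
    with os.scandir(directory) as entries:
        counts = Counter(entry.name for entry in entries)
    return sorted(name for name, n in counts.items() if n > 1)


def scan_refs(git_dir):
    """Walk git_dir/refs and map each directory to its duplicated names, if any."""
    problems = {}
    for root, _dirs, _files in os.walk(os.path.join(git_dir, "refs")):
        dups = duplicate_names(root)
        if dups:
            problems[root] = dups
    return problems
```

On a well-behaved filesystem `scan_refs(".git")` should return an empty dict. A non-empty result would point at the filesystem rather than at Git.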

-Peff


on Amazon EFS (NFS): "Reference directory conflict: refs/heads/" with status code 128

2016-08-24 Thread Alex Nauda
Elastic File System (EFS) is Amazon's scalable filesystem product that
is exposed to the OS as an NFS mount. We're using EFS to host the
filesystem used by a Jenkins CI server. Sometimes when Jenkins tries
to git fetch, we get this error:
$ git -c core.askpass=true fetch --tags --progress
g...@github.com:mediasilo/dodo.git
+refs/pull/*:refs/remotes/origin/pr/*
fatal: Reference directory conflict: refs/heads/
$ echo $?
128

Has anyone seen anything like this before? Any tips on how to troubleshoot it?

Related Jenkins issue: https://issues.jenkins-ci.org/browse/JENKINS-37653