Re: Deadlock between git-remote-http and git fetch-pack

2017-01-27 Thread Junio C Hamano
tsuna writes:

> While investigating a hung job in our CI system today, I think I found
> a deadlock in git-remote-http
> ...
> Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
> parent, PID 27317 (git-remote-http) is stuck reading on its child’s
> stdout.  Nothing has moved for like 2h, it’s deadlocked.

Hmph, would this be related to 296b847c0d ("remote-curl: don't hang
when a server dies before any output", 2016-11-18) I wonder...


Re: Deadlock between git-remote-http and git fetch-pack

2017-01-27 Thread Jonathan Tan

On 01/27/2017 02:31 PM, tsuna wrote:

> Hi there,
> While investigating a hung job in our CI system today, I think I found
> a deadlock in git-remote-http
>
> Git version: 2.9.3
> Linux (amd64) kernel 4.9.0
>
> Excerpt from the process list:
>
> jenkins  27316  0.0  0.0  18508  6024 ?  S  19:30   0:00  |
>    \_ git -C ../../../arista fetch --unshallow
> jenkins  27317  0.0  0.0 169608 10916 ?  S  19:30   0:00  |
>    \_ git-remote-http origin http://gerrit/arista
> jenkins  27319  0.0  0.0  24160  8260 ?  S  19:30   0:00  |
>    \_ git fetch-pack --stateless-rpc --stdin
> --lock-pack --include-tag --thin --no-progress --depth=2147483647
> http://gerrit/arista/
>
> Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
> parent, PID 27317 (git-remote-http) is stuck reading on its child’s
> stdout.  Nothing has moved for like 2h, it’s deadlocked.
>
> > strace -fp 27319
> strace: Process 27319 attached
> read(0,
>
> Here FD 0 is a pipe:
>
> ~ @8a33a534e2f7> lsof -np 27319 | grep 0r
> git       27319 jenkins    0r  FIFO   0,10  0t0 354519158 pipe
>
> The writing end of which is owned by the parent process:
>
> ~ @8a33a534e2f7> lsof -n 2>/dev/null | fgrep 354519158
> git-remot 27317 jenkins    4w  FIFO   0,10  0t0 354519158 pipe
> git       27319 jenkins    0r  FIFO   0,10  0t0 354519158 pipe
>
> And the parent process (git-remote-http) is stuck reading from another FD:
>
> > strace -fp 27317
> strace: Process 27317 attached
> read(5,
>
> And here FD 5 is another pipe:
>
> ~ @8a33a534e2f7> lsof -np 27317 | grep 5r
> git-remot 27317 jenkins    5r  FIFO   0,10  0t0 354519159 pipe
>
> Which is the child’s stdout:
>
> > lsof -n 2>/dev/null | fgrep 354519159
> git-remot 27317 jenkins    5r  FIFO   0,10  0t0 354519159 pipe
> git       27319 jenkins    1w  FIFO   0,10  0t0 354519159 pipe
>
> Hence the deadlock.
>
> Stack trace in git-remote-http:
>
> (gdb) bt
> #0  0x7f04f1e1363d in read () from target:/lib64/libpthread.so.0
> #1  0x562417472d73 in xread ()
> #2  0x562417472f2b in read_in_full ()
> #3  0x562417438a6e in get_packet_data ()
> #4  0x562417439129 in packet_read ()
> #5  0x5624174245e0 in rpc_service ()
> #6  0x562417424f10 in fetch_git ()
> #7  0x5624174233fd in main ()
>
> Stack trace in git fetch-pack:
>
> (gdb) bt
> #0  0x7fb3ab478620 in __read_nocancel () from target:/lib64/libpthread.so.0
> #1  0x55f688827283 in xread ()
> #2  0x55f68882743b in read_in_full ()
> #3  0x55f6887ce35e in get_packet_data ()
> #4  0x55f6887cea19 in packet_read ()
> #5  0x55f6887ceb90 in packet_read_line ()
> #6  0x55f68879dd05 in get_ack ()
> #7  0x55f68879f6b4 in fetch_pack ()
> #8  0x55f688710619 in cmd_fetch_pack ()
> #9  0x55f6886dff7b in handle_builtin ()
> #10 0x55f6886df026 in main ()
>
> I looked at the diff between v2.9.3 and HEAD on fetch-pack.c and
> remote-curl.c and didn’t see anything noteworthy in that area of the
> code, so I presume the bug is still there in master.



I haven't looked into this in detail, but this might be related to 
something I discovered while writing my patch set. I noticed that 
upload-pack (the process on the "other side" of fetch-pack) can die 
without first writing any notification, causing fetch-pack to block 
forever on a read. A fix would probably look like that patch [1].
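
To illustrate why such a read never returns: a blocking read on a pipe
reports end-of-file only once every write end has been closed, so a peer
that stops writing but keeps its descriptor open leaves the reader
waiting. The following is a minimal Python sketch of that pipe behavior
only, not git's code; `would_block` is a hypothetical helper and the
0.2 s timeout is an arbitrary choice.

```python
import os
import select


def would_block(fd, timeout=0.2):
    """Return True if a blocking read() on fd would currently hang.

    select() marks a pipe as readable when data is available or when
    all write ends are closed (EOF), so "not readable" means "would hang".
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    return not readable


r, w = os.pipe()

# Write end still open and nothing written: the reader would wait forever.
print(would_block(r))  # True

# Once the last write end is closed, the reader sees EOF right away.
os.close(w)
print(would_block(r))  # False
print(os.read(r, 1))   # b''
os.close(r)
```

In the report above, each process still holds the write end of the pipe
the other one is reading from, so neither side ever sees EOF.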


[1] 



Deadlock between git-remote-http and git fetch-pack

2017-01-27 Thread tsuna
Hi there,
While investigating a hung job in our CI system today, I think I found
a deadlock in git-remote-http

Git version: 2.9.3
Linux (amd64) kernel 4.9.0

Excerpt from the process list:

jenkins  27316  0.0  0.0  18508  6024 ?  S  19:30   0:00  |
   \_ git -C ../../../arista fetch --unshallow
jenkins  27317  0.0  0.0 169608 10916 ?  S  19:30   0:00  |
   \_ git-remote-http origin http://gerrit/arista
jenkins  27319  0.0  0.0  24160  8260 ?  S  19:30   0:00  |
   \_ git fetch-pack --stateless-rpc --stdin
--lock-pack --include-tag --thin --no-progress --depth=2147483647
http://gerrit/arista/

Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
parent, PID 27317 (git-remote-http) is stuck reading on its child’s
stdout.  Nothing has moved for like 2h, it’s deadlocked.

> strace -fp 27319
strace: Process 27319 attached
read(0,

Here FD 0 is a pipe:

~ @8a33a534e2f7> lsof -np 27319 | grep 0r
git       27319 jenkins    0r  FIFO   0,10  0t0 354519158 pipe

The writing end of which is owned by the parent process:

~ @8a33a534e2f7> lsof -n 2>/dev/null | fgrep 354519158
git-remot 27317 jenkins    4w  FIFO   0,10  0t0 354519158 pipe
git       27319 jenkins    0r  FIFO   0,10  0t0 354519158 pipe

And the parent process (git-remote-http) is stuck reading from another FD:

> strace -fp 27317
strace: Process 27317 attached
read(5,

And here FD 5 is another pipe:

~ @8a33a534e2f7> lsof -np 27317 | grep 5r
git-remot 27317 jenkins    5r  FIFO   0,10  0t0 354519159 pipe

Which is the child’s stdout:

> lsof -n 2>/dev/null | fgrep 354519159
git-remot 27317 jenkins    5r  FIFO   0,10  0t0 354519159 pipe
git       27319 jenkins    1w  FIFO   0,10  0t0 354519159 pipe

Hence the deadlock.
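
The lsof cross-referencing above (find every process holding an end of a
given pipe inode) can also be scripted. This is a rough sketch, assuming
a Linux /proc filesystem where /proc/PID/fd/N links read "pipe:[INODE]"
and /proc/PID/fdinfo/N has a "flags:" line in octal; `pipe_holders` is a
hypothetical helper name, and processes we are not allowed to inspect
are silently skipped.

```python
import os


def pipe_holders(inode, proc="/proc"):
    """Return sorted (pid, fd, mode) tuples for processes holding pipe `inode`.

    mode is "r", "w" or "u" (read/write), similar to lsof's FD column suffix.
    """
    target = f"pipe:[{inode}]"
    modes = {os.O_RDONLY: "r", os.O_WRONLY: "w", os.O_RDWR: "u"}
    holders = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) != target:
                    continue
                with open(os.path.join(proc, pid, "fdinfo", fd)) as f:
                    flags = next(
                        int(line.split()[1], 8)
                        for line in f
                        if line.startswith("flags:")
                    )
            except (OSError, StopIteration, ValueError, IndexError):
                continue  # fd vanished or fdinfo was not parseable
            mode = modes.get(flags & os.O_ACCMODE, "?")
            holders.append((int(pid), int(fd), mode))
    return sorted(holders)


# Demo on a pipe created by this process: both ends belong to our own PID.
r, w = os.pipe()
print(pipe_holders(os.fstat(r).st_ino))
```

With the inode numbers from this report, `pipe_holders(354519158)` and
`pipe_holders(354519159)` would be the equivalent of the two
`lsof | fgrep` invocations.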

Stack trace in git-remote-http:

(gdb) bt
#0  0x7f04f1e1363d in read () from target:/lib64/libpthread.so.0
#1  0x562417472d73 in xread ()
#2  0x562417472f2b in read_in_full ()
#3  0x562417438a6e in get_packet_data ()
#4  0x562417439129 in packet_read ()
#5  0x5624174245e0 in rpc_service ()
#6  0x562417424f10 in fetch_git ()
#7  0x5624174233fd in main ()

Stack trace in git fetch-pack:

(gdb) bt
#0  0x7fb3ab478620 in __read_nocancel () from target:/lib64/libpthread.so.0
#1  0x55f688827283 in xread ()
#2  0x55f68882743b in read_in_full ()
#3  0x55f6887ce35e in get_packet_data ()
#4  0x55f6887cea19 in packet_read ()
#5  0x55f6887ceb90 in packet_read_line ()
#6  0x55f68879dd05 in get_ack ()
#7  0x55f68879f6b4 in fetch_pack ()
#8  0x55f688710619 in cmd_fetch_pack ()
#9  0x55f6886dff7b in handle_builtin ()
#10 0x55f6886df026 in main ()

I looked at the diff between v2.9.3 and HEAD on fetch-pack.c and
remote-curl.c and didn’t see anything noteworthy in that area of the
code, so I presume the bug is still there in master.

-- 
Benoit "tsuna" Sigoure