** Tags added: kernel-bug-break-fix

** Description changed:

  In a multi-threaded pthreads process running on Ubuntu 14.04 AMD64 (with
  over 1000 threads) which uses real-time FIFO scheduling, we occasionally
  see calls to recv() with flags (MSG_PEEK | MSG_WAITALL) get stuck in an
  infinite loop or deadlock. The affected threads lock up inside recv() and,
  because of the FIFO scheduling, chew as much CPU as they can.
  
  Here's an example gdb back trace:
  
  [Switching to thread 4 (Thread 0x7f6040546700 (LWP 27251))]
  #0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146, buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  33      ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
  (gdb) bt
  #0  0x00007f6231d2f7eb in __libc_recv (fd=fd@entry=146, buf=buf@entry=0x7f6040543600, n=n@entry=5, flags=-1, flags@entry=258) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
  #1  0x0000000000421945 in recv (__flags=258, __n=5, __buf=0x7f6040543600, __fd=146) at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
  [snip]
  
  The socket is a TCP socket in blocking mode. The recv() call is inside
  an outer loop with a counter; I've checked the counter with gdb and it's
  always at 1, so I'm sure that the outer loop isn't the problem: the
  thread is indeed deadlocked inside the recv() internals.
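  
  For reference, the call pattern described above can be sketched roughly
  as follows. This is a minimal illustration, not the reporter's code: the
  helper name peek_exact() and the max_attempts parameter are hypothetical
  stand-ins for the "outer loop with a counter". On Linux, MSG_PEEK (0x2)
  combined with MSG_WAITALL (0x100) gives 258, which matches the
  flags@entry=258 value in the backtrace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from the report): peek at exactly n bytes on a
 * blocking socket without consuming them. Returns n on success, 0 if the
 * peer closed the connection, -1 on error or when attempts run out. */
ssize_t peek_exact(int fd, void *buf, size_t n, int max_attempts)
{
    assert(n > 0 && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        /* On Linux, MSG_PEEK | MSG_WAITALL == 0x2 | 0x100 == 258. */
        ssize_t got = recv(fd, buf, n, MSG_PEEK | MSG_WAITALL);

        if (got == (ssize_t)n)
            return got;              /* all n bytes are queued */
        if (got == 0)
            return 0;                /* orderly shutdown by the peer */
        if (got < 0 && errno != EINTR)
            return -1;               /* real error, errno is set */
        /* Short peek or EINTR: go around the outer loop again. */
    }

    errno = EAGAIN;                  /* gave up after max_attempts */
    return -1;
}
```

  Because MSG_PEEK leaves the data queued, a second call with the same
  arguments should return the same bytes again.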
  
- Other notes: 
+ Other notes:
  * There always seem to be 2 or more threads deadlocked in the same place
    (same recv() call but with distinct FDs)
  * The threads calling recv() have cancellation disabled by previously
    executing: pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
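  
  A minimal sketch of disabling cancellation for the calling thread is
  shown below. The wrapper name disable_cancellation() is hypothetical.
  It passes a real pointer for the old state rather than NULL, since
  POSIX does not specify the behaviour for a NULL old-state argument
  (Linux permits it).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper (not from the report): disable cancellation for the
 * calling thread and hand back the previous state so it can be restored.
 * Returns 0 on success or an error number, like pthread_setcancelstate(). */
int disable_cancellation(int *old_state)
{
    assert(old_state != NULL);  /* caller supplies storage for the old state */
    return pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, old_state);
}
```

  With cancellation disabled, a cancellation request against a thread
  blocked in recv() stays pending until cancellation is enabled again.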
  
  I've even tried adding a poll() call for POLLRDNORM on the socket before
  calling recv() with MSG_PEEK | MSG_WAITALL flags to try to make sure
  there's data available on the socket before calling recv(), but it
  makes no difference.
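  
  The poll()-then-peek workaround described above might look like the
  sketch below. The function name poll_then_peek() and the timeout_ms
  parameter are assumptions made for illustration. Note that poll() only
  reports that some data is readable (by default at least one byte), so
  it cannot by itself guarantee that all n bytes requested with
  MSG_WAITALL are already queued.

```c
#define _POSIX_C_SOURCE 200809L  /* for POLLRDNORM under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from the report): wait until the socket is
 * readable, then peek at n bytes. Returns the recv() result, or -1 with
 * errno set to EAGAIN if poll() timed out. */
ssize_t poll_then_peek(int fd, void *buf, size_t n, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLRDNORM, .revents = 0 };
    int rc;

    assert(n > 0);

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted */

    if (rc < 0)
        return -1;                        /* poll() failed, errno is set */
    if (rc == 0) {
        errno = EAGAIN;                   /* nothing readable in time */
        return -1;
    }

    /* Readable means "some data", not "n bytes": with MSG_WAITALL this
     * recv() can still block until all n bytes have arrived. */
    return recv(fd, buf, n, MSG_PEEK | MSG_WAITALL);
}
```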
  
  So I don't know what is wrong here. I've read all the recv()
  documentation and believe that recv() is being used correctly. The only
  conclusion I can come to is that there is a bug in libc recv() when
  using flags MSG_PEEK | MSG_WAITALL with thousands of pthreads running.
+ 
+ ===
+ break-fix: - dfbafc995304ebb9a9b03f65083e6e9cea143b20

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1486146

Title:
  recvfrom SYSCALL infinite loop/deadlock chewing 100% CPU
  (MSG_PEEK|MSG_WAITALL)

