Re: epoll with ONESHOT possibly fails to deliver events

2012-12-21 Thread Eric Wong
"Junchang(Jason) Wang" wrote: > We still believe this is a bug in epoll system even though we can't > prove that so far. Both Andi and I are very interested in this problem > and helping you experts solve it. Just let us know if we can > help. I'm just another epoll user, definitely not an

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-21 Thread Andreas Voellmy
Hi Eric, Thanks again for looking at our bug report. I agree with Jason's comments: the bug occurs independently of the socketCheck function; this function waits long enough for the server to stop serving any more requests and then checks the sockets to find out which ones still have data

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-21 Thread Eric Wong
Junchang(Jason) Wang junchang.w...@yale.edu wrote: We still believe this is a bug in epoll system even though we can't prove that so far. Both Andi and I are very interested in this problem and helping you experts solve it. Just let us know if we can help. I'm just another epoll user,

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-20 Thread Junchang(Jason) Wang
On Thu, Dec 20, 2012 at 4:32 PM, Eric Wong wrote: > Andreas Voellmy wrote: >> I wrote a C program that behaves similar to my original program and >> triggers the bug. The bug only arises when I use enough cores and >> threads (about 16). The program is here: >>

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-20 Thread Eric Wong
Andreas Voellmy wrote: > I wrote a C program that behaves similar to my original program and > triggers the bug. The bug only arises when I use enough cores and > threads (about 16). The program is here: > https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c I finally took a closer

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-20 Thread Eric Wong
Andreas Voellmy andreas.voel...@yale.edu wrote: I wrote a C program that behaves similar to my original program and triggers the bug. The bug only arises when I use enough cores and threads (about 16). The program is here: https://github.com/AndreasVoellmy/epollbug/blob/master/epollbug.c I

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-20 Thread Junchang(Jason) Wang
On Thu, Dec 20, 2012 at 4:32 PM, Eric Wong normalper...@yhbt.net wrote: Andreas Voellmy andreas.voel...@yale.edu wrote: I wrote a C program that behaves similar to my original program and triggers the bug. The bug only arises when I use enough cores and threads (about 16). The program is here:

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-19 Thread Andreas Voellmy
We (i.e. I together with my colleague Jason Wang, cc'ed) installed the latest stable kernel (3.7.1) and verified that the bug still occurs. The bug occurs when testing the program across a network link and when testing on the loopback interface. We also noticed that when testing across the

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-18 Thread Andreas Voellmy
BTW, I simplified the test program a bit: I removed the loop that epoll_waits on the eventfd fd and reads from it (I also removed the epoll instance in that loop). The bug still occurs with this removed. Now the bug is triggered simply by adding the call to eventfd_write after processing each

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-17 Thread Andreas Voellmy
On Dec 17, 2012, at 9:07 PM, Eric Wong wrote: > Andreas Voellmy wrote: >> There were a couple of errors in the code when I posted my last >> message. I have fixed those. The epoll bug still occurs. > > Sorry I haven't gotten around to this. > > Can you reproduce this with fewer cores? (I

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-17 Thread Eric Wong
Andreas Voellmy wrote: > There were a couple of errors in the code when I posted my last > message. I have fixed those. The epoll bug still occurs. Sorry I haven't gotten around to this. Can you reproduce this with fewer cores? (I only have 4 at most). Have you tried the latest stable kernel

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-17 Thread Eric Wong
Andreas Voellmy andreas.voel...@yale.edu wrote: There were a couple of errors in the code when I posted my last message. I have fixed those. The epoll bug still occurs. Sorry I haven't gotten around to this. Can you reproduce this with fewer cores? (I only have 4 at most). Have you tried the

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-17 Thread Andreas Voellmy
On Dec 17, 2012, at 9:07 PM, Eric Wong normalper...@yhbt.net wrote: Andreas Voellmy andreas.voel...@yale.edu wrote: There were a couple of errors in the code when I posted my last message. I have fixed those. The epoll bug still occurs. Sorry I haven't gotten around to this. Can you

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-15 Thread Andreas Voellmy
There were a couple of errors in the code when I posted my last message. I have fixed those. The epoll bug still occurs. -Andi On Dec 13, 2012, at 7:16 PM, Andreas Voellmy wrote: > I believe I have found a bug in epoll. This bug causes the behavior I > described in earlier emails. The bug

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-15 Thread Andreas Voellmy
There were a couple of errors in the code when I posted my last message. I have fixed those. The epoll bug still occurs. -Andi On Dec 13, 2012, at 7:16 PM, Andreas Voellmy andreas.voel...@yale.edu wrote: I believe I have found a bug in epoll. This bug causes the behavior I described in

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Andreas Voellmy
I believe I have found a bug in epoll. This bug causes the behavior I described in earlier emails. The bug is caused by the interaction of epoll instances which share no files in common. I wrote a C program that behaves similar to my original program and triggers the bug. The bug only arises

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Phil Turmel
On 12/13/2012 04:32 AM, Eric Wong wrote: > Andreas Voellmy wrote: [trim /] >>> Another thread, distinct from all of the threads serving particular >>> sockets, is performing epoll_wait calls. When sockets are returned as >>> being ready from an epoll_wait call, the thread signals to the >>>

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Phil Turmel
On 12/13/2012 07:08 PM, Phil Turmel wrote: > On 12/13/2012 04:32 AM, Eric Wong wrote: >> Andreas Voellmy wrote: > > [trim /] > Another thread, distinct from all of the threads serving particular sockets, is performing epoll_wait calls. When sockets are returned as being ready from

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Andreas Voellmy
Hi Eric, On Dec 13, 2012, at 4:32 AM, Eric Wong wrote: > Andreas Voellmy wrote: > >>> Another thread, distinct from all of the threads serving particular >>> sockets, is performing epoll_wait calls. When sockets are returned as >>> being ready from an epoll_wait call, the thread signals to

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Eric Wong
Andreas Voellmy wrote: > Using strace, I checked that my program is using epoll api as I > described. Here is a fragment of the strace output that demonstrates > my use: > > recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90 > sendto(161, "HTTP/1.1 200 OK\r\nDate:

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Eric Wong
Andreas Voellmy andreas.voel...@yale.edu wrote: Using strace, I checked that my program is using epoll api as I described. Here is a fragment of the strace output that demonstrates my use: recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90 sendto(161, "HTTP/1.1 200

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Andreas Voellmy
Hi Eric, On Dec 13, 2012, at 4:32 AM, Eric Wong normalper...@yhbt.net wrote: Andreas Voellmy andreas.voel...@yale.edu wrote: Another thread, distinct from all of the threads serving particular sockets, is performing epoll_wait calls. When sockets are returned as being ready from an

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Phil Turmel
On 12/13/2012 07:08 PM, Phil Turmel wrote: On 12/13/2012 04:32 AM, Eric Wong wrote: Andreas Voellmy andreas.voel...@yale.edu wrote: [trim /] Another thread, distinct from all of the threads serving particular sockets, is performing epoll_wait calls. When sockets are returned as being

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-13 Thread Phil Turmel
On 12/13/2012 04:32 AM, Eric Wong wrote: Andreas Voellmy andreas.voel...@yale.edu wrote: [trim /] Another thread, distinct from all of the threads serving particular sockets, is performing epoll_wait calls. When sockets are returned as being ready from an epoll_wait call, the thread signals

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-12 Thread Andreas Voellmy
Hi list, Using strace, I checked that my program is using epoll api as I described. Here is a fragment of the strace output that demonstrates my use: recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90 sendto(161, "HTTP/1.1 200 OK\r\nDate: Tue, 09 O"..., 323, 0,

Re: epoll with ONESHOT possibly fails to deliver events

2012-12-12 Thread Andreas Voellmy
Hi list, Using strace, I checked that my program is using epoll api as I described. Here is a fragment of the strace output that demonstrates my use: recvfrom(161, "GET / HTTP/1.1\r\nHost: 10.12.0.1:"..., 90, 0, NULL, NULL) = 90 sendto(161, "HTTP/1.1 200 OK\r\nDate: Tue, 09 O"..., 323, 0, NULL,

epoll with ONESHOT possibly fails to deliver events

2012-12-11 Thread Andreas Voellmy
Hi list, I am using epoll for the Linux (version 3.4.0) implementation of the event notification subsystem of GHC's (Glasgow Haskell Compiler) RTS (runtime system). I am running into a bug that has only popped up using many cores (> 16) and under particular kind of load. I've been debugging

epoll with ONESHOT possibly fails to deliver events

2012-12-11 Thread Andreas Voellmy
Hi list, I am using epoll for the Linux (version 3.4.0) implementation of the event notification subsystem of GHC's (Glasgow Haskell Compiler) RTS (runtime system). I am running into a bug that has only popped up using many cores (> 16) and under particular kind of load. I've been debugging for