Hi all,
Recently we found a bug in the epoll_wait system call in an lx branded zone 
(UUID 1abb405a-d6a4-11e5-9bb8-bbe3fd09fb3c) on SmartOS version 20160121. It is 
triggered by setting the timeout parameter of epoll_wait to -1: under some 
conditions, epoll_wait then never returns.

Below is our test case:




Attachment: test_epoll.c
Description: test_epoll.c


In a normal Linux environment, the output is:
$ gcc test_epoll.c -lpthread && ./a.out 
Server: listening
Server: got a connection
epoll_fd created 4
Add fb to epoll
epoll: fd 3 events 4
Remove OUT bit
Server: got msg something
put OUT bit back
epoll: fd 3 events 4
Remove OUT bit
Server: got msg something
put OUT bit back
epoll: fd 3 events 4
Remove OUT bit
Server: got msg something
...
But in a SmartOS lx branded zone, the output is:
Server: listening
Server: got a connection
epoll_fd created 4
Add fb to epoll
epoll: fd 3 events 4
Remove OUT bit
Server: got msg something
put OUT bit back
put OUT bit back
put OUT bit back
...
We also checked the SmartOS source code. It seems that this issue is caused by the change (https://github.com/illumos/illumos-gate/commit/f3bb54f387fc03cf651e19bbee54cc88ee51bb29?diff=split), which was made to address Feature #6291 (https://www.illumos.org/issues/6291). Put simply, epoll_wait blocks on the condition variable pc_cv. In older versions, epoll_ctl signals this condition variable when dpwrite is called. Versions after 201510 no longer do this, so nothing wakes epoll_wait and it waits forever.
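
For readers without the attachment, here is a minimal sketch of the scenario described above. It is not the attached test_epoll.c: it uses a socketpair instead of a listening server, and the names epoll_mod_wakes_waiter, waiter and sleep_ms are made up for illustration. One thread blocks in epoll_wait with timeout -1 while another puts the OUT bit back with EPOLL_CTL_MOD. On Linux the waiter should wake up; on an affected lx zone we would expect it to stay blocked.

```c
/* Hypothetical minimal reproduction; NOT the original test_epoll.c. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <stdatomic.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static atomic_int woke; /* set once epoll_wait returns with EPOLLOUT */

/* Waiter thread: block in epoll_wait with timeout -1 (wait forever). */
static void *waiter(void *arg)
{
    int epfd = *(int *)arg;
    struct epoll_event ev;

    if (epoll_wait(epfd, &ev, 1, -1) == 1 && (ev.events & EPOLLOUT))
        atomic_store(&woke, 1);
    return NULL;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Returns 1 if putting EPOLLOUT back with EPOLL_CTL_MOD wakes the thread
 * blocked in epoll_wait(-1), 0 if the waiter stays blocked for about two
 * seconds, and -1 on a setup error.
 */
int epoll_mod_wakes_waiter(void)
{
    int sv[2], epfd, result = -1;
    pthread_t tid;
    struct epoll_event ev = { .events = EPOLLIN }; /* OUT bit removed */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if ((epfd = epoll_create1(0)) < 0)
        goto out_sock;
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) != 0)
        goto out_ep;

    atomic_store(&woke, 0);
    if (pthread_create(&tid, NULL, waiter, &epfd) != 0)
        goto out_ep;
    sleep_ms(200); /* let the waiter block; nothing is readable yet */

    /* Put the OUT bit back. The socket is writable, so the waiter should wake. */
    ev.events = EPOLLIN | EPOLLOUT;
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, sv[0], &ev) == 0) {
        for (int i = 0; i < 20 && !atomic_load(&woke); i++)
            sleep_ms(100);
        result = atomic_load(&woke);
    }

    if (result != 1)
        pthread_cancel(tid); /* waiter is stuck in epoll_wait; cancel it */
    pthread_join(tid, NULL);

out_ep:
    close(epfd);
out_sock:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling epoll_mod_wakes_waiter() from a small main and printing the result is enough to compare the two environments (build with gcc and -lpthread, as for the attached test). The two-second limit and the cancellation are only there so that the sketch terminates instead of hanging.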

Thanks
Jing
