This is a conservative workaround to prevent missed wakeups on EPOLLONESHOT-based systems running old Linux kernels. Unfortunately some users cannot upgrade kernels as easily as they can userland software, so we'll provide this workaround for them.

Newer Linux kernels are immune to the race and do not need the
workaround:
  3.8+, 2.6.32.61+, 2.6.34.15+, 3.0.59+, 3.2.37+, 3.4.26+, 3.5.7.3+, 3.7.3+

ref: Linux kernel commit 128dd1759d96ad36c379240f8b9463e8acfd37a1
("epoll: prevent missed events on EPOLL_CTL_MOD")

This kernel bug was probably never triggered by common level-triggered
epoll users, and may not have been exposed in less-scalable, older
kernel versions or systems with few CPUs.
---
  I'm undecided on whether I want to push this to yahns master, but
  this message will at least serve as documentation in case anybody
  encounters this rare, tiny race condition.

  When I first heard of this bug on Linux 3.7, I tried for hours to
  reproduce this on a 4-core machine with no luck...

  My personal preference is to push people to newer kernels; but
  that's not always practical, unfortunately :<

 lib/yahns/queue_epoll.rb | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
 test/test_queue.rb       | 25 ++++++++++++++++++++++
 2 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 test/test_queue.rb

diff --git a/lib/yahns/queue_epoll.rb b/lib/yahns/queue_epoll.rb
index 4a10ce0..208167a 100644
--- a/lib/yahns/queue_epoll.rb
+++ b/lib/yahns/queue_epoll.rb
@@ -55,9 +55,9 @@ class Yahns::Queue < SleepyPenguin::Epoll::IO # :nodoc:
           # thread only until epoll_ctl is called on it.
           case rv = io.yahns_step
           when :wait_readable
-            epoll_ctl(Epoll::CTL_MOD, io, QEV_RD)
+            epoll_ctl_mod(io, QEV_RD)
           when :wait_writable
-            epoll_ctl(Epoll::CTL_MOD, io, QEV_WR)
+            epoll_ctl_mod(io, QEV_WR)
           when :ignore # only used by rack.hijack
             # we cannot call Epoll::CTL_DEL after hijacking, the hijacker
             # may have already closed it  Likewise, io.fileno is not
@@ -76,4 +76,54 @@ class Yahns::Queue < SleepyPenguin::Epoll::IO # :nodoc:
       end while true
     end
   end
+
+  # work around EPOLL_CTL_MOD raciness with EPOLLONESHOT on SMP systems.
+  # ref: Linux commit 128dd1759d96ad36c379240f8b9463e8acfd37a1
+  # ("epoll: prevent missed events on EPOLL_CTL_MOD")
+  # We'll be conservative and assume bugginess on older kernels.
+  def self.epoll_ctl_mod_buggy?(uname)
+    # maybe somebody ported epoll to non-Linux, assume it works:
+    uname[:sysname] == "Linux" or return false
+
+    # converts a version array (e.g. %w(2 6 32 61)) into an integer,
+    # no official Linux kernel version component exceeds 255 currently.
+    ver = lambda { |*v|
+      v[0] << 24 | (v[1] || 0) << 16 | (v[2] || 0) << 8 | (v[3] || 0)
+    }
+    release = uname[:release].split(/\./).map(&:to_i)
+    cur = ver[*release]
+
+    # all 3.8+ kernels are good (not buggy)
+    return false if cur >= ver[3,8]
+
+    # some stable versions have the relevant patch backported,
+    # most of these are on kernel.org:
+    # git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
+    # 3.5.7 stable is from: git://kernel.ubuntu.com/ubuntu/linux.git
+
+    curpfx = cur >> 16 # X.Y
+    [ ver[3,7,3], ver[3,4,26], ver[3,2,37], ver[3,0,59] ].each do |minver|
+      minpfx = minver >> 16
+      return false if minpfx == curpfx && cur >= minver
+    end
+
+    curpfx = cur >> 8 # X.Y.Z
+    [ ver[3,5,7,3], ver[2,6,34,15], ver[2,6,32,61] ].each do |minver|
+      minpfx = minver >> 8
+      return false if minpfx == curpfx && cur >= minver
+    end
+    true
+  end
+
+  if epoll_ctl_mod_buggy?(Etc.uname)
+    # slow and safe for systems missing the necessary memory barrier
+    def epoll_ctl_mod(io, flag)
+      epoll_ctl(Epoll::CTL_DEL, io, 0)
+      epoll_ctl(Epoll::CTL_ADD, io, flag)
+    end
+  else
+    def epoll_ctl_mod(io, flag)
+      epoll_ctl(Epoll::CTL_MOD, io, flag)
+    end
+  end
 end
diff --git a/test/test_queue.rb b/test/test_queue.rb
new file mode 100644
index 0000000..32b8b76
--- /dev/null
+++ b/test/test_queue.rb
@@ -0,0 +1,25 @@
+# Copyright (C) 2014, all contributors (see git://yhbt.net/yahns.git history)
+# License: GPLv3 or later (https://www.gnu.org/licenses/gpl-3.0.txt)
+require_relative 'helper'
+
+class TestQueue < Testcase
+  def test_ep_buggy
+    uname = Etc.uname
+
+    if uname[:sysname] == "Linux"
+      %w(2.4.19 3.7.2 3.4.25 3.2.36 3.0.58 2.6.0 2.6.32 2.6.32.60).each do |v|
+        uname[:release] = v
+        assert Yahns::Queue.epoll_ctl_mod_buggy?(uname), "#{v} is buggy"
+      end
+
+      %w(3.8 3.8.1 3.16.2 4.0 2.6.32.61 3.5.7.3 3.7.3).each do |v|
+        uname[:release] = v
+        refute Yahns::Queue.epoll_ctl_mod_buggy?(uname), "#{v} is not buggy"
+      end
+    end
+
+    uname[:sysname] = "Hurd"
+    uname[:release] = "0.1.0"
+    refute Yahns::Queue.epoll_ctl_mod_buggy?(uname), "Hurd is never buggy :)"
+  end if Yahns::Queue.respond_to?(:epoll_ctl_mod_buggy?)
+end
-- 
EW
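
For anybody who wants to poke at the version comparison outside of
yahns, here is a minimal standalone sketch of the packing trick used by
epoll_ctl_mod_buggy? above.  pack_version and parse_release are
hypothetical names chosen for illustration and are not part of the
patch; like the patch, the sketch assumes no version component exceeds
255.

```ruby
# Hypothetical helper (not in the patch): pack up to four version
# components into one Integer, 8 bits each, so versions compare with >=.
# Assumes every component fits in 0..255, as the patch comment does.
def pack_version(*parts)
  a, b, c, d = parts
  (a || 0) << 24 | (b || 0) << 16 | (c || 0) << 8 | (d || 0)
end

# Hypothetical helper: parse a uname-style release string.
# String#to_i stops at the first non-digit, so "0-24-generic" yields 0.
def parse_release(release)
  pack_version(*release.split('.').map(&:to_i))
end

cur = parse_release("3.13.0-24-generic")
newer_than_fix = cur >= pack_version(3, 8)

# Backport check within one stable series: compare the X.Y prefix first,
# then the full packed value, mirroring the `>> 16` step in the patch.
old = parse_release("3.7.2")
fix = pack_version(3, 7, 3)
same_series = (old >> 16) == (fix >> 16)
has_backport = same_series && old >= fix

puts "3.13.0 past 3.8: #{newer_than_fix}"
puts "3.7.2 has backport: #{has_backport}"
```

The prefix comparison matters because a plain >= against 3.4.26 would
also accept an unpatched 3.5.0 or 3.6.x kernel; matching the series
first keeps each backport threshold confined to its own stable branch.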