Paul Henry <[email protected]> wrote:
> Hello!
>
> When using unicorn v4.8.2, we're seeing a high number of TCP
> retransmits at a high connection rate.
>
> Smartos version: 20131218T184706Z
>
> Rackup file used:
>
> > cat server.ru
> run -> (env) { [200, {"Content-Type" => "text/plain"},
> [Time.now.iso8601(6)]] }
>
> To start unicorn:
> > bundle exec unicorn -l 9292 server.ru
>
> To benchmark unicorn:
> >> Benchmark.measure { 1000.times { time=Benchmark.measure {
> >> open("http://<ip>:9292/api/v1/settings/time",
> >> "Host" => "api.wanelo.com")}.real; p (time*1000).round(2) if time > 0.05 } }

Thanks for including a small example to reproduce the problem on your
end.  Is that open call from "open-uri" in the stdlib?
(Does it attempt persistent connections?)

> After the total number of connections on the system goes above 8,000
> (16,000 is the average number of connections), we start seeing delays of
> around 1.1 seconds.

I was not able to reproduce the issue with two Linux machines over
my (pretty bad) 100Mbps LAN.

I used the many.rb script at the bottom to open a lot of client
connections from my client machine to the machine running unicorn.
Those connections do not go to unicorn itself, but to a scalable
server (e.g. nginx) with infinite persistence.

> We don't see the issue over the local loopback interface, only over the
> net. When using Webrick, we also don't see this issue.

What's strange is that the issue does not manifest under Webrick for you.
Which version of Ruby is that Webrick from?

strace-ing "rackup -s webrick server.ru" reveals several differences:

1) webrick uses a listen backlog of only 128 (unicorn uses 1024)
2) webrick does not disable Nagle's algorithm
3) webrick does not set SO_KEEPALIVE (not really needed for unicorn)

So perhaps you can try a config like the following to more closely match
what webrick does:

	listen 9292, backlog: 128, tcp_nodelay: false

On the other hand, maybe webrick is simply too slow to trigger the problem.
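
For reference, here is roughly what those two listen options mean at the
socket level.  This is only a sketch on a plain TCPServer, not unicorn's
actual code; the loopback address and port 0 ("any free port") are
placeholders I picked for illustration.

```ruby
require 'socket'

# Placeholder address/port: port 0 asks the kernel for any free port.
s = TCPServer.new('127.0.0.1', 0)

# backlog: 128 (calling listen again on a listening socket updates the
# backlog, at least on Linux)
s.listen(128)

# tcp_nodelay: false (leave Nagle's algorithm enabled)
s.setsockopt(:IPPROTO_TCP, :TCP_NODELAY, 0)

# read the option back to see what the kernel reports
nodelay = s.getsockopt(:IPPROTO_TCP, :TCP_NODELAY).int
puts "TCP_NODELAY=#{nodelay}"
s.close
```

The same getsockopt call can be dropped into the mock server below to
double-check whichever knobs you end up changing.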
> Our tcp initial retransmit interval is 1 second. When the interval is
> reduced, the occasional latency goes down. We also see the retransmits in
> netstat, about 1 - 2 every second.
>
> Anything that we should look at next?

Try the above config to minimize the differences between webrick
and unicorn.

If that fails, perhaps disabling SO_KEEPALIVE will help (you'll need to
comment it out in lib/unicorn/socket_helper.rb), but I'm a bit lost as
I'm not familiar with SmartOS quirks.

Maybe try a little mock server like this, too (should be fastest :)

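---------------------------- hello_world.rb ----------------
require 'socket'
host = '0.0.0.0' # placeholder: address to listen on
port = 9292      # placeholder: same port as your unicorn example
s = TCPServer.new(host, port)

# start changing knobs here:
# s.setsockopt(:SOL_SOCKET, :SO_KEEPALIVE, 1)
# s.setsockopt(:IPPROTO_TCP, :TCP_NODELAY, 1)

# Content-Length matches the 12-byte body ("hello world\n")
res = "HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nhello world\n"
junk = ""
loop do
  c = s.accept
  c.readpartial(1024, junk) # read (and discard) the request
  c.write(res)
  c.close
end
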
----------------------------- many.rb --------------------------
# opens a lot of idle connections, be careful :)
require 'socket'
pids = []
host = '10.45.14.175'
port = 7500 # not unicorn
at_exit { pids.each { |pid| Process.kill(:TERM, pid) } }
24.times do
  pid = fork do
    keep = []
    begin
      s = TCPSocket.new(host, port)
      # put something in the socket buffers
      s.write("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
      keep << s
    rescue => e
      $stdout.syswrite("#$$ done (#{keep.size}): #{e.message}\n")
      sleep
    end while true
  end
  pids << pid
end
p Process.waitall
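
To time requests against the mock server (or unicorn) without open-uri
in the picture, something like the following might help.  It is a rough
sketch: the probe and slow_requests names are made up here, it opens one
connection per request with no keepalive, and the 0.05s threshold simply
mirrors your benchmark above.

```ruby
require 'socket'
require 'benchmark'

# One request per connection, no keepalive.  Returns elapsed seconds
# covering connect, request, response and close.
def probe(host, port)
  Benchmark.realtime do
    c = TCPSocket.new(host, port)
    c.write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    c.read # read until the server closes the connection
    c.close
  end
end

# Runs `count` probes and returns the timings (in seconds) that
# exceeded `threshold`.
def slow_requests(host, port, count: 1000, threshold: 0.05)
  count.times.map { probe(host, port) }.select { |t| t > threshold }
end

# Example (host and port are whatever your server listens on):
#   slow_requests(host, 9292).each { |t| p((t * 1000).round(2)) }
```

If the ~1.1 second delays still show up with this client against
hello_world.rb, that would point away from both unicorn and open-uri.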
--
EW