Paul Henry <[email protected]> wrote:
> Hello!
> 
> When using unicorn unicorn v4.8.2, we're seeing a high number of TCP
> retransmits at a high connection rate.
> 
> Smartos version: 20131218T184706Z
> 
> Rackup file used:
> 
> > cat server.ru
> run -> (env) { [200, {"Content-Type" => "text/plain"},
> [Time.now.iso8601(6)]] }
> 
> To start unicorn:
> > bundle exec unicorn -l 9292 server.ru
> 
> To benchmark unicorn:
> >> Benchmark.measure { 1000.times { time=Benchmark.measure { 
> >> open("http://<ip>:9292/api/v1/settings/time",
> "Host" => "api.wanelo.com")}.real; p (time*1000).round(2) if time > 0.05 } }

Thanks for including a small example to reproduce the problem on your
end.  Is that open call is from "open-uri" in the stdlib?
(does it attempt persistent connections?)

> After the total number of connections on the system goes above 8,000
> (16,000 is the average number of connections), we start seeing delays of
> around 1.1 seconds.

I was not able to reproduce the issue with two Linux machines over
my (pretty bad) 100Mbps LAN.

I used the script at the bottom to create a lot of client connections
from my client machine to the machine running unicorn (but this is not
connecting to unicorn, but a scalable server (e.g. nginx) with infinite
persistence).

> We don't see the issue over the local loopback interface, only over the
> net. When using Webrick, we also don't see this issue.

What's strange is the issue does not manifest under Webrick for you.
Which version of Ruby is that Webrick from?

strace-ing "rackup -s webrick server.ru" reveals several differences:

1) webrick uses a listen backlog of only 128 (unicorn uses 1024)
2) webrick does not disable Nagle's algorithm.
3) webrick does not set SO_KEEPALIVE (not really needed for unicorn)

So perhaps you can try config like the following to more closely match
what webrick does:

        listen 9292, backlog: 128, tcp_nodelay: false

On the other hand, maybe webrick is too slow.

> Our tcp initial retransmit interval is 1 second. When the interval is
> reduced, the occasional latency goes down. We also see the retransmits in
> netstat, about 1 - 2 every second.
> 
> Anything that we should look at next?

Try the above config to minimize the differences between webrick
and unicorn.

If that fails, perhaps disabling SO_KEEPALIVE will work, but I'm
a bit lost as I'm not familiar with SmartOS quirks.
(you'll need to comment it out in lib/unicorn/socket_helper.rb)

Maybe try a little mock server like this, too (should be fastest :)
---------------------------- hello_world.rb ----------------
require 'socket'
s = TCPServer.new(host, port)
# start changing knobs here:
# s.setsockopt(:SOL_SOCKET, :SO_KEEPALIVE, 1)
# s.setsockopt(:IPPROTOL_TCP, :TCP_NODELAY, 1)
res = "HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nhello world\n"
junk = ""
loop do
  c = s.accept
  c.readpartial(1024, junk)
  c.write(res)
  c.close
end
----------------------------- many.rb --------------------------
# opens a lot of idle connections, be careful :)
require 'socket'
pids = []
host = '10.45.14.175'
port = 7500 # not unicorn
at_exit { pids.each { |pid| Process.kill(:TERM, pid) } }

24.times do
  pid = fork do
    keep = []
    begin
      s = TCPSocket.new(host, port)
      # put something in the socket buffers
      s.write("GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
      keep << s
    rescue => e
      $stdout.syswrite("#$$ done (#{keep.size}): #{e.message}\n")
      sleep
    end while true
  end
  pids << pid
end
p Process.waitall
-- 
EW

Reply via email to