Re: after_worker_exit on murder

2017-04-04 Thread Simon Eskildsen
You're right, I misread the code—I thought the wait(2) would happen in
the kill worker call, but the reap worker call takes care of that.

Thanks for clearing that up Jeremy!



Re: [PATCH] check_client_connection: use tcp state on linux

2017-03-21 Thread Simon Eskildsen
Looks like Puma encountered issues with some Linux distros' kernels
not supporting this. That's crazy...

https://github.com/puma/puma/issues/1241



Re: check_client_connection using getsockopt(2)

2017-03-21 Thread Simon Eskildsen
On Wed, Feb 22, 2017 at 8:42 PM, Eric Wong <e...@80x24.org> wrote:
> Simon Eskildsen <simon.eskild...@shopify.com> wrote:
>
>  Thanks for the writeup!
>
> Another sidenote: It seems nginx <-> unicorn is a bit odd
> for deployment in a containerized environment(*).
>
>> I meant to ask, in Raindrops why do you use the netlink API to get the
>> socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
>> `tcpi_unacked`? (as described in
>> http://www.ryanfrantz.com/posts/apache-tcp-backlog/) We use this to
>> monitor socket backlogs with a sidekick Ruby daemon. Although we're
>> looking to replace it with a middleware to simplify for Kubernetes.
>> It's one of our main metrics for monitoring performance, especially
>> around deploys.
>
> The netlink API allows independently-spawned processes to
> monitor others; so it can be system-wide.  TCP_INFO requires the
> process doing the checking to also have the socket open.
>
> I guess this reasoning for using netlink is invalid for containers,
> though...

If you namespace the network, it's problematic, yeah. Right now I'm
considering putting Raindrops in a middleware that uses the netlink
API inside the container, but it feels weird. That said, if you
consider the alternative of using `getsockopt(2)` on the listening
socket, I don't know how you'd get access to the Unicorn listening
socket from a middleware. Would it be nuts to expose a hook in
Unicorn that allows periodic execution, so Raindrops could monitor
listening stats on the listening socket? It seems like a narrow
use-case; on the other hand I'm not a fan of doing
`Raindrops::Linux.tcp_listener_stats("localhost:#{ENV["PORT"]}")`
either, though that might be less ugly.
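
To make the middleware idea concrete, here is a minimal sketch,
assuming the listener address comes from ENV["PORT"] as above and
using $stderr as a stand-in for whatever metrics emitter we'd really
use:

require 'raindrops'

# Minimal sketch: query the netlink API for the listener's stats on
# every request and emit them somewhere. The default address and the
# $stderr reporting are placeholder assumptions.
class ListenerStats
  def initialize(app, addr = "127.0.0.1:#{ENV["PORT"]}")
    @app = app
    @addr = addr
  end

  def call(env)
    stats = Raindrops::Linux.tcp_listener_stats([@addr])[@addr]
    $stderr.puts "listener=#{@addr} active=#{stats.active} " \
      "queued=#{stats.queued}"
    @app.call(env)
  end
end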

>
>> I was going to use `env["unicorn.socket"]`/`env["puma.socket"]`, but
>> you could also do `env.delete("hijack_io")` after hijacking to allow
>> Unicorn to still render the response. Unfortunately the
>> `.socket` key is not part of the Rack standard, so I'm
>> hesitant to use it. When this gets into Unicorn I'm planning to
>> propose it upstream to Puma as well.
>
> I was going to say env.delete("rack.hijack_io") is dangerous
> (since env could be copied by middleware); but apparently not:
> rack.hijack won't work with a copied env, either.
> You only need to delete with the same env object you call
> rack.hijack with.
>
> But calling rack.hijack followed by env.delete may still
> have unintended side-effects in other servers; so I guess
> we (again) cannot rely on hijack working portably.

Exactly, it gives the illusion of portability, but Puma, for
example, stores an instance variable to check whether a middleware
hijacked, rendering the `env#delete` option useless.
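
For context, the pattern we're discussing would look roughly like
this (the inspection step is hypothetical):

io = env['rack.hijack'].call
inspect_socket_state(io)     # hypothetical TCP_INFO check
env.delete('rack.hijack_io') # hands control back in unicorn, not puma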

>
>> Cool. How would you suggest I check for TCP_INFO compatible platforms
>> in Unicorn? Is `RUBY_PLATFORM.end_with?("linux".freeze)` sufficient
>> or do you prefer another mechanism? I agree that we should fall back
>> to the write hack on other platforms.
>
> The Raindrops::TCP_Info class should be undefined on unsupported
> platforms, so I think you can check for that, instead.  Then it
> should be transparent to add FreeBSD support from unicorn's
> perspective.

Perfect. I'll start working on a patch.
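
Presumably the check ends up looking something like:

# Feature-detect instead of sniffing RUBY_PLATFORM: Raindrops only
# defines TCP_Info on platforms that support getsockopt(2)/TCP_INFO.
if defined?(Raindrops::TCP_Info)
  # read the socket state via Raindrops::TCP_Info
else
  # fall back to the early-write hack
end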

>
>
> (*) So I've been wondering if adding a "unicorn-mode" to an
> existing C10K, slow-client-capable Ruby/Rack server +
> reverse proxy makes sense for containerized deploys.
> Maybe...

I'd love to hear more about this idea. What are you contemplating?



Re: [PATCH] check_client_connection: use tcp state on linux

2017-03-21 Thread Simon Eskildsen
> I prefer we use a hash or case statement.  Both allow more
> optimization in the YARV VM of CRuby (opt_aref and
> opt_case_dispatch in insns.def).  case _might_ be a little
> faster if there's no constant lookup overhead, but
> a microbench or dumping the bytecode will be necessary
> to be sure :)
>
> A hash or a case can also help portability-wise in case
> we hit a system where these numbers are non-sequential;
> or if we forgot something.

Good point. I double-checked all the states on Linux and found that
we were missing TCP_CLOSING [1] [2]. This is the state where both
sides have shut down but there is still buffered data to finish
sending. It doesn't seem like this would ever happen in Unicorn, but
I think we should include it for completeness. This also means the
range becomes non-sequential. I looked at FreeBSD [3] and illumos
(Solaris-derived) [4], and for the TCP states we're interested in
they also appear to have non-contiguous ranges.

My co-worker, Kir Shatrov, benchmarked a bunch of approaches to the
state check and found that a case statement is a good solution [5].
Since non-sequential states really do occur in common operating
systems, I think case is the way to go here, as you suggested. I've
made sure to short-circuit the common case of TCP_ESTABLISHED. I've
only seen CLOSE_WAIT in testing, but in the wild at large production
scale I would assume you'd also see TIME_WAIT and CLOSE. LAST_ACK
seems pretty unlikely to hit, but not impossible; as with CLOSING,
I've included it for completeness.
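
For reference, the shape of the check I have in mind (a sketch only;
the method name is a placeholder and the state numbers come from the
Linux header in [1]):

# Linux TCP states: 1 = ESTABLISHED, ..., 6 = TIME_WAIT, 7 = CLOSE,
# 8 = CLOSE_WAIT, 9 = LAST_ACK, 10 = LISTEN, 11 = CLOSING
def socket_dead?(state)
  case state
  when 1 # TCP_ESTABLISHED: short-circuit the common case
    false
  when 6, 7, 8, 9, 11 # TIME_WAIT, CLOSE, CLOSE_WAIT, LAST_ACK, CLOSING
    true
  else
    false
  end
end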

[1] 
https://github.com/torvalds/linux/blob/5924bbecd0267d87c24110cbe2041b5075173a25/include/net/tcp_states.h#L27
[2] 
https://github.com/torvalds/linux/blob/ca78d3173cff3503bcd15723b049757f75762d15/net/ipv4/tcp.c#L228
[3] 
https://github.com/freebsd/freebsd/blob/386ddae58459341ec567604707805814a2128a57/sys/netinet/tcp_fsm.h
[4] 
https://github.com/illumos/illumos-gate/blob/f7877f5d39900cfd8b20dd673e5ccc1ef7cc7447/usr/src/uts/common/netinet/tcp_fsm.h
[5] https://gist.github.com/kirs/11ba4ce84c08188c9f7eba9c639616a5

> Yep, we need to account for the UNIX socket case.  I forget if
> kgio even makes them different...

I read the implementation and verified by dumping the class while
testing on some test boxes. You are right—it's a simple Kgio::Socket
object, not differentiating between Kgio::TCPSocket and
Kgio::UNIXSocket at the class level. Kgio only differentiates if
those classes are explicitly passed in to override the class
returned from #try_accept, and Unicorn doesn't do that.

I've tried to find a way to determine the socket domain (INET vs.
UNIX) on the socket object, but neither Ruby's Socket class nor Kgio
seems to expose this, and I'm not sure what the simplest way to do
the check would be. We could have the accept loop pass the correct
class to #try_accept based on the listening socket that came back
from #accept (see the sketch below). If we passed the listening
socket to #read after accept, we'd know too, but I don't like the
request knowing about the listener either. Alternatively, we could
expose the socket domain in Kgio, but that'll be problematic in the
near-ish future, as you've mentioned wanting to move away from Kgio
now that Ruby's own IO library has reached parity as of Ruby 2.4.
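
A sketch of that accept-loop option, relying on the class-override
argument to #kgio_tryaccept mentioned above (whether this fits
cleanly into Unicorn's actual accept loop is the open question):

# Choose the accepted-socket class per listener so that http_request
# can later gate the TCP_INFO check on client.kind_of?(Kgio::TCPSocket).
klass = listener.kind_of?(Kgio::UNIXServer) ?
  Kgio::UNIXSocket : Kgio::TCPSocket
client = listener.kgio_tryaccept(klass)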

What do you suggest pursuing here to check whether the client socket
is a TCP socket?

Below is a patch addressing the other concerns. I had to require
raindrops in http_request so the `defined?` check would do the right
thing, as the only other file that requires Raindrops is the worker
file, which is loaded after http_request. I can change the load
order or require raindrops in lib/unicorn.rb if you prefer.

Missing is the socket type check. Thanks for your feedback!

---
 lib/unicorn/http_request.rb | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/lib/unicorn/http_request.rb b/lib/unicorn/http_request.rb
index 0c1f9bb..eedccac 100644
--- a/lib/unicorn/http_request.rb
+++ b/lib/unicorn/http_request.rb
@@ -2,6 +2,7 @@
 # :enddoc:
 # no stable API here
 require 'unicorn_http'
+require 'raindrops'

 # TODO: remove redundant names
 Unicorn.const_set(:HttpRequest, Unicorn::HttpParser)
@@ -29,6 +30,7 @@ class Unicorn::HttpParser
   # 2.2+ optimizes hash assignments when used with literal string keys
   HTTP_RESPONSE_START = [ 'HTTP', '/1.1 ']
   @@input_class = Unicorn::TeeInput
+  @@raindrops_tcp_info_defined = defined?(Raindrops::TCP_Info)
   @@check_client_connection = false

   def self.input_class
@@ -83,11 +85,7 @@ def read(socket)
       false until add_parse(socket.kgio_read!(16384))
     end
 
-    # detect if the socket is valid by writing a partial response:
-    if @@check_client_connection && headers?
-      self.response_start_sent = true
-      HTTP_RESPONSE_START.each { |c| socket.write(c) }
-    end
+    check_client_connection(socket) if @@check_client_connection
 
     e['rack.input'] = 0 == content_length ?
       NULL_IO : @@input_class.new(socket, self)
@@ -108,4 +106,27 @@ def 

Re: check_client_connection using getsockopt(2)

2017-03-21 Thread Simon Eskildsen
On Wed, Feb 22, 2017 at 1:33 PM, Eric Wong <e...@80x24.org> wrote:
> Simon Eskildsen <simon.eskild...@shopify.com> wrote:
>
>  great to know it's still working after all these years :>
>
>> This confirms Eric's comment that the existing
>> `check_client_connection` works perfectly on loopback, but as soon as
>> you put an actual network between the Unicorn and client—it's only
>> effective 20% of the time, even with `TCP_NODELAY`. I'm assuming, due
>> to buffering, even when disabling Nagle's. As we're changing our
>> architecture, we move from ngx (lb) -> ngx (host) -> unicorn to ngx
>> (lb) -> unicorn. That means this patch will no longer work for us.
>
> Side comment: I'm a bit curious how you guys arrived at removing
> nginx at the host level (if you're allowed to share that info)
>
> I've mainly kept nginx on the same host as unicorn, but used
> haproxy or similar (with minimal buffering) at the LB level.
> That allows better filesystem load distribution for large uploads.


Absolutely. We're starting to experiment with Kubernetes, and as a
result we're interested in simplifying ingress as much as possible.
We could still run the host nginxes, but if I can avoid explaining
why they're there for the next 5 years—I'll be happy :) As I see it,
the current use-cases we have for the host nginx are (with why we
can get rid of each):

* Keepalive between edge nginx LBs and host LBs to avoid excessive
network traffic. This is not a deal-breaker, so we'll just ignore this
problem. It's a _massive_ amount of traffic normally though.
* Out of rotation. We take nodes gracefully out of rotation by
touching a file that'll return 404 on a health-checking endpoint from
the edge LBs. Kubernetes implements this by just removing all the
containers on that node.
* Graceful deploys. When we deploy with containers, we take each
container out of rotation with the local host Nginx, wait for it to
come up, and put it back in rotation. We don't utilize Unicorn's
graceful restarts because we want a Ruby upgrade deploy (replacing the
container) to be the same as an updated code rollout.
* File uploads. As you mention, we currently spread the buffering of
these across the host nginxes. I haven't finished investigating
whether this is feasible on the front LBs; otherwise we may go for a
two-tier nginx solution or expand the edge. However, with Kubernetes
I'd really like to avoid having host nginxes; it's not very native
to the Kubernetes paradigm. Does it balance across the network, or
only to the pods on that node?
* check_client_connection working. :-) This thread.

Slow-client handling and the other advantages we get from nginx come
from the edge nginxes.

>> I propose instead of the early `write(2)` hack, that we use
>> `getsockopt(2)` with the `TCP_INFO` flag and read the state of the
>> socket. If it's in `CLOSE_WAIT` or `CLOSE`, we kill the connection and
>> move on to the next.
>>
>> https://github.com/torvalds/linux/blob/8fa3b6f9392bf6d90cb7b908e07bd90166639f0a/include/uapi/linux/tcp.h#L163
>>
>> This is not going to be portable, but we can do this on only Linux
>> which I suspect is where most production deployments of Unicorn that
>> would benefit from this feature run. It's not useful in development
>> (which is likely more common to not be Linux). We could also introduce
>> it under a new configuration option if desired. In my testing, this
>> works to reject 100% of requests early when not on loopback.
>
> Good to know about its effectiveness!  I don't mind adding
> Linux-only features as long as existing functionality still
> works on the server-oriented *BSDs.
>
>> The code is essentially:
>> def client_closed?
>>   tcp_info = socket.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO)
>>   state = tcp_info.unpack("c")[0]
>>   state == TCP_CLOSE || state == TCP_CLOSE_WAIT
>> end
>
> +Cc: raindrops-pub...@bogomips.org -> https://bogomips.org/raindrops-public/
>
> Fwiw, raindrops (already a dependency of unicorn) has TCP_INFO
> support under Linux:
>
> https://bogomips.org/raindrops.git/tree/ext/raindrops/linux_tcp_info.c
>
> I haven't used it much, but FreeBSD also implements a subset of
> this feature, nowadays, and ought to be supportable, too.  We
> can look into adding it for raindrops.

Cool, I think it makes sense to use Raindrops here, the advantage
being that it'd use the actual struct instead of a sketchy `#unpack`.
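
i.e. something along these lines (state numbers hard-coded for Linux):

require 'raindrops'

# Sketch using Raindrops' TCP_Info wrapper instead of unpacking the
# raw struct ourselves; #state maps to tcpi_state from the kernel.
TCP_CLOSE = 7
TCP_CLOSE_WAIT = 8

def client_closed?(socket)
  state = Raindrops::TCP_Info.new(socket).state
  state == TCP_CLOSE || state == TCP_CLOSE_WAIT
end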

I meant to ask, in Raindrops why do you use the netlink API to get the
socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
`tcpi_unacked`? (as described in
http://www.ryanfrantz.com/posts/apache-tcp-backlog/) We use this to
monitor socket backlogs with a sidekick Ruby daemon. Although we're
looking to replace it with a middleware to simplify for Kubernetes.
It's one of our 

[PATCH] check_client_connection: use tcp state on linux

2017-03-21 Thread Simon Eskildsen
The check_client_connection implementation, which issues an early
write, is ineffective when the client is not on loopback. In my
testing against non-loopback clients, such as another host, we only
see a 10-20% rejection rate of closed clients, even with TCP_NODELAY.
This means 80-90% of responses in this case are rendered in vain.

This patch uses getsockopt(2) with TCP_INFO on Linux to read the socket state.
If the socket is in a state where it won't accept a response, we abort the
request with an error similar to what write(2) would give us on a closed
socket.

In my testing, this has a 100% rejection rate. This testing was conducted with
the following script:

require "socket"

100.times do |i|
  client = TCPSocket.new("some-unicorn", 20_000)
  client.write("GET /collections/#{rand(1)} HTTP/1.1\r\n" \
    "Host: walrusser.myshopify.com\r\n\r\n")
  client.close
end
---
 lib/unicorn/http_request.rb | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/unicorn/http_request.rb b/lib/unicorn/http_request.rb
index 0c1f9bb..b4c95fc 100644
--- a/lib/unicorn/http_request.rb
+++ b/lib/unicorn/http_request.rb
@@ -31,6 +31,9 @@ class Unicorn::HttpParser
   @@input_class = Unicorn::TeeInput
   @@check_client_connection = false

+  # TCP_TIME_WAIT: 6, TCP_CLOSE: 7, TCP_CLOSE_WAIT: 8, TCP_LAST_ACK: 9
+  IGNORED_CHECK_CLIENT_SOCKET_STATES = (6..9)
+
   def self.input_class
     @@input_class
   end
@@ -83,10 +86,18 @@ def read(socket)
       false until add_parse(socket.kgio_read!(16384))
     end
 
-    # detect if the socket is valid by writing a partial response:
-    if @@check_client_connection && headers?
-      self.response_start_sent = true
-      HTTP_RESPONSE_START.each { |c| socket.write(c) }
+    # detect if the socket is valid by checking client connection.
+    if @@check_client_connection
+      if defined?(Raindrops::TCP_Info)
+        tcp_info = Raindrops::TCP_Info.new(socket)
+        if IGNORED_CHECK_CLIENT_SOCKET_STATES.cover?(tcp_info.state)
+          socket.close
+          raise Errno::EPIPE
+        end
+      elsif headers?
+        self.response_start_sent = true
+        HTTP_RESPONSE_START.each { |c| socket.write(c) }
+      end
     end
 
     e['rack.input'] = 0 == content_length ?
-- 
2.11.0