> I seem to remember it was possible to get that information out of
> Rails logs pretty easily, already; and I seem to recall doing that
> back many years ago when I used Rails. (This is probably why
> USR1 log reopening waits until a response is done before
> triggering...)

Yes, I am able to get the info out of our logs; I just read through
haproxy logs, filtering on requests that take longer than N. Getting
alerts is a bit trickier, but I can probably work around that without
hacking at unicorn.

> I also believe the unicorn `timeout' is a misfeature that probably set
> the entire Rack/Ruby ecosystem back 10 years or more

This is super tricky, and I do not think it is a deficiency of
timeout. Timeout is merely being employed as a workaround for design
problems. There are usually only 1 or 2 spots that could cause
timeouts; 99% of the time it is slow database queries under extreme
load. It is very likely this could be handled in the app if we had:

  db_connection.timeout_at Time.now + 29

Then the connection could trigger the timeout and kill off the request
without needing to tear down the entire process and re-fork. Making
this happen is a bit tricky because it would require some hacking on
the pg gem.

I am just not sure how hacking at timeout can make stuff any better;
it is an escape hatch, just in case code misbehaves.

On Mon, Jan 15, 2018 at 12:57 PM, Eric Wong <[email protected]> wrote:
> Sam Saffron <[email protected]> wrote:
>> I would love to start logging the actual URL that timed out when
>> murder_lazy_workers does its thing.
>>
>> Clearly the master process has no knowledge here, but perhaps if we
>> had a named pipe from each child to master we could quickly post
>> current url down the pipe so we would have something to log when we
>> murder a url.
>
> That would make the master a bottleneck.
>
> Instead, I suggest logging a START action with the
> URL+PID+Thread(*)+serial number; and then matching it to a
> corresponding END action in the response_body#close. Anything
> without a corresponding END action can be deemed a loss and
> matched up with the KILL action based on PID.
>
> (*) Log Thread/Fiber so it can work with other servers, too.
>
>
> I seem to remember it was possible to get that information out of
> Rails logs pretty easily, already; and I seem to recall doing that
> back many years ago when I used Rails. (This is probably why
> USR1 log reopening waits until a response is done before
> triggering...)
>
>
> And as I've stated many times before, I don't want any sort of
> lock-in or even guide-in to make people feel like they're stuck
> using unicorn (by having code which depends on it). I also
> believe the unicorn `timeout' is a misfeature that probably set
> the entire Rack/Ruby ecosystem back 10 years or more, so I'd
> rather people stop depending on it and fix their timeouts.
>
> (To that end, I may see about making timeout.rb in stdlib better
> for Ruby 2.6...)
>
>> Clearly an opt-in thing, but would be very handy for quick diagnostics
>> cause we can then avoid deeper log analysis and raise events just as
>> this happens.
>
> Sorry, I prefer generic solutions which work with other servers, too.
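
For reference, here is a rough sketch of the START/END logging suggested in the quoted text, written as plain Rack-style middleware so it does not depend on unicorn. The class name StartEndLogger, the log line format, and the unfinished_requests helper are all made up for illustration; none of them come from unicorn or Rack. It assumes the server calls #close on the response body, as Rack servers are expected to.

```ruby
require "logger"

# Hypothetical middleware: logs START before the app runs and END when
# the server closes the response body.
class StartEndLogger
  # Minimal body wrapper that fires a callback on #close.
  class Body
    def initialize(body, &on_close)
      @body = body
      @on_close = on_close
    end

    def each(&block)
      @body.each(&block)
    end

    def close
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end

  def initialize(app, logger)
    @app = app
    @logger = logger
    @serial = 0
    @lock = Mutex.new
  end

  def call(env)
    # The serial number is per process; PID + thread + serial identifies a request.
    serial = @lock.synchronize { @serial += 1 }
    tag = "pid=#{Process.pid} thread=#{Thread.current.object_id} serial=#{serial}"
    url = env["PATH_INFO"].to_s
    query = env["QUERY_STRING"].to_s
    url += "?#{query}" unless query.empty?
    @logger.info("START #{tag} url=#{url}")
    status, headers, body = @app.call(env)
    [status, headers, Body.new(body) { @logger.info("END #{tag}") }]
  end
end

# Returns the START lines for the given PID that never got a matching END.
# Intended to be run against the log once a KILL for that PID shows up.
def unfinished_requests(log_lines, pid)
  open = {}
  log_lines.each do |line|
    m = line.match(/\b(START|END) (pid=(\d+) thread=\d+ serial=\d+)/)
    next unless m && m[3].to_i == pid
    if m[1] == "START"
      open[m[2]] = line
    else
      open.delete(m[2])
    end
  end
  open.values
end
```

Nothing in this sketch talks to the master, so it should avoid the bottleneck concern. A request whose app raises also leaves a START without an END, so unmatched lines are only meaningful for a PID that was actually killed.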

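
To make the timeout_at idea above a bit more concrete, here is a minimal sketch that approximates it without patching the pg gem. It is not the same thing: instead of a real per-connection deadline, it sets PostgreSQL's statement_timeout to the remaining time budget before each statement. DeadlineConnection and DeadlineExceeded are hypothetical names, and the wrapped object is only assumed to respond to #exec(sql), as a PG::Connection does.

```ruby
# Hypothetical error raised when the deadline has already passed.
class DeadlineExceeded < StandardError; end

# Hypothetical wrapper; `timeout_at` is not a real pg gem method.
class DeadlineConnection
  # conn:  any object with #exec(sql) (assumption: e.g. a PG::Connection)
  # clock: injectable time source, mainly to make the sketch testable
  def initialize(conn, clock: -> { Time.now })
    @conn = conn
    @clock = clock
    @deadline = nil
  end

  # Set an absolute deadline for all later statements on this connection.
  def timeout_at(time)
    @deadline = time
  end

  def exec(sql)
    if @deadline
      remaining_ms = ((@deadline - @clock.call) * 1000).floor
      raise DeadlineExceeded, "request deadline passed" if remaining_ms <= 0
      # statement_timeout is a PostgreSQL setting, in milliseconds.
      @conn.exec("SET statement_timeout = #{remaining_ms}")
    end
    @conn.exec(sql)
  end
end
```

Usage would look like `db = DeadlineConnection.new(conn); db.timeout_at(Time.now + 29)`. The obvious cost is an extra round trip per statement, and it only covers time spent in the database, but a slow query would be cancelled by the server rather than by tearing down the worker.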