> I seem to remember it was possible to get that information out of
> Rails logs pretty easily, already; and I seem recall doing that
> back many years ago when I used Rails. (This is probably why
> USR1 log reopening waits until a response is done before
> triggering...)

Yes, I am able to get the info out of our logs. I just read through
the haproxy logs, filtering on requests that take longer than N. Getting
alerts is a bit trickier, but I can probably work around that without
hacking at unicorn.

> I also believe the unicorn `timeout' is a misfeature that probably set
> the entire Rack/Ruby ecosystem back 10 years or more

This is super duper tricky, and I do not think this is a deficiency of
timeout. It is merely being employed as a workaround for design
problems.

There are usually only 1 or 2 spots that could cause timeouts; 99% of
the time it is slow database queries under extreme load.

It is super likely this could be handled in the app if we had:

db_connection.timeout_at Time.now + 29

Then the connection could trigger the timeout and kill off the request
without needing to tear down the entire process and re-fork.

Making this happen is a bit tricky because it would require some hacking
on the pg gem.
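
As a rough sketch of the idea that avoids touching the pg gem, a thin
wrapper could hold the deadline and push the remaining time down to
PostgreSQL via its statement_timeout setting (in milliseconds) before each
query. Everything named here (DeadlineConnection, DeadlineExceeded, the
`now:' argument) is made up for illustration; only `exec' on the wrapped
connection and statement_timeout itself are assumed to exist. It costs an
extra round trip per query, so treat it as a proof of concept only:

```ruby
# Hypothetical wrapper: `timeout_at` is NOT part of the pg gem.
# It wraps any object that responds to #exec(sql), e.g. a PG::Connection.
class DeadlineConnection
  class DeadlineExceeded < StandardError; end

  # `now` is injectable so the clock can be faked in tests.
  def initialize(conn, now: -> { Time.now })
    @conn = conn
    @now = now
    @deadline = nil
  end

  # Absolute deadline for the current request, e.g. Time.now + 29.
  def timeout_at(time)
    @deadline = time
  end

  def exec(sql)
    if @deadline
      remaining_ms = ((@deadline - @now.call) * 1000).floor
      # Deadline already passed: fail the request, not the whole worker.
      raise DeadlineExceeded, "request deadline passed" if remaining_ms <= 0
      # Ask the server to cancel the next statement once time runs out.
      @conn.exec("SET statement_timeout = #{remaining_ms}")
    end
    @conn.exec(sql)
  end
end
```

With a real connection, I would expect the server-side cancel to surface
as an exception from `exec' (PG::QueryCanceled, as far as I recall), which
the app can rescue and turn into an error response.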

I am just not sure how hacking at timeout can make stuff any better;
it is an escape hatch, just in case code misbehaves.


On Mon, Jan 15, 2018 at 12:57 PM, Eric Wong <[email protected]> wrote:
> Sam Saffron <[email protected]> wrote:
>> I would love to start logging the actual URL that timed out when
>> murder_lazy_workers does its thing.
>>
>> Clearly the master process has no knowledge here, but perhaps if we
>> had a named pipe from each child to master we could quickly post
>> current url down the pipe so we would have something to log when we
>> murder a url.
>
> That would make the master a bottleneck.
>
> Instead, I suggest logging a START action with the
> URL+PID+Thread(*)+serial number; and then matching it a
> corresponding END action in the response_body#close Anything
> without a corresponding END action can be deemed a loss and
> matched up with the KILL action based on PID.
>
> (*) Log Thread/Fiber so it can work with other servers, too.
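
For reference, the START/END scheme described above could look roughly
like the Rack middleware below. This is only a sketch under my own
assumptions: the class name, the log line format and the per-process
serial counter are all invented, and it skips details such as logging END
when the app raises.

```ruby
require "logger"

# Hypothetical middleware: logs START before the app runs and END when the
# response body is closed. A START with no matching END is a lost request.
class RequestLifecycleLogger
  # Wraps the response body so END is logged from #close.
  class Body
    def initialize(body, &on_close)
      @body = body
      @on_close = on_close
    end

    def each(&block)
      @body.each(&block)
    end

    def close
      @body.close if @body.respond_to?(:close)
    ensure
      @on_close.call
    end
  end

  def initialize(app, logger)
    @app = app
    @logger = logger
    @serial = 0
    @lock = Mutex.new
  end

  def call(env)
    # Serial number is per process; PID + serial identifies a request.
    serial = @lock.synchronize { @serial += 1 }
    tag = "pid=#{Process.pid} thread=#{Thread.current.object_id} serial=#{serial}"
    url = "#{env['PATH_INFO']}?#{env['QUERY_STRING']}"
    @logger.info("START #{tag} url=#{url}")
    status, headers, body = @app.call(env)
    [status, headers, Body.new(body) { @logger.info("END #{tag}") }]
  end
end
```

Any START line whose pid matches a later KILL line and has no END is
then the request that timed out.
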
>
>
> I seem to remember it was possible to get that information out of
> Rails logs pretty easily, already; and I seem recall doing that
> back many years ago when I used Rails. (This is probably why
> USR1 log reopening waits until a response is done before
> triggering...)
>
>
> And as I've stated many times before, I don't want any sort of
> lock-in or even guide-in to make people feel like they're stuck
> using unicorn (by having code which depends on it).  I also
> believe the unicorn `timeout' is a misfeature that probably set
> the entire Rack/Ruby ecosystem back 10 years or more, so I'd
> rather people stop depending on it and fix their timeouts.
>
> (To that end, I may see about making timeout.rb in stdlib better
>  for Ruby 2.6...)
>
>> Clearly an opt-in thing, but would be very handy for quick diagnostics
>> cause we can then avoid deeper log analysis and raise events just as
>> this happens.
>
> Sorry, I prefer generic solutions which work with other servers, too.
