On Jan 10, 5:22 pm, Emily Dresner-Thornber <[email protected]> wrote:
> 1. The hang and crash is a TCP/IP issue relating to management of the
> connection pool and the database.  I can see the bytes written on the
> socket and then the socket time out waiting on the response from the
> database through tcpdump.  I can also see the crash in strace traces
> attached to running processes.

If this is the case, it sounds like this could theoretically happen
with any code using the driver.  Running certain types of queries may
make things more likely, but there is probably an underlying issue
below the level of Sequel that is causing the problem.

> 3. I have set the :single_threaded=>true flag in the Sequel.connect()
> object and it has no noticeable effect on the issue.

If that is the case, the problem is not a threading issue at all.

> 4. The issue appears when using the ORM portion of Sequel.  Once the
> Sequel::Model object is called anywhere in the code path, /even if the
> object is never invoked after creation/, it causes the crash.

The ORM part of Sequel may issue slightly different queries depending
on your code, but it sends them to the database in exactly the same way.

> 5. However, I do not see the crash if the Sequel::Model object is never
> called.  If the code only uses direct SQL with the Sequel DSL on the
> connection object it works reliably.  I was able to pour ~200K transactions
> through some simple queries.

Are you using the exact same queries in both cases?

> I spent several hours with my DBA this afternoon trying to model how the
> Sequel connection pooling was communicating with the database.  What we
> theorize from the connection counts on the database side is that Sequel is
> consuming a dead connection out of the pool and hanging.  I have tried
> resetting several of the pool and timeout settings to no noticeable effect.

Sequel does not do proactive checking of connections.  It assumes that
once a connection is added to the pool, it will remain valid.  If
using it later raises an exception, the connection is then removed
from the pool.  In general, this doesn't cause issues.  Some people
want autoreconnection, which you can do by rescuing
Sequel::DatabaseDisconnectError exceptions and handling them manually.  If dead
connections cause your app to freeze, that's a bug in the lower levels
of the stack that should be fixed there.
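For the manual-reconnect approach, a minimal sketch might look like the one below. `with_reconnect` is a hypothetical helper, not part of Sequel. It takes the error class as a parameter (in a real application you would pass `Sequel::DatabaseDisconnectError`) and assumes, as described above, that Sequel drops the failed connection from the pool, so a retry gets a different connection.

```ruby
# Hypothetical helper (not part of Sequel): re-run a block when it raises
# a disconnect error, up to `retries` extra attempts.
def with_reconnect(error_class:, retries: 1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue error_class
    # Give up and re-raise once the retry budget is spent.
    raise if attempts > retries
    # Assumption: the pool has already discarded the dead connection,
    # so re-running the block checks out (or creates) another one.
    retry
  end
end
```

Usage would be something like `with_reconnect(error_class: Sequel::DatabaseDisconnectError) { DB[:items].all }`, where `DB` and `:items` are placeholders. Only wrap work that is safe to run twice, and avoid retrying in the middle of a transaction.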

> I do know OpenVZ has an open issue with connections hanging in a CLOSE_WAIT
> state.  This may be negatively affecting Sequel when consuming the ORM and
> may mean Sequel is not as puritanical as it should be in dealing with the states
> of connections in the pool before releasing and reconnecting or making the
> assumption the underlying operating system is acting in good faith when
> dealing with connection lifecycles.  What worries me deeply is this will
> also affect the working portions of Sequel when connections age and time
> out when the service is idle.

I object to your use of "should" here. :)  Sequel should not try to
code defensively around bugs in the lower levels of the stack
(driver/OS).  If your operating system/virtualization layer has bugs, then
there's not much that Sequel can do about it.

You would be correct to worry about this affecting all parts of
Sequel, not just the modeling part.  I would still worry about it if
you used ActiveRecord.  The solution here should be to fix the code
that is buggy, not to work around the issue by patching Sequel
(and I'm not even sure what such a patch would do).  That said, it should be
reasonably easy to monkey patch the connection pool to do some
proactive checking if you want to do that for your application
(overriding ConnectionPool#hold).  However, I'm not sure that that
would solve the issue.
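To illustrate what "proactive checking" means, here is a standalone toy pool that validates each connection before handing it out and discards dead ones. It is not Sequel's ConnectionPool (whose internals vary between versions); `ValidatingPool`, `factory:` and `validator:` are hypothetical names. In a real patch to `ConnectionPool#hold`, the validator might run a trivial query such as `SELECT 1`, though if the hang is in the driver or OS, that check could hang too.

```ruby
# Toy pool (hypothetical, NOT Sequel's ConnectionPool) that checks each
# connection before yielding it. No size limit; illustration only.
class ValidatingPool
  attr_reader :created, :discarded

  # factory:   callable that opens a new connection
  # validator: callable returning true if a connection is still usable
  def initialize(factory:, validator:)
    @factory = factory
    @validator = validator
    @available = []
    @mutex = Mutex.new
    @created = 0
    @discarded = 0
  end

  # Yield a validated connection, then return it to the pool. If the
  # block raised, the connection is still returned; the next checkout
  # will validate it again before reuse.
  def hold
    conn = checkout
    begin
      yield conn
    ensure
      @mutex.synchronize { @available.push(conn) }
    end
  end

  private

  # Pop idle connections until a valid one is found; open a new one if
  # none remain. Validation runs outside the mutex so a slow check does
  # not block other threads.
  def checkout
    loop do
      conn = @mutex.synchronize { @available.pop }
      return open_new if conn.nil?
      return conn if @validator.call(conn)
      @mutex.synchronize { @discarded += 1 } # dead: drop it, try the next
    end
  end

  def open_new
    @mutex.synchronize { @created += 1 }
    @factory.call
  end
end
```

The trade-off is one extra check per checkout, which is why a pool that only reacts to errors (as described above) is a reasonable default.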

> Is there a specific code path the Sequel::Model object uses that the
> Sequel::Connect object does not in regards to the connection pool that
> would cause one form of the library to recycle connections properly and one
> not?

No.  Sequel::Model does not do anything special to the lower levels of
Sequel.  If you are hitting these bugs when you are using
Sequel::Model but not plain Sequel, most likely your queries are
different and that is contributing to the problem, or you are getting
unlucky (race condition/different memory layout).

You didn't respond to my request for a reproducible test case, but
hopefully that was an oversight. :)  If there is a reason you can't
share it, that's fine, but it would be helpful to know why.

Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sequel-talk?hl=en.
