Failed Experiments

Jeremy Evans Tue, 15 Sep 2009 10:52:45 -0700

I get ideas for Sequel modifications and new features on a regular
basis.  Most of these turn out fairly well and are committed, but a
few didn't turn out very well and I've decided not to pursue them
further.  I'm going to go over a couple of recent cases where I
partially implemented a feature and then decided to abandon it, in
hopes that this message informs people about the Sequel development
process.



The first feature was a modification to Sequel's connection pool so
that you could open a new connection while the current thread already
had a connection open.  So you could do something like:

  DB[:items].each do |row|
    DB.synchronize(:new_conn=>true) do
      DB[:items].filter(:id=>row[:id]).update(:name=>row[:name].upcase)
    end
  end

That's not possible currently, because a thread cannot open multiple
simultaneous connections, and inside the Dataset#each call, you cannot
issue a new query (on most adapters).  The usual solution is just to
use Dataset#all instead of Dataset#each, but then you are loading all
rows in memory, which doesn't work for large datasets.

One way to work around it is:

  DB[:items].each do |row|
    Thread.new do
      DB[:items].filter(:id=>row[:id]).update(:name=>row[:name].upcase)
    end.join
  end

Since a new thread is opened, a new connection will be used.  However,
that uses a thread per row, which is probably inefficient.  You can
open a separate thread outside the Dataset#each call and use a shared
queue to pass rows from inside the Dataset#each block to the separate
thread, but that's a lot of setup work.
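
As a rough sketch of that queue approach, something like the following
could work.  Note that each_on_worker is a hypothetical helper written
for this example, not part of Sequel, and it assumes the object passed
in responds to each (as a dataset does).  The queue is unbounded, so if
the worker is slower than the reader, rows will pile up in memory.

```ruby
# Hypothetical helper (not part of Sequel): iterate over rows in the
# calling thread, but run the block for each row on one separate worker
# thread, passing rows through a shared queue.
def each_on_worker(rows, &block)
  queue = Queue.new
  done = Object.new # unique sentinel marking the end of the rows

  worker = Thread.new do
    # Queue#pop blocks until a row (or the sentinel) is available.
    until (row = queue.pop).equal?(done)
      block.call(row)
    end
  end

  begin
    rows.each { |row| queue << row }
  ensure
    queue << done # always let the worker finish
    worker.join   # re-raises any exception raised in the worker
  end
end

# Assumed usage with Sequel (DB and :items as in the examples above):
#
#   each_on_worker(DB[:items]) do |row|
#     DB[:items].filter(:id=>row[:id]).update(:name=>row[:name].upcase)
#   end
```

Since the block runs in a different thread than the Dataset#each call,
it should get its own connection from the pool, the same way the
thread-per-row version does.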

Anyway, being able to open up a new connection in the same thread
would fix the problem more easily, and you could even do things like:

  DB[:items].each do |row|
    DB[:items].new_conn.filter(:id=>row[:id]).update(:name=>row[:name].upcase)
  end

Anyway, I have a partially working implementation at 
http://pastie.org/617662.txt.
If anyone wants the functionality and wants a nice starting point,
they can look there.  However, since nobody has even requested the
feature (everyone seems to be OK with using Dataset#all), I decided to
stop work on it.  It makes the implementation more complex, probably
slows it down, and I didn't want to debug it anymore.  I think the
basics work, but it seems to use more connections than it is
configured to.  I never did any integration testing with it.


The second feature is having Dataset#literal call sql_literal on
objects if they respond to it, instead of using a large case
statement.  This makes it easier to support literalization of
arbitrary objects (you currently need to override
Dataset#literal_other), and I thought performance would be better.
Turns out that performance is worse in all cases unless you modify all
of the core classes to support sql_literal and change the dataset
literal_* methods to be public instead of private, and even then
performance is only significantly better on ruby 1.8 (about 10%).

A basic patch for this is available at http://pastie.org/617657.txt,
with a modification to make the dataset methods public for testing on
SQLite at http://pastie.org/617658.txt.  Benchmark results are at
http://pastie.org/617679.txt.

I've decided not to pursue this further either.  If anyone wants the
ability to support literalization of arbitrary objects by defining an
sql_literal method on them, I'll consider a patch to
Dataset#literal_other to check for sql_literal.
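
To illustrate what such a check might look like, here is a toy
literalizer.  It is not Sequel's actual code: ToyDataset and Point are
made up for the example, and passing the dataset as an argument to
sql_literal is an assumption.  The case statement still handles the
common types, and only the fallback looks for sql_literal.

```ruby
# Toy literalizer (not Sequel's real implementation): a case statement
# for common types, with a fallback that checks for sql_literal.
class ToyDataset
  def literal(v)
    case v
    when String  then "'#{v.gsub("'", "''")}'" # double any single quotes
    when Integer then v.to_s
    when nil     then "NULL"
    else literal_other(v)
    end
  end

  private

  # Fallback for objects the case statement doesn't know about.
  def literal_other(v)
    if v.respond_to?(:sql_literal)
      # Assumption: the object receives the dataset so it can
      # literalize its own parts.
      v.sql_literal(self)
    else
      raise ArgumentError, "can't express #{v.inspect} as a SQL literal"
    end
  end
end

# Hypothetical arbitrary object that knows how to literalize itself.
Point = Struct.new(:x, :y) do
  def sql_literal(ds)
    "point(#{ds.literal(x)}, #{ds.literal(y)})"
  end
end
```

With this, ToyDataset.new.literal(Point.new(1, 2)) returns
"point(1, 2)", and objects without sql_literal still raise an error.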


Anyway, hope this was informative.

Jeremy
