While doing some research on how to speed up eager loading limited
associations, I came across a need for quickly generating large queries
made up of many subqueries with slight differences in each subquery. It
turns out that that problem is more general and really applies to any
situation where you want to literalize a dataset multiple times with minor
changes each time.
I came across a solution for doing this in Sequel that is quite simple and
fast. Since Sequel 3.30.0, Sequel has used an append-only approach for
almost all dataset literalization. Instead of doing something like:
sql = ''
sql << "SELECT #{column_list(opts[:select])}"
sql << " FROM #{from_list(opts[:from])}"
sql << " WHERE #{literal(opts[:where])}"
where column_list is:
opts[:select].map{|x| literal(x)}.join(', ')
Sequel does something like:
sql = ''
sql << "SELECT "
column_list_append(sql, opts[:select])
sql << " FROM "
from_list_append(sql, opts[:from])
sql << " WHERE "
literal_append(sql, opts[:where])
where column_list_append is
opts[:select].each do |x|
literal_append(sql, x)
sql << ', '
end
Note that this is greatly simplified pseudocode, but the idea is all
objects are appended directly to the SQL query, no intermediate strings are
created. The append only approach is easier on the garbage collector, and
eliminates certain corner cases for pathological queries.
Anyway, with this approach, an object being literalized into a query knows
exactly what position in the query it appears. The advantage to that is it
can record this position. Example
ds = DB[:table].where(:id=>X)
def X.sql_literal_append(sql, ds)
@query_position = sql.length
end
You can then get the SQL for this query:
sql = ds.sql
After the query is finished, you can split the string at this position:
before, after = sql[0...X.query_position], sql[X.query_position..-1]
Then creating a new query substituting any value for X (in this case 1)
becomes as simple as:
sql = ''
sql << before
literal_append(sql, 1)
sql << after
This simple string concatenation is much faster than doing:
DB[:items].where(:id=>1).sql
Expanding on that general idea, I have added
Dataset::PlaceholderLiteralizer. Here's the API:
loader = Sequel::Dataset::PlaceholderLiteralizer.loader(DB[:items]) do
|pl, ds|
ds.where(:flag=>pl.arg).where(:name=>pl.arg)
end
loader.all(1, 'a') # WHERE ((flag = 1) AND (name = 'a'))
loader.all(2, 'b') # WHERE ((flag = 2) AND (name = 'b'))
Basically, this is a way of recording arbitrary operations on a dataset
using placeholders for the arguments, and allowing very fast SQL generation
by substituting values for the placeholders later (about 3x faster SQL
generation). Note that this literalizer is even smart enough to handle
certain cases in Sequel that result in different SQL depending on the type:
loader.all([1, 2], DB[:foo].select(:bar)) # WHERE ((flag IN (1, 2)) AND
(name IN (SELECT bar FROM foo)))
There are certain cases it does not or cannot handle, but I don't expect
those to be common.
The Dataset::PlaceholderLiteralizer API is fairly low level and a bit
verbose. For models, there is a higher level API. Assuming that you have
a class method that returns a dataset:
class Album < Sequel::Model
def self.with_artist_and_tag_by_name(artist_id, tag_id)
where(:artist_id=>artist_id, :tag_id=>tag_id).order(:name)
end
end
You can create optimized methods by passing Model.finder a symbol for the
method name
Album.finder :with_artist_and_tag_by_name
Album.first_with_artist_and_tag_by_name(1, 2)
By default, it creates a first_* method that returns the first object. You
can pass an a :type option to create an all_* method that returns all
objects.
Album.finder :with_artist_and_tag_by_name, :type=>all
Album.all_with_artist_and_tag_by_name(1, 2)
There are a few other options as well, check the RDoc for Model.finder.
Models now have single optimized method by default (first_where), and
Model.find, Model.first, Model.first!, and Model.[] will now call that
method automatically, which speeds up the methods by about 50% when used
with a local database.
I've run the full release test suites for these changes, and I have
confidence that everything works as designed. However, as this is a
significant internal change, I would greatly appreciate if users could try
out the master branch on their applications and let me know:
1) Did anything break?
2) Did it make a significant performance difference?
General comments or questions are appreciated as well.
Thanks,
Jeremy
--
You received this message because you are subscribed to the Google Groups
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.