Dataset Literalization Optimization

Jeremy Evans Tue, 18 Mar 2014 15:56:26 -0700

While doing some research on how to speed up eager loading limited 
associations, I came across a need for quickly generating large queries 
made up of many subqueries with slight differences in each subquery.  It 
turns out that that problem is more general and really applies to any 
situation where you want to literalize a dataset multiple times with minor 
changes each time.


I came across a solution for doing this in Sequel that is quite simple and 
fast.  Since Sequel 3.30.0, Sequel has used an append-only approach for 
almost all dataset literalization.  Instead of doing something like:

  sql  = ''
  sql << "SELECT #{column_list(opts[:select])}"
  sql << " FROM #{from_list(opts[:from])}"
  sql << " WHERE #{literal(opts[:where])}"

where column_list is:

  opts[:select].map{|x| literal(x)}.join(', ')

Sequel does something like:

  sql  = ''
  sql << "SELECT "
  column_list_append(sql, opts[:select])
  sql << " FROM "
  from_list_append(sql, opts[:from])
  sql << " WHERE "
  literal_append(sql, opts[:where])

where column_list_append is 
  
  opts[:select].each do |x|
    literal_append(sql, x)
    sql << ', ' 
  end

Note that this is greatly simplified pseudocode, but the idea is all 
objects are appended directly to the SQL query, no intermediate strings are 
created.  The append only approach is easier on the garbage collector, and 
eliminates certain corner cases for pathological queries.

Anyway, with this approach, an object being literalized into a query knows 
exactly what position in the query it appears.  The advantage to that is it 
can record this position.  Example

  ds = DB[:table].where(:id=>X)

  def X.sql_literal_append(sql, ds)
    @query_position = sql.length
  end

You can then get the SQL for this query:

  sql = ds.sql

After the query is finished, you can split the string at this position:

  before, after = sql[0...X.query_position], sql[X.query_position..-1]

Then creating a new query substituting any value for X (in this case 1) 
becomes as simple as:

  sql = ''
  sql << before
  literal_append(sql, 1)
  sql << after

This simple string concatenation is much faster than doing:

  DB[:items].where(:id=>1).sql

Expanding on that general idea, I have added 
Dataset::PlaceholderLiteralizer.  Here's the API:

  loader = Sequel::Dataset::PlaceholderLiteralizer.loader(DB[:items]) do 
|pl, ds|
    ds.where(:flag=>pl.arg).where(:name=>pl.arg)
  end
  loader.all(1, 'a') # WHERE ((flag = 1) AND (name = 'a'))
  loader.all(2, 'b') # WHERE ((flag = 2) AND (name = 'b'))

Basically, this is a way of recording arbitrary operations on a dataset 
using placeholders for the arguments, and allowing very fast SQL generation 
by substituting values for the placeholders later (about 3x faster SQL 
generation).  Note that this literalizer is even smart enough to handle 
certain cases in Sequel that result in different SQL depending on the type:

  loader.all([1, 2], DB[:foo].select(:bar)) # WHERE ((flag IN (1, 2)) AND 
(name IN (SELECT bar FROM foo)))

There are certain cases it does not or cannot handle, but I don't expect 
those to be common.

The Dataset::PlaceholderLiteralizer API is fairly low level and a bit 
verbose.  For models, there is a higher level API.  Assuming that you have 
a class method that returns a dataset:

  class Album < Sequel::Model
    def self.with_artist_and_tag_by_name(artist_id, tag_id)
      where(:artist_id=>artist_id, :tag_id=>tag_id).order(:name)
    end
  end

You can create optimized methods by passing Model.finder a symbol for the 
method name

  Album.finder :with_artist_and_tag_by_name
  Album.first_with_artist_and_tag_by_name(1, 2)

By default, it creates a first_* method that returns the first object.  You 
can pass an a :type option to create an all_* method that returns all 
objects.

  Album.finder :with_artist_and_tag_by_name, :type=>all
  Album.all_with_artist_and_tag_by_name(1, 2)

There are a few other options as well, check the RDoc for Model.finder.

Models now have single optimized method by default (first_where), and 
Model.find, Model.first, Model.first!, and Model.[] will now call that 
method automatically, which speeds up the methods by about 50% when used 
with a local database.

I've run the full release test suites for these changes, and I have 
confidence that everything works as designed.  However, as this is a 
significant internal change, I would greatly appreciate if users could try 
out the master branch on their applications and let me know:

1) Did anything break?
2) Did it make a significant performance difference?

General comments or questions are appreciated as well.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Dataset Literalization Optimization

Reply via email to