On Jan 3, 12:27 pm, Olek Poplavsky <[email protected]> wrote:
> Hi Jeremy
>
> I was able to fetch a bit more information on the subject.
>
> First of all, memprof back-end processing seems to be not working
> right, my uploaded data got stuck there in their processing queue.
> Still, data I was able to get from it locally was very valuable.
>
> I found that replacing 'model-centric' query with 'table-centric'
> query saved me about 40kb per query. Does not seem like much, but with
> 5 of those queries per request and 500 requests in a row, this change
> alone reduced 'short-lived objects' total size from 600MB to 500MB.
Certainly using plain datasets instead of models is going to be
faster, both in terms of less processing and less intermediate
objects.
> Old query:
>
> Sequel::Channel.
> eager_graph(:subscriptions).
> filter(:subscriptions__user_id => :$user_id).
> select(:channels__id).
> order(:id).
> ungraphed
If you want to use this code but don't actually need model objects, you
could add a #naked call so that plain hashes are returned instead of
model objects.
> New query:
>
> SDB[:users].
> join(:subscriptions, :user_id => :id).
> filter(:users__id => :$user_id).
> select(:subscriptions__channel_id___id).
> order(:id)
>
> Still, this query seems to create lots of strings and hashes in
> memory. It turned out that postgres driver returns back rows as hashes
> (sure this is not a surprise to you, Jeremy). Therefore, for each row
> it creates new Hash object (waste in this degenerate case of one
> column per row), and this Hash object contains key-value pairs as
> field-name-to-field-value. Field name is a string, and therefore is
> repeated for each and every row, generating more wasted memory.
I don't think the postgres driver is doing that; it's the Sequel adapter
that returns all results as hashes with symbol keys (this is true of
all Sequel adapters). PGresult#each does yield rows, but I don't
think Sequel uses that.
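One nuance on the repeated-field-name concern: because the keys are symbols, the field name isn't a fresh string per row; each row costs a new Hash, but all rows share the same interned key object. A plain-Ruby check (row contents made up):

```ruby
# Each row is a fresh Hash, but symbol keys are interned, so every
# row's key is the very same :channel_id object, not a copied string.
rows = Array.new(3) { {channel_id: 1} }

rows.map(&:object_id).uniq.size                    # => 3 (distinct Hashes)
rows.map { |r| r.keys.first.object_id }.uniq.size  # => 1 (shared symbol)
```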
As François mentioned, using #select_map would reduce the number of
intermediate objects created (if sequel_pg is used), but it can't
integrate with prepared statements (since #select_map potentially
changes the SQL). sequel_pg can also optimize regular #map calls, if
you call #map with a symbol or an array of symbols. However, that also
currently doesn't integrate with prepared statements.
It should be possible to support a new prepared statement type for map
that would integrate with prepared statements (e.g.
prepare([:map, :column])). Take a look at
lib/sequel/dataset/prepared_statements.rb; you should only have to
change #prepared_sql and #run. This appears to work from my very brief
testing:
diff --git a/lib/sequel/dataset/prepared_statements.rb b/lib/sequel/dataset/prepared_statements.rb
index 21ea12a..0ce0a27 100644
--- a/lib/sequel/dataset/prepared_statements.rb
+++ b/lib/sequel/dataset/prepared_statements.rb
@@ -76,8 +76,6 @@ module Sequel
       # the type of the statement and the prepared_modify_values.
       def prepared_sql
         case @prepared_type
-        when :select, :all
-          select_sql
         when :first
           clone(:limit=>1).select_sql
         when :insert_select
@@ -88,6 +86,8 @@ module Sequel
           update_sql(*@prepared_modify_values)
         when :delete
           delete_sql
+        else
+          select_sql
         end
       end
@@ -121,8 +121,6 @@ module Sequel
       # types running the method with the same name as the type.
       def run(&block)
         case @prepared_type
-        when :select, :all
-          all(&block)
         when :insert_select
           with_sql(prepared_sql).first
         when :first
@@ -133,6 +131,13 @@ module Sequel
           update(*@prepared_modify_values)
         when :delete
           delete
+        when Array
+          case @prepared_type.at(0)
+          when :map
+            map(*@prepared_type[1..-1], &block)
+          end
+        else
+          all(&block)
         end
       end
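The dispatch the patch adds to #run can be sketched in plain Ruby (run_prepared and the canned rows are made up for illustration; no Sequel required):

```ruby
# Stand-in for the patched #run: the prepared type may now be an Array
# like [:map, :column]; everything unrecognized falls back to the
# old :select/:all behavior via the else branch.
def run_prepared(prepared_type, rows)
  case prepared_type
  when Array
    case prepared_type.at(0)
    when :map
      column = prepared_type[1]
      rows.map { |r| r[column] }  # one value per row, no Hash retained
    end
  else
    rows                          # :select/:all now hit the else branch
  end
end

run_prepared([:map, :channel_id], [{channel_id: 1}, {channel_id: 2}])
# => [1, 2]
```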
> Replacing Sequel query with
>
> pg = ActiveRecord::Base.connection.execute("select channel_id
> from subscriptions where user_id = #{self.id} order by channel_id")
> rtn.concat(pg.column_values(0).map { |n| Integer(n) })
>
> resulted in 80kb decrease of waste per request (for one query change).
> If I do that for all 5 queries, for 500 requests that is 200MB less
> waste, bringing me from 500MB to 300MB.
It's not surprising that ActiveRecord is faster in that case, since
you are just using the connection and not ActiveRecord itself. With
Sequel, that's basically the same thing as:
DB.synchronize{|c| c.query("select channel_id from subscriptions
  where user_id = #{self.id} order by channel_id").
  column_values(0).map{|i| Integer(i)}}
That Sequel code should give you roughly the same results as the
ActiveRecord code you are using above.
> But of course, this code is @#$ ugly. Which leads me to the following
> question. Is it possible to use Sequel for query generation/
> preparation/execution but... not have it parse the results and instead
> hand me back PGresult object?
It's actually fairly easy to do that with Sequel if you don't want to
use prepared statements:
class Sequel::Postgres::Dataset
  def pg_result
    db.synchronize{|c| c.query(sql)}
  end
end
With prepared statements, it's probably slightly more difficult,
because of the connection pool and having to check if the statement
has been prepared on the given connection. But the same idea
applies. If you read the Sequel postgres adapter code (in
lib/sequel/adapters/postgres.rb), you'll probably be able to figure
out how to do that (if not, feel free to ask questions here).
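Note that PGresult hands back column values as strings, so you still do the conversion yourself. A stub result object (made up, just to show the post-processing chain you'd run on the real PGresult):

```ruby
# Minimal stand-in for a PGresult (the real one comes from the pg
# driver); #column_values(i) returns column i of every row.
FakePGResult = Struct.new(:rows) do
  def column_values(i)
    rows.map { |r| r[i] }
  end
end

res = FakePGResult.new([["1"], ["2"], ["3"]])
ids = res.column_values(0).map { |v| Integer(v) }
# ids => [1, 2, 3]
```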
> This would be an almost palatable compromise...
>
> And BTW - quickly running memprof on some of the old ActiveRecord code
> suggested that generic ActiveRecord-centric code is much more
> wasteful, and that pretty much makes sense... but I do not know how to
> explain initial better GC results for ActiveRecord-centric version of
> the code. Still there is a mystery here to uncover.
I'd have to look at the memprof output of both the ActiveRecord
version and the Sequel version to see where the differences were.
> And, in case you are curious, here is a top section of memprof stats
> for my whole action (as you can see, it is mostly sequel with
> exception of YAJL json generator and some other small chunks).
>
> 48 /Volumes/Foo/src/proj/vendor/bundle/ruby/1.8/gems/yajl-ruby-1.1.0/lib/yajl.rb:72:String
> 39 /Volumes/Foo/src/proj/vendor/bundle/ruby/1.8/bundler/gems/sequel-47e347167fd9/lib/sequel/model/associations.rb:2095:Array
> 36 /Volumes/Foo/src/proj/vendor/bundle/ruby/1.8/bundler/gems/sequel-47e347167fd9/lib/sequel/adapters/postgres.rb:468:String
> 33 /Volumes/Foo/src/proj/vendor/bundle/ruby/1.8/bundler/gems/sequel-47e347167fd9/lib/sequel/model/associations.rb:2085:Array
This looks like memprof output for the old model query instead of the
new dataset query. So for further tuning you are going to want to
submit the memprof output for the newer query.
Thanks,
Jeremy
--
You received this message because you are subscribed to the Google Groups
"sequel-talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/sequel-talk?hl=en.