Executive Summary:
================

I've recently come across some performances problems with eager loading multiple has_many relationships (and to a lesser extent a single has_many with many objects in the collection) and had some thoughts.

The case I came across involved some models like so:

class Question < ActiveRecord::Base
  has_many :incoming_messages
  has_many :outgoing_messages
end

class IncomingMessage < ActiveRecord::Base
  belongs_to :question
end

class OutgoingMessage < ActiveRecord::Base
  belongs_to :question
end

In various parts of the app we load a question (or multiple ones) with :include => [:incoming_messages, :outgoing_messages]

Typically a question has a small number of incoming and outgoing messages (often only 1 or 2) and this all works absolutely fine. However at some point we ended up with a question with many incoming and outgoing_messages. Our servers (quite literally) ground to a halt whenever loading that question with the aforementioned includes, so I had a look under the hood.

The underlying thing is that in this case Question.find(1, :include => [:incoming_messages, :outgoing_messages]) returns quite a few rows and so even fairly small things add up very quickly

I've put together some changes that improve the situation, along with some numbers

Numbers:
===========

In my benchmarks I've used 2 instances of Question: one with 150 incoming and 80 outgoing (big question) and one with 225 incoming and 120 outgoing (huge question) (ie 50% more of each, so total row count goes up by 2.25)

bmbm(5) do |x|
x.report("big question incoming_messages") {Question.find_by_id big_question.id, :include => :incoming_messages} x.report("big question all") {Question.find_by_id big_question.id, :include => [:incoming_messages, :outgoing_messages]} x.report("huge question incoming_messages") {Question.find_by_id huge_question.id, :include => :incoming_messages} x.report(" question all") {Question.find_by_id huge_question.id, :include => [:incoming_messages, :outgoing_messages]}
end

Vanilla rails 1.2.3:

user system total real big question incoming_messages 0.050000 0.000000 0.050000 ( 0.052436) big question all 6.060000 0.080000 6.140000 ( 6.362781) huge question incoming_messages 0.100000 0.000000 0.100000 ( 0.111775) huge question all 20.960000 0.290000 21.250000 ( 23.186990)


Rails 1.2.3 + my changes:
user system total real user system total real big question incoming_messages 0.010000 0.010000 0.020000 ( 0.013589) big question all 1.040000 0.070000 1.110000 ( 1.325704) huge question incoming_messages 0.020000 0.000000 0.020000 ( 0.019577) huge question all 2.310000 0.160000 2.470000 ( 2.944555)

Note that in the current code (huge question all)/(big question all) ~= 3.4 (ie > the increase in size of the dataset (2.25), with my fixed that ratio is the same as the increase in size of the data set (ie execution time scaling linearly with the data set side).


Morally I think eager loading in such cases is bad: it's silly to get the db to send us 10000 rows when there are actually only 200 rows of real content. However (at least for my usage) the case of a huge question is an insignificant (in terms of the numbers of such questions, not in terms of the effects) proportion of the overall set of questions, 99.999% of the time eager loading saves us time and so I don't really want to throw the baby out with the bath water. As long as we cope reasonably in the rare cases when things go awry I can happily continue eager loading

The changes:
I've put together a range of things, but by far the biggest performance boost was due to the fact that rails checked whether the collection already contains an object for the row being considered with collection.target.include?(association). I've used hashes instead

The tests obviously pass in my environment (minus some unrelated failures that I think are down to me using mysql 4.1 which failed before I started changing code), my instinct would be that this would be more or less database independent.

If people are interested I will tidy & write this up as a ticket/ patch (at the very least I think it is worth mentioning in the docs that overzealous eager loading with several large :has_manys can be a bad thing).

Thanks to anyone who has kept reading this far, thoughts/comments appreciated
Fred


Index: activerecord/lib/active_record/associations.rb
@@ -1313,12 +1313,19 @@
           end

           def instantiate(rows)
+            @already_loaded_associations_cache = {}
+            @collections = {}
+
+            join_associations_copy = join_associations
             rows.each_with_index do |row, i|
               primary_id = join_base.record_id(row)
               unless @base_records_hash[primary_id]
@base_records_in_order << (@base_records_hash [primary_id] = join_base.instantiate(row))
               end
- construct(@base_records_hash[primary_id], @associations, join_associations.dup, row) + construct(@base_records_hash[primary_id], primary_id.to_s, @associations, join_associations_copy, row)
             end
             return @base_records_in_order
           end
@@ -1350,42 +1357,56 @@
               end
             end

-            def construct(parent, associations, joins, row)
+            def construct(parent, parent_id, associations, joins, row)
               case associations
                 when Symbol, String
- while (join = joins.shift).reflection.name.to_s != associations.to_s - raise ConfigurationError, "Not Enough Associations" if joins.empty?
+                  association_name = associations.to_s
+                  joins.each do |join|
+                    if join.reflection.name.to_s == association_name
+ return construct_association(parent, parent_id, association_name, join, row)
+                    end
                   end
-                  construct_association(parent, join, row)
+ raise ConfigurationError, "Not Enough Associations" if joins.empty?
                 when Array
                   associations.each do |association|
-                    construct(parent, association, joins, row)
+ construct(parent, parent_id, association, joins, row)
                   end
                 when Hash
associations.keys.sort{|a,b|a.to_s<=>b.to_s}.each do |name| - association = construct_association(parent, joins.shift, row) - construct(association, associations[name], joins, row) if association + association = construct_association(parent, parent_id, joins.first.reflection.name, joins.first, row) + construct(association, association.id.to_s, associations[name], joins[1..-1], row) if association
                   end
                 else
                   raise ConfigurationError, associations.inspect
               end
             end

-            def construct_association(record, join, row)
+ def construct_association(record, record_id, join_reflection_name, join, row)
               case join.reflection.macro
                 when :has_many, :has_and_belongs_to_many
-                  collection = record.send(join.reflection.name)
-                  collection.loaded
+                  association_id = row[join.aliased_primary_key]

- return nil if record.id.to_s != join.parent.record_id(row).to_s or row[join.aliased_primary_key].nil?
-                  association = join.instantiate(row)
- collection.target.push(association) unless collection.target.include?(association) + unless collection_target = @collections[record_id + join_reflection_name.to_s]
+                    collection = record.send(join_reflection_name)
+                    collection.loaded
+                    collection_target = collection.target
+ @collections[record_id + join_reflection_name.to_s] = collection_target
+                  end
+
+ return nil if record_id != join.parent.record_id (row).to_s or association_id.nil?
+
+ cache = (@already_loaded_associations_cache [collection_target.object_id] ||= {})
+                  unless association = cache[association_id]
+                    association = join.instantiate(row)
+                    collection_target.push(association)
+                    cache[association_id] = association
+                  end
                 when :has_one
- return if record.id.to_s != join.parent.record_id (row).to_s + return if record_id != join.parent.record_id (row).to_s association = join.instantiate(row) unless row [join.aliased_primary_key].nil? record.send("set_#{join.reflection.name}_target", association)
                 when :belongs_to
- return if record.id.to_s != join.parent.record_id (row).to_s or row[join.aliased_primary_key].nil? + return if record_id != join.parent.record_id (row).to_s or row[join.aliased_primary_key].nil?
                   association = join.instantiate(row)
record.send("set_#{join.reflection.name}_target", association)
                 else
@@ -1402,6 +1423,7 @@
               @active_record = active_record
               @cached_record = {}
               @table_joins   = joins
+              aliased_primary_key
             end

             def aliased_prefix
@@ -1409,7 +1431,7 @@
             end

             def aliased_primary_key
-              "#{ aliased_prefix }_r0"
+              @aliased_primary_key ||= "#{ aliased_prefix }_r0"
             end

             def aliased_table_name
@@ -1431,7 +1453,7 @@
             end

             def record_id(row)
-              row[aliased_primary_key]
+              [EMAIL PROTECTED]
             end

             def instantiate(row)
@@ -1455,7 +1477,9 @@
               @aliased_prefix     = "t#{ join_dependency.joins.size }"
@aliased_table_name = table_name #.tr('.', '_') # start with the table name, sub out any .'s
               @parent_table_name  = parent.active_record.table_name
-
+              @aliased_primary_key = nil
+              aliased_primary_key
+
if !parent.table_joins.blank? && parent.table_joins.to_s.downcase =~ %r{join(\s+\w+)?\s+# {aliased_table_name.downcase}\son}
                 join_dependency.table_aliases[aliased_table_name] += 1
               end


Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to