Executive Summary: ================I've recently come across some performances problems with eager loading multiple has_many relationships (and to a lesser extent a single has_many with many objects in the collection) and had some thoughts.
The case I came across involved some models like so: class Question < ActiveRecord::Base has_many :incoming_messages has_many :outgoing_messages end class IncomingMessage < ActiveRecord::Base belongs_to :question end class OutgoingMessage < ActiveRecord::Base belongs_to :question endIn various parts of the app we load a question (or multiple ones) with :include => [:incoming_messages, :outgoing_messages]
Typically a question has a small number of incoming and outgoing messages (often only 1 or 2) and this all works absolutely fine. However at some point we ended up with a question with many incoming and outgoing_messages. Our servers (quite literally) ground to a halt whenever loading that question with the aforementioned includes, so I had a look under the hood.
The underlying thing is that in this case Question.find(1, :include => [:incoming_messages, :outgoing_messages]) returns quite a few rows and so even fairly small things add up very quickly
I've put together some changes that improve the situation, along with some numbers
Numbers: ===========In my benchmarks I've used 2 instances of Question: one with 150 incoming and 80 outgoing (big question) and one with 225 incoming and 120 outgoing (huge question) (ie 50% more of each, so total row count goes up by 2.25)
bmbm(5) do |x|x.report("big question incoming_messages") {Question.find_by_id big_question.id, :include => :incoming_messages} x.report("big question all") {Question.find_by_id big_question.id, :include => [:incoming_messages, :outgoing_messages]} x.report("huge question incoming_messages") {Question.find_by_id huge_question.id, :include => :incoming_messages} x.report(" question all") {Question.find_by_id huge_question.id, :include => [:incoming_messages, :outgoing_messages]}
end Vanilla rails 1.2.3:user system total real big question incoming_messages 0.050000 0.000000 0.050000 ( 0.052436) big question all 6.060000 0.080000 6.140000 ( 6.362781) huge question incoming_messages 0.100000 0.000000 0.100000 ( 0.111775) huge question all 20.960000 0.290000 21.250000 ( 23.186990)
Rails 1.2.3 + my changes:user system total real user system total real big question incoming_messages 0.010000 0.010000 0.020000 ( 0.013589) big question all 1.040000 0.070000 1.110000 ( 1.325704) huge question incoming_messages 0.020000 0.000000 0.020000 ( 0.019577) huge question all 2.310000 0.160000 2.470000 ( 2.944555)
Note that in the current code (huge question all)/(big question all) ~= 3.4 (ie > the increase in size of the dataset (2.25), with my fixed that ratio is the same as the increase in size of the data set (ie execution time scaling linearly with the data set side).
Morally I think eager loading in such cases is bad: it's silly to get the db to send us 10000 rows when there are actually only 200 rows of real content. However (at least for my usage) the case of a huge question is an insignificant (in terms of the numbers of such questions, not in terms of the effects) proportion of the overall set of questions, 99.999% of the time eager loading saves us time and so I don't really want to throw the baby out with the bath water. As long as we cope reasonably in the rare cases when things go awry I can happily continue eager loading
The changes:I've put together a range of things, but by far the biggest performance boost was due to the fact that rails checked whether the collection already contains an object for the row being considered with collection.target.include?(association). I've used hashes instead
The tests obviously pass in my environment (minus some unrelated failures that I think are down to me using mysql 4.1 which failed before I started changing code), my instinct would be that this would be more or less database independent.
If people are interested I will tidy & write this up as a ticket/ patch (at the very least I think it is worth mentioning in the docs that overzealous eager loading with several large :has_manys can be a bad thing).
Thanks to anyone who has kept reading this far, thoughts/comments appreciated
Fred
Index: activerecord/lib/active_record/associations.rb
@@ -1313,12 +1313,19 @@
end
def instantiate(rows)
+ @already_loaded_associations_cache = {}
+ @collections = {}
+
+ join_associations_copy = join_associations
rows.each_with_index do |row, i|
primary_id = join_base.record_id(row)
unless @base_records_hash[primary_id]
@base_records_in_order << (@base_records_hash
[primary_id] = join_base.instantiate(row))
end
- construct(@base_records_hash[primary_id],
@associations, join_associations.dup, row)
+ construct(@base_records_hash[primary_id],
primary_id.to_s, @associations, join_associations_copy, row)
end
return @base_records_in_order
end
@@ -1350,42 +1357,56 @@
end
end
- def construct(parent, associations, joins, row)
+ def construct(parent, parent_id, associations, joins, row)
case associations
when Symbol, String
- while (join = joins.shift).reflection.name.to_s !=
associations.to_s
- raise ConfigurationError, "Not Enough
Associations" if joins.empty?
+ association_name = associations.to_s + joins.each do |join| + if join.reflection.name.to_s == association_name+ return construct_association(parent, parent_id, association_name, join, row)
+ end
end
- construct_association(parent, join, row)
+ raise ConfigurationError, "Not Enough
Associations" if joins.empty?
when Array
associations.each do |association|
- construct(parent, association, joins, row)
+ construct(parent, parent_id, association, joins,
row)
end
when Hash
associations.keys.sort{|a,b|a.to_s<=>b.to_s}.each
do |name|
- association = construct_association(parent,
joins.shift, row)
- construct(association, associations[name],
joins, row) if association
+ association = construct_association(parent,
parent_id, joins.first.reflection.name, joins.first, row)
+ construct(association, association.id.to_s,
associations[name], joins[1..-1], row) if association
end
else
raise ConfigurationError, associations.inspect
end
end
- def construct_association(record, join, row)
+ def construct_association(record, record_id,
join_reflection_name, join, row)
case join.reflection.macro
when :has_many, :has_and_belongs_to_many
- collection = record.send(join.reflection.name)
- collection.loaded
+ association_id = row[join.aliased_primary_key]
- return nil if record.id.to_s !=
join.parent.record_id(row).to_s or row[join.aliased_primary_key].nil?
- association = join.instantiate(row)- collection.target.push(association) unless collection.target.include?(association) + unless collection_target = @collections[record_id + join_reflection_name.to_s]
+ collection = record.send(join_reflection_name) + collection.loaded + collection_target = collection.target+ @collections[record_id + join_reflection_name.to_s] = collection_target
+ end ++ return nil if record_id != join.parent.record_id (row).to_s or association_id.nil?
++ cache = (@already_loaded_associations_cache [collection_target.object_id] ||= {})
+ unless association = cache[association_id]
+ association = join.instantiate(row)
+ collection_target.push(association)
+ cache[association_id] = association
+ end
when :has_one
- return if record.id.to_s != join.parent.record_id
(row).to_s
+ return if record_id != join.parent.record_id
(row).to_s
association = join.instantiate(row) unless row
[join.aliased_primary_key].nil?
record.send("set_#{join.reflection.name}_target",
association)
when :belongs_to
- return if record.id.to_s != join.parent.record_id
(row).to_s or row[join.aliased_primary_key].nil?
+ return if record_id != join.parent.record_id
(row).to_s or row[join.aliased_primary_key].nil?
association = join.instantiate(row)
record.send("set_#{join.reflection.name}_target",
association)
else
@@ -1402,6 +1423,7 @@
@active_record = active_record
@cached_record = {}
@table_joins = joins
+ aliased_primary_key
end
def aliased_prefix
@@ -1409,7 +1431,7 @@
end
def aliased_primary_key
- "#{ aliased_prefix }_r0"
+ @aliased_primary_key ||= "#{ aliased_prefix }_r0"
end
def aliased_table_name
@@ -1431,7 +1453,7 @@
end
def record_id(row)
- row[aliased_primary_key]
+ [EMAIL PROTECTED]
end
def instantiate(row)
@@ -1455,7 +1477,9 @@
@aliased_prefix = "t#{ join_dependency.joins.size }"
@aliased_table_name = table_name #.tr('.', '_') #
start with the table name, sub out any .'s
@parent_table_name = parent.active_record.table_name
-
+ @aliased_primary_key = nil
+ aliased_primary_key
+
if !parent.table_joins.blank? &&
parent.table_joins.to_s.downcase =~ %r{join(\s+\w+)?\s+#
{aliased_table_name.downcase}\son}
join_dependency.table_aliases[aliased_table_name] += 1
end
smime.p7s
Description: S/MIME cryptographic signature
