I've been looking into memory behavior in Rails workers.  One thing I've 
noticed is that it's easy to instantiate tens of thousands of objects on 
the Ruby heap even with find_each operating in batches of 1000.  Most of 
these objects appear to be highly redundant.
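
Here's roughly how I've been measuring the churn (just a sketch; exact 
counts will vary by Ruby and Rails version):

    GC.start
    GC.disable  # keep garbage around so count_objects sees every allocation
    before = ObjectSpace.count_objects[:T_STRING]

    MyClass.find_each(batch_size: 1000) { |record| record.id }

    allocated = ObjectSpace.count_objects[:T_STRING] - before
    GC.enable
    puts "strings allocated during the walk: #{allocated}"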

Consider loading 1000 instances of an AR model MyClass which has 20 
database fields.  At least 20 x 1000 = 20,000 strings get allocated, as 
measured by GC.start; ObjectSpace.count_objects[:T_STRING].  Digging 
deeper, it looks like each instance carries an internal attributes hash 
in an instance variable.  The first key is typically the string "id".  
Each "id" string is a separate object, as determined by object_id, even 
though all of these strings are frozen for use as hash keys.
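
You can see the duplication directly (the attribute storage layout varies 
across Rails versions, so treat this as what I see on my setup):

    a, b = MyClass.limit(2).to_a

    ka = a.instance_variable_get(:@attributes).keys.find { |k| k == "id" }
    kb = b.instance_variable_get(:@attributes).keys.find { |k| k == "id" }

    ka == kb                      # => true: equal values
    ka.object_id == kb.object_id  # => false: two distinct objects
    ka.frozen?                    # => true: Hash#[]= dups and freezes string keys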

Would it be possible to take advantage of the heavy duplication in the 
keys of this hash to avoid allocating thousands of redundant objects 
every time a bulk query runs?  Maybe something like a StringPool, or 
getting the column name directly from a lower layer, or using symbols 
would work.
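
Something like this is what I have in mind for the pool (the name and 
API are hypothetical, just to make the idea concrete; not thread-safe as 
written):

    module StringPool
      @pool = {}

      # Hand out one canonical frozen copy per distinct string value.
      def self.intern(str)
        @pool[str] ||= str.dup.freeze
      end
    end

    StringPool.intern("id").equal?(StringPool.intern("id"))  # => true

Every row's "id" key would then be the same object instead of 1000 
copies of it.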

There are also a bunch of empty hashes which could probably be shared in 
a copy-on-write style.  Right now it looks like six Hashes per AR 
instance, four of them initially empty, in a typical find query.
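
For the empty hashes, something copy-on-write-ish could look like this 
(purely a sketch, not Rails internals; the class name is made up):

    EMPTY_HASH = {}.freeze

    class Attributes  # hypothetical stand-in for the per-instance hashes
      def initialize
        @data = EMPTY_HASH              # shared; costs nothing per instance
      end

      def [](key)
        @data[key]
      end

      def []=(key, value)
        @data = {} if @data.equal?(EMPTY_HASH)  # split off on first write
        @data[key] = value
      end
    end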

Thanks for any thoughts!
