"Basically we lose all the expressiveness/terseness of ActiveRecord."
I'm not quite sure why you'd want to use ActiveRecord at all in this
scenario - it seems like a bit of a golden hammer. Sure, it has some nice
expressiveness - but you should be able to build your own object model that
is just as expressive, and if you have all the objects in memory you aren't
doing all the painful work AR has to do to deal with SQL.
My approach would be to load all 80,000 records into memory, build a
querying/processing model over them yourself, and then serialize them back
to the database at the end. You could use AR to do the loading/saving, or
any other ORM/DB layer.
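A minimal sketch of that load / process / write-back cycle, in plain Ruby so it runs without a database. The `Consumption` struct and the `WorkingSet` class are hypothetical names made up for illustration. In a real app the rows would come from AR (or whatever DB layer you use), and the changed records would go to your bulk-update mechanism.

```ruby
# Hypothetical record type standing in for rows loaded from the database.
Consumption = Struct.new(:id, :building_type_id, :year, :amount)

# Holds every record in memory and remembers the values they were loaded with.
class WorkingSet
  attr_reader :records

  def initialize(records)
    @records = records
    # Snapshot of the loaded attribute values, keyed by id.
    @original = records.map { |r| [r.id, r.to_a] }.to_h
  end

  # Records whose attributes differ from the values loaded at the start.
  def changed
    @records.reject { |r| @original[r.id] == r.to_a }
  end
end

rows = [
  Consumption.new(1, 10, 2010, 100.0),
  Consumption.new(2, 10, 2011, 110.0),
  Consumption.new(3, 11, 2010, 50.0)
]

set = WorkingSet.new(rows)

# Do all the processing in memory, with ordinary Enumerable methods.
set.records.select { |r| r.year == 2010 }.each { |r| r.amount *= 1.5 }

# Only these need to be written back at the end.
to_save = set.changed
```

Tracking which records changed is optional. With 80,000 records you could also just write everything back, but a snapshot like this keeps the final update small.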
"A lot of the time this means that we pull
the records out of the database and put them into some kind of look-up
structure (a hash with a key equal to the attributes (plural) of
interest in the model). This allows us to do a kind of "in memory"
query but, annoyingly, only on whatever we choose as the key for the
hash. You can no longer perform general queries on the collection in
memory."
You can keep as many different indexes over the same objects as you want
with this sort of thing. I regularly use something like:
class IndexedStuff
  attr_reader :by_key, :by_foo, :by_bar

  def initialize(stuff)
    @by_key = {}  # key => thing (keys are unique)
    @by_foo = {}  # foo => [things sharing that foo]
    @by_bar = {}  # bar => [things sharing that bar]
    stuff.each do |thing|
      @by_key[thing.key] = thing
      (@by_foo[thing.foo] ||= []) << thing
      (@by_bar[thing.bar] ||= []) << thing
    end
  end
end
This is a bit hard-coded; it'd be possible to build something more generic,
but it depends on how generic your data access really needs to be.
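As a rough idea of what "more generic" could look like, here is a sketch where the indexed attributes are passed in rather than hard-coded. The `Indexed` class, its `by` method and the `Thing` struct are hypothetical names for illustration only. Every index here maps a value to a list of objects, so a unique key also comes back wrapped in an array.

```ruby
# Builds one hash index per attribute name given to the constructor.
class Indexed
  def initialize(items, *attrs)
    @indexes = {}
    attrs.each do |attr|
      # group_by gives { value => [items with that value] }.
      @indexes[attr] = items.group_by { |item| item.public_send(attr) }
    end
  end

  # All items whose attr equals value; an empty array if there are none.
  # Raises KeyError if attr was not indexed.
  def by(attr, value)
    @indexes.fetch(attr).fetch(value, [])
  end
end

Thing = Struct.new(:key, :foo, :bar)
things = [Thing.new(1, :a, :x), Thing.new(2, :a, :y), Thing.new(3, :b, :x)]

index = Indexed.new(things, :foo, :bar)
foo_a = index.by(:foo, :a)   # things 1 and 2
bar_x = index.by(:bar, :x)   # things 1 and 3
```

Each index holds references to the same objects, so updating an attribute that is not indexed needs no re-indexing. Changing an indexed attribute would leave that index stale.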
For full-table scan equivalents, you can always write your own 'find_by'
method that just scans all the data - with a bit of metaprogramming you
should be able to make it pretty expressive, too. You can always steal the
metaprogramming code from ActiveRecord/ActiveModel if you want...
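A small sketch of such a scanning finder, using `method_missing` to get AR-style `find_by_foo` and `find_by_foo_and_bar` calls. This is not ActiveRecord's implementation, just an illustration. The `Scannable` class, its `where` method and the `Item` struct are hypothetical names, and attribute names that themselves contain `_and_` would be split incorrectly.

```ruby
# Wraps a collection and answers queries with a linear scan.
class Scannable
  def initialize(items)
    @items = items
  end

  # Every item matching all of the attribute => value pairs.
  def where(conditions)
    @items.select do |item|
      conditions.all? { |attr, value| item.public_send(attr) == value }
    end
  end

  # Handles find_by_foo(v) and find_by_foo_and_bar(v1, v2);
  # returns the first match, or nil if nothing matches.
  def method_missing(name, *args)
    match = /\Afind_by_(.+)\z/.match(name.to_s)
    return super unless match
    attrs = match[1].split("_and_").map(&:to_sym)
    where(attrs.zip(args).to_h).first
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

Item = Struct.new(:key, :foo, :bar)
items = [Item.new(1, :a, :x), Item.new(2, :a, :y), Item.new(3, :b, :x)]

scan = Scannable.new(items)
first_b   = scan.find_by_foo(:b)
a_and_y   = scan.find_by_foo_and_bar(:a, :y)
all_bar_x = scan.where(bar: :x)
```

Each call walks the whole collection, which is usually fine for occasional queries over tens of thousands of objects. For lookups inside a hot loop, a hash index is the better fit.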
I've had too much pain in the past dealing with hierarchies in databases -
if you have the luxury of loading it all into memory, you should do so!
(Admittedly, most of my database pain was with Hibernate, not in Ruby-land;
but mapping object hierarchies to database tables is fundamentally pretty
painful.)
- Korny
On Thu, May 13, 2010 at 2:43 PM, Sean Seefried <[email protected]> wrote:
> Hi all,
>
> I've got a question that I hope generates healthy debate and perhaps
> even a solution for me. Without going into too much detail I'm
> working on a project in which we perform calculations on a large
> hierarchical data set. We haven't used the acts_as_tree or
> acts_as_nested plugins because each level in the hierarchy has a well
> defined role and various attributes that only fit at that level in the
> hierarchy. To give you a brief taste the hierarchy roughly goes: local
> government area, precinct, building type, consumption.
>
> For a given local government area the number of records contained in
> the entire hierarchy is about 80000, the bulk of them being
> consumption records (since they are the leaves of the hierarchy). A
> feature of the project is that one should be able to perform a
> projection of consumptions into the future. This is a fairly complex
> algorithm and involves looking at all 80000 records and combining them
> in various ways.
>
> This algorithm, naively written, has a huge database latency. The bulk
> of the time is spent querying and receiving results from the
> database.
>
> We have had some success in optimising various parts of the algorithm
> by performing fewer queries. A lot of the time this means that we pull
> the records out of the database and put them into some kind of look-up
> structure (a hash with a key equal to the attributes (plural) of
> interest in the model). This allows us to do a kind of "in memory"
> query but, annoyingly, only on whatever we choose as the key for the
> hash. You can no longer perform general queries on the collection in
> memory. Basically we lose all the expressiveness/terseness of
> ActiveRecord.
>
> What we really want to be able to do is this:
>
> 1. We want to pull a large collection of objects from the database
> into memory
> 2. We want to be able to select subsets of these in-memory objects
> with a similar flexibility
> to querying them using ActiveRecord
> 3. After having updated them in memory we want to be able to write
> them back to the database. I should mention that none of their unique
> keys will have changed. We want this to be some kind of bulk update.
>
> Some approaches that I've thought of but I don't think work:
> - Simply caching will not work because, as far as I know, this only
> works when you perform the same query twice. We are not doing this.
> We're querying the collection in many different ways.
> - This isn't really an issue to do with the kind of database.
> Switching over to CouchDB or an Object Database doesn't obviously
> solve our problem. The problem is that although we know in advance
> that all our queries will return results that are a subset of the
> collection of 80000 objects we want to be able to perform many
> different sorts of queries returning many different subsets of the
> 80000 objects. Having the objects all sitting in memory really seems
> to be the way to go.
> - We could also just create a class hierarchy that mirrors the
> hierarchy of the data and forget about ActiveRecord entirely. We could
> then just serialize this structure and write it to disk or a database.
> We don't get the advantage of being able to query the data structure
> this way though.
>
> Some final notes:
> a) This site will, at most, have a few simultaneous clients so having
> all 80000 records in memory should not be a problem.
> b) We've had some luck with the bulk-update part of point 3 above. Zach
> Dennis' AR-extensions plug-in has been quite useful.
>
> This email is still not as clear as I wanted it to be even though I've
> spent some time on it. If you need any clarification please feel free
> to ask.
>
> Cheers,
>
> Sean
>
> --
> You received this message because you are subscribed to the Google Groups
> "Ruby or Rails Oceania" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/rails-oceania?hl=en.
>
>
--
Kornelis Sietsma korny at my surname dot com
kornys on twitter/fb/gtalk/gwave www.sietsma.com/korny
"Every jumbled pile of person has a thinking part
that wonders what the part that isn't thinking
isn't thinking of"