No disrespect to Anthony, but if you have an N^2 algorithm anyhow,
doing it in memory is smarter than getting SQL to do it. How much you
gain depends on your algorithm, and on how you can pivot the data to
reduce the complexity.

I've handled this class of problem by loading all the data into memory.
Loading using Active Record might be too slow for this though, so
consider using Model.connection.select_rows with some hand-written
SQL, and building the Ruby structure you need without using AR.
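
A minimal sketch of the "select_rows plus plain Ruby objects" idea, assuming a hypothetical consumptions table with the columns shown. The ConsumptionRow struct and the build_rows helper are illustrative names, not part of any library. The select_rows call itself is left as a comment so the conversion step can run on its own.

```ruby
# Hypothetical plain record; the column names are assumptions for illustration.
ConsumptionRow = Struct.new(:id, :precinct_id, :building_type, :year, :amount)

# select_rows returns an Array of Arrays, one inner Array per row, e.g.
#   rows = Model.connection.select_rows(
#     "SELECT id, precinct_id, building_type, year, amount FROM consumptions")
# Depending on the adapter and Rails version, values may come back as
# Strings, so convert each column explicitly.
def build_rows(rows)
  rows.map do |id, precinct_id, building_type, year, amount|
    ConsumptionRow.new(id.to_i, precinct_id.to_i, building_type.to_s,
                       year.to_i, amount.to_f)
  end
end
```

Skipping AR instantiation avoids per-object callbacks and attribute bookkeeping, which is usually where the time goes when loading tens of thousands of rows.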

You can index the data in several different ways at very little
incremental cost, even if some of your computations won't need all
the indices.
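
A sketch of why extra indices are cheap, assuming a hypothetical Consumption struct: every index is a Hash whose values point at the same objects, so an additional index adds hash entries and array slots, not copies of the records. The attribute names are assumptions.

```ruby
# Hypothetical record type; fields are illustrative assumptions.
Consumption = Struct.new(:id, :precinct_id, :building_type, :year, :amount)

# Build several look-up indices over the same objects. Each index holds
# references to the same Consumption instances rather than copies.
def build_indices(records)
  {
    by_id:       records.map { |r| [r.id, r] }.to_h,
    by_precinct: records.group_by(&:precinct_id),
    by_type:     records.group_by(&:building_type),
    by_year:     records.group_by(&:year)
  }
end
```

For example, indices[:by_type]["office"] returns every office record without scanning the full collection.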

Clifford Heath, Data Constellation, http://dataconstellation.com
Agile Information Management and Design

On 13/05/2010, at 2:43 PM, Sean Seefried wrote:

Hi all,

I've got a question that I hope generates healthy debate and perhaps
even a solution for me. Without going into too much detail, I'm
working on a project in which we perform calculations on a large
hierarchical data set. We haven't used the acts_as_tree or
acts_as_nested plugins because each level in the hierarchy has a
well-defined role and various attributes that only fit at that level
of the hierarchy. To give you a brief taste, the hierarchy roughly
goes: local government area, precinct, building type, consumption.

For a given local government area, the number of records contained in
the entire hierarchy is about 80000, the bulk of them being
consumption records (since they are the leaves of the hierarchy). A
feature of the project is that one should be able to perform a
projection of consumptions into the future. This is a fairly complex
algorithm and involves looking at all 80000 records and combining them
in various ways.

When written naively, this algorithm incurs huge database latency:
most of the time is spent sending queries to the database and
receiving the results.

We have had some success in optimising various parts of the algorithm
by performing fewer queries. Usually this means pulling the records
out of the database and putting them into some kind of look-up
structure (a hash keyed on the attributes, plural, of interest in the
model). This allows us to do a kind of "in memory" query but,
annoyingly, only on whatever we chose as the key for the hash. We can
no longer perform general queries on the collection in memory.
Basically we lose all the expressiveness/terseness of ActiveRecord.
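
A minimal sketch of the look-up structure described above, assuming a hypothetical Consumption struct and a key made of precinct and building type. It also shows the limitation: any question not covered by the key falls back to a linear scan with select. The helper names are illustrative.

```ruby
# Hypothetical record type; fields are illustrative assumptions.
Consumption = Struct.new(:id, :precinct_id, :building_type, :year, :amount)

# Composite-key look-up: the key is an Array of the attributes of interest.
# Arrays hash by content, so [10, "office"] always finds the same bucket.
def index_by_precinct_and_type(records)
  records.group_by { |r| [r.precinct_id, r.building_type] }
end

# A query on anything outside the key needs a full scan of the collection.
def consumptions_in_year(records, year)
  records.select { |r| r.year == year }
end
```

Using fetch(key, []) on the index avoids a nil when a combination has no records.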

What we really want to be able to do is this:

1. We want to pull a large collection of objects from the database
   into memory.
2. We want to be able to select subsets of these in-memory objects
   with flexibility similar to querying them using ActiveRecord.
3. After updating them in memory, we want to be able to write them
   back to the database, ideally as some kind of bulk update. I should
   mention that none of their unique keys will have changed.
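
For point 2, a minimal sketch of hash-style conditions evaluated in memory, loosely modelled on ActiveRecord's where(:attr => value). InMemoryCollection is a hypothetical class, not an existing library, and it only covers equality-style conditions, not SQL fragments, joins or ordering.

```ruby
# Hypothetical in-memory collection with a where-like filter.
class InMemoryCollection
  include Enumerable

  def initialize(records)
    @records = records
  end

  def each(&block)
    return enum_for(:each) unless block
    @records.each(&block)
    self
  end

  # Each condition is matched with ===, so a value can be a literal, a
  # Range, a Regexp or a lambda. Arrays act as an "IN" list, as in AR.
  # Returns a new collection, so calls can be chained.
  def where(conditions)
    matches = @records.select do |r|
      conditions.all? do |attr, expected|
        actual = r.public_send(attr)
        expected.is_a?(Array) ? expected.include?(actual) : expected === actual
      end
    end
    InMemoryCollection.new(matches)
  end
end
```

Because the class includes Enumerable, map, count, group_by and the rest work on any result, for example collection.where(year: 2010..2012, precinct_id: [10, 11]).map(&:amount).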

Some approaches that I've thought of but I don't think work:
- Simply caching will not work because, as far as I know, this only
works when you perform the same query twice. We are not doing this.
We're querying the collection in many different ways.
- This isn't really an issue to do with the kind of database.
Switching over to CouchDB or an object database doesn't obviously
solve our problem. Although we know in advance that all our queries
will return results that are a subset of the collection of 80000
objects, we want to be able to perform many different sorts of
queries returning many different subsets of those 80000 objects.
Having the objects all sitting in memory really seems to be the way
to go.
- We could also just create a class hierarchy that mirrors the
hierarchy of the data and forget about ActiveRecord entirely. We could
then just serialize this structure and write it to disk or a database.
We don't get the advantage of being able to query the data structure
this way though.
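
A sketch of the "mirror the hierarchy and serialize it" alternative, using hypothetical plain structs for each level and Ruby's built-in Marshal for serialization. Marshal output is Ruby-specific and should only be loaded from a trusted source.

```ruby
# Hypothetical plain-Ruby classes mirroring the hierarchy:
# local government area -> precinct -> building type -> consumption.
Consumption         = Struct.new(:year, :amount)
BuildingType        = Struct.new(:name, :consumptions)
Precinct            = Struct.new(:name, :building_types)
LocalGovernmentArea = Struct.new(:name, :precincts)

# Serialize the whole tree to a binary String (e.g. for a file or a blob column).
def serialize(lga)
  Marshal.dump(lga)
end

# Rebuild the tree; only load data that you produced yourself.
def deserialize(bytes)
  Marshal.load(bytes)
end
```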

Some final notes:
a) This site will, at most, have a few simultaneous clients so having
all 80000 records in memory should not be a problem.
b) We've had some luck with the bulk-update part of point 3 above.
Zach Dennis' AR-extensions plug-in has been quite useful.
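
As an alternative to a plug-in, here is a sketch of writing changed values back with a single UPDATE ... CASE statement. It assumes integer ids, one numeric column, and a hypothetical snapshot of original values taken at load time. Table and column names are interpolated as-is, so they must come from trusted code. The resulting string could be run with Model.connection.execute(sql).

```ruby
# Build one UPDATE that sets a numeric column on many rows, using CASE on
# the primary key. Assumes integer ids and finite numeric values, so no
# SQL string quoting is needed.
def bulk_update_sql(table, column, values_by_id)
  raise ArgumentError, "nothing to update" if values_by_id.empty?

  whens = values_by_id.map do |id, value|
    v = Float(value)
    raise ArgumentError, "non-finite value for id #{id}" unless v.finite?
    "WHEN #{Integer(id)} THEN #{v}"
  end
  ids = values_by_id.keys.map { |id| Integer(id) }.join(", ")

  "UPDATE #{table} SET #{column} = CASE id #{whens.join(' ')} END " \
    "WHERE id IN (#{ids})"
end

# Keep only records whose amount differs from the snapshot taken at load time.
def changed_amounts(records, original_amounts)
  records.each_with_object({}) do |r, changes|
    changes[r.id] = r.amount unless original_amounts[r.id] == r.amount
  end
end
```

The WHERE clause restricts the update to the listed ids, so rows that did not change are not touched and the CASE never falls through to NULL.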

This email is still not as clear as I wanted it to be even though I've
spent some time on it. If you need any clarification please feel free
to ask.

Cheers,

Sean

--
You received this message because you are subscribed to the Google Groups "Ruby or Rails Oceania" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected] . For more options, visit this group at http://groups.google.com/group/rails-oceania?hl=en .

