OK, saying 80,000 records is pretty small was dumb. I merely meant to say that
an RDBMS should be able to work with that number without any great problem.

Cheers,

Anthony

On Thu, May 13, 2010 at 2:20 PM, Anthony Richardson <
[email protected]> wrote:

> 80,000 records is pretty small. Assuming your calculations/processing can be
> expressed in the native function language of your database, you should be
> able to get the database to do this fast. This is the bread and butter of
> databases.
>
> Using Ruby to manually process 80,000 objects within a request is almost
> certainly a bad idea.
>
> Cheers,
>
> Anthony Richardson
>
>
>
> On Thu, May 13, 2010 at 2:13 PM, Sean Seefried <[email protected]> wrote:
>
>> Hi all,
>>
>> I've got a question that I hope generates healthy debate and perhaps
>> even a solution for me.  Without going into too much detail I'm
>> working on a project in which we perform calculations on a large
>> hierarchical data set. We haven't used the acts_as_tree or
>> acts_as_nested plugins because each level in the hierarchy has a well
>> defined role and various attributes that only fit at that level in the
>> hierarchy. To give you a brief taste the hierarchy roughly goes: local
>> government area, precinct, building type, consumption.
>>
>> For a given local government area the number of records contained in
>> the entire hierarchy is about 80000, the bulk of them being
>> consumption records (since they are the leaves of the hierarchy). A
>> feature of the project is that one should be able to perform a
>> projection of consumptions into the future. This is a fairly complex
>> algorithm and involves looking at all 80000 records and combining them
>> in various ways.
>>
>> This algorithm, naively written, has a huge database latency. The bulk
>> of the time is spent querying and receiving results from the
>> database.
>>
>> We have had some success in optimising various parts of the algorithm
>> by performing fewer queries. A lot of the time this means that we pull
>> the records out of the database and put them into some kind of look-up
>> structure (a hash with a key equal to the attributes (plural) of
>> interest in the model).  This allows us to do a kind of "in memory"
>> query but, annoyingly, only on whatever we choose as the key for the
>> hash. You can no longer perform general queries on the collection in
>> memory. Basically we lose all the expressiveness/terseness of
>> ActiveRecord.
>>
>> What we really want to be able to do is this:
>>
>> 1. We want to pull a large collection of objects from the database
>> into memory
>> 2. We want to be able to select subsets of these in-memory objects
>> with flexibility similar to querying them using ActiveRecord
>> 3. After having updated them in memory we want to be able to write
>> them back to the database. I should mention that none of their unique
>> keys will have changed. We want this to be some kind of bulk update.
>>
>> Some approaches that I've thought of but I don't think work:
>> - Simply caching will not work because, as far as I know, this only
>> works when you perform the same query twice. We are not doing this.
>> We're querying the collection in many different ways.
>> - This isn't really an issue to do with the kind of database.
>> Switching over to CouchDB or an Object Database doesn't obviously
>> solve our problem. The problem is that although we know in advance
>> that all our queries will return results that are a subset of the
>> collection of 80000 objects we want to be able to perform many
>> different sorts of queries returning many different subsets of the
>> 80000 objects.  Having the objects all sitting in memory really seems
>> to be the way to go.
>> - We could also just create a class hierarchy that mirrors the
>> hierarchy of the data and forget about ActiveRecord entirely. We could
>> then just serialize this structure and write it to disk or a database.
>> We don't get the advantage of being able to query the data structure
>> this way though.
>>
>> Some final notes:
>> a) This site will, at most, have a few simultaneous clients so having
>> all 80000 records in memory should not be a problem.
>> b) We've had some luck with the bulk-update part of point 3 above.  Zach
>> Dennis' AR-extensions plug-in has been quite useful.
>>
>> This email is still not as clear as I wanted it to be even though I've
>> spent some time on it. If you need any clarification please feel free
>> to ask.
>>
>> Cheers,
>>
>> Sean
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Ruby or Rails Oceania" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to
>> [email protected]<rails-oceania%[email protected]>
>> .
>> For more options, visit this group at
>> http://groups.google.com/group/rails-oceania?hl=en.
>>
>>
>
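
For what it's worth, point 2 in Sean's list (selecting subsets of in-memory
objects with ActiveRecord-like conditions) can be roughly sketched in plain
Ruby. This is only an illustration under stated assumptions: Consumption,
MemoryCollection and its where method are hypothetical names, not part of
ActiveRecord or of any plugin mentioned in this thread, and the Struct stands
in for real model objects. The idea is to build a hash index lazily for
whatever attribute combination a query asks for, instead of fixing one hash
key up front.

```ruby
# Hypothetical stand-in for an ActiveRecord model; field names are illustrative.
Consumption = Struct.new(:id, :precinct_id, :building_type, :year, :amount)

# Wraps an array of records and answers hash-style equality conditions in memory.
class MemoryCollection
  include Enumerable

  def initialize(records)
    @records = records
    @indexes = {} # sorted attribute list => { attribute values => [records] }
  end

  def each(&block)
    @records.each(&block)
  end

  # where(year: 2010, building_type: "office") returns a new collection.
  # The index for a given attribute combination is built on first use
  # and reused by later calls on the same collection.
  def where(conditions)
    attrs = conditions.keys.sort
    index = (@indexes[attrs] ||= @records.group_by { |r| attrs.map { |a| r[a] } })
    MemoryCollection.new(index.fetch(attrs.map { |a| conditions[a] }, []))
  end
end

records = [
  Consumption.new(1, 10, "office", 2010, 120.0),
  Consumption.new(2, 10, "retail", 2010, 80.0),
  Consumption.new(3, 11, "office", 2010, 200.0)
]

all_records = MemoryCollection.new(records)
offices = all_records.where(building_type: "office")
in_precinct = offices.where(precinct_id: 10)
```

Because the wrapper includes Enumerable, the result of where can still be
passed to select, map, sum and so on. It only handles equality conditions;
ranges, joins and ordering would need extra work, and writing the changed
records back in bulk is a separate problem.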
