OK, saying "80,000 records is pretty small" was dumb. I merely meant to say that an RDBMS should be able to work with that number without any great problem.

Cheers,
Anthony

On Thu, May 13, 2010 at 2:20 PM, Anthony Richardson <[email protected]> wrote:

> 80,000 records is pretty small. Assuming your calculations/processing can be
> expressed in the native function language of your database you should be
> able to get the database to do this fast. This is the bread and butter of
> databases.
>
> Almost certainly using Ruby to manually process 80,000 objects is a bad
> idea within a request.
>
> Cheers,
>
> Anthony Richardson
>
> On Thu, May 13, 2010 at 2:13 PM, Sean Seefried <[email protected]> wrote:
>
>> Hi all,
>>
>> I've got a question that I hope generates healthy debate and perhaps
>> even a solution for me. Without going into too much detail I'm
>> working on a project in which we perform calculations on a large
>> hierarchical data set. We haven't used the acts_as_tree or
>> acts_as_nested plugins because each level in the hierarchy has a well
>> defined role and various attributes that only fit at that level in the
>> hierarchy. To give you a brief taste the hierarchy roughly goes: local
>> government area, precinct, building type, consumption.
>>
>> For a given local government area the number of records contained in
>> the entire hierarchy is about 80000, the bulk of them being
>> consumption records (since they are the leaves of the hierarchy). A
>> feature of the project is that one should be able to perform a
>> projection of consumptions into the future. This is a fairly complex
>> algorithm and involves looking at all 80000 records and combining them
>> in various ways.
>>
>> This algorithm, naively written, has a huge database latency. The bulk
>> of the time is spent querying and receiving results from the
>> database.
>>
>> We have had some success in optimising various parts of the algorithm
>> by performing fewer queries. A lot of the time this means that we pull
>> the records out of the database and put them into some kind of look-up
>> structure (a hash with a key equal to the attributes (plural) of
>> interest in the model). This allows us to do a kind of "in memory"
>> query but, annoyingly, only on whatever we choose as the key for the
>> hash. You can no longer perform general queries on the collection in
>> memory. Basically we lose all the expressiveness/terseness of
>> ActiveRecord.
>>
>> What we really want to be able to do is this:
>>
>> 1. We want to pull a large collection of objects from the database
>>    into memory.
>> 2. We want to be able to select subsets of these in-memory objects
>>    with a similar flexibility to querying them using ActiveRecord.
>> 3. After having updated them in memory we want to be able to write
>>    them back to the database. I should mention that none of their unique
>>    keys will have changed. We want this to be some kind of bulk update.
>>
>> Some approaches that I've thought of but I don't think work:
>> - Simply caching will not work because, as far as I know, this only
>>   works when you perform the same query twice. We are not doing this.
>>   We're querying the collection in many different ways.
>> - This isn't really an issue to do with the kind of database.
>>   Switching over to CouchDB or an Object Database doesn't obviously
>>   solve our problem. The problem is that although we know in advance
>>   that all our queries will return results that are a subset of the
>>   collection of 80000 objects we want to be able to perform many
>>   different sorts of queries returning many different subsets of the
>>   80000 objects. Having the objects all sitting in memory really seems
>>   to be the way to go.
>> - We could also just create a class hierarchy that mirrors the
>>   hierarchy of the data and forget about ActiveRecord entirely. We could
>>   then just serialize this structure and write it to disk or a database.
>>   We don't get the advantage of being able to query the data structure
>>   this way though.
>>
>> Some final notes:
>> a) This site will, at most, have a few simultaneous clients so having
>>    all 80000 records in memory should not be a problem.
>> b) We've had some luck with the bulk-update part of point 3 above. Zach
>>    Dennis' AR-extensions plug-in has been quite useful.
>>
>> This email is still not as clear as I wanted it to be even though I've
>> spent some time on it. If you need any clarification please feel free
>> to ask.
>>
>> Cheers,
>>
>> Sean
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Ruby or Rails Oceania" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to [email protected].
>> For more options, visit this group at
>> http://groups.google.com/group/rails-oceania?hl=en.
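
For what it's worth, points 1 to 3 in Sean's message can be roughed out in plain Ruby. This is only a minimal sketch under assumptions, not the project's code: the Consumption struct, its field names, the InMemoryCollection class and the 5% growth figure are all hypothetical. A real version would load the rows through ActiveRecord and hand the changed ones to a bulk-update tool such as the AR-extensions plug-in mentioned above. The idea is to build a hash index lazily for whichever attributes a query asks about, so queries are not tied to one pre-chosen key.

```ruby
# Hypothetical stand-in for an ActiveRecord model; field names are assumptions.
Consumption = Struct.new(:id, :precinct, :building_type, :year, :amount) do
  attr_accessor :dirty # set to true once the record has been changed in memory
end

# An in-memory collection that can be filtered on any combination of
# attributes, not only on the key chosen for a single look-up hash.
class InMemoryCollection
  def initialize(records)
    @records = records
    @indexes = {} # sorted attribute list => { attribute values => [records] }
  end

  # Equality query on one or more attributes. The hash index for that
  # attribute combination is built on first use and reused afterwards.
  def where(conditions)
    attrs = conditions.keys.sort
    index = (@indexes[attrs] ||= @records.group_by { |r| attrs.map { |a| r[a] } })
    index.fetch(attrs.map { |a| conditions[a] }, [])
  end

  # Change a record in memory, mark it for write-back, and drop any index
  # that was built on one of the changed attributes.
  def update(record, changes)
    changes.each { |attr, value| record[attr] = value }
    record.dirty = true
    @indexes.delete_if { |attrs, _| (attrs & changes.keys).any? }
  end

  # Records that would be handed to a single bulk update.
  def dirty_records
    @records.select(&:dirty)
  end
end

records = [
  Consumption.new(1, "north", "office", 2010, 120),
  Consumption.new(2, "north", "retail", 2010, 80),
  Consumption.new(3, "south", "office", 2010, 200)
]
collection = InMemoryCollection.new(records)

offices       = collection.where(building_type: "office")
north_offices = collection.where(precinct: "north", building_type: "office")

# Project office consumption forward one year (illustrative 5% growth).
offices.each do |r|
  collection.update(r, amount: (r.amount * 1.05).round, year: 2011)
end

to_write = collection.dirty_records
```

Each distinct attribute combination costs one pass over the collection the first time it is queried, and a hash look-up after that. Anything beyond equality matches (ranges, joins across levels of the hierarchy) would need either a plain `select` over the records or the database itself.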
