[rails-oceania] Re: How should we handle our data?

Daniel Thu, 13 May 2010 17:23:34 -0700

I believe PostgreSQL is able to run stored procedures written in Ruby
(or Perl, Python, Lisp, C...).
One alternative I haven't seen suggested is to load the entire
database into an in-memory SQLite instance.
That would give you in-memory SQL access.


On May 13, 10:17 pm, Shanon McQuay <[email protected]> wrote:
> 80,000 records is small enough data set that you should have a lot of
> options available.
>
> Just taking a step back from the problem:
>
> 1) Do you have a specific, measurable goal for your performance?
> 2) How big is the gap between your goal and where you are now?
>
> On May 13, 2:43 pm, Sean Seefried <[email protected]> wrote:
>
> > Hi all,
>
> > I've got a question that I hope generates healthy debate and perhaps
> > even a solution for me.  Without going into too much detail I'm
> > working on a project in which we perform calculations on a large
> > hierarchical data set. We haven't used the acts_as_tree or
> > acts_as_nested plugins because each level in the hierarchy has a well
> > defined role and various attributes that only fit at that level in the
> > hierarchy. To give you a brief taste the hierarchy roughly goes: local
> > government area, precinct, building type, consumption.
>
> > For a given local government area the number of records contained in
> > the entire hierarchy is about 80000, the bulk of them being
> > consumption records (since they are the leaves of the hierarchy). A
> > feature of the project is that one should be able to perform a
> > projection of consumptions into the future. This is a fairly complex
> > algorithm and involves looking at all 80000 records and combining them
> > in various ways.
>
> > This algorithm, naively written, has a huge database latency. The bulk
> > of the time is spent querying and receiving results from the
> > database.
>
> > We have had some success in optimising various parts of the algorithm
> > by performing fewer queries. A lot of the time this means that we pull
> > the records out of the database and put them into some kind of look-up
> > structure (a hash with a key equal to the attributes (plural) of
> > interest in the model).  This allows us to do a kind of "in memory"
> > query but, annoyingly, only on whatever we choose as the key for the
> > hash. You can no longer perform general queries on the collection in
> > memory. Basically we lose all the expressiveness/terseness of
> > ActiveRecord.
>
> > What we really want to be able to do is this:
>
> > 1. We want to pull a large collection of objects from the database
> > into memory
> > 2. We want to be able to select subsets of these in-memory objects
> > with a similar flexibility to querying them using ActiveRecord
> > 3. After having updated them in memory we want to be able to write
> > them back to the database. I should mention that none of their unique
> > keys will have changed. We want this to be some kind of bulk update.
>
> > Some approaches that I've thought of but I don't think work:
> > - Simply caching will not work because, as far as I know, this only
> > works when you perform the same query twice. We are not doing this.
> > We're querying the collection in many different ways.
> > - This isn't really an issue to do with the kind of database.
> > Switching over to CouchDB or an Object Database doesn't obviously
> > solve our problem. The problem is that although we know in advance
> > that all our queries will return results that are a subset of the
> > collection of 80000 objects we want to be able to perform many
> > different sorts of queries returning many different subsets of the
> > 80000 objects.  Having the objects all sitting in memory really seems
> > to be the way to go.
> > - We could also just create a class hierarchy that mirrors the
> > hierarchy of the data and forget about ActiveRecord entirely. We could
> > then just serialize this structure and write it to disk or a database.
> > We don't get the advantage of being able to query the data structure
> > this way though.
>
> > Some final notes:
> > a) This site will, at most, have a few simultaneous clients so having
> > all 80000 records in memory should not be a problem.
> > b) We've had some luck with the bulk-update part of point 3 above.  Zach
> > Dennis' AR-extensions plug-in has been quite useful.
>
> > This email is still not as clear as I wanted it to be even though I've
> > spent some time on it. If you need any clarification please feel free
> > to ask.
>
> > Cheers,
>
> > Sean
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "Ruby or Rails Oceania" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/rails-oceania?hl=en.
>

