> 1. We want to pull a large collection of objects from the database
> into memory
I guess you could put the database completely in memory.
Alternatively, export just the 80,000 rows you're interested in, load
them into e.g. an in-memory sqlite3 database, and you have all the
expressiveness of SQL via AR / Arel if that's what you want.
> This isn't really an issue to do with the kind of database.
Depends on what type of calculations you need to do, and how important
speed is. On-the-fly time series calculations are virtually impossible
with row-oriented databases, for example, whereas another kind of
database could yield a result in under a second having considered a
billion records.
Lawrence
OK, saying 80,000 records is pretty small was dumb. I merely meant to
say an RDBMS should be able to work with that number without any great
problem.
Cheers,
Anthony
On Thu, May 13, 2010 at 2:20 PM, Anthony Richardson wrote:
80,000 records is pretty small. Assuming your calculations/processing
can be expressed in the native function language of your database, you
should be able to get the database to do this fast. This is the bread
and butter of databases.
Using Ruby to manually process 80,000 objects within a request is
almost certainly a bad idea.
Cheers,
Anthony Richardson
On Thu, May 13, 2010 at 2:13 PM, Sean Seefried wrote:
Hi all,
I've got a question that I hope generates healthy debate and perhaps
even a solution for me. Without going into too much detail, I'm
working on a project in which we perform calculations on a large
hierarchical data set. We haven't used the acts_as_tree or
acts_as_nested plugins because each level in the hierarchy has a
well-defined role and various attributes that only fit at that level
in the hierarchy. To give you a brief taste, the hierarchy roughly
goes: local government area, precinct, building type, consumption.
For a given local government area the number of records contained in
the entire hierarchy is about 80000, the bulk of them being
consumption records (since they are the leaves of the hierarchy). A
feature of the project is that one should be able to perform a
projection of consumptions into the future. This is a fairly complex
algorithm and involves looking at all 80000 records and combining them
in various ways.
This algorithm, naively written, has a huge database latency. The bulk
of the time is spent querying and receiving results from the database.
We have had some success in optimising various parts of the algorithm
by performing fewer queries. A lot of the time this means that we pull
the records out of the database and put them into some kind of look-up
structure (a hash with a key equal to the attributes (plural) of
interest in the model). This allows us to do a kind of "in memory"
query but, annoyingly, only on whatever we choose as the key for the
hash. You can no longer perform general queries on the collection in
memory. Basically we lose all the expressiveness/terseness of
ActiveRecord.
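
One way around the single-key limitation described above is to build a
hash index lazily for each combination of attributes that gets queried.
The following is only a minimal sketch under stated assumptions: the
`Consumption` struct, its fields, and the `MemoryCollection` class are
hypothetical names made up for illustration, not part of the project or
of ActiveRecord.

```ruby
# Hypothetical record type standing in for an ActiveRecord model.
Consumption = Struct.new(:id, :precinct_id, :building_type, :year, :amount)

# Hypothetical helper: an in-memory collection that can be queried by any
# combination of attributes. A hash index is built the first time a given
# combination is used and reused for later queries on the same attributes.
class MemoryCollection
  def initialize(records)
    @records = records
    @indexes = {}
  end

  # conditions is a hash such as { precinct_id: 10, year: 2010 }.
  # Returns the records whose attributes equal all the given values.
  def where(conditions)
    return @records.dup if conditions.empty?

    attrs = conditions.keys.sort
    index = (@indexes[attrs] ||= @records.group_by { |r| attrs.map { |a| r[a] } })
    index.fetch(attrs.map { |a| conditions[a] }, []).dup
  end
end

records = [
  Consumption.new(1, 10, "residential", 2010, 120.0),
  Consumption.new(2, 10, "commercial", 2010, 300.0),
  Consumption.new(3, 11, "residential", 2010, 80.0)
]
collection = MemoryCollection.new(records)
by_precinct = collection.where(precinct_id: 10)
by_type_and_year = collection.where(building_type: "residential", year: 2010)
```

Note that this only handles equality conditions, and an index goes
stale if an attribute it was built on is changed afterwards, so it
suits attributes that stay fixed during the calculation.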
What we really want to be able to do is this:
1. We want to pull a large collection of objects from the database
into memory.
2. We want to be able to select subsets of these in-memory objects
with a similar flexibility to querying them using ActiveRecord.
3. After having updated them in memory we want to be able to write
them back to the database. I should mention that none of their unique
keys will have changed. We want this to be some kind of bulk update.
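
The three steps above can be sketched in plain Ruby as a small working
set that selects with arbitrary predicates, remembers which records were
changed, and hands only those back in batches. This is a rough sketch,
not a definitive implementation: `Reading`, `WorkingSet`, and the batch
size are hypothetical, and the block passed to `flush` is merely the
place where a real bulk update would go.

```ruby
# Hypothetical record type; in the real application this would be a model.
Reading = Struct.new(:id, :year, :amount)

# Hypothetical wrapper that keeps all records in memory and remembers
# which ones were changed, so only those need to be written back.
class WorkingSet
  # Step 1: take the records already loaded from the database.
  def initialize(records)
    @by_id = records.each_with_object({}) { |r, h| h[r.id] = r }
    @dirty = {}
  end

  # Step 2: select a subset with an arbitrary Ruby predicate.
  def select(&predicate)
    @by_id.values.select(&predicate)
  end

  # Apply a change to a record and mark it as needing a write.
  def update(record)
    yield record
    @dirty[record.id] = true
    record
  end

  # Step 3: pass the changed records to the given block in batches,
  # then forget them. The block is where the bulk write would happen.
  def flush(batch_size: 1000)
    @dirty.keys.each_slice(batch_size) do |ids|
      yield ids.map { |id| @by_id.fetch(id) }
    end
    @dirty.clear
  end
end

set = WorkingSet.new([Reading.new(1, 2010, 10.0), Reading.new(2, 2011, 20.0)])
set.select { |r| r.year == 2010 }.each do |r|
  set.update(r) { |x| x.amount *= 2 }
end
set.flush { |batch| puts "would write #{batch.length} record(s)" }
```

Because the unique keys never change, keying the working set on `id`
is enough to match each changed record to its row when writing back.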
Some approaches that I've thought of but I don't think work:
- Simply caching will not work because, as far as I know, this only
works when you perform the same query twice. We are not doing this.
We're querying the collection in many different ways.
- This isn't really an issue to do with the kind of database.
Switching over to CouchDB or an Object Database doesn't obviously
solve our problem. The problem is that although we know in advance
that all our queries will return results that are a subset of the
collection of 80000 objects, we want to be able to perform many
different sorts of queries returning many different subsets of the
80000 objects. Having the objects all sitting in memory really seems
to be the way to go.
- We could also just create a class hierarchy that mirrors the
hierarchy of the data and forget about ActiveRecord entirely. We could
then just serialize this structure and write it to disk or a database.
We don't get the advantage of being able to query the data structure
this way though.
Some final notes:
a) This site will, at most, have a few simultaneous clients so having
all 80000 records in memory should not be a problem.
b) We've had some luck with the bulk-update part of point 3 above.
Zach Dennis' AR-extensions plug-in has been quite useful.
This email is still not as clear as I wanted it to be even though I've
spent some time on it. If you need any clarification please feel free
to ask.
Cheers,
Sean
--
You received this message because you are subscribed to the
Google Groups "Ruby or Rails Oceania" group.
To post to this group, send email to
[email protected]
<mailto:[email protected]>.
To unsubscribe from this group, send email to
[email protected]
<mailto:rails-oceania%[email protected]>.
For more options, visit this group at
http://groups.google.com/group/rails-oceania?hl=en.