> 1. We want to pull a large collection of objects from the database into memory

I guess you could put the database completely in memory.

Alternatively, export just the 80,000 rows you're interested in and load them into, e.g., an in-memory sqlite3 database. You then have all the expressiveness of SQL via AR / Arel, if that's what you want.


> This isn't really an issue to do with the kind of database.

That depends on the type of calculations you need to do, and on how important speed is. On-the-fly time-series calculations, for example, are virtually impossible with row-oriented databases, whereas another kind of database could yield a result in under a second after considering a billion records.



Lawrence

OK, saying 80,000 records is pretty small was dumb. I merely meant to say that an RDBMS should be able to work with that number without any great problem.

Cheers,

Anthony

On Thu, May 13, 2010 at 2:20 PM, Anthony Richardson <[email protected]> wrote:

    80,000 records is pretty small. Assuming your
    calculations/processing can be expressed in the native function
    language of your database, you should be able to get the database
    to do this fast. This is the bread and butter of databases.
    Using Ruby to manually process 80,000 objects within a request is
    almost certainly a bad idea.

    Cheers,

    Anthony Richardson



    On Thu, May 13, 2010 at 2:13 PM, Sean Seefried
    <[email protected]> wrote:

        Hi all,

        I've got a question that I hope generates healthy debate and
        perhaps even a solution for me. Without going into too much
        detail, I'm working on a project in which we perform calculations
        on a large hierarchical data set. We haven't used the acts_as_tree
        or acts_as_nested_set plugins because each level in the hierarchy
        has a well-defined role and various attributes that only fit at
        that level. To give you a brief taste, the hierarchy roughly goes:
        local government area, precinct, building type, consumption.

        For a given local government area, the number of records
        contained in the entire hierarchy is about 80,000, the bulk of
        them being consumption records (since they are the leaves of the
        hierarchy). A feature of the project is that one should be able
        to perform a projection of consumptions into the future. This is
        a fairly complex algorithm and involves looking at all 80,000
        records and combining them in various ways.

        This algorithm, naively written, suffers from huge database
        latency: the bulk of the time is spent issuing queries to, and
        receiving results from, the database.

        We have had some success in optimising various parts of the
        algorithm by performing fewer queries. A lot of the time this
        means that we pull the records out of the database and put them
        into some kind of look-up structure (a hash keyed on the
        attributes (plural) of interest in the model). This allows us to
        do a kind of "in-memory" query but, annoyingly, only on whatever
        we choose as the key for the hash. You can no longer perform
        general queries on the collection in memory. Basically, we lose
        all the expressiveness/terseness of ActiveRecord.
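
The look-up structure described above might be sketched roughly as follows in plain Ruby. This is only an illustration under stated assumptions: the `Consumption` struct, its attribute names, and the sample values are hypothetical stand-ins, not the project's real models or data.

```ruby
# Hypothetical stand-in for an ActiveRecord model; the attribute names are
# assumptions made for this sketch, not the project's real schema.
Consumption = Struct.new(:id, :precinct, :building_type, :year, :kwh)

records = [
  Consumption.new(1, "north", "office", 2009, 120.0),
  Consumption.new(2, "north", "office", 2010, 130.0),
  Consumption.new(3, "north", "retail", 2010, 80.0),
  Consumption.new(4, "south", "office", 2010, 95.0)
]

# Build the look-up structure once: a hash keyed on several attributes.
by_precinct_and_type = records.group_by { |r| [r.precinct, r.building_type] }

# A "query" that matches the key is a single hash access.
north_offices = by_precinct_and_type.fetch(["north", "office"], [])

# A query on any other attribute cannot use this hash; it needs a full scan
# of the collection (or a second hash built specifically for that attribute).
from_2010 = records.select { |r| r.year == 2010 }

# Both collections hold references to the same objects, so an in-memory
# update made through one is visible through the other.
north_offices.each { |r| r.kwh += 10.0 }
```

Each additional query shape needs either its own hash or a linear scan, which seems to be the loss of expressiveness the paragraph above complains about.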

        What we really want to be able to do is this:

        1. We want to pull a large collection of objects from the
           database into memory.
        2. We want to be able to select subsets of these in-memory
           objects with a flexibility similar to querying them using
           ActiveRecord.
        3. After having updated them in memory, we want to be able to
           write them back to the database. I should mention that none of
           their unique keys will have changed. We want this to be some
           kind of bulk update.

        Some approaches that I've thought of but don't think will work:
        - Simply caching will not work because, as far as I know, this
          only works when you perform the same query twice. We are not
          doing this. We're querying the collection in many different
          ways.
        - This isn't really an issue to do with the kind of database.
          Switching over to CouchDB or an object database doesn't
          obviously solve our problem. The problem is that, although we
          know in advance that all our queries will return results that
          are a subset of the collection of 80,000 objects, we want to be
          able to perform many different sorts of queries returning many
          different subsets of those objects. Having the objects all
          sitting in memory really seems to be the way to go.
        - We could also just create a class hierarchy that mirrors the
          hierarchy of the data and forget about ActiveRecord entirely. We
          could then just serialize this structure and write it to disk or
          a database. We don't get the advantage of being able to query
          the data structure this way, though.

        Some final notes:
        a) This site will, at most, have a few simultaneous clients, so
           having all 80,000 records in memory should not be a problem.
        b) We've had some luck with the bulk-update part of point 3 above.
           Zach Dennis' ar-extensions plugin has been quite useful.

        This email is still not as clear as I wanted it to be even though
        I've spent some time on it. If you need any clarification, please
        feel free to ask.

        Cheers,

        Sean

        --
        You received this message because you are subscribed to the
        Google Groups "Ruby or Rails Oceania" group.
        To post to this group, send email to
        [email protected]
        <mailto:[email protected]>.
        To unsubscribe from this group, send email to
        [email protected]
        <mailto:rails-oceania%[email protected]>.
        For more options, visit this group at
        http://groups.google.com/group/rails-oceania?hl=en.


