On Jan 18, 2012, at 12:11 AM, Chris McCann wrote:

> SD Ruby,
> 
> What's the most efficient way to update a large batch of ActiveRecord
> model instances where the same fields are not being updated for each
> model?
> 
> Here's the setup:
> 
> I've got a database of members (about 13,000) and a CSV file that has
> contact information (3 phone numbers and 3 email addresses) for a
> subset of the members (say, 350).  Each row of the CSV file may have
> between 0 and 6 of the contact fields filled in.
> 
> There also might be data in those same fields in each database entry
> for the same member.  What I'm trying to do is update any field in the
> database that's empty with the corresponding attribute in the CSV file
> for that member that isn't blank.
> 
> The brute-force approach is to just loop through all 350 members in
> the CSV file (which contains the primary key of the associated
> database table) and update the attributes with the non-null CSV data.
> Each SQL update statement will be different since there are 6 fields
> that could potentially be updated.  This seems painfully inefficient
> to me.
> 
> An ActiveRecord udpate_all call won't work here since each update
> statement has to be uniquely constructed.
> 
> Can anyone suggest a better approach?

Off the top of my head: insert all the records from the CSV file into a 
temporary table with fields matching the target table, and nulls where you 
don’t have values. If you’re really wanting to do that part fast, use COPY:

http://www.postgresql.org/docs/9.1/interactive/sql-copy.html

(The inferior pseudo-database MySQL has something similar).

Then you can do something like:

UPDATE
        contact_info
SET
        field1 = COALESCE(temp.field1, orig.field1),
        field2 = COALESCE(temp.field2, orig.field2),
        …
FROM
        contact_info orig JOIN
        temp_contact_info temp ON (orig.id = temp.id)

Not tried it. That might not be quite the right syntax, but something like that 
should work.

If you’re doing it often, have the temporary table be permanent and just 
TRUNCATE it before you start.

Combine COPY with the above SQL (and maybe see if some indexes help, although I 
don’t think they will help), and this will happen like greased lightning.

Note that you can easily turn all of this into a stored procedure, and since 
all you’re doing is executing a few database commands, that probably makes 
sense.

-- 
SD Ruby mailing list
[email protected]
http://groups.google.com/group/sdruby

Reply via email to