I have to say Chris,
  I probably wouldn't optimize this that far.  Iterating through the CSV really 
won't take that much time, especially if you are doing it as a background job.

Rob

On Jan 18, 2012, at 0:35 , Guyren Howe wrote:

> On Jan 18, 2012, at 12:11 AM, Chris McCann wrote:
> 
>> SD Ruby,
>> 
>> What's the most efficient way to update a large batch of ActiveRecord
>> model instances where the same fields are not being updated for each
>> model?
>> 
>> Here's the setup:
>> 
>> I've got a database of members (about 13,000) and a CSV file that has
>> contact information (3 phone numbers and 3 email addresses) for a
>> subset of the members (say, 350).  Each row of the CSV file may have
>> between 0 and 6 of the contact fields filled in.
>> 
>> There also might be data in those same fields in each database entry
>> for the same member.  What I'm trying to do is update any field in the
>> database that's empty with the corresponding attribute in the CSV file
>> for that member that isn't blank.
>> 
>> The brute-force approach is to just loop through all 350 members in
>> the CSV file (which contains the primary key of the associated
>> database table) and update the attributes with the non-null CSV data.
>> Each SQL update statement will be different since there are 6 fields
>> that could potentially be updated.  This seems painfully inefficient
>> to me.
>> 
>> An ActiveRecord udpate_all call won't work here since each update
>> statement has to be uniquely constructed.
>> 
>> Can anyone suggest a better approach?
> 
> Off the top of my head: insert all the records from the CSV file into a 
> temporary table with fields matching the target table, and nulls where you 
> don’t have values. If you’re really wanting to do that part fast, use COPY:
> 
> http://www.postgresql.org/docs/9.1/interactive/sql-copy.html
> 
> (The inferior pseudo-database MySQL has something similar).
> 
> Then you can do something like:
> 
> UPDATE
>       contact_info
> SET
>       field1 = COALESCE(temp.field1, orig.field1),
>       field2 = COALESCE(temp.field2, orig.field2),
>       …
> FROM
>       contact_info orig JOIN
>       temp_contact_info temp ON (orig.id = temp.id)
> 
> Not tried it. That might not be quite the right syntax, but something like 
> that should work.
> 
> If you’re doing it often, have the temporary table be permanent and just 
> TRUNCATE it before you start.
> 
> Combine COPY with the above SQL (and maybe see if some indexes help, although 
> I don’t think they will help), and this will happen like greased lightning.
> 
> Note that you can easily turn all of this into a stored procedure, and since 
> all you’re doing is executing a few database commands, that probably makes 
> sense.
> 
> -- 
> SD Ruby mailing list
> [email protected]
> http://groups.google.com/group/sdruby

-- 
SD Ruby mailing list
[email protected]
http://groups.google.com/group/sdruby

Reply via email to