I have to say Chris, I probably wouldn't optimize this that far. Iterating through the CSV really won't take that much time, especially if you are doing it as a background job.
Rob On Jan 18, 2012, at 0:35 , Guyren Howe wrote: > On Jan 18, 2012, at 12:11 AM, Chris McCann wrote: > >> SD Ruby, >> >> What's the most efficient way to update a large batch of ActiveRecord >> model instances where the same fields are not being updated for each >> model? >> >> Here's the setup: >> >> I've got a database of members (about 13,000) and a CSV file that has >> contact information (3 phone numbers and 3 email addresses) for a >> subset of the members (say, 350). Each row of the CSV file may have >> between 0 and 6 of the contact fields filled in. >> >> There also might be data in those same fields in each database entry >> for the same member. What I'm trying to do is update any field in the >> database that's empty with the corresponding attribute in the CSV file >> for that member that isn't blank. >> >> The brute-force approach is to just loop through all 350 members in >> the CSV file (which contains the primary key of the associated >> database table) and update the attributes with the non-null CSV data. >> Each SQL update statement will be different since there are 6 fields >> that could potentially be updated. This seems painfully inefficient >> to me. >> >> An ActiveRecord udpate_all call won't work here since each update >> statement has to be uniquely constructed. >> >> Can anyone suggest a better approach? > > Off the top of my head: insert all the records from the CSV file into a > temporary table with fields matching the target table, and nulls where you > don’t have values. If you’re really wanting to do that part fast, use COPY: > > http://www.postgresql.org/docs/9.1/interactive/sql-copy.html > > (The inferior pseudo-database MySQL has something similar). > > Then you can do something like: > > UPDATE > contact_info > SET > field1 = COALESCE(temp.field1, orig.field1), > field2 = COALESCE(temp.field2, orig.field2), > … > FROM > contact_info orig JOIN > temp_contact_info temp ON (orig.id = temp.id) > > Not tried it. That might not be quite the right syntax, but something like > that should work. > > If you’re doing it often, have the temporary table be permanent and just > TRUNCATE it before you start. > > Combine COPY with the above SQL (and maybe see if some indexes help, although > I don’t think they will help), and this will happen like greased lightning. > > Note that you can easily turn all of this into a stored procedure, and since > all you’re doing is executing a few database commands, that probably makes > sense. > > -- > SD Ruby mailing list > [email protected] > http://groups.google.com/group/sdruby -- SD Ruby mailing list [email protected] http://groups.google.com/group/sdruby
