Brandt,

If those records have a unique ID attached to them, one way to use
Sequel and the DB's constraints is to insert as many as you can in
bulk, then use the IDs to locate the failed rows: select all the IDs
back from the server and compare them against the records you have,
especially if you can delimit by date ranges (in the case of very
large datasets).
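The core of that idea is just a set difference on IDs. Here's a minimal sketch in plain Ruby; the sample ID arrays are stand-ins for what you'd actually pull via Sequel (e.g. something like `DB[:records].select_map(:id)` against a hypothetical `records` table):

```ruby
require 'set'

# IDs of the records we attempted to bulk-insert -- in practice these
# come from the source data, e.g. batch.map { |r| r[:id] }.
local_ids = [101, 102, 103, 104, 105]

# IDs that actually made it into the warehouse -- in practice selected
# back from the server, ideally delimited by a date range.
server_ids = [101, 103, 105]

# The set difference is the rows the bulk insert dropped (duplicates,
# constraint violations, etc.); re-handle just these individually.
failed_ids = local_ids.to_set - server_ids.to_set
failed_ids.sort  # => [102, 104]
```

Only the small failed set needs row-by-row handling, which is what makes this cheap even on large batches.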

I actually used to use a combination of approaches to maintain a data
warehouse with Sequel. I had an export by date range that would clear
each day's records out and then (re)load them from the external
database source, and I also had a "diff" approach that scanned a day's
records to see if anything had changed in that day's dataset since the
export. Our system allowed back-dating records (which I thought was
bad design, but I had no control over it), so even though we exported
a day at a time every night, I still needed to reconcile periodically
to ensure all back-dated records made it into the warehouse database,
without exporting a few hundred thousand records for each day that had
only a smattering of changes. Fortunately, comparing lists of IDs was
a very fast solution: it could reconcile a couple hundred thousand
records in a few seconds, vs. 20 minutes to export the whole day.
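That reconciliation loop can be sketched like this. The per-day ID hashes below are placeholders; in practice each would be a fast `SELECT id ... WHERE day = ?` against the source system and the warehouse (the dates, table shape, and back-dated row are all made up for illustration):

```ruby
require 'set'
require 'date'

# Hypothetical per-day ID lists from the source system.
source_by_day = {
  Date.new(2011, 5, 1) => [1, 2, 3],
  Date.new(2011, 5, 2) => [4, 5]
}

# The same days as currently loaded in the warehouse.
warehouse_by_day = {
  Date.new(2011, 5, 1) => [1, 2],  # record 3 was back-dated in after the export
  Date.new(2011, 5, 2) => [4, 5]
}

# Days whose ID sets differ are the only ones worth re-exporting.
stale_days = source_by_day.keys.select do |day|
  source_by_day[day].to_set != warehouse_by_day.fetch(day, []).to_set
end
```

Only `stale_days` gets the expensive clear-and-reload treatment; days that match are skipped entirely.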

Another approach to dealing with large datasets over VPN is to build a
SQL script file with insert statements for each record, gzip it,
transfer it to the server, and then unzip it and load it with Oracle's
command-line tools (e.g. SQL*Plus). This works well if the database
constraints keep out the duplicates and you have shell access to the
Oracle server (or a machine in near proximity to it).
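Generating and compressing that script is straightforward with Ruby's stdlib. This is just a sketch: the table name and columns are hypothetical, and the string interpolation below does no real escaping (for a real export you'd want properly quoted SQL, e.g. via Sequel's dataset `insert_sql`):

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical records to ship to the Oracle box.
records = [
  { id: 1, name: "alpha" },
  { id: 2, name: "beta" }
]

path = File.join(Dir.tmpdir, 'load.sql.gz')

# Write one INSERT per record and gzip it in a single pass,
# so the full uncompressed script never has to sit on disk.
Zlib::GzipWriter.open(path) do |gz|
  records.each do |r|
    gz.puts "INSERT INTO records (id, name) VALUES (#{r[:id]}, '#{r[:name]}');"
  end
end

# Then transfer and load on the server side, roughly:
#   scp load.sql.gz oracle-host:
#   ssh oracle-host 'gunzip load.sql.gz && sqlplus user/pass @load.sql'
```

Insert-heavy SQL text compresses extremely well, so the gzip step is usually what makes this practical over a slow VPN link.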

Michael
-- 
http://codeconnoisseur.org

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sequel-talk?hl=en.