On Friday, January 4, 2013 7:48:43 PM UTC-8, Brandt Lofton wrote:
> Alright, so I'm new to Sequel and I'm enjoying it quite a bit after getting 
> over having to make a connection before defining a model class.
> 
> 
> So here is the logic I need to use.
> 
> 
> Collect x rows of data (x = ~30k usually), then loop through these 
> records in batches of, say, ~500:
> - Model.import those 500 records
> - Import fail? Move the bad 500 to a low-priority queue so I can loop 
>   through them one by one
> - Do another 500
> I know what you're asking: why do you have to do this? Because I didn't 
> design the part the data comes from, or the database it ends up in.
> I cannot "detect" this bad data beforehand because it violates duplicate 
> key constraints on the db side, so I don't know it's bad until the 
> database tells me so.
> 
> 
> So anyways, I have been playing with the options Sequel has available: 
> DB.transaction, Model.import(:slice => 1000), etc., trying to figure out 
> what's going to do the trick.
> 
> 
> But for once I'm going to take a step back and try some advice first.  
> Anybody more experienced in this regard have any best practice advice for me 
> before I go much further? I'm over my wtf quota for today...

Your approach sounds reasonable.  If you are using MySQL, you might want to use 
insert_ignore, which will skip duplicate records automatically.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sequel-talk/-/3a8PV5uXumQJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sequel-talk?hl=en.