On Wednesday, May 8, 2019 at 1:46:02 PM UTC-7, Jonathan Simpson wrote:
>
> Hello!
>
> I am using Sequel for a plethora of ETL jobs at work. We've had a lot of 
> success with PostgreSQL, MySQL, and MariaDB databases so far. We're slowly 
> rolling all of our existing data engineering infrastructure into this 
> project and are moving on to the SQL Server databases recently. That has 
> been going well, also, until today when we hit a snag.
>
> We're using Sequel, TinyTDS, and Datasets for this work. What we're doing 
> is piping Dataset#paged_each into Dataset#multi_insert, passing blocks 
> between a read and a write method on our Extract and Load handlers, by 
> running Array#each_slice(1000) on the rows we get back from 
> Dataset#paged_each. This has served us _very_ well in every other aspect of 
> this application. Our ETLs are fast and have a ludicrously small memory 
> footprint for the amount of data we are moving around (the background job 
> processor stays steady at ~150MB despite moving GB of data on a daily 
> basis).
>
> This is also serving us well with SQL Server tables that are lightweight. 
> Where we hit the snag is a denormalized table that is ~80 columns wide with 
> just under 1.5 million row that we want to move between a data mart and a 
> warehouse.
>
> We're seeing TinyTDS connection timeouts from the read call that we send 
> to Dataset#paged_each. I've tried debugging around a bit but I'm not making 
> much progress. Any help or thoughts are appreciated!
>

Looking at the TinyTDS documentation, you may want to set a :timeout option 
if you aren't already, as the default is fairly low (5 seconds).  Beyond 
that, I can't really help without seeing a backtrace and an example of what 
you are doing.

I doubt it is related, but are you using the same Sequel::Database object 
for both the paged_each and the multi_insert?  In general you don't want to 
do that unless you are using a separate thread or shard, as Sequel in 
general does not suppose running queries inside Dataset#each (and 
#paged_each calls #each).  Whether it actually works depends on how the 
adapter/driver is implemented, but in any case it is not supported.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sequel-talk/ead8c1ca-d7f4-4cd7-adea-f711a01658e3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to