Hello! I am using Sequel for a plethora of ETL jobs at work. We've had a lot of success with PostgreSQL, MySQL, and MariaDB databases so far. We're slowly rolling all of our existing data engineering infrastructure into this project and have recently moved on to our SQL Server databases. That had been going well too, until today, when we hit a snag.
We're using Sequel, TinyTDS, and Datasets for this work. We pipe Dataset#paged_each into Dataset#multi_insert, passing blocks between a read method on our Extract handler and a write method on our Load handler, and batching with Array#each_slice(1000) on the rows we get back from Dataset#paged_each. This has served us _very_ well in every other aspect of this application. Our ETLs are fast and have a ludicrously small memory footprint for the amount of data we are moving around (the background job processor stays steady at ~150MB despite moving gigabytes of data on a daily basis). It is also serving us well with lightweight SQL Server tables.

Where we hit the snag is a denormalized table, ~80 columns wide with just under 1.5 million rows, that we want to move from a data mart to a warehouse. We're seeing TinyTDS connection timeouts from the read call that we send to Dataset#paged_each. I've tried debugging around a bit but I'm not making much progress. Any help or thoughts are appreciated! Thanks!
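For reference, a minimal sketch of the pipeline described above. The connection settings, the :wide_table name, and the :id ordering column are hypothetical stand-ins, and it assumes a recent Sequel where paged_each returns an Enumerator when called without a block:

```ruby
require 'sequel'

# Hypothetical connections; real settings differ.
mart      = Sequel.connect(adapter: 'tinytds', host: 'mart-host',
                           database: 'mart', user: 'etl', password: 'secret')
warehouse = Sequel.connect(adapter: 'tinytds', host: 'warehouse-host',
                           database: 'warehouse', user: 'etl', password: 'secret')

# paged_each requires an ordered dataset; :id stands in for the real key.
# Without a block it returns an Enumerator, so each_slice can batch the rows.
mart[:wide_table].order(:id).paged_each(rows_per_fetch: 1000).each_slice(1000) do |rows|
  warehouse[:wide_table].multi_insert(rows) # one bulk INSERT per 1000-row slice
end
```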
