Thanks, Jeremy!

No, we are not using the same database object. Each source or destination 
in these ETL jobs is its own database object.
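
For reference, each job builds its connections roughly like this (the hosts, 
credentials, and :timeout value below are placeholders, not our real 
configuration):

require 'sequel'

# Source and destination live on different servers, so each gets its own
# Sequel::Database object. The :timeout shown here is illustrative; TinyTDS
# defaults to 5 seconds.
@source_database = Sequel.connect(
  adapter:  'tinytds',
  host:     'mart.example.com',
  database: 'mart',
  user:     'etl',
  password: 'secret',
  timeout:  300
)

@destination_database = Sequel.connect(
  adapter:  'tinytds',
  host:     'warehouse.example.com',
  database: 'warehouse',
  user:     'etl',
  password: 'secret',
  timeout:  300
)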

The backtrace is below, along with a quick, contrived example that I hope 
better demonstrates what we're doing. If that doesn't help, I can work on 
putting together a complete, runnable example. Thanks again for your help!

Backtrace:

/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:205:in `fields'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:205:in `block in fetch_rows'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:65:in `block in execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/connection_pool/threaded.rb:88:in `hold'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/connecting.rb:270:in `synchronize'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:34:in `execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:1087:in `execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:202:in `fetch_rows'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:152:in `each'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:590:in `block in paged_each'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:245:in `_transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:220:in `block in transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/connecting.rb:270:in `synchronize'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:186:in `transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:566:in `paged_each'


Reads and writes:

class Reader
  class << self
    # Returns an Enumerator over the source rows; calling paged_each
    # without a block enumerates lazily, fetching rows in pages.
    def read(dataset)
      dataset.paged_each
    end
  end
end

class Writer
  class << self
    # Takes the destination dataset and a block that returns the rows to
    # load, inserting them in batches of 1000 via multi_insert.
    def write(dataset, &block)
      block.call.each_slice(1000) { |batch| dataset.multi_insert(batch) }
    end
  end
end

source_dataset      = @source_database[@table_name]
destination_dataset = @destination_database[@table_name]

Writer.write(destination_dataset) { Reader.read(source_dataset) }

On Wednesday, May 8, 2019 at 3:01:40 PM UTC-7, Jeremy Evans wrote:
>
> On Wednesday, May 8, 2019 at 1:46:02 PM UTC-7, Jonathan Simpson wrote:
>>
>> Hello!
>>
>> I am using Sequel for a plethora of ETL jobs at work. We've had a lot of 
>> success with PostgreSQL, MySQL, and MariaDB databases so far. We're slowly 
>> rolling all of our existing data engineering infrastructure into this 
>> project and recently moved on to our SQL Server databases. That had also 
>> been going well, until today when we hit a snag.
>>
>> We're using Sequel, TinyTDS, and Datasets for this work. What we're doing 
>> is piping Dataset#paged_each into Dataset#multi_insert, passing blocks 
>> between a read and a write method on our Extract and Load handlers, by 
>> running Array#each_slice(1000) on the rows we get back from 
>> Dataset#paged_each. This has served us _very_ well in every other aspect of 
>> this application. Our ETLs are fast and have a ludicrously small memory 
>> footprint for the amount of data we are moving around (the background job 
>> processor stays steady at ~150MB despite moving GB of data on a daily 
>> basis).
>>
>> This is also serving us well with SQL Server tables that are lightweight. 
>> Where we hit the snag is a denormalized table that is ~80 columns wide with 
>> just under 1.5 million rows that we want to move between a data mart and a 
>> warehouse.
>>
>> We're seeing TinyTDS connection timeouts from the read call that we send 
>> to Dataset#paged_each. I've tried debugging around a bit but I'm not making 
>> much progress. Any help or thoughts are appreciated!
>>
>
> Looking at the TinyTDS documentation, you may want to set a :timeout 
> option if you aren't already, as the default is fairly low (5 seconds).  
> Beyond that, I can't really help without seeing a backtrace and an example 
> of what you are doing.
>
> I doubt it is related, but are you using the same Sequel::Database object 
> for both the paged_each and the multi_insert?  In general you don't want to 
> do that unless you are using a separate thread or shard, as Sequel in 
> general does not support running queries inside Dataset#each (and 
> #paged_each calls #each).  Whether it actually works depends on how the 
> adapter/driver is implemented, but in any case it is not supported.
>
> Thanks,
> Jeremy
>
