Thanks, Jeremy!
No, we are not using the same database object. Each source or destination
in these ETL jobs is its own database object.
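For reference, each side is connected roughly like this (the hosts, names,
and credentials are placeholders, and the :timeout value follows your
suggestion below about raising TinyTDS's 5-second default):

require 'sequel'

# Placeholder hosts/credentials; each side gets its own Sequel::Database.
source_db = Sequel.connect(
  adapter:  :tinytds,
  host:     'mart.example.com',
  database: 'mart',
  user:     ENV['MART_USER'],
  password: ENV['MART_PASSWORD'],
  timeout:  300 # raised from TinyTDS's 5-second default
)

destination_db = Sequel.connect(
  adapter:  :tinytds,
  host:     'warehouse.example.com',
  database: 'warehouse',
  user:     ENV['WAREHOUSE_USER'],
  password: ENV['WAREHOUSE_PASSWORD'],
  timeout:  300
)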
Below are the backtrace and a quick, contrived example that I hope
demonstrates more specifically what we're doing. If that doesn't help, I
can work on putting together a full, runnable example.
Thanks again for your help!
Backtrace:
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:205:in `fields'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:205:in `block in fetch_rows'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:65:in `block in execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/connection_pool/threaded.rb:88:in `hold'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/connecting.rb:270:in `synchronize'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:34:in `execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:1087:in `execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:202:in `fetch_rows'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:152:in `each'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:590:in `block in paged_each'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:245:in `_transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:220:in `block in transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/connecting.rb:270:in `synchronize'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:186:in `transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:566:in `paged_each'
Reads and writes:
class Reader
  class << self
    # With no block, Dataset#paged_each returns a lazy Enumerator over the
    # source rows, fetched page by page.
    def read(dataset)
      dataset.paged_each
    end
  end
end

class Writer
  class << self
    # Batches the rows yielded by the block and bulk-inserts each batch
    # into the destination.
    def write(dataset, &block)
      block.call.each_slice(1000) { |batch| dataset.multi_insert(batch) }
    end
  end
end

source      = @source_database[@table_name]
destination = @destination_database[@table_name]
Writer.write(destination) { Reader.read(source) }
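And for what it's worth, on your point about separate threads: if we ever
did need to run both sides against one Sequel::Database object, I imagine
we'd decouple the read from the write along these lines (a rough, untested
sketch; transfer and the queue size are made up for illustration):

# A reader thread feeds batches to the writer through a bounded queue,
# so multi_insert never runs inside the paged_each iteration and the two
# sides check out separate connections from the pool.
def transfer(source_dataset, destination_dataset, batch_size: 1000)
  queue = SizedQueue.new(4) # bounded, to keep the memory footprint flat

  reader = Thread.new do
    source_dataset.paged_each.each_slice(batch_size) { |batch| queue << batch }
    queue << :done # sentinel so the writer knows the read is finished
  end

  while (batch = queue.pop) != :done
    destination_dataset.multi_insert(batch)
  end

  reader.join
end

Even with separate Database objects, as we have, this shape would keep the
read connection from sitting idle mid-result-set while each batch inserts.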
On Wednesday, May 8, 2019 at 3:01:40 PM UTC-7, Jeremy Evans wrote:
>
> On Wednesday, May 8, 2019 at 1:46:02 PM UTC-7, Jonathan Simpson wrote:
>>
>> Hello!
>>
>> I am using Sequel for a plethora of ETL jobs at work. We've had a lot of
>> success with PostgreSQL, MySQL, and MariaDB databases so far. We're slowly
>> rolling all of our existing data engineering infrastructure into this
>> project and recently moved on to our SQL Server databases. That had also
>> been going well until today, when we hit a snag.
>>
>> We're using Sequel, TinyTDS, and Datasets for this work. We pipe
>> Dataset#paged_each into Dataset#multi_insert, passing blocks between the
>> read and write methods on our Extract and Load handlers and calling
>> Array#each_slice(1000) on the rows we get back from Dataset#paged_each.
>> This has served us _very_ well in every other aspect of the application.
>> Our ETLs are fast and have a ludicrously small memory footprint for the
>> amount of data we move around (the background job processor stays steady
>> at ~150MB despite moving gigabytes of data on a daily basis).
>>
>> This is also serving us well with lightweight SQL Server tables. Where we
>> hit the snag is a denormalized table, ~80 columns wide with just under
>> 1.5 million rows, that we want to move between a data mart and a
>> warehouse.
>>
>> We're seeing TinyTDS connection timeouts from the read call that we send
>> to Dataset#paged_each. I've tried debugging around a bit but I'm not making
>> much progress. Any help or thoughts are appreciated!
>>
>
> Looking at the TinyTDS documentation, you may want to set a :timeout
> option if you aren't already, as the default is fairly low (5 seconds).
> Beyond that, I can't really help without seeing a backtrace and an example
> of what you are doing.
>
> I doubt it is related, but are you using the same Sequel::Database object
> for both the paged_each and the multi_insert? In general you don't want
> to do that unless you are using a separate thread or shard, as Sequel
> does not support running queries inside Dataset#each (and #paged_each
> calls #each). Whether it actually works depends on how the adapter/driver
> is implemented, but in any case it is not supported.
>
> Thanks,
> Jeremy
>