Thanks, Jeremy!
No, we are not using the same database object. Each source or destination
in these ETL jobs is its own database object.
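For reference, each side is connected roughly like this (the hosts, names,
and credentials are placeholders, and the :timeout value follows your
suggestion below about raising TinyTDS's 5-second default):

require 'sequel'

# Placeholder hosts/credentials; each side gets its own Sequel::Database.
source_db = Sequel.connect(
  adapter:  :tinytds,
  host:     'mart.example.com',
  database: 'mart',
  user:     ENV['MART_USER'],
  password: ENV['MART_PASSWORD'],
  timeout:  300 # raised from TinyTDS's 5-second default
)

destination_db = Sequel.connect(
  adapter:  :tinytds,
  host:     'warehouse.example.com',
  database: 'warehouse',
  user:     ENV['WAREHOUSE_USER'],
  password: ENV['WAREHOUSE_PASSWORD'],
  timeout:  300
)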
Below are the backtrace and a quick, contrived example that I hope
demonstrates more specifically what we're doing. If that doesn't help, I
can work on putting together a full, runnable example.
Thanks again for your help!
Backtrace:
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:205:in `fields'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:205:in `block in fetch_rows'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:65:in `block in execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/connection_pool/threaded.rb:88:in `hold'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/connecting.rb:270:in `synchronize'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:34:in `execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:1087:in `execute'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/adapters/tinytds.rb:202:in `fetch_rows'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:152:in `each'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:590:in `block in paged_each'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:245:in `_transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:220:in `block in transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/connection_pool/threaded.rb:92:in `hold'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/connecting.rb:270:in `synchronize'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/database/transactions.rb:186:in `transaction'
/home/deploy/redacted/shared/bundle/ruby/2.6.0/gems/sequel-5.17.0/lib/sequel/dataset/actions.rb:566:in `paged_each'
Reads and writes:
class Reader
  class << self
    # With no block, Dataset#paged_each returns a lazy Enumerator over the
    # source rows, fetched page by page.
    def read(dataset)
      dataset.paged_each
    end
  end
end

class Writer
  class << self
    # Batches the rows yielded by the block and bulk-inserts each batch
    # into the destination.
    def write(dataset, &block)
      block.call.each_slice(1000) { |batch| dataset.multi_insert(batch) }
    end
  end
end

source      = @source_database[@table_name]
destination = @destination_database[@table_name]
Writer.write(destination) { Reader.read(source) }
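And for what it's worth, on your point about separate threads: if we ever
did need to run both sides against one Sequel::Database object, I imagine
we'd decouple the read from the write along these lines (a rough, untested
sketch; transfer and the queue size are made up for illustration):

# A reader thread feeds batches to the writer through a bounded queue,
# so multi_insert never runs inside the paged_each iteration and the two
# sides check out separate connections from the pool.
def transfer(source_dataset, destination_dataset, batch_size: 1000)
  queue = SizedQueue.new(4) # bounded, to keep the memory footprint flat

  reader = Thread.new do
    source_dataset.paged_each.each_slice(batch_size) { |batch| queue << batch }
    queue << :done # sentinel so the writer knows the read is finished
  end

  while (batch = queue.pop) != :done
    destination_dataset.multi_insert(batch)
  end

  reader.join
end

Even with separate Database objects, as we have, this shape would keep the
read connection from sitting idle mid-result-set while each batch inserts.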
On Wednesday, May 8, 2019 at 3:01:40 PM UTC-7, Jeremy Evans wrote:
>
> On Wednesday, May 8, 2019 at 1:46:02 PM UTC-7, Jonathan Simpson wrote:
>>
>> Hello!
>>
>> I am using Sequel for a plethora of ETL jobs at work. We've had a lot of
>> success with PostgreSQL, MySQL, and MariaDB databases so far. We're slowly
>> rolling all of our existing data engineering infrastructure into this
>> project and recently moved on to our SQL Server databases. That had also
>> been going well until today, when we hit a snag.
>>
>> We're using Sequel, TinyTDS, and Datasets for this work. We pipe
>> Dataset#paged_each into Dataset#multi_insert, passing blocks between the
>> read and write methods on our Extract and Load handlers and calling
>> Array#each_slice(1000) on the rows we get back from Dataset#paged_each.
>> This has served us _very_ well in every other aspect of the application.
>> Our ETLs are fast and have a ludicrously small memory footprint for the
>> amount of data we move around (the background job processor stays steady
>> at ~150MB despite moving gigabytes of data on a daily basis).
>>
>> This is also serving us well with lightweight SQL Server tables. Where we
>> hit the snag is a denormalized table, ~80 columns wide with just under
>> 1.5 million rows, that we want to move between a data mart and a
>> warehouse.
>>
>> We're seeing TinyTDS connection timeouts from the read call that we send
>> to Dataset#paged_each. I've tried debugging around a bit but I'm not making
>> much progress. Any help or thoughts are appreciated!
>>
>
> Looking at the TinyTDS documentation, you may want to set a :timeout
> option if you aren't already, as the default is fairly low (5 seconds).
> Beyond that, I can't really help without seeing a backtrace and an example
> of what you are doing.
>
> I doubt it is related, but are you using the same Sequel::Database object
> for both the paged_each and the multi_insert? In general you don't want
> to do that unless you are using a separate thread or shard, as Sequel
> does not support running queries inside Dataset#each (and #paged_each
> calls #each). Whether it actually works depends on how the adapter/driver
> is implemented, but in any case it is not supported.
>
> Thanks,
> Jeremy
>