On Thursday, September 1, 2016 at 12:51:58 PM UTC-5, Jeremy Evans wrote:
>
> On Thursday, September 1, 2016 at 9:42:11 AM UTC-7, Trevor Turk wrote:
>>
>> On Thursday, September 1, 2016 at 11:34:44 AM UTC-5, Trevor Turk wrote:
>>>
>>> In my case, I'm on AWS RDS PostgreSQL version 9.3.10 and I'm using the 
>>> "sequel" and "pg" gems. So, I'd be using cursors with paged_each as it 
>>> stands. Do you know if that ought to be safe? I'm considering switching to 
>>> the "sequel_pg" gem for the streaming support, but I'm reluctant to change 
>>> things unnecessarily, unless streaming might work where cursors would not. 
>>>
>>> Thanks again,
>>> - Trevor
>>>
>>  
>> Apologies, I just noticed that "pg" is required for "sequel_pg" so I'll 
>> give that a try now :)
>>
>
> Streaming should be faster, so that's one reason to use sequel_pg. 
>  Streaming also only executes a single query on the database instead of 
> multiple queries.  I would recommend using sequel_pg in general due to the 
> other optimizations it uses.
>
> I think either using cursors or stream should be safe in terms of not 
> locking the whole table, unless you are using SELECT FOR UPDATE, but again, 
> that's something you should test to make sure
>
> Thanks,
> Jeremy
>

Hello -- just reporting back on my progress so far. 

It appears we can use paged_each safely, however I was surprised to find 
that we can't run two simultaneously if they issue updates in the block. 

It seems we can work around this using the :hold option added to 
Dataset#use_cursor in the 4.9.0 release, which appears to remove the 
transaction and (considering the caveats) seems to work well for our case.

The relevant commit is: 
https://github.com/jeremyevans/sequel/commit/6083b73af616c13583f6f360ee028e3baa83c1e4

At first I thought the problem with simultaneous use might be due to 
duplicate cursor names, but it seems `hold: true` is all we need. 

I'm a little concerned about this section of the docs 
https://www.postgresql.org/docs/9.3/static/sql-declare.html:

"A cursor created with WITH HOLD... [i]n the current implementation, the 
rows represented by a held cursor are copied into a temporary file or 
memory area so that they remain available for subsequent transactions."

I'm hoping they don't mean that all rows in a table with millions of rows 
will be held in memory, but rather each page of them. Even if it's all of 
the rows, I'm not sure if that's a cause for concern. (I'll be iterating 
over a table with tens of millions of rows and performing updates to 
backfill newly created columns that we set to null so we could run a 
migration without downtime.)

I'm not concerned about paged_each returning duplicate rows or 
recalculating values etc as the docs warn, because I'll be selecting rows 
with a null value for a specific column which my update will set not a 
non-null value. 

So, I think I can safely proceed with paged_each(hold: true) but I'm 
tempted to experiment with streaming support via sequel_pg in any case. 
I'll do some more research and testing around that and report back again. 

Here's an example of my testing so far:

ok simultaneously:
Person.dataset.limit(15).paged_each { |p| sleep(1) ; print '.' }
Person.dataset.limit(15).paged_each { |p| sleep(1) ; print '.' }

not ok simultaneously:
Person.dataset.limit(15).paged_each { |p| sleep(1) ; 
p.update_all(updated_at: Time.now.utc) ; print '.' }
Person.dataset.limit(15).paged_each { |p| sleep(1) ; 
p.update_all(updated_at: Time.now.utc) ; print '.' }

not ok simultaneously:
Person.dataset.limit(15).paged_each(cursor_name: 'cursor_one') { |p| 
sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' }
Person.dataset.limit(15).paged_each(cursor_name: 'cursor_two') { |p| 
sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' }

ok simultaneously:
Person.dataset.limit(15).paged_each(cursor_name: 'cursor_one', hold: true) 
{ |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' }
Person.dataset.limit(15).paged_each(cursor_name: 'cursor_two', hold: true) 
{ |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' }

ok simultaneously:
Person.dataset.limit(15).paged_each(hold: true) { |p| sleep(1) ; 
p.update_all(updated_at: Time.now.utc) ; print '.' }
Person.dataset.limit(15).paged_each(hold: true) { |p| sleep(1) ; 
p.update_all(updated_at: Time.now.utc) ; print '.' }

Thanks again,
- Trevor 

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Reply via email to