On Thursday, September 1, 2016 at 12:51:58 PM UTC-5, Jeremy Evans wrote: > > On Thursday, September 1, 2016 at 9:42:11 AM UTC-7, Trevor Turk wrote: >> >> On Thursday, September 1, 2016 at 11:34:44 AM UTC-5, Trevor Turk wrote: >>> >>> In my case, I'm on AWS RDS PostgreSQL version 9.3.10 and I'm using the >>> "sequel" and "pg" gems. So, I'd be using cursors with paged_each as it >>> stands. Do you know if that ought to be safe? I'm considering switching to >>> the "sequel_pg" gem for the streaming support, but I'm reluctant to change >>> things unnecessarily, unless streaming might work where cursors would not. >>> >>> Thanks again, >>> - Trevor >>> >> >> Apologies, I just noticed that "pg" is required for "sequel_pg" so I'll >> give that a try now :) >> > > Streaming should be faster, so that's one reason to use sequel_pg. > Streaming also only executes a single query on the database instead of > multiple queries. I would recommend using sequel_pg in general due to the > other optimizations it uses. > > I think either using cursors or stream should be safe in terms of not > locking the whole table, unless you are using SELECT FOR UPDATE, but again, > that's something you should test to make sure > > Thanks, > Jeremy >
Hello -- just reporting back on my progress so far. It appears we can use paged_each safely, however I was surprised to find that we can't run two simultaneously if they issue updates in the block. It seems we can work around this using the :hold option added to Dataset#use_cursor in the 4.9.0 release, which appears to remove the transaction and (considering the caveats) seems to work well for our case. The relevant commit is: https://github.com/jeremyevans/sequel/commit/6083b73af616c13583f6f360ee028e3baa83c1e4 At first I thought the problem with simultaneous use might be due to duplicate cursor names, but it seems `hold: true` is all we need. I'm a little concerned about this section of the docs https://www.postgresql.org/docs/9.3/static/sql-declare.html: "A cursor created with WITH HOLD... [i]n the current implementation, the rows represented by a held cursor are copied into a temporary file or memory area so that they remain available for subsequent transactions." I'm hoping they don't mean that all rows in a table with millions of rows will be held in memory, but rather each page of them. Even if it's all of the rows, I'm not sure if that's a cause for concern. (I'll be iterating over a table with tens of millions of rows and performing updates to backfill newly created columns that we set to null so we could run a migration without downtime.) I'm not concerned about paged_each returning duplicate rows or recalculating values etc as the docs warn, because I'll be selecting rows with a null value for a specific column which my update will set not a non-null value. So, I think I can safely proceed with paged_each(hold: true) but I'm tempted to experiment with streaming support via sequel_pg in any case. I'll do some more research and testing around that and report back again. Here's an example of my testing so far: ok simultaneously: Person.dataset.limit(15).paged_each { |p| sleep(1) ; print '.' } Person.dataset.limit(15).paged_each { |p| sleep(1) ; print '.' } not ok simultaneously: Person.dataset.limit(15).paged_each { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } Person.dataset.limit(15).paged_each { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } not ok simultaneously: Person.dataset.limit(15).paged_each(cursor_name: 'cursor_one') { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } Person.dataset.limit(15).paged_each(cursor_name: 'cursor_two') { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } ok simultaneously: Person.dataset.limit(15).paged_each(cursor_name: 'cursor_one', hold: true) { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } Person.dataset.limit(15).paged_each(cursor_name: 'cursor_two', hold: true) { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } ok simultaneously: Person.dataset.limit(15).paged_each(hold: true) { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } Person.dataset.limit(15).paged_each(hold: true) { |p| sleep(1) ; p.update_all(updated_at: Time.now.utc) ; print '.' } Thanks again, - Trevor -- You received this message because you are subscribed to the Google Groups "sequel-talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sequel-talk. For more options, visit https://groups.google.com/d/optout.
