On Friday, July 7, 2017 at 2:53:02 PM UTC-7, Bruce Steedman wrote: > > Apologies, I should have been clearer. I would like to execute a single > query on multiple cores - e.g. using the Parallel gem, as per the example > link I posted. I am just after confirmation this will work with Sequel a > code example of such a query and pointers on how I may need to configure my > database (e.g. the docs > <https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html> > > say I need to set *max_parallel_workers_per_gather* to a non-zero value) > > Here is the query, which identifies orphaned records (out of several > hundred thousand). This presently maxes out one core for several minutes & > I'd therefore like to use all 4 cores I have available. > > puzzle_images = DB[:puzzles_images].select_map(:image_hash) > orphans = DB[:images].where('imhash NOT in ?', puzzle_images).all > > Whether PostgreSQL will use parallel execution for a given query is completely unrelated to Sequel. You mention the use of the parallel gem, but that is unrelated to PostgreSQL's parallel query execution. Using the parallel gem (or just separate plain ruby threads) would be a way to execute multiple database queries concurrently on separate database connections, but that is not the same as PostgreSQL 9.6's parallel query execution support and probably not what you would want to use here. You probably shouldn't mention PostgreSQL 9.6 parallel query execution and any ruby parallelism libraries together, as it leads one to the conclusion that you don't really know what you want.
For one, you should probably use a subquery instead of passing in an array: DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash)) Second, if that is still not fast enough and doesn't use a parallel query, do: puts DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash) ).sql Take the resulting SQL, and ask on a PostgreSQL forum why parallel execution is not used. I'm not sure if using EXISTS or a join approach would be faster or slower, but it may be worth trying if the above NOT IN subquery approach is not fast enough. Thanks, Jeremy -- You received this message because you are subscribed to the Google Groups "sequel-talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sequel-talk. For more options, visit https://groups.google.com/d/optout.
