On Friday, July 7, 2017 at 2:53:02 PM UTC-7, Bruce Steedman wrote:
>
> Apologies, I should have been clearer. I would like to execute a single 
> query on multiple cores - e.g. using the Parallel gem, as per the example 
> link I posted. I am just after confirmation this will work with Sequel a 
> code example of such a query and pointers on how I may need to configure my 
> database (e.g. the docs 
> <https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html>
>  
> say I need to set *max_parallel_workers_per_gather* to a non-zero value)
>
> Here is the query, which identifies orphaned records (out of several 
> hundred thousand). This presently maxes out one core for several minutes & 
> I'd therefore like to use all 4 cores I have available.
>
> puzzle_images = DB[:puzzles_images].select_map(:image_hash)
> orphans = DB[:images].where('imhash NOT in ?', puzzle_images).all
>
>
Whether PostgreSQL will use parallel execution for a given query is 
completely unrelated to Sequel. You mention the use of the parallel gem, 
but that is unrelated to PostgreSQL's parallel query execution.  Using the 
parallel gem (or just separate plain ruby threads) would be a way to 
execute multiple database queries concurrently on separate database 
connections, but that is not the same as PostgreSQL 9.6's parallel query 
execution support and probably not what you would want to use here.  You 
probably shouldn't mention PostgreSQL 9.6 parallel query execution and any 
ruby parallelism libraries together, as it leads one to the conclusion that 
you don't really know what you want.

For one, you should probably use a subquery instead of passing in an array:

 DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash))

Second, if that is still not fast enough and doesn't use a parallel query, 
do:

 puts DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash)
).sql

Take the resulting SQL, and ask on a PostgreSQL forum why parallel 
execution is not used.

I'm not sure if using EXISTS or a join approach would be faster or slower, 
but it may be worth trying if the above NOT IN subquery approach is not 
fast enough.

Thanks,
Jeremy

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Reply via email to