Sorry brain freeze - of course parallelism in Ruby is a red herring.

Thanks for the suggested improvements. The first query to get the array is 
almost instantaneous, so I don't expect much performance improvement by 
combining the two - but I will reach out to the PostgreSQL community re how 
to run this on multiple cores.

Have a great weekend Jeremy and thanks again.

On Saturday, 8 July 2017 00:06:17 UTC+1, Jeremy Evans wrote:
>
> On Friday, July 7, 2017 at 2:53:02 PM UTC-7, Bruce Steedman wrote:
>>
>> Apologies, I should have been clearer. I would like to execute a single 
>> query on multiple cores - e.g. using the Parallel gem, as per the example 
>> link I posted. I am just after confirmation this will work with Sequel a 
>> code example of such a query and pointers on how I may need to configure my 
>> database (e.g. the docs 
>> <https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html>
>>  
>> say I need to set *max_parallel_workers_per_gather* to a non-zero value)
>>
>> Here is the query, which identifies orphaned records (out of several 
>> hundred thousand). This presently maxes out one core for several minutes & 
>> I'd therefore like to use all 4 cores I have available.
>>
>> puzzle_images = DB[:puzzles_images].select_map(:image_hash)
>> orphans = DB[:images].where('imhash NOT in ?', puzzle_images).all
>>
>>
> Whether PostgreSQL will use parallel execution for a given query is 
> completely unrelated to Sequel. You mention the use of the parallel gem, 
> but that is unrelated to PostgreSQL's parallel query execution.  Using the 
> parallel gem (or just separate plain ruby threads) would be a way to 
> execute multiple database queries concurrently on separate database 
> connections, but that is not the same as PostgreSQL 9.6's parallel query 
> execution support and probably not what you would want to use here.  You 
> probably shouldn't mention PostgreSQL 9.6 parallel query execution and any 
> ruby parallelism libraries together, as it leads one to the conclusion that 
> you don't really know what you want.
>
> For one, you should probably use a subquery instead of passing in an array:
>
>  DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash))
>
> Second, if that is still not fast enough and doesn't use a parallel query, 
> do:
>
>  puts DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash)
> ).sql
>
> Take the resulting SQL, and ask on a PostgreSQL forum why parallel 
> execution is not used.
>
> I'm not sure if using EXISTS or a join approach would be faster or slower, 
> but it may be worth trying if the above NOT IN subquery approach is not 
> fast enough.
>
> Thanks,
> Jeremy
>

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Reply via email to