FWIW I tried a join - less than 1/3 second to achieve the same thing. You 
learn something new every day!

On Saturday, 8 July 2017 00:15:41 UTC+1, Bruce Steedman wrote:
>
> Sorry brain freeze - of course parallelism in Ruby is a red herring.
>
> Thanks for the suggested improvements. The first query to get the array is 
> almost instantaneous, so I don't expect much performance improvement by 
> combining the two - but I will reach out to the PostgreSQL community re how 
> to run this on multiple cores.
>
> Have a great weekend Jeremy and thanks again.
>
> On Saturday, 8 July 2017 00:06:17 UTC+1, Jeremy Evans wrote:
>>
>> On Friday, July 7, 2017 at 2:53:02 PM UTC-7, Bruce Steedman wrote:
>>>
>>> Apologies, I should have been clearer. I would like to execute a single 
>>> query on multiple cores - e.g. using the Parallel gem, as per the example 
>>> link I posted. I am just after confirmation this will work with Sequel a 
>>> code example of such a query and pointers on how I may need to configure my 
>>> database (e.g. the docs 
>>> <https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html>
>>>  
>>> say I need to set *max_parallel_workers_per_gather* to a non-zero value)
>>>
>>> Here is the query, which identifies orphaned records (out of several 
>>> hundred thousand). This presently maxes out one core for several minutes & 
>>> I'd therefore like to use all 4 cores I have available.
>>>
>>> puzzle_images = DB[:puzzles_images].select_map(:image_hash)
>>> orphans = DB[:images].where('imhash NOT in ?', puzzle_images).all
>>>
>>>
>> Whether PostgreSQL will use parallel execution for a given query is 
>> completely unrelated to Sequel. You mention the use of the parallel gem, 
>> but that is unrelated to PostgreSQL's parallel query execution.  Using the 
>> parallel gem (or just separate plain ruby threads) would be a way to 
>> execute multiple database queries concurrently on separate database 
>> connections, but that is not the same as PostgreSQL 9.6's parallel query 
>> execution support and probably not what you would want to use here.  You 
>> probably shouldn't mention PostgreSQL 9.6 parallel query execution and any 
>> ruby parallelism libraries together, as it leads one to the conclusion that 
>> you don't really know what you want.
>>
>> For one, you should probably use a subquery instead of passing in an 
>> array:
>>
>>  DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash))
>>
>> Second, if that is still not fast enough and doesn't use a parallel 
>> query, do:
>>
>>  puts DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash
>> )).sql
>>
>> Take the resulting SQL, and ask on a PostgreSQL forum why parallel 
>> execution is not used.
>>
>> I'm not sure if using EXISTS or a join approach would be faster or 
>> slower, but it may be worth trying if the above NOT IN subquery approach is 
>> not fast enough.
>>
>> Thanks,
>> Jeremy
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sequel-talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sequel-talk.
For more options, visit https://groups.google.com/d/optout.

Reply via email to