FWIW I tried a join - less than 1/3 second to achieve the same thing. You learn something new every day!
On Saturday, 8 July 2017 00:15:41 UTC+1, Bruce Steedman wrote: > > Sorry brain freeze - of course parallelism in Ruby is a red herring. > > Thanks for the suggested improvements. The first query to get the array is > almost instantaneous, so I don't expect much performance improvement by > combining the two - but I will reach out to the PostgreSQL community re how > to run this on multiple cores. > > Have a great weekend Jeremy and thanks again. > > On Saturday, 8 July 2017 00:06:17 UTC+1, Jeremy Evans wrote: >> >> On Friday, July 7, 2017 at 2:53:02 PM UTC-7, Bruce Steedman wrote: >>> >>> Apologies, I should have been clearer. I would like to execute a single >>> query on multiple cores - e.g. using the Parallel gem, as per the example >>> link I posted. I am just after confirmation this will work with Sequel a >>> code example of such a query and pointers on how I may need to configure my >>> database (e.g. the docs >>> <https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html> >>> >>> say I need to set *max_parallel_workers_per_gather* to a non-zero value) >>> >>> Here is the query, which identifies orphaned records (out of several >>> hundred thousand). This presently maxes out one core for several minutes & >>> I'd therefore like to use all 4 cores I have available. >>> >>> puzzle_images = DB[:puzzles_images].select_map(:image_hash) >>> orphans = DB[:images].where('imhash NOT in ?', puzzle_images).all >>> >>> >> Whether PostgreSQL will use parallel execution for a given query is >> completely unrelated to Sequel. You mention the use of the parallel gem, >> but that is unrelated to PostgreSQL's parallel query execution. Using the >> parallel gem (or just separate plain ruby threads) would be a way to >> execute multiple database queries concurrently on separate database >> connections, but that is not the same as PostgreSQL 9.6's parallel query >> execution support and probably not what you would want to use here. You >> probably shouldn't mention PostgreSQL 9.6 parallel query execution and any >> ruby parallelism libraries together, as it leads one to the conclusion that >> you don't really know what you want. >> >> For one, you should probably use a subquery instead of passing in an >> array: >> >> DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash)) >> >> Second, if that is still not fast enough and doesn't use a parallel >> query, do: >> >> puts DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash >> )).sql >> >> Take the resulting SQL, and ask on a PostgreSQL forum why parallel >> execution is not used. >> >> I'm not sure if using EXISTS or a join approach would be faster or >> slower, but it may be worth trying if the above NOT IN subquery approach is >> not fast enough. >> >> Thanks, >> Jeremy >> > -- You received this message because you are subscribed to the Google Groups "sequel-talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sequel-talk. For more options, visit https://groups.google.com/d/optout.
