Sorry brain freeze - of course parallelism in Ruby is a red herring. Thanks for the suggested improvements. The first query to get the array is almost instantaneous, so I don't expect much performance improvement by combining the two - but I will reach out to the PostgreSQL community re how to run this on multiple cores.
Have a great weekend Jeremy and thanks again. On Saturday, 8 July 2017 00:06:17 UTC+1, Jeremy Evans wrote: > > On Friday, July 7, 2017 at 2:53:02 PM UTC-7, Bruce Steedman wrote: >> >> Apologies, I should have been clearer. I would like to execute a single >> query on multiple cores - e.g. using the Parallel gem, as per the example >> link I posted. I am just after confirmation this will work with Sequel a >> code example of such a query and pointers on how I may need to configure my >> database (e.g. the docs >> <https://www.postgresql.org/docs/9.6/static/when-can-parallel-query-be-used.html> >> >> say I need to set *max_parallel_workers_per_gather* to a non-zero value) >> >> Here is the query, which identifies orphaned records (out of several >> hundred thousand). This presently maxes out one core for several minutes & >> I'd therefore like to use all 4 cores I have available. >> >> puzzle_images = DB[:puzzles_images].select_map(:image_hash) >> orphans = DB[:images].where('imhash NOT in ?', puzzle_images).all >> >> > Whether PostgreSQL will use parallel execution for a given query is > completely unrelated to Sequel. You mention the use of the parallel gem, > but that is unrelated to PostgreSQL's parallel query execution. Using the > parallel gem (or just separate plain ruby threads) would be a way to > execute multiple database queries concurrently on separate database > connections, but that is not the same as PostgreSQL 9.6's parallel query > execution support and probably not what you would want to use here. You > probably shouldn't mention PostgreSQL 9.6 parallel query execution and any > ruby parallelism libraries together, as it leads one to the conclusion that > you don't really know what you want. > > For one, you should probably use a subquery instead of passing in an array: > > DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash)) > > Second, if that is still not fast enough and doesn't use a parallel query, > do: > > puts DB[:images].exclude(:imhash=>DB[:puzzles_images].select(:image_hash) > ).sql > > Take the resulting SQL, and ask on a PostgreSQL forum why parallel > execution is not used. > > I'm not sure if using EXISTS or a join approach would be faster or slower, > but it may be worth trying if the above NOT IN subquery approach is not > fast enough. > > Thanks, > Jeremy > -- You received this message because you are subscribed to the Google Groups "sequel-talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sequel-talk. For more options, visit https://groups.google.com/d/optout.
