On Mon, Aug 22, 2011 at 01:45:11PM +0200, Jonathan Hoser wrote: > Hello Tomas, > > yes, of course I know that an 'ORDER BY' will actually increase the CPU > effort for the query. > However, my consideration was: Why is the original query running eternally? > My thoughts were: The IDs to be filtered against in the DELETE [...] > WHERE id NOT IN ([subselect]) were randomly ordered; > I know that working on a non-sorted data input is requiring much more > effort than just going for a quick sort and working on the product. > So there's the reason for that. > > But let me get these 'explain's for you: > ########### > ORIGINAL: > ########### > spacewalk_1_5=# explain Delete from rhnPackageChangeLogData WHERE id NOT > IN ( SELECT changelog_data_id FROM rhnPackageChangeLogRec); > QUERY PLAN > ---------------------------------------------------------------------------------------------- > Seq Scan on rhnpackagechangelogdata (cost=108146.08..11134104860.94 > rows=151984 width=6) > Filter: (NOT (SubPlan 1)) > SubPlan 1 > -> Materialize (cost=108146.08..170240.97 rows=4465189 width=8) > -> Seq Scan on rhnpackagechangelogrec (cost=0.00..86237.89 > rows=4465189 width=8) > (5 rows) > > ################ > MODIFICATION: > ################ > spacewalk_1_5=# explain Delete from rhnPackageChangeLogData WHERE id NOT > IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec ORDER > BY changelog_data_id ASC); > QUERY PLAN > > ------------------------------------------------------------------------------------------------- > --- > Seq Scan on rhnpackagechangelogdata (cost=104819.05..115188.64 > rows=151984 width=6) > Filter: (NOT (hashed SubPlan 1)) > SubPlan 1 > -> Sort (cost=104433.39..104626.22 rows=77132 width=8) > Sort Key: rhnpackagechangelogrec.changelog_data_id > -> HashAggregate (cost=97400.86..98172.18 rows=77132 width=8) > -> Seq Scan on rhnpackagechangelogrec > (cost=0.00..86237.89 rows=4465189 width= > 8) > (7 rows) > #################### >
All right. I see there is some speed improvement in the DISTINCT variant for Postgresql. We've tried to check explain plans for Oracle and there was no difference. Committing as: c529c7ea18211e9b5f0ee95469625893d9f8e30e The fix will be available in nightly repo in spacewalk-java-1.6.28-1. Regards, Tomas -- Tomas Lestach RHN Satellite Engineering, Red Hat > Ok, so I went for an 'EXPLAIN ANALYZE', on three queries: > The Original, my 'original' modification and modification without the > 'ORDER BY'; > Turns out, without the ORDER BY, the query is another 600msec faster, > while the 'Explain Analyze' of the original is again *still* executing > (since 5++ Minutes). > So my bad for not testing without the ORDER BY - a hunch of mine in the > wrong direction. > > Anyhow: Here are the 'EXPLAIN ANALYSE' of Mod, and Mod-without-ORDER-BY: > > ############# > Modified: > ############# > spacewalk_1_5=# explain analyze Delete from rhnPackageChangeLogData > WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM > rhnPackageChangeLogRec ORDER BY changelog_data_id ASC); > > QUERY PLAN > > ------------------------------------------------------------------------------------------------- > ----------------------------------------------------- > Seq Scan on rhnpackagechangelogdata (cost=104819.05..115188.64 > rows=151984 width=6) (actual tim > e=3476.682..3476.682 rows=0 loops=1) > Filter: (NOT (hashed SubPlan 1)) > SubPlan 1 > -> Sort (cost=104433.39..104626.22 rows=77132 width=8) (actual > time=3149.620..3192.113 row > s=301937 loops=1) > Sort Key: rhnpackagechangelogrec.changelog_data_id > Sort Method: quicksort Memory: 26442kB > -> HashAggregate (cost=97400.86..98172.18 rows=77132 > width=8) (actual time=2473.983. > .2553.576 rows=301937 loops=1) > -> Seq Scan on rhnpackagechangelogrec > (cost=0.00..86237.89 rows=4465189 width= > 8) (actual time=0.008..730.139 rows=4466337 loops=1) > Total runtime: 3483.172 ms > (9 rows) > > ############ > Modified without ORDER-BY > ############ > spacewalk_1_5=# explain analyze Delete from rhnPackageChangeLogData > WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM > rhnPackageChangeLogRec); > > QUERY PLAN > > ------------------------------------------------------------------------------------------------- > ----------------------------------------------- > Seq Scan on rhnpackagechangelogdata (cost=98365.01..108734.60 > rows=151984 width=6) (actual time > =2869.294..2869.294 rows=0 loops=1) > Filter: (NOT (hashed SubPlan 1)) > SubPlan 1 > -> HashAggregate (cost=97400.86..98172.18 rows=77132 width=8) > (actual time=2494.079..2577. > 750 rows=301937 loops=1) > -> Seq Scan on rhnpackagechangelogrec (cost=0.00..86237.89 > rows=4465189 width=8) (ac > tual time=0.011..741.078 rows=4466337 loops=1) > Total runtime: 2873.545 ms > (6 rows) > > ############## > Original... > ############## > [no output yet, if it completes before my remote ssh is cut, I'll report > it.] > > Best > -Jonathan > > On 08/19/2011 02:54 PM, Tomas Lestach wrote: > >Hello Jonathan, > > > >>Testdriving the two queries got me the following: > >>Original: ....still running since 30minutes now (top-CPU time) > >>modified: 3184ms > >The speed improvement is very nice. However here're my comments: > > - both DISTINCT and ORDER are syntactically correct for Oracle > > - ORDER BY cannot make any speed improvements, in opposite, it take some > > time to sort the query > > - regarding DISTINCT, I'm not sure, if it can have such a great impact on > > performance in this concrete case > > Could you, please, send explain plans for the original and the 'DISTINCT > > variant'? > > > > > >Regards, > >Tomas > >-- > >Tomas Lestach > >RHN Satellite Engineering, Red Hat > > > >_______________________________________________ > >Spacewalk-devel mailing list > >Spacewalk-devel@redhat.com > >https://www.redhat.com/mailman/listinfo/spacewalk-devel > > > > > -- > Jonathan Hoser, M.Sc. > Institute of Bioinformatics and System Biology > > WWW: http://mips.helmholtz-muenchen.de > > > > Helmholtz Zentrum München > Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH) > Ingolstädter Landstr. 1 > 85764 Neuherberg > www.helmholtz-muenchen.de > Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe > Geschäftsführer: Prof. Dr. Günther Wess und Dr. Nikolaus Blum > Registergericht: Amtsgericht München HRB 6466 > USt-IdNr: DE 129521671 > > _______________________________________________ > Spacewalk-devel mailing list > Spacewalk-devel@redhat.com > https://www.redhat.com/mailman/listinfo/spacewalk-devel _______________________________________________ Spacewalk-devel mailing list Spacewalk-devel@redhat.com https://www.redhat.com/mailman/listinfo/spacewalk-devel