On Mon, Aug 22, 2011 at 01:45:11PM +0200, Jonathan Hoser wrote:
> Hello Tomas,
> 
> yes, of course I know that an 'ORDER BY' will actually increase the CPU
> effort for the query.
> However, my consideration was: Why is the original query running eternally?
> My thoughts were: The IDs to be filtered against in the DELETE [...]
> WHERE id NOT IN ([subselect]) were randomly ordered;
> I know that working on a non-sorted data input is requiring much more
> effort than just going for a quick sort and working on the product.
> So there's the reason for that.
> 
> But let me get these 'explain's for you:
> ###########
> ORIGINAL:
> ###########
> spacewalk_1_5=# explain Delete from rhnPackageChangeLogData WHERE id NOT
> IN ( SELECT changelog_data_id FROM rhnPackageChangeLogRec);
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Seq Scan on rhnpackagechangelogdata  (cost=108146.08..11134104860.94
> rows=151984 width=6)
>    Filter: (NOT (SubPlan 1))
>    SubPlan 1
>      ->  Materialize  (cost=108146.08..170240.97 rows=4465189 width=8)
>            ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..86237.89
> rows=4465189 width=8)
> (5 rows)
> 
> ################
> MODIFICATION:
> ################
> spacewalk_1_5=# explain Delete from rhnPackageChangeLogData WHERE id NOT
> IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec ORDER
> BY changelog_data_id ASC);
>                                              QUERY PLAN
> 
> -------------------------------------------------------------------------------------------------
> ---
>  Seq Scan on rhnpackagechangelogdata  (cost=104819.05..115188.64
> rows=151984 width=6)
>    Filter: (NOT (hashed SubPlan 1))
>    SubPlan 1
>      ->  Sort  (cost=104433.39..104626.22 rows=77132 width=8)
>            Sort Key: rhnpackagechangelogrec.changelog_data_id
>            ->  HashAggregate  (cost=97400.86..98172.18 rows=77132 width=8)
>                  ->  Seq Scan on rhnpackagechangelogrec
> (cost=0.00..86237.89 rows=4465189 width=
> 8)
> (7 rows)
> ####################
> 

All right.
I see there is some speed improvement in the DISTINCT variant for Postgresql.
We've tried to check explain plans for Oracle and there was no difference.
Committing as:
    c529c7ea18211e9b5f0ee95469625893d9f8e30e

The fix will be available in nightly repo in spacewalk-java-1.6.28-1.

Regards,
Tomas
--
Tomas Lestach
RHN Satellite Engineering, Red Hat


> Ok, so I went for an 'EXPLAIN ANALYZE', on three queries:
> The Original, my 'original' modification and modification without the
> 'ORDER BY';
> Turns out, without the ORDER BY, the query is another 600msec faster,
> while the 'Explain Analyze' of the original is again *still* executing
> (since 5++ Minutes).
> So my bad for not testing without the ORDER BY - a hunch of mine in the
> wrong direction.
> 
> Anyhow: Here are the 'EXPLAIN ANALYSE' of Mod, and Mod-without-ORDER-BY:
> 
> #############
> Modified:
> #############
> spacewalk_1_5=# explain analyze Delete from rhnPackageChangeLogData
> WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
> rhnPackageChangeLogRec ORDER BY changelog_data_id ASC);
> 
> QUERY PLAN
> 
> -------------------------------------------------------------------------------------------------
> -----------------------------------------------------
>  Seq Scan on rhnpackagechangelogdata  (cost=104819.05..115188.64
> rows=151984 width=6) (actual tim
> e=3476.682..3476.682 rows=0 loops=1)
>    Filter: (NOT (hashed SubPlan 1))
>    SubPlan 1
>      ->  Sort  (cost=104433.39..104626.22 rows=77132 width=8) (actual
> time=3149.620..3192.113 row
> s=301937 loops=1)
>            Sort Key: rhnpackagechangelogrec.changelog_data_id
>            Sort Method:  quicksort  Memory: 26442kB
>            ->  HashAggregate  (cost=97400.86..98172.18 rows=77132
> width=8) (actual time=2473.983.
> .2553.576 rows=301937 loops=1)
>                  ->  Seq Scan on rhnpackagechangelogrec
> (cost=0.00..86237.89 rows=4465189 width=
> 8) (actual time=0.008..730.139 rows=4466337 loops=1)
>  Total runtime: 3483.172 ms
> (9 rows)
> 
> ############
> Modified without ORDER-BY
> ############
> spacewalk_1_5=# explain analyze Delete from rhnPackageChangeLogData
> WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
> rhnPackageChangeLogRec);
> 
> QUERY PLAN
> 
> -------------------------------------------------------------------------------------------------
> -----------------------------------------------
>  Seq Scan on rhnpackagechangelogdata  (cost=98365.01..108734.60
> rows=151984 width=6) (actual time
> =2869.294..2869.294 rows=0 loops=1)
>    Filter: (NOT (hashed SubPlan 1))
>    SubPlan 1
>      ->  HashAggregate  (cost=97400.86..98172.18 rows=77132 width=8)
> (actual time=2494.079..2577.
> 750 rows=301937 loops=1)
>            ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..86237.89
> rows=4465189 width=8) (ac
> tual time=0.011..741.078 rows=4466337 loops=1)
>  Total runtime: 2873.545 ms
> (6 rows)
> 
> ##############
> Original...
> ##############
> [no output yet, if it completes before my remote ssh is cut, I'll report
> it.]
> 
> Best
> -Jonathan
> 
> On 08/19/2011 02:54 PM, Tomas Lestach wrote:
> >Hello Jonathan,
> >
> >>Testdriving the two queries got me the following:
> >>Original:    ....still running since 30minutes now (top-CPU time)
> >>modified:  3184ms
> >The speed improvement is very nice. However here're my comments:
> >  - both DISTINCT and ORDER are syntactically correct for Oracle
> >  - ORDER BY cannot make any speed improvements, in opposite, it take some 
> > time to sort the query
> >  - regarding DISTINCT, I'm not sure, if it can have such a great impact on 
> > performance in this concrete case
> >    Could you, please, send explain plans for the original and the 'DISTINCT 
> > variant'?
> >
> >
> >Regards,
> >Tomas
> >--
> >Tomas Lestach
> >RHN Satellite Engineering, Red Hat
> >
> >_______________________________________________
> >Spacewalk-devel mailing list
> >Spacewalk-devel@redhat.com
> >https://www.redhat.com/mailman/listinfo/spacewalk-devel
> >
> 
> 
> --
> Jonathan Hoser, M.Sc.
> Institute of Bioinformatics and System Biology
> 
> WWW: http://mips.helmholtz-muenchen.de
> 
> 
> 
> Helmholtz Zentrum München
> Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
> Ingolstädter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe
> Geschäftsführer: Prof. Dr. Günther Wess und Dr. Nikolaus Blum
> Registergericht: Amtsgericht München HRB 6466
> USt-IdNr: DE 129521671
> 
> _______________________________________________
> Spacewalk-devel mailing list
> Spacewalk-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-devel

_______________________________________________
Spacewalk-devel mailing list
Spacewalk-devel@redhat.com
https://www.redhat.com/mailman/listinfo/spacewalk-devel

Reply via email to