[moving to pgsql-sql]
On Thu, 11 Jul 2002 17:22:14 +0200, "Luis Alberto Amigo Navarro"
<[EMAIL PROTECTED]> wrote:
>I can't improve performance on this query:
>
>SELECT
> supplier.name,
> supplier.address
>FROM
> supplier,
> nation
>WHERE
> supplier.suppkey IN(
>  SELECT
>   partsupp.suppkey
>  FROM
>   partsupp
>  WHERE
>   partsupp.partkey IN(
>    SELECT
>     part.partkey
>    FROM
>     part
>    WHERE
>     part.name like 'forest%'
>     )
>   AND partsupp.availqty>(
>    SELECT
>     0.5*(sum(lineitem.quantity)::FLOAT)
>    FROM
>     lineitem
>    WHERE
>     lineitem.partkey=partsupp.partkey
>     AND lineitem.suppkey=partsupp.partkey
                                    ^^^^^^^
                                    suppkey ???
>     AND lineitem.shipdate>=('1994-01-01')::DATE
>     AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
>     )
>  )
> AND supplier.nationkey=nation.nationkey
> AND nation.name='CANADA'
>ORDER BY
> supplier.name;

Luis,
rules of thumb: "Avoid subselects; use joins!" and "If you have to use
subselects, avoid IN, use EXISTS!"

Let's try.  If partkey is unique in part, then
|   FROM partsupp
|   WHERE partsupp.partkey IN (SELECT part.partkey

can be replaced by
    FROM partsupp ps, part p
    WHERE ps.partkey = p.partkey

or
    partsupp ps INNER JOIN part p
      ON (ps.partkey = p.partkey AND p.name LIKE '...')

When we ignore "part" for now, your subselect boils down to

|  SELECT partsupp.suppkey
|  FROM partsupp
|  WHERE partsupp.availqty > (
|    SELECT 0.5*(sum(lineitem.quantity)::FLOAT)
|    FROM lineitem
|    WHERE lineitem.partkey=partsupp.partkey
|      AND lineitem.suppkey=partsupp.suppkey
|      AND lineitem.shipdate BETWEEN ... AND ...
|     )

which can be rewritten to (untested)

  SELECT ps.suppkey
  FROM partsupp ps, lineitem li
  WHERE li.partkey=ps.partkey
    AND li.suppkey=ps.suppkey
    AND lineitem.shipdate BETWEEN ... AND ...
  GROUP BY ps.partkey, ps.suppkey
  HAVING min(ps.availqty) > 0.5*(sum(lineitem.quantity)::FLOAT)
         ^^^
         As all ps.availqty are equal in one group, you can as well
use max() or avg().

Now we have left only one IN:
| WHERE supplier.suppkey IN (
|  SELECT partsupp.suppkey FROM partsupp WHERE <condition> )

Being to lazy to find out, if this can be rewritten to a join, let`s
apply rule 2 here:

  WHERE EXISTS (
    SELECT ... FROM partsupp ps
    WHERE supplier.suppkey = ps.suppkey
      AND <condition> )

HTH, but use with a grain of salt ...

>Sort  (cost=2777810917708.17..2777810917708.17 rows=200 width=81)
             ^^^^^^^^^^^^^^^^
BTW, how many years are these? :-)

Servus
 Manfred

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Reply via email to