Re: [PERFORM] PostgreSQL planner

2013-05-10 Thread Robert Haas
On Sat, Mar 23, 2013 at 8:12 PM, Misa Simic misa.si...@gmail.com wrote:
 but problem is - we don't know the thing id - we know calc_id:

 EXPLAIN ANALYZE
 SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
 WHERE calc_id = 20

With this query you've got to scan all three tables.  The calc_id qual
can only be pushed down into the scan on t1, so you need the whole
t2/t3 join product.

 EXPLAIN ANALYZE
 SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
 WHERE calc_id = 20

With this query you only need to scan 2 tables.  The join between t2
and t3 is eliminated by the join removal code in favor of scanning
only t2, as shown in the plan you included:

 Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual
 time=6.360..71.818 rows=12038 loops=1)
   -  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
 time=0.016..0.024 rows=1 loops=1)
 Filter: (calc_id = 20)
 Rows Removed by Filter: 99
   -  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111 width=12)
 (actual time=6.330..69.063 rows=12038 loops=1)
 Recheck Cond: (thing_id = t1.thing_id)
 -  Bitmap Index Scan on t5_c2_idx  (cost=0.00..434.46 rows=12111
 width=0) (actual time=4.372..4.372 rows=12038 loops=1)
   Index Cond: (thing_id = t1.thing_id)
 Total runtime: 72.461 ms

The difference is that this query has only one column in its target list, not *.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL planner

2013-05-10 Thread Misa Simic
On Friday, May 10, 2013, Robert Haas wrote:

 On Sat, Mar 23, 2013 at 8:12 PM, Misa Simic 
 misa.si...@gmail.comjavascript:;
 wrote:
  but problem is - we don't know the thing id - we know calc_id:
 
  EXPLAIN ANALYZE
  SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
  WHERE calc_id = 20

 With this query you've got to scan all three tables.  The calc_id qual
 can only be pushed down into the scan on t1, so you need the whole
 t2/t3 join product.

  EXPLAIN ANALYZE
  SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
  WHERE calc_id = 20

 With this query you only need to scan 2 tables.  The join between t2
 and t3 is eliminated by the join removal code in favor of scanning
 only t2, as shown in the plan you included:

  Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual
  time=6.360..71.818 rows=12038 loops=1)
-  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
  time=0.016..0.024 rows=1 loops=1)
  Filter: (calc_id = 20)
  Rows Removed by Filter: 99
-  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111
 width=12)
  (actual time=6.330..69.063 rows=12038 loops=1)
  Recheck Cond: (thing_id = t1.thing_id)
  -  Bitmap Index Scan on t5_c2_idx  (cost=0.00..434.46
 rows=12111
  width=0) (actual time=4.372..4.372 rows=12038 loops=1)
Index Cond: (thing_id = t1.thing_id)
  Total runtime: 72.461 ms

 The difference is that this query has only one column in its target list,
 not *.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Thanks Robert,

That is a bit old problem to us...

Solution for that kind of problems is: rephrase the question. So we have
added one more layer to transform input query to better query for
postgres - very wierd...

However, there are no differences... Planer use the same bad plan for:

 SELECT v.no_index FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

In that is one column as well...

We basicaly above query transform to:

SELECT v.no_index FROM t2_left_t3_volatile v
WHERE v.thing_id = (
SELECT thing_id FROM t1
WHERE calc_id = 20
)

What give us good result... Very wierd

Thanks,
Misa


[PERFORM] PostgreSQL planner

2013-03-23 Thread Misa Simic
HI,

I have a wierd problem with PostgreSQL planner...

Problem showed up in Production on PG9.1 (Ubuntu)

But I have succeeded to get the same behavior on my PG 9.2 on Windows...

it is about 3 tables  onad one view - but view have volatile function:


CREATE TABLE t1
(
  calc_id serial NOT NULL,
  thing_id integer,
  CONSTRAINT t1_pk PRIMARY KEY (calc_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE t1
  OWNER TO postgres;

-- Index: t1_thing_id_idx

-- DROP INDEX t1_thing_id_idx;

CREATE INDEX t1_thing_id_idx
  ON t1
  USING btree
  (thing_id);


other columns from this real table are discarted - and not important, what
is important is that in the moment I want to run the query... I know
calc_id (pk of this table - but don't know thing_id)...

to simplify test I filled t1 with 100 rows with same values in calc_id and
thing_id...

Second table are transactions about things:

CREATE TABLE t2
(
  trans_id serial NOT NULL,
  thing_id integer,
  no_index integer,
  CONSTRAINT t2_pk PRIMARY KEY (trans_id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE t2
  OWNER TO postgres;

-- Index: t5_c2_idx

-- DROP INDEX t5_c2_idx;

CREATE INDEX t5_c2_idx
  ON t2
  USING btree
  (thing_id);


this table I have filled with 1m rows with rundom number in thing_id
between 1 and 100

when we enter transaction about thing to t2, in some moment we could have
additional info about the thing, in some moment not... so if we have
additional info in the same time row is inserted in t2 and t3 with the same
trans_id...

CREATE TABLE t3
(
  trans_id integer NOT NULL,
  c2_text text,
  CONSTRAINT t3_pk PRIMARY KEY (trans_id)
)
WITH (
  OIDS=FALSE
);

no additional indexes on t3...

now we have made a view:

CREATE OR REPLACE VIEW t2_left_t3_volatile AS
 SELECT t2.trans_id, t2.thing_id, t2.no_index, t3.c2_text, random() AS
random
   FROM t2
   LEFT JOIN t3 USING (trans_id);

And here we go:

we want see all transactions about the thing_id

EXPLAIN ANALYZE
SELECT * FROM t2_left_t3_volatile
WHERE thing_id = 20

everything is fine:

Hash Left Join  (cost=452.46..13067.16 rows=12474 width=45) (actual
time=6.537..62.633 rows=12038 loops=1)
  Hash Cond: (t2.trans_id = t3.trans_id)
  -  Bitmap Heap Scan on t2  (cost=448.30..12985.03 rows=12474 width=12)
(actual time=6.418..57.498 rows=12038 loops=1)
Recheck Cond: (thing_id = 20)
-  Bitmap Index Scan on t5_c2_idx  (cost=0.00..445.18 rows=12474
width=0) (actual time=4.429..4.429 rows=12038 loops=1)
  Index Cond: (thing_id = 20)
  -  Hash  (cost=2.96..2.96 rows=96 width=37) (actual time=0.086..0.086
rows=96 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 7kB
-  Seq Scan on t3  (cost=0.00..2.96 rows=96 width=37) (actual
time=0.016..0.045 rows=96 loops=1)
Total runtime: 63.217 ms

but problem is - we don't know the thing id - we know calc_id:

EXPLAIN ANALYZE
SELECT * FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

and planner picks:

Hash Join  (cost=6.42..48367.52 rows=12111 width=4) (actual
time=0.261..471.042 rows=12038 loops=1)
  Hash Cond: (t2.thing_id = t1.thing_id)
  -  Hash Left Join  (cost=4.16..31591.51 rows=1211101 width=45) (actual
time=0.161..394.076 rows=1211101 loops=1)
Hash Cond: (t2.trans_id = t3.trans_id)
-  Seq Scan on t2  (cost=0.00..24017.01 rows=1211101 width=12)
(actual time=0.075..140.937 rows=1211101 loops=1)
-  Hash  (cost=2.96..2.96 rows=96 width=37) (actual
time=0.069..0.069 rows=96 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 7kB
  -  Seq Scan on t3  (cost=0.00..2.96 rows=96 width=37)
(actual time=0.008..0.035 rows=96 loops=1)
  -  Hash  (cost=2.25..2.25 rows=1 width=4) (actual time=0.035..0.035
rows=1 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 1kB
-  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
time=0.017..0.030 rows=1 loops=1)
  Filter: (calc_id = 20)
  Rows Removed by Filter: 99
Total runtime: 471.505 ms

Seq scan on all tables...

First thought was - maybe because of volatile function...

but on:
SELECT v.no_index FROM t2_left_t3_volatile v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20

planner picks the same scenario... even function column is not in the
query...

however, situation is fine, if we have a view without the volatile function:

CREATE OR REPLACE VIEW t2_left_t3 AS
 SELECT t2.trans_id, t2.thing_id, t2.no_index, t3.c2_text
   FROM t2
   LEFT JOIN t3 USING (trans_id);

EXPLAIN ANALYZE
SELECT v.no_index FROM t2_left_t3 v INNER JOIN t1 USING (thing_id)
WHERE calc_id = 20


Nested Loop  (cost=437.49..13047.74 rows=12111 width=4) (actual
time=6.360..71.818 rows=12038 loops=1)
  -  Seq Scan on t1  (cost=0.00..2.25 rows=1 width=4) (actual
time=0.016..0.024 rows=1 loops=1)
Filter: (calc_id = 20)
Rows Removed by Filter: 99
  -  Bitmap Heap Scan on t2  (cost=437.49..12924.38 rows=12111 width=12)
(actual time=6.330..69.063 rows=12038 loops=1)
Recheck Cond: (thing_id