Re: [PERFORM] Trying to eliminate union and sort

2013-07-17 Thread Vitalii Tymchyshyn
I'd try to check why discounts are different. Join with 'or' should work.
Build (one query) except all (another query) and check some rows from
result.
 13 лип. 2013 01:28, Brian Fehrle bri...@consistentstate.com напис.

 On 07/11/2013 06:46 PM, Josh Berkus wrote:

 Brian,

  3. I'm trying to eliminate the union, however I have two problems.
 A) I can't figure out how to have an 'or' clause in a single join that
 would fetch all the correct rows. If I just do:
 LEFT OUTER JOIN table2 t2 ON (t2.real_id = t.id OR t2.real_id =
 t.backup_id), I end up with many less rows than the original query. B.

 I believe the issue with this is a row could have one of three
 possibilities:
 * part of the first query but not the second - results in 1 row after
 the union
 * part of the second query but not the first - results in 1 row after
 the union
 * part of the first query and the second - results in 2 rows after the
 union (see 'B)' for why)

 B) the third and fourth column in the SELECT will need to be different
 depending on what column the row is joined on in the LEFT OUTER JOIN to
 table2, so I may need some expensive case when logic to filter what is
 put there based on whether that row came from the first join clause, or
 the second.

 No, it doesn't:

 SELECT t.id,
 t.mycolumn1,
 table3.otherid as otherid1,
 table3a.otherid as otherid2,
 t.mycolumn2
 FROM t
 LEFT OUTER JOIN table2
ON ( t.id = t2.real_id OR t.backup_id = t2.real_id )
 LEFT OUTER JOIN table3
ON ( t.typeid = table3.id )
  LEFT OUTER JOIN table3 as table3a
 ON ( table2.third_id = table3.id )
 WHERE t.external_id IN ( ... )
 ORDER BY t.mycolumn2, t.id

 I tried this originally, however my resulting rowcount is different.

 The original query returns 9,955,729 rows
 This above one returns 7,213,906

 As for the counts on the tables:
 table1  3,653,472
 table2  2,191,314
 table325,676,589

 I think it's safe to assume right now that any resulting joins are not
 one-to-one

 - Brian F




 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.**
 org pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/**mailpref/pgsql-performancehttp://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Trying to eliminate union and sort

2013-07-15 Thread Brian Fehrle

On 07/12/2013 04:43 PM, Josh Berkus wrote:

As for the counts on the tables:
table1  3,653,472
table2  2,191,314
table325,676,589

I think it's safe to assume right now that any resulting joins are not
one-to-one

Hmmm?  How is doing a subselect in the SELECT clause even working, then?

Oh my, this is sad. the query in all returns 9,955,729 rows, so the sub 
queries are run on each of these resulting rows, however in this entire 
result set, subquery 1 returns 16 distinct rows, subquery 2 returns 63 
different rows, but those sub queries are run over 10 million times to 
return these few distinct rows. So it's running many times, but 
returning the same small set of data over and over again.


- Brian F




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Trying to eliminate union and sort

2013-07-15 Thread Marc Mamin
Hello,

you may want to try starting with some CTE that first retrieve required subsets.
adding ordering  within those CTE might also improve the timing of following 
sort/join operations.
(sorry for the top posting)

Something like:

WITH T1 AS (
  SELECT id, typeid, backup_id, mycolumn1, mycolumn2
  FROM table1 WHERE t.external_id IN ('6544', '2234', '2', '4536')
  ORDER BY mycolumn2, id ?
 ),
 
TYPES AS (SELECT DISTINCT typeid FROM T1),

T3_OTHERS AS ( SELECT id, otherid FROM table3 JOIN TYPES ON table3.id = 
TYPES.typeid
   -- Order BY id ?  
   ), 

SELECT
T1.id,
T1.mycolumn1,
T3_OTHERS.otherid,
T3_2.otherid,
T1.mycolumn2 AS mycolumn2

FROM T1 
LEFT OUTER JOIN T3_OTHERS ON T1.typeid = T3_OTHERS.id
LEFT OUTER JOIN table2 t2 ON (t2.real_id = T1.backup_id OR t2.real_id = t.id
LEFT OUTER JOIN table3 T3_2  ON t2.third_id = T3_2.id
ORDER BY T1.mycolumn2,T1.id

regards,
Marc Mamin


Von: pgsql-performance-ow...@postgresql.org 
[pgsql-performance-ow...@postgresql.org]quot; im Auftrag von quot;Brian 
Fehrle [bri...@consistentstate.com]
Gesendet: Montag, 15. Juli 2013 18:12
An: pgsql-performance@postgresql.org
Betreff: Re: [PERFORM] Trying to eliminate union and sort

On 07/12/2013 04:43 PM, Josh Berkus wrote:
 As for the counts on the tables:
 table1  3,653,472
 table2  2,191,314
 table325,676,589

 I think it's safe to assume right now that any resulting joins are not
 one-to-one
 Hmmm?  How is doing a subselect in the SELECT clause even working, then?

Oh my, this is sad. the query in all returns 9,955,729 rows, so the sub
queries are run on each of these resulting rows, however in this entire
result set, subquery 1 returns 16 distinct rows, subquery 2 returns 63
different rows, but those sub queries are run over 10 million times to
return these few distinct rows. So it's running many times, but
returning the same small set of data over and over again.

- Brian F




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Trying to eliminate union and sort

2013-07-12 Thread Brian Fehrle

On 07/11/2013 06:46 PM, Josh Berkus wrote:

Brian,


3. I'm trying to eliminate the union, however I have two problems.
A) I can't figure out how to have an 'or' clause in a single join that
would fetch all the correct rows. If I just do:
LEFT OUTER JOIN table2 t2 ON (t2.real_id = t.id OR t2.real_id =
t.backup_id), I end up with many less rows than the original query. B.

I believe the issue with this is a row could have one of three
possibilities:
* part of the first query but not the second - results in 1 row after
the union
* part of the second query but not the first - results in 1 row after
the union
* part of the first query and the second - results in 2 rows after the
union (see 'B)' for why)

B) the third and fourth column in the SELECT will need to be different
depending on what column the row is joined on in the LEFT OUTER JOIN to
table2, so I may need some expensive case when logic to filter what is
put there based on whether that row came from the first join clause, or
the second.

No, it doesn't:

SELECT t.id,
t.mycolumn1,
table3.otherid as otherid1,
table3a.otherid as otherid2,
t.mycolumn2
FROM t
LEFT OUTER JOIN table2
   ON ( t.id = t2.real_id OR t.backup_id = t2.real_id )
LEFT OUTER JOIN table3
   ON ( t.typeid = table3.id )
 LEFT OUTER JOIN table3 as table3a
ON ( table2.third_id = table3.id )
WHERE t.external_id IN ( ... )
ORDER BY t.mycolumn2, t.id

I tried this originally, however my resulting rowcount is different.

The original query returns 9,955,729 rows
This above one returns 7,213,906

As for the counts on the tables:
table1  3,653,472
table2  2,191,314
table325,676,589

I think it's safe to assume right now that any resulting joins are not 
one-to-one


- Brian F




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Trying to eliminate union and sort

2013-07-12 Thread Josh Berkus

 As for the counts on the tables:
 table1  3,653,472
 table2  2,191,314
 table325,676,589
 
 I think it's safe to assume right now that any resulting joins are not
 one-to-one

Hmmm?  How is doing a subselect in the SELECT clause even working, then?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Trying to eliminate union and sort

2013-07-11 Thread Brian Fehrle

Hi All,

(basic info)
PostgreSQL 9.2.4
64 bit Linux host
4GB shared_buffers with 14GB system memory, dedicated database VM
10MB work_mem


I have a query that takes over 6 minutes to complete, and it's due 
mainly to the two sorting operations being done on this query. The data 
it is returning itself is quite large, and with the two sorting 
operations it does (one because of the union, one because of the order 
by), it ends up using about *2.1GB* of temporary file space (and there 
is no way I can increase work_mem to hold this).


--[Query]
SELECT
t.id,
t.mycolumn1,
(SELECT table3.otherid FROM table3 WHERE table3.id = t.typeid),
(SELECT table3.otherid FROM table3 WHERE table3.id = t2.third_id),
t.mycolumn2 AS mycolumn2

FROM table1 t

LEFT OUTER JOIN table2 t2 ON t2.real_id = t.id

WHERE t.external_id IN ('6544', '2234', '2', '4536')

UNION

SELECT
t.id,
t.mycolumn1,
(SELECT table3.otherid FROM table3 WHERE table3.id = t.typeid),
(SELECT table3.otherid FROM table3 WHERE table3.id = t2.third_id),
t.mycolumn2 AS mycolumn2

FROM table1 t

LEFT OUTER JOIN table2 t2 ON t2.real_id = t.backup_id

WHERE t.external_id IN ('6544', '2234', '2', '4536')

ORDER BY t.mycolumn2, t.id;


--[Explain Analyze (sorry for the anonymizing)]
Sort  (cost=133824460.450..133843965.150 rows=7801882 width=56) (actual 
time=849405.656..894724.453 rows=9955729 loops=1)

Sort Key: romeo.three, romeo.quebec_seven
*Sort Method: external merge  Disk: 942856kB*
  -  Unique  (cost=132585723.530..132702751.760 rows=7801882 width=56) 
(actual time=535267.196..668982.694 rows=9955729 loops=1)
-  Sort  (cost=132585723.530..132605228.240 rows=7801882 
width=56) (actual time=535267.194..649304.631 rows=10792011 loops=1)
Sort Key: romeo.quebec_seven, romeo.golf, ((delta_four 
3)), ((delta_four 4)), romeo.three

*Sort Method: external merge  Disk: 1008216kB*
  -  Append  (cost=0.000..131464014.850 rows=7801882 
width=56) (actual time=0.798..291477.595 rows=10792011 loops=1)
-  Merge Left Join (cost=0.000..46340412.140 
rows=2748445 width=56) (actual time=0.797..70748.213 rows=3690431 loops=1)
Merge Cond: (romeo.quebec_seven = 
alpha_bravo.six_kilo)
  -  Index Scan using juliet on zulu romeo  
(cost=0.000..163561.710 rows=2748445 width=52) (actual 
time=0.019..11056.883 rows=3653472 loops=1)
  Filter: (delta_uniform = ANY 
('two'::integer[]))
  -  Index Only Scan using sierra on 
quebec_juliet alpha_bravo  (cost=0.000..86001.860 rows=2191314 width=8) 
(actual time=0.047..3996.543 rows=2191314 loops=1)

  Heap Fetches: 2191314
  SubPlan
-  Index Scan using oscar on six_delta  
(cost=0.000..8.380 rows=1 width=23) (actual time=0.009..0.009 rows=1 
loops=3690431)

Index Cond: (quebec_seven = romeo.lima)
  SubPlan
-  Index Scan using oscar on six_delta  
(cost=0.000..8.380 rows=1 width=23) (actual time=0.001..0.001 rows=0 
loops=3690431)
Index Cond: (quebec_seven = 
alpha_bravo.delta_november)
-  Merge Right Join (cost=0.000..85045583.890 
rows=5053437 width=56) (actual time=0.843..213450.477 rows=7101580 loops=1)
Merge Cond: (alpha_bravo.six_kilo = 
romeo.india)
  -  Index Only Scan using sierra on 
quebec_juliet alpha_bravo  (cost=0.000..86001.860 rows=2191314 width=8) 
(actual time=0.666..6165.870 rows=2191314 loops=1)

  Heap Fetches: 2191314
  -  Materialize (cost=0.000..193106.580 
rows=2748445 width=56) (actual time=0.134..25852.353 rows=7101580 loops=1)
-  Index Scan using alpha_seven on 
zulu romeo  (cost=0.000..186235.470 rows=2748445 width=56) (actual 
time=0.108..18439.857 rows=3653472 loops=1)
Filter: (delta_uniform = ANY 
('two'::integer[]))

  SubPlan
-  Index Scan using oscar on six_delta  
(cost=0.000..8.380 rows=1 width=23) (actual time=0.009..0.010 rows=1 
loops=7101580)

Index Cond: (quebec_seven = romeo.lima)
  SubPlan
-  Index Scan using oscar on six_delta  
(cost=0.000..8.380 rows=1 width=23) (actual time=0.007..0.008 rows=1 
loops=7101580)
Index Cond: (quebec_seven = 
alpha_bravo.delta_november)

--[end]

*My attempts:*
1. I tried to get an index set up on table1 that orders t.mycolumn2 and 
t.id so that the sorting operation might be skipped, however when table1 
is 

Re: [PERFORM] Trying to eliminate union and sort

2013-07-11 Thread Josh Berkus
Brian,

 3. I'm trying to eliminate the union, however I have two problems.
 A) I can't figure out how to have an 'or' clause in a single join that
 would fetch all the correct rows. If I just do:
 LEFT OUTER JOIN table2 t2 ON (t2.real_id = t.id OR t2.real_id =
 t.backup_id), I end up with many less rows than the original query. B.
 
 I believe the issue with this is a row could have one of three
 possibilities:
 * part of the first query but not the second - results in 1 row after
 the union
 * part of the second query but not the first - results in 1 row after
 the union
 * part of the first query and the second - results in 2 rows after the
 union (see 'B)' for why)
 
 B) the third and fourth column in the SELECT will need to be different
 depending on what column the row is joined on in the LEFT OUTER JOIN to
 table2, so I may need some expensive case when logic to filter what is
 put there based on whether that row came from the first join clause, or
 the second.

No, it doesn't:

SELECT t.id,
t.mycolumn1,
table3.otherid as otherid1,
table3a.otherid as otherid2,
t.mycolumn2
FROM t
LEFT OUTER JOIN table2
   ON ( t.id = t2.real_id OR t.backup_id = t2.real_id )
LEFT OUTER JOIN table3
   ON ( t.typeid = table3.id )
LEFT OUTER JOIN table3 as table3a
   ON ( table2.third_id = table3.id )
WHERE t.external_id IN ( ... )
ORDER BY t.mycolumn2, t.id

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance