Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-09 Thread Vladimir Sitnikov
Daniel,

Somehow oracle seems to know that a right join is the better way to go.
In fact, PostgreSQL is just doing the same thing: it hashes smaller table
and scans the bigger one.

Could you please clarify how do you consume 25M rows?
It could be the difference of response times comes not from the PostgreSQL
itself, but from the client code.

Could you please add the following information?
1) Execution time of simple query that selects MAX of all the required
columns select max(test1.slsales_batch) , max(test1.slsales_checksum),

I mean not explain (analyze, buffers), but simple execution.
The purpose of MAX is to split overhead of consuming of the resultset from
the overhead of producing it.

2) explain (analyze, buffers) for the same query with maxes. That should
reveal the overhead of explain analyze itself.

3) The output of the following SQLPlus script (from Oracle):
  set linesize 1000 pagesize 1 trimout on trimspool on time on timing on
  spool slow_query.lst
  select /*+ gather_plan_statistics */ max(test1.slsales_batch) ,
max(test1.slsales_checksum), ..;
  select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS
LAST'));
  spool off

  That would display detailed statistics on execution time similar to the
explain (analyze, buffers).

4) Could you please clarify how did you migrate test1 table?
I guess the order of rows in that table might affect overall execution time.
Sorted table would be more CPU cache friendly, thus giving speedup. (see
[1] for similar example).
As far as I understand, simple create table as select * from test1 order by
slsales_date_id, slsales_prod_id should improve cache locality.


[1]:
http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

-- 
Regards,
Vladimir Sitnikov


Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-04 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 One difference is that numerics are stored more tightly packed on 
 Oracle. Which is particularly good for Oracle as they don't have other 
 numeric data types than number. On PostgreSQL, you'll want to use int4 
 for ID-fields, where possible. An int4 always takes up 4 bytes, while a 
 numeric holding an integer value in the same range is typically 5-9 bytes.

Replacing those numeric(8) and numeric(16) fields with int4 and int8
would be greatly beneficial to comparison and hashing performance,
not just table size.  I'm a bit surprised that EDB's porting tools
evidently don't do this automatically (I infer from the reference to
PPAS that the OP is using EDB ...)

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-04 Thread Daniel Westermann
-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Freitag, 4. Januar 2013 21:41
To: Heikki Linnakangas
Cc: Daniel Westermann; 'pgsql-performance@postgresql.org'
Subject: Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

Heikki Linnakangas hlinnakan...@vmware.com writes:
 One difference is that numerics are stored more tightly packed on 
 Oracle. Which is particularly good for Oracle as they don't have other 
 numeric data types than number. On PostgreSQL, you'll want to use int4 
 for ID-fields, where possible. An int4 always takes up 4 bytes, while 
 a numeric holding an integer value in the same range is typically 5-9 bytes.

 Replacing those numeric(8) and numeric(16) fields with int4 and int8 would 
 be greatly beneficial to comparison and hashing performance, not just table 
 size.  I'm a  bit surprised that EDB's porting tools evidently don't do 
 this automatically (I infer from the reference to PPAS that the OP is using 
 EDB ...)

  regards, tom lane

Thanks, tom. Any clue where there remaining around 500mb difference come from ? 
converted all the numeric(8) to int and this saved around 380mb of storage and 
around 10 secs exectution time... both databases have their files on standard 
ext3, same fs options. Given that the table has around 25'000'000 rows this is 
still approx. 20 bytes more per row on average

Regards
Daniel


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-03 Thread Daniel Westermann
Hi Listers,

we migrated an oracle datawarehouse to postgresql 9.1 ( ppas 9.1.7.12 ) and are 
facing massive issues with response times in postgres when compared to the 
oracle system. Both database run on the same hardware and storage ( rhel5.8 
64bit ).

Oracle memory parameters are:
SGA=1gb
PGA=200mb

Postgres currently runs with 15gb of shared buffers ( that’s because the big 
table in question is around 2.5gb in size and one suggestion was to increase 
that much so postgresql will cache the complete table. and this is the case now 
).

explain (analyze,buffers) SELECT test1.slsales_batch
 , test1.slsales_checksum
 , test1.slsales_reg_id
 , test1.slsales_prod_id
 , test1.slsales_date_id
 , test1.slsales_pos_id
 , test1.slsales_amt_sales_gross
 , test1.slsales_amt_sales_discount
 , test1.slsales_units_sales_gross
 , test1.slsales_amt_returns
 , test1.slsales_amt_returns_discount
 , test1.slsales_units_returns
 , (test1.slsales_amt_sales_gross - test1.slsales_amt_returns)
 * mgmt_fact_winratio.winratio_ratio AS slsales_amt_est_winnings
 , mgmt_fact_winratio.winratio_ratio AS slsales_ratio
  FROM mgmtt_own.test1
   LEFT JOIN mgmtt_own.mgmt_fact_winratio
 ON mgmt_fact_winratio.winratio_date_id = test1.slsales_date_id

Oracle’s explain plan looks like this:


| Id  | Operation | Name   | Rows  | Bytes |TempSpc| 
Cost (%CPU)| Time |

|   0 | SELECT STATEMENT  ||25M|  1527M|   |   
115K  (3)| 00:23:10 |
|*  1 |  HASH JOIN RIGHT OUTER||25M|  1527M|  4376K|   
115K  (3)| 00:23:10 |
|   2 |   TABLE ACCESS FULL   | MGMT_FACT_WINRATIO |   159K|  2498K|   |   
167   (5)| 00:00:03 |
|   3 |   TABLE ACCESS FULL   | TEST1  |25M|  1139M|   | 
43435   (5)| 00:08:42 |

Predicate Information (identified by operation id):
---
   1 - 
access(MGMT_FACT_WINRATIO.WINRATIO_PROD_ID(+)=TEST1.SLSALES_PROD_ID AND
  
MGMT_FACT_WINRATIO.WINRATIO_DATE_ID(+)=TEST1.SLSALES_DATE_ID)

Somehow oracle seems to know that a right join is the better way to go.

Postgres’s explain plan:

   QUERY PLAN

Hash Left Join  (cost=3948.52..13646089.21 rows=25262160 width=61) (actual 
time=260.642..81240.692 rows=25262549 loops=1)
   Hash Cond: ((test1.slsales_date_id = mgmt_fact_winratio.winratio_date_id) 
AND (test1.slsales_prod_id = mgmt_fact_winratio.winratio_prod_id))
   Buffers: shared hit=306590
   -  Seq Scan on test1  (cost=0.00..254148.75 rows=25262160 width=56) (actual 
time=0.009..15674.535 rows=25262161 loops=1)
 Buffers: shared hit=305430
   -  Hash  (cost=1582.89..1582.89 rows=157709 width=19) (actual 
time=260.564..260.564 rows=157709 loops=1)
 Buckets: 16384  Batches: 1  Memory Usage: 7855kB
 Buffers: shared hit=1160
 -  Seq Scan on mgmt_fact_winratio  (cost=0.00..1582.89 rows=157709 
width=19) (actual time=0.008..114.406 rows=157709 loops=1)
   Buffers: shared hit=1160
Total runtime: 95762.025 ms
(11 rows)

Tried to modify the statement according to oracle’s plan, but this did not help:

explain (analyze,buffers) SELECT test1.slsales_batch
 , test1.slsales_checksum
 , test1.slsales_reg_id
 , test1.slsales_prod_id
 , test1.slsales_date_id
 , test1.slsales_pos_id
 , test1.slsales_amt_sales_gross
 , test1.slsales_amt_sales_discount
 , test1.slsales_units_sales_gross
 , test1.slsales_amt_returns
 , test1.slsales_amt_returns_discount
 , test1.slsales_units_returns
 , (test1.slsales_amt_sales_gross - test1.slsales_amt_returns)
 * mgmt_fact_winratio.winratio_ratio AS slsales_amt_est_winnings
 , mgmt_fact_winratio.winratio_ratio AS slsales_ratio
  FROM mgmtt_own.test1
 , mgmtt_own.mgmt_fact_winratio
 WHERE mgmt_fact_winratio.winratio_prod_id(+) = test1.slsales_prod_id
   AND mgmt_fact_winratio.winratio_date_id(+) = test1.slsales_date_id
;
  QUERY PLAN

Hash Left Join  (cost=3948.52..13646089.21 rows=25262160 width=61) (actual 
time=276.605..80629.400 rows=25262549 loops=1)
   Hash Cond: ((test1.slsales_prod_id = mgmt_fact_winratio.winratio_prod_id) 
AND (test1.slsales_date_id = 

Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-03 Thread Heikki Linnakangas

On 03.01.2013 15:30, Daniel Westermann wrote:

What additionally makes me wonder is, that the same table in oracle is taking 
much less space than in postgresql:

SQL  select  sum(bytes) from dba_extents where segment_name = 'TEST1';
SUM(BYTES)
--
1610612736

select pg_relation_size('mgmtt_own.test1');
pg_relation_size
--
2502082560
(1 row)

(sysdba@[local]:) [bi_dwht]  \d+ mgmtt_own.test1
  Table mgmtt_own.test1
 Column| Type  | Modifiers | Storage | 
Description
--+---+---+-+-
slsales_batch| numeric(8,0)  |   | main|
slsales_checksum | numeric(8,0)  |   | main|
slsales_reg_id   | numeric(8,0)  |   | main|
slsales_prod_id  | numeric(8,0)  |   | main|
slsales_date_id  | numeric(8,0)  |   | main|
slsales_pos_id   | numeric(8,0)  |   | main|
slsales_amt_sales_gross  | numeric(16,6) |   | main|
slsales_amt_sales_discount   | numeric(16,6) |   | main|
slsales_units_sales_gross| numeric(8,0)  |   | main|
slsales_amt_returns  | numeric(16,6) |   | main|
slsales_amt_returns_discount | numeric(16,6) |   | main|
slsales_units_returns| numeric(8,0)  |   | main|
slsales_amt_est_winnings | numeric(16,6) |   | main|
Indexes:
 itest1 btree (slsales_date_id) CLUSTER, tablespace mgmtt_idx
 itest2 btree (slsales_prod_id), tablespace mgmtt_idx
Has OIDs: no
Tablespace: mgmtt_dat


One difference is that numerics are stored more tightly packed on 
Oracle. Which is particularly good for Oracle as they don't have other 
numeric data types than number. On PostgreSQL, you'll want to use int4 
for ID-fields, where possible. An int4 always takes up 4 bytes, while a 
numeric holding an integer value in the same range is typically 5-9 bytes.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-03 Thread Daniel Westermann
-Original Message-
From: Heikki Linnakangas [mailto:hlinnakan...@vmware.com] 
Sent: Donnerstag, 3. Januar 2013 18:02
To: Daniel Westermann
Cc: 'pgsql-performance@postgresql.org'
Subject: Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

On 03.01.2013 15:30, Daniel Westermann wrote:
 What additionally makes me wonder is, that the same table in oracle is taking 
 much less space than in postgresql:

 SQL  select  sum(bytes) from dba_extents where segment_name = 
 SQL 'TEST1';
 SUM(BYTES)
 --
 1610612736

 select pg_relation_size('mgmtt_own.test1');
 pg_relation_size
 --
 2502082560
 (1 row)

 (sysdba@[local]:) [bi_dwht]  \d+ mgmtt_own.test1
   Table mgmtt_own.test1
  Column| Type  | Modifiers | Storage | 
 Description
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 --+---+---+-+-
 slsales_batch| numeric(8,0)  |   | main|
 slsales_checksum | numeric(8,0)  |   | main|
 slsales_reg_id   | numeric(8,0)  |   | main|
 slsales_prod_id  | numeric(8,0)  |   | main|
 slsales_date_id  | numeric(8,0)  |   | main|
 slsales_pos_id   | numeric(8,0)  |   | main|
 slsales_amt_sales_gross  | numeric(16,6) |   | main|
 slsales_amt_sales_discount   | numeric(16,6) |   | main|
 slsales_units_sales_gross| numeric(8,0)  |   | main|
 slsales_amt_returns  | numeric(16,6) |   | main|
 slsales_amt_returns_discount | numeric(16,6) |   | main|
 slsales_units_returns| numeric(8,0)  |   | main|
 slsales_amt_est_winnings | numeric(16,6) |   | main|
 Indexes:
  itest1 btree (slsales_date_id) CLUSTER, tablespace mgmtt_idx
  itest2 btree (slsales_prod_id), tablespace mgmtt_idx
 Has OIDs: no
 Tablespace: mgmtt_dat

One difference is that numerics are stored more tightly packed on Oracle. Which 
is particularly good for Oracle as they don't have other numeric data types 
than number. On PostgreSQL, you'll want to use int4 for ID-fields, where 
possible. An int4 always takes up 4 bytes, while a numeric holding an integer 
value in the same range is typically 5-9 bytes.

- Heikki

Thanks for poiting that out, Heikki.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance