Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Jeff Janes
On Sat, Aug 19, 2017 at 10:37 AM, anand086  wrote:

Your email is very hard to read; the formatting and line wrapping are
heavily mangled.  You might want to attach the plans as file attachments
instead of, or in addition to, putting them in the body.



>  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..2.03 rows=1 width=8)
>        Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))
>
>
It looks like the statistics for your table are desperately out of date: a
later query showed there are 762599 rows (unless login is null for all of
them), but the above estimates only one.  When was the table last analyzed?
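
For example, you could check when the table was last (auto)analyzed via the
standard statistics view, and analyze it manually:

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'test_tab';

ANALYZE test_tab;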

Cheers,

Jeff


Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Carlos Augusto Machado
I think your query is a bit confusing and has many subqueries, so I tried
to simplify it.

If you can't have more than one import_num = 0 row for the same login, try this:

SELECT count(*)
FROM test_tab tab1
LEFT JOIN test_tab tab2
ON tab1.login = tab2.login AND tab2.import_num = '0'
WHERE
   tab2.login IS NULL AND
   tab1.import_num = '4520440'

Otherwise, try this:

SELECT count(*)
FROM test_tab tab1
LEFT JOIN (
   SELECT DISTINCT login FROM test_tab WHERE import_num = '0'
) tab2
ON tab1.login = tab2.login
WHERE
   tab2.login IS NULL AND
   tab1.import_num = '4520440'
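
An equivalent NOT EXISTS formulation (a sketch; the planner can usually turn
either form into an anti join):

SELECT count(*)
FROM test_tab tab1
WHERE tab1.import_num = '4520440'
  AND NOT EXISTS (
    SELECT 1 FROM test_tab tab2
    WHERE tab2.login = tab1.login AND tab2.import_num = '0'
  );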


On Mon, Aug 21, 2017 at 15:47, Carlos Augusto Machado
<caugus...@gmail.com> wrote:

>
> Do you have an index on the login column?
>
> If not, try creating an index and taking off those DISTINCTs.
>
> On Mon, Aug 21, 2017 at 15:33, Justin Pryzby wrote:
>
>> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
>> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>>
>> Would you send explain ANALYZE and not just explain ?
>>
>> Justin


Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Carlos Augusto Machado
Do you have an index on the login column?

If not, try creating an index and taking off those DISTINCTs.
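
For example (a plain btree index; table and column names from the thread):

CREATE INDEX ON test_tab (login);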

On Mon, Aug 21, 2017 at 15:33, Justin Pryzby wrote:

> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>
> Would you send explain ANALYZE and not just explain ?
>
> Justin


Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Justin Pryzby
On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> Aggregate  (cost=351405.08..351405.09 rows=1 width=8)

Would you send explain ANALYZE and not just explain ?
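
For example, using the query from the original post (BUFFERS is optional but
useful):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM test_tab WHERE login IN
(SELECT DISTINCT login FROM test_tab WHERE import_num = '4520440' AND login IS NOT NULL
 EXCEPT
 SELECT DISTINCT login FROM test_tab WHERE import_num = '0' AND login IS NOT NULL)
AND import_num = '4520440';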

Justin




Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread anand086
Any thoughts on this? 







[PERFORM] Performance Issue -- "Materialize"

2017-08-19 Thread anand086
I am a Postgres newbie and trying to learn. :) We have a scenario wherein one
of our SQLs shows a different execution plan for different input values of
import_num.

As an example, with import_num = '4520440' the execution plan uses a Nested
Loop and takes ~12 secs. With import_num = '4520460' the execution plan used
"Materialize" and never completed. After I set enable_material to off, the
execution plan changed to a Hash Semi Join and the query completes in less
than 3 secs.

SELECT count(*) FROM test_tab WHERE login IN
(SELECT DISTINCT login FROM test_tab WHERE import_num = '4520440' AND login IS NOT NULL
 EXCEPT
 SELECT DISTINCT login FROM test_tab WHERE import_num = '0' AND login IS NOT NULL)
AND import_num = '4520440';

 count
--------
 746982
(1 row)

Time: 12054.274 ms

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
   ->  Nested Loop  (cost=349846.23..350366.17 rows=415562 width=0)
         ->  HashAggregate  (cost=349845.67..349847.67 rows=200 width=96)
               Group Key: ("ANY_subquery".login)::text
               ->  Subquery Scan on "ANY_subquery"  (cost=340828.23..348557.47 rows=515282 width=96)
                     ->  SetOp Except  (cost=340828.23..343404.65 rows=515282 width=100)
                           ->  Sort  (cost=340828.23..342116.44 rows=515283 width=100)
                                 Sort Key: "*SELECT* 1".login
                                 ->  Append  (cost=0.56..275836.74 rows=515283 width=100)
                                       ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..275834.70 rows=515282 width=12)
                                             ->  Unique  (cost=0.56..270681.88 rows=515282 width=8)
                                                   ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..268604.07 rows=831125 width=8)
                                                         Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
                                       ->  Subquery Scan on "*SELECT* 2"  (cost=0.56..2.04 rows=1 width=12)
                                             ->  Unique  (cost=0.56..2.03 rows=1 width=8)
                                                   ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_2  (cost=0.56..2.03 rows=1 width=8)
                                                         Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
         ->  Index Only Scan using ui_nkey_test_tab on test_tab  (cost=0.56..2.58 rows=1 width=8)
               Index Cond: ((import_num = '4520440'::numeric) AND (login = ("ANY_subquery".login)::text))
(19 rows)
SELECT count(*) FROM test_tab WHERE import_num = '4520460' and login IN
(SELECT DISTINCT login FROM test_tab WHERE import_num = '4520460' AND login IS NOT NULL
 EXCEPT
 SELECT DISTINCT login FROM test_tab WHERE import_num = '0' AND login IS NOT NULL);

The SQL never completed, and had the below execution plan:

                                                                QUERY PLAN
Re: [PERFORM] Performance issue in PostgreSQL server...

2017-04-27 Thread Justin Pryzby
On Mon, Mar 06, 2017 at 12:17:22PM +, Dinesh Chandra 12108 wrote:
> Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point'
> AND attname='domain_class_id' ;
>
> schemaname        | evidence
> tablename         | point
> attname           | domain_class_id
> inherited         | f
> null_frac         | 0
> avg_width         | 8
> n_distinct        | 10
> most_common_vals  | {7,9,2,11,43,3,1,10,4,17}
> most_common_freqs | {0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}
> histogram_bounds  |
> correlation       | 0.889078

On Fri, Mar 03, 2017 at 12:44:07PM +, Dinesh Chandra 12108 wrote:
>    ->  Index Scan using point_domain_class_id_index on point p
>        (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

I missed your response until now, and can't see that anybody else responded,
but I suspect the issue is that the *table* is highly correlated WRT this
column, but the index may not be, probably due to duplicated index keys.
Postgres only stores statistics on expression indices, and otherwise falls
back to the correlation of the underlying table column for simple indices.

If you're still fighting this, would you send result of:

SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;
or,
SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;

If there's much repetition in the index keys, then PG's planner thinks an index
scan has low random_page_cost, and effective_cache_size has little effect on
large tables, and it never uses a bitmap scan, which blows up if the index is
fragmented and has duplicate keys.  The table reads end up costing something
like 1454751*random_page_cost in nonsequential reads and fseek() calls when it
thinks they'll cost only 1454751*16*seq_page_cost.

Is the query much faster if you first reindex point_domain_class_id_index ?
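
For example (note that REINDEX locks out writes to the table while it runs):

REINDEX INDEX point_domain_class_id_index;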

This has come up before, see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520d6610.8040...@emulex.com
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com
> https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou...@4ax.com

Justin




Re: [PERFORM] Performance issue after upgrading from 9.4 to 9.6

2017-03-27 Thread Merlin Moncure
On Mon, Mar 6, 2017 at 7:20 AM, Piotr Gasidło  wrote:
> We are having some performance issues after we upgraded to the newest
> version of PostgreSQL; before that, everything was fast and smooth.
>
> Upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
> upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.
>
> Problem:
> Very high system CPU when the master is streaming replication data; CPU
> goes up to 77%. Only one process is generating this load: the postgresql
> startup process. When I attached truss to this process I
> saw a lot of read calls with almost the same number of errors (EAGAIN).
>
> root@d8:~ # truss -c -p 38091
> ^Csyscall seconds   calls  errors
> semop   0.001611782 198   0
> write   0.74404   2   0
> read2.281535100   17266   12375
> openat  0.000683532  48   0
> lseek   0.177612479   20443   0
> close   0.000395549  48   0
>   - --- ---
> 2.461912846   38005   12375
>
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
> lseek(778,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
> read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
> lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
> read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
> read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
> lseek(818,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
> lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
> read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe
>
> The read call tries to read one byte over and over. I looked at the source
> code and I think this file is responsible for this behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

Is a git bisect out of the question?

merlin




[PERFORM] Performance issue after upgrading from 9.4 to 9.6

2017-03-06 Thread Piotr Gasidło
We are having some performance issues after we upgraded to the newest
version of PostgreSQL; before that, everything was fast and smooth.

Upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
upgraded to 9.6.2 with no improvement.

Some information about our setup: Freebsd, Solaris (SmartOS), simple
master-slave using streaming replication.

Problem:
Very high system CPU when the master is streaming replication data; CPU
goes up to 77%. Only one process is generating this load: the postgresql
startup process. When I attached truss to this process I
saw a lot of read calls with almost the same number of errors (EAGAIN).

root@d8:~ # truss -c -p 38091
^Csyscall seconds   calls  errors
semop   0.001611782 198   0
write   0.74404   2   0
read2.281535100   17266   12375
openat  0.000683532  48   0
lseek   0.177612479   20443   0
close   0.000395549  48   0
  - --- ---
2.461912846   38005   12375

read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
lseek(778,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
lseek(818,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'

Descriptor 6 is a pipe

The read call tries to read one byte over and over. I looked at the source
code and I think this file is responsible for this behavior:
src/backend/storage/ipc/latch.c. There was no such file in 9.4.


-- 
Piotr Gasidło




Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-06 Thread Dinesh Chandra 12108
Dear Justin,

Below is the output of the query SELECT * FROM pg_stats WHERE tablename='point'
AND attname='domain_class_id' ;

schemaname        | evidence
tablename         | point
attname           | domain_class_id
inherited         | f
null_frac         | 0
avg_width         | 8
n_distinct        | 10
most_common_vals  | {7,9,2,11,43,3,1,10,4,17}
most_common_freqs | {0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}
histogram_bounds  |
correlation       | 0.889078


Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.


-Original Message-
From: Justin Pryzby [mailto:pry...@telsasoft.com]
Sent: 06 March, 2017 10:54 AM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: Nur Agus <nuragus.li...@gmail.com>; Jeff Janes <jeff.ja...@gmail.com>; 
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 
> <dinesh.chan...@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN
> > evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
> > WHERE p.domain_class_id IN (11) AND (p.modification_time >
> > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
> > 00:06:56.056 IST') ORDER BY feature_id
> ...
>
> >  ->  Index Scan using point_domain_class_id_index on point p
> >      (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
> >      Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is
> well clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND 
attname='domain_class_id' ;

Justin







Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Justin Pryzby
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 
>  wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
> 
> >  ->  Index Scan using point_domain_class_id_index on point p
> >      (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
> >      Index Cond: (domain_class_id = 11)
> 
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND 
attname='domain_class_id' ;

Justin




Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Jeff Janes
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <
dinesh.chan...@cyient.com> wrote:

> Dear Nur,
>
>
>
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
>
>
>
>
...


>  ->  Index Scan using point_domain_class_id_index on point p
>      (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
>      Index Cond: (domain_class_id = 11)
>

Why wouldn't this be using a bitmap scan rather than a regular index scan?
It seems like it should prefer the bitmap scan, unless the table is well
clustered on domain_class_id.  In which case, why isn't it just faster?

You could try repeating the explain analyze after setting enable_indexscan = off
to see what that gives.  If it gives a seq scan, then repeat with
enable_seqscan also turned off.  Or if it gives the bitmap scan, then
repeat with enable_bitmapscan turned off.
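
For example (session-local settings; query as posted upthread):

SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT DISTINCT feature_id
FROM evidence.point p
INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND (p.modification_time > '2015-05-10 00:06:56.056 IST'
       OR oe.modification_time > '2015-05-10 00:06:56.056 IST')
ORDER BY feature_id;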

How many rows is in point, and how big is it?

The best bet for making this better might be to have an index on
(domain_class_id, modification_time) and hope for an index only scan.
Except that you are on 9.1, so first you would have to upgrade.  Which
would allow you to use BUFFERS in the explain analyze, as well as
track_io_timings, both of which would also be pretty nice to see.  Using
9.1 is like having one hand tied behind your back.
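
The suggested index might look like this (name illustrative; for a true
index-only scan, feature_id would also have to be part of the index):

CREATE INDEX point_dcid_mtime_idx
    ON evidence.point (domain_class_id, modification_time);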

Also, any idea why this execution of this query is 15 times faster than the
execution you found in the log file?  Was the top output you showed in the
first email happening at the time the really slow query was running, or was
that from a different period?

Cheers,

Jeff


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Tom Lane
Dinesh Chandra 12108  writes:
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id 
> FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON 
> p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND 
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time 
> > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

I think that's a fundamentally slow query and you're not going to be able
to make it better without rethinking your requirements and/or data
representation.  As written, that requires the server to form the entire
join of p to oe on feature_id, with the only filter before the join being
the evidently-none-too-selective domain_class_id condition.  Only after
joining can it apply the OR condition.  So this is inherently processing a
lot of rows.

If the OR arms were individually pretty selective you could rewrite this
into a UNION of two joins, a la the discussion at
https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873...@bluetreble.com
but given the dates involved I'm betting that won't help very much.
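
A sketch of that UNION rewrite (untested; UNION also removes duplicates,
standing in for the DISTINCT):

SELECT feature_id FROM evidence.point p
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND p.modification_time > '2015-05-10 00:06:56.056 IST'
UNION
SELECT feature_id FROM evidence.point p
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND oe.modification_time > '2015-05-10 00:06:56.056 IST'
ORDER BY feature_id;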

Or maybe you could try

select feature_id from p
  where domain_class_id IN (11)
    and p.modification_time > '2015-05-10 00:06:56.056 IST'
intersect
select feature_id from oe
  where oe.modification_time > '2015-05-10 00:06:56.056 IST'
order by feature_id

although I'm not entirely certain that that has exactly the same
semantics (-ENOCAFFEINE), and it might still be none too quick.

regards, tom lane




Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Dinesh Chandra 12108
Dear Nur,

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id 
FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON 
p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND 
(p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > 
'2015-05-10 00:06:56.056 IST') ORDER BY feature_id



                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1679730.32..1679837.46 rows=21428 width=8) (actual time=154753.528..155657.818 rows=1607489 loops=1)
   ->  Sort  (cost=1679730.32..1679783.89 rows=21428 width=8) (actual time=154753.514..155087.734 rows=4053270 loops=1)
         Sort Key: p.feature_id
         Sort Method: quicksort  Memory: 288302kB
         ->  Hash Join  (cost=1501657.09..1678188.87 rows=21428 width=8) (actual time=144146.620..152050.311 rows=4053270 loops=1)
               Hash Cond: (oe.evd_feature_id = p.feature_id)
               Join Filter: ((p.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone) OR (oe.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone))
               ->  Seq Scan on observation_evidence oe  (cost=0.00..121733.18 rows=5447718 width=16) (actual time=0.007..1534.905 rows=5434406 loops=1)
               ->  Hash  (cost=1483472.70..1483472.70 rows=1454751 width=16) (actual time=144144.653..144144.653 rows=1607491 loops=1)
                     Buckets: 262144  Batches: 1  Memory Usage: 75352kB
                     ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
                           Index Cond: (domain_class_id = 11)
 Total runtime: 155787.379 ms
(13 rows)


Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.
--
Mobile: +91-9953975849 | Ext 1078
dinesh.chan...@cyient.com
Plot No. 7, NSEZ, Phase-II, Noida-Dadri Road, Noida - 201 305, India.

From: Nur Agus [mailto:nuragus.li...@gmail.com]
Sent: 03 March, 2017 5:54 PM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

Hello Dinesh,

You can try the EXPLAIN tool

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER 
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 
IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY 
feature_id

Then paste here the result.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108
<dinesh.chan...@cyient.com> wrote:
Dear Experts,

I need your suggestions to resolve a performance issue reported on our
PostgreSQL 9.1 production database, which is 1.5 TB in size. I have observed
that some select queries with an order by clause are taking a lot of time to
execute and forcing applications to respond slowly.

The configuration of database server is :

Architecture: x86_64
CPU op-mode(s):32-bit, 64-bit
CPU’s : 8
Core(s) per socket:4
Socket(s): 2
Model name:Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB
SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824


Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1  # min 32kB, -1 sets based on 
shared_buffers


Queries taking lot of time are:
==


2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 
IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY 
feature_id


2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 
IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY 
feature_id


Top command output:

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.8

Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Nur Agus
Hello Dinesh,

You can try the EXPLAIN tool

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p
INNER JOIN evidence.observation_evidence oe ON p.feature_id =
oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time
> '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
00:06:56.056 IST') ORDER BY feature_id

Then paste here the result.

Thanks



[PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Dinesh Chandra 12108
Dear Experts,

I need your suggestions to resolve a performance issue reported on our
PostgreSQL 9.1 production database, which is 1.5 TB in size. I have observed
that some select queries with an order by clause are taking a lot of time to
execute and forcing applications to respond slowly.

The configuration of database server is :

Architecture: x86_64
CPU op-mode(s):32-bit, 64-bit
CPU's : 8
Core(s) per socket:4
Socket(s): 2
Model name:Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB
SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824


Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1  # min 32kB, -1 sets based on 
shared_buffers


Queries taking lot of time are:
==


2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 
IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY 
feature_id


2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 
IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY 
feature_id


Top command output:

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.84, 1.59, 1.57
Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie
Cpu(s):  3.4%us,  0.7%sy,  0.0%ni, 81.7%id, 14.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32830016k total, 32142596k used,   687420k free,77460k buffers
Swap:  8190972k total,   204196k used,  7986776k free, 27981268k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
30639 postgres  20   0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres
18185 postgres  20   0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres
16543 postgres  20   0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres
14710 postgres  20   0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres
1214 root  20   0 15668 1848  896 S  1.0  0.0 130:46.43 top
13462 postgres  20   0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres
20081 root  20   0 15668 1880  936 R  1.0  0.0   0:00.12 top
13478 postgres  20   0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres
41107 root  20   0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3
2680 root  20   0 000 S  0.3  0.0 103:38.54 nfsiod
3558 root  20   0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server
15576 root  20   0 000 S  0.3  0.0   0:01.16 flush-253:1
18430 postgres  20   0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres
20083 postgres  20   0  105m 1852 1416 S  0.3  0.0   0:00.01 bash
24188 postgres  20   0  102m 1856  832 S  0.3  0.0   0:23.39 sshd
28250 postgres  20   0  156m 1292  528 S  0.3  0.0   0:46.86 postgres
1 root  20   0 19356 1188  996 S  0.0  0.0   0:05.00 init

Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)|






Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Андрей Хозов
Thanks all for the explanation!




--
Andrey Khozov


Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Tom Lane
Андрей Хозов  writes:
> create table t1 (id serial, str char(32));

> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a text comparison (text = text operator).
Since the table column isn't text, a char-to-text conversion must
happen for each row.

> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a char(n) comparison (char = char operator).
No type conversion step needed, so it's faster.
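
For instance, one way to keep the fast char = char comparison while still
taking a text argument is to cast the parameter once (a sketch; f1b is a
hypothetical name, not from the thread):

create function f1b(line text) returns void as $$
begin
  -- the cast makes the comparison char(32) = char(32), as in f2,
  -- so the table column needs no per-row conversion
  perform * from t1 where str = line::char(32);
end;
$$ language plpgsql;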

regards, tom lane




Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Pavel Stehule
Hi

2017-01-02 15:34 GMT+01:00 Андрей Хозов :

> Hello there!
>
> I have an performance issue with functions and args type.
>
> Table and data:
> create table t1 (id serial, str char(32));
> insert into t1 (str) select md5(s::text) from generate_series(1, 100)
> as s;
>
> And simple functions:
> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> Query:
> test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
>                                         QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010 rows=1 loops=1)
>  Planning time: 0.039 ms
>  Execution time: 189.039 ms
> (3 rows)
>
> Time: 189,524 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
>                                         QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735 rows=1 loops=1)
>  Planning time: 0.024 ms
>  Execution time: 513.757 ms
> (3 rows)
>
> Time: 514,125 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf'::char(32));
>                                         QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509 rows=1 loops=1)
>  Planning time: 0.074 ms
>  Execution time: 513.535 ms
> (3 rows)

This explain shows nothing - you need the plan of the nested statement.

look on auto-explain
https://www.postgresql.org/docs/current/static/auto-explain.html
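
For example, a minimal session-level setup might be (parameter names as in the
auto_explain docs; thresholds illustrative):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every completed statement
SET auto_explain.log_nested_statements = on;  -- include statements run inside functions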

Maybe the index was not used due to the different types.

Regards

Pavel


> Time: 514,104 ms
> test=>
>
> Seems that casting the param from text to char(32) needs to be done only once,
> so f1 and f2 should perform identically. But the function with the text param
> (f1) is significantly slower, even when casting the arg while passing it to the function.
>
> Tested postgresql versions 9.5.5 and 9.6.1 on Ubuntu 16.04. Is this normal
> behavior, or can it be fixed?
>
> --
> Andrey Khozov
>


[PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Андрей Хозов
Hello there!

I have a performance issue with functions and argument types.

Table and data:
create table t1 (id serial, str char(32));
insert into t1 (str) select md5(s::text) from generate_series(1, 100)
as s;

And simple functions:
create function f1(line text) returns void as $$
begin
  perform * from t1 where str = line;
end;
$$ language plpgsql;

create function f2(line char) returns void as $$
begin
  perform * from t1 where str = line;
end;
$$ language plpgsql;

Query:
test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010 rows=1 loops=1)
 Planning time: 0.039 ms
 Execution time: 189.039 ms
(3 rows)

Time: 189,524 ms
test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735 rows=1 loops=1)
 Planning time: 0.024 ms
 Execution time: 513.757 ms
(3 rows)

Time: 514,125 ms
test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf'::char(32));
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509 rows=1 loops=1)
 Planning time: 0.074 ms
 Execution time: 513.535 ms
(3 rows)

Time: 514,104 ms
test=>

Seems that casting the param from text to char(32) needs to be done only once,
so f1 and f2 should perform identically. But the function with the text param
(f1) is significantly slower, even when casting the arg while passing it to
the function.

Tested postgresql versions 9.5.5 and 9.6.1 on Ubuntu 16.04. Is this normal
behavior, or can it be fixed?

--
Andrey Khozov


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-09 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
 wrote:
> Hi,
>
> i have a performance issue with bitmap index scans on huge amounts of big 
> jsonb documents.
>
>
> = Background =
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
>
> = Example =
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
>> 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
> i,
> (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) 
> from generate_series(0, 100) j) big_jsonb
> from
> generate_series(0, 10) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
>
> --  query with bitmap scan
> explain analyze
> select
> count(*)
> from
> bitmap_scan_test
> where
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 
> rows=1 loops=1)
>   ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 
> width=0) (actual time=16.496..617.431 rows=5 loops=1)
> Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": 
> "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Heap Blocks: exact=637
> ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  
> (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 
> loops=1)
>   Index Cond: (big_jsonb @> '[{"x": 1, "filler": 
> "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object Symbol
>   51,06%  postgres  [.] pglz_decompress
>7,33%  libc-2.12.so  [.] memcpy

Another thing to possibly look at is configuring the column not to
compress; over half the time is spent decompressing the data.  See:
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this is a huge tradeoff so do some careful analysis before
making the change.
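
For example, for the test table from the original post (EXTERNAL stores the
value out-of-line but uncompressed; it only affects newly stored rows, so
existing rows would need a rewrite to pick it up):

ALTER TABLE bitmap_scan_test ALTER COLUMN big_jsonb SET STORAGE EXTERNAL;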

merlin




Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-05 Thread Marc-Olaf Jaschke
Thanks for the explanation!

Best Regards,
Marc-Olaf

Marc-Olaf Jaschke · Softwareentwickler
shopping24 GmbH

Werner-Otto-Straße 1-7 · 22179 Hamburg
Telefon: +49 (0) 40 6461 5830 · Fax: +49 (0) 40 64 61 7879
marc-olaf.jasc...@s24.com · www.s24.com
AG Hamburg HRB 63371
vertreten durch Dr. Björn Schäfers und Martin Mildner

2016-12-05 3:28 GMT+01:00 Jeff Janes :

>
> > big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
>
>> I wonder why bitmap heap scan adds such a big amount of time on top of
>> the plain bitmap index scan.
>> It seems to me, that the recheck is active although all blocks are exact
>> [1] and that pg is loading the jsonb for the recheck.
>>
>> Is this an expected behavior?
>>
>
>
> Yes, this is expected.  The gin index is lossy.  It knows that all the
> elements are present (except when it doesn't--large elements might get
> hashed down and suffer hash collisions), but it doesn't know what the
> recursive structure between them is, and has to do a recheck.
>
> For example, if you change your example where clause to:
>
> big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> You will see that the index still returns 50,000 rows, but now all of them
> get rejected upon the recheck.
>
> You could try changing the type of index to jsonb_path_ops.  In your given
> example, it won't make a difference, because you are actually counting half
> the table and so half the table needs to be rechecked.  But in my example,
> jsonb_path_ops successfully rejects all the rows at the index stage.
>
> Cheers,
>
> Jeff
>


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-04 Thread Jeff Janes
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';


> I wonder why bitmap heap scan adds such a big amount of time on top of the
> plain bitmap index scan.
> It seems to me, that the recheck is active although all blocks are exact
> [1] and that pg is loading the jsonb for the recheck.
>
> Is this an expected behavior?
>


Yes, this is expected.  The gin index is lossy.  It knows that all the
elements are present (except when it doesn't--large elements might get
hashed down and suffer hash collisions), but it doesn't know what the
recursive structure between them is, and has to do a recheck.

For example, if you change your example where clause to:

big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';

You will see that the index still returns 50,000 rows, but now all of them
get rejected upon the recheck.

You could try changing the type of index to jsonb_path_ops.  In your given
example, it won't make a difference, because you are actually counting half
the table and so half the table needs to be rechecked.  But in my example,
jsonb_path_ops successfully rejects all the rows at the index stage.
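
For example, using the test table from the original post:

CREATE INDEX ON bitmap_scan_test USING gin (big_jsonb jsonb_path_ops);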

Cheers,

Jeff


[PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-11-30 Thread Marc-Olaf Jaschke
Hi,

i have a performance issue with bitmap index scans on huge amounts of big jsonb 
documents.


= Background = 

- table with big jsonb documents
- gin index on these documents
- queries using index conditions with low selectivity


= Example = 

select version(); 
> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 
> (Red Hat 4.4.7-17), 64-bit

show work_mem;
> 1GB

-- setup test data
create table bitmap_scan_test as
select
i,
(select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from 
generate_series(0, 100) j) big_jsonb
from
generate_series(0, 10) i;

create index on bitmap_scan_test using gin (big_jsonb);

analyze bitmap_scan_test;


--  query with bitmap scan
explain analyze
select
count(*)
from
bitmap_scan_test
where
big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';

Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
  ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=5 loops=1)
        Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
        Heap Blocks: exact=637
        ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 loops=1)
              Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
Planning time: 0.106 ms
Execution time: 622.334 ms


perf top -p... shows heavy usage of pglz_decompress:

Overhead  Shared Object Symbol
  51,06%  postgres  [.] pglz_decompress
   7,33%  libc-2.12.so  [.] memcpy
...

= End of example =  


I wonder why bitmap heap scan adds such a big amount of time on top of the 
plain bitmap index scan. 
It seems to me, that the recheck is active although all blocks are exact [1] 
and that pg is loading the jsonb for the recheck.

Is this an expected behavior?


Regards,
Marc-Olaf


[1] 
(http://dba.stackexchange.com/questions/106264/recheck-cond-line-in-query-plans-with-a-bitmap-index-scan)



[PERFORM] performance issue with inherited foreign table

2016-01-27 Thread Dzmitry Nikitsin
In postgres 9.5.0 I have a partitioned table that collects data by month. I
tried to use the new postgres feature of foreign table inheritance and pushed
one month of data to another postgres server, so I got a foreign table. When I
run my query from my primary server, the query takes 7x more time to execute
than on the other server where I have the foreign table. I am not passing a
lot of data over the network; my query looks like:

explain analyze SELECT source, global_action, paid, organic, device, count(*) 
as count, sum(price) as sum FROM "toys" WHERE "toys"."container_id" = 857 AND 
(toys.created_at >= '2015-12-02 05:00:00.00') AND (toys.created_at <= 
'2015-12-30 04:59:59.99') AND ("toys"."source" IS NOT NULL) GROUP BY 
"toys"."source", "toys"."global_action", "toys"."paid", "toys"."organic", 
"toys"."device";

HashAggregate  (cost=1143634.94..1143649.10 rows=1133 width=15) (actual time=1556.894..1557.017 rows=372 loops=1)
  Group Key: toys.source, toys.global_action, toys.paid, toys.organic, toys.device
  ->  Append  (cost=0.00..1143585.38 rows=2832 width=15) (actual time=113.420..1507.373 rows=76593 loops=1)
        ->  Seq Scan on toys  (cost=0.00..0.00 rows=1 width=242) (actual time=0.001..0.001 rows=0 loops=1)
              Filter: ((source IS NOT NULL) AND (created_at >= '2015-12-02 05:00:00'::timestamp without time zone) AND (created_at <= '2015-12-30 04:59:59.99'::timestamp without time zone) AND (container_id = 857))
        ->  Foreign Scan on job_stats_201512_new  (cost=100.00..1143585.38 rows=2831 width=15) (actual time=113.419..1488.445 rows=76593 loops=1)
Planning time: 2.990 ms
Execution time: 1560.131 ms

Does postgres use indexes on foreign tables? (I have indexes defined on the
foreign table.) If I run the query directly on that server, it takes ~200 ms.





Re: [PERFORM] Performance issue with NestedLoop query

2015-08-05 Thread Qingqing Zhou
On Tue, Aug 4, 2015 at 8:40 PM, Ram N yrami...@gmail.com wrote:

 Thanks much for responding guys. I have tried both, building multi column
 indexes and GIST, with no improvement. I have reduced the window from 180
 days to 30 days and below are the numbers

 Composite index -  takes 30 secs

 With Btree indexing  - takes 9 secs

 With GIST - takes 30 secs with kind of materialize plan in explain

 Any other ideas I can do for window based joins.


From this query:

select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts >
b.start_date and a.ts < b.end_date and a.ts > '2015-01-01
20:50:44.00 +00:00:00' and a.ts < '2015-07-01 19:50:44.00
+00:00:00' group by a.ts, st order by a.ts

We can actually derive that b.start_date < '2015-07-01 19:50:44.00
+00:00:00' and b.end_date > '2015-01-01 20:50:44.00 +00:00:00'. If
we add these two predicates to the original query, does it help?
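
Concretely, the rewritten query would look like this (a sketch; comparison
operators reconstructed from the plan upthread):

select sum(a), count(id), a.ts, st
from table1 a, table2 b
where a.ts > b.start_date
  and a.ts < b.end_date
  and a.ts > '2015-01-01 20:50:44.00 +00:00:00'
  and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
  and b.start_date < '2015-07-01 19:50:44.00 +00:00:00'
  and b.end_date > '2015-01-01 20:50:44.00 +00:00:00'
group by a.ts, st
order by a.ts;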

Thanks,
Qingqing




Re: [PERFORM] Performance issue with NestedLoop query

2015-08-04 Thread Ram N
Thanks much for responding guys. I have tried both, building multi column
indexes and GIST, with no improvement. I have reduced the window from 180
days to 30 days and below are the numbers

Composite index -  takes 30 secs

With Btree indexing  - takes 9 secs

With GIST - takes 30 secs with kind of materialize plan in explain

Any other ideas I can do for window based joins.

--yr


On Fri, Jul 31, 2015 at 11:37 AM, Qingqing Zhou zhouqq.postg...@gmail.com
wrote:

 On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:
 
  Thanks Qingqing for responding. That didn't help. It in fact increased
 the
  scan time. Looks like a lot of time is being spent on the NestedLoop Join
  than index lookups though I am not sure how to optimize the join.
 

 Good news is that the optimizer is right this time :-). The NLJ here does
 almost nothing but schedule each outer row for a probe of the inner index.
 So the index seek is the major cost.

 Have you tried build a two column index on (b.start_date, b.end_date)?

 Regards,
 Qingqing



Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Matheus de Oliveira
On Thu, Jul 30, 2015 at 4:51 AM, Ram N yrami...@gmail.com wrote:

 select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts >
 b.start_date and a.ts < b.end_date and a.ts > '2015-01-01 20:50:44.00
 +00:00:00' and a.ts < '2015-07-01 19:50:44.00 +00:00:00' group by a.ts,
 st order by a.ts


You could try to use a range type:

CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()'));

Then:

select sum(a), count(id), a.ts, st
from table1 a, table2 b
where tstzrange(b.start_date, b.end_date, '()') @> a.ts
and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
group by a.ts, st
order by a.ts

Regards,
-- 
Matheus de Oliveira


Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Qingqing Zhou
On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:

 Thanks Qingqing for responding. That didn't help. It in fact increased the
 scan time. Looks like a lot of time is being spent on the NestedLoop Join
 than index lookups though I am not sure how to optimize the join.


Good news is that the optimizer is right this time :-). The NLJ here does
almost nothing but schedule each outer row to probe the inner index.
So the index seek is the major cost.

Have you tried building a two-column index on (b.start_date, b.end_date)?
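
For example (assuming the table and column names from the query above):

CREATE INDEX table2_start_end_idx ON table2 (start_date, end_date);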

Regards,
Qingqing




Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Matheus de Oliveira
On Fri, Jul 31, 2015 at 3:06 PM, Matheus de Oliveira 
matioli.math...@gmail.com wrote:

 CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date,
 '()'));


The index should be USING GIST, not GIN. Sorry.
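
Putting the two messages together, the corrected suggestion reads:

CREATE INDEX ON table2 USING gist (tstzrange(start_date, end_date, '()'));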


-- 
Matheus de Oliveira


Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Ram N
Thanks Qingqing for responding. That didn't help. It in fact increased the
scan time. It looks like more time is being spent on the NestedLoop Join
than on index lookups, though I am not sure how to optimize the join. I am
assuming it's an in-memory join, so I am not sure why it should take so
long. Increasing work_mem has helped in reducing the processing time, but
it's still > 1 min.

--yr

On Thu, Jul 30, 2015 at 1:24 PM, Qingqing Zhou zhouqq.postg...@gmail.com
wrote:

 On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
->  Index Scan using end_date_idx on public.table2 b
  (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274
  rows=403936 loops=181)
  Output: b.serial_no, b.name, b.st, b.end_date, b.a,
  b.start_date
  Index Cond: (a.ts < b.end_date)
  Filter: (a.ts > b.start_date)
  Rows Removed by Filter: 392642

 In your case, do you have index built for both b.end_date and
 b.start_date? If so, can you try

 set enable_indexscan=off

 to see if bitmap heap scan helps?

 Regards,
 Qingqing



Re: [PERFORM] Performance issue with NestedLoop query

2015-07-30 Thread Qingqing Zhou
On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
   ->  Index Scan using end_date_idx on public.table2 b
 (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274
 rows=403936 loops=181)
 Output: b.serial_no, b.name, b.st, b.end_date, b.a,
 b.start_date
 Index Cond: (a.ts < b.end_date)
 Filter: (a.ts > b.start_date)
 Rows Removed by Filter: 392642

In your case, do you have index built for both b.end_date and
b.start_date? If so, can you try

set enable_indexscan=off

to see if bitmap heap scan helps?

Regards,
Qingqing




[PERFORM] Performance issue with NestedLoop query

2015-07-30 Thread Ram N
Hi,

I am trying to see if I can do anything to optimize the following plan.

I have two tables and I am doing a join between them. After joining it
calculates aggregates (Sum and Count)
Table 1 : timestamp (one per day) for 2 years (730 records)
Table 2 : Window based validity records. Window here means start and end
timestamp indicating a period of validity for a record.
Has some 10 odd columns including start_time and end_time.  (1 million
records)

Machine has 244 GB RAM. Queries are taking more than a min and in some cases
2-3 mins.

Below is the plan I am getting. The Nested Loop blows up the number of
records and we expect that. I have tried playing around with work_mem and
cache configs, which hasn't helped.

Query
select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts >
b.start_date and a.ts < b.end_date and a.ts > '2015-01-01 20:50:44.00
+00:00:00' and a.ts < '2015-07-01 19:50:44.00 +00:00:00' group by a.ts,
st order by a.ts

Plan (EXPLAIN ANALYZE)
Sort  (cost=10005447874.54..10005447879.07 rows=1810 width=44) (actual time=178883.936..178884.159 rows=1355 loops=1)
  Output: (sum(b.a)), (count(b.id)), a.ts, b.st
  Sort Key: a.ts
  Sort Method: quicksort  Memory: 154kB
  Buffers: shared hit=47068722 read=102781
  I/O Timings: read=579.946
  ->  HashAggregate  (cost=10005447758.51..10005447776.61 rows=1810 width=44) (actual time=178882.874..178883.320 rows=1355 loops=1)
        Output: sum(b.a), count(b.id), a.ts, b.st
        Group Key: a.ts, b.st
        Buffers: shared hit=47068719 read=102781
        I/O Timings: read=579.946
        ->  Nested Loop  (cost=100.43..10004821800.38 rows=62595813 width=44) (actual time=0.167..139484.854 rows=73112419 loops=1)
              Output: a.ts, b.st, b.a, b.id
              Buffers: shared hit=47068719 read=102781
              I/O Timings: read=579.946
              ->  Seq Scan on public.table1 a  (cost=0.00..14.81 rows=181 width=8) (actual time=0.058..0.563 rows=181 loops=1)
                    Output: a.ts
                    Filter: ((a.ts > '2015-01-01 20:50:44+00'::timestamp with time zone) AND (a.ts < '2015-07-01 19:50:44+00'::timestamp with time zone))
                    Rows Removed by Filter: 540
                    Buffers: shared read=4
                    I/O Timings: read=0.061
              ->  Index Scan using end_date_idx on public.table2 b  (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274 rows=403936 loops=181)
                    Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date
                    Index Cond: (a.ts < b.end_date)
                    Filter: (a.ts > b.start_date)
                    Rows Removed by Filter: 392642
                    Buffers: shared hit=47068719 read=102777
                    I/O Timings: read=579.885
Planning time: 0.198 ms
Execution time: 178884.467 ms

Any pointers on how to go about optimizing this?

--yr


Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-09-03 Thread gmb
Thanks for the feedback, everybody.
I spent a couple of days trying to optimise this.
As mentioned, the increased memory is not an option for me, as this query
is part of a report that can be run by any user on an ad hoc basis.
Allocating the required memory to any session on demand is not feasible in
this environment.

In the end, it seems to me that a more sustainable solution will be to
introduce an additional table to carry the summarized values, and to look up
from that table in this type of scenario.

Regards





[PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread gmb
Hi all

I have the following table with 10+ million records:

create table ddetail (
ddet_id serial,
co_id integer,
client_id integer,
doc_no varchar,
line_id integer,
batch_no integer,
amount NUMERIC , 
...,
PRIMARY KEY (co_id, client_id, doc_no, line_id, ddet_id)
) ;

When doing the following query on this table, performance is really slow:

SELECT  co_id ,  client_id , doc_no ,  line_id , sum( amount  )
FROM ddetail 
GROUP BY co_id ,  client_id , doc_no ,  line_id 

It seems as if the planner is not using the PRIMARY KEY as index which was
my assumption.

Can somebody please confirm whether aggregate functions such as GROUP BY
should use indexes ? 


Thanks in advance

gmb 





Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread Victor Yegorov
2014-08-28 11:50 GMT+03:00 gmb gmbou...@gmail.com:

 It seems as if the planner is not using the PRIMARY KEY as index which was
 my assumption.


Can you send `EXPLAIN (analyze, buffers)` for your query instead?
It'll show exactly what's going on.


-- 
Victor Y. Yegorov


Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread gmb


 Can you send `EXPLAIN (analyze, buffers)` for your query instead?
 It'll show exactly what's going on.

GroupAggregate  (cost=303425.31..339014.43 rows=136882 width=48) (actual time=4708.181..6688.699 rows=287268 loops=1)
  Buffers: shared read=23899, temp read=30974 written=30974
  ->  Sort  (cost=303425.31..306847.34 rows=1368812 width=48) (actual time=4708.170..5319.429 rows=1368744 loops=1)
        Sort Key: co_id, client_id, doc_no, 
        Sort Method: external merge  Disk: 80304kB
        Buffers: shared read=23899, temp read=30974 written=30974
        ->  Seq Scan on ddetail  (cost=0.00..37587.12 rows=1368812 width=48) (actual time=0.122..492.964 rows=1368744 loops=1)
              Buffers: shared read=23899
Total runtime: 6708.244 ms


My initial attempt was this  (this is what I actually need):

SELECT  co_id ,  client_id , doc_no ,  line_id , batch_no , sum( amount  )
FROM ddetail
GROUP BY co_id ,  client_id , doc_no ,  line_id  , batch_no ;

but I removed column batch_no from the query because I thought this was the
cause of the problem ( batch_no is not part of my PK ).


Thanks 





Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread Victor Yegorov
2014-08-28 12:08 GMT+03:00 gmb gmbou...@gmail.com:

 GroupAggregate  (cost=303425.31..339014.43 rows=136882 width=48) (actual time=4708.181..6688.699 rows=287268 loops=1)
   Buffers: shared read=23899, temp read=30974 written=30974
   ->  Sort  (cost=303425.31..306847.34 rows=1368812 width=48) (actual time=4708.170..5319.429 rows=1368744 loops=1)
         Sort Key: co_id, client_id, doc_no,
         Sort Method: external merge  Disk: 80304kB
         Buffers: shared read=23899, temp read=30974 written=30974
         ->  Seq Scan on ddetail  (cost=0.00..37587.12 rows=1368812 width=48) (actual time=0.122..492.964 rows=1368744 loops=1)
               Buffers: shared read=23899
 Total runtime: 6708.244 ms


 My initial attempt was this  (this is what I actually need):

 SELECT  co_id ,  client_id , doc_no ,  line_id , batch_no , sum( amount  )
 FROM ddetail
 GROUP BY co_id ,  client_id , doc_no ,  line_id  , batch_no ;


I think an index will be of no help here, as (1) you're reading the whole
table anyway and (2) `amount` is not part of your index.

Try to avoid the disk-based sort by increasing `work_mem` for your session;
I think a value in the range 120MB-150MB should work:

SET work_mem TO '150MB';

Check `EXPLAIN` output after the change.

-- 
Victor Y. Yegorov


Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread Marti Raudsepp
On Thu, Aug 28, 2014 at 11:50 AM, gmb gmbou...@gmail.com wrote:
 Can somebody please confirm whether aggregate functions such as GROUP BY
 should use indexes ?

Yes, if the planner deems it faster than other approaches. It can make
wrong choices for many reasons, but usually when your planner tunables
like random_page_cost, effective_cache_size aren't set appropriately.

There's some advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Just for the purpose of testing, you could try set enable_sort=false
in your session and see if that makes it faster.
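
For example, a throwaway session-level experiment might look like this (a
sketch only; the work_mem value is illustrative, and enable_sort should not
be left off outside of testing):

SET enable_sort = false;
SET work_mem = '150MB';
EXPLAIN ANALYZE
SELECT co_id, client_id, doc_no, line_id, sum(amount)
FROM ddetail
GROUP BY co_id, client_id, doc_no, line_id;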

On Thu, Aug 28, 2014 at 12:08 PM, gmb gmbou...@gmail.com wrote:
 Sort Key: co_id, client_id, doc_no,

Something went missing from this line...

 Sort Method: external merge  Disk: 80304kB

Depends on your hardware and workloads, but more work_mem may also
improve queries to avoid sorts and hashes needing to use disk. But
beware, setting it too high may result in your server running out of
memory.

Regards,
Marti




Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread gmb

Thanks for these suggestions

Unfortunately, I don't have a lot of memory available (65 connections,
work_mem = 64MB in pg conf).

 I think index will be of no help here, as (1) you're reading whole table
 anyway and (2) `amount` is not part of your index.

I did not think that the field being used in the aggregate function should
also be part of the index.
I'll try this and check the result.

My problem is that dropping / adding indexes on this table takes a LOT of
time, so I'm stuck with doing the tests using the indexes as is, or doing
the tests on a smaller dataset.

On the smaller dataset (1.5 mill records on that table) the planner did
not take the index into account, even when I omitted the amount column:


CREATE INDEX ix_1
  ON ddetail
  USING btree
  (co_id ,  client_id , doc_no ,  line_id , batch_no);

SELECT  co_id ,  client_id , doc_no ,  line_id , batch_no 
FROM ddetail
GROUP BY co_id ,  client_id , doc_no ,  line_id  , batch_no ;

HashAggregate  (cost=54695.74..56064.49 rows=136875 width=22)
  ->  Seq Scan on debfdetail  (cost=0.00..37586.44 rows=1368744 width=22)

It still does a seq scan instead of an index scan.
I guess it is possible that with 1.4 million records it is faster to do a
seq scan?
So I guess I'll have to try this on the 10 mill table and check the
result there.










Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread Victor Yegorov
2014-08-28 14:29 GMT+03:00 gmb gmbou...@gmail.com:

 Unfortunately , I don't have a lot of memory available ( 65 connections ,
 work_mem = 64MB in pg conf ).


You don't have to change cluster-wide settings here.

You can issue a `SET` command from your client right before running your
query; only your session will be affected.
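
For example:

SET work_mem TO '150MB';  -- affects only this session
-- run the problem query here, then optionally:
RESET work_mem;           -- back to the server default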


-- 
Victor Y. Yegorov


Re: [PERFORM] Performance issue: index not used on GROUP BY...

2014-08-28 Thread Josh Berkus
On 08/28/2014 01:50 AM, gmb wrote:
 Can somebody please confirm whether aggregate functions such as GROUP BY
 should use indexes ? 

Sometimes.  In your case, the index has one more column than the GROUP
BY, which makes it less likely that Postgres will use it (since
depending on the cardinality of ddet_id, it might actually be slower to
use the index).
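
For comparison, an index matching the GROUP BY columns exactly would be a
more likely candidate (a hypothetical sketch; whether it actually wins still
depends on the planner's cost estimates):

CREATE INDEX ddetail_group_idx ON ddetail (co_id, client_id, doc_no, line_id);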

In addition, other folks on this thread have already pointed out the
memory settings issues to you.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[PERFORM] Performance issue with Insert

2011-06-27 Thread Jenish
Hi All,

I am facing a performance issue with inserts into a table.

I am using postgres 8.4.x

The table has 3 before-insert triggers and one after-insert trigger.

With all triggers enabled it is inserting only 4-5 records per second.

But if I disable the after-insert trigger it is able to insert 667 records per
second.

The after-insert trigger is a recursive trigger.


My questions:

How do I avoid the bottleneck?

Is parallel processing possible in Postgres? How?

Please give your suggestions.

-- 
Thanks & regards,
JENISH VYAS


Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Kevin Grittner
Jenish jenishv...@gmail.com wrote:
 
 I am using postgres 8.4.x
 
With x being what?  On what OS and hardware?
 
 Table is having 3 before insert trigger and one after insert
 trigger.
 
 With all triggers enable it is inserting only 4-5 record per
 second.
 
 But if I disable after insert trigger it is able to insert 667
 records per second.
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
 After insert trigger is recursive trigger.
 
So are you counting only the top level inserts or also the ones
generated by the recursive inserts?
 
 My question.
 
 How to avoid the bottleneck?
 
First you need to find out what the bottleneck is.
 
 Parallel processing is possible in Postgres? How?
 
To achieve parallel processing in PostgreSQL you need to use
multiple connections.
 
-Kevin



Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Jenish
Hi,

DB : POSTGRES 8.4.8
OS  : Debian
HD : SAS 10k rpm

Shared_buffers is 4096 (25% of RAM), effective_cache_size is 8GB (75% of RAM)

The after-insert trigger in turn calls 2 more triggers and inserts records
into another table depending on a condition.

With all triggers enabled there are 8 inserts and 32 updates (approx.; the
number of updates depends on the hierarchy).

Please explain multiple connections. In the current scenario the application
server is sending all requests.

-- 
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 5:37 PM, Kevin Grittner kevin.gritt...@wicourts.gov
 wrote:

 Jenish jenishv...@gmail.com wrote:

  I am using postgres 8.4.x

 With x being what?  On what OS and hardware?

  Table is having 3 before insert trigger and one after insert
  trigger.
 
  With all triggers enable it is inserting only 4-5 record per
  second.
 
  But if I disable after insert trigger it is able to insert 667
  records per second.

 http://wiki.postgresql.org/wiki/SlowQueryQuestions

  After insert trigger is recursive trigger.

 So are you counting only the top level inserts or also the ones
 generated by the recursive inserts?

  My question.
 
  How to avoid the bottleneck?

 First you need to find out what the bottleneck is.

  Parallel processing is possible in Postgres? How?

 To achieve parallel processing in PostgreSQL you need to use
 multiple connections.

 -Kevin



Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread tv
 Hi,

 DB : POSTGRES 8.4.8
 OS  : Debian
 HD : SAS 10k rpm

 Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM

 After insert trigger is again calling 2 more trigger and insert record in
 another table depends on condition.

 with all trigger enable there are 8 insert and 32 updates(approx. update
 is
 depends on hierarchy)

Hi,

it's very difficult to give you reliable recommendations with this little
info, but the triggers are obviously the bottleneck. We have no idea what
queries are executed in them, but I guess there are some slow queries.

Find out what queries are executed in the triggers, benchmark each of them
and make them faster. Just don't forget that those SQL queries are
executed as prepared statements, so they may behave a bit differently than
plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.
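
For example (with a hypothetical statement standing in for one of the
queries inside the trigger):

PREPARE trig_stmt (integer) AS
  SELECT count(*) FROM some_table WHERE parent_id = $1;
EXPLAIN EXECUTE trig_stmt(42);
DEALLOCATE trig_stmt;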

 Plz explain multiple connections. Current scenario application server is
 sending all requests.

PostgreSQL does not support parallel queries (i.e. a query distributed on
multiple CPUs) so each query may use just a single CPU. If you're CPU
bound (one CPU is 100% utilized but the other CPUs are idle), you can
usually parallelize the workload on your own - just use multiple
connections.

But if you're using an application server and there are multiple
connections used, this is not going to help you. How many connections are
active at the same time? Are the CPUs idle or utilized?

Tomas




Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Jenish
Hi,

I have already checked all the statements present in the trigger; none of
them takes more than 20 ms.

I am using an 8-processor, quad-core server; CPU utilization is more than
90-95% for all cores (htop result).

DB has 960 concurrent users.

io : writing 3-4 MB per second or less (iotop result).

Scenario: all inserts are waiting for the previous insert to complete. Can't
we avoid this situation?
What is the maximum max_connections PostgreSQL supports?

Please help.


-- 
Thanks & regards,
JENISH VYAS






On Mon, Jun 27, 2011 at 6:32 PM, t...@fuzzy.cz wrote:

  Hi,
 
  DB : POSTGRES 8.4.8
  OS  : Debian
  HD : SAS 10k rpm
 
  Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
 
  After insert trigger is again calling 2 more trigger and insert record in
  another table depends on condition.
 
  with all trigger enable there are 8 insert and 32 updates(approx. update
  is
  depends on hierarchy)

 Hi,

 it's very difficult to give you reliable recommendations with this little
 info, but the triggers are obviously the bottleneck. We have no idea what
 queries are executed in them, but I guess there are some slow queries.

 Find out what queries are executed in the triggers, benchmark each of them
 and make them faster. Just don't forget that those SQL queries are
 executed as prepared statements, so they may behave a bit differently than
 plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.

  Plz explain multiple connections. Current scenario application server is
  sending all requests.

 PostgreSQL does not support parallel queries (i.e. a query distributed on
 multiple CPUs) so each query may use just a single CPU. If you're CPU
 bound (one CPU is 100% utilized but the other CPUs are idle), you can
 usually parallelize the workload on your own - just use multiple
 connections.

 But if you're using an application server and there are multiple
 connections used, this is not going to help you. How many connections are
 active at the same time? Are the CPUs idle or utilized?

 Tomas




Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Merlin Moncure
On Mon, Jun 27, 2011 at 9:22 AM, Jenish jenishv...@gmail.com wrote:
 Hi All,

 I am facing some performance issue with insert into some table.

 I am using postgres 8.4.x

 Table is having 3 before insert trigger and one after insert trigger.

 With all triggers enable it is inserting only 4-5 record per second.

 But if I disable after insert trigger it is able to insert 667 records per
 second.

 After insert trigger is recursive trigger.

 My question.

 How to avoid the bottleneck?

 Parallel processing is possible in Postgres? How?

 Please give you suggestion.

this sounds like a coding issue -- to get to the bottom of this we are
going to need to see the table and the triggers.

merlin



Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Tomas Vondra
On 27.6.2011 17:58, Jenish wrote:
 
 Hi,
 
 I have already checked all the statements present in the trigger, no one
 is taking more then 20 ms.
 
 I am using 8-Processor, Quad-Core Server ,CPU utilization is more then
 90-95 % for all. (htop result)

So all cores are 95% utilized? That means you're CPU bound and you need
to fix that somehow.

How much of that belongs to postgres? Are there other processes
consuming significant portion of CPU? And what do you mean by
'utilized'? Does that mean user/sys time, or wait time?

 DB has 960 concurrent users. 

What does that mean? Does that mean there are 960 active connections?

 io : writing 3-4 MB per second or less (iotop result).

Sequential or random? Post a few lines of 'iostat -x 1' and a few lines
of 'vmstat 1' (collected when the database is busy).

 Scenario :  All insert are waiting for previous insert to complete. Cant
 we avoid this situation ?

What do you mean by 'previous'? Does that mean another insert in the
same session (connection), or something performed in another session?

 What is the max_connections postgresql support? 

That limits the number of backend processes - each connection is served
by a dedicated postgres process. You can see that in top / ps output.

High values usually mean you need some kind of pooling (you probably
already have one, as you're using an application server). And if the
connections are really active (doing something all the time), this
should not be significantly higher than the number of cores.

See, you have 8 cores, which means 8 seconds of CPU time each second. No
matter how many connections you allow, you still have just those 8
seconds. So if you need to perform 100x something that takes 1 second,
you need to spend 100 seconds of CPU time. So with those 8 cores, you
can do that in about 12.5 seconds.

Actually if you create too many connections, you'll notice it takes much
longer - there's overhead from process management, context switching,
locking, etc.

regards
Tomas



Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Tomas Vondra
On 27.6.2011 17:01, Jenish wrote:
 Hi,
 
 DB : POSTGRES 8.4.8
 OS  : Debian
 HD : SAS 10k rpm
 
 Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM

How much data are we talking about? Does that fit into the shared
buffers or is it significantly larger? Do the triggers touch the whole
database or just a small part of it (active part)?

regards
Tomas



Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Jenish
Hi ,

This server is the dedicated database server.

 And I am testing the limit for concurrent active users. When I am running
my test with 400 concurrent users, i.e. active connections, I am getting
good performance, but when I am running the same test with 950 concurrent
users I am getting very bad performance.



 Scenario :  All insert are waiting for previous insert to complete.

I don’t know whether it is the same session or a different session.



The DB is huge but the triggers are not touching the whole database.

I’ll provide the results of vmstat and iostat tomorrow.


-- 
Thanks & regards,
JENISH VYAS



On Mon, Jun 27, 2011 at 10:48 PM, Tomas Vondra t...@fuzzy.cz wrote:

 Dne 27.6.2011 17:01, Jenish napsal(a):
  Hi,
 
  DB : POSTGRES 8.4.8
  OS  : Debian
  HD : SAS 10k rpm
 
  Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM

 How much data are we talking about? Does that fit into the shared
 buffers or is it significantly larger? Do the triggers touch the whole
 database or just a small part of it (active part)?

 regards
 Tomas




Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Tomas Vondra
On 27.6.2011 22:14, Jenish wrote:
  And I am testing the limit for the concurrent active users. When I am
 running my test for 400 concurrent user ie. Active connection. I am
 getting good performance but when I am running the same the same test
 for 950 concurrent users I am getting very bad performance.

This is typical behaviour - the performance is good up until some point,
then it degrades much faster.

Why do you even need such a number of connections? Does that really
improve performance (e.g. how many inserts do you do with 100 and 400
connections)?

Such a number of active connections is not going to give you any advantage,
I guess ...

regards
Tomas



Re: [PERFORM] Performance issue with Insert

2011-06-27 Thread Kevin Grittner
Jenish jenishv...@gmail.com wrote:
 
 This server is the dedicated database server.
 
 And I am testing the limit for the concurrent active users. When I
 am running my test for 400 concurrent user ie. Active connection.
 I am getting good performance but when I am running the same the
 same test for 950 concurrent users I am getting very bad
 performance.
 
To serve a large number of concurrent users you need to use a
connection pooler which limits the number of database connections to
a small number.  Typically the most effective number of database
connections is somewhere between the number of actual cores on your
server and twice that plus the number of disk drives.  (It depends
on the details of your hardware and your load.)  The connection
pooler should queue requests which arrive when all database
connections are busy and release them for execution as transactions
complete.  Restricting the active database connections in this way
improves both throughput and latency and will allow you to serve a
much larger number of users without getting into bad performance;
and when you do hit the wall performance will degrade more
gracefully.
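
For illustration (hypothetical numbers): on a server with 8 cores and 4
drives, that rule of thumb suggests a pool of somewhere between 8 and
2*8+4 = 20 database connections; the hundreds of remaining clients wait
briefly in the pooler's queue instead of thrashing the server.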
 
-Kevin



Re: [PERFORM] performance issue in the fields.

2011-02-24 Thread Pierre C



I have created two tables. In the first table I am using many fields to
store 3 addresses; in table b, I am using an array data type to store the 3
addresses. Is there any performance-related issue I would face, and which
one will cause the performance issue?


The array is interesting:
- if you put a gist index on it and do searches like "array contains
values X and Y and Z", the gist index has some special optimizations for this
- if you might store a variable number of integers, and for some reason
you don't want a normalized one-line-per-value approach
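
A minimal sketch of the first case (assuming integer arrays and the contrib
intarray module, which supplies the gist__int_ops operator class; on 9.1+
it can be installed with CREATE EXTENSION):

CREATE EXTENSION intarray;  -- gist__int_ops comes from contrib/intarray
CREATE INDEX b_address_gist ON b USING gist (address gist__int_ops);

-- "array contains values X and Y and Z" becomes:
SELECT * FROM b WHERE address @> ARRAY[1, 2, 3];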





Re: [PERFORM] performance issue in the fields.

2011-02-23 Thread Merlin Moncure
On Mon, Feb 14, 2011 at 5:36 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hello

 2011/2/14 dba dba...@gmail.com:


 create table a( address1 int,address2 int,address3 int)
 create table b(address int[3])

 I have created two tables. In the first table i am using many fields to
 store 3 address.
 as well as in b table, i am using array data type to store 3 address.  is
 there any issue would face in performance related things which one will
 cause the performance issue.

 yes, there is. Planner can not to work well with foreign keys stored in array.

Also, the array variant is going to be bigger on disk. This is because,
with separate fields, all the important info about the fields is stored in
the table's metadata (inside the system catalogs). But with the array,
various header information specific to the array has to be stored with
each row. This is largely due to some questionable design decisions made
in the early array implementation that we are stuck with :-).

merlin



[PERFORM] performance issue in the fields.

2011-02-14 Thread dba


create table a (address1 int, address2 int, address3 int);
create table b (address int[3]);

I have created two tables. In the first table I am using many fields to
store 3 addresses; in table b, I am using an array data type to store the 3
addresses. Is there any performance-related issue I would face, and which
one will cause the performance issue?


Re: [PERFORM] performance issue in the fields.

2011-02-14 Thread Pavel Stehule
Hello

2011/2/14 dba dba...@gmail.com:


 create table a( address1 int,address2 int,address3 int)
 create table b(address int[3])

 I have created two tables. In the first table i am using many fields to
 store 3 address.
 as well as in b table, i am using array data type to store 3 address.  is
 there any issue would face in performance related things which one will
 cause the performance issue.

Yes, there is. The planner cannot work well with foreign keys stored in an array.

Regards

Pavel Stehule





[PERFORM] Performance issue - 2 linux machines, identical configs, different performance

2009-06-17 Thread Mark Steben
Hi, sorry about the blank post yesterday - let's try again

 

We have two machines, both running Linux Redhat, both running postgres
8.2.5. Both have nearly identical 125 GB databases. In fact we use PITR
Recovery to replicate from one to the other. The machine we replicate to
runs a query with about 10 inner and left joins about 5 times slower than
the original machine.

I ran an explain on both. Machine1 (original) planner favors hash joins
about 3 to 1 over nested loop joins. Machine2 (replicated) uses only nested
loop joins - no hash at all.

A few details - I can always provide more:

 

MACHINE1 - original:

  TOTAL RAW MEMORY - 30 GB
  TOTAL SHARED MEMORY (shmmax value) - 4 GB

  Database configs:
    SHARED_BUFFERS - 1525 MB
    MAX_PREPARED_TRANSACTIONS - 5
    WORK_MEM - 300 MB
    MAINTENANCE_WORK_MEM - 512 MB
    MAX_FSM_PAGES - 3,000,000
    CHECKPOINT_SEGMENTS - 64
    WAL_BUFFERS - 768
    EFFECTIVE_CACHE_SIZE - 2 GB
    Planner method configs all turned on by default, including enable_hashjoin

MACHINE2 - we run 2 postgres instances. Port 5433 runs continuous PITR
recoveries; port 5432 receives the 'latest and greatest' database when port
5433 finishes a recovery.

  TOTAL RAW MEMORY - 16 GB (this is a VMWARE setup on a netapp)
  TOTAL SHARED MEMORY (shmmax value) - 4 GB

  Database configs - port 5432 instance:
    SHARED_BUFFERS - 1500 MB
    MAX_PREPARED_TRANSACTIONS - 1 (we don't run prepared transactions here)
    WORK_MEM - 300 MB
    MAINTENANCE_WORK_MEM - 100 MB (don't think this comes into play in this conversation)
    MAX_FSM_PAGES - 1,000,000
    CHECKPOINT_SEGMENTS - 32
    WAL_BUFFERS - 768
    EFFECTIVE_CACHE_SIZE - 2 GB
    Planner method configs all turned on by default, including enable_hashjoin

  Database configs - port 5433 instance:
    SHARED_BUFFERS - 1500 MB
    MAX_PREPARED_TRANSACTIONS - 1 (we don't run prepared transactions here)
    WORK_MEM - 250 MB
    MAINTENANCE_WORK_MEM - 100 MB (don't think this comes into play in this conversation)
    MAX_FSM_PAGES - 1,000,000
    CHECKPOINT_SEGMENTS - 32
    WAL_BUFFERS - 768
    EFFECTIVE_CACHE_SIZE - 2 GB
    Planner method configs all turned on by default, including enable_hashjoin
 

Now some size details about the 11 tables involved in the join. All join
fields are indexed unless otherwise noted and are of type integer unless
otherwise noted.

  TABLE1  -       398 pages
  TABLE2  -     5,014 pages, INNER JOIN on TABLE1
  TABLE3  -    34,729 pages, INNER JOIN on TABLE2
  TABLE4  - 1,828,000 pages, INNER JOIN on TABLE2
  TABLE5  - 1,838,000 pages, INNER JOIN on TABLE4
  TABLE6  -   122,500 pages, INNER JOIN on TABLE4
  TABLE7  -       621 pages, INNER JOIN on TABLE6
  TABLE8  -         4 pages, INNER JOIN on TABLE7 (TABLE7 column not indexed)
  TABLE9  -         2 pages, INNER JOIN on TABLE8 (TABLE8 column not indexed)
  TABLE10 -        13 pages, LEFT JOIN on TABLE6 (columns on both tables text, neither column indexed)
  TABLE11 - 1,976,430 pages, LEFT JOIN on TABLE5 AND explicit join on TABLE6

The WHERE clause filters primary key values from TABLE1 down to 1 value and
a 1-month range of indexed dates from TABLE4.
 

So, my guess is the disparity in performance (40 seconds vs 180 seconds)
has to do with MACHINE2 not availing itself of hash joins, which by my
understanding are much faster.

Any help / insight appreciated. Thank you

Mark Steben│Database Administrator│ 

@utoRevenue(R) - Join the Revenue-tion
95 Ashley Ave. West Springfield, MA., 01089 
413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)

@utoRevenue is a registered trademark and a division of Dominion Enterprises

 



Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance

2009-06-17 Thread Robert Haas
2009/6/17 Mark Steben mste...@autorevenue.com:
 A few details – I can always provide more

Could you send:

1. Exact text of query.

2. EXPLAIN ANALYZE output on each machine.

3. VACUUM VERBOSE output on each machine, or at least the last 10 lines.

...Robert



Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance

2009-06-17 Thread Dave Dutcher

We have two machines.  Both running Linux Redhat, both running postgres
8.2.5.
Both have nearly identical 125 GB databases.  In fact we use PITR Recovery
to 
Replicate from one to the other.  

I have to ask the obvious question.  Do you regularly analyze the machine
you replicate to?


Dave





Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance

2009-06-17 Thread Mark Steben
Yes I analyze after each replication.

Mark Steben│Database Administrator│ 

@utoRevenue(R) - Join the Revenue-tion
95 Ashley Ave. West Springfield, MA., 01089 
413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)

@utoRevenue is a registered trademark and a division of Dominion Enterprises

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Dave Dutcher
Sent: Wednesday, June 17, 2009 1:39 PM
To: 'Mark Steben'; pgsql-performance@postgresql.org
Cc: 'Rich Garabedian'
Subject: Re: [PERFORM] Performance issue - 2 linux machines, identical
configs, different performance


We have two machines.  Both running Linux Redhat, both running postgres
8.2.5.
Both have nearly identical 125 GB databases.  In fact we use PITR Recovery
to 
Replicate from one to the other.  

I have to ask the obvious question.  Do you regularly analyze the machine
you replicate too?


Dave





[PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
Hi,

I'm having a performance problem with a query using Tsearch2: the
query strangely takes several minutes.

I've tried to follow the Tsearch tuning recommendations and searched
through the archives, but I can't seem to find a solution to my
problem.

The ts_vector field was created using dictionary fr_ispell only on
types lword, lpart_hword and lhword. An index was created on this
field.

According to the stat() function, there are only 42,590 word stems indexed.
I also did a VACUUM FULL ANALYZE.

Here's the result of EXPLAIN ANALYZE on a filtered version of my
query (the non-filtered version takes so long I usually cancel it):
**
explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test,
q) DESC;
QUERY PLAN
-
Sort  (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop  (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1)
        ->  Nested Loop  (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1)
              ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Bitmap Heap Scan on tab_ocr  (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1)
                    Filter: (tab_ocr.zoneindex_test @@ q.q)
                    ->  Bitmap Index Scan on zoneindex_test_idx  (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1)
                          Index Cond: (tab_ocr.zoneindex_test @@ q.q)
        ->  Index Scan using tab_chemin_label_index on tab_chemin  (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414)
              Index Cond: (tab_ocr.idstruct = tab_chemin.label)
              Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 270647.946 ms
**

Could someone help me analyze this problem?
I can't tell whether the problem comes from bad Tsearch tuning,
PostgreSQL configuration, or something else...

Thanks.



Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Ansgar -59cobalt- Wiechers
On 2008-02-05 Viviane Lestic wrote:
 QUERY PLAN
 -
 Sort  (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   ->  Nested Loop  (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1)
         ->  Nested Loop  (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1)
               ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
               ->  Bitmap Heap Scan on tab_ocr  (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1)
                     Filter: (tab_ocr.zoneindex_test @@ q.q)
                     ->  Bitmap Index Scan on zoneindex_test_idx  (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1)
                           Index Cond: (tab_ocr.zoneindex_test @@ q.q)
         ->  Index Scan using tab_chemin_label_index on tab_chemin  (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414)
               Index Cond: (tab_ocr.idstruct = tab_chemin.label)
               Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 270647.946 ms
 **
 
 Could someone help me analyze this problem?

Your planner estimates are way off. Try increasing the statistics target
for the columns used in this query and re-analyze the tables after doing
so.
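
For example (using the column from the plan above; 1000 was the maximum
statistics target on the 8.x releases of that era):

ALTER TABLE tab_ocr ALTER COLUMN zoneindex_test SET STATISTICS 1000;
ANALYZE tab_ocr;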

Regards
Ansgar Wiechers
-- 
The Mac OS X kernel should never panic because, when it does, it
seriously inconveniences the user.
--http://developer.apple.com/technotes/tn2004/tn2118.html



Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Guillaume Smet
On Feb 5, 2008 12:47 PM, Viviane Lestic [EMAIL PROTECTED] wrote:
 Could someone help me analyze this problem?
 I don't manage to see if the problem comes from bad tsearch tuning,
 postgresql configuration, or something else...

Can you try to replace zoneindex_test @@ q with zoneindex_test @@
to_tsquery('partir')? Increasing the statistics for zoneindex_test may
be a good idea too (see ALTER TABLE ... ALTER COLUMN doc).
I'm surprised you have the word partir in so many documents. Do you
use real data?

--
Guillaume



Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
2008/2/5, Ansgar -59cobalt- Wiechers wrote:
 Your planner estimates are way off. Try increasing the statistics target
 for the columns used in this query and re-analyze the tables after doing
 so.

I first set STATISTICS to 1000 for column zoneindex_test and saw no
significant improvement (with a vacuum full analyze in between). Then
I set default_statistics_target to 1000: there is now an improvement,
but the overall time is still way too long... (and the estimated costs
didn't change...)
Here are the results with default_statistics_target set to 1000:

explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test,
q) DESC;

  QUERY PLAN
-
 Sort  (cost=2345.30..2345.32 rows=8 width=327) (actual time=229913.715..229918.172 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   ->  Nested Loop  (cost=80.04..2345.18 rows=8 width=327) (actual time=28159.626..229892.957 rows=7106 loops=1)
         ->  Nested Loop  (cost=80.04..1465.76 rows=392 width=327) (actual time=26084.558..130979.395 rows=272414 loops=1)
               ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
               ->  Bitmap Heap Scan on tab_ocr  (cost=80.04..1460.85 rows=392 width=295) (actual time=26084.544..130562.220 rows=272414 loops=1)
                     Filter: (tab_ocr.zoneindex_test @@ q.q)
                     ->  Bitmap Index Scan on zoneindex_test_idx  (cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315 rows=283606 loops=1)
                           Index Cond: (tab_ocr.zoneindex_test @@ q.q)
         ->  Index Scan using tab_chemin_label_index on tab_chemin  (cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0 loops=272414)
               Index Cond: (tab_ocr.idstruct = tab_chemin.label)
               Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 229922.864 ms


2008/2/5, Guillaume Smet wrote:
 Can you try to replace zoneindex_test @@ q with zoneindex_test @@
 to_tsquery('partir')?

The improvement seems negligible (with default_statistics_target back
to 10, its default value):
explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ to_tsquery('partir') ORDER BY
rank(zoneindex_test, q) DESC;

  QUERY PLAN
-
 Sort  (cost=4358.91..4358.95 rows=16 width=308) (actual time=266489.667..266494.132 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   ->  Nested Loop  (cost=80.04..4358.59 rows=16 width=308) (actual time=42245.881..266469.644 rows=7106 loops=1)
         ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1)
         ->  Nested Loop  (cost=80.04..4358.34 rows=16 width=276) (actual time=42239.570..178496.761 rows=7106 loops=1)
               ->  Bitmap Heap Scan on tab_ocr  (cost=80.04..1461.83 rows=392 width=276) (actual time=38317.423..174188.779 rows=272414 loops=1)
                     Filter: (zoneindex_test @@ '''partir'''::tsquery)
                     ->  Bitmap Index Scan on zoneindex_test_idx  (cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289 rows=283606 loops=1)
                           Index Cond: (zoneindex_test @@ '''partir'''::tsquery)
               ->  Index Scan using tab_chemin_label_index on tab_chemin  (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=272414)
                     Index Cond: (tab_ocr.idstruct = tab_chemin.label)
                     Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 266498.704 ms

 Increasing the statistics for zoneindex_test may
 be a good idea too (see ALTER TABLE ... ALTER COLUMN doc).

I posted the results above.

 I'm surprised you have the word partir in so many documents? Do you
 use real data?

I'm using real data. The indexed documents are extracted from
newspapers, and partir (and its derivatives) is quite a common verb in
the French language, so I'm not that surprised to see it show up in
many documents.



[PERFORM] Performance issue with nested loop

2007-08-29 Thread Jens Reufsteck
Hi all,

I'm having a strange performance issue with two almost identical queries,
one running as expected, the other taking far more time. The only
difference is that I have uniid in (10) in the normally running query and
uniid in (9,10) in the other one. The number of rows coming from the
respective table does not differ very much: 406 for the first and 511 for
the second query.

This is the full query - the uniid in (9,10) is in the last subquery:


SELECT 'Anzahl' AS column1, count(DISTINCT sid) AS column2
FROM (
SELECT sid
FROM stud
WHERE stud.status > 0
AND length(stud.vname) > 1
AND length(stud.nname) > 1
) AS qur_filter_1 INNER JOIN (
SELECT DISTINCT sid
FROM stud_vera
INNER JOIN phon USING (sid)
WHERE veraid = 22
AND stud_vera.status > 0
AND (
(
veraid IN (2, 3, 22, 24, 36)
AND phontyp = 5
AND phon.typ = 1
AND phon.status > 0
) OR (
veraid NOT IN (2, 3, 22, 24, 36)
)
)
) AS qur_filter_2 USING (sid) INNER JOIN (
SELECT DISTINCT sid 
FROM ausb
INNER JOIN uni USING (uniid)
WHERE uni.uniort IN ('Augsburg')
AND ausb.overview = 1
AND ausb.zweitstudium != 2
AND ausb.status > 0
) AS qur_filter_3 USING (sid) INNER JOIN (
SELECT DISTINCT sid 
FROM ausb
WHERE uniid IN (9, 10)
AND ausb.overview = 1
AND ausb.zweitstudium != 2
AND ausb.status > 0
) AS qur_filter_4 USING (sid)



These are the query-plans for both queries, first the problematic one:



Aggregate  (cost=78785.78..78785.79 rows=1 width=4) (actual time=698777.890..698777.891 rows=1 loops=1)
  ->  Nested Loop  (cost=65462.58..78785.78 rows=1 width=4) (actual time=6743.856..698776.957 rows=250 loops=1)
        Join Filter: ("outer".sid = "inner".sid)
        ->  Merge Join  (cost=11031.79..11883.12 rows=1 width=12) (actual time=387.837..433.612 rows=494 loops=1)
              Merge Cond: ("outer".sid = "inner".sid)
              ->  Nested Loop  (cost=5643.11..6490.17 rows=19 width=8) (actual time=114.323..154.043 rows=494 loops=1)
                    ->  Unique  (cost=5643.11..5645.35 rows=180 width=4) (actual time=114.202..116.002 rows=511 loops=1)
                          ->  Sort  (cost=5643.11..5644.23 rows=448 width=4) (actual time=114.199..114.717 rows=511 loops=1)
                                Sort Key: public.ausb.sid
                                ->  Seq Scan on ausb  (cost=0.00..5623.38 rows=448 width=4) (actual time=0.351..112.459 rows=511 loops=1)
                                      Filter: (((uniid = 9) OR (uniid = 10)) AND (overview = 1) AND (zweitstudium <> 2) AND (status > 0))
                    ->  Index Scan using stud_pkey on stud  (cost=0.00..4.67 rows=1 width=4) (actual time=0.062..0.067 rows=1 loops=511)
                          Index Cond: (stud.sid = "outer".sid)
                          Filter: ((status > 0) AND (length((vname)::text) > 1) AND (length((nname)::text) > 1))
              ->  Materialize  (cost=5388.68..5392.05 rows=337 width=4) (actual time=273.506..276.785 rows=511 loops=1)
                    ->  Unique  (cost=5383.29..5384.98 rows=337 width=4) (actual time=273.501..275.421 rows=511 loops=1)
                          ->  Sort  (cost=5383.29..5384.13 rows=337 width=4) (actual time=273.499..274.091 rows=511 loops=1)
                                Sort Key: public.ausb.sid
                                ->  Hash Join  (cost=17.61..5369.14 rows=337 width=4) (actual time=1.139..272.465 rows=511 loops=1)
                                      Hash Cond: ("outer".uniid = "inner".uniid)
                                      ->  Seq Scan on ausb  (cost=0.00..4827.30 rows=104174 width=8) (actual time=0.026..200.111 rows=103593 loops=1)
                                            Filter: ((overview = 1) AND (zweitstudium <> 2) AND (status > 0))
                                      ->  Hash  (cost=17.60..17.60 rows=2 width=4) (actual time=0.435..0.435 rows=2 loops=1)
                                            ->  Seq Scan on uni  (cost=0.00..17.60 rows=2 width=4) (actual time=0.412..0.424 rows=2 loops=1)
                                                  Filter: ((uniort)::text = 'Augsburg'::text)
        ->  Unique  (cost=54430.79..4.18 rows=10599 width=4) (actual time=6.851..1374.135 rows=40230 loops=494)
              ->  Merge Join  (cost=54430.79..66319.65 rows=137811 width=4) (actual time=6.849..1282.333 rows=40233 loops=494)
                    Merge Cond: ("outer".sid = "inner".sid)
                    Join Filter: (((("outer".veraid = 2) OR ("outer".veraid = 3) OR ("outer".veraid = 22) OR ("outer".veraid = 24) OR ("outer".veraid = 36)) AND ("inner".phontyp = 5) AND ("inner".typ = 1) AND ("inner".status > 0)) OR (("outer".veraid <> 2) AND ("outer".veraid <> 3) AND ("outer".veraid <> 22) AND ("outer".veraid <> 24) AND


Re: [PERFORM] Performance issue

2007-08-28 Thread Willo van der Merwe

Hi Guys,

Following Tom Lane's advice I upgraded to 8.2, and that solved all my 
problems. :D


Thank you so much for your input, I really appreciate it.

Kind regards

Willo van der Merwe




[PERFORM] Performance issue

2007-08-28 Thread Willo van der Merwe

Hi Guys,

I have something odd. I have Gallery2 running on PostgreSQL 8.1, and
recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is
how do I get PostgreSQL to work with their horrible code. The queries
they generate look something like:
SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND
id IN (here a list of 42000+ IDs is listed)


On the previous version (which I can't recall what it was, but it was a
version 8.1) the queries executed fine, but suddenly now these queries
are taking up to 4 minutes to complete. I am convinced it's the
parsing/handling of the IN clause. It could, of course, be that the list
has grown so large that it can't fit into a buffer anymore. For obvious
reasons I can't run an EXPLAIN ANALYZE from a prompt. I vacuum and
reindex the database daily.


I'd prefer not to have to rewrite the code, so any suggestions would be 
very welcome.


Kind regards

Willo van der Merwe



[PERFORM] Performance issue

2007-08-27 Thread Willo van der Merwe

Hi Guys,

I have something odd. I have Gallery2 running on PostgreSQL 8.1, and
recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is
how do I get PostgreSQL to work with their horrible code. The queries
they generate look something like:
SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND
id IN (here a list of 42000+ IDs is listed)

On the previous version (which I can't recall what it was, but it was a
version 8.1) the queries executed fine, but suddenly now these queries
are taking up to 4 minutes to complete. I am convinced it's the
parsing/handling of the IN clause. It could, of course, be that the list
has grown so large that it can't fit into a buffer anymore. For obvious
reasons I can't run an EXPLAIN ANALYZE from a prompt. I vacuum and
reindex the database daily.

I'd prefer not to have to rewrite the code, so any suggestions would be
very welcome.

Kind regards

Willo van der Merwe




Re: [PERFORM] Performance issue

2007-08-27 Thread Bill Moran
In response to Willo van der Merwe [EMAIL PROTECTED]:

 Hi Guys,
 
 I have something odd. I have Gallery2 running on PostgreSQL 8.1, and
 recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is
 how do I get PostgreSQL to work with their horrible code. The queries
 they generate look something like :
 SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND
 id IN (here a list of 42000+ IDs are listed)
 
 On the previous version (which I can't recall what it was, but it was a
 version 8.1) the queries executed fine, but suddenly now, these queries
 are taking up-to 4 minutes to complete. I am convinced it's the
 parsing/handling of the IN clause. It could, of course, be that the list
 has grown so large that it can't fit into a buffer anymore. For obvious
 reasons I can't run an EXPLAIN ANALYZE from a prompt.

Those reasons are not obvious to me.  The explain analyze output is
going to be key to working this out -- unless it's something like
your postgresql.conf isn't properly tuned.
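
One way to get that output anyway (a sketch; the statement below is a
stand-in for whatever Gallery2 actually generates): log slow statements,
copy one out of the server log, and replay it from psql:

-- postgresql.conf: log statements slower than one second
--   log_min_duration_statement = 1000
-- then, in psql, replay the captured statement:
EXPLAIN ANALYZE
SELECT t1.id, t2.id
FROM table1 t1, table2 t2
WHERE t1.id = t2.id
  AND t1.id IN (101, 102, 103);  -- stand-in for the 42000+ ID list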

 I vacuum and
 reindex  the database daily.
 
 I'd prefer not to have to rewrite the code, so any suggestions would be
 very welcome.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] Performance issue

2007-08-27 Thread Tom Lane
Willo van der Merwe [EMAIL PROTECTED] writes:
 I have something odd. I have Gallery2 running on PostgreSQL 8.1, and
 recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is
 how do I get PostgreSQL to work with their horrible code. The queries
 they generate look something like :
 SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND
 id IN (here a list of 42000+ IDs are listed)

 On the previous version (which I can't recall what it was, but it was a
 version 8.1) the queries executed fine, but suddenly now, these queries
 are taking up-to 4 minutes to complete. I am convinced it's the
 parsing/handling of the IN clause.

You're wrong about that, because we have not done anything to change IN
planning in 8.1.x.  You might need to re-ANALYZE or something; it sounds
to me more like the planner has changed strategies in the wrong direction.

FWIW, 8.2 should be vastly more efficient than 8.1 for this sort of
query --- any chance of an upgrade?
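
A sketch of the re-ANALYZE suggestion (table names reused from the
paraphrased query above):

-- Refresh the planner's statistics so row estimates are current,
-- then re-check the plan choice with EXPLAIN:
ANALYZE table1;
ANALYZE table2;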

regards, tom lane



Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-25 Thread Nis Jørgensen
Karl Denninger wrote:
 I've got an interesting issue here that I'm running into with 8.2.3
 
 This is an application that has run quite well for a long time, and has
 been operating without significant changes (other than recompilation)
 since back in the early 7.x Postgres days.  But now we're seeing a LOT
 more load than we used to with it, and suddenly, we're seeing odd
 performance issues.
 
 It APPEARS that the problem isn't query performance per-se.  That is,
 while I can find a few processes here and there in a run state when I
 look with a PS, I don't see them consistently churning.
 
 But here's the query that has a habit of taking the most time
 
 select forum, * from post where toppost = 1 and (replied > (select
 lastview from forumlog where login='theuser' and forum=post.forum and
 number is null)) is not false AND (replied > (select lastview from
 forumlog where login='theuser' and forum=post.forum and
 number=post.number)) is not false order by pinned desc, replied desc
 offset 0 limit 20

Since I can do little to help you with anything else, here is a little
help from a guy with a hammer. It seems you may be able to convert the
subqueries into a left join. Not sure whether this helps, nor whether I
got some bits of the logic wrong, but something like this might help the
planner find a better plan:

SELECT forum, *
FROM post
LEFT JOIN forumlog
ON post.forum = forumlog.forum
AND forumlog.login = 'theuser'
AND (post.number = forumlog.number OR forumlog.number IS NULL)
AND post.replied <= lastview
WHERE forumlog.forum IS NULL
AND post.toppost = 1
ORDER BY pinned DESC, replied DESC OFFSET 0 LIMIT 20 ;


Nis




Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-25 Thread Gregory Stark
Karl Denninger [EMAIL PROTECTED] writes:

 Not sure where to start here.  It appears that I'm CPU limited and the problem
 may be that this is a web-served application that must connect to the Postgres
 backend for each transaction, perform its queries, and then close the
 connection down - in other words the load may be coming not from Postgres but
 rather from places I can't fix at the application layer (e.g. fork() overhead,
 etc).  The DBMS and Apache server are on the same machine, so there's no 
 actual
 network overhead involved.

 If that's the case the only solution is to throw more hardware at it.  I can 
 do
 that, but before I go tossing more CPU at the problem I'd like to know I'm not
 just wasting money.

I know you found the proximate cause of your current problems, but it sounds
like you have something else you should consider looking at here. There are
techniques for avoiding separate database connections for each request.

If you're using Apache you can reduce the CPU usage a lot by writing your
module as an Apache module instead of a CGI or whatever type of program it is
now. Then your module would live as long as a single Apache instance which you
can configure to be hours or days instead of a single request. It can keep
around the database connection for that time.

If that's impossible there are still techniques that can help. You can set up
PGPool or PGBouncer or some other connection aggregating tool to handle the
connections. This is a pretty low-impact change which shouldn't require making
any application changes aside from changing the database connection string.
Effectively this is just a connection pool that lives in a separate
process.
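
For example, a minimal PgBouncer setup might look like this (a sketch;
the database name, file paths, and pool sizes here are placeholders,
not taken from this thread):

; pgbouncer.ini -- illustrative values only
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 200
default_pool_size = 20

The application then points its connection string at port 6432 instead
of 5432, and PgBouncer hands each request an already-open server
connection.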

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-25 Thread Karl Denninger
Looks like that was the problem - got a day under the belt now with the 
8.2.4 rev and all is back to normal.


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Karl Denninger wrote:

Aha!

BIG difference.  I won't know for sure until the biz day tomorrow but 
the first blush look is that it makes a HUGE difference in system 
load, and I no longer have the stats collector process on the top of 
the top list..


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Tom Lane wrote:

Karl Denninger [EMAIL PROTECTED] writes:
 
Hm. now that's interesting.  Stats collector IS accumulating 
quite a bit of runtime. me thinks its time to go grab 8.2.4.



I think Merlin might have nailed it --- the stats collector bug is
that it tries to write out the stats file way more often than it
should.  So any excessive userland CPU time you see is just the tip
of the iceberg compared to the system and I/O costs incurred.

regards, tom lane









[PERFORM] Performance issue with 8.2.3 - C application

2007-07-24 Thread Karl Denninger

I've got an interesting issue here that I'm running into with 8.2.3

This is an application that has run quite well for a long time, and has 
been operating without significant changes (other than recompilation) 
since back in the early 7.x Postgres days.  But now we're seeing a LOT 
more load than we used to with it, and suddenly, we're seeing odd 
performance issues.


It APPEARS that the problem isn't query performance per se.  That is,
while I can find a few processes here and there in a run state when I 
look with a PS, I don't see them consistently churning.


But here's the query that has a habit of taking the most time

select forum, * from post where toppost = 1 and (replied > (select
lastview from forumlog where login='theuser' and forum=post.forum and
number is null)) is not false AND (replied > (select lastview from
forumlog where login='theuser' and forum=post.forum and
number=post.number)) is not false order by pinned desc, replied desc
offset 0 limit 20

Now the numeric and login fields may change; when I plug it into 
explain what I get back is:


                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Limit  (cost=57270.22..57270.27 rows=20 width=757)
   ->  Sort  (cost=57270.22..57270.50 rows=113 width=757)
         Sort Key: pinned, replied
         ->  Index Scan using post_top on post  (cost=0.00..57266.37 rows=113 width=757)
               Index Cond: (toppost = 1)
               Filter: (((replied > (subplan)) IS NOT FALSE) AND ((replied > (subplan)) IS NOT FALSE))
               SubPlan
                 ->  Index Scan using forumlog_composite on forumlog  (cost=0.00..8.29 rows=1 width=8)
                       Index Cond: ((login = 'theuser'::text) AND (forum = $0) AND (number = $1))
                 ->  Bitmap Heap Scan on forumlog  (cost=4.39..47.61 rows=1 width=8)
                       Recheck Cond: ((login = 'theuser'::text) AND (forum = $0))
                       Filter: (number IS NULL)
                       ->  Bitmap Index Scan on forumlog_composite  (cost=0.00..4.39 rows=12 width=0)
                             Index Cond: ((login = 'theuser'::text) AND (forum = $0))

(14 rows)

And indeed, it returns a fairly reasonable number of rows.

This takes a second or two to return - not all that bad - although this 
is one that people hit a LOT. 


One thing that DOES bother me is this line from the EXPLAIN output:
->  Index Scan using post_top on post  (cost=0.00..57266.53 rows=113
width=757)


This is indexed using:

 post_top btree (toppost)

Ain't nothing fancy there.  So how come the planner thinks this is going 
to take that long?!?


More interesting, if I do a simple query on that line, I get

ticker=> explain select forum from post where toppost='1';
   QUERY PLAN
---

Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
  Index Cond: (toppost = 1)

Hmm; that's a bit more reasonable.  So what's up with the above line?

What I'm seeing is that as concurrency increases, I see the CPU load 
spike.  Disk I/O is low to moderate at less than 10% of maximum 
according to systat -vm, no swap in use, 300mb dedicated to shared 
memory buffers for Postgres (machine has 1GB of RAM and is a P4/3G/HT 
running FreeBSD 6.2-STABLE)  It does not swap at all, so it does not 
appear I've got a problem with running out of physical memory.  shmem is 
pinned to physical memory via the sysctl tuning parameter to prevent 
page table thrashing.


However, load goes up substantially and under moderate to high 
concurrency gets into the 4-5 range with response going somewhat to 
crap.  The application is still usable, but it's not crisp.  If I do a
ps during times that performance is particularly bad, I don't see any 
particular overrepresentation of this query .vs. others (I have the 
application doing a setproctitle so I know what command - and thus 
what sets of queries - it is executing.)


Not sure where to start here.  It appears that I'm CPU limited and the 
problem may be that this is a web-served application that must connect 
to the Postgres backend for each transaction, perform its queries, and 
then close the connection down - in other words the load may be coming 
not from Postgres but rather from places I can't fix at the application 
layer (e.g. fork() overhead, etc).  The DBMS and Apache server are on 
the same machine, so there's no actual network overhead involved. 

If that's the case the only solution is to throw more hardware at it.  I 
can do that, but before I go tossing more CPU at the problem I'd like to 
know I'm not just wasting money.


The application uses the C language interface and just calls 
Connectdb - the only parameter is the dbname, so it should be 

Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-24 Thread Karl Denninger
Yeah, the problem doesn't appear to be there.  As I said, if I look at 
the PS of the system when it's bogging, there aren't a whole bunch of
processes stuck doing these, so while this does take a second or two to 
come back, that's not that bad.


It's GENERAL performance that just bites - the system is obviously out of
CPU, but what I can't get a handle on is WHY.  It does not appear to be 
accumulating large amounts of runtime in processes I can catch, but the 
load average is quite high.


This is why I'm wondering if what I'm taking here is a hit on the 
fork/exec inside the postmaster, in the setup internally in there, in
the IPC between my process via libPQ, etc - and how I can profile what's 
going on.


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Tom Lane wrote:

Karl Denninger [EMAIL PROTECTED] writes:
  

But here's the query that has a habit of taking the most time



  
select forum, * from post where toppost = 1 and (replied > (select
lastview from forumlog where login='theuser' and forum=post.forum and
number is null)) is not false AND (replied > (select lastview from
forumlog where login='theuser' and forum=post.forum and
number=post.number)) is not false order by pinned desc, replied desc
offset 0 limit 20



Did that ever perform well for you?  It's the sub-selects that are
likely to hurt ... in particular,

  
 ->  Index Scan using post_top on post  (cost=0.00..57266.37
rows=113 width=757)
       Index Cond: (toppost = 1)
       Filter: (((replied > (subplan)) IS NOT FALSE) AND
((replied > (subplan)) IS NOT FALSE))



versus

  

 Index Scan using post_top on post  (cost=0.00..632.03 rows=1013 width=11)
   Index Cond: (toppost = 1)



The planner thinks that the two subplan filter conditions will eliminate
about 90% of the rows returned by the bare indexscan (IIRC this is
purely a rule of thumb, not based on any statistics) and that testing
them 1013 times will add over 56,000 cost units to the basic indexscan.
That part I believe --- correlated subqueries are expensive.

regards, tom lane





Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-24 Thread Merlin Moncure

On 7/25/07, Karl Denninger [EMAIL PROTECTED] wrote:


 Yeah, the problem doesn't appear to be there.  As I said, if I look at the
PS of the system when its bogging, there aren't a whole bunch of processes
stuck doing these, so while this does take a second or two to come back,
that's not that bad.

 Its GENERAL performance that just bites - the system is obviously out of
CPU, but what I can't get a handle on is WHY.  It does not appear to be
accumulating large amounts of runtime in processes I can catch, but the load
average is quite high.


8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased
load in high-concurrency conditions.  On a client's machine, after
patching, the postmaster load dropped from the 4-5 range to the 1-2
range on a 500 tps server.  Maybe this is biting you?  Symptoms are
high load average and high CPU usage by the stats collector process.

merlin



Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-24 Thread Karl Denninger
Hm. now that's interesting.  Stats collector IS accumulating 
quite a bit of runtime. me thinks its time to go grab 8.2.4.


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Merlin Moncure wrote:

On 7/25/07, Karl Denninger [EMAIL PROTECTED] wrote:


 Yeah, the problem doesn't appear to be there.  As I said, if I look 
at the
PS of the system when its bogging, there aren't a whole bunch of 
processes

stuck doing these, so while this does take a second or two to come back,
that's not that bad.

 Its GENERAL performance that just bites - the system is obviously 
out of

CPU, but what I can't get a handle on is WHY.  It does not appear to be
accumulating large amounts of runtime in processes I can catch, but 
the load

average is quite high.


8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased
load in high concurrency conditions.  on a client's machine after
patching the postmaster load drop from the 4-5 range to 1-2 range on a
500 tps server.  maybe this is biting you?  symptoms are high load avg
and high cpu usage of stats collector process.

merlin





Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-24 Thread Tom Lane
Karl Denninger [EMAIL PROTECTED] writes:
 Hm. now that's interesting.  Stats collector IS accumulating 
 quite a bit of runtime. me thinks its time to go grab 8.2.4.

I think Merlin might have nailed it --- the stats collector bug is
that it tries to write out the stats file way more often than it
should.  So any excessive userland CPU time you see is just the tip
of the iceberg compared to the system and I/O costs incurred.

regards, tom lane



Re: [PERFORM] Performance issue with 8.2.3 - C application

2007-07-24 Thread Karl Denninger

Aha!

BIG difference.  I won't know for sure until the biz day tomorrow but 
the first blush look is that it makes a HUGE difference in system 
load, and I no longer have the stats collector process on the top of the 
top list..


Karl Denninger ([EMAIL PROTECTED])
http://www.denninger.net




Tom Lane wrote:

Karl Denninger [EMAIL PROTECTED] writes:
  
Hm. now that's interesting.  Stats collector IS accumulating 
quite a bit of runtime. me thinks its time to go grab 8.2.4.



I think Merlin might have nailed it --- the stats collector bug is
that it tries to write out the stats file way more often than it
should.  So any excessive userland CPU time you see is just the tip
of the iceberg compared to the system and I/O costs incurred.

regards, tom lane






[PERFORM] performance issue with a specific query

2006-07-27 Thread Eliott
Hi!

I hope I'm sending my question to the right list, please don't flame if
it's the wrong one.

I have noticed that while a query runs in about 1.5 seconds on an 8.xx
version PostgreSQL server, on our 7.4.13 it takes around 15-20 minutes.
Since we are using RHEL4 on our server we are stuck with 7.4.13. The
enormous time difference between the different builds drives me crazy.
Can you please help me identify the bottleneck or suggest anything to
improve the dismal performance.

The query is the following:

Select car_license_plate.license_plate,
  substr(date_trunc('day', car_km_fuel.transaction_time), 1, 10),
  substr(date_trunc('second', car_km_fuel.transaction_time), 12, 8),
  vehicle_make.make, vehicle_type.model, engine_size,
  vehicle_fuel_type.fuel_type,
  v_org_person_displayname.displayname_lastfirst,
  car_km_fuel.ammount, car_km_fuel.unit_price,
  car_km_fuel.total_ammount, currency.currency AS,
  car_km_fuel.km AS, vehicle_specific.fuel_capacity,
  CASE WHEN (car_km_fuel.ammount > vehicle_specific.fuel_capacity)
       THEN CAST(ROUND(CAST(car_km_fuel.ammount - vehicle_specific.fuel_capacity AS NUMERIC), 2) AS varchar)
       ELSE '---' END AS over,
  car_km_fuel.notes,
  CASE WHEN (prev_car_km_fuel.km IS NOT NULL AND car_km_fuel.km IS NOT NULL
             AND (car_km_fuel.km - prev_car_km_fuel.km > 0))
       THEN CAST(Round(CAST(((car_km_fuel.ammount / (car_km_fuel.km - prev_car_km_fuel.km)) * 100) AS Numeric), 2) AS VARCHAR)
       WHEN (prev_car_km_fuel.km IS NULL) THEN ''
       WHEN (car_km_fuel.km IS NULL) THEN 'error' END AS average,
  vehicle_specific.consumption_town, org_person.email_address
FROM car_km_fuel
LEFT JOIN car ON car.id = car_km_fuel.car_id
LEFT JOIN car_license_plate ON car_license_plate.car_id = car.id
  AND (car_license_plate.license_plate_end_date > date_trunc('day', car_km_fuel.transaction_time)
       OR car_license_plate.license_plate_end_date IS NULL)
LEFT JOIN vehicle_specific ON vehicle_specific.id = car.vehicle_specific_id
LEFT JOIN vehicle_variant ON vehicle_variant.id = vehicle_specific.vehicle_variant_id
LEFT JOIN vehicle_type ON vehicle_type.id = vehicle_variant.vehicle_type_id
LEFT JOIN vehicle_make ON vehicle_make.id = vehicle_type.vehicle_make_id
LEFT JOIN vehicle_fuel_type ON vehicle_fuel_type.id = vehicle_specific.fuel_type_id
LEFT JOIN car_driver ON car_driver.car_id = car.id
  AND car_driver.allocation_date <= date_trunc('day', car_km_fuel.transaction_time)
  AND (car_driver.end_date >= date_trunc('day', car_km_fuel.transaction_time)
       OR car_driver.end_date IS NULL)
LEFT JOIN v_org_person_displayname ON v_org_person_displayname.id = car_driver.car_driver_id
LEFT JOIN org_person ON org_person.id = v_org_person_displayname.id
LEFT JOIN currency ON currency.id = car_km_fuel.currency_id
LEFT JOIN car_km_fuel AS prev_car_km_fuel ON prev_car_km_fuel.transaction_time =
  (SELECT MAX(transaction_time) FROM car_km_fuel AS car_km_fuel2
   WHERE car_km_fuel2.car_id = car.id
     AND car_km_fuel2.transaction_time < car_km_fuel.transaction_time)
LEFT JOIN org_company ON org_company.id = org_person.company_id
WHERE (lower(org_company.name) LIKE lower(:param3) || '%')
  AND (car_km_fuel.transaction_time >= :param1 OR :param1 IS NULL)
  AND (car_km_fuel.transaction_time <= :param2 OR :param2 IS NULL)
ORDER BY 1, 2, 3;

The output of explain is the following under 7.4.13:

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Sort  (cost=66.66..66.66 rows=1 width=917)
   Sort Key: car_license_plate.license_plate, substr((date_trunc('day'::text, car_km_fuel.transaction_time))::text, 1, 10), substr((date_trunc('second'::text, car_km_fuel.transaction_time))::text, 12, 8)
   ->  Nested Loop  (cost=44.93..66.65 rows=1 width=917)
         ->  Nested Loop Left Join  (cost=44.93..62.23 rows=1 width=921)
               Join Filter: ("inner".transaction_time = (subplan))
               ->  Nested Loop Left Join  (cost=44.93..62.21 rows=1 width=917)
                     Join Filter: ("inner".id = "outer".currency_id)
                     ->  Nested Loop  (cost=44.93..60.92 rows=1 width=828)
                           ->  Hash Join  (cost=44.93..58.32 rows=1 width=805)
                                 Hash Cond: ("outer".id = "inner".car_driver_id)
                                 ->  Subquery Scan v_org_person_displayname  (cost=16.42..28.82 rows=196 width=520)
                                       ->  Merge Right Join  (cost=16.42..26.86 rows=196 width=51)
                                             Merge Cond: ("outer".id = "inner".company_id)

Re: [PERFORM] performance issue with a specific query

2006-07-27 Thread Scott Marlowe
On Thu, 2006-07-27 at 09:23, Eliott wrote:
 Hi!
 
 I hope I'm sending my question to the right list, please don't flame
 if it's the wrong one.
 
 I have noticed that while a query runs in about 1.5seconds on a 8.xx
 version postgresql server on our 7.4.13 it takes around 15-20 minutes.
 Since we are using RHEL4 on our server we are stuck with 7.4.13. The
 enormous time difference between the different builds drives me crazy.
 Can you please help me identifying the bottleneck or suggest anything
 to improve the dismal performance.

You are absolutely on the right list.  A couple of points.

1:  Which 8.xx?  8.0.x or 8.1.x?  8.1.x is literally light years ahead
of 7.4 in terms of performance.  8.0 is somewhere between them.  The
performance difference you're seeing is pretty common.

2:  Looking at your query, there are places where you're joining on
things like date_trunc(...).  In 7.4 the database will not, and cannot,
use a normal index on the date field for those kinds of things.  It can,
however, use a functional index on some of them.  Try creating an index
on date_trunc('day', yourfieldhere) and see if that helps (see the
sketch after this list).

3:  You are NOT stuck on 7.4.13.  I have a RHEL server that will be
running 8.1.4 or so pretty soon as a data warehouse.  It may get updated
to RHEL4, may not.  You can either compile from the .tar.[gz|bz2] files
or download the PGDG rpms for your distro.

4:  You are fighting an uphill battle.  There were a LOT of improvements
made all over in the march from 7.4 to 8.1.  Not all of them were simple
planner tweaks and shortcuts, but honest to goodness changes to the way
things happen.  No amount of tuning can make 7.4 run as fast as 8.1.
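
A minimal illustration of point 2 (table and column names are taken
from the posted query; the date literal is hypothetical, and this
assumes transaction_time is timestamp without time zone, since
date_trunc on timestamptz is not immutable):

-- Expression ("functional") index that 7.4 can use for the
-- date_trunc joins and filters:
CREATE INDEX car_km_fuel_day_idx
    ON car_km_fuel (date_trunc('day', transaction_time));

-- Re-check whether the planner picks it up:
EXPLAIN ANALYZE
SELECT count(*)
FROM car_km_fuel
WHERE date_trunc('day', transaction_time) = '2006-07-01';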



Re: [PERFORM] performance issue with a specific query

2006-07-27 Thread Merlin Moncure

On 7/27/06, Eliott [EMAIL PROTECTED] wrote:

Hi!

I hope I'm sending my question to the right list, please don't flame if it's
the wrong one.

I have noticed that while a query runs in about 1.5seconds on a 8.xx version
postgresql server on our 7.4.13 it takes around 15-20 minutes. Since we are
using RHEL4 on our server we are stuck with 7.4.13. The enormous time
difference between the different builds drives me crazy. Can you please help
me identifying the bottleneck or suggest anything to improve the dismal
performance.
The query is the following:



try turning off genetic query optimization.  regarding the rhel4
issue...does rhel not come with a c compiler? :)
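
For instance (a sketch; geqo and geqo_threshold are the relevant
settings, and the query itself is elided):

-- Disable the genetic optimizer for this session so the planner does
-- an exhaustive join search even with this many joins:
SET geqo TO off;
-- ...or instead raise the threshold above the query's join count:
SET geqo_threshold TO 20;
-- then re-run the problem query under EXPLAIN ANALYZE.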

merlin



Re: [PERFORM] performance issue with a specific query

2006-07-27 Thread Joshua D. Drake




try turning off genetic query optimization.  regarding the rhel4
issue...does rhel not come with a c compiler? :)


Enterprises are not going to compile. They are going to accept the
latest vendor-supported release.


Red Hat has a tendency to be incredibly stupid about this particular
area of their packaging.


Joshua D. Drake




merlin





--

   === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
   Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/





Re: [PERFORM] performance issue with a specific query

2006-07-27 Thread Alvaro Herrera
Joshua D. Drake wrote:
 
 
 try turning off genetic query optimization.  regarding the rhel4
 issue...does rhel not come with a c compiler? :)
 
 Enterprises are not going to compile. They are going to accept the 
 latest support by vendor release.
 
 Redhat has a tendency to  be incredibly stupid about this particular 
 area of their packaging.

Stupid how?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] performance issue with a specific query

2006-07-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Joshua D. Drake wrote:
 Enterprises are not going to compile. They are going to accept the 
 latest support by vendor release.
 
 Redhat has a tendency to  be incredibly stupid about this particular 
 area of their packaging.

 Stupid how?

Red Hat feels (apparently accurately, judging by their subscription
revenue ;-)) that what RHEL customers want is a platform that's stable
over multi-year application lifespans.  So major incompatible changes in
the system software are not looked on with favor.  That's why RHEL4
is still shipping PG 7.4.*.  You can call it a stupid policy if you
like, but it's hard to argue with success.

However, there will be an RH-supported release of PG 8.1.* as an optional
add-on for RHEL4.  Real Soon Now, I hope --- the release date has been
pushed back a couple times already.

regards, tom lane



[PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread grupos

Hi !

My company is evaluating making our system (developed in C++)
compatible with PostgreSQL.


Our programmer ran a lot of tests and he informed me that the
performance using ODBC is very similar to using libpq, even with a big
number of simultaneous connections/queries. Of course for us it is
simpler to use ODBC, because it will be easier to maintain as we
already support a lot of other databases using ODBC (MySQL, DB2, etc).


Has someone already had this experience? What are the key benefits of
using libpq instead of ODBC?


Our application has a heavy load and around 150 concurrent users.

Regards,

Rodrigo Carvalhaes





Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread Merlin Moncure
 Hi !

 My company is evaluating making our system (developed in C++)
 compatible with PostgreSQL.

 Our programmer ran a lot of tests and he informed me that the
 performance using ODBC is very similar to using libpq, even with a big
 number of simultaneous connections/queries. Of course for us it is
 simpler to use ODBC, because it will be easier to maintain as we
 already support a lot of other databases using ODBC (MySQL, DB2, etc).

 Has someone already had this experience? What are the key benefits of
 using libpq instead of ODBC?

 Our application has a heavy load and around 150 concurrent users.

The ODBC driver for postgresql implements its own protocol stack.
Unfortunately, it is still on protocol revision 2 (out of 3).  Also, IMO
libpq is a little better tested and durable than the odbc driver.  This
naturally follows from the fact that libpq is more widely used and more
actively developed than odbc.

If you are heavily C++ invested you can consider wrapping libpq yourself
if you want absolute maximum performance.  If you happen to be
developing on Borland platform give strong consideration to Zeos
connection library which is very well designed (it wraps libpq).

You might want to consider posting your question to the odbc list.

Merlin 




Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread Eric Lauzon
i would take a peek at the psqlodbc-8.0 drivers ..
i wouldn't battle with other versions you might find, such as the
unixodbc ones


-elz


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Merlin Moncure
 Sent: 27 juin 2005 10:29
 To: grupos
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
 
  Hi !

  My company is evaluating making our system (developed in C++)
  compatible with PostgreSQL.

  Our programmer ran a lot of tests and he informed me that the
  performance using ODBC is very similar to using libpq, even with a
  big number of simultaneous connections/queries. Of course for us it
  is simpler to use ODBC, because it will be easier to maintain as we
  already support a lot of other databases using ODBC (MySQL, DB2, etc).

  Has someone already had this experience? What are the key benefits
  of using libpq instead of ODBC?

  Our application has a heavy load and around 150 concurrent users.
 
 The ODBC driver for postgresql implements its own protocol stack.
 Unfortunately, it is still on protocol revision 2 (out of 3). 
  Also, IMO libpq is a little better tested and durable than 
 the odbc driver.  This naturally follows from the fact that 
 libpq is more widely used and more actively developed than odbc.
 
 If you are heavily C++ invested you can consider wrapping 
 libpq yourself if you want absolute maximum performance.  If 
 you happen to be developing on Borland platform give strong 
 consideration to Zeos connection library which is very well 
 designed (it wraps libpq).
 
 You might want to consider posting your question to the odbc list.
 
 Merlin 
 
 



[PERFORM] Performance issue

2003-09-24 Thread peter
Hello,

I have been trying to get my Postgres database to do faster inserts.

The environment is basically a single user situation.

The part that I would like to speed up is when a User copies a Project.
A Project consists of a number of Rooms (say 60). Each room contains a
number of items.
A project will contain say 20,000 records.

Anyway the copying process gets slower and slower, as more projects are 
added to the database.

My statistics (Athlon 1.8GHz):

20,000 items      takes on average 0.078 seconds/room
385,000 items     takes on average 0.11 seconds/room
690,000 items     takes on average 0.270 seconds/room
1,028,000 items   takes on average 0.475 seconds/room
As can be seen, the time taken to process each room increases. A commit
occurs when a room has been copied.
The hard drive is not being driven very hard. The hard drive light
only flashes about twice a second when there are a million records in
the database.

I thought that the problem could have been my plpgsql procedure, because
I assume the code is interpreted.
However, I have just rewritten the code using straight SQL (with some
temp fields), and the times turn out to be almost exactly the same as
the plpgsql version.

The read speed for the Application is fine. The sql planner seems to be 
doing a good job. There has been only one problem
that I have found with one huge select, which was fixed by a cross join.

I am running Red Hat 8. Some of my conf entries that I have changed
follow:
shared_buffers = 3700
effective_cache_size = 4000
sort_mem = 32168

Are the increasing times reasonable?
The times themselves might look slow, but that's because there are a
number of tables involved in a Copy.

I can increase the shared buffer sizes above 32M, but would this really
help?

TIA

peter Mcgregor



Re: [PERFORM] Performance issue

2003-09-24 Thread Joseph Bove
Peter,

One possibility is to drop all the indexes, do the insert and re-add the 
indexes.

The more indexes that exist and the more rows that exist, the more costly 
the insert.
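
A sketch of that pattern (the index and table names here are
hypothetical, not taken from Peter's schema):

-- Bulk-load pattern: drop the index, insert, then rebuild it once.
DROP INDEX item_room_idx;
INSERT INTO item (room_id, name)
    SELECT room_id, name FROM item_source;   -- the project being copied
CREATE INDEX item_room_idx ON item (room_id);

Rebuilding one index over all the rows at the end is usually cheaper
than maintaining it incrementally across thousands of single-row
inserts.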

Regards,

Joseph

At 05:48 PM 9/24/2003 +1200, peter wrote:
Hello,

I have been trying to get my Postgres database to do faster inserts.

The environment is basically a single user situation.

The part that I would like to speed up is when a User copys a Project.
A Project consists of a number of Rooms(say 60). Each room contains a 
number of items.
A project will contain say 20,000 records.

Anyway the copying process gets slower and slower, as more projects are 
added to the database.

My statistics(Athlon 1.8Ghz)

20,000 items      takes on average 0.078 seconds/room
385,000 items     takes on average 0.11 seconds/room
690,000 items     takes on average 0.270 seconds/room
1,028,000 items   takes on average 0.475 seconds/room
As can be seen the time taken to process each room increases. A commit 
occurs when a room has been copied.
The hard drive  is not being driven very hard. The hard drive light only 
flashes about twice a second when there are a million records in the database.

I thought that the problem could have been my plpgsql procedure because I 
assume the code is interpreted.
However I have just rewriten the code using straight sql(with some temp 
fields),
and the times turn out to be almost exactly the same as the plpgsql version.

The read speed for the Application is fine. The sql planner seems to be 
doing a good job. There has been only one problem
that I have found with one huge select, which was fixed by a cross join.

 I am running Red hat 8. Some of my conf entries that I have changed follow
shared_buffers = 3700
effective_cache_size = 4000
sort_mem = 32168
Are the increasing times reasonable?
The times themselves might look slow, but thats because there are a number 
of tables involved in a Copy

I can increase the shared buffer sizes above 32M, but would this really help?

TIA

peter Mcgregor





Re: [PERFORM] Performance issue

2003-09-24 Thread Richard Jones
get rid of any unnecessary indexes?
i've found that dropping indexes and re-creating them isn't usually
worth the effort

mount the disk with the noatime option, which saves you the time
involved in updating the last access time on files

make sure you're doing all the inserts in one transaction..  wrapping a
bunch of INSERTs in BEGIN/COMMIT speeds them up loads.
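
E.g., a minimal sketch of the batching advice (the table and values are
hypothetical):

BEGIN;
INSERT INTO item (room_id, name) VALUES (1, 'chair');
INSERT INTO item (room_id, name) VALUES (1, 'desk');
-- ... the rest of the copied rooms' rows ...
COMMIT;

Outside an explicit transaction each INSERT commits (and syncs the WAL)
on its own; batching amortizes that cost over the whole room or project.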




 At 05:48 PM 9/24/2003 +1200, peter wrote:
 Hello,
 
 I have been trying to get my Postgres database to do faster inserts.
 
 The environment is basically a single user situation.
 
 The part that I would like to speed up is when a User copys a Project.
 A Project consists of a number of Rooms(say 60). Each room contains a
 number of items.
 A project will contain say 20,000 records.
 
 Anyway the copying process gets slower and slower, as more projects are
 added to the database.
 
 My statistics(Athlon 1.8Ghz)
 
 20,000 items      takes on average 0.078 seconds/room
 385,000 items     takes on average 0.11 seconds/room
 690,000 items     takes on average 0.270 seconds/room
 1,028,000 items   takes on average 0.475 seconds/room
 
 As can be seen the time taken to process each room increases. A commit
 occurs when a room has been copied.
 The hard drive  is not being driven very hard. The hard drive light only
 flashes about twice a second when there are a million records in the
  database.
 
 I thought that the problem could have been my plpgsql procedure because I
 assume the code is interpreted.
 However I have just rewriten the code using straight sql(with some temp
 fields),
 and the times turn out to be almost exactly the same as the plpgsql
  version.
 
 The read speed for the Application is fine. The sql planner seems to be
 doing a good job. There has been only one problem
 that I have found with one huge select, which was fixed by a cross join.
 
   I am running Red hat 8. Some of my conf entries that I have changed
  follow shared_buffers = 3700
 effective_cache_size = 4000
 sort_mem = 32168
 
 Are the increasing times reasonable?
 The times themselves might look slow, but thats because there are a number
 of tables involved in a Copy
 
 I can increase the shared buffer sizes above 32M, but would this really
  help?
 
 TIA
 
 peter Mcgregor
 
 




Re: [PERFORM] Performance issue

2003-09-24 Thread Sean Chittenden
 My statistics(Athlon 1.8Ghz)
 
 20,000 items      takes on average 0.078 seconds/room
 385,000 items     takes on average 0.11 seconds/room
 690,000 items     takes on average 0.270 seconds/room
 1,028,000 items   takes on average 0.475 seconds/room
[snip]
  I am running Red hat 8. Some of my conf entries that I have changed 
 follow
 shared_buffers = 3700
 effective_cache_size = 4000
 sort_mem = 32168

Have you twiddled with your wal_buffers or checkpoint_segments?  Might
be something to look at.
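
For instance (a postgresql.conf sketch; the values are illustrative
starting points, not recommendations from this thread):

# postgresql.conf -- illustrative values only
wal_buffers = 64              # default was 8 (x 8 kB pages)
checkpoint_segments = 16      # default 3; larger = less frequent checkpoints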

-sc

-- 
Sean Chittenden
