Re: [PERFORM] Performance Issue -- "Materialize"
On Sat, Aug 19, 2017 at 10:37 AM, anand086 wrote:

Your email is very hard to read; the formatting and line wrapping are heavily mangled. You might want to attach the plans as file attachments instead of, or in addition to, putting them in the body.

> ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..2.03 rows=1 width=8)
>       Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))

It looks like the statistics for your table are desperately out of date: a later query showed there are 762599 rows (unless login is null for all of them), but the plan above is estimating there is only one. When was the table last analyzed?

Cheers, Jeff

On Sat, Aug 19, 2017 at 10:37 AM, anand086 wrote:
> I am a Postgres newbie and trying to learn :) We have a scenario wherein
> one SQL shows a different execution plan for different input values of
> import_num. As an example, with import_num = '4520440' the execution
> plan uses a Nested Loop and takes ~12 secs. With import_num = '4520460'
> the execution plan used a "Materialize" and never completed. After I set
> enable_material to off, the execution plan changed to a Hash Semi Join
> and completes in less than 3 secs.
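If stale statistics are the suspicion, a quick check and fix might look like this (a sketch; the table name is taken from the thread):

```sql
-- When did a manual ANALYZE or autoanalyze last touch the table?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'test_tab';

-- Refresh the statistics the planner uses for its row estimates
ANALYZE test_tab;
```

If last_analyze and last_autoanalyze are old (or null), the one-row estimate above is explained, and re-running the query after ANALYZE should produce a more realistic plan.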
SELECT count(*) FROM test_tab
WHERE login IN (SELECT DISTINCT login FROM test_tab
                WHERE import_num = '4520440' AND login IS NOT NULL
                EXCEPT
                SELECT DISTINCT login FROM test_tab
                WHERE import_num = '0' AND login IS NOT NULL)
  AND import_num = '4520440';

 count
--------
 746982
(1 row)

Time: 12054.274 ms

QUERY PLAN
Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
  ->  Nested Loop  (cost=349846.23..350366.17 rows=415562 width=0)
        ->  HashAggregate  (cost=349845.67..349847.67 rows=200 width=96)
              Group Key: ("ANY_subquery".login)::text
              ->  Subquery Scan on "ANY_subquery"  (cost=340828.23..348557.47 rows=515282 width=96)
                    ->  SetOp Except  (cost=340828.23..343404.65 rows=515282 width=100)
                          ->  Sort  (cost=340828.23..342116.44 rows=515283 width=100)
                                Sort Key: "*SELECT* 1".login
                                ->  Append  (cost=0.56..275836.74 rows=515283 width=100)
                                      ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..275834.70 rows=515282 width=12)
                                            ->  Unique  (cost=0.56..270681.88 rows=515282 width=8)
                                                  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..268604.07 rows=831125 width=8)
                                                        Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
                                      ->  Subquery Scan on "*SELECT* 2"  (cost=0.56..2.04 rows=1 width=12)
                                            ->  Unique  (cost=0.56..2.03 rows=1 width=8)
                                                  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_2  (cost=0.56..2.03 rows=1 width=8)
                                                        Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
        ->  Index Only Scan using
Re: [PERFORM] Performance Issue -- "Materialize"
I think your query is a bit confusing and has many subqueries, so I tried to simplify it.

If you can't have more than one import_num = 0 row for the same login, try this:

SELECT count(*)
FROM test_tab tab1
LEFT JOIN test_tab tab2
       ON tab1.login = tab2.login AND tab2.import_num = '0'
WHERE tab2.login IS NULL
  AND import_num = '4520440'

otherwise try this:

SELECT count(*)
FROM test_tab tab1
LEFT JOIN (
    SELECT DISTINCT login FROM test_tab WHERE import_num = '0'
) tab2 ON tab1.login = tab2.login
WHERE tab2.login IS NULL
  AND import_num = '4520440'

On Mon, 21 Aug 2017 at 15:47, Carlos Augusto Machado <caugus...@gmail.com> wrote:
>
> Do you have an index on the login column?
>
> If not, try creating an index and taking off those DISTINCTs.
>
> On Mon, 21 Aug 2017 at 15:33, Justin Pryzby wrote:
>
>> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
>> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>>
>> Would you send EXPLAIN ANALYZE and not just EXPLAIN?
>>
>> Justin
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance Issue -- "Materialize"
Do you have an index on the login column?

If not, try creating an index and taking off those DISTINCTs.

On Mon, 21 Aug 2017 at 15:33, Justin Pryzby wrote:
> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>
> Would you send EXPLAIN ANALYZE and not just EXPLAIN?
>
> Justin
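A minimal sketch of that suggestion (the index names are made up for illustration; the composite variant is an assumption, since the thread's queries filter on import_num together with login):

```sql
-- Plain index on the semi-join column
CREATE INDEX idx_test_tab_login ON test_tab (login);

-- Or, because every subquery filters on import_num AND login,
-- a composite index may serve both predicates at once (assumption)
CREATE INDEX idx_test_tab_import_login ON test_tab (import_num, login);
```

With an index available, the DISTINCTs inside the IN/EXCEPT subqueries become unnecessary work, since the semi-join already deduplicates.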
Re: [PERFORM] Performance Issue -- "Materialize"
On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> Aggregate  (cost=351405.08..351405.09 rows=1 width=8)

Would you send EXPLAIN ANALYZE and not just EXPLAIN?

Justin
Re: [PERFORM] Performance Issue -- "Materialize"
Any thoughts on this?
Re: [PERFORM] Performance issue in PostgreSQL server...
On Mon, Mar 06, 2017 at 12:17:22PM +, Dinesh Chandra 12108 wrote:
> Below is the output of the query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id';
>
> schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation
> "evidence" | "point" | "domain_class_id" | f | 0 | 8 | 10 | "{7,9,2,11,43,3,1,10,4,17}" | "{0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}" | "" | 0.889078

On Fri, Mar 03, 2017 at 12:44:07PM +, Dinesh Chandra 12108 wrote:
> ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id. In which case, why isn't it just faster?

I missed your response until now, and can't see that anybody else responded, but I suspect the issue is that the *table* is highly correlated with respect to this column, while the index may not be, probably due to duplicated index keys. Postgres only stores statistics on expression indexes, and for a simple index it falls back to the correlation of the table column. If you're still fighting this, would you send the result of:

SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;

or,

SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;

If there's much repetition in the index keys, then PG's planner thinks an index scan has a low random_page_cost (effective_cache_size has little effect on large tables), and it never uses a bitmap scan, which blows up if the index is fragmented and has duplicate keys.
The table reads end up costing something like 1454751*random_page_cost in nonsequential reads and fseek() calls, when the planner thinks it will cost only 1454751*16*seq_page_cost. Is the query much faster if you first REINDEX point_domain_class_id_index?

This has come up before; see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520d6610.8040...@emulex.com
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com
> https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou...@4ax.com

Justin
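The suggested experiment, spelled out (index and column names from the thread; note REINDEX takes an exclusive lock on writes to the table, so this is a sketch for a test system, not a production recipe):

```sql
-- Rebuild the index so that duplicate keys end up stored contiguously
REINDEX INDEX point_domain_class_id_index;

-- The planner's notion of clustering comes from the table column's
-- correlation, not from the physical order of the index itself
SELECT correlation
FROM pg_stats
WHERE tablename = 'point' AND attname = 'domain_class_id';
```

If the query is much faster after the REINDEX while the pg_stats correlation is unchanged, that supports the theory that the index, not the table, was the fragmented part.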
Re: [PERFORM] Performance issue after upgrading from 9.4 to 9.6
On Mon, Mar 6, 2017 at 7:20 AM, Piotr Gasidło wrote:
> We are having some performance issues after we upgraded to the newest
> version of PostgreSQL; before that everything was fast and smooth.
>
> The upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. We have
> since upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: FreeBSD, Solaris (SmartOS), simple
> master-slave using streaming replication.
>
> Problem:
> Very high system CPU when the master is streaming replication data; CPU
> goes up to 77%. Only one process is generating this load: the postgresql
> startup process. When I attached truss to this process I saw a lot of
> read calls with almost the same number of errors (EAGAIN).
>
> root@d8:~ # truss -c -p 38091
> ^C
> syscall        seconds      calls  errors
> semop      0.001611782        198       0
> write      0.000074404          2       0
> read       2.281535100      17266   12375
> openat     0.000683532         48       0
> lseek      0.177612479      20443       0
> close      0.000395549         48       0
>          -------------    -------  ------
>            2.461912846      38005   12375
>
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
> lseek(778,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
> read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
> lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
> read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
> read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
> lseek(818,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
> lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
> read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe.
>
> The read call tries to read one byte over and over. I looked at the
> source code and I think this file is responsible for this behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

Is a git bisect out of the question?

merlin
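For reference, a bisect over the 9.4-to-9.6 development range might look roughly like this (a sketch; the chosen good/bad tags are assumptions, and the per-step test is whatever script reproduces the high system CPU under streaming replication):

```shell
# Bisect the PostgreSQL source between the last known-good and the
# first known-bad release tags
git clone https://git.postgresql.org/git/postgresql.git
cd postgresql
git bisect start
git bisect bad REL9_6_1     # behavior is bad here (assumption)
git bisect good REL9_4_0    # ...and was fine here (assumption)

# At each step: build, install, replay the replication workload,
# watch system CPU on the startup process, then report:
#   git bisect good    # or: git bisect bad
# Repeat until git prints the first bad commit.
```

Each step requires a full build and a workload replay, so with roughly a dozen steps this is a day-scale effort, but it pinpoints the exact commit (plausibly in the latch.c rework noted above).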
Re: [PERFORM] Performance issue in PostgreSQL server...
Dear Justin,

Below is the output of the query SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id';

schemaname        | "evidence"
tablename         | "point"
attname           | "domain_class_id"
inherited         | f
null_frac         | 0
avg_width         | 8
n_distinct        | 10
most_common_vals  | "{7,9,2,11,43,3,1,10,4,17}"
most_common_freqs | "{0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}"
histogram_bounds  | ""
correlation       | 0.889078

Regards,
Dinesh Chandra | Database administrator (Oracle/PostgreSQL) | Cyient Ltd. Noida.

-----Original Message-----
From: Justin Pryzby [mailto:pry...@telsasoft.com]
Sent: 06 March, 2017 10:54 AM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: Nur Agus <nuragus.li...@gmail.com>; Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108
> <dinesh.chan...@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN
> > evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
> > WHERE p.domain_class_id IN (11) AND (p.modification_time >
> > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
> > 00:06:56.056 IST') ORDER BY feature_id
> ...
> > ->  Index Scan using point_domain_class_id_index on
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> > time=27.265..142101.159 rows=1607491 loops=1)
> >       Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is
> well clustered on domain_class_id. In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;
.. or, if that's too verbose or you don't want to share the histogram or MCV list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin

DISCLAIMER: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. Check all attachments for viruses before opening them. All views or opinions presented in this e-mail are those of the author and may not reflect the opinion of Cyient or those of our affiliates.
Re: [PERFORM] Performance issue in PostgreSQL server...
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
> > ->  Index Scan using point_domain_class_id_index on
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> > time=27.265..142101.159 rows=1607491 loops=1)
> >       Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id. In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or, if that's too verbose or you don't want to share the histogram or MCV list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

Justin
Re: [PERFORM] Performance issue in PostgreSQL server...
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <dinesh.chan...@cyient.com> wrote:
> Dear Nur,
>
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
>
> ...
>            ->  Index Scan using point_domain_class_id_index on
> point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> time=27.265..142101.159 rows=1607491 loops=1)
>                  Index Cond: (domain_class_id = 11)

Why wouldn't this be using a bitmap scan rather than a regular index scan? It seems like it should prefer the bitmap scan, unless the table is well clustered on domain_class_id. In which case, why isn't it just faster?

You could try repeating the EXPLAIN ANALYZE after setting enable_indexscan = off to see what that gives. If it gives a seq scan, then repeat with enable_seqscan also turned off. Or if it gives the bitmap scan, then repeat with enable_bitmapscan turned off.

How many rows are in point, and how big is it?

The best bet for making this better might be to have an index on (domain_class_id, modification_time) and hope for an index-only scan. Except that you are on 9.1, so first you would have to upgrade. Upgrading would also allow you to use BUFFERS in the EXPLAIN ANALYZE, as well as track_io_timing, both of which would be pretty nice to see. Using 9.1 is like having one hand tied behind your back.

Also, any idea why this execution of the query is 15 times faster than the execution you found in the log file? Was the top output you showed in the first email captured while the really slow query was running, or was that from a different period?

Cheers, Jeff
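Spelled out, the suggested experiments might look like this (a sketch; the SET statements only affect the current session, and the index name is an arbitrary choice for Jeff's suggested composite index):

```sql
-- Step 1: force the planner away from the plain index scan
SET enable_indexscan = off;
-- ...re-run the EXPLAIN ANALYZE from the thread here...

-- Step 2a: if that produced a seq scan, disable it too and re-run
SET enable_seqscan = off;

-- Step 2b: or, if it produced the bitmap scan, reset and disable that instead
RESET enable_indexscan;
SET enable_bitmapscan = off;

-- The suggested composite index for an index-only scan (requires 9.2+)
CREATE INDEX point_dcid_modtime_idx
    ON evidence.point (domain_class_id, modification_time);
```

Comparing the three resulting plans (and their actual times) shows whether the planner's cost model or the physical layout of the index is the real problem.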
Re: [PERFORM] Performance issue in PostgreSQL server...
Dinesh Chandra 12108 writes:
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id
> FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON
> p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time
> > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

I think that's a fundamentally slow query and you're not going to be able to make it better without rethinking your requirements and/or data representation.

As written, that requires the server to form the entire join of p to oe on feature_id, with the only filter before the join being the evidently-none-too-selective domain_class_id condition. Only after joining can it apply the OR condition. So this is inherently processing a lot of rows.

If the OR arms were individually pretty selective you could rewrite this into a UNION of two joins, a la the discussion at

https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873...@bluetreble.com

but given the dates involved I'm betting that won't help very much. Or maybe you could try

select feature_id from p
  where domain_class_id IN (11)
    and p.modification_time > '2015-05-10 00:06:56.056 IST'
intersect
select feature_id from oe
  where oe.modification_time > '2015-05-10 00:06:56.056 IST'
order by feature_id

although I'm not entirely certain that that has exactly the same semantics (-ENOCAFFEINE), and it might still be none too quick.

regards, tom lane
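A sketch of the UNION rewrite mentioned above (my own expansion, not from the thread; each arm applies one side of the OR before the join, and UNION deduplicates, which is assumed safe here only because feature_id is the lone output column and the query already does DISTINCT):

```sql
SELECT feature_id
FROM evidence.point p
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND p.modification_time > '2015-05-10 00:06:56.056 IST'
UNION
SELECT feature_id
FROM evidence.point p
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND oe.modification_time > '2015-05-10 00:06:56.056 IST'
ORDER BY feature_id;
```

This only wins if each timestamp predicate is selective enough that its arm can use an index; with cutoffs as old as 2015 (as Tom notes), both arms likely match most of the table and the rewrite buys little.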
Re: [PERFORM] Performance issue in PostgreSQL server...
Dear Nur,

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

QUERY PLAN
----------
Unique  (cost=1679730.32..1679837.46 rows=21428 width=8) (actual time=154753.528..155657.818 rows=1607489 loops=1)
  ->  Sort  (cost=1679730.32..1679783.89 rows=21428 width=8) (actual time=154753.514..155087.734 rows=4053270 loops=1)
        Sort Key: p.feature_id
        Sort Method: quicksort  Memory: 288302kB
        ->  Hash Join  (cost=1501657.09..1678188.87 rows=21428 width=8) (actual time=144146.620..152050.311 rows=4053270 loops=1)
              Hash Cond: (oe.evd_feature_id = p.feature_id)
              Join Filter: ((p.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone) OR (oe.modification_time > '2015-05-10 03:36:56.056+05:30'::timestamp with time zone))
              ->  Seq Scan on observation_evidence oe  (cost=0.00..121733.18 rows=5447718 width=16) (actual time=0.007..1534.905 rows=5434406 loops=1)
              ->  Hash  (cost=1483472.70..1483472.70 rows=1454751 width=16) (actual time=144144.653..144144.653 rows=1607491 loops=1)
                    Buckets: 262144  Batches: 1  Memory Usage: 75352kB
                    ->  Index Scan using point_domain_class_id_index on point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.159 rows=1607491 loops=1)
                          Index Cond: (domain_class_id = 11)
Total runtime: 155787.379 ms
(13 rows)

Regards,
Dinesh Chandra | Database administrator (Oracle/PostgreSQL) | Cyient Ltd. Noida.
Mobile: +91-9953975849 | Ext 1078 | dinesh.chan...@cyient.com
Plot No. 7, NSEZ, Phase-II, Noida-Dadri Road, Noida - 201 305, India.
From: Nur Agus [mailto:nuragus.li...@gmail.com]
Sent: 03 March, 2017 5:54 PM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

Hello Dinesh,

You can try the EXPLAIN tool:

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

Then paste the result here.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <dinesh.chan...@cyient.com> wrote:

Dear Experts,

I need your suggestions to resolve a performance issue reported on our PostgreSQL 9.1 production database, 1.5 TB in size. I have observed that some SELECT queries with an ORDER BY clause are taking a lot of time to execute, forcing applications to respond slowly.
The configuration of the database server is:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
CPUs:                  8
Core(s) per socket:    4
Socket(s):             2
Model name:            Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM:  32 GB
SWAP: 8 GB

Kernel parameters:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824

Values of postgresql.conf parameters:

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1    # min 32kB, -1 sets based on shared_buffers

Queries taking a lot of time:
=============================

2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute: SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY feature_id

Top command output:

top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.8
Re: [PERFORM] Performance issue in PostgreSQL server...
Hello Dinesh,

You can try the EXPLAIN tool:

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

Then paste the result here.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <dinesh.chan...@cyient.com> wrote:
> Dear Experts,
>
> I need your suggestions to resolve the performance issue reported on our
> *PostgreSQL 9.1* production database having 1.5 TB *size*. I have observed
> that some select queries with an order by clause are taking a lot of time
> in execution and forcing applications to give slow response.
>
> The configuration of the database server is:
>
> Architecture:       x86_64
> CPU op-mode(s):     32-bit, 64-bit
> CPUs:               8
> Core(s) per socket: 4
> Socket(s):          2
> Model name:         Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
>
> *RAM*:  32 GB
> *SWAP*: 8 GB
>
> *Kernel parameters*:
>
> kernel.shmmax = 32212254720
> kernel.shmall = 1073741824
>
> Values of postgresql.conf parameters:
>
> shared_buffers = 10GB
> temp_buffers = 32MB
> work_mem = 512MB
> maintenance_work_mem = 2048MB
> max_files_per_process = 2000
> checkpoint_segments = 200
> max_wal_senders = 5
> wal_buffers = -1    # min 32kB, -1 sets based on shared_buffers
>
> *Queries taking a lot of time:*
> ==============================
>
> 2017-03-02 00:46:50 IST LOG: duration: 2492951.927 ms execute: SELECT
> DISTINCT feature_id FROM evidence.point p INNER JOIN
> evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE
> p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10
> 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST')
> ORDER BY feature_id
>
> 2017-03-02 01:05:16 IST LOG: duration: 516250.512 ms execute: SELECT
> DISTINCT
feature_id FROM evidence.point p INNER JOIN
> evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE
> p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10
> 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST')
> ORDER BY feature_id
>
> *Top command output*:
>
> top - 15:13:15 up 66 days, 3:45, 8 users, load average: 1.84, 1.59, 1.57
> Tasks: 830 total, 1 running, 828 sleeping, 0 stopped, 1 zombie
> Cpu(s): 3.4%us, 0.7%sy, 0.0%ni, 81.7%id, 14.2%wa, 0.0%hi, 0.0%si, 0.0%st
> *Mem:*  32830016k total, *32142596k* used, *687420k* free,   77460k buffers
> Swap:    8190972k total,   204196k used, 7986776k free, 27981268k cached
>
>   PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 30639 postgres 20  0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres
> 18185 postgres 20  0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres
> 16543 postgres 20  0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres
> 14710 postgres 20  0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres
>  1214 root     20  0 15668 1848  896 S  1.0  0.0 130:46.43 top
> 13462 postgres 20  0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres
> 20081 root     20  0 15668 1880  936 R  1.0  0.0   0:00.12 top
> 13478 postgres 20  0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres
> 41107 root     20  0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3
>  2680 root     20  0     0    0    0 S  0.3  0.0 103:38.54 nfsiod
>  3558 root     20  0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server
> 15576 root     20  0     0    0    0 S  0.3  0.0   0:01.16 flush-253:1
> 18430 postgres 20  0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres
> 20083 postgres 20  0  105m 1852 1416 S  0.3  0.0   0:00.01 bash
> 24188 postgres 20  0  102m 1856  832 S  0.3  0.0   0:23.39 sshd
> 28250 postgres 20  0  156m 1292  528 S  0.3  0.0   0:46.86 postgres
>     1 root     20  0 19356 1188  996 S  0.0  0.0   0:05.00 init
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)|*
>
Re: [PERFORM] Performance issue with castings args of the function
Thanks all for the explanation!

On Mon, Jan 2, 2017 at 9:36 PM, Tom Lane wrote:
> Andrey Khozov writes:
> > create table t1 (id serial, str char(32));
>
> > create function f1(line text) returns void as $$
> > begin
> >   perform * from t1 where str = line;
> > end;
> > $$ language plpgsql;
>
> This query is specifying a text comparison (text = text operator).
> Since the table column isn't text, a char-to-text conversion must
> happen at each line.
>
> > create function f2(line char) returns void as $$
> > begin
> >   perform * from t1 where str = line;
> > end;
> > $$ language plpgsql;
>
> This query is specifying a char(n) comparison (char = char operator).
> No type conversion step needed, so it's faster.
>
> regards, tom lane

--
Andrey Khozov
Re: [PERFORM] Performance issue with castings args of the function
Andrey Khozov writes:
> create table t1 (id serial, str char(32));

> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a text comparison (text = text operator). Since the table column isn't text, a char-to-text conversion must happen at each line.

> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a char(n) comparison (char = char operator). No type conversion step needed, so it's faster.

regards, tom lane
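The practical takeaway can be sketched like this (my own suggestion rather than something from the thread; the one-time cast inside f1_fixed is an assumption about how one would avoid the per-row conversion):

```sql
-- The column is char(32), so keep the comparison char-vs-char:
-- either declare the parameter with the matching type...
create function f2_fixed(line char(32)) returns void as $$
begin
  perform * from t1 where str = line;   -- char = char, no per-row conversion
end;
$$ language plpgsql;

-- ...or, if the caller really hands over text, cast it once up front
-- so the comparison inside the scan is still char = char
create function f1_fixed(line text) returns void as $$
begin
  perform * from t1 where str = line::char(32);
end;
$$ language plpgsql;
```

Either way the predicate stays in the column's native type, so an index on str also remains usable.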
Re: [PERFORM] Performance issue with castings args of the function
Hi

2017-01-02 15:34 GMT+01:00 Андрей Хозов:
> Hello there!
>
> I have a performance issue with functions and argument types.
>
> Table and data:
> create table t1 (id serial, str char(32));
> insert into t1 (str) select md5(s::text) from generate_series(1, 100) as s;
>
> And simple functions:
> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> Query:
> test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
> Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010 rows=1 loops=1)
> Planning time: 0.039 ms
> Execution time: 189.039 ms
> (3 rows)
>
> Time: 189.524 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
> Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735 rows=1 loops=1)
> Planning time: 0.024 ms
> Execution time: 513.757 ms
> (3 rows)
>
> Time: 514.125 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf'::char(32));
> Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509 rows=1 loops=1)
> Planning time: 0.074 ms
> Execution time: 513.535 ms
> (3 rows)

This explain shows nothing useful; you need to see the nested plans. Look at auto_explain:

https://www.postgresql.org/docs/current/static/auto-explain.html

Maybe the index was not used due to the different types.

Regards

Pavel

> Time: 514.104 ms
> test=>
>
> It seems that casting the param from text to char(32) needs to be done only
> once, so f1 and f2 should be identical in performance. But f1, with the
> text param, is significantly slower, even when the arg is cast while
> passing it to the function.
>
> Tested PostgreSQL versions 9.5.5 and 9.6.1 on Ubuntu 16.04. Is this normal
> behavior, or can it be fixed?
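A minimal auto_explain setup for seeing the plan of the SELECT inside the function might look like this (a sketch; session-level settings, and logging every statement with a zero threshold is only sensible for a test session):

```sql
-- Load the module into this session (no server restart needed)
LOAD 'auto_explain';

SET auto_explain.log_min_duration = 0;       -- log every statement's plan
SET auto_explain.log_nested_statements = on; -- include SQL run inside functions
SET auto_explain.log_analyze = on;           -- include actual times/row counts

-- The plan of the PERFORM inside f1()/f2() now appears in the server log
SELECT f1('2b00042f7481c7b056c4b410d28f33cf');
```

The logged nested plan will show whether the text-vs-char comparison forced a per-row type conversion or defeated an index.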
Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke wrote:
> Hi,
>
> I have a performance issue with bitmap index scans on huge amounts of big
> jsonb documents.
>
> = Background =
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
> = Example =
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
>> 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
>   i,
>   (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text)))
>    from generate_series(0, 100) j) big_jsonb
> from
>   generate_series(0, 10) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
> -- query with bitmap scan
> explain analyze
> select count(*)
> from bitmap_scan_test
> where big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 rows=1 loops=1)
>   ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 width=0) (actual time=16.496..617.431 rows=5 loops=1)
>         Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
>         Heap Blocks: exact=637
>         ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 loops=1)
>               Index Cond: (big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
> perf top -p ... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object  Symbol
>   51.06%  postgres       [.] pglz_decompress
>    7.33%  libc-2.12.so   [.] memcpy

Another thing to possibly look at is configuring the column not to compress; over half the time is spent decompressing the data.
See:

    ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this is a huge tradeoff, so do some careful analysis before making the change.

merlin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
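Against the example table from the report, merlin's suggestion would look roughly like this (a sketch, not a drop-in fix; EXTERNAL keeps large values out-of-line in TOAST but skips pglz compression):

```sql
-- Store big_jsonb uncompressed from now on; note that existing rows keep
-- their old (compressed) representation until they are rewritten.
ALTER TABLE bitmap_scan_test ALTER COLUMN big_jsonb SET STORAGE EXTERNAL;
```

This trades disk space (and cache hit rate) for the CPU currently burned in pglz_decompress, which is why careful analysis is advised first.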
Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
Thanks for the explanation!

Best Regards,
Marc-Olaf

Marc-Olaf Jaschke · Softwareentwickler
shopping24 GmbH
Werner-Otto-Straße 1-7 · 22179 Hamburg
Telefon: +49 (0) 40 6461 5830 · Fax: +49 (0) 40 64 61 7879
marc-olaf.jasc...@s24.com · www.s24.com
AG Hamburg HRB 63371
vertreten durch Dr. Björn Schäfers und Martin Mildner

2016-12-05 3:28 GMT+01:00 Jeff Janes:
>>> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>>
>> I wonder why bitmap heap scan adds such a big amount of time on top of
>> the plain bitmap index scan.
>> It seems to me, that the recheck is active although all blocks are exact
>> [1] and that pg is loading the jsonb for the recheck.
>>
>> Is this an expected behavior?
>
> Yes, this is expected. The gin index is lossy. It knows that all the
> elements are present (except when it doesn't--large elements might get
> hashed down and suffer hash collisions), but it doesn't know what the
> recursive structure between them is, and has to do a recheck.
>
> For example, if you change your example where clause to:
>
> big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> You will see that the index still returns 50,000 rows, but now all of them
> get rejected upon the recheck.
>
> You could try changing the type of index to jsonb_path_ops. In your given
> example, it won't make a difference, because you are actually counting half
> the table and so half the table needs to be rechecked. But in my example,
> jsonb_path_ops successfully rejects all the rows at the index stage.
>
> Cheers,
>
> Jeff
Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> I wonder why bitmap heap scan adds such a big amount of time on top of the
> plain bitmap index scan.
> It seems to me, that the recheck is active although all blocks are exact
> [1] and that pg is loading the jsonb for the recheck.
>
> Is this an expected behavior?

Yes, this is expected. The gin index is lossy. It knows that all the elements are present (except when it doesn't--large elements might get hashed down and suffer hash collisions), but it doesn't know what the recursive structure between them is, and has to do a recheck.

For example, if you change your example where clause to:

    big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';

You will see that the index still returns 50,000 rows, but now all of them get rejected upon the recheck.

You could try changing the type of index to jsonb_path_ops. In your given example, it won't make a difference, because you are actually counting half the table and so half the table needs to be rechecked. But in my example, jsonb_path_ops successfully rejects all the rows at the index stage.

Cheers,

Jeff
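For reference, the jsonb_path_ops variant Jeff mentions would be created like this against the example table (the index name here is illustrative):

```sql
-- jsonb_path_ops indexes hashes of complete key paths rather than
-- individual keys and values, so structurally wrong matches like the
-- {"filler": 1, "x": "..."} example can be rejected at the index stage.
CREATE INDEX bitmap_scan_test_big_jsonb_path_idx
    ON bitmap_scan_test USING gin (big_jsonb jsonb_path_ops);
```

Note that jsonb_path_ops only supports containment-style operators such as `@>`, which is what this workload uses.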
Re: [PERFORM] Performance issue with NestedLoop query
On Tue, Aug 4, 2015 at 8:40 PM, Ram N yrami...@gmail.com wrote:
> Thanks much for responding guys. I have tried both, building multi column
> indexes and GIST, with no improvement. I have reduced the window from 180
> days to 30 days and below are the numbers
>
> Composite index - takes 30 secs
> With Btree indexing - takes 9 secs
> With GIST - takes 30 secs with kind of materialize plan in explain
>
> Any other ideas I can do for window based joins.

From this query:

    select sum(a), count(id), a.ts, st
    from table1 a, table2 b
    where a.ts > b.start_date and a.ts < b.end_date
      and a.ts > '2015-01-01 20:50:44.00 +00:00:00'
      and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
    group by a.ts, st
    order by a.ts

We can actually derive that b.start_date < '2015-07-01 19:50:44.00 +00:00:00' and b.end_date > '2015-01-01 20:50:44.00 +00:00:00'. If we add these two predicates to the original query, does it help?

Thanks,
Qingqing
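Qingqing's suggestion, written out (a sketch; the comparison operators were stripped by the archive and are restored here from the join conditions shown in the EXPLAIN output later in the thread):

```sql
SELECT sum(a), count(id), a.ts, st
FROM table1 a, table2 b
WHERE a.ts > b.start_date
  AND a.ts < b.end_date
  AND a.ts > '2015-01-01 20:50:44.00 +00:00:00'
  AND a.ts < '2015-07-01 19:50:44.00 +00:00:00'
  -- derived bounds: they are implied by the predicates above, but
  -- stating them explicitly lets the planner narrow the scan of table2
  AND b.start_date < '2015-07-01 19:50:44.00 +00:00:00'
  AND b.end_date   > '2015-01-01 20:50:44.00 +00:00:00'
GROUP BY a.ts, st
ORDER BY a.ts;
```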
Re: [PERFORM] Performance issue with NestedLoop query
Thanks much for responding guys. I have tried both, building multi column indexes and GIST, with no improvement. I have reduced the window from 180 days to 30 days and below are the numbers

Composite index - takes 30 secs
With Btree indexing - takes 9 secs
With GIST - takes 30 secs with kind of materialize plan in explain

Any other ideas I can do for window based joins.

--yr

On Fri, Jul 31, 2015 at 11:37 AM, Qingqing Zhou zhouqq.postg...@gmail.com wrote:
> On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:
>> Thanks Qingqing for responding. That didn't help. It in fact increased
>> the scan time. Looks like a lot of time is being spent on the NestedLoop
>> Join than index lookups though I am not sure how to optimize the join.
>
> Good news is that optimizer is right this time :-). The NLJ here does
> almost nothing but schedule each outer row to probing the inner index. So
> the index seek is the major cost.
>
> Have you tried build a two column index on (b.start_date, b.end_date)?
>
> Regards,
> Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Thu, Jul 30, 2015 at 4:51 AM, Ram N yrami...@gmail.com wrote:
> select sum(a), count(id), a.ts, st
> from table1 a, table2 b
> where a.ts > b.start_date and a.ts < b.end_date
>   and a.ts > '2015-01-01 20:50:44.00 +00:00:00'
>   and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
> group by a.ts, st
> order by a.ts

You could try to use a range type:

    CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()'));

Then:

    select sum(a), count(id), a.ts, st
    from table1 a, table2 b
    where tstzrange(b.start_date, b.end_date, '()') @> a.ts
      and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
    group by a.ts, st
    order by a.ts

Regards,
--
Matheus de Oliveira
Re: [PERFORM] Performance issue with NestedLoop query
On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:
> Thanks Qingqing for responding. That didn't help. It in fact increased the
> scan time. Looks like a lot of time is being spent on the NestedLoop Join
> than index lookups though I am not sure how to optimize the join.

Good news is that optimizer is right this time :-). The NLJ here does almost nothing but schedule each outer row to probing the inner index. So the index seek is the major cost.

Have you tried build a two column index on (b.start_date, b.end_date)?

Regards,
Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Fri, Jul 31, 2015 at 3:06 PM, Matheus de Oliveira matioli.math...@gmail.com wrote: CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()')); The index should be USING GIST, not GIN. Sorry. -- Matheus de Oliveira
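Putting the correction together with the earlier suggestion, the index definition becomes:

```sql
-- GiST supports range containment (@>) natively for range types,
-- which is why GIST rather than GIN is needed here:
CREATE INDEX ON table2 USING gist (tstzrange(start_date, end_date, '()'));
```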
Re: [PERFORM] Performance issue with NestedLoop query
Thanks Qingqing for responding. That didn't help. It in fact increased the scan time. Looks like a lot of time is being spent on the NestedLoop Join than index lookups though I am not sure how to optimize the join. I am assuming it's an in-memory join, so I am not sure why it should take such a lot of time. Increasing work_mem has helped in reducing the processing time, but it's still > 1 min.

--yr

On Thu, Jul 30, 2015 at 1:24 PM, Qingqing Zhou zhouqq.postg...@gmail.com wrote:
> On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
>> ->  Index Scan using end_date_idx on public.table2 b
>>     (cost=0.43..23181.37 rows=345833 width=52)
>>     (actual time=0.063..622.274 rows=403936 loops=181)
>>       Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date
>>       Index Cond: (a.ts < b.end_date)
>>       Filter: (a.ts > b.start_date)
>>       Rows Removed by Filter: 392642
>
> In your case, do you have index built for both b.end_date and
> b.start_date? If so, can you try set enable_indexscan=off to see if
> bitmap heap scan helps?
>
> Regards,
> Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
> ->  Index Scan using end_date_idx on public.table2 b
>     (cost=0.43..23181.37 rows=345833 width=52)
>     (actual time=0.063..622.274 rows=403936 loops=181)
>       Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date
>       Index Cond: (a.ts < b.end_date)
>       Filter: (a.ts > b.start_date)
>       Rows Removed by Filter: 392642

In your case, do you have index built for both b.end_date and b.start_date? If so, can you try set enable_indexscan=off to see if bitmap heap scan helps?

Regards,
Qingqing
Re: [PERFORM] Performance issue: index not used on GROUP BY...
Thanks for the feedback, everybody. I spent a couple of days trying to optimise this. As mentioned, the increased memory is not an option for me, as this query is part of a report that can be run by any user on an ad hoc basis. Allocating the required memory to any session on demand is not feasible in this environment. In the end, it seems to me that a more sustainable solution will be to introduce an additional table to carry the summarized values and look up on that table in this type of scenario.

Regards

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Performance-issue-index-not-used-on-GROUP-BY-tp5816702p5817622.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
Re: [PERFORM] Performance issue: index not used on GROUP BY...
2014-08-28 11:50 GMT+03:00 gmb gmbou...@gmail.com: It seems as if the planner is not using the PRIMARY KEY as index which was my assumption. Can you send `EXPLAIN (analyze, buffers)` for your query instead? It'll show exactly what's going on. -- Victor Y. Yegorov
Re: [PERFORM] Performance issue: index not used on GROUP BY...
> Can you send `EXPLAIN (analyze, buffers)` for your query instead? It'll
> show exactly what's going on.

GroupAggregate (cost=303425.31..339014.43 rows=136882 width=48) (actual time=4708.181..6688.699 rows=287268 loops=1)
  Buffers: shared read=23899, temp read=30974 written=30974
  ->  Sort (cost=303425.31..306847.34 rows=1368812 width=48) (actual time=4708.170..5319.429 rows=1368744 loops=1)
        Sort Key: co_id, client_id, doc_no,
        Sort Method: external merge  Disk: 80304kB
        Buffers: shared read=23899, temp read=30974 written=30974
        ->  Seq Scan on ddetail (cost=0.00..37587.12 rows=1368812 width=48) (actual time=0.122..492.964 rows=1368744 loops=1)
              Buffers: shared read=23899
Total runtime: 6708.244 ms

My initial attempt was this (this is what I actually need):

    SELECT co_id, client_id, doc_no, line_id, batch_no, sum( amount )
    FROM ddetail
    GROUP BY co_id, client_id, doc_no, line_id, batch_no;

but I removed column batch_no from the query because I thought this was the cause of the problem (batch_no is not part of my PK).

Thanks
Re: [PERFORM] Performance issue: index not used on GROUP BY...
2014-08-28 12:08 GMT+03:00 gmb gmbou...@gmail.com:
> GroupAggregate (cost=303425.31..339014.43 rows=136882 width=48) (actual
> time=4708.181..6688.699 rows=287268 loops=1)
>   Buffers: shared read=23899, temp read=30974 written=30974
>   ->  Sort (cost=303425.31..306847.34 rows=1368812 width=48) (actual
>       time=4708.170..5319.429 rows=1368744 loops=1)
>         Sort Key: co_id, client_id, doc_no,
>         Sort Method: external merge  Disk: 80304kB
>         Buffers: shared read=23899, temp read=30974 written=30974
>         ->  Seq Scan on ddetail (cost=0.00..37587.12 rows=1368812
>             width=48) (actual time=0.122..492.964 rows=1368744 loops=1)
>               Buffers: shared read=23899
> Total runtime: 6708.244 ms
>
> My initial attempt was this (this is what I actually need):
>
> SELECT co_id, client_id, doc_no, line_id, batch_no, sum( amount )
> FROM ddetail
> GROUP BY co_id, client_id, doc_no, line_id, batch_no;

I think index will be of no help here, as (1) you're reading whole table anyway and (2) `amount` is not part of your index.

Try to avoid disk-based sort by increasing `work_mem` for your session, I think value in the range 120MB-150MB should work:

    SET work_mem TO '150MB';

Check `EXPLAIN` output after the change.

--
Victor Y. Yegorov
Re: [PERFORM] Performance issue: index not used on GROUP BY...
On Thu, Aug 28, 2014 at 11:50 AM, gmb gmbou...@gmail.com wrote:
> Can somebody please confirm whether aggregate functions such as GROUP BY
> should use indexes ?

Yes, if the planner deems it faster than other approaches. It can make wrong choices for many reasons, but usually when your planner tunables like random_page_cost, effective_cache_size aren't set appropriately. There's some advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Just for the purpose of testing, you could try set enable_sort=false in your session and see if that makes it faster.

On Thu, Aug 28, 2014 at 12:08 PM, gmb gmbou...@gmail.com wrote:
> Sort Key: co_id, client_id, doc_no,

Something went missing from this line...

> Sort Method: external merge  Disk: 80304kB

Depends on your hardware and workloads, but more work_mem may also improve queries to avoid sorts and hashes needing to use disk. But beware, setting it too high may result in your server running out of memory.

Regards,
Marti
Re: [PERFORM] Performance issue: index not used on GROUP BY...
Thanks for these suggestions. Unfortunately, I don't have a lot of memory available (65 connections, work_mem = 64MB in pg conf).

> I think index will be of no help here, as (1) you're reading whole table
> anyway and (2) `amount` is not part of your index.

I did not think that the field being used in the agg function should also be part of the index. I'll try this and check the result. My problem is that dropping / adding indexes on this table takes a LOT of time, so I'm stuck with doing the tests using the indexes as is, or doing the tests on a smaller dataset. On the smaller dataset (1.5 mill records on that table) the planner did not take the index into account, even when I omit the amount column:

    CREATE INDEX ix_1 ON ddetail USING btree
        (co_id, client_id, doc_no, line_id, batch_no);

    SELECT co_id, client_id, doc_no, line_id, batch_no
    FROM ddetail
    GROUP BY co_id, client_id, doc_no, line_id, batch_no;

    HashAggregate (cost=54695.74..56064.49 rows=136875 width=22)
      ->  Seq Scan on debfdetail (cost=0.00..37586.44 rows=1368744 width=22)

still does a seq scan instead of the index scan. I guess it is possible that on the 1.4 million records, it is faster to do a seq scan? So I guess I'll have to try and do this on the 10 mill table and check the result there.
Re: [PERFORM] Performance issue: index not used on GROUP BY...
2014-08-28 14:29 GMT+03:00 gmb gmbou...@gmail.com: Unfortunately , I don't have a lot of memory available ( 65 connections , work_mem = 64MB in pg conf ). You don't have to change cluster-wide settings here. You can issue `SET` command from your client right before running your query, only your session will be affected. -- Victor Y. Yegorov
Re: [PERFORM] Performance issue: index not used on GROUP BY...
On 08/28/2014 01:50 AM, gmb wrote:
> Can somebody please confirm whether aggregate functions such as GROUP BY
> should use indexes ?

Sometimes. In your case, the index has one more column than the GROUP BY, which makes it less likely that Postgres will use it (since depending on the cardinality of ddet_id, it might actually be slower to use the index).

In addition, other folks on this thread have already pointed out the memory settings issues to you.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Performance issue with Insert
Jenish jenishv...@gmail.com wrote:

> I am using postgres 8.4.x

With x being what? On what OS and hardware?

> Table is having 3 before insert trigger and one after insert trigger.
> With all triggers enable it is inserting only 4-5 record per second.
> But if I disable after insert trigger it is able to insert 667 records
> per second.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> After insert trigger is recursive trigger.

So are you counting only the top level inserts or also the ones generated by the recursive inserts?

> My question. How to avoid the bottleneck?

First you need to find out what the bottleneck is.

> Parallel processing is possible in Postgres? How?

To achieve parallel processing in PostgreSQL you need to use multiple connections.

-Kevin
Re: [PERFORM] Performance issue with Insert
Hi,

DB : POSTGRES 8.4.8
OS : Debian
HD : SAS 10k rpm

Shared_buffer is 4096 (25% of RAM), effective_cache is 8GB (75% of RAM)

After insert trigger is again calling 2 more trigger and insert record in another table depends on condition. with all trigger enable there are 8 insert and 32 updates (approx. update is depends on hierarchy)

Plz explain multiple connections. Current scenario application server is sending all requests.

--
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 5:37 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Jenish jenishv...@gmail.com wrote:
>> I am using postgres 8.4.x
>
> With x being what? On what OS and hardware?
>
>> Table is having 3 before insert trigger and one after insert trigger.
>> With all triggers enable it is inserting only 4-5 record per second.
>> But if I disable after insert trigger it is able to insert 667 records
>> per second.
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>
>> After insert trigger is recursive trigger.
>
> So are you counting only the top level inserts or also the ones
> generated by the recursive inserts?
>
>> My question. How to avoid the bottleneck?
>
> First you need to find out what the bottleneck is.
>
>> Parallel processing is possible in Postgres? How?
>
> To achieve parallel processing in PostgreSQL you need to use multiple
> connections.
>
> -Kevin
Re: [PERFORM] Performance issue with Insert
> Hi,
>
> DB : POSTGRES 8.4.8
> OS : Debian
> HD : SAS 10k rpm
>
> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
>
> After insert trigger is again calling 2 more trigger and insert record in
> another table depends on condition. with all trigger enable there are 8
> insert and 32 updates(approx. update is depends on hierarchy)

Hi,

it's very difficult to give you reliable recommendations with this little info, but the triggers are obviously the bottleneck. We have no idea what queries are executed in them, but I guess there are some slow queries.

Find out what queries are executed in the triggers, benchmark each of them and make them faster. Just don't forget that those SQL queries are executed as prepared statements, so they may behave a bit differently than plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.

> Plz explain multiple connections. Current scenario application server is
> sending all requests.

PostgreSQL does not support parallel queries (i.e. a query distributed on multiple CPUs) so each query may use just a single CPU. If you're CPU bound (one CPU is 100% utilized but the other CPUs are idle), you can usually parallelize the workload on your own - just use multiple connections. But if you're using an application server and there are multiple connections used, this is not going to help you.

How many connections are active at the same time? Are the CPUs idle or utilized?

Tomas
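The PREPARE / EXPLAIN EXECUTE workflow Tomas describes would look like this (the table and predicate here are illustrative stand-ins for whatever the triggers actually run):

```sql
-- Benchmark a trigger's query the way the trigger executes it:
-- as a prepared statement with parameter placeholders.
PREPARE trig_lookup (int) AS
    SELECT count(*) FROM child_table WHERE parent_id = $1;

-- Shows the plan the prepared statement would actually use,
-- which may differ from EXPLAIN on the plain query:
EXPLAIN EXECUTE trig_lookup (42);

DEALLOCATE trig_lookup;
```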
Re: [PERFORM] Performance issue with Insert
Hi,

I have already checked all the statements present in the trigger, no one is taking more than 20 ms.

I am using 8-Processor, Quad-Core Server, CPU utilization is more than 90-95% for all. (htop result)

DB has 960 concurrent users.

io: writing 3-4 MB per second or less (iotop result).

Scenario: All insert are waiting for previous insert to complete. Cant we avoid this situation? What is the max_connections postgresql support?

Plz help

--
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 6:32 PM, t...@fuzzy.cz wrote:
>> Hi,
>>
>> DB : POSTGRES 8.4.8
>> OS : Debian
>> HD : SAS 10k rpm
>>
>> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
>>
>> After insert trigger is again calling 2 more trigger and insert record
>> in another table depends on condition. with all trigger enable there are
>> 8 insert and 32 updates(approx. update is depends on hierarchy)
>
> Hi,
>
> it's very difficult to give you reliable recommendations with this little
> info, but the triggers are obviously the bottleneck. We have no idea what
> queries are executed in them, but I guess there are some slow queries.
>
> Find out what queries are executed in the triggers, benchmark each of
> them and make them faster. Just don't forget that those SQL queries are
> executed as prepared statements, so they may behave a bit differently
> than plain queries. So use 'PREPARE' and 'EXPLAIN EXECUTE' to tune them.
>
>> Plz explain multiple connections. Current scenario application server is
>> sending all requests.
>
> PostgreSQL does not support parallel queries (i.e. a query distributed on
> multiple CPUs) so each query may use just a single CPU. If you're CPU
> bound (one CPU is 100% utilized but the other CPUs are idle), you can
> usually parallelize the workload on your own - just use multiple
> connections. But if you're using an application server and there are
> multiple connections used, this is not going to help you.
>
> How many connections are active at the same time? Are the CPUs idle or
> utilized?
>
> Tomas
Re: [PERFORM] Performance issue with Insert
On Mon, Jun 27, 2011 at 9:22 AM, Jenish jenishv...@gmail.com wrote:
> Hi All,
>
> I am facing some performance issue with insert into some table.
>
> I am using postgres 8.4.x
>
> Table is having 3 before insert trigger and one after insert trigger.
>
> With all triggers enable it is inserting only 4-5 record per second.
> But if I disable after insert trigger it is able to insert 667 records
> per second.
>
> After insert trigger is recursive trigger.
>
> My question. How to avoid the bottleneck?
> Parallel processing is possible in Postgres? How?
>
> Please give you suggestion.

this sounds like a coding issue -- to get to the bottom of this we are going to need to see the table and the triggers.

merlin
Re: [PERFORM] Performance issue with Insert
Dne 27.6.2011 17:58, Jenish napsal(a):
> Hi,
>
> I have already checked all the statements present in the trigger, no one
> is taking more than 20 ms.
>
> I am using 8-Processor, Quad-Core Server, CPU utilization is more than
> 90-95 % for all. (htop result)

So all cores are 95% utilized? That means you're CPU bound and you need to fix that somehow.

How much of that belongs to postgres? Are there other processes consuming significant portion of CPU? And what do you mean by 'utilized'? Does that mean user/sys time, or wait time?

> DB has 960 concurrent users.

What does that mean? Does that mean there are 960 active connections?

> io : writing 3-4 MB per second or less (iotop result).

Sequential or random? Post a few lines of 'iostat -x 1' and a few lines of 'vmstat 1' (collected when the database is busy).

> Scenario : All insert are waiting for previous insert to complete. Cant
> we avoid this situation ?

What do you mean by 'previous'? Does that mean another insert in the same session (connection), or something performed in another session?

> What is the max_connections postgresql support?

That limits number of background processes - each connection is served by a dedicated postgres process. You can see that in top / ps output. High values usually mean you need some kind of pooling (you probably already have one as you're using application server). And if the connections are really active (doing something all the time), this should not be significantly higher than the number of cores.

See, you have 8 cores, which means 8 seconds of CPU time each second. No matter how many connections you allow, you still have just those 8 seconds. So if you need to perform 100x something that takes 1 second, you need to spend 100 seconds of CPU time. So with those 8 cores, you can do that in about 12.5 seconds. Actually if you create too many connections, you'll notice it takes much more - there's an overhead with process management, context switching, locking etc.
regards
Tomas
Re: [PERFORM] Performance issue with Insert
Dne 27.6.2011 17:01, Jenish napsal(a):
> Hi,
>
> DB : POSTGRES 8.4.8
> OS : Debian
> HD : SAS 10k rpm
>
> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM

How much data are we talking about? Does that fit into the shared buffers or is it significantly larger? Do the triggers touch the whole database or just a small part of it (active part)?

regards
Tomas
Re: [PERFORM] Performance issue with Insert
Hi,

This server is the dedicated database server. And I am testing the limit for the concurrent active users. When I am running my test for 400 concurrent users, i.e. active connections, I am getting good performance, but when I am running the same test for 950 concurrent users I am getting very bad performance.

> Scenario : All insert are waiting for previous insert to complete.

I don't know whether it is the same session or different session.

DB is huge but triggers are not touching the whole database.

I'll provide the result set of vmstat and iostat tomorrow.

--
Thanks & regards,
JENISH VYAS

On Mon, Jun 27, 2011 at 10:48 PM, Tomas Vondra t...@fuzzy.cz wrote:
> Dne 27.6.2011 17:01, Jenish napsal(a):
>> Hi,
>>
>> DB : POSTGRES 8.4.8
>> OS : Debian
>> HD : SAS 10k rpm
>>
>> Shared_buffer is 4096 25 % of RAM , effective_cache is 8GB 75% of RAM
>
> How much data are we talking about? Does that fit into the shared buffers
> or is it significantly larger? Do the triggers touch the whole database
> or just a small part of it (active part)?
>
> regards
> Tomas
Re: [PERFORM] Performance issue with Insert
Dne 27.6.2011 22:14, Jenish napsal(a):
> And I am testing the limit for the concurrent active users. When I am
> running my test for 400 concurrent user ie. Active connection. I am
> getting good performance but when I am running the same test for 950
> concurrent users I am getting very bad performance.

This is typical behaviour - the performance is good up until some point, then it degrades much faster. Why do you even need such number of connections? Does that really improve performance (e.g. how many inserts do you do with 100 and 400 connections)? Such number of active connections is not going to give you any advantage I guess ...

regards
Tomas
Re: [PERFORM] Performance issue with Insert
Jenish jenishv...@gmail.com wrote:
> This server is the dedicated database server. And I am testing the limit
> for the concurrent active users. When I am running my test for 400
> concurrent user ie. Active connection. I am getting good performance but
> when I am running the same test for 950 concurrent users I am getting
> very bad performance.

To serve a large number of concurrent users you need to use a connection pooler which limits the number of database connections to a small number. Typically the most effective number of database connections is somewhere between the number of actual cores on your server and twice that plus the number of disk drives. (It depends on the details of your hardware and your load.) The connection pooler should queue requests which arrive when all database connections are busy and release them for execution as transactions complete.

Restricting the active database connections in this way improves both throughput and latency and will allow you to serve a much larger number of users without getting into bad performance; and when you do hit the wall performance will degrade more gracefully.

-Kevin
Re: [PERFORM] performance issue in the fields.
> I have created two tables. In the first table i am using many fields to
> store 3 address. as well as in b table, i am using array data type to
> store 3 address. is there any issue would face in performance related
> things which one will cause the performance issue.

The array is interesting:
- if you put a gist index on it and do searches like "array contains values X and Y and Z", the gist index has some special optimizations for this
- if you might store a variable number of integers, and for some reason you don't want a normalized one-line-per-value approach
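The gist-index idea above, sketched against the example table (this relies on the intarray contrib module for the integer-array GiST opclass; the CREATE EXTENSION syntax shown is from PostgreSQL 9.1 and later):

```sql
CREATE EXTENSION IF NOT EXISTS intarray;

-- GiST index using intarray's opclass:
CREATE INDEX ON b USING gist (address gist__int_ops);

-- "array contains values X and Y and Z":
SELECT * FROM b WHERE address @> ARRAY[1, 2, 3];
```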
Re: [PERFORM] performance issue in the fields.
On Mon, Feb 14, 2011 at 5:36 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
> Hello
>
> 2011/2/14 dba dba...@gmail.com:
>> create table a( address1 int, address2 int, address3 int)
>> create table b(address int[3])
>>
>> I have created two tables. In the first table i am using many fields to
>> store 3 address. as well as in b table, i am using array data type to
>> store 3 address. is there any issue would face in performance related
>> things which one will cause the performance issue.
>
> yes, there is. Planner can not to work well with foreign keys stored in
> array.

also the array variant is going to be bigger on disk. This is because as fields, all the important info about the fields is stored in the table header (inside the system catalogs). But with the array, various header information specific to the array has to be stored with each row. This is largely due to some questionable design decisions made in early array implementation that we are stuck with :-).

merlin
Re: [PERFORM] performance issue in the fields.
Hello 2011/2/14 dba dba...@gmail.com: create table a(address1 int, address2 int, address3 int) create table b(address int[3]) I have created two tables. In the first table I am using many fields to store 3 addresses; in table b I am using an array data type to store the 3 addresses. Is there any performance issue I would face with either approach? yes, there is. The planner cannot work well with foreign keys stored in an array. Regards Pavel Stehule
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
2009/6/17 Mark Steben mste...@autorevenue.com: A few details – I can always provide more

Could you send: 1. Exact text of query. 2. EXPLAIN ANALYZE output on each machine. 3. VACUUM VERBOSE output on each machine, or at least the last 10 lines. ...Robert
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
We have two machines. Both running Linux Redhat, both running postgres 8.2.5. Both have nearly identical 125 GB databases. In fact we use PITR Recovery to replicate from one to the other.

I have to ask the obvious question. Do you regularly analyze the machine you replicate to? Dave
Re: [PERFORM] Performance issue - 2 linux machines, identical configs, different performance
Yes, I analyze after each replication. Mark Steben│Database Administrator│@utoRevenue 95 Ashley Ave. West Springfield, MA, 01089 413-243-4800 x1512 (Phone) │ 413-732-1824 (Fax)
Re: [PERFORM] Performance issue using Tsearch2
On 2008-02-05 Viviane Lestic wrote:

QUERY PLAN
Sort (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1)
        ->  Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1)
              ->  Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1)
                    Filter: (tab_ocr.zoneindex_test @@ q.q)
                    ->  Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1)
                          Index Cond: (tab_ocr.zoneindex_test @@ q.q)
        ->  Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414)
              Index Cond: (tab_ocr.idstruct = tab_chemin.label)
              Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 270647.946 ms

Could someone help me analyze this problem?

Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so.

Regards Ansgar Wiechers -- The Mac OS X kernel should never panic because, when it does, it seriously inconveniences the user. --http://developer.apple.com/technotes/tn2004/tn2118.html ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
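For reference, raising the statistics target for one column and re-analyzing looks like this (column and table names taken from the plan above; 1000 is an aggressive but common choice):

```sql
ALTER TABLE tab_ocr ALTER COLUMN zoneindex_test SET STATISTICS 1000;
ANALYZE tab_ocr;
```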
Re: [PERFORM] Performance issue using Tsearch2
On Feb 5, 2008 12:47 PM, Viviane Lestic [EMAIL PROTECTED] wrote: Could someone help me analyze this problem? I don't manage to see if the problem comes from bad tsearch tuning, postgresql configuration, or something else...

Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')? Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I'm surprised you have the word partir in so many documents? Do you use real data? -- Guillaume
Re: [PERFORM] Performance issue using Tsearch2
2008/2/5, Ansgar -59cobalt- Wiechers wrote: Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so.

I first set STATISTICS to 1000 for column zoneindex_test and saw no significant improvement (with a vacuum full analyze in between). Then I set default_statistics_target to 1000: there is now an improvement, but the overall time is still way too long... (and the estimated costs didn't change...) Here are the results with default_statistics_target set to 1000:

explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q)
FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q
WHERE tab_chemin.chemin like '%;2;%'
  AND tab_ocr.idstruct = tab_chemin.label
  AND zoneindex_test @@ q
ORDER BY rank(zoneindex_test, q) DESC;

QUERY PLAN
Sort (cost=2345.30..2345.32 rows=8 width=327) (actual time=229913.715..229918.172 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop (cost=80.04..2345.18 rows=8 width=327) (actual time=28159.626..229892.957 rows=7106 loops=1)
        ->  Nested Loop (cost=80.04..1465.76 rows=392 width=327) (actual time=26084.558..130979.395 rows=272414 loops=1)
              ->  Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=295) (actual time=26084.544..130562.220 rows=272414 loops=1)
                    Filter: (tab_ocr.zoneindex_test @@ q.q)
                    ->  Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315 rows=283606 loops=1)
                          Index Cond: (tab_ocr.zoneindex_test @@ q.q)
        ->  Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0 loops=272414)
              Index Cond: (tab_ocr.idstruct = tab_chemin.label)
              Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 229922.864 ms

2008/2/5, Guillaume Smet wrote: Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')?

The improvement seems negligible (with default_statistics_target back to 10, its default value):

explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q)
FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q
WHERE tab_chemin.chemin like '%;2;%'
  AND tab_ocr.idstruct = tab_chemin.label
  AND zoneindex_test @@ to_tsquery('partir')
ORDER BY rank(zoneindex_test, q) DESC;

QUERY PLAN
Sort (cost=4358.91..4358.95 rows=16 width=308) (actual time=266489.667..266494.132 rows=7106 loops=1)
  Sort Key: rank(tab_ocr.zoneindex_test, q.q)
  ->  Nested Loop (cost=80.04..4358.59 rows=16 width=308) (actual time=42245.881..266469.644 rows=7106 loops=1)
        ->  Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1)
        ->  Nested Loop (cost=80.04..4358.34 rows=16 width=276) (actual time=42239.570..178496.761 rows=7106 loops=1)
              ->  Bitmap Heap Scan on tab_ocr (cost=80.04..1461.83 rows=392 width=276) (actual time=38317.423..174188.779 rows=272414 loops=1)
                    Filter: (zoneindex_test @@ '''partir'''::tsquery)
                    ->  Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289 rows=283606 loops=1)
                          Index Cond: (zoneindex_test @@ '''partir'''::tsquery)
              ->  Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=272414)
                    Index Cond: (tab_ocr.idstruct = tab_chemin.label)
                    Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 266498.704 ms

Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I posted the results above.

I'm surprised you have the word partir in so many documents? Do you use real data? I'm using real data. The indexed documents are extracted from newspapers, and partir (and its derivatives) is quite a common verb in the French language, so I'm not that surprised to see it show up in many documents.
Re: [PERFORM] Performance issue
Hi Guys, Following Tom Lane's advice I upgraded to 8.2, and that solved all my problems. :D Thank you so much for your input, I really appreciate it. Kind regards Willo van der Merwe
Re: [PERFORM] Performance issue
In response to Willo van der Merwe [EMAIL PROTECTED]:

Hi Guys, I have something odd. I have Gallery2 running on PostgreSQL 8.1, and recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is how do I get PostgreSQL to work with their horrible code. The queries they generate look something like: SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND id IN (here a list of 42000+ IDs are listed) On the previous version (which I can't recall what it was, but it was a version 8.1) the queries executed fine, but suddenly now, these queries are taking up to 4 minutes to complete. I am convinced it's the parsing/handling of the IN clause.

It could, of course, be that the list has grown so large that it can't fit into a buffer anymore.

For obvious reasons I can't run an EXPLAIN ANALYZE from a prompt.

Those reasons are not obvious to me. The explain analyze output is going to be key to working this out -- unless it's something like your postgresql.conf isn't properly tuned.

I vacuum and reindex the database daily. I'd prefer not to have to rewrite the code, so any suggestions would be very welcome.

-- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023
Re: [PERFORM] Performance issue
Willo van der Merwe [EMAIL PROTECTED] writes: I have something odd. I have Gallery2 running on PostgreSQL 8.1, and recently I upgraded to 8.1.9-1.el4s1.1 (64bit). The issue here really is how do I get PostgreSQL to work with their horrible code. The queries they generate look something like : SELECT blah, blah FROM table1, table2 WHERE some relational stuff AND id IN (here a list of 42000+ IDs are listed) On the previous version (which I can't recall what it was, but it was a version 8.1) the queries executed fine, but suddenly now, these queries are taking up-to 4 minutes to complete. I am convinced it's the parsing/handling of the IN clause.

You're wrong about that, because we have not done anything to change IN planning in 8.1.x. You might need to re-ANALYZE or something; it sounds to me more like the planner has changed strategies in the wrong direction. FWIW, 8.2 should be vastly more efficient than 8.1 for this sort of query --- any chance of an upgrade? regards, tom lane
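One common workaround for giant IN lists on servers of that vintage (not suggested in the thread; the table and column names below are placeholders from the quoted query) is to load the IDs into a temporary table and join against it, which gives the planner real statistics to work with:

```sql
-- hypothetical rewrite: join against a temp table instead of a
-- 42000-element IN (...) list
CREATE TEMP TABLE wanted_ids (id integer PRIMARY KEY);
-- populate wanted_ids with COPY or batched INSERTs, then:
ANALYZE wanted_ids;

SELECT t1.blah, t2.blah
FROM table1 t1
JOIN table2 t2 ON t2.id = t1.id   -- stands in for "some relational stuff"
JOIN wanted_ids w ON w.id = t1.id;
```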
Re: [PERFORM] Performance issue with 8.2.3 - C application
Karl Denninger wrote: I've got an interesting issue here that I'm running into with 8.2.3. This is an application that has run quite well for a long time, and has been operating without significant changes (other than recompilation) since back in the early 7.x Postgres days. But now we're seeing a LOT more load than we used to with it, and suddenly, we're seeing odd performance issues. It APPEARS that the problem isn't query performance per-se. That is, while I can find a few processes here and there in a run state when I look with a PS, I don't see them consistently churning. But here's the query that has a habit of taking the most time:

select forum, * from post where toppost = 1
and (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number is null)) is not false
AND (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number=post.number)) is not false
order by pinned desc, replied desc offset 0 limit 20

Since I can do little to help you with anything else, here is a little help from a guy with a hammer. It seems you may be able to convert the subqueries into a left join. Not sure whether this helps, nor whether I got some bits of the logic wrong, but something like this might help the planner find a better plan:

SELECT forum, *
FROM post
LEFT JOIN forumlog ON post.forum = forumlog.forum
  AND forumlog.login = 'theuser'
  AND (post.number = forumlog.number OR forumlog.number IS NULL)
  AND post.replied = lastview
WHERE forumlog.forum IS NULL
  AND post.toppost = 1
ORDER BY pinned DESC, replied DESC
OFFSET 0 LIMIT 20;

Nis
Re: [PERFORM] Performance issue with 8.2.3 - C application
Karl Denninger [EMAIL PROTECTED] writes: Not sure where to start here. It appears that I'm CPU limited and the problem may be that this is a web-served application that must connect to the Postgres backend for each transaction, perform its queries, and then close the connection down - in other words the load may be coming not from Postgres but rather from places I can't fix at the application layer (e.g. fork() overhead, etc). The DBMS and Apache server are on the same machine, so there's no actual network overhead involved. If that's the case the only solution is to throw more hardware at it. I can do that, but before I go tossing more CPU at the problem I'd like to know I'm not just wasting money.

I know you found the proximate cause of your current problems, but it sounds like you have something else you should consider looking at here. There are techniques for avoiding separate database connections for each request. If you're using Apache you can reduce the CPU usage a lot by writing your module as an Apache module instead of a CGI or whatever type of program it is now. Then your module would live as long as a single Apache instance which you can configure to be hours or days instead of a single request. It can keep around the database connection for that time. If that's impossible there are still techniques that can help. You can set up PGPool or PGBouncer or some other connection aggregating tool to handle the connections. This is a pretty low-impact change which shouldn't require making any application changes aside from changing the database connection string. Effectively this is a just a connection pool that lives in a separate process. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Performance issue with 8.2.3 - C application
Looks like that was the problem - got a day under the belt now with the 8.2.4 rev and all is back to normal. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Karl Denninger wrote: Aha! BIG difference. I won't know for sure until the biz day tomorrow but the first blush look is that it makes a HUGE difference in system load, and I no longer have the stats collector process on the top of the top list..

Tom Lane wrote: Karl Denninger [EMAIL PROTECTED] writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. I think Merlin might have nailed it --- the stats collector bug is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - C application
Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when it's bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. It's GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. This is why I'm wondering if what I'm taking here is a hit on the fork/exec inside the postmaster, in the setup internally in there, in the IPC between my process via libPQ, etc - and how I can profile what's going on. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Tom Lane wrote: Karl Denninger [EMAIL PROTECTED] writes: But here's the query that has a habit of taking the most time:

select forum, * from post where toppost = 1
and (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number is null)) is not false
AND (replied (select lastview from forumlog where login='theuser' and forum=post.forum and number=post.number)) is not false
order by pinned desc, replied desc offset 0 limit 20

Did that ever perform well for you? It's the sub-selects that are likely to hurt ... in particular,

->  Index Scan using post_top on post (cost=0.00..57266.37 rows=113 width=757)
      Index Cond: (toppost = 1)
      Filter: (((replied (subplan)) IS NOT FALSE) AND ((replied (subplan)) IS NOT FALSE))

versus

->  Index Scan using post_top on post (cost=0.00..632.03 rows=1013 width=11)
      Index Cond: (toppost = 1)

The planner thinks that the two subplan filter conditions will eliminate about 90% of the rows returned by the bare indexscan (IIRC this is purely a rule of thumb, not based on any statistics) and that testing them 1013 times will add over 5 cost units to the basic indexscan. That part I believe --- correlated subqueries are expensive. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - C application
On 7/25/07, Karl Denninger [EMAIL PROTECTED] wrote: Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when its bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. Its GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high.

8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased load in high concurrency conditions. On a client's machine, after patching the postmaster the load dropped from the 4-5 range to the 1-2 range on a 500 tps server. Maybe this is biting you? Symptoms are high load average and high CPU usage of the stats collector process. merlin
Re: [PERFORM] Performance issue with 8.2.3 - C application
Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Merlin Moncure wrote: On 7/25/07, Karl Denninger [EMAIL PROTECTED] wrote: Yeah, the problem doesn't appear to be there. As I said, if I look at the PS of the system when its bogging, there aren't a whole bunch of processes stuck doing these, so while this does take a second or two to come back, that's not that bad. Its GENERAL performance that just bites - the system is obviously out of CPU, but what I can't get a handle on is WHY. It does not appear to be accumulating large amounts of runtime in processes I can catch, but the load average is quite high. 8.2.3 has the 'stats collector bug' (fixed in 8.2.4) which increased load in high concurrency conditions. on a client's machine after patching the postmaster load drop from the 4-5 range to 1-2 range on a 500 tps server. maybe this is biting you? symptoms are high load avg and high cpu usage of stats collector process. merlin
Re: [PERFORM] Performance issue with 8.2.3 - C application
Karl Denninger [EMAIL PROTECTED] writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4.

I think Merlin might have nailed it --- the stats collector bug is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] Performance issue with 8.2.3 - C application
Aha! BIG difference. I won't know for sure until the biz day tomorrow but the first blush look is that it makes a HUGE difference in system load, and I no longer have the stats collector process on the top of the top list.. Karl Denninger ([EMAIL PROTECTED]) http://www.denninger.net

Tom Lane wrote: Karl Denninger [EMAIL PROTECTED] writes: Hm. now that's interesting. Stats collector IS accumulating quite a bit of runtime. me thinks its time to go grab 8.2.4. I think Merlin might have nailed it --- the stats collector bug is that it tries to write out the stats file way more often than it should. So any excessive userland CPU time you see is just the tip of the iceberg compared to the system and I/O costs incurred. regards, tom lane
Re: [PERFORM] performance issue with a specific query
On Thu, 2006-07-27 at 09:23, Eliott wrote: Hi! I hope I'm sending my question to the right list, please don't flame if it's the wrong one. I have noticed that while a query runs in about 1.5 seconds on an 8.x version PostgreSQL server, on our 7.4.13 it takes around 15-20 minutes. Since we are using RHEL4 on our server we are stuck with 7.4.13. The enormous time difference between the different builds drives me crazy. Can you please help me identify the bottleneck or suggest anything to improve the dismal performance?

You are absolutely on the right list. A couple of points.

1: Which 8.x? 8.0.x or 8.1.x? 8.1.x is literally light years ahead of 7.4 in terms of performance. 8.0 is somewhere between them. The performance difference you're seeing is pretty common.

2: Looking at your query, there are places where you're joining on things like date_trunc(...). In 7.4 the database will not, and cannot, use a normal index on the date field for those kinds of things. It can, however, use a functional index on some of them. Try creating an index on date_trunc('day', yourfieldhere) and see if that helps.

3: You are NOT stuck on 7.4.13. I have a RHEL server that will be running 8.1.4 or so pretty soon as a data warehouse. It may get updated to RHEL4, may not. You can either compile from the .tar.[gz|bz2] files or download the PGDG rpms for your distro.

4: You are fighting an uphill battle. There were a LOT of improvements made all over in the march from 7.4 to 8.1. Not all of them were simple planner tweaks and shortcuts, but honest to goodness changes to the way things happen. No amount of tuning can make 7.4 run as fast as 8.1.
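Point 2 above, as a sketch (table and column names are hypothetical; this works when the column is timestamp without time zone, since date_trunc is then immutable and indexable):

```sql
-- expression index that 7.4 can use where a plain index on the column can't
CREATE INDEX events_day_idx ON events (date_trunc('day', event_time));

-- a query that spells the expression exactly the same way can use the index
SELECT count(*) FROM events
WHERE date_trunc('day', event_time) = timestamp '2006-07-01 00:00:00';
```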
Re: [PERFORM] performance issue with a specific query
On 7/27/06, Eliott [EMAIL PROTECTED] wrote: Hi! I hope I'm sending my question to the right list, please don't flame if it's the wrong one. I have noticed that while a query runs in about 1.5seconds on a 8.xx version postgresql server on our 7.4.13 it takes around 15-20 minutes. Since we are using RHEL4 on our server we are stuck with 7.4.13. The enormous time difference between the different builds drives me crazy. Can you please help me identifying the bottleneck or suggest anything to improve the dismal performance. The query is the following:

try turning off genetic query optimization. regarding the rhel4 issue...does rhel not come with a c compiler? :) merlin
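Merlin's first suggestion, spelled out (a session-level setting; geqo is on by default):

```sql
SET geqo TO off;
-- then run EXPLAIN ANALYZE on the slow query again and compare plans
```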
Re: [PERFORM] performance issue with a specific query
try turning off genetic query optimization. regarding the rhel4 issue...does rhel not come with a c compiler? :)

Enterprises are not going to compile. They are going to accept the latest supported-by-vendor release. Redhat has a tendency to be incredibly stupid about this particular area of their packaging. Joshua D. Drake

-- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Re: [PERFORM] performance issue with a specific query
Joshua D. Drake wrote: try turning off genetic query optimization. regarding the rhel4 issue...does rhel not come with a c compiler? :) Enterprises are not going to compile. They are going to accept the latest supported-by-vendor release. Redhat has a tendency to be incredibly stupid about this particular area of their packaging.

Stupid how?

-- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [PERFORM] performance issue with a specific query
Alvaro Herrera [EMAIL PROTECTED] writes: Joshua D. Drake wrote: Enterprises are not going to compile. They are going to accept the latest support by vendor release. Redhat has a tendency to be incredibly stupid about this particular area of their packaging. Stupid how?

Red Hat feels (apparently accurately, judging by their subscription revenue ;-)) that what RHEL customers want is a platform that's stable over multi-year application lifespans. So major incompatible changes in the system software are not looked on with favor. That's why RHEL4 is still shipping PG 7.4.*. You can call it a stupid policy if you like, but it's hard to argue with success. However, there will be an RH-supported release of PG 8.1.* as an optional add-on for RHEL4. Real Soon Now, I hope --- the release date has been pushed back a couple times already. regards, tom lane
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
Hi! My company is evaluating porting our system (developed in C++) to PostgreSQL. Our programmer made a lot of tests and he informed me that the performance using ODBC is very similar to using libpq, even with a big number of simultaneous connections/queries. Of course for us it is simpler to use ODBC, because it will be easier to maintain, as we already support a lot of other databases using ODBC (MySQL, DB2, etc). Has someone already had this experience? What are the key benefits of using libpq instead of ODBC? Our application has a heavy load and around 150 concurrent users.

The ODBC driver for postgresql implements its own protocol stack. Unfortunately, it is still on protocol revision 2 (out of 3). Also, IMO libpq is a little better tested and more durable than the odbc driver. This naturally follows from the fact that libpq is more widely used and more actively developed than odbc. If you are heavily C++ invested you can consider wrapping libpq yourself if you want absolute maximum performance. If you happen to be developing on a Borland platform, give strong consideration to the Zeos connection library, which is very well designed (it wraps libpq). You might want to consider posting your question to the odbc list. Merlin
Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application
I would take a peek at the psqlodbc-8.0 drivers; I wouldn't battle with other versions you might find, such as the unixODBC ones. -elz
Re: [PERFORM] Performance issue
Peter,

One possibility is to drop all the indexes, do the inserts, and re-add the indexes. The more indexes and the more rows that exist, the more costly each insert becomes.

Regards, Joseph

At 05:48 PM 9/24/2003 +1200, peter wrote:

Hello, I have been trying to get my Postgres database to do faster inserts. The environment is basically a single-user situation. The part that I would like to speed up is when a user copies a Project. A Project consists of a number of Rooms (say 60). Each room contains a number of items. A project will contain say 20,000 records. Anyway, the copying process gets slower and slower as more projects are added to the database.

My statistics (Athlon 1.8GHz):
20,000 items: takes on average 0.078 seconds/room
385,000 items: takes on average 0.11 seconds/room
690,000 items: takes on average 0.270 seconds/room
1,028,000 items: takes on average 0.475 seconds/room

As can be seen, the time taken to process each room increases. A commit occurs when a room has been copied. The hard drive is not being driven very hard; the hard drive light only flashes about twice a second when there are a million records in the database. I thought that the problem could have been my plpgsql procedure, because I assume the code is interpreted. However, I have just rewritten the code using straight SQL (with some temp fields), and the times turn out to be almost exactly the same as the plpgsql version. The read speed for the application is fine, and the SQL planner seems to be doing a good job. There has been only one problem that I have found with one huge select, which was fixed by a cross join.

I am running Red Hat 8. Some of my conf entries that I have changed follow:
shared_buffers = 3700
effective_cache_size = 4000
sort_mem = 32168

Are the increasing times reasonable? The times themselves might look slow, but that's because there are a number of tables involved in a Copy. I can increase the shared buffer size above 32M, but would this really help?
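Joseph's drop-then-rebuild suggestion can be sketched with Python's built-in sqlite3 module (used here only as a convenient stand-in for the database; the table and index names are made up for illustration): drop the index, bulk-load the rows, then re-create the index once at the end so it is built in a single pass rather than maintained row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (room_id INTEGER, name TEXT)")
cur.execute("CREATE INDEX idx_items_room ON items (room_id)")

# Drop the index before the bulk load, as suggested above, so each
# INSERT does not pay the cost of an incremental index update.
cur.execute("DROP INDEX idx_items_room")
rows = [(i % 60, f"item-{i}") for i in range(20000)]
cur.executemany("INSERT INTO items (room_id, name) VALUES (?, ?)", rows)

# Re-create the index in one pass after the load.
cur.execute("CREATE INDEX idx_items_room ON items (room_id)")
conn.commit()

count = cur.execute("SELECT count(*) FROM items").fetchone()[0]
```

The same pattern applies in PostgreSQL with its own DROP INDEX / CREATE INDEX statements; whether it pays off depends on how many rows are loaded relative to the table size.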
TIA peter Mcgregor ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Performance issue
Get rid of any unnecessary indexes? I've found that dropping indexes and re-creating them isn't usually worth the effort.

Mount the disk with the noatime option, which saves you the time involved in updating the last-access time on files.

Make sure you're doing all the inserts in one transaction; wrapping a bunch of INSERTs in BEGIN ... COMMIT speeds them up loads.

At 05:48 PM 9/24/2003 +1200, peter wrote:

[snip]

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
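The single-transaction tip can be sketched the same way (Python's built-in sqlite3 module as a stand-in; the table name is illustrative). The point is that committing after every row forces a flush per row, while one BEGIN ... COMMIT around the whole batch pays that cost once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (room_id INTEGER, name TEXT)")
rows = [(i % 60, f"item-{i}") for i in range(5000)]

# Run the whole batch inside a single transaction instead of
# committing after every row; the context manager opens a
# transaction and commits it on success.
with conn:
    conn.executemany("INSERT INTO items (room_id, name) VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM items").fetchone()[0]
```

In Peter's case, committing once per project (or per batch of rooms) rather than once per room would be the equivalent change.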
Re: [PERFORM] Performance issue
My statistics (Athlon 1.8GHz):
20,000 items: takes on average 0.078 seconds/room
385,000 items: takes on average 0.11 seconds/room
690,000 items: takes on average 0.270 seconds/room
1,028,000 items: takes on average 0.475 seconds/room

[snip]

I am running Red Hat 8. Some of my conf entries that I have changed follow:
shared_buffers = 3700
effective_cache_size = 4000
sort_mem = 32168

Have you twiddled with your wal_buffers or checkpoint_segments? Might be something to look at.

-sc

-- Sean Chittenden

---(end of broadcast)--- TIP 8: explain analyze is your friend
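As a hedged sketch of the settings Sean mentions, they would appear in postgresql.conf roughly as below. The values are illustrative starting points only, not recommendations; the right numbers depend on the workload and available disk space.

```
# postgresql.conf -- illustrative values only
wal_buffers = 16            # WAL buffer pages held in shared memory before a flush
checkpoint_segments = 10    # more WAL segments between checkpoints = less
                            # frequent checkpoint I/O during heavy insert load
```

Raising checkpoint_segments spaces checkpoints further apart, which smooths out the bursts of write activity that heavy insert loads otherwise trigger, at the cost of more disk used for WAL.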