[PERFORM] High CPU Usage

2012-06-20 Thread Siddharth Shah

Hello,

I have around 1000 schemas in my database, each with a similar data 
structure but different data.
Each schema has a few tables that are never updated (read-only tables), 
while the other tables are rewritten almost every day, so I prefer to 
TRUNCATE those tables and restore them with new data.


I am now facing high CPU and I/O usage on the database, primarily from 
the stats collector and vacuuming. The stats file is almost 28MB, and 
when I manually VACUUM ANALYZE the complete database it takes almost 90 
minutes, even though autovacuum is configured.


Restoring the dump on each schema introduces only minor data variations.
Few SQL statements are executed per schema, affecting fewer than 50 
tuples per day.


My questions:

1. Does increasing maintenance_work_mem improve auto/manual vacuum 
performance? If it does, will it require more I/O / CPU resources?
2. If I stop the stats collector process and autovacuuming, and instead 
run a manual VACUUM whenever a schema is restored with major changes, 
which performance parameters do I need to consider? (The restored data 
has very few changes.)
3. Is vacuuming and stats collection required here to keep the metadata 
fresh for performance? (Table structures remain the same.)


Is there anything else that could help reduce I/O without significantly 
affecting performance?
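For reference, a minimal sketch of the knobs in question (values are 
illustrative, not recommendations, and "static_lookup" is a hypothetical 
table name):

```sql
-- Raise maintenance_work_mem for the current session only, so the extra
-- memory is consumed just while the manual vacuum runs:
SET maintenance_work_mem = '512MB';
VACUUM ANALYZE;

-- Read-only tables gain little from repeated autovacuum passes; once they
-- have been vacuumed and analyzed, autovacuum can be disabled per table:
ALTER TABLE static_lookup SET (autovacuum_enabled = false);
```

Per-table autovacuum can also be made less aggressive (via the 
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor storage 
parameters) rather than disabled outright.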


regards,
Siddharth

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] scale up (postgresql vs mssql)

2012-06-20 Thread Eyal Wilde
Hi, all.

this is an obligation from the past:
http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php

the same test that did ~230 results is now doing ~700 results; that is,
BTW, even better than mssql.

the ultimate solution for that problem was NOT to use ON COMMIT DELETE
ROWS for the temporary tables. instead, we just do DELETE FROM
temp_table1.

doing TRUNCATE temp_table1 is definitely the worst case (~100 results in
the same test). this is something we knew for a long time, which is why we
did ON COMMIT DELETE ROWS, but eventually it turned out to be far from
the best.
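As a sketch, the pattern described above looks like this (the table 
definition is hypothetical):

```sql
-- Create the temp table without ON COMMIT DELETE ROWS
-- (PRESERVE ROWS is the default; written out here for emphasis):
CREATE TEMPORARY TABLE temp_table1 (id INT, payload TEXT)
    ON COMMIT PRESERVE ROWS;

-- ... populate and use the table within the transaction ...

-- Clear it explicitly with DELETE rather than TRUNCATE:
DELETE FROM temp_table1;
```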

another minor issue is that when configuring
 temp_tablespace='other_tablespace', the sequences of the temporary tables
remain on the 'main_tablespace'.

i hope that will help make postgres even better :)


[PERFORM] index-only scan is missing the INCLUDE feature

2012-06-20 Thread Eyal Wilde
Hi all,

as far as i have looked around regarding the new feature (index-only
scan), i guess this feature will not include an option such as ms-sql's
INCLUDE.

well, i have a table with columns: a,b,c
i query the table like this: select a,c from table where a=x and b=y
as for now, i have unique-index on (a,b)

in the future (when upgrading to 9.2), i would like to have a unique index
on (a, b, INCLUDE c), but that won't be possible (right?).

so... in order to have index-only scan, i will have to create an index like
(a,b,c), but this has problems:
1. i lose the uniqueness enforcement of (a,b), unless i create 2
indexes: (a,b) and (a,b,c).
2. every update to column c would result in an unnecessary index-key
update (or whatever that operation is called), which is not just updating
a tuple, but also an attempt to re-order it(!).
3. i just wonder: practically there is uniqueness of (a,b). now, if i
create an index like (a,b,c), the optimizer does not know about the
uniqueness of (a,b), so i'm afraid it may not pick the best query plan.
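For illustration, the two-index workaround from point 1 might look like 
this (the table and index names are hypothetical):

```sql
CREATE TABLE t (a INT, b INT, c INT);

-- Enforces uniqueness of (a, b):
CREATE UNIQUE INDEX t_a_b_key ON t (a, b);

-- Allows an index-only scan for "SELECT a, c FROM t WHERE a = ... AND b = ..."
-- (on 9.2+, when the visibility map permits it), at the cost of index bloat
-- and extra index maintenance on updates to c:
CREATE INDEX t_a_b_c_idx ON t (a, b, c);
```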

Thanks for any comment.


[PERFORM] Why is a hash join being used?

2012-06-20 Thread Tim Jacobs
I am running the following query:

SELECT res1.x, res1.y, res1.z
FROM test t
JOIN residue_atom_coords res1 ON
t.struct_id_1 = res1.struct_id AND
res1.atomno IN (1,2,3,4) AND 
(res1.seqpos BETWEEN t.pair_1_helix_1_begin AND 
t.pair_1_helix_1_end)
WHERE
t.compare_id BETWEEN 1 AND 10000;

The 'test' table is very large (~270 million rows) as is the 
residue_atom_coords table (~540 million rows).

The number of compare_ids I select in the 'WHERE' clause determines the join 
type in the following way:

t.compare_id BETWEEN 1 AND 5000;

 Nested Loop  (cost=766.52..15996963.12 rows=3316307 width=24)
   ->  Index Scan using test_pkey on test t  (cost=0.00..317.20 rows=5372 width=24)
         Index Cond: ((compare_id >= 1) AND (compare_id <= 5000))
   ->  Bitmap Heap Scan on residue_atom_coords res1  (cost=766.52..2966.84 rows=625 width=44)
         Recheck Cond: ((struct_id = t.struct_id_1) AND (seqpos >= t.pair_1_helix_1_begin) AND (seqpos <= t.pair_1_helix_1_end) AND (atomno = ANY ('{1,2,3,4}'::integer[])))
         ->  Bitmap Index Scan on residue_atom_coords_pkey  (cost=0.00..766.36 rows=625 width=0)
               Index Cond: ((struct_id = t.struct_id_1) AND (seqpos >= t.pair_1_helix_1_begin) AND (seqpos <= t.pair_1_helix_1_end) AND (atomno = ANY ('{1,2,3,4}'::integer[])))

t.compare_id BETWEEN 1 AND 10000;

 Hash Join  (cost=16024139.91..20940899.94 rows=6633849 width=24)
   Hash Cond: (t.struct_id_1 = res1.struct_id)
   Join Filter: ((res1.seqpos >= t.pair_1_helix_1_begin) AND (res1.seqpos <= t.pair_1_helix_1_end))
   ->  Index Scan using test_pkey on test t  (cost=0.00..603.68 rows=10746 width=24)
         Index Cond: ((compare_id >= 1) AND (compare_id <= 10000))
   ->  Hash  (cost=13357564.16..13357564.16 rows=125255660 width=44)
         ->  Seq Scan on residue_atom_coords res1  (cost=0.00..13357564.16 rows=125255660 width=44)
               Filter: (atomno = ANY ('{1,2,3,4}'::integer[]))

The nested loop join performs very quickly, whereas the hash join is incredibly 
slow. If I disable the hash join temporarily, then a nested loop join is used in 
the second case and the query runs much more quickly. How can I change my 
configuration to favor the nested loop join in this case? Is this a bad idea? 
Alternatively, since I will be doing selections like this many times, what 
indexes can be put in place to expedite the query without mucking with the 
query optimizer? I've already created an index on the struct_id field of 
residue_atom_coords (each unique struct_id should only have a small number of 
rows for the residue_atom_coords table).
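One possibility (a sketch only; the column list follows the plans above 
and should be verified against the real schema) is a composite index 
covering the join and filter columns, plus a session-local test of the 
alternative plan:

```sql
-- Composite index matching the lookup pattern in the plans above:
CREATE INDEX residue_atom_coords_lookup_idx
    ON residue_atom_coords (struct_id, seqpos, atomno);

-- Temporarily disable hash joins for this session only, to compare plans
-- without touching postgresql.conf:
SET enable_hashjoin = off;
EXPLAIN ANALYZE SELECT ...;  -- re-run the problem query here
RESET enable_hashjoin;
```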

Thanks in advance,
Tim



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] pgbouncer - massive overhead?

2012-06-20 Thread Strange, John W
Given a baseline postgresql.conf config and a couple of DL580s (40 cores/256GB 
memory), I noticed a large overhead for pgbouncer; has anyone seen this before?


$ pgbench -h `hostname -i` -j 32 -p 4320 -U asgprod -s 500 -c 32 -S -T 60 
pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 1743073
tps = 29049.88 (including connections establishing)
tps = 29050.308194 (excluding connections establishing)

$ pgbench -h `hostname -i` -j 32 -p 4310 -U asgprod -s 500 -c 32 -S -T 60 
pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 8692204
tps = 144857.505107 (including connections establishing)
tps = 144880.181341 (excluding connections establishing)

processor   : 39
vendor_id   : GenuineIntel
cpu family  : 6
model   : 47
model name  : Intel(R) Xeon(R) CPU E7- 4860  @ 2.27GHz






Re: [PERFORM] Why is a hash join being used?

2012-06-20 Thread Sergey Konoplev
On Wed, Jun 20, 2012 at 1:34 AM, Tim Jacobs tjaco...@email.unc.edu wrote:
 The nested loop join performs very quickly, whereas the hash join is 
 incredibly slow. If I disable the hash join temporarily then a nested loop 
 join is used in the second case and the query runs much more quickly. How 
 can I change my configuration to favor the nested join in this case? Is this 
 a bad idea?

First, run ANALYZE on the tables and try the tests again.

If that helps, check your autovacuum configuration. Look at
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#AUTOVACUUM
and the pg_stat_user_tables view (last_* and *_count fields).

If it still produces the wrong plan, then try to increase the statistics
entries with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS (do not
forget to ANALYZE after doing it) or with the default_statistics_target
configuration parameter.
Read more about it here
http://www.postgresql.org/docs/9.1/static/planner-stats.html.
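Concretely, that might look like this (the value 1000 is illustrative; 
the 9.1 default is 100):

```sql
-- Per-column statistics target, followed by the required ANALYZE:
ALTER TABLE residue_atom_coords
    ALTER COLUMN struct_id SET STATISTICS 1000;
ANALYZE residue_atom_coords;

-- Or raise the default for all columns (per session here; set it in
-- postgresql.conf to make it permanent):
SET default_statistics_target = 500;
```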

 Alternatively, since I will be doing selections like this many times, what 
 indexes can be put in place to expedite the query without mucking with the 
 query optimizer? I've already created an index on the struct_id field of 
 residue_atom_coords (each unique struct_id should only have a small number of 
 rows for the residue_atom_coords table).

As far as I can see, everything is okay with the indexes.


 Thanks in advance,
 Tim






-- 
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204



Re: [PERFORM] scale up (postgresql vs mssql)

2012-06-20 Thread Andy Colson

On 6/20/2012 1:01 AM, Eyal Wilde wrote:

Hi, all.

this is an obligation from the past:
http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php

the same test, that did ~230 results, is now doing ~700 results. that
is, BTW even better than mssql.

the ultimate solution for that problem was to NOT to do ON COMMIT
DELETE ROWS for the temporary tables. instead, we just do DELETE FROM
temp_table1.

doing TRUNCATE temp_table1 is definitely the worst case (~100 results
in the same test). this is something we knew for a long time, which is
why we did ON COMMIT DELETE ROWS, but eventually it turned out as far
from being the best.

another minor issue is that when configuring
  temp_tablespace='other_tablespace', the sequences of the temporary
tables remain on the 'main_tablespace'.

i hope that will help making postgres even better :)



Did you ever try re-writing some of the temp table usage to use 
subselects/views/CTEs/etc.?


-Andy




Re: [PERFORM] index-only scan is missing the INCLUDE feature

2012-06-20 Thread Craig Ringer

On 06/20/2012 12:46 PM, Eyal Wilde wrote:

Hi all,

as far as i looked around about the new feature: index-only scan, i 
guess this feature will not include the option such as ms-sql INCLUDE.


For those of us who don't know MS-SQL, can you give a quick explanation 
of what the INCLUDE keyword in an index definition is expected to do, or 
some documentation references? It's possible to guess it somewhat from 
your description, but it's helpful to be specific when asking a question 
about features from another DBMS.


--
Craig Ringer


Re: [PERFORM] scale up (postgresql vs mssql)

2012-06-20 Thread Merlin Moncure
On Wed, Jun 20, 2012 at 8:43 AM, Andy Colson a...@squeakycode.net wrote:
 this is an obligation from the past:
 http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php

 the same test, that did ~230 results, is now doing ~700 results. that
 is, BTW even better than mssql.

 the ultimate solution for that problem was to NOT to do ON COMMIT
 DELETE ROWS for the temporary tables. instead, we just do DELETE FROM
 temp_table1.

 doing TRUNCATE temp_table1 is definitely the worst case (~100 results
 in the same test). this is something we knew for a long time, which is
 why we did ON COMMIT DELETE ROWS, but eventually it turned out as far
 from being the best.

 another minor issue is that when configuring
  temp_tablespace='other_tablespace', the sequences of the temporary
 tables remain on the 'main_tablespace'.

 i hope that will help making postgres even better :)


 Did you ever try re-writing some of the temp table usage to use
 subselect's/views/cte/etc?

Yeah -- especially CTE.  But, assuming you really do need to keep a
temp table organized and you want absolutely minimum latency in the
temp-table-manipulating function, you can use a nifty trick to
organize a table around txid_current():

CREATE UNLOGGED TABLE Cache (txid BIGINT DEFAULT txid_current(), a TEXT, b TEXT);
CREATE INDEX ON Cache (txid);
-- or --
CREATE INDEX ON Cache (txid, a); -- if a is the lookup key, etc.

When you insert into the table, let the default capture the current txid,
and make sure that all queries properly filter the table on txid and that
all indexes are left-prefixed on txid.

Why do this? Because the record delete operations can now be delegated to
an external process.  At any time, a scheduled process can do:
DELETE FROM Cache;

This is not guaranteed to be faster, but it probably will be.
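A sketch of how the Cache table above might be used (the column values 
are hypothetical):

```sql
-- The DEFAULT stamps each row with the inserting transaction's txid:
INSERT INTO Cache (a, b) VALUES ('key1', 'value1');

-- Within the same transaction, readers filter on txid_current() and hit
-- the left-prefixed index, seeing only their own rows:
SELECT a, b FROM Cache WHERE txid = txid_current();

-- Later, a scheduled external process reclaims everything at once:
DELETE FROM Cache;
```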

merlin



Re: [PERFORM] pgbouncer - massive overhead?

2012-06-20 Thread Steve Crawford

On 06/19/2012 09:00 AM, Strange, John W wrote:

Given a baseline postgresql.conf config and a couple of DL580s (40 cores/256GB 
memory), I noticed a large overhead for pgbouncer; has anyone seen this before?


$ pgbench -h `hostname -i` -j 32 -p 4320 -U asgprod -s 500 -c 32 -S -T 60 
pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 1743073
tps = 29049.88 (including connections establishing)
tps = 29050.308194 (excluding connections establishing)

$ pgbench -h `hostname -i` -j 32 -p 4310 -U asgprod -s 500 -c 32 -S -T 60 
pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 8692204
tps = 144857.505107 (including connections establishing)
tps = 144880.181341 (excluding connections establishing)

processor   : 39
vendor_id   : GenuineIntel
cpu family  : 6
model   : 47
model name  :Intel(R) Xeon(R) CPU E7- 4860  @ 2.27GHz


I'm very dubious that the stats are meaningful as run. Were the above 
stats generated on consecutive runs on the same machine or was the test 
database fully returned to baseline between runs and the machine 
restarted to clear cache?


I doubt anyone here would trust the results of a 60-second pgbench run - 
especially a select-only test on a server that will likely end up with 
virtually everything ultimately in cache. Make sure each run is started 
from the same state and runs for 30-60 minutes.


Still, you *are* adding a layer between the client and the server. 
Running the simplest of read-only queries against a fully-cached 
database on a fast many-core machine is likely to emphasize any latency 
introduced by pgbouncer. But it's also not a use-case for which 
pgbouncer is intended. If you were to add -C so each query required a 
new client connection a different picture would emerge. Same thing if 
you had 2000 client connections of which only a handful were running 
queries at any moment.
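For example (a sketch only; judging from the runs above, port 4320 
appears to be the pgbouncer port and 4310 the direct connection -- adjust 
to match the actual setup):

```shell
# Reconnect for every transaction (-C) and run much longer (-T 1800);
# with -C, connection-establishment cost dominates, which is exactly the
# case pgbouncer is designed to amortize.
pgbench -h `hostname -i` -p 4310 -U asgprod -C -c 32 -j 32 -S -T 1800 pgbench_500  # direct
pgbench -h `hostname -i` -p 4320 -U asgprod -C -c 32 -j 32 -S -T 1800 pgbench_500  # via pgbouncer
```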


Cheers,
Steve




Re: [PERFORM] index-only scan is missing the INCLUDE feature

2012-06-20 Thread Shaun Thomas

On 06/20/2012 09:11 AM, Craig Ringer wrote:


For those of us who don't know MS-SQL, can you give a quick
explanation of what the INCLUDE keyword in an index definition is
expected to do, or some documentation references?


He's talking about what MS SQL Server commonly calls a covering index. 
In these cases, you can specify columns to be included in the index but 
not actually made part of the indexed key. This avoids a trip to the 
table data, so selects can be serviced entirely by an index scan.
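For reference, the MS SQL Server syntax being described (the table and 
column names are hypothetical):

```sql
-- T-SQL: (a, b) form the searchable key; c is carried only in the
-- leaf pages, so the index can cover "SELECT a, c ... WHERE a = ... AND b = ...":
CREATE UNIQUE NONCLUSTERED INDEX ix_t_a_b
    ON t (a, b)
    INCLUDE (c);
```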


PostgreSQL is about half way there by allowing index-only scans, though 
I've no idea if they intend on adding further functionality like this. 
Effectively you can trade index bloat for query speed. But considering 
the differences between the engines, it might not be necessary. I 
couldn't say.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com




[PERFORM] moving tables

2012-06-20 Thread Midge Brown
I need to move a postgres 9.0 database -- with tables, indexes, and WAL files 
associated with 16 tablespaces on 12 logical drives -- to an existing RAID 10 
drive in another volume on the same server. Once I get the data off the initial 
12 drives, they will be reconfigured, at which point I'll need to move 
everything from the 2nd volume back to the aforementioned 12 logical drives on 
the first volume. This is being done both to free up the 2nd volume and to 
better utilize RAID 10.

I checked around and found a way to create sql statements to alter the public 
tablespaces and indexes, but I haven't found anything that provides information 
about moving the numerous associated config files, log files, etc. 

ANY comments, suggestions, or direction to existing documentation would be 
greatly appreciated. 

Current server info:

- 4 dual-core AMD Opteron 2212 processors, 2010.485 MHz
- 64GB RAM
- 16 67GB RAID 1 drives and 1 464GB RAID 10 drive (all ext3) on 2 volumes.
- Linux 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 
GNU/Linux


Thank you,
Midge





Re: [PERFORM] moving tables

2012-06-20 Thread Josh Berkus
On 6/20/12 3:27 PM, Midge Brown wrote:
 I need to move a postgres 9.0 database -- with tables, indexes, and wals 
 associated with 16 tablespaces on 12 logical drives -- to an existing raid 10 
 drive in another volume on the same server. Once I get the data off the 
 initial 12 drives they will be reconfigured, at which point I'll need to move 
 everything from the 2nd volume to the aforementioned 12 logical drives on the 
 first volume. This is being done both to free up the 2nd volume and to better 
 utilize raid 10.
 
 I checked around and found a way to create sql statements to alter the public 
 tablespaces and indexes, but I haven't found anything that provides 
 information about moving the numerous associated config files, log files, 
 etc. 
 
 ANY comments, suggestions, or direction to existing documentation would be 
 greatly appreciated. 

1. back everything up.

2. create a bunch of directories on the RAID10 to match the existing
tablespaces (they won't be mounts, but Postgres doesn't care about that).

3. shut down postgres

4. copy all your files to the new directories

5. change your mount points which were in use by the old tablespaces to
symlinks which point at the new directories

6. start postgres back up from the new location
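A rough sketch of steps 2-6 for a single tablespace (all paths and the 
data directory location are assumptions; repeat per tablespace):

```shell
# 3. stop postgres
pg_ctl -D /var/lib/pgsql/9.0/data stop

# 2./4. create the target directory and copy, preserving ownership and perms
mkdir -p /raid10/tablespaces/ts1
cp -a /mnt/ts1/. /raid10/tablespaces/ts1/

# 5. replace the old mount point with a symlink to the new directory
umount /mnt/ts1
rmdir /mnt/ts1
ln -s /raid10/tablespaces/ts1 /mnt/ts1

# 6. start postgres back up
pg_ctl -D /var/lib/pgsql/9.0/data start
```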

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com





Re: [PERFORM] index-only scan is missing the INCLUDE feature

2012-06-20 Thread Craig Ringer

On 06/20/2012 11:32 PM, Shaun Thomas wrote:

On 06/20/2012 09:11 AM, Craig Ringer wrote:


For those of us who don't know MS-SQL, can you give a quick
explanation of what the INCLUDE keyword in an index definition is
expected to do, or some documentation references?


He's talking about what MS SQL Server commonly calls a covering 
index. In these cases, you can specify columns to be included in the 
index, but not actually part of the calculated hash. This prevents a 
trip to the table data, so selects can be serviced entirely by an 
index scan.


Oh, OK, so it's a covering index with added fields that don't form part 
of the searchable index structure to make the index a little less 
expensive than a fully covering index on all the columns of interest. 
Fair enough. Thanks for the explanation.


Eyal, you'll get a better response to questions about other DBMSs if you 
explain what you need/want to do with the desired feature and what that 
feature does in the other DBMS.


PostgreSQL is about half way there by allowing index-only scans, 
though I've no idea if they intend on adding further functionality 
like this.


There's certainly lots of interest in adding more, but not that many 
people with the expertise to be able to do it - and fewer still who're 
paid to work on Pg so they have time to focus on it. Covering indexes 
with Pg's MVCC model seem to be particularly challenging, too.


--
Craig Ringer


