Re: [GENERAL] Partitioning Advice

2012-06-10 Thread Jasen Betts
On 2012-06-06, Ben Carbery ben.carb...@gmail.com wrote:

 In the current environment I am given a single VHDD which I have not
 partitioned at all. The SAN performance is pretty good, but we have noticed
 slowdowns at various times.. The database does a lot of logging - constant
 small writes, with some probably insignificant reads of smaller tables.
 Delays in logging can affect the service which is doing the logging and
 cause problems upstream. Typically this does not happen, but there are
 hourly jobs which generate stats from the logs. Due to their complexity the
 query planner always chooses to do sequential scans on the main log
 table. This table is truncated monthly when the data is archived to another
 table, but peaks in size at around 10GB at the end of the month. Generally
 any time the stats jobs are running there are delays which I would like to
 reduce/eliminate. There is also a fair bit of iowait on the cpu.

Do many of the statistical queries use the whole month's data?
Have you considered partitioning the log table by day? If you can
reduce the number of rows involved in the table scans by partitioning,
it'll help performance.
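
The suggestion above, sketched with the inheritance-based partitioning available in PostgreSQL versions of that era (table and column names are illustrative, not taken from the original post):

```sql
-- One child table per day, with a CHECK constraint so the planner
-- can exclude other days' partitions from the scan.
CREATE TABLE log_active_20120610 (
    CHECK (logtime >= DATE '2012-06-10' AND logtime < DATE '2012-06-11')
) INHERITS (log_active);

-- Route inserts to the right child (simplified, single-day example).
CREATE OR REPLACE FUNCTION log_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.logtime >= DATE '2012-06-10'
       AND NEW.logtime < DATE '2012-06-11' THEN
        INSERT INTO log_active_20120610 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for %', NEW.logtime;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_insert BEFORE INSERT ON log_active
    FOR EACH ROW EXECUTE PROCEDURE log_insert_trigger();

-- Let the planner prune partitions based on the CHECK constraints:
SET constraint_exclusion = partition;
```

With constraint exclusion working, an hourly or daily stats query that filters on the timestamp column scans only that day's child instead of the whole 10GB table.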





-- 
⚂⚃ 100% natural


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Partitioning Advice

2012-06-10 Thread Ben Carbery
 Do many of the statistical queries use the whole month's data?
 Have you considered partitioning the log table by day? If you can
 reduce the number of rows involved in the table scans by partitioning,
 it'll help performance.


I am summarising by hour, day, week and month. So I guess partitioning by
day would help for the daily and hourly summaries.

Thanks for the suggestions everyone, I'll look into all these ideas.

cheers


Re: [GENERAL] Partitioning Advice

2012-06-07 Thread Ben Carbery


 Since you are on RHEL 6 I would use ext4 throughout.

Great, I'll look into it. However, my research suggested that a journalled
filesystem is unnecessary for xlogs - and I assume ext4 is journalled?


 You say you have I/O problems when stats jobs run.  Can you describe
 those jobs
 and what they are doing?


They summarise all the activity in the logs since the last run, essentially
counting logs. Since there are around 30 columns in each log and I want to
summarise in a number of ways - think multiple but separate group bys, with
a regexp to filter out some of the logs which are not interesting -
inevitably there are several passes over the table. Of course regexps are
also very slow in postgres. I found various indices did not help at all;
the query planner thinks a sequential scan is the way to go.



 If you have a lot of sequential scans on a 10GB table, that will suck
 majorly
 no matter how you tune it.


Ok.. but I don't care how long they take, only that they don't affect
new writes to the table. So will splitting xlogs off onto a different
partition/LUN etc. help?



 Two more things that you can try out:
 - Change the I/O scheduler to deadline by booting with
 elevator=deadline.

I'll check it out.
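
For reference, checking and changing the scheduler on RHEL 6 looks roughly like this (device name illustrative; runtime changes need root):

```shell
# Show the active scheduler for a device (the bracketed entry is current):
cat /sys/block/sda/queue/scheduler

# Switch at runtime (as root):
echo deadline > /sys/block/sda/queue/scheduler

# Make it the default at boot by appending to the kernel line
# in /boot/grub/grub.conf:
#   kernel /vmlinuz-2.6.32-... ro root=... elevator=deadline
```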


 - Split the 400GB LUN into several smaller LUNs and use tablespaces.


This could be worth doing on the active logging table if splitting off
xlogs won't help. I archive (copy) these records off to a new table
every month and then use inheritance to access the entire log set..

log_master
..inherits
  log_active
  log_201205
  log_201204
  ..etc

This is how I got the active table down to 10GB :)
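
A minimal sketch of that inheritance scheme (column definitions invented for illustration):

```sql
CREATE TABLE log_master (
    logtime  timestamptz NOT NULL,
    message  text
);

-- New writes land in the small active table; archived months sit beside it.
CREATE TABLE log_active () INHERITS (log_master);
CREATE TABLE log_201204 () INHERITS (log_master);
CREATE TABLE log_201205 () INHERITS (log_master);

-- Monthly archive step: copy the active rows out, then truncate.
CREATE TABLE log_201206 () INHERITS (log_master);
INSERT INTO log_201206 SELECT * FROM log_active;
TRUNCATE log_active;

-- Queries on the parent see the whole log set:
SELECT count(*) FROM log_master;
```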



 I don't say that it is guaranteed to help, but I have had good
 experiences with it.

 Yours,
 Laurenz Albe


thanks,

Ben


Re: [GENERAL] Partitioning Advice

2012-06-07 Thread Greg Smith

On 06/06/2012 01:07 AM, Ben Carbery wrote:
The new server has a great deal more memory which I am hoping will 
help (shared_buffers = 8GB, total RAM 20GB), but I am looking at what 
might be optimal for the storage configuration. From looking at 
previous conversations here I am thinking of something like this..


100GB OS (ext3)
50GB pg_xlog (ext2)
400GB pg_data (ext3 data=writeback noatime?)

Hopefully this would mean the small writes can continue while a large 
read is going.



Latency on ext3 is better on RHEL6 than earlier versions, but it's still
hard to keep it low with that filesystem.  You should consider
ext4 or xfs instead if you're already running into slow periods limited
by disk I/O.


Large values of shared_buffers can also make write latency spikes worse,
particularly when the underlying storage isn't very capable--which is
likely to be the case in a VM environment.  Most of the performance gain
comes from going from the tiny default (<=32MB) for shared_buffers to a
moderate size.  You'll probably get most of that gain setting it to
around 1GB instead, and the worst-case performance might improve.


If you already are seeing problems on your existing server, there are 
two things you could do to monitor what's going on:


-Turn on log_checkpoints on the server.  If you see high numbers for the 
sync= section, that normally narrows your problem very specifically to 
the database's background checkpoints.
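
Enabling that is a one-line change; the resulting log lines look roughly like this (numbers invented):

```
# postgresql.conf
log_checkpoints = on

# LOG:  checkpoint complete: wrote 3412 buffers (10.4%);
#       0 transaction log file(s) added, 0 removed, 8 recycled;
#       write=89.3 s, sync=12.4 s, total=102.1 s
```

A sync= figure that dominates total= is the signature of checkpoint I/O stalls.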
-Watch /proc/meminfo , specifically the Dirty: number.  If that
number gets very high during the same periods the slowdowns happen at, 
it might be possible to make things better by decreasing the amount of 
caching Linux does.  There's some intro material on that subject at 
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html 
and http://blog.2ndquadrant.com/tuning_linux_for_low_postgresq/ (note 
that some of the links in that second one, to the test pgbench results, 
are broken; http://www.highperfpostgres.com/pgbench-results/index.htm is 
the right URL now)
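
A read-only sketch of that monitoring (Linux-specific paths; lowering the thresholds requires root and the values shown are illustrative):

```shell
# Dirty page-cache totals; run this while a slowdown is happening:
grep -E '^(Dirty|Writeback):' /proc/meminfo

# Current writeback thresholds, as a percentage of RAM:
cat /proc/sys/vm/dirty_background_ratio /proc/sys/vm/dirty_ratio

# To make the kernel flush sooner and in smaller bursts, lower them,
# e.g. (as root):
#   sysctl -w vm.dirty_background_ratio=1
#   sysctl -w vm.dirty_ratio=10
```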


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [GENERAL] Partitioning Advice

2012-06-06 Thread Albe Laurenz
Ben Carbery wrote:
 I have a postgres server I need to move onto a new OS (RHEL6) on a new VM and 
 am looking for advice on
 how to partition the disks to gain some performance improvement.
 
 In the current environment I am given a single VHDD which I have not 
 partitioned at all. The SAN
 performance is pretty good, but we have noticed slowdowns at various times.. 
 The database does a lot
 of logging - constant small writes, with some probably insignificant reads of 
 smaller tables. Delays
 in logging can affect the service which is doing the logging and cause 
 problems upstream. Typically
 this does not happen, but there are hourly jobs which generate stats from the 
 logs. Due to their
 complexity the query planner always chooses to do sequential scans on the 
 main log table. This table
 is truncated monthly when the data is archived to another table, but peaks in 
 size at around 10GB at
 the end of the month. Generally any time the stats jobs are running there are 
 delays which I would
 like to reduce/eliminate. There is also a fair bit of iowait on the cpu.
 
 The new server has a great deal more memory which I am hoping will help 
 (shared_buffers = 8GB, total
 RAM 20GB), but I am looking at what might be optimal for the storage 
 configuration. From looking at
 previous conversations here I am thinking of something like this..
 
 100GB OS (ext3)
 50GB pg_xlog (ext2)
 400GB pg_data (ext3 data=writeback noatime?)
 
 Hopefully this would mean the small writes can continue while a large read is 
 going. Currently there
 is no streaming replication so only a gig or so is actually needed for xlogs. 
 We do however use slony
 to sync some smaller tables to a secondary which may or may not affect 
 anything.
 
 This is the first time I have needed to delve into the storage configuration 
 of a database, so any advice or comments welcome.

Since you are on RHEL 6 I would use ext4 throughout.

You say you have I/O problems when stats jobs run.  Can you describe those 
jobs
and what they are doing?

If you have a lot of sequential scans on a 10GB table, that will suck majorly
no matter how you tune it.

Two more things that you can try out:
- Change the I/O scheduler to deadline by booting with elevator=deadline.
- Split the 400GB LUN into several smaller LUNs and use tablespaces.
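
A sketch of the tablespace approach (mount points and names are illustrative, assuming each smaller LUN is mounted at its own path and owned by the postgres user):

```sql
CREATE TABLESPACE ts_log   LOCATION '/pgdata/lun1';
CREATE TABLESPACE ts_stats LOCATION '/pgdata/lun2';

-- Move the hot, write-heavy table onto its own LUN:
ALTER TABLE log_active SET TABLESPACE ts_log;

-- New objects can be placed directly at creation time:
CREATE INDEX log_active_logtime_idx
    ON log_active (logtime) TABLESPACE ts_stats;
```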

I don't say that it is guaranteed to help, but I have had good experiences
with it.

Yours,
Laurenz Albe



[GENERAL] Partitioning Advice

2012-06-05 Thread Ben Carbery
I have a postgres server I need to move onto a new OS (RHEL6) on a new VM
and am looking for advice on how to partition the disks to gain some
performance improvement.

In the current environment I am given a single VHDD which I have not
partitioned at all. The SAN performance is pretty good, but we have noticed
slowdowns at various times.. The database does a lot of logging - constant
small writes, with some probably insignificant reads of smaller tables.
Delays in logging can affect the service which is doing the logging and
cause problems upstream. Typically this does not happen, but there are
hourly jobs which generate stats from the logs. Due to their complexity the
query planner always chooses to do sequential scans on the main log
table. This table is truncated monthly when the data is archived to another
table, but peaks in size at around 10GB at the end of the month. Generally
any time the stats jobs are running there are delays which I would like to
reduce/eliminate. There is also a fair bit of iowait on the cpu.

The new server has a great deal more memory which I am hoping will help
(shared_buffers = 8GB, total RAM 20GB), but I am looking at what might be
optimal for the storage configuration. From looking at previous
conversations here I am thinking of something like this..

100GB OS (ext3)
50GB pg_xlog (ext2)
400GB pg_data (ext3 data=writeback noatime?)
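
As an illustration, that layout would correspond to /etc/fstab entries along these lines (device names invented):

```
/dev/vda1  /         ext3  defaults                1 1
/dev/vdb1  /pg/xlog  ext2  noatime                 0 0
/dev/vdc1  /pg/data  ext3  noatime,data=writeback  0 0
```

noatime avoids a metadata write on every read, and the ext2 choice for pg_xlog matches the reasoning that a journalled filesystem is unnecessary for the sequential, fsync'd WAL.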

Hopefully this would mean the small writes can continue while a large read
is going. Currently there is no streaming replication so only a gig or so
is actually needed for xlogs. We do however use slony to sync some smaller
tables to a secondary which may or may not affect anything.

This is the first time I have needed to delve into the storage
configuration of a database, so any advice or comments welcome.

Ben