Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Richard Huxton

Campbell, Lance wrote:

I would like to get someone's recommendations on the best initial
settings for a dedicated PostgreSQL server.  I do realize that there are
a lot of factors that influence how one should configure a database.  I
am just looking for a good starting point.  Ideally I would like the
database to reside as much as possible in memory with no disk access.
The current database, on my 7.x version of PostgreSQL, generates a 6
Gig file when doing a database dump.


Your operating-system should be doing the caching for you.


Dedicated PostgreSQL 8.2 Server
Redhat Linux 4.x AS 64 bit version (EM64T)
4 Intel Xeon Processors


If these are older Xeons, check the mailing list archives for "xeon 
context switch" (there was a known context-switching problem with older Xeons).



20 Gig Memory
Current PostgreSQL database is 6 Gig file when doing a database dump


OK, so it's plausible the whole thing will fit in RAM (as a 
rule-of-thumb I assume headers, indexes etc. triple or quadruple the 
size). To know better, check the actual disk-usage of $PGDATA.
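
For example (the data directory location varies by distribution - on Red Hat 
it's typically /var/lib/pgsql/data):

  # size on disk of the whole cluster
  du -sh /var/lib/pgsql/data

  -- or from psql, one database at a time (8.1+):
  SELECT pg_size_pretty(pg_database_size(current_database()));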



/etc/sysctl.conf  file settings:

# 11 Gig

kernel.shmmax = 11811160064


Hmm - that's a lot of shared RAM. See shared_buffers below.
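
That figure is exactly 11 * 1024^3 bytes, i.e. 11GiB of shared memory allowed:

  echo $((11 * 1024 * 1024 * 1024))    # prints 11811160064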


kernel.sem = 250 32000 100 128

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default = 262144  

net.core.rmem_max = 262144 


net.core.wmem_default = 262144

net.core.wmem_max = 262144  



postgresql.conf file settings (if not listed then I used the defaults):

max_connections = 300


How many connections do you expect typically/peak? It doesn't cost much 
to have max_connections set high but your workload is the most important 
thing missing from your question.



shared_buffers = 10240MB


For 7.x that's probably way too big, but 8.x organises its buffers 
better. I'd still be tempted to start at 1 or 2GB and work up - see where 
it stops buying you an improvement.
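
Something along these lines as a starting point (illustrative value only - 
shared_buffers must fit within kernel.shmmax, and changing it needs a restart):

  shared_buffers = 2048MB    # benchmark, then try 4GB, 6GB ... until gains stop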



work_mem = 10MB


If you have large queries doing big sorts, I'd increase this. Don't 
forget it's per-sort, so if you've got about 300 connections live at 
any one time that could be 300*10MB*N if they're all doing something 
complicated. If you only have one connection live, you can increase this 
quite substantially.
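
To put rough numbers on that worst case (illustrative arithmetic only):

  300 connections * 10MB * 2 sorts each  =  ~6GB of sort memory

Also remember work_mem can be raised per-session for a known-heavy query 
rather than globally, e.g.:

  SET work_mem = '256MB';   -- only affects this session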



effective_cache_size = 512MB


This isn't setting PG's memory usage, it's telling PG how much data your 
operating-system is caching. Check "free" and see what it says. For you, 
I'd expect 10GB+.
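
As a rough illustration (made-up numbers - the figure you want is the 
"cached" column):

  $ free -m
               total       used       free     shared    buffers     cached
  Mem:         20033      19800        233          0        150      15200

  # then in postgresql.conf, something like:
  effective_cache_size = 15GB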



maintenance_work_mem = 100MB


This is for admin-related tasks, so you could probably increase it.

Workload, workload, workload - we need to know what you're doing with it. 
One connection summarising the entire database will want larger numbers 
than 100 connections running many small queries.


HTH
--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Campbell, Lance
Richard,
Thanks for your reply.  

You said:
Your operating-system should be doing the caching for you.

My understanding is that as long as Linux has memory available it will
cache files.  Then from your comment I get the impression that since
Linux would be caching the data files for the postgres database it would
be redundant to have a large shared_buffers.  Did I understand you
correctly?

Thanks,



Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Richard Huxton

Campbell, Lance wrote:

Richard,
Thanks for your reply.  


You said:
Your operating-system should be doing the caching for you.

My understanding is that as long as Linux has memory available it will
cache files.  Then from your comment I get the impression that since
Linux would be caching the data files for the postgres database it would
be redundant to have a large shared_buffers.  Did I understand you
correctly?


That's right - PG works with the O.S. This means it *might* not be a big 
advantage to have a large shared_buffers.


On older versions of PG, the buffer management code wasn't great with 
large shared_buffers values either.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Jim C. Nasby
If you're doing much updating at all you'll also want to bump up
checkpoint_segments. I like setting checkpoint_warning just a bit under
checkpoint_timeout as a way to monitor how often you're checkpointing
due to running out of segments.
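
For example (illustrative values - checkpoint_segments wants tuning to your
write volume):

  checkpoint_segments = 16      # default is 3
  checkpoint_timeout  = 5min    # the default
  checkpoint_warning  = 4min    # warn in the logs if checkpoints arrive sooner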

With a large shared_buffers you'll likely need to make the bgwriter more
aggressive as well (increase the bgwriter *_maxpages settings), though how
important that is depends on how much updating you're doing. If you see
periodic spikes in IO corresponding to checkpoints, that's an indication
bgwriter isn't doing a good enough job.
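
The relevant 8.2 settings look roughly like this (illustrative values; the
shipped maxpages defaults are quite low):

  bgwriter_delay        = 200ms   # how often the bgwriter wakes up (default)
  bgwriter_lru_maxpages = 100     # default 5
  bgwriter_all_maxpages = 100     # default 5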

If everything ends up in memory, it might be good to decrease
random_page_cost to 1 or something close to it; though the database
should just rely on effective_cache_size to figure out that everything's
in memory.
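
i.e. something like (illustrative):

  random_page_cost = 1.5    # default is 4.0; closer to 1.0 when everything is cached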

If you're on pre-8.2, you'll want to cut all the autovacuum parameters
in half, if you're using it.
-- 
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Bill Moran
In response to Campbell, Lance [EMAIL PROTECTED]:

 Richard,
 Thanks for your reply.  
 
 You said:
 Your operating-system should be doing the caching for you.
 
 My understanding is that as long as Linux has memory available it will
 cache files.  Then from your comment I get the impression that since
 Linux would be caching the data files for the postgres database it would
 be redundant to have a large shared_buffers.  Did I understand you
 correctly?

Keep in mind that keeping the data in the kernel's buffer requires
Postgres to make a syscall to read a file, which the kernel then realizes
is cached in memory.  The kernel then has to make that data available
to the Postgres (userland) process.

If the data is in Postgres' buffers, Postgres can fetch it directly, thus
avoiding the overhead of the syscalls and the kernel activity.  You still
have to make sysvshm calls, though.

So, it depends on which is able to manage the memory better.  Is the
kernel so much more efficient that it makes up for the overhead of the
syscalls?  My understanding is that in recent versions of Postgres,
this is not the case, and large shared_buffers improve performance.
I've yet to do any in-depth testing on this, though.
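
One rough way to see how often PostgreSQL is finding blocks in its own
shared_buffers (on 8.2 you may need stats_block_level = on for these
counters to be populated):

  SELECT sum(blks_hit)::float / nullif(sum(blks_hit) + sum(blks_read), 0)
 AS shared_buffer_hit_ratio
  FROM pg_stat_database;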

-- 
Bill Moran
Collaborative Fusion Inc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match