[PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Josh Trutwin
We have a pretty busy linux server running postgres 8.1.4, waiting to
upgrade until 8.3 to avoid dump/restoring twice.

# cat /proc/meminfo
total:used:free:  shared: buffers:  cached:
Mem:  3704217600 3592069120 1121484800 39460864 2316271616
Swap: 2516918272   270336 2516647936

# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Xeon(TM) CPU 3.00GHz
stepping: 3
cpu MHz : 2992.795

The postgresql.conf was basically the default so I decided to
increase the cache size and a couple paramaters to make more use of
that memory - here's what I did:

shared_buffers = 16384 (was 1000)
work_mem = 16384 (was 1024)
wal_buffers = 24 (was 8)
checkpoint_segments = 5 (was 3)
effective_cache_size = 1 (was 1000)
stats_command_string = on (was off)
stats_block_level = on (was off)
stats_row_level = on (was off)

In order to do this I had to change /proc/sys/kernel/shmmax to
536870912 (don't have /etc/sysctl)

Also, the entire cluster gets vacuumed analyzed nightly. 

After making these changes, the performance on the server actually
worsened.   I slowly backed off on some of the paramaters but didn't
seem to help.

Wondering if those changes are silly?  For a server this size I
didn't think this would be problematic.

Thank you,

Josh


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Scott Marlowe
On 10/4/07, Josh Trutwin [EMAIL PROTECTED] wrote:
 We have a pretty busy linux server running postgres 8.1.4, waiting to
 upgrade until 8.3 to avoid dump/restoring twice.

You should immediate update your version to 8.1.whateverislatest.
That requires no dump / restore and it is a bug fix update.  I doubt
this problem is because you're out of date on patches, but who
knows...

 # cat /proc/meminfo
 total:used:free:  shared: buffers:  cached:
 Mem:  3704217600 3592069120 1121484800 39460864 2316271616
 Swap: 2516918272   270336 2516647936

Well, you've got plenty of memory, and a large chunk is being used as cache.

 The postgresql.conf was basically the default so I decided to
 increase the cache size and a couple paramaters to make more use of
 that memory - here's what I did:

 shared_buffers = 16384 (was 1000)
 work_mem = 16384 (was 1024)
 wal_buffers = 24 (was 8)
 checkpoint_segments = 5 (was 3)
 effective_cache_size = 1 (was 1000)
 stats_command_string = on (was off)
 stats_block_level = on (was off)
 stats_row_level = on (was off)

Your changes seem reasonable.

 Also, the entire cluster gets vacuumed analyzed nightly.

You should look into running the autovacuum daemon.  for heavily used
databases nightly vacuuming may not be enough.

 After making these changes, the performance on the server actually
 worsened.   I slowly backed off on some of the paramaters but didn't
 seem to help.

Most likely turning on stats collection slowed you down a bit.

We need to see examples of what's slow, including explain analyze
output for slow queries.  Also a brief explanation of the type of load
your database server is seeing.  I.e. is it a lot of little
transactions, mostly read, batch processing, lots of users, one user,
etc...   Right now we don't have enough info to really help you.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Scott Marlowe
Oh, and in addition to  my previous message, you should use tools like
vmstat, iostat and top to get an idea of what your server is doing.

What kind of drive subsystem do you have?  What kind of raid controller?  etc...

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Josh Trutwin
On Thu, 4 Oct 2007 11:19:22 -0500
Scott Marlowe [EMAIL PROTECTED] wrote:

 We need to see examples of what's slow, including explain analyze
 output for slow queries.  Also a brief explanation of the type of
 load your database server is seeing.  I.e. is it a lot of little
 transactions, mostly read, batch processing, lots of users, one
 user, etc...   Right now we don't have enough info to really help
 you.

Sorry, this server is for a few (100+?) websites so it's running
along site apache, php.  All connections to postgresql (except for
the occaional psql console login) are done from php requests, using
the same user (basically there are two users, the one php uses and
postgres).  The bulk of the activity would be reads, but
certainly inesrts/updates/deletes would be interspersed in there.
Most of the activity is done via auto-commits, not many long
transactions.

From your followup email:

 ... you should use tools like vmstat, iostat and top to get an idea
 of what your server is doing.

# vmstat
   procs  memoryswap  io
system cpu
 r  b  w   swpd   free   buff  cache  si  sobibo   incs
us  sy  id
 3  1  0268  68332  39016 2201436   0   0 3 34
2   3   4   2

sorry about the wrapping...

iostat is not found - will see if I can download it.  top typically
shows postmaster as the top process with 10-15% of the CPU, followed
by apache threads.

 12:01pm  up 104 days, 12:05,  2 users,  load average: 9.75, 9.30,
7.70
215 processes: 214 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.1% user,  0.0% system,  0.0% nice,  0.4% idle
Mem:  3617400K av, 3552784K used,  64616K free,  0K shrd,  37456K
buff
Swap: 2457928K av,264K used, 2457664K free
2273664K cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME
COMMAND
31797 postgres  17   0 28836  28M  1784 S   0  8.5  0.7  10:15
postmaster

 What kind of drive subsystem do you have?  What kind of raid
 controller?  etc...

Gathering more information on this - Raid is a software
RAID-1.  Some information: 
I believe itI believe it
# df -h
FilesystemSize  Used Avail Use% Mounted on
/dev/md0   66G   50G   16G  76% /
/dev/sda1  15M  6.6M  8.5M  44% /boot

# cat /proc/mdstat
Personalities : [raid0] [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb3[0] sdc3[1]
  70573440 blocks [2/2] [UU]

unused devices: none

Thanks for your help, I'm more of a developer guy so let me know what
else is useful.

Josh

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Scott Marlowe
On 10/4/07, Josh Trutwin [EMAIL PROTECTED] wrote:
 On Thu, 4 Oct 2007 11:19:22 -0500
 Scott Marlowe [EMAIL PROTECTED] wrote:

  We need to see examples of what's slow, including explain analyze
  output for slow queries.  Also a brief explanation of the type of
  load your database server is seeing.  I.e. is it a lot of little
  transactions, mostly read, batch processing, lots of users, one
  user, etc...   Right now we don't have enough info to really help
  you.

 Sorry, this server is for a few (100+?) websites so it's running
 along site apache, php.  All connections to postgresql (except for
 the occaional psql console login) are done from php requests, using
 the same user (basically there are two users, the one php uses and
 postgres).  The bulk of the activity would be reads, but
 certainly inesrts/updates/deletes would be interspersed in there.
 Most of the activity is done via auto-commits, not many long
 transactions.

So, are there certain queries that are much slower than the others?
Run them from psql with explain analyze in front of them and post the
query and the output here.

 From your followup email:

  ... you should use tools like vmstat, iostat and top to get an idea
  of what your server is doing.

 # vmstat
procs  memoryswap  io
 system cpu
  r  b  w   swpd   free   buff  cache  si  sobibo   incs
 us  sy  id
  3  1  0268  68332  39016 2201436   0   0 3 34
 2   3   4   2

vmstat needs to be run for a while to give you useful numbers.   try:

vmstat 5

and let it run for a few minutes.  The first line won't count so much,
but after that you'll get more reasonable numbers.

 iostat is not found - will see if I can download it.  top typically
 shows postmaster as the top process with 10-15% of the CPU, followed
 by apache threads.

What OS are you on?

  12:01pm  up 104 days, 12:05,  2 users,  load average: 9.75, 9.30,
 7.70

That's pretty heavy load.  I notice there's no wait % listed for CPU,
so I assume it's not a late model Linux kernel or anything.

 215 processes: 214 sleeping, 1 running, 0 zombie, 0 stopped
 CPU states:  0.1% user,  0.0% system,  0.0% nice,  0.4% idle
 Mem:  3617400K av, 3552784K used,  64616K free,  0K shrd,  37456K
 buff
 Swap: 2457928K av,264K used, 2457664K free
 2273664K cached

   PID USER PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME
 COMMAND
 31797 postgres  17   0 28836  28M  1784 S   0  8.5  0.7  10:15
 postmaster

Are the postmasters using most of the CPU?  OR the other processes?

  What kind of drive subsystem do you have?  What kind of raid
  controller?  etc...

 Gathering more information on this - Raid is a software
 RAID-1.  Some information:

OK, given that it's read mostly, it's likely not a problem that a
faster RAID controller would help.  Possibly more drives in a RAID 10
would help a little, but let's look at optimizing your query and
postmaster first.

Do you have the postmaster configured to log long running queries?
That's a good starting point.  also google pg_fouine (I think I spelt
it right) for analyzing your logs.

It's quite likely the issue here is one long running query that
chewing all your I/O or CPU and making everything else slow.  Once we
find that query things should get better and we can worry about
performance tuning in a more leisurely manner.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Kevin Grittner
 On Thu, Oct 4, 2007 at 10:28 AM, in message
[EMAIL PROTECTED], Josh Trutwin
[EMAIL PROTECTED] wrote: 
 running postgres 8.1.4
 
 # cat /proc/meminfo
 total:used:free:  shared: buffers:  cached:
 Mem:  3704217600 3592069120 1121484800 39460864 2316271616
 
 shared_buffers = 16384 (was 1000)
 effective_cache_size = 1 (was 1000)
 
It's kind of silly to tell PostgreSQL that its total cache space is 1
pages when you've got more than that in shared buffers plus all that OS
cache space.  Try something around 285000 pages for effective_cache_size.
 
 stats_command_string = on (was off)
 stats_block_level = on (was off)
 stats_row_level = on (was off)
 
 After making these changes, the performance on the server actually
 worsened.   I slowly backed off on some of the paramaters but didn't
 seem to help.
 
Did you try turning off the collection of those additional statistics?
That isn't free.
 
You didn't get specific about what you saw in performance problems.  If
you are seeing occasional freezes of all queries, you are likely looking
at a known issue with spikiness of disk output.  For some this can be
corrected by using very aggressive background writer settings.  Some have
solved it by disabling OS write delays.  Some haven't found a solution and
are waiting for 8.3; there have been some serious changes made to attempt
to resolve this issue.
 
-Kevin
 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Tuning Help - What did I do wrong?

2007-10-04 Thread Josh Trutwin
On Thu, 04 Oct 2007 14:03:07 -0500
Kevin Grittner [EMAIL PROTECTED] wrote:

 It's kind of silly to tell PostgreSQL that its total cache space is
 1 pages when you've got more than that in shared buffers plus
 all that OS cache space.  Try something around 285000 pages for
 effective_cache_size. 

Good point.

  stats_command_string = on (was off)
  stats_block_level = on (was off)
  stats_row_level = on (was off)
  
  After making these changes, the performance on the server actually
  worsened.   I slowly backed off on some of the paramaters but
  didn't seem to help.
  
 Did you try turning off the collection of those additional
 statistics? That isn't free.

I turned off all but row level since I decided to try turning
autovacuum on.

 You didn't get specific about what you saw in performance
 problems.  If you are seeing occasional freezes of all queries,
 you are likely looking at a known issue with spikiness of disk
 output.  For some this can be corrected by using very aggressive
 background writer settings.  Some have solved it by disabling OS
 write delays.  Some haven't found a solution and are waiting for
 8.3; there have been some serious changes made to attempt to
 resolve this issue. 

Thanks - I put some additional information in replies to Scott, but
mainly the performance of the web sites that are talking to postgres
is the problem - people calling in, etc.  I turned on slow query
logging to see if I can find if it's particular queries or something
else?  

Josh

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings