Re: [HACKERS] improvements to pgtune

2011-05-10 Thread Bruce Momjian

FYI, I can help if you need javascript assistance.

---

Greg Smith wrote:
 Shiv wrote:
   So my exams are over now and am fully committed to the project in 
  terms of time. I have started compiling a sort of personal todo for 
  myself. I agree with your advice to start the project with small steps 
  first. (I have a copy of the code and am trying to glean as much of it 
  as I can)
 
 I just fixed a couple of bugs in the program that were easier to correct 
 than explain.  The code changes have been pushed to the github repo.  
 I've also revised the output format to be a lot nicer.  There's a UI 
 shortcut you may find useful too; the program now takes a single input 
 parameter as the input file, outputting to standard out.
 
 So a sample run might look like this now:
 
 $ ./pgtune postgresql.conf.sample
 [old settings]
 #--
 # pgtune wizard run on 2011-05-08
 # Based on 2060728 KB RAM in the server
 #--
 
 default_statistics_target = 100
 maintenance_work_mem = 120MB
 checkpoint_completion_target = 0.9
 effective_cache_size = 1408MB
 work_mem = 12MB
 wal_buffers = 8MB
 checkpoint_segments = 16
 shared_buffers = 480MB
 max_connections = 80
 
   I would really appreciate your reply to Josh's thoughts. It would 
  help me understand the variety of tasks and a possible ordering for me 
  to attempt them.
  Josh's comments :/ What would you list as the main things pgtune 
  doesn't cover right now?  I have my own list, but I suspect that yours 
  is somewhat different./
  /
  /
  /I do think that autotuning based on interrogating the database is 
  possible.  However, I think the way to make it not be a tar baby is to 
  tackle it one setting at a time, and start with ones we have the most 
  information for.  One of the real challenges there is that some data 
  can be gleaned from pg_* views, but a *lot* of useful performance data 
  only shows up in the activity log, and then only if certain settings 
  are enabled./
 
 I just revised the entire TODO file (which is now TODO.rst, formatted in 
 ReST markup:  http://docutils.sourceforge.net/rst.html ; test with 
 rst2html TODO.rst  TODO.html and look at the result).  It should be 
 easier to follow the flow of now, and it's organized in approximately 
 the order I think things need to get finished in.
 
 There are few major areas for expansion that might happen on this 
 program to choose from.  I was thinking about doing them in this order:
 
 1) Fix the settings validation and limits.  I consider this a good place 
 to start on hacking the code.  it's really necessary work eventually, 
 and it's easier to get started with than the other ideas.
 
 2) Improve internals related to tracking things like memory and 
 connections so they're easier to pass around the program.  Adding a 
 platform class is what I was thinking of.  See the Estimating shared 
 memory usage section of the TODO for more information.  Add PostgreSQL 
 version as another input to that.
 
 3) Improve the settings model used for existing parameters.  Right now 
 people have reported that the work_mem settings suggested in particular 
 are too high for many servers.  Ideas about why that is are in the 
 TODO.  (This really requires the platform change be done first, or the 
 code will be too hard to write/maintain)
 
 4) Estimate memory used by the configuration and output sysctl.conf 
 files.  (Needs platform change too)
 
 5) Add tuning suggestions for new parameters.  The most obvious ideas 
 all involve adding common logging changes.
 
 6) Create some new UIs for running the program.  A text-based program 
 that asked questions (a 'wizard') or a GUI program doing the same are 
 two common suggestions.
 
 The ideas Josh was talking about for interrogating the database for 
 things are all a long ways off from the current state of the code being 
 able to support them.  If (1) through (3) here were done, that whole 
 direction starts with (5) and then runs further that way.  That might be 
 a valid direction to move next instead of the (4), (6) I've listed 
 here.  You'd have finished something that taught enough about how the 
 existing program works to be able to make some more difficult design 
 decisions about fitting new features into it.
 
 If you really want to get right into live server analysis, there's no 
 way for that to fit into the current program yet.  And I don't think 
 you'll get enough practice to see how it would without doing some more 
 basic work first.  You might as well write something new if that's your 
 goal, and expect that you may not finish anything useful by the end of 
 the summer.  If you want to complete a project that results in code that 
 people absolutely will use, the more boring plan I've outlined goes that 
 way.  One of 

Re: [HACKERS] improvements to pgtune

2011-05-08 Thread Greg Smith

Shiv wrote:
 So my exams are over now and am fully committed to the project in 
terms of time. I have started compiling a sort of personal todo for 
myself. I agree with your advice to start the project with small steps 
first. (I have a copy of the code and am trying to glean as much of it 
as I can)


I just fixed a couple of bugs in the program that were easier to correct 
than explain.  The code changes have been pushed to the github repo.  
I've also revised the output format to be a lot nicer.  There's a UI 
shortcut you may find useful too; the program now takes a single input 
parameter as the input file, outputting to standard out.


So a sample run might look like this now:

$ ./pgtune postgresql.conf.sample
[old settings]
#--
# pgtune wizard run on 2011-05-08
# Based on 2060728 KB RAM in the server
#--

default_statistics_target = 100
maintenance_work_mem = 120MB
checkpoint_completion_target = 0.9
effective_cache_size = 1408MB
work_mem = 12MB
wal_buffers = 8MB
checkpoint_segments = 16
shared_buffers = 480MB
max_connections = 80

 I would really appreciate your reply to Josh's thoughts. It would 
help me understand the variety of tasks and a possible ordering for me 
to attempt them.
Josh's comments :/ What would you list as the main things pgtune 
doesn't cover right now?  I have my own list, but I suspect that yours 
is somewhat different./

/
/
/I do think that autotuning based on interrogating the database is 
possible.  However, I think the way to make it not be a tar baby is to 
tackle it one setting at a time, and start with ones we have the most 
information for.  One of the real challenges there is that some data 
can be gleaned from pg_* views, but a *lot* of useful performance data 
only shows up in the activity log, and then only if certain settings 
are enabled./


I just revised the entire TODO file (which is now TODO.rst, formatted in 
ReST markup:  http://docutils.sourceforge.net/rst.html ; test with 
rst2html TODO.rst  TODO.html and look at the result).  It should be 
easier to follow the flow of now, and it's organized in approximately 
the order I think things need to get finished in.


There are few major areas for expansion that might happen on this 
program to choose from.  I was thinking about doing them in this order:


1) Fix the settings validation and limits.  I consider this a good place 
to start on hacking the code.  it's really necessary work eventually, 
and it's easier to get started with than the other ideas.


2) Improve internals related to tracking things like memory and 
connections so they're easier to pass around the program.  Adding a 
platform class is what I was thinking of.  See the Estimating shared 
memory usage section of the TODO for more information.  Add PostgreSQL 
version as another input to that.


3) Improve the settings model used for existing parameters.  Right now 
people have reported that the work_mem settings suggested in particular 
are too high for many servers.  Ideas about why that is are in the 
TODO.  (This really requires the platform change be done first, or the 
code will be too hard to write/maintain)


4) Estimate memory used by the configuration and output sysctl.conf 
files.  (Needs platform change too)


5) Add tuning suggestions for new parameters.  The most obvious ideas 
all involve adding common logging changes.


6) Create some new UIs for running the program.  A text-based program 
that asked questions (a 'wizard') or a GUI program doing the same are 
two common suggestions.


The ideas Josh was talking about for interrogating the database for 
things are all a long ways off from the current state of the code being 
able to support them.  If (1) through (3) here were done, that whole 
direction starts with (5) and then runs further that way.  That might be 
a valid direction to move next instead of the (4), (6) I've listed 
here.  You'd have finished something that taught enough about how the 
existing program works to be able to make some more difficult design 
decisions about fitting new features into it.


If you really want to get right into live server analysis, there's no 
way for that to fit into the current program yet.  And I don't think 
you'll get enough practice to see how it would without doing some more 
basic work first.  You might as well write something new if that's your 
goal, and expect that you may not finish anything useful by the end of 
the summer.  If you want to complete a project that results in code that 
people absolutely will use, the more boring plan I've outlined goes that 
way.  One of the secrets to software development is that ideas for 
complicated features rarely result in software that gets released, while 
working on simpler programs that don't aim so high leads to software 
that ships to the world and finds users.  The only 

Re: [HACKERS] improvements to pgtune

2011-05-06 Thread Shiv
Hi Greg,
 So my exams are over now and am fully committed to the project in terms of
time. I have started compiling a sort of personal todo for myself. I agree
with your advice to start the project with small steps first. (I have a copy
of the code and am trying to glean as much of it as I can)
 I would really appreciate your reply to Josh's thoughts. It would help me
understand the variety of tasks and a possible ordering for me to attempt
them.
Josh's comments :* What would you list as the main things pgtune doesn't
cover right now?  I have my own list, but I suspect that yours is somewhat
different.*
*
*
*I do think that autotuning based on interrogating the database is possible.
 However, I think the way to make it not be a tar baby is to tackle it one
setting at a time, and start with ones we have the most information for.
 One of the real challenges there is that some data can be gleaned from pg_*
views, but a *lot* of useful performance data only shows up in the activity
log, and then only if certain settings are enabled.*
Regards,
Shiv


On Thu, Apr 28, 2011 at 9:34 PM, Shiv rama.the...@gmail.com wrote:

 That's some great starting advice there. I have a couple of final exams in
 the next 36 hours. Will get to work almost immediately after that.
 I will definitely take small steps before going for some of the tougher
 tasks. I would of-course like this conversation to go on, so I can see a
 more comprehensive TODO list.
 One of my first tasks on GSoC is to make sure I create a good project
 specification document. So there can be definite expectations and targets.
 This conversation helps me do that!
 Regards,
 Shiv


 On Thu, Apr 28, 2011 at 9:50 AM, Greg Smith g...@2ndquadrant.com wrote:

 Shiv wrote:

  On the program I hope to learn as much about professional software
 engineering principles as PostgreSQL. My project is aimed towards extending
 and hopefully improving upon pgtune. If any of you have some ideas or
 thoughts to share. I am all ears!!


 Well, first step on the software engineering side is to get a copy of the
 code in a form you can modify.  I'd recommend grabbing it from
 https://github.com/gregs1104/pgtune ; while there is a copy of the
 program on git.postgresql.org, it's easier to work with the one on github
 instead.  I can push updates over to the copy on postgresql.org easily
 enough, and that way you don't have to worry about getting an account on
 that server.

 There's a long list of suggested improvements to make at
 https://github.com/gregs1104/pgtune/blob/master/TODO

 Where I would recommend getting started is doing some of the small items
 on there, some of which I have already put comments into the code about but
 just not finished yet.  Some examples:

 -Validate against min/max
 -Show original value in output
 -Limit shared memory use on Windows (see notes on shared_buffers at
 http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more
 information)
 -Look for postgresql.conf file using PGDATA environment variable
 -Look for settings files based on path of the pgtune executable
 -Save a settings reference files for newer versions of PostgreSQL (right
 now I only target 8.4) and allow passing in the version you're configuring.

 A common mistake made by GSOC students is to dive right in to trying to
 make big changes.  You'll be more successful if you get practice at things
 like preparing and sharing patches on smaller changes first.

 At the next level, there are a few larger features that I would consider
 valuable that are not really addressed by the program yet:

 -Estimate how much shared memory is used by the combination of settings.
  See Table 17-2 at
 http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those
 numbers aren't perfect, and improving that table is its own useful project.
  But it gives an idea how they fit together.  I have some notes at the end
 of the TODO file on how I think the information needed to produce this needs
 to be passed around the inside of pgtune.

 -Use that estimate to produce a sysctl.conf file for one platform; Linux
 is the easiest one to start with.  I've attached a prototype showing how to
 do that, written in bash.

 -Write a Python-TK or web-based front-end for the program.

 Now that I know someone is going to work on this program again, I'll see
 what I can do to clean some parts of it up.  There are a couple of things
 it's easier for me to just fix rather than to describe, like the way I
 really want to change how it adds comments to the settings it changes.

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



 #!/bin/bash

 # Output lines suitable for sysctl configuration based
 # on total amount of RAM on the system.  The output
 # will allow up to 50% of physical memory to be allocated
 # into shared memory.

 # On Linux, you can use it as follows (as root):
 #
 # ./shmsetup  

Re: [HACKERS] improvements to pgtune

2011-04-28 Thread Shiv
That's some great starting advice there. I have a couple of final exams in
the next 36 hours. Will get to work almost immediately after that.
I will definitely take small steps before going for some of the tougher
tasks. I would of-course like this conversation to go on, so I can see a
more comprehensive TODO list.
One of my first tasks on GSoC is to make sure I create a good project
specification document. So there can be definite expectations and targets.
This conversation helps me do that!
Regards,
Shiv


On Thu, Apr 28, 2011 at 9:50 AM, Greg Smith g...@2ndquadrant.com wrote:

 Shiv wrote:

  On the program I hope to learn as much about professional software
 engineering principles as PostgreSQL. My project is aimed towards extending
 and hopefully improving upon pgtune. If any of you have some ideas or
 thoughts to share. I am all ears!!


 Well, first step on the software engineering side is to get a copy of the
 code in a form you can modify.  I'd recommend grabbing it from
 https://github.com/gregs1104/pgtune ; while there is a copy of the program
 on git.postgresql.org, it's easier to work with the one on github instead.
  I can push updates over to the copy on postgresql.org easily enough, and
 that way you don't have to worry about getting an account on that server.

 There's a long list of suggested improvements to make at
 https://github.com/gregs1104/pgtune/blob/master/TODO

 Where I would recommend getting started is doing some of the small items on
 there, some of which I have already put comments into the code about but
 just not finished yet.  Some examples:

 -Validate against min/max
 -Show original value in output
 -Limit shared memory use on Windows (see notes on shared_buffers at
 http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more
 information)
 -Look for postgresql.conf file using PGDATA environment variable
 -Look for settings files based on path of the pgtune executable
 -Save a settings reference files for newer versions of PostgreSQL (right
 now I only target 8.4) and allow passing in the version you're configuring.

 A common mistake made by GSOC students is to dive right in to trying to
 make big changes.  You'll be more successful if you get practice at things
 like preparing and sharing patches on smaller changes first.

 At the next level, there are a few larger features that I would consider
 valuable that are not really addressed by the program yet:

 -Estimate how much shared memory is used by the combination of settings.
  See Table 17-2 at
 http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those
 numbers aren't perfect, and improving that table is its own useful project.
  But it gives an idea how they fit together.  I have some notes at the end
 of the TODO file on how I think the information needed to produce this needs
 to be passed around the inside of pgtune.

 -Use that estimate to produce a sysctl.conf file for one platform; Linux is
 the easiest one to start with.  I've attached a prototype showing how to do
 that, written in bash.

 -Write a Python-TK or web-based front-end for the program.

 Now that I know someone is going to work on this program again, I'll see
 what I can do to clean some parts of it up.  There are a couple of things
 it's easier for me to just fix rather than to describe, like the way I
 really want to change how it adds comments to the settings it changes.

 --
 Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



 #!/bin/bash

 # Output lines suitable for sysctl configuration based
 # on total amount of RAM on the system.  The output
 # will allow up to 50% of physical memory to be allocated
 # into shared memory.

 # On Linux, you can use it as follows (as root):
 #
 # ./shmsetup  /etc/sysctl.conf
 # sysctl -p

 # Early FreeBSD versions do not support the sysconf interface
 # used here.  The exact version where this works hasn't
 # been confirmed yet.

 page_size=`getconf PAGE_SIZE`
 phys_pages=`getconf _PHYS_PAGES`

 if [ -z $page_size ]; then
  echo Error:  cannot determine page size
  exit 1
 fi

 if [ -z $phys_pages ]; then
  echo Error:  cannot determine number of memory pages
  exit 2
 fi

 shmall=`expr $phys_pages / 2`
 shmmax=`expr $shmall \* $page_size`

 echo \# Maximum shared segment size in bytes
 echo kernel.shmmax = $shmmax
 echo \# Maximum number of shared memory segments in pages
 echo kernel.shmall = $shmall




Re: [HACKERS] improvements to pgtune

2011-04-27 Thread Greg Smith

Shiv wrote:
 On the program I hope to learn as much about professional software 
engineering principles as PostgreSQL. My project is aimed towards 
extending and hopefully improving upon pgtune. If any of you have some 
ideas or thoughts to share. I am all ears!!


Well, first step on the software engineering side is to get a copy of 
the code in a form you can modify.  I'd recommend grabbing it from 
https://github.com/gregs1104/pgtune ; while there is a copy of the 
program on git.postgresql.org, it's easier to work with the one on 
github instead.  I can push updates over to the copy on postgresql.org 
easily enough, and that way you don't have to worry about getting an 
account on that server.


There's a long list of suggested improvements to make at 
https://github.com/gregs1104/pgtune/blob/master/TODO


Where I would recommend getting started is doing some of the small items 
on there, some of which I have already put comments into the code about 
but just not finished yet.  Some examples:


-Validate against min/max
-Show original value in output
-Limit shared memory use on Windows (see notes on shared_buffers at 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more 
information)

-Look for postgresql.conf file using PGDATA environment variable
-Look for settings files based on path of the pgtune executable
-Save a settings reference files for newer versions of PostgreSQL (right 
now I only target 8.4) and allow passing in the version you're configuring.


A common mistake made by GSOC students is to dive right in to trying to 
make big changes.  You'll be more successful if you get practice at 
things like preparing and sharing patches on smaller changes first.


At the next level, there are a few larger features that I would consider 
valuable that are not really addressed by the program yet:


-Estimate how much shared memory is used by the combination of 
settings.  See Table 17-2 at 
http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those 
numbers aren't perfect, and improving that table is its own useful 
project.  But it gives an idea how they fit together.  I have some notes 
at the end of the TODO file on how I think the information needed to 
produce this needs to be passed around the inside of pgtune.


-Use that estimate to produce a sysctl.conf file for one platform; Linux 
is the easiest one to start with.  I've attached a prototype showing how 
to do that, written in bash.


-Write a Python-TK or web-based front-end for the program.

Now that I know someone is going to work on this program again, I'll see 
what I can do to clean some parts of it up.  There are a couple of 
things it's easier for me to just fix rather than to describe, like the 
way I really want to change how it adds comments to the settings it changes.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


#!/bin/bash

# Output lines suitable for sysctl configuration based
# on total amount of RAM on the system.  The output
# will allow up to 50% of physical memory to be allocated
# into shared memory.

# On Linux, you can use it as follows (as root):
# 
# ./shmsetup  /etc/sysctl.conf
# sysctl -p

# Early FreeBSD versions do not support the sysconf interface
# used here.  The exact version where this works hasn't
# been confirmed yet.

page_size=`getconf PAGE_SIZE`
phys_pages=`getconf _PHYS_PAGES`

if [ -z $page_size ]; then
  echo Error:  cannot determine page size
  exit 1
fi

if [ -z $phys_pages ]; then
  echo Error:  cannot determine number of memory pages
  exit 2
fi

shmall=`expr $phys_pages / 2`
shmmax=`expr $shmall \* $page_size` 

echo \# Maximum shared segment size in bytes
echo kernel.shmmax = $shmmax
echo \# Maximum number of shared memory segments in pages
echo kernel.shmall = $shmall

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improvements to pgtune

2011-04-27 Thread Greg Smith

Daniel Farina wrote:

It seems like in general it lacks a feedback mechanism to figure things out 
settings
from workloads, instead relying on Greg Smith's sizable experience to
do some arithmetic and get you off the ground in a number of common cases.
  


To credit appropriately, the model used right now actually originated 
with a Josh Berkus spreadsheet, from before I was doing this sort of 
work full-time.  That's held up pretty well, but it doesn't fully 
reflect how I do things nowadays.  The recent realization that pgtune is 
actually shipping as a package for Debian/Ubuntu now has made realize 
this is a much higher profile project now, one that I should revisit 
doing a better job on.


Every time I've gotten pulled into discussions of setting parameters 
based on live monitoring, it's turned into a giant black hole--absorbs a 
lot of energy, nothing useful escapes from it.  I credit completely 
ignoring that idea altogether, and using the simplest possible static 
settings instead, as one reason I managed to ship code here that people 
find useful.  I'm not closed to the idea, just not optimistic it will 
lead anywhere useful.  That makes it hard to work on when there are so 
many obvious things guaranteed to improve the program that could be done 
instead.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improvements to pgtune

2011-04-27 Thread Joshua Berkus

 Every time I've gotten pulled into discussions of setting parameters 
 based on live monitoring, it's turned into a giant black hole--absorbs a 
 lot of energy, nothing useful escapes from it.  I credit completely 
 ignoring that idea altogether, and using the simplest possible static 
 settings instead, as one reason I managed to ship code here that people 
 find useful.  I'm not closed to the idea, just not optimistic it will 
 lead anywhere useful.  That makes it hard to work on when there are so 
 many obvious things guaranteed to improve the program that could be done 
 instead.

What would you list as the main things pgtune doesn't cover right now?  I have 
my own list, but I suspect that yours is somewhat different.

I do think that autotuning based on interrogating the database is possible.  
However, I think the way to make it not be a tar baby is to tackle it one 
setting at a time, and start with ones we have the most information for.  One 
of the real challenges there is that some data can be gleaned from pg_* views, 
but a *lot* of useful performance data only shows up in the activity log, and 
then only if certain settings are enabled.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers