Re: [HACKERS] improvements to pgtune
FYI, I can help if you need javascript assistance. --- Greg Smith wrote: Shiv wrote: So my exams are over now and am fully committed to the project in terms of time. I have started compiling a sort of personal todo for myself. I agree with your advice to start the project with small steps first. (I have a copy of the code and am trying to glean as much of it as I can) I just fixed a couple of bugs in the program that were easier to correct than explain. The code changes have been pushed to the github repo. I've also revised the output format to be a lot nicer. There's a UI shortcut you may find useful too; the program now takes a single input parameter as the input file, outputting to standard out. So a sample run might look like this now: $ ./pgtune postgresql.conf.sample [old settings] #-- # pgtune wizard run on 2011-05-08 # Based on 2060728 KB RAM in the server #-- default_statistics_target = 100 maintenance_work_mem = 120MB checkpoint_completion_target = 0.9 effective_cache_size = 1408MB work_mem = 12MB wal_buffers = 8MB checkpoint_segments = 16 shared_buffers = 480MB max_connections = 80 I would really appreciate your reply to Josh's thoughts. It would help me understand the variety of tasks and a possible ordering for me to attempt them. Josh's comments :/ What would you list as the main things pgtune doesn't cover right now? I have my own list, but I suspect that yours is somewhat different./ / / /I do think that autotuning based on interrogating the database is possible. However, I think the way to make it not be a tar baby is to tackle it one setting at a time, and start with ones we have the most information for. One of the real challenges there is that some data can be gleaned from pg_* views, but a *lot* of useful performance data only shows up in the activity log, and then only if certain settings are enabled./ I just revised the entire TODO file (which is now TODO.rst, formatted in ReST markup: http://docutils.sourceforge.net/rst.html ; test with rst2html TODO.rst TODO.html and look at the result). It should be easier to follow the flow of now, and it's organized in approximately the order I think things need to get finished in. There are few major areas for expansion that might happen on this program to choose from. I was thinking about doing them in this order: 1) Fix the settings validation and limits. I consider this a good place to start on hacking the code. it's really necessary work eventually, and it's easier to get started with than the other ideas. 2) Improve internals related to tracking things like memory and connections so they're easier to pass around the program. Adding a platform class is what I was thinking of. See the Estimating shared memory usage section of the TODO for more information. Add PostgreSQL version as another input to that. 3) Improve the settings model used for existing parameters. Right now people have reported that the work_mem settings suggested in particular are too high for many servers. Ideas about why that is are in the TODO. (This really requires the platform change be done first, or the code will be too hard to write/maintain) 4) Estimate memory used by the configuration and output sysctl.conf files. (Needs platform change too) 5) Add tuning suggestions for new parameters. The most obvious ideas all involve adding common logging changes. 6) Create some new UIs for running the program. A text-based program that asked questions (a 'wizard') or a GUI program doing the same are two common suggestions. The ideas Josh was talking about for interrogating the database for things are all a long ways off from the current state of the code being able to support them. If (1) through (3) here were done, that whole direction starts with (5) and then runs further that way. That might be a valid direction to move next instead of the (4), (6) I've listed here. You'd have finished something that taught enough about how the existing program works to be able to make some more difficult design decisions about fitting new features into it. If you really want to get right into live server analysis, there's no way for that to fit into the current program yet. And I don't think you'll get enough practice to see how it would without doing some more basic work first. You might as well write something new if that's your goal, and expect that you may not finish anything useful by the end of the summer. If you want to complete a project that results in code that people absolutely will use, the more boring plan I've outlined goes that way. One of
Re: [HACKERS] improvements to pgtune
Shiv wrote: So my exams are over now and am fully committed to the project in terms of time. I have started compiling a sort of personal todo for myself. I agree with your advice to start the project with small steps first. (I have a copy of the code and am trying to glean as much of it as I can) I just fixed a couple of bugs in the program that were easier to correct than explain. The code changes have been pushed to the github repo. I've also revised the output format to be a lot nicer. There's a UI shortcut you may find useful too; the program now takes a single input parameter as the input file, outputting to standard out. So a sample run might look like this now: $ ./pgtune postgresql.conf.sample [old settings] #-- # pgtune wizard run on 2011-05-08 # Based on 2060728 KB RAM in the server #-- default_statistics_target = 100 maintenance_work_mem = 120MB checkpoint_completion_target = 0.9 effective_cache_size = 1408MB work_mem = 12MB wal_buffers = 8MB checkpoint_segments = 16 shared_buffers = 480MB max_connections = 80 I would really appreciate your reply to Josh's thoughts. It would help me understand the variety of tasks and a possible ordering for me to attempt them. Josh's comments :/ What would you list as the main things pgtune doesn't cover right now? I have my own list, but I suspect that yours is somewhat different./ / / /I do think that autotuning based on interrogating the database is possible. However, I think the way to make it not be a tar baby is to tackle it one setting at a time, and start with ones we have the most information for. One of the real challenges there is that some data can be gleaned from pg_* views, but a *lot* of useful performance data only shows up in the activity log, and then only if certain settings are enabled./ I just revised the entire TODO file (which is now TODO.rst, formatted in ReST markup: http://docutils.sourceforge.net/rst.html ; test with rst2html TODO.rst TODO.html and look at the result). It should be easier to follow the flow of now, and it's organized in approximately the order I think things need to get finished in. There are few major areas for expansion that might happen on this program to choose from. I was thinking about doing them in this order: 1) Fix the settings validation and limits. I consider this a good place to start on hacking the code. it's really necessary work eventually, and it's easier to get started with than the other ideas. 2) Improve internals related to tracking things like memory and connections so they're easier to pass around the program. Adding a platform class is what I was thinking of. See the Estimating shared memory usage section of the TODO for more information. Add PostgreSQL version as another input to that. 3) Improve the settings model used for existing parameters. Right now people have reported that the work_mem settings suggested in particular are too high for many servers. Ideas about why that is are in the TODO. (This really requires the platform change be done first, or the code will be too hard to write/maintain) 4) Estimate memory used by the configuration and output sysctl.conf files. (Needs platform change too) 5) Add tuning suggestions for new parameters. The most obvious ideas all involve adding common logging changes. 6) Create some new UIs for running the program. A text-based program that asked questions (a 'wizard') or a GUI program doing the same are two common suggestions. The ideas Josh was talking about for interrogating the database for things are all a long ways off from the current state of the code being able to support them. If (1) through (3) here were done, that whole direction starts with (5) and then runs further that way. That might be a valid direction to move next instead of the (4), (6) I've listed here. You'd have finished something that taught enough about how the existing program works to be able to make some more difficult design decisions about fitting new features into it. If you really want to get right into live server analysis, there's no way for that to fit into the current program yet. And I don't think you'll get enough practice to see how it would without doing some more basic work first. You might as well write something new if that's your goal, and expect that you may not finish anything useful by the end of the summer. If you want to complete a project that results in code that people absolutely will use, the more boring plan I've outlined goes that way. One of the secrets to software development is that ideas for complicated features rarely result in software that gets released, while working on simpler programs that don't aim so high leads to software that ships to the world and finds users. The only
Re: [HACKERS] improvements to pgtune
Hi Greg, So my exams are over now and am fully committed to the project in terms of time. I have started compiling a sort of personal todo for myself. I agree with your advice to start the project with small steps first. (I have a copy of the code and am trying to glean as much of it as I can) I would really appreciate your reply to Josh's thoughts. It would help me understand the variety of tasks and a possible ordering for me to attempt them. Josh's comments :* What would you list as the main things pgtune doesn't cover right now? I have my own list, but I suspect that yours is somewhat different.* * * *I do think that autotuning based on interrogating the database is possible. However, I think the way to make it not be a tar baby is to tackle it one setting at a time, and start with ones we have the most information for. One of the real challenges there is that some data can be gleaned from pg_* views, but a *lot* of useful performance data only shows up in the activity log, and then only if certain settings are enabled.* Regards, Shiv On Thu, Apr 28, 2011 at 9:34 PM, Shiv rama.the...@gmail.com wrote: That's some great starting advice there. I have a couple of final exams in the next 36 hours. Will get to work almost immediately after that. I will definitely take small steps before going for some of the tougher tasks. I would of-course like this conversation to go on, so I can see a more comprehensive TODO list. One of my first tasks on GSoC is to make sure I create a good project specification document. So there can be definite expectations and targets. This conversation helps me do that! Regards, Shiv On Thu, Apr 28, 2011 at 9:50 AM, Greg Smith g...@2ndquadrant.com wrote: Shiv wrote: On the program I hope to learn as much about professional software engineering principles as PostgreSQL. My project is aimed towards extending and hopefully improving upon pgtune. If any of you have some ideas or thoughts to share. I am all ears!! Well, first step on the software engineering side is to get a copy of the code in a form you can modify. I'd recommend grabbing it from https://github.com/gregs1104/pgtune ; while there is a copy of the program on git.postgresql.org, it's easier to work with the one on github instead. I can push updates over to the copy on postgresql.org easily enough, and that way you don't have to worry about getting an account on that server. There's a long list of suggested improvements to make at https://github.com/gregs1104/pgtune/blob/master/TODO Where I would recommend getting started is doing some of the small items on there, some of which I have already put comments into the code about but just not finished yet. Some examples: -Validate against min/max -Show original value in output -Limit shared memory use on Windows (see notes on shared_buffers at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more information) -Look for postgresql.conf file using PGDATA environment variable -Look for settings files based on path of the pgtune executable -Save a settings reference files for newer versions of PostgreSQL (right now I only target 8.4) and allow passing in the version you're configuring. A common mistake made by GSOC students is to dive right in to trying to make big changes. You'll be more successful if you get practice at things like preparing and sharing patches on smaller changes first. At the next level, there are a few larger features that I would consider valuable that are not really addressed by the program yet: -Estimate how much shared memory is used by the combination of settings. See Table 17-2 at http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those numbers aren't perfect, and improving that table is its own useful project. But it gives an idea how they fit together. I have some notes at the end of the TODO file on how I think the information needed to produce this needs to be passed around the inside of pgtune. -Use that estimate to produce a sysctl.conf file for one platform; Linux is the easiest one to start with. I've attached a prototype showing how to do that, written in bash. -Write a Python-TK or web-based front-end for the program. Now that I know someone is going to work on this program again, I'll see what I can do to clean some parts of it up. There are a couple of things it's easier for me to just fix rather than to describe, like the way I really want to change how it adds comments to the settings it changes. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us #!/bin/bash # Output lines suitable for sysctl configuration based # on total amount of RAM on the system. The output # will allow up to 50% of physical memory to be allocated # into shared memory. # On Linux, you can use it as follows (as root): # # ./shmsetup
Re: [HACKERS] improvements to pgtune
That's some great starting advice there. I have a couple of final exams in the next 36 hours. Will get to work almost immediately after that. I will definitely take small steps before going for some of the tougher tasks. I would of-course like this conversation to go on, so I can see a more comprehensive TODO list. One of my first tasks on GSoC is to make sure I create a good project specification document. So there can be definite expectations and targets. This conversation helps me do that! Regards, Shiv On Thu, Apr 28, 2011 at 9:50 AM, Greg Smith g...@2ndquadrant.com wrote: Shiv wrote: On the program I hope to learn as much about professional software engineering principles as PostgreSQL. My project is aimed towards extending and hopefully improving upon pgtune. If any of you have some ideas or thoughts to share. I am all ears!! Well, first step on the software engineering side is to get a copy of the code in a form you can modify. I'd recommend grabbing it from https://github.com/gregs1104/pgtune ; while there is a copy of the program on git.postgresql.org, it's easier to work with the one on github instead. I can push updates over to the copy on postgresql.org easily enough, and that way you don't have to worry about getting an account on that server. There's a long list of suggested improvements to make at https://github.com/gregs1104/pgtune/blob/master/TODO Where I would recommend getting started is doing some of the small items on there, some of which I have already put comments into the code about but just not finished yet. Some examples: -Validate against min/max -Show original value in output -Limit shared memory use on Windows (see notes on shared_buffers at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more information) -Look for postgresql.conf file using PGDATA environment variable -Look for settings files based on path of the pgtune executable -Save a settings reference files for newer versions of PostgreSQL (right now I only target 8.4) and allow passing in the version you're configuring. A common mistake made by GSOC students is to dive right in to trying to make big changes. You'll be more successful if you get practice at things like preparing and sharing patches on smaller changes first. At the next level, there are a few larger features that I would consider valuable that are not really addressed by the program yet: -Estimate how much shared memory is used by the combination of settings. See Table 17-2 at http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those numbers aren't perfect, and improving that table is its own useful project. But it gives an idea how they fit together. I have some notes at the end of the TODO file on how I think the information needed to produce this needs to be passed around the inside of pgtune. -Use that estimate to produce a sysctl.conf file for one platform; Linux is the easiest one to start with. I've attached a prototype showing how to do that, written in bash. -Write a Python-TK or web-based front-end for the program. Now that I know someone is going to work on this program again, I'll see what I can do to clean some parts of it up. There are a couple of things it's easier for me to just fix rather than to describe, like the way I really want to change how it adds comments to the settings it changes. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us #!/bin/bash # Output lines suitable for sysctl configuration based # on total amount of RAM on the system. The output # will allow up to 50% of physical memory to be allocated # into shared memory. # On Linux, you can use it as follows (as root): # # ./shmsetup /etc/sysctl.conf # sysctl -p # Early FreeBSD versions do not support the sysconf interface # used here. The exact version where this works hasn't # been confirmed yet. page_size=`getconf PAGE_SIZE` phys_pages=`getconf _PHYS_PAGES` if [ -z $page_size ]; then echo Error: cannot determine page size exit 1 fi if [ -z $phys_pages ]; then echo Error: cannot determine number of memory pages exit 2 fi shmall=`expr $phys_pages / 2` shmmax=`expr $shmall \* $page_size` echo \# Maximum shared segment size in bytes echo kernel.shmmax = $shmmax echo \# Maximum number of shared memory segments in pages echo kernel.shmall = $shmall
Re: [HACKERS] improvements to pgtune
Shiv wrote: On the program I hope to learn as much about professional software engineering principles as PostgreSQL. My project is aimed towards extending and hopefully improving upon pgtune. If any of you have some ideas or thoughts to share. I am all ears!! Well, first step on the software engineering side is to get a copy of the code in a form you can modify. I'd recommend grabbing it from https://github.com/gregs1104/pgtune ; while there is a copy of the program on git.postgresql.org, it's easier to work with the one on github instead. I can push updates over to the copy on postgresql.org easily enough, and that way you don't have to worry about getting an account on that server. There's a long list of suggested improvements to make at https://github.com/gregs1104/pgtune/blob/master/TODO Where I would recommend getting started is doing some of the small items on there, some of which I have already put comments into the code about but just not finished yet. Some examples: -Validate against min/max -Show original value in output -Limit shared memory use on Windows (see notes on shared_buffers at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for more information) -Look for postgresql.conf file using PGDATA environment variable -Look for settings files based on path of the pgtune executable -Save a settings reference files for newer versions of PostgreSQL (right now I only target 8.4) and allow passing in the version you're configuring. A common mistake made by GSOC students is to dive right in to trying to make big changes. You'll be more successful if you get practice at things like preparing and sharing patches on smaller changes first. At the next level, there are a few larger features that I would consider valuable that are not really addressed by the program yet: -Estimate how much shared memory is used by the combination of settings. See Table 17-2 at http://www.postgresql.org/docs/9.0/static/kernel-resources.html ; those numbers aren't perfect, and improving that table is its own useful project. But it gives an idea how they fit together. I have some notes at the end of the TODO file on how I think the information needed to produce this needs to be passed around the inside of pgtune. -Use that estimate to produce a sysctl.conf file for one platform; Linux is the easiest one to start with. I've attached a prototype showing how to do that, written in bash. -Write a Python-TK or web-based front-end for the program. Now that I know someone is going to work on this program again, I'll see what I can do to clean some parts of it up. There are a couple of things it's easier for me to just fix rather than to describe, like the way I really want to change how it adds comments to the settings it changes. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us #!/bin/bash # Output lines suitable for sysctl configuration based # on total amount of RAM on the system. The output # will allow up to 50% of physical memory to be allocated # into shared memory. # On Linux, you can use it as follows (as root): # # ./shmsetup /etc/sysctl.conf # sysctl -p # Early FreeBSD versions do not support the sysconf interface # used here. The exact version where this works hasn't # been confirmed yet. page_size=`getconf PAGE_SIZE` phys_pages=`getconf _PHYS_PAGES` if [ -z $page_size ]; then echo Error: cannot determine page size exit 1 fi if [ -z $phys_pages ]; then echo Error: cannot determine number of memory pages exit 2 fi shmall=`expr $phys_pages / 2` shmmax=`expr $shmall \* $page_size` echo \# Maximum shared segment size in bytes echo kernel.shmmax = $shmmax echo \# Maximum number of shared memory segments in pages echo kernel.shmall = $shmall -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improvements to pgtune
Daniel Farina wrote: It seems like in general it lacks a feedback mechanism to figure things out settings from workloads, instead relying on Greg Smith's sizable experience to do some arithmetic and get you off the ground in a number of common cases. To credit appropriately, the model used right now actually originated with a Josh Berkus spreadsheet, from before I was doing this sort of work full-time. That's held up pretty well, but it doesn't fully reflect how I do things nowadays. The recent realization that pgtune is actually shipping as a package for Debian/Ubuntu now has made realize this is a much higher profile project now, one that I should revisit doing a better job on. Every time I've gotten pulled into discussions of setting parameters based on live monitoring, it's turned into a giant black hole--absorbs a lot of energy, nothing useful escapes from it. I credit completely ignoring that idea altogether, and using the simplest possible static settings instead, as one reason I managed to ship code here that people find useful. I'm not closed to the idea, just not optimistic it will lead anywhere useful. That makes it hard to work on when there are so many obvious things guaranteed to improve the program that could be done instead. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] improvements to pgtune
Every time I've gotten pulled into discussions of setting parameters based on live monitoring, it's turned into a giant black hole--absorbs a lot of energy, nothing useful escapes from it. I credit completely ignoring that idea altogether, and using the simplest possible static settings instead, as one reason I managed to ship code here that people find useful. I'm not closed to the idea, just not optimistic it will lead anywhere useful. That makes it hard to work on when there are so many obvious things guaranteed to improve the program that could be done instead. What would you list as the main things pgtune doesn't cover right now? I have my own list, but I suspect that yours is somewhat different. I do think that autotuning based on interrogating the database is possible. However, I think the way to make it not be a tar baby is to tackle it one setting at a time, and start with ones we have the most information for. One of the real challenges there is that some data can be gleaned from pg_* views, but a *lot* of useful performance data only shows up in the activity log, and then only if certain settings are enabled. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com San Francisco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers