[PERFORM] Tuning
Hi! I'm planning to move from mysql to postgresql as I believe the latter performs better when it comes to complex queries. The mysql database that I'm running is about 150 GB in size, with 300 million rows in the largest table. We do quite a lot of statistical analysis on the data which means heavy queries that run for days. Now that I've got two new servers with 32GB of ram I'm eager to switch to postgresql to improve perfomance. One database is to be an analysis server and the other an OLTP server feeding a web site with pages. I'm setting for Postgresql 8.1 as it is available as a package in Debian Etch AMD64. As I'm new to postgresql I've googled to find some tips and found some interesting links how configure and tune the database manager. Among others I've found the PowerPostgresql pages with a performance checklist and annotated guide to postgresql.conf [http://www.powerpostgresql.com/]. And of course the postgresql site itself is a good way to start. RevSys have a short guide as well [http://www.revsys.com/writings/postgresql-performance.html] I just wonder if someone on this list have some tips from the real world how to tune postgresql and what is to be avoided. AFAIK the following parameters seems important to adjust to start with are: -work_mem -maintenance_work_mem - 50% of the largest table? -shared_buffers - max value 5 -effective_cache_size - max 2/3 of available ram, ie 24GB on the hardware described above -shmmax - how large dare I set this value on dedicated postgres servers? -checkpoint_segments - this is crucial as one of the server is transaction heavy -vacuum_cost_delay Of course some values can only be estimated after database has been feed data and queries have been run in a production like manner. Cheers // John Ps. I sent to list before but the messages where withheld as I'm not a member of any of the restrict_post groups. This is perhaps due to the fact that we have changed email address a few weeks ago and there was a mismatch between addresses. So I apologize if any similar messages show up from me, just ignore them. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Tuning
On 26-Jan-07, at 6:28 AM, John Parnefjord wrote: Hi! I'm planning to move from mysql to postgresql as I believe the latter performs better when it comes to complex queries. The mysql database that I'm running is about 150 GB in size, with 300 million rows in the largest table. We do quite a lot of statistical analysis on the data which means heavy queries that run for days. Now that I've got two new servers with 32GB of ram I'm eager to switch to postgresql to improve perfomance. One database is to be an analysis server and the other an OLTP server feeding a web site with pages. I'm setting for Postgresql 8.1 as it is available as a package in Debian Etch AMD64. As I'm new to postgresql I've googled to find some tips and found some interesting links how configure and tune the database manager. Among others I've found the PowerPostgresql pages with a performance checklist and annotated guide to postgresql.conf [http://www.powerpostgresql.com/]. And of course the postgresql site itself is a good way to start. RevSys have a short guide as well [http://www.revsys.com/writings/postgresql-performance.html] I just wonder if someone on this list have some tips from the real world how to tune postgresql and what is to be avoided. AFAIK the following parameters seems important to adjust to start with are: -work_mem -maintenance_work_mem - 50% of the largest table? Isn't it possible for this to be larger than memory ? -shared_buffers - max value 5 Where does this shared buffers maximum come from ? It's wrong it should be 1/4 of available memory (8G) to start and tuned from there -effective_cache_size - max 2/3 of available ram, ie 24GB on the hardware described above -shmmax - how large dare I set this value on dedicated postgres servers? as big as required by shared buffer setting above -checkpoint_segments - this is crucial as one of the server is transaction heavy -vacuum_cost_delay Of course some values can only be estimated after database has been feed data and queries have been run in a production like manner. Cheers // John Ps. I sent to list before but the messages where withheld as I'm not a member of any of the restrict_post groups. This is perhaps due to the fact that we have changed email address a few weeks ago and there was a mismatch between addresses. So I apologize if any similar messages show up from me, just ignore them. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Tuning
Hello ! Am Freitag 26 Januar 2007 12:28 schrieb John Parnefjord: Hi! I'm planning to move from mysql to postgresql as I believe the latter performs better when it comes to complex queries. The mysql database that I'm running is about 150 GB in size, with 300 million rows in the largest table. We do quite a lot of statistical analysis on the data which means heavy queries that run for days. Now that I've got two new servers with 32GB of ram I'm eager to switch to postgresql to improve perfomance. One database is to be an analysis server and the other an OLTP server feeding a web site with pages. I'm setting for Postgresql 8.1 as it is available as a package in Debian Etch AMD64. As I'm new to postgresql I've googled to find some tips and found some interesting links how configure and tune the database manager. Among others I've found the PowerPostgresql pages with a performance checklist and annotated guide to postgresql.conf [http://www.powerpostgresql.com/]. And of course the postgresql site itself is a good way to start. RevSys have a short guide as well [http://www.revsys.com/writings/postgresql-performance.html] I just wonder if someone on this list have some tips from the real world how to tune postgresql and what is to be avoided. AFAIK the following parameters seems important to adjust to start with are: -work_mem -maintenance_work_mem - 50% of the largest table? -shared_buffers - max value 5 -effective_cache_size - max 2/3 of available ram, ie 24GB on the Do you use a Opteron with a NUMA architecture ? You could end up with switching pages between your memory nodes, which slowed down heavily my server (Tyan 2895, 2 x 275 cpu, 8 GB)... Try first to use only one numa node for your cache. hardware described above -shmmax - how large dare I set this value on dedicated postgres servers? -checkpoint_segments - this is crucial as one of the server is transaction heavy -vacuum_cost_delay Of course some values can only be estimated after database has been feed data and queries have been run in a production like manner. Cheers // John Ps. I sent to list before but the messages where withheld as I'm not a member of any of the restrict_post groups. This is perhaps due to the fact that we have changed email address a few weeks ago and there was a mismatch between addresses. So I apologize if any similar messages show up from me, just ignore them. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PERFORM] how to plan for vacuum?
On Jan 25, 2007, at 10:33 AM, Ray Stell wrote: On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote: It really depends on the system. Most of our systems run anywhere from 10-25ms. I find that any more than that, Vacuum takes too long. How do you measure the impact of setting it to 12 as opposed to 15? If you've got a tool that will report disk utilization as a percentage it's very easy; I'll decrease the setting until I'm at about 90% utilization with the system's normal workload (leaving some room for spikes, etc). Sometimes I'll also tune the costs if reads vs. writes are a concern. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Seqscan/Indexscan still a known issue?
Hi, I find various references in the list to this issue of queries being too slow because the planner miscalculates things and decides to go for a sequenctial scan when an index is available and would lead to better performance. Is this still an issue with the latest version? I'm doing some tests right now, but I have version 7.4 (and not sure when I will be able to spend the effort to move our system to 8.2). When I force it via set enable_seqscan to off, the index scan takes about 0.1 msec (as reported by explain analyze), whereas with the default, it chooses a seq. scan, for a total execution time around 10 msec!! (yes: 100 times slower!). The table has 20 thousand records, and the WHERE part of the query uses one field that is part of the primary key (as in, the primary key is the combination of field1,field2, and the query involves a where field1=1 and some_other_field=2). I don't think I'm doing something wrong, and I find no reason not to expect the query planner to choose an index scan. For the time being, I'm using an explicit enable_seqscan off in the client code, before executing the select. But I wonder: Is this still an issue, or has it been solved in the latest version? Thanks, Carlos -- ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Seqscan/Indexscan still a known issue?
Carlos Moreno skrev: When I force it via set enable_seqscan to off, the index scan takes about 0.1 msec (as reported by explain analyze), whereas For the time being, I'm using an explicit enable_seqscan off in the client code, before executing the select. But I wonder: Is this still an issue, or has it been solved in the latest version? For most queries it has never been an issue. Every once in a while there is a query that the planner makes a non-optimal plan for, but it's not that common. In general the optimizer has improved with every new version of pg. Almost everyone I've talked to that has upgraded has got a faster database tham before. It was like that for 7.4-8.0, for 8.0-8.1 and for 8.1-8.2. So in your case going from 7.4-8.2 is most likely going to give a speedup (especially if you have some queries that isn't just simple primary key lookups). In your case it's hard to give any advice since you didn't share the EXPLAIN ANALYZE output with us. I'm pretty sure it's possible to tune pg so it makes the right choice even for this query of yours but without the EXPLAIN ANALYZE output we would just be guessing anyway. If you want to share it then it might be helpful to show the plan both with and without seqscan enabled. How often do you run VACUUM ANALYZE; on the database? /Dennis ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Seqscan/Indexscan still a known issue?
Carlos Moreno wrote: Hi, I find various references in the list to this issue of queries being too slow because the planner miscalculates things and decides to go for a sequenctial scan when an index is available and would lead to better performance. Is this still an issue with the latest version? I'm doing some tests right now, but I have version 7.4 (and not sure when I will be able to spend the effort to move our system to 8.2). When I force it via set enable_seqscan to off, the index scan takes about 0.1 msec (as reported by explain analyze), whereas with the default, it chooses a seq. scan, for a total execution time around 10 msec!! (yes: 100 times slower!). The table has 20 thousand records, and the WHERE part of the query uses one field that is part of the primary key (as in, the primary key is the combination of field1,field2, and the query involves a where field1=1 and some_other_field=2). I don't think I'm doing something wrong, and I find no reason not to expect the query planner to choose an index scan. For the time being, I'm using an explicit enable_seqscan off in the client code, before executing the select. But I wonder: Is this still an issue, or has it been solved in the latest version? Please supply explain analyze for the query in both the index and sequence scan operation. We may be able to tell you why it's choosing the wrong options. Guess 1 would be that your primary key is int8, but can't be certain that is what's causing the problem. Regards Russell Smith Thanks, Carlos ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq