[PERFORM] Tuning

2007-01-26 Thread John Parnefjord

Hi!
 
I'm planning to move from MySQL to PostgreSQL as I believe the latter
performs better when it comes to complex queries. The MySQL database
that I'm running is about 150 GB in size, with 300 million rows in the
largest table. We do quite a lot of statistical analysis on the data,
which means heavy queries that run for days. Now that I've got two new
servers with 32 GB of RAM I'm eager to switch to PostgreSQL to improve
performance. One database is to be an analysis server and the other an
OLTP server feeding a web site with pages.
 
I'm settling on PostgreSQL 8.1 as it is available as a package in Debian
Etch AMD64.
 
As I'm new to PostgreSQL I've googled for tips and found some
interesting links on how to configure and tune the database manager. Among
others I've found the PowerPostgresql pages with a performance checklist
and an annotated guide to postgresql.conf
[http://www.powerpostgresql.com/]. And of course the PostgreSQL site
itself is a good place to start. RevSys has a short guide as well
[http://www.revsys.com/writings/postgresql-performance.html]
 
I just wonder if someone on this list has some tips from the real world
on how to tune PostgreSQL, and what is to be avoided. AFAIK the following
parameters seem important to adjust to start with:

-work_mem
-maintenance_work_mem - 50% of the largest table?
-shared_buffers - max value 5
-effective_cache_size - max 2/3 of available RAM, i.e. 24 GB on the
hardware described above
-shmmax - how large dare I set this value on dedicated Postgres servers?
-checkpoint_segments - this is crucial as one of the servers is
transaction heavy
-vacuum_cost_delay

Of course some values can only be estimated after the database has been
fed data and queries have been run in a production-like manner.
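For concreteness, a sketch of what such a starting postgresql.conf might look like on the 32 GB box. These are illustrative 8.1-era values only; every number is an assumption to be validated under the real workload (note that 8.1 takes shared_buffers and effective_cache_size in 8 kB pages and work_mem in kB, not in "8GB"-style units):

```
# postgresql.conf -- illustrative starting values for a 32 GB, 8.1-era server
shared_buffers = 1048576        # 8 kB buffers; 1048576 * 8 kB = 8 GB (1/4 of RAM)
work_mem = 65536                # kB; 64 MB per sort/hash, multiplied by concurrency
maintenance_work_mem = 1048576  # kB; 1 GB for VACUUM / CREATE INDEX
effective_cache_size = 3145728  # 8 kB pages; ~24 GB -- planner hint, allocates nothing
checkpoint_segments = 32        # more WAL headroom for the transaction-heavy server
```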

Cheers
// John

Ps. I sent to the list before but the messages were withheld as I'm not a
member of any of the restrict_post groups. This is perhaps due to the
fact that we changed email addresses a few weeks ago and there was a
mismatch between addresses. So I apologize if any similar messages show
up from me; just ignore them.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Tuning

2007-01-26 Thread Dave Cramer


On 26-Jan-07, at 6:28 AM, John Parnefjord wrote:



Hi!

I'm planning to move from MySQL to PostgreSQL as I believe the latter
performs better when it comes to complex queries. The MySQL database
that I'm running is about 150 GB in size, with 300 million rows in the
largest table. We do quite a lot of statistical analysis on the data,
which means heavy queries that run for days. Now that I've got two new
servers with 32 GB of RAM I'm eager to switch to PostgreSQL to improve
performance. One database is to be an analysis server and the other an
OLTP server feeding a web site with pages.

I'm settling on PostgreSQL 8.1 as it is available as a package in Debian
Etch AMD64.

As I'm new to PostgreSQL I've googled for tips and found some
interesting links on how to configure and tune the database manager. Among
others I've found the PowerPostgresql pages with a performance checklist
and an annotated guide to postgresql.conf
[http://www.powerpostgresql.com/]. And of course the PostgreSQL site
itself is a good place to start. RevSys has a short guide as well
[http://www.revsys.com/writings/postgresql-performance.html]

I just wonder if someone on this list has some tips from the real world
on how to tune PostgreSQL, and what is to be avoided. AFAIK the following
parameters seem important to adjust to start with:

-work_mem
-maintenance_work_mem - 50% of the largest table?

Isn't it possible for this to be larger than memory?

-shared_buffers - max value 5

Where does this shared_buffers maximum come from? It's wrong; it should
be about 1/4 of available memory (8 GB here) to start, and tuned from there.



-effective_cache_size - max 2/3 of available RAM, i.e. 24 GB on the
hardware described above
-shmmax - how large dare I set this value on dedicated Postgres servers?

As big as required by the shared_buffers setting above.
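As a sketch, assuming an 8 GB shared_buffers target (the exact numbers are assumptions; shmmax only needs to exceed the shared memory segment PostgreSQL will request, which is shared_buffers plus some overhead):

```
# /etc/sysctl.conf -- allow a shared memory segment large enough for shared_buffers
kernel.shmmax = 9663676416   # bytes; 9 GB = 8 GB shared_buffers + overhead
kernel.shmall = 2359296      # in 4 kB pages; shmmax / 4096
# apply without a reboot: sysctl -p
```

On a dedicated database server there is little harm in setting shmmax generously; it is an upper limit, not an allocation.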

-checkpoint_segments - this is crucial as one of the servers is
transaction heavy
-vacuum_cost_delay

Of course some values can only be estimated after the database has been
fed data and queries have been run in a production-like manner.

Cheers
// John

Ps. I sent to the list before but the messages were withheld as I'm not a
member of any of the restrict_post groups. This is perhaps due to the
fact that we changed email addresses a few weeks ago and there was a
mismatch between addresses. So I apologize if any similar messages show
up from me; just ignore them.





---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Tuning

2007-01-26 Thread Anton Rommerskirchen
Hello !

On Friday, 26 January 2007 at 12:28, John Parnefjord wrote:
 Hi!

 I'm planning to move from MySQL to PostgreSQL as I believe the latter
 performs better when it comes to complex queries. The MySQL database
 that I'm running is about 150 GB in size, with 300 million rows in the
 largest table. We do quite a lot of statistical analysis on the data,
 which means heavy queries that run for days. Now that I've got two new
 servers with 32 GB of RAM I'm eager to switch to PostgreSQL to improve
 performance. One database is to be an analysis server and the other an
 OLTP server feeding a web site with pages.

 I'm settling on PostgreSQL 8.1 as it is available as a package in Debian
 Etch AMD64.

 As I'm new to PostgreSQL I've googled for tips and found some
 interesting links on how to configure and tune the database manager. Among
 others I've found the PowerPostgresql pages with a performance checklist
 and an annotated guide to postgresql.conf
 [http://www.powerpostgresql.com/]. And of course the PostgreSQL site
 itself is a good place to start. RevSys has a short guide as well
 [http://www.revsys.com/writings/postgresql-performance.html]

 I just wonder if someone on this list has some tips from the real world
 on how to tune PostgreSQL, and what is to be avoided. AFAIK the following
 parameters seem important to adjust to start with:

 -work_mem
 -maintenance_work_mem - 50% of the largest table?
 -shared_buffers - max value 5
 -effective_cache_size - max 2/3 of available RAM, i.e. 24 GB on the

Do you use an Opteron with a NUMA architecture?

You could end up switching pages between your memory nodes, which heavily
slowed down my server (Tyan 2895, 2 x Opteron 275, 8 GB)...

Try using only one NUMA node for your cache first.

 hardware described above
 -shmmax - how large dare I set this value on dedicated postgres servers?
 -checkpoint_segments - this is crucial as one of the servers is
 transaction heavy
 -vacuum_cost_delay

 Of course some values can only be estimated after the database has been
 fed data and queries have been run in a production-like manner.

 Cheers
 // John

 Ps. I sent to the list before but the messages were withheld as I'm not a
 member of any of the restrict_post groups. This is perhaps due to the
 fact that we changed email addresses a few weeks ago and there was a
 mismatch between addresses. So I apologize if any similar messages show
 up from me; just ignore them.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] how to plan for vacuum?

2007-01-26 Thread Jim Nasby

On Jan 25, 2007, at 10:33 AM, Ray Stell wrote:

On Thu, Jan 25, 2007 at 08:04:49AM -0800, Joshua D. Drake wrote:


It really depends on the system. Most of our systems run anywhere from
10-25 ms. I find that with any more than that, vacuum takes too long.



How do you measure the impact of setting it to 12 as opposed to 15?


If you've got a tool that will report disk utilization as a percentage
it's very easy; I'll decrease the setting until I'm at about 90%
utilization with the system's normal workload (leaving some room for
spikes, etc.). Sometimes I'll also tune the costs if reads vs. writes
are a concern.
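The knobs involved look like this in postgresql.conf (the delay value is illustrative; the "costs" are the per-page weights vacuum accumulates before sleeping, shown here at their defaults; the 90% disk-utilization target above is measured externally, e.g. with iostat -x, not by PostgreSQL itself):

```
# postgresql.conf -- cost-based vacuum delay
vacuum_cost_delay = 15       # ms to sleep once the cost limit is hit; 0 disables throttling
vacuum_cost_limit = 200      # accumulated cost that triggers a sleep (default)
vacuum_cost_page_hit = 1     # page found in shared_buffers (default)
vacuum_cost_page_miss = 10   # page read from disk (default)
vacuum_cost_page_dirty = 20  # page dirtied, i.e. write pressure (default)
```

Raising the page-dirty cost relative to the others is one way to weight writes more heavily when they are the concern.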

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] Seqscan/Indexscan still a known issue?

2007-01-26 Thread Carlos Moreno


Hi,

I find various references on the list to this issue of queries
being too slow because the planner miscalculates things and
decides to go for a sequential scan when an index is available
and would lead to better performance.

Is this still an issue with the latest version?  I'm doing some
tests right now, but I have version 7.4 (and I'm not sure when I
will be able to spend the effort to move our system to 8.2).

When I force it via set enable_seqscan to off, the index scan
takes about 0.1 msec (as reported by explain analyze), whereas
with the default, it chooses a seq. scan, for a total execution
time around 10 msec!! (yes: 100 times slower!). The table has
20 thousand records, and the WHERE part of the query uses one
field that is part of the primary key (as in, the primary key
is the combination of field1,field2, and the query involves a
where field1=1 and some_other_field=2). I don't think I'm doing
something wrong, and I find no reason not to expect the query
planner to choose an index scan.

For the time being, I'm using an explicit enable_seqscan off
in the client code before executing the select. But I wonder:
is this still an issue, or has it been solved in the latest
version?

Thanks,

Carlos
--


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Seqscan/Indexscan still a known issue?

2007-01-26 Thread Dennis Bjorklund

Carlos Moreno skrev:


When I force it via  set enable_seqscan to off, the index scan
takes about 0.1 msec  (as reported by explain analyze), whereas



For the time being, I'm using an explicit enable_seqscan off
in the client code, before executing the select.  But I wonder:
Is this still an issue, or has it been solved in the latest
version?


For most queries it has never been an issue. Every once in a while there
is a query that the planner makes a non-optimal plan for, but it's not
that common.


In general the optimizer has improved with every new version of pg.

Almost everyone I've talked to that has upgraded has got a faster
database than before. It was like that for 7.4 to 8.0, for 8.0 to 8.1,
and for 8.1 to 8.2. So in your case going from 7.4 to 8.2 is most likely
going to give a speedup (especially if you have some queries that aren't
just simple primary key lookups).


In your case it's hard to give any advice since you didn't share the
EXPLAIN ANALYZE output with us. I'm pretty sure it's possible to tune pg
so it makes the right choice even for this query of yours, but without
the EXPLAIN ANALYZE output we would just be guessing anyway. If you want
to share it then it might be helpful to show the plan both with and
without seqscan enabled.
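A sketch of what to capture and post (the table and column names here are hypothetical, following the ones in the original mail):

```sql
-- default planner choice
EXPLAIN ANALYZE SELECT * FROM t WHERE field1 = 1 AND some_other_field = 2;

-- forced index scan, for comparison (session-local setting)
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM t WHERE field1 = 1 AND some_other_field = 2;
```

The interesting part is usually the estimated vs. actual row counts in the two plans.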


How often do you run VACUUM ANALYZE; on the database?

/Dennis

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Seqscan/Indexscan still a known issue?

2007-01-26 Thread Russell Smith

Carlos Moreno wrote:


Hi,

I find various references in the list to this issue of queries
being too slow because the planner miscalculates things and
decides to go for a sequential scan when an index is available
and would lead to better performance.

Is this still an issue with the latest version?   I'm doing some
tests right now, but I have version 7.4  (and not sure when I will
be able to spend the effort to move our system to 8.2).

When I force it via  set enable_seqscan to off, the index scan
takes about 0.1 msec  (as reported by explain analyze), whereas
with the default, it chooses a seq. scan, for a total execution
time around 10 msec!!  (yes: 100 times slower!).  The table has
20 thousand records, and the WHERE part of the query uses one
field that is part of the primary key  (as in, the primary key
is the combination of field1,field2, and the query involves a
where field1=1 and some_other_field=2).  I don't think I'm doing
something wrong, and I find no reason not to expect the query
planner to choose an index scan.

For the time being, I'm using an explicit enable_seqscan off
in the client code, before executing the select.  But I wonder:
Is this still an issue, or has it been solved in the latest
version?
Please supply EXPLAIN ANALYZE output for the query with both the index
scan and the sequential scan. We may be able to tell you why it's choosing
the wrong option. Guess 1 would be that your primary key is int8, but I
can't be certain that is what's causing the problem.
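If that guess is right, a sketch of the classic pre-8.0 workaround (hypothetical names; on 7.4 the planner does not match a bare integer literal, which is int4, against an index on an int8 column):

```sql
-- On 7.4, an int8 index may be ignored for a bare integer literal:
SELECT * FROM t WHERE field1 = 1;          -- literal is int4; may seqscan
-- Casting or quoting the literal lets the planner use the index:
SELECT * FROM t WHERE field1 = 1::int8;
SELECT * FROM t WHERE field1 = '1';
```

From 8.0 onward the planner handles such cross-type comparisons itself, which is one more argument for the upgrade.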


Regards

Russell Smith


Thanks,

Carlos



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq