Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Geoffrey

Guillaume Smet wrote:

On 2/23/07, Geoffrey <[EMAIL PROTECTED]> wrote:

As I've heard.  We're headed for 8 as soon as possible, but until we get
our code ready, we're on 7.4.16.


You should move to at least 8.1 and possibly 8.2. It's not a good idea
to upgrade only to 8.0 IMHO.


When I said 8, I meant whatever the latest greatest 8 is.  Right now, 
that looks like 8.2.3.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



Re: [PERFORM] pg_trgm performance

2007-02-23 Thread Guillaume Smet

Hi Steinar,

On 2/24/07, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:

I'm sorry, I can no longer remember where I needed pg_trgm. Simple testing of
your patch seems to indicate that the GIN version is about 65% _slower_ (18ms
vs. 30ms) for a test data set I found lying around, but I remember that on
the data set I needed it, the GiST version was a lot slower than that (think
300-400ms). The 18 vs. 30ms test is a random Amarok database, on 8.2.3
(Debian).


Could you post EXPLAIN ANALYZE output for both queries (after 2 or 3
runs)? And if you can provide EXPLAIN ANALYZE for a couple of searches
(short, medium and long patterns) in both cases, that would be nice
too.

The GIN version is currently not as selective as the GiST one. It
generally finds the matching rows faster, but it has a more expensive
recheck condition, so it's sometimes a win (as in my case) and
sometimes not (which seems to be your case).

Thanks.

--
Guillaume



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Magnus Hagander
Alvaro Herrera wrote:
> Steinar H. Gunderson wrote:
>> On Fri, Feb 23, 2007 at 02:05:57PM -0500, Geoffrey wrote:
>>> In searching the archives, I can't find any specific info identifying 
>>> which Xeon processors don't have this problem.
>> AFAIK the cut-off point is at the Woodcrests. They are overall much better
>> suited to PostgreSQL than the older Xeons were.
>>
>> It's slightly unfortunate that AMD and Intel cling to the Opteron and Xeon
>> names even though they're making significant architecture changes, but that's
>> life, I guess.
> 
> AFAIR Intel has been calling their server processors Xeon since Pentium
> Pro's, at least.
> 
Almost. Xeon was the new name for the "Pro" series. Instead of Pentium
II Pro, we got Pentium II Xeon. The whole Pentium Pro line was a server
line, which is why initial Pentium II CPUs were significantly slower for
server apps than the much older Pentium Pro (which still runs pg at a
reasonable speed if you have enough of them and a low budget, btw).

//Magnus



Re: [PERFORM] pg_trgm performance

2007-02-23 Thread Steinar H. Gunderson
On Sat, Feb 24, 2007 at 12:09:41AM +0100, Guillaume Smet wrote:
> Could you try to see if the GIN implementation of pg_trgm is faster in
> your cases?

I'm sorry, I can no longer remember where I needed pg_trgm. Simple testing of
your patch seems to indicate that the GIN version is about 65% _slower_ (18ms
vs. 30ms) for a test data set I found lying around, but I remember that on
the data set I needed it, the GiST version was a lot slower than that (think
300-400ms). The 18 vs. 30ms test is a random Amarok database, on 8.2.3
(Debian).

Sorry I couldn't be of more help.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] pg_trgm performance

2007-02-23 Thread Guillaume Smet

Florian, Steinar,

Could you try to see if the GIN implementation of pg_trgm is faster in
your cases?

Florian, instead of using WHERE similarity(...) > 0.4, you should use
set_limit (SELECT set_limit(0.4);).

I posted it on -patches and it is available here:
http://people.openwide.fr/~gsmet/postgresql/pg_trgm_gin3.diff .

The patch is against HEAD but it applies to 8.2 without any problem.

After applying this patch and installing pg_trgm.sql (you should
uninstall pg_trgm with the old uninstall script before compiling),
you can create:
CREATE INDEX idx_table_word ON table USING gin(word gin_trgm_ops);
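
A query able to use that index might then look like this (a sketch,
reusing the placeholder names above; set_limit() adjusts the threshold
that the % operator applies):

SELECT set_limit(0.4);
SELECT word FROM table WHERE word % 'your search term';
-- % is pg_trgm's trigram similarity operator;
-- show_limit() reports the threshold currently in effect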

17 characters is quite long, so I'm not sure it will help you: the GIN
implementation usually has to recheck a high number of rows. But I'd
like to know whether it's faster or slower in this case.

If your data are not private and you don't have the time to test it, I
can test it here without any problem.

Thanks.

--
Guillaume



Re: [PERFORM] long checkpoint_timeout

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 12:23:08PM -0800, Jeff Davis wrote:
> On Fri, 2007-02-23 at 14:02 -0600, Jim C. Nasby wrote:
> > > say that checkpoints cause extra disk I/O. Is there a good way to
> > > measure how much extra I/O (and WAL volume) is caused by the
> > > checkpoints? Also, it would be good to know how much total I/O is caused
> > > by a checkpoint so that I know if bgwriter is doing its job.
> > 
> > There's a patch someone just came up with that provides additional debug
> > info about both bgwriter operation and checkpoints. I know it will at
> > least tell you how much was written out by a checkpoint.
> 
> Excellent, that would answer a lot of my questions. I did some brief
> searching and nothing turned up. Do you have a link to the discussion or
> the patch?

http://archives.postgresql.org/pgsql-hackers/2007-02/msg01083.php
-- 
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Guillaume Smet

On 2/23/07, Geoffrey <[EMAIL PROTECTED]> wrote:

As I've heard.  We're headed for 8 as soon as possible, but until we get
our code ready, we're on 7.4.16.


You should move to at least 8.1 and possibly 8.2. It's not a good idea
to upgrade only to 8.0 IMHO.

--
Guillaume



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Guillaume Smet

On 2/23/07, Joshua D. Drake <[EMAIL PROTECTED]> wrote:

Also isn't it pretty much *not* a problem with current versions of
PostgreSQL?


We had a really *big* scalability problem with a quad 2.2 GHz Xeon MP
and PostgreSQL 7.4. The problem is mostly gone since we upgraded to 8.1
a year ago.

Woodcrest seems to perform really well with PostgreSQL according to
what I can read on the Internet, so we will probably replace the server
with a dual Woodcrest in a few months.

--
Guillaume



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Geoffrey

Joshua D. Drake wrote:

Josh Berkus wrote:

Geoffrey,


I recall a reference on the list indicating that newer Xeon processors
don't suffer from the context switching problem reported last year.
Just to be clear, it's a software problem which affects all architectures, 
including AMD and Sparc.  It's just *worse* on the PIII and P4 generation 
Xeons.




Also isn't it pretty much *not* a problem with current versions of
PostgreSQL?


As I've heard.  We're headed for 8 as soon as possible, but until we get 
our code ready, we're on 7.4.16.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Joshua D. Drake
Josh Berkus wrote:
> Geoffrey,
> 
>> I recall a reference on the list indicating that newer Xeon processors
>> don't suffer from the context switching problem reported last year.
> 
> Just to be clear, it's a software problem which affects all architectures, 
> including AMD and Sparc.  It's just *worse* on the PIII and P4 generation 
> Xeons.
> 

Also isn't it pretty much *not* a problem with current versions of
PostgreSQL?

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Geoffrey

Josh Berkus wrote:

Geoffrey,


I recall a reference on the list indicating that newer Xeon processors
don't suffer from the context switching problem reported last year.


Just to be clear, it's a software problem which affects all architectures, 
including AMD and Sparc.  It's just *worse* on the PIII and P4 generation 
Xeons.


Thanks, that's what I need to hear.  They've since cut a deal for 
Opteron-based hardware, so the point is now moot.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Josh Berkus
Geoffrey,

> I recall a reference on the list indicating that newer Xeon processors
> don't suffer from the context switching problem reported last year.

Just to be clear, it's a software problem which affects all architectures, 
including AMD and Sparc.  It's just *worse* on the PIII and P4 generation 
Xeons.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [PERFORM] long checkpoint_timeout

2007-02-23 Thread Jeff Davis
On Fri, 2007-02-23 at 14:02 -0600, Jim C. Nasby wrote:
> > say that checkpoints cause extra disk I/O. Is there a good way to
> > measure how much extra I/O (and WAL volume) is caused by the
> > checkpoints? Also, it would be good to know how much total I/O is caused
> > by a checkpoint so that I know if bgwriter is doing its job.
> 
> There's a patch someone just came up with that provides additional debug
> info about both bgwriter operation and checkpoints. I know it will at
> least tell you how much was written out by a checkpoint.

Excellent, that would answer a lot of my questions. I did some brief
searching and nothing turned up. Do you have a link to the discussion or
the patch?

Regards,
Jeff Davis




Re: [PERFORM] long checkpoint_timeout

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 10:14:29AM -0800, Jeff Davis wrote:
> The postgresql.conf says that the maximum checkpoint_timeout is 1 hour.
> However, the following messages seem to suggest that it may be useful to
> set the value significantly higher to reduce unnecessary WAL volume:
> 
> http://archives.postgresql.org/pgsql-hackers/2006-10/msg00527.php
> http://archives.postgresql.org/pgsql-hackers/2006-08/msg01190.php
> 
> Is there a reason for the hour-long limit on checkpoint_timeout? Is
> there a cost to a longer timeout, aside from potentially longer recovery time?
> 
> As I understand it, the background writer keeps the I/O more balanced
> anyway, avoiding I/O spikes at checkpoint. 
> 
> I don't need the checkpoint time to be higher than 1 hour, but I'm
> trying to understand the reasoning behind the limit and the implications
> of a longer checkpoint_timeout.
> 
> The docs here:
> 
> http://www.postgresql.org/docs/current/static/wal-configuration.html
> 
> say that checkpoints cause extra disk I/O. Is there a good way to
> measure how much extra I/O (and WAL volume) is caused by the
> checkpoints? Also, it would be good to know how much total I/O is caused
> by a checkpoint so that I know if bgwriter is doing its job.

There's a patch someone just came up with that provides additional debug
info about both bgwriter operation and checkpoints. I know it will at
least tell you how much was written out by a checkpoint.
-- 
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Steinar H. Gunderson
On Fri, Feb 23, 2007 at 04:53:18PM -0300, Alvaro Herrera wrote:
>> It's slightly unfortunate that AMD and Intel cling to the Opteron and Xeon
>> names even though they're making significant architecture changes, but that's
>> life, I guess.
> AFAIR Intel has been calling their server processors Xeon since Pentium
> Pro's, at least.

Yes, that was sort of my point. :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Alvaro Herrera
Steinar H. Gunderson wrote:
> On Fri, Feb 23, 2007 at 02:05:57PM -0500, Geoffrey wrote:
> > In searching the archives, I can't find any specific info identifying 
> > which Xeon processors don't have this problem.
> 
> AFAIK the cut-off point is at the Woodcrests. They are overall much better
> suited to PostgreSQL than the older Xeons were.
> 
> It's slightly unfortunate that AMD and Intel cling to the Opteron and Xeon
> names even though they're making significant architecture changes, but that's
> life, I guess.

AFAIR Intel has been calling their server processors Xeon since Pentium
Pro's, at least.

-- 
Alvaro Herrera   http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Steinar H. Gunderson
On Fri, Feb 23, 2007 at 02:05:57PM -0500, Geoffrey wrote:
> In searching the archives, I can't find any specific info identifying 
> which Xeon processors don't have this problem.

AFAIK the cut-off point is at the Woodcrests. They are overall much better
suited to PostgreSQL than the older Xeons were.

It's slightly unfortunate that AMD and Intel cling to the Opteron and Xeon
names even though they're making significant architecture changes, but that's
life, I guess.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Claus Guttesen

I recall a reference on the list indicating that newer Xeon processors
don't suffer from the context switching problem reported last year.

In searching the archives, I can't find any specific info identifying
which Xeon processors don't have this problem.

Anyone point me to a reference?


We recently migrated to a 3 GHz Woodcrest from a 2 GHz Opteron. The
Woodcrest seems to be enjoying doing db-related work. I don't have
numbers other than that load is much lower now.


Is this in any way related to the version of PostgreSQL one is running?
We're headed for 8, but have a bit of work before we can get there.
We are currently on 7.4.16.


We are at 7.4.14 which works fine atm.

regards
Claus



[PERFORM] which Xeon processors don't have the context switching problem

2007-02-23 Thread Geoffrey
I recall a reference on the list indicating that newer Xeon processors 
don't suffer from the context switching problem reported last year.


In searching the archives, I can't find any specific info identifying 
which Xeon processors don't have this problem.


Anyone point me to a reference?

Is this in any way related to the version of PostgreSQL one is running? 
We're headed for 8, but have a bit of work before we can get there. 
We are currently on 7.4.16.


Thanks for any info.
--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



[PERFORM] long checkpoint_timeout

2007-02-23 Thread Jeff Davis
The postgresql.conf says that the maximum checkpoint_timeout is 1 hour.
However, the following messages seem to suggest that it may be useful to
set the value significantly higher to reduce unnecessary WAL volume:

http://archives.postgresql.org/pgsql-hackers/2006-10/msg00527.php
http://archives.postgresql.org/pgsql-hackers/2006-08/msg01190.php

Is there a reason for the hour-long limit on checkpoint_timeout? Is
there a cost to a longer timeout, aside from potentially longer recovery time?

As I understand it, the background writer keeps the I/O more balanced
anyway, avoiding I/O spikes at checkpoint. 

I don't need the checkpoint time to be higher than 1 hour, but I'm
trying to understand the reasoning behind the limit and the implications
of a longer checkpoint_timeout.
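
For orientation, the relevant settings look like this in postgresql.conf
(values shown are the 8.2 defaults as I understand them, not a
recommendation):

checkpoint_timeout = 300s    # allowed range is 30s .. 3600s; the 1h ceiling in question
checkpoint_segments = 3      # a checkpoint also triggers after this much WAL
checkpoint_warning = 30s     # warn if segment-driven checkpoints come this often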

The docs here:

http://www.postgresql.org/docs/current/static/wal-configuration.html

say that checkpoints cause extra disk I/O. Is there a good way to
measure how much extra I/O (and WAL volume) is caused by the
checkpoints? Also, it would be good to know how much total I/O is caused
by a checkpoint so that I know if bgwriter is doing its job.

Regards,
Jeff Davis




Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Bill Moran
In response to "Campbell, Lance" <[EMAIL PROTECTED]>:

> Richard,
> Thanks for your reply.  
> 
> You said:
> "Your operating-system should be doing the caching for you."
> 
> My understanding is that as long as Linux has memory available it will
> cache files.  Then from your comment I get the impression that since
> Linux would be caching the data files for the postgres database it would
> be redundant to have a large shared_buffers.  Did I understand you
> correctly?

Keep in mind that keeping the data in the kernel's buffer requires
Postgres to make a syscall to read a file, which the kernel then realizes
is cached in memory.  The kernel then has to make that data available
to the Postgres (userland) process.

If the data is in Postgres' buffers, Postgres can fetch it directly, thus
avoiding the overhead of the syscalls and the kernel activity.  You still
have to make sysvshm calls, though.

So, it depends on which is able to manage the memory better.  Is the
kernel so much more efficient that it makes up for the overhead of the
syscalls?  My understanding is that in recent versions of Postgres,
this is not the case, and large shared_buffers improve performance.
I've yet to do any in-depth testing on this, though.

-- 
Bill Moran
Collaborative Fusion Inc.



Re: [PERFORM] Very slow bytea data extraction

2007-02-23 Thread [EMAIL PROTECTED]
>are you getting the data from the local box or from a remote site?

Everything is on the local box.

>also, explain analyze is showing nothing slow, but you did not post the
>entire output. also, try the \timing switch in psql.

Actually a line was missing: Total runtime: 0.337 ms.

Massimo




Re: [PERFORM] Very slow bytea data extraction

2007-02-23 Thread [EMAIL PROTECTED]

>If you look at the "actual time" it's completing very quickly indeed. So
>it must be something to do with either:
>1. Fetching/formatting the data
>2. Transferring the data to the client.

I do agree.

>What happens if you only select half the rows? Does the time to run the
>select halve?

Yes, it does.
Using pgAdmin, the time to get all 36 rows is about 67500ms while it's
24235ms to get only 18 rows.

Massimo




Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Jim C. Nasby
If you're doing much updating at all you'll also want to bump up
checkpoint_segments. I like setting checkpoint_warning just a bit under
checkpoint_timeout as a way to monitor how often you're checkpointing
due to running out of segments.

With a large shared_buffers you'll likely need to make the bgwriter more
aggressive as well (increase the max_pages numbers), though how
important that is depends on how much updating you're doing. If you see
periodic spikes in IO corresponding to checkpoints, that's an indication
bgwriter isn't doing a good enough job.
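
A sketch of those suggestions in postgresql.conf (illustrative values,
not tuned recommendations):

checkpoint_segments = 32       # more WAL between checkpoints on update-heavy loads
checkpoint_warning = 290s      # just under a 300s checkpoint_timeout, so warnings
                               # flag checkpoints caused by running out of segments
bgwriter_lru_maxpages = 100    # raise the per-round write limits from their
bgwriter_all_maxpages = 100    # conservative defaults to make bgwriter more aggressive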

If everything ends up in memory, it might be good to decrease
random_page_cost to 1 or something close to it; though the database
should just rely on effective_cache_size to figure out that everything's in
memory.

If you're on pre-8.2, you'll want to cut all the autovacuum parameters
in half, if you're using it.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Using the 8.2 autovacuum values with 8.1

2007-02-23 Thread Jim C. Nasby
On Fri, Feb 23, 2007 at 10:13:31AM +0100, Csaba Nagy wrote:
> You likely don't need the nightly full vacuum run... we also do a
> nightly vacuum here beside autovacuum, but not a full one: only for tables
> which are big enough that we don't want autovacuum to touch them during
> high business time, but which change enough that we want them vacuumed
> frequently. I discover them by checking the stats, for example:

Something else I like doing is a periodic vacuumdb -av, capturing the
output. It's a good way to keep an eye on FSM utilization. Once you've
got vacuuming under control you can probably just do that once a month
or so.
-- 
Jim Nasby   [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Richard Huxton

Campbell, Lance wrote:

Richard,
Thanks for your reply.  


You said:
"Your operating-system should be doing the caching for you."

My understanding is that as long as Linux has memory available it will
cache files.  Then from your comment I get the impression that since
Linux would be caching the data files for the postgres database it would
be redundant to have a large shared_buffers.  Did I understand you
correctly?


That's right - PG works with the O.S. This means it *might* not be a big 
advantage to have a large shared_buffers.


On older versions of PG, the buffer management code also wasn't great 
with large shared_buffers values.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Campbell, Lance
Richard,
Thanks for your reply.  

You said:
"Your operating-system should be doing the caching for you."

My understanding is that as long as Linux has memory available it will
cache files.  Then from your comment I get the impression that since
Linux would be caching the data files for the postgres database it would
be redundant to have a large shared_buffers.  Did I understand you
correctly?

Thanks,



Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 23, 2007 10:29 AM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Recommended Initial Settings

Campbell, Lance wrote:
> I would like to get someone's recommendations on the best initial
> settings for a dedicated PostgreSQL server.  I do realize that there are
> a lot of factors that influence how one should configure a database.  I
> am just looking for a good starting point.  Ideally I would like the
> database to reside as much as possible in memory with no disk access.
> The current database size of my 7.x version of PostgreSQL generates a 6
> Gig file when doing a database dump.

Your operating-system should be doing the caching for you.

> Dedicated PostgreSQL 8.2 Server
> Redhat Linux 4.x AS 64 bit version (EM64T)
> 4 Intel Xeon Processors

If these are older Xeons, check the mailing list archives for "xeon 
context switch".

> 20 Gig Memory
> Current PostgreSQL database is 6 Gig file when doing a database dump

OK, so it's plausible the whole thing will fit in RAM (as a 
rule-of-thumb I assume headers, indexes etc. triple or quadruple the 
size). To know better, check the actual disk-usage of $PGDATA.

> /etc/sysctl.conf  file settings:
> 
> # 11 Gig
> 
> kernel.shmmax = 11811160064

Hmm - that's a lot of shared RAM. See shared_buffers below.

> kernel.sem = 250 32000 100 128
> 
> net.ipv4.ip_local_port_range = 1024 65000
> 
> net.core.rmem_default = 262144  
> 
> net.core.rmem_max = 262144 
> 
> net.core.wmem_default = 262144
> 
> net.core.wmem_max = 262144  

> postgresql.conf file settings (if not listed then I used the defaults):
> 
> max_connections = 300

How many connections do you expect typically/peak? It doesn't cost much 
to have max_connections set high but your workload is the most important 
thing missing from your question.

> shared_buffers = 10240MB

For 7.x that's probably way too big, but 8.x organises its buffers 
better. I'd still be tempted to start at 1 or 2GB and work up - see where 
it stops buying you an improvement.

> work_mem = 10MB

If you have large queries doing big sorts, I'd increase this. Don't 
forget it's per-sort, so if you have got about 300 connections live at 
any one time that could be 300*10MB*N if they're all doing something 
complicated. If you only have one connection live, you can increase this 
quite substantially.

> effective_cache_size = 512MB

This isn't setting PG's memory usage, it's telling PG how much data your 
operating-system is caching. Check "free" and see what it says. For you, 
I'd expect 10GB+.

> maintenance_work_mem = 100MB

This is for admin-related tasks, so you could probably increase it.

Workload workload workload - we need to know what you're doing with it. 
One connection summarising the entire database will want larger numbers 
than 100 connections running many small queries.

HTH
-- 
   Richard Huxton
   Archonet Ltd



Re: R: [PERFORM] Very slow bytea data extraction

2007-02-23 Thread Merlin Moncure

On 2/23/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Thanks for your reply,


>Is it in executing the query (what does EXPLAIN ANALYSE show)?

Here is the output of explain analyze SELECT * FROM "FILE":

"Seq Scan on "FILE"  (cost=0.00..1.36 rows=36 width=235)
(actual time=0.023..0.107 rows=36 loops=1)"

>How are you accessing the database: odbc,jdbc,other?
>Does it do this with psql too?

The problem is the same when I access the db with jdbc, pgAdmin and
even psql.


are you getting the data from the local box or from a remote site?
also, explain analyze is showing nothing slow, but you did not post the
entire output. also, try the \timing switch in psql.

merlin



Re: [PERFORM] Recommended Initial Settings

2007-02-23 Thread Richard Huxton

Campbell, Lance wrote:

I would like to get someone's recommendations on the best initial
settings for a dedicated PostgreSQL server.  I do realize that there are
a lot of factors that influence how one should configure a database.  I
am just looking for a good starting point.  Ideally I would like the
database to reside as much as possible in memory with no disk access.
The current database size of my 7.x version of PostgreSQL generates a 6
Gig file when doing a database dump.


Your operating-system should be doing the caching for you.


Dedicated PostgreSQL 8.2 Server
Redhat Linux 4.x AS 64 bit version (EM64T)
4 Intel Xeon Processors


If these are older Xeons, check the mailing list archives for "xeon 
context switch".



20 Gig Memory
Current PostgreSQL database is 6 Gig file when doing a database dump


OK, so it's plausible the whole thing will fit in RAM (as a 
rule-of-thumb I assume headers, indexes etc. triple or quadruple the 
size). To know better, check the actual disk-usage of $PGDATA.
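
If it helps, the on-disk size can also be checked from within the
database (the dbsize functions are built in from 8.1 on):

SELECT pg_size_pretty(pg_database_size(current_database()));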



/etc/sysctl.conf  file settings:

# 11 Gig

kernel.shmmax = 11811160064


Hmm - that's a lot of shared RAM. See shared_buffers below.


kernel.sem = 250 32000 100 128

net.ipv4.ip_local_port_range = 1024 65000

net.core.rmem_default = 262144  

net.core.rmem_max = 262144 


net.core.wmem_default = 262144

net.core.wmem_max = 262144  



postgresql.conf file settings (if not listed then I used the defaults):

max_connections = 300


How many connections do you expect typically/peak? It doesn't cost much 
to have max_connections set high but your workload is the most important 
thing missing from your question.



shared_buffers = 10240MB


For 7.x that's probably way too big, but 8.x organises its buffers 
better. I'd still be tempted to start at 1 or 2GB and work up - see where 
it stops buying you an improvement.



work_mem = 10MB


If you have large queries doing big sorts, I'd increase this. Don't 
forget it's per-sort, so if you have got about 300 connections live at 
any one time that could be 300*10MB*N if they're all doing something 
complicated. If you only have one connection live, you can increase this 
quite substantially.



effective_cache_size = 512MB


This isn't setting PG's memory usage, it's telling PG how much data your 
operating-system is caching. Check "free" and see what it says. For you, 
I'd expect 10GB+.



maintenance_work_mem = 100MB


This is for admin-related tasks, so you could probably increase it.

Workload workload workload - we need to know what you're doing with it. 
One connection summarising the entire database will want larger numbers 
than 100 connections running many small queries.


HTH
--
  Richard Huxton
  Archonet Ltd



[PERFORM] Recommended Initial Settings

2007-02-23 Thread Campbell, Lance
I would like to get someone's recommendations on the best initial
settings for a dedicated PostgreSQL server.  I do realize that there are
a lot of factors that influence how one should configure a database.  I
am just looking for a good starting point.  Ideally I would like the
database to reside as much as possible in memory with no disk access.
The current database size of my 7.x version of PostgreSQL generates a 6
Gig file when doing a database dump.

 

Dedicated PostgreSQL 8.2 Server
Redhat Linux 4.x AS 64 bit version (EM64T)
4 Intel Xeon Processors
20 Gig Memory
Current PostgreSQL database is 6 Gig file when doing a database dump

 

 

/etc/sysctl.conf file settings:

# 11 Gig
kernel.shmmax = 11811160064

kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144

postgresql.conf file settings (if not listed then I used the defaults):

max_connections = 300
shared_buffers = 10240MB
work_mem = 10MB
effective_cache_size = 512MB
maintenance_work_mem = 100MB

 

 

Any suggestions would be appreciated!

 

Thanks,

 

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

 



Re: R: [PERFORM] Very slow bytea data extraction

2007-02-23 Thread Richard Huxton

[EMAIL PROTECTED] wrote:

Thanks for your reply,

> Is it in executing the query (what does EXPLAIN ANALYSE show)?

Here is the output of explain analyze SELECT * FROM "FILE":

"Seq Scan on "FILE"  (cost=0.00..1.36 rows=36 width=235)
(actual time=0.023..0.107 rows=36 loops=1)"


If you look at the "actual time" it's completing very quickly indeed. So
it must be something to do with either:

1. Fetching/formatting the data
2. Transferring the data to the client.

What happens if you only select half the rows? Does the time to run the
select halve?


--
  Richard Huxton
  Archonet Ltd



R: [PERFORM] Very slow bytea data extraction

2007-02-23 Thread [EMAIL PROTECTED]
Thanks for your reply,


>Is it in executing the query (what does EXPLAIN ANALYSE show)?

Here is the output of explain analyze SELECT * FROM "FILE":

"Seq Scan on "FILE"  (cost=0.00..1.36 rows=36 width=235)
(actual time=0.023..0.107 rows=36 loops=1)"


>How are you accessing the database: odbc,jdbc,other?
>Does it do this with psql too?

The problem is the same when I access the db with jdbc, pgAdmin and
even psql.


Massimo




Re: [PERFORM] Very slow bytea data extraction

2007-02-23 Thread Richard Huxton

[EMAIL PROTECTED] wrote:

Hi all,
I'm using PostgreSQL 8.2.3 on a Windows XP system.

I need to write and retrieve bytea data from a table.
The problem is that, while data insertion is quite fast, bytea extraction
is very slow.
I'm trying to store a 250KB image into the bytea field.
A simple select query on a 36-row table takes more than one minute to
execute.


Where is the problem?

Is it in executing the query (what does EXPLAIN ANALYSE show)?
Is it in fetching/formatting the data (what does the equivalent COUNT(*) 
show)?

How are you accessing the database: odbc,jdbc,other?
Does it do this with psql too?
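
One way to separate server time from fetch/transfer time in psql (a
sketch; "FILE" is the table from your report):

\timing
SELECT count(*) FROM "FILE";  -- roughly the server-side cost alone
SELECT * FROM "FILE";         -- adds row fetching, formatting and transfer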

--
  Richard Huxton
  Archonet Ltd



[PERFORM] Very slow bytea data extraction

2007-02-23 Thread [EMAIL PROTECTED]
Hi all,
I'm using PostgreSQL 8.2.3 on a Windows XP system.

I need to write and retrieve bytea data from a table.
The problem is that, while data insertion is quite fast, bytea extraction
is very slow.
I'm trying to store a 250KB image into the bytea field.
A simple select query on a 36-row table takes more than one minute to
execute.

Any help would be very much appreciated.

Thanks in advance

Massimo




Re: [PERFORM] Using the 8.2 autovacuum values with 8.1

2007-02-23 Thread Csaba Nagy
On Thu, 2007-02-22 at 22:53, Mark Stosberg wrote:
> Thanks to everyone for the feedback about vacuuming. It's been very
> useful. The pointers to the pgstattuple and pgFouine tools were also
> helpful.
> 
> I'm now considering the following plan for trying Autovacuuming again
> with 8.1. I'd like any peer review you have to offer of the following:
> 
> 1. First, I'll move the settings to match the defaults in 8.2. The ones
> I noticed in particular were:
> 
> autovacuum_vacuum_threshold changes: 1000 -> 500
> autovacuum_analyze_threshold changes: 500 -> 250
> autovacuum_vacuum_scale_factor changes: .4 -> .2
> autovacuum_analyze_scale_factor changes: .2 -> .1
> 
> 2. Try the vacuum cost delay feature, starting with a 20ms value:
> 
> autovacuum_vacuum_cost_delay = 20
> 
> 3. Immediately add a row to pg_autovacuum for a huge logging table that
> would be too slow to vacuum usually. We'll still vacuum it once a week
> for good measure by cron.
> 
> 4. For good measure, I think I still keep the nightly cron entry that
> does a complete vacuum analyze (except for that large table...).
> 
> Seem like a reasonable plan?

You likely don't need the nightly full vacuum run... we also do a
nightly vacuum here beside autovacuum, but not a full one: only for tables
which are big enough that we don't want autovacuum to touch them during
high business time, but which change enough that we want them vacuumed
frequently. I discover them by checking the stats, for example:

-- tables with significant update/delete activity since the stats were last reset
SELECT c.relname,
       c.reltuples::bigint AS rowcnt,
       pg_stat_get_tuples_inserted(c.oid) AS inserted,
       pg_stat_get_tuples_updated(c.oid)  AS updated,
       pg_stat_get_tuples_deleted(c.oid)  AS deleted
FROM pg_class c
WHERE c.relkind = 'r'::"char"
GROUP BY c.oid, c.relname, c.reltuples
HAVING pg_stat_get_tuples_updated(c.oid) +
       pg_stat_get_tuples_deleted(c.oid) > 1000
ORDER BY pg_stat_get_tuples_updated(c.oid) +
         pg_stat_get_tuples_deleted(c.oid) DESC;


The top tables in this list for which the (deleted + updated) / rowcnt 
is relatively small but still significant need your attention for
nightly vacuum... the rest is handled just fine by autovacuum.
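
A variant of the same query that computes that ratio directly (a sketch
along the same lines):

SELECT c.relname,
       c.reltuples::bigint AS rowcnt,
       pg_stat_get_tuples_updated(c.oid)
     + pg_stat_get_tuples_deleted(c.oid) AS churn,
       (pg_stat_get_tuples_updated(c.oid)
     + pg_stat_get_tuples_deleted(c.oid))
     / greatest(c.reltuples, 1)::float8 AS churn_ratio
FROM pg_class c
WHERE c.relkind = 'r'::"char"
ORDER BY churn DESC;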

On the other end of the scale, if you have tables for which the
deletion/update rate is way higher then the row count, that's likely a
hot-spot table which you probably need extra vacuuming during the day.
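
For step 3 of the plan above, the per-table override on 8.1 goes into
the pg_autovacuum system catalog; a minimal sketch, assuming a
hypothetical table big_log_table and the 8.1 catalog layout:

INSERT INTO pg_autovacuum
  (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
   anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
VALUES ('big_log_table'::regclass, false, -1, -1, -1, -1, -1, -1);
-- enabled = false makes autovacuum skip the table entirely;
-- -1 means "fall back to the system-wide default" for a parameter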

Cheers,
Csaba.


