Dear list,

First and foremost, thanks to the great number of people who have responded with various tips and suggestions. I am now starting to fully appreciate the various cache settings and what they can do for performance.

I just want to redefine the problem based on the knowledge of it that I now have.

In my example, I am purposefully forcing a full table scan - select count (*) from table. This table has only 90,000 rows. Each row is comprised of about 300 bytes of data.

If the table has not been cached, I was seeing response times from 5 to 18 seconds to read the table. If it had been cached, then the response time dropped to sub-second response.

Obviously, I can tune the caching so as to make sure that as much data that can be reasonably cached is cached. However, I don't think that a hit of even 5 seconds to read a table of 90,000 rows is acceptable.

One thing that has been tried with some success was to dump the table and recreate it. After this exercise, selecting all rows from the table when it is not in cache takes about 3 seconds. (Of course, when in cache, the same sub-second response time is seen.)

I still think that 3 seconds is not acceptable. However, I reserve the right to be wrong. Does it sound unrealistic to expect PostgreSQL to be able to read 90,000 rows with 300 bytes per row in under a second?

Based on suggestions from the list, I am also thinking of making the following tuning changes:

shared_buffers = 15000
sort_mem = 32168
effective_cache_size = 64000

This is based on one gig of memory.

Does anyone have any feedback on these values? Also, realizing that no two database are the same, etc., etc... does anyone have a good formula for setting these values?

Thanks in advance,


At 09:09 AM 9/16/2003 +0200, Manfred Koizar wrote:
On Mon, 15 Sep 2003 22:26:45 -0400, Christopher Browne
>> select count (*) from table;
>The only possible plan for THAT query will involve a seq scan of the
>whole table.  If the postmaster already has the data in cache, it
>makes sense for it to run in 1 second.  If it has to read it from
>disk, 12 seconds makes a lot of sense.

Yes.  And note that the main difference is between having the data in
memory and having to fetch it from disk.  I don't believe that this
difference can be explained by 9000 read calls hitting the operating
system's cache.

>You might want to increase the "shared_buffers" parameter in
>postgresql.conf; that should lead to increased stability of times as
>it should be more likely that the data in "table" will remain in

Let's not jump to this conclusion before we know what's going on.

Joseph Bove <[EMAIL PROTECTED]> wrote in another message above:
| I did have shared_buffers and sort_mem both set higher originally (15000,
| 32168)

As I read this I think he meant "... and had the same performance

Joseph, what do you get, if you run that
        EXPLAIN ANALYSE SELECT count(*) ...
several times?  What do vmstat and top show while the query is
running?  Are there other processes active during or between the runs?
What kind of processes?  Postgres backends?  Web server? ...


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Reply via email to