Re: [PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Claus Guttesen
 I have a big performance problem in my SQL select query:

 
 select * from event where user_id in
 (500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,
 400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,
 300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,
 200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
 100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
 

What happens if you change the query to

select * from event where user_id >= 0 and user_id <= 500;

? :-)

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentler gamester is the soonest winner.

Shakespeare

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Xia Qingran
Hi,
I have a big performance problem in my SQL select query:


select * from event where user_id in
(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,
400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,
300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,
200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);


The above SELECT always spends 1200ms.

The EXPLAIN ANALYZE result of it is :
                                  QUERY PLAN
 --------------------------------------------------------------------------
 Bitmap Heap Scan on event  (cost=73685.08..5983063.49 rows=662018 width=36) (actual time=24.857..242.826 rows=134289 loops=1)
   Recheck Cond: (user_id = ANY ('{499,498,497, ... ,1,0}'::integer[]))
   ->  Bitmap Index Scan on event_user_id_idx  (cost=0.00..71699.03 rows=662018 width=0) (actual time=24.610..24.610 rows=134289 loops=1)
         Index Cond: (user_id = ANY ('{499,498,497, ... ,1,0}'::integer[]))

Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Greg Smith

On Thu, 24 Sep 2009, Dan Sugalski wrote:

Is there any practical limit to the number of shared buffers PG 8.3.7 can 
handle before more becomes counter-productive?


There are actually two distinct questions here you should consider, 
because the popular wisdom here and what makes sense for your case might 
be different.


The biggest shared_buffers tests I've seen come from Sun, where Jignesh 
saw that around 10GB was the largest amount of RAM you could give to the 
database before it stopped improving performance.  As you guessed, there 
is a certain amount of overhead to managing the buffers involved, and as 
the size grows the chance you'll run into locking issues or similar 
resource contention grows too.


Another problem spot is checkpoints.  If you dirty a very large buffer 
cache, that whole thing will have to get dumped to disk eventually, and on 
some workloads people have found they have to reduce shared_buffers 
specifically to keep this from being too painful.


That's not answering your question though; what it answers is how large 
can shared_buffers get before it's counterproductive compared with giving 
the memory to the OS to manage?


The basic design of PostgreSQL presumes that the OS buffer cache exists as 
a second-chance source for cached buffers.  The OS cache tends to be 
optimized to handle large numbers of buffers well, but without very much 
memory of what's been used recently to optimize allocations and 
evictions.  The symmetry there is one reason why shared_buffers 
shouldn't be most of the RAM on your system; splitting things up so that 
PG has a cut and the OS has at least as large a space of its own lets the 
two cache management schemes complement each other.


The box runs other things as well as the database, so the OS buffer cache 
tends to get effectively flushed -- permanently pinning more of the database 
in memory would be an overall win for DB performance, assuming bad things 
don't happen because of buffer management.


This means that the question you want an answer to is: if the OS cache 
isn't really available, where does giving memory to shared_buffers become 
less efficient than not caching things at all?  My guess is that this 
number is much larger than 10GB, but I don't think anyone has done any 
tests to try to quantify exactly where it is.  Typically when people are 
talking about systems as large as yours, they're dedicated database 
servers at that point, so the OS cache gets considered at the same time. 
If it's effectively out of the picture, the spot where caching still helps 
even when it's somewhat inefficient due to buffer contention isn't well 
explored.


It would depend on the app too.  If you're heavily balanced toward reads 
that don't need locks, you can certainly support a larger shared_buffers 
than someone who is writing a lot (just the checkpoint impact alone makes 
this true, and there's other sources for locking contention).


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Tom Lane
Xia Qingran <qingran@gmail.com> writes:
 I have a big performance problem in my SQL select query:
 select * from event where user_id in
 (500,499,498, ... ,1,0);
 The above SELECT always spends 1200ms.

Your EXPLAIN ANALYZE shows that the actual runtime is only about 240ms.
So either the planning time is about 1000ms, or transmitting and
displaying the 134K rows produced by the query takes that long, or some
combination of the two.  I wouldn't be too surprised if it's the data
display that's slow; but if it's the planning time that you're unhappy
about, updating to a more recent PG release might possibly help.  What
version is this anyway?

regards, tom lane



Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Pierre Frédéric Caillau d


Is there any practical limit to the number of shared buffers PG 8.3.7  
can handle before more becomes counter-productive?


It is more efficient to have the page in shared buffers, rather than doing  
a context switch to the OS, copying the entire page from the OS's cache  
into shared buffers, and coming back to postgres. Shared buffers use less  
CPU. However, this is totally negligible versus the disk wait time of an  
uncached IO.


The same page may be cached once in shared_buffers, and once in the OS  
cache, so if your shared_buffers is half your RAM, and the other half is  
disk cache, perhaps it won't be optimal: if stuff is cached twice, you can  
cache half as much stuff.
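Putting illustrative numbers (not from the thread) on that worst case, where every page in shared_buffers is also sitting in the OS cache:

```python
# Worst-case double caching: both caches hold the exact same pages.
ram_gb = 16
shared_buffers_gb = ram_gb // 2            # half of RAM to shared_buffers
os_cache_gb = ram_gb - shared_buffers_gb   # the other half is OS disk cache

# If the two caches overlap completely, the distinct data cached is only:
distinct_cached_gb = max(shared_buffers_gb, os_cache_gb)
print(distinct_cached_gb)  # 8 -- half of what 16 GB could have held
```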


If your entire database can fit in shared buffers, good for you though.  
But then a checkpoint comes, and postgres will write all dirty buffers to  
disk in the order it finds them in Shared Buffers, which might be totally  
different from the on-disk order. If you have enough OS cache left to  
absorb these writes, the OS will reorder them. If not, lots of random  
writes are going to occur. On a RAID5 this can be a lot of fun.




Re: [PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Grzegorz Jaśkiewicz
if you reuse that set a lot, how about storing it in a table and doing the
join on the db side?  If it is large, it sometimes makes sense to create a temp
table just for a single query (I use that sort of stuff for comparing with a few
M records).
But temp tables in that case have to be short-lived, as they can't reuse
space (no FSM in the temporary-table world I'm afraid; I hope it will be fixed
at some stage though).
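A sketch of that approach; SQLite (via Python's stdlib) stands in for PostgreSQL here only to keep the example self-contained — in PostgreSQL you would use CREATE TEMP TABLE and join against it. The table and column names mirror the thread's query but the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO event VALUES (?, ?)",
                 [(i, i % 1000) for i in range(5000)])

# Load the ID set into a (temp) table once...
conn.execute("CREATE TEMP TABLE wanted (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted VALUES (?)",
                 [(i,) for i in range(501)])

# ...then join, instead of shipping a 500-element IN list with each query.
rows = conn.execute(
    "SELECT e.* FROM event e JOIN wanted w USING (user_id)").fetchall()
print(len(rows))  # 2505: 501 wanted user_ids x 5 matching events each
```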


Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Greg Smith

On Fri, 25 Sep 2009, Jeff Janes wrote:


Does it do this even if the block was already in shared_buffers?


Usually not.  The buffer ring algorithm is used to manage pages that are 
read in specifically to satisfy a sequential scan (there's a slightly 
different ring method used for VACUUM too).  If the buffer you need is 
already available and not pinned (locked by someone else), it's not read 
from disk again.  Instead, its usage count is incremented only if it's at 
zero (this doesn't count as a use unless it's about to be evicted as 
unused), and it's returned without being added to the ring.


There's a section about this (Buffer Ring Replacement Strategy) in the 
source code: 
http://git.postgresql.org/gitweb?p=postgresql.git;a=blob_plain;f=src/backend/storage/buffer/README;hb=HEAD


The commit that added the feature is at 
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=ebf3d5b66360823edbdf5ac4f9a119506fccd4c0


The basic flow of this code is that backends ask for buffers using 
BufferAlloc, which then calls StrategyGetBuffer (where the ring list is 
managed) only if it doesn't first find the page in the buffer cache.  You 
get what you'd hope for here:  a sequential scan will use blocks when 
they're already available in the cache, while reading in less popular 
blocks that weren't cached into the temporary ring area.  There's always 
the OS cache backing the PostgreSQL one to handle cases where the working 
set you're using is just a bit larger than shared_buffers.  The ring read 
requests may very well be satisfied by that too if there was a recent 
sequential scan the OS is still caching.
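A toy model (emphatically not the PostgreSQL source, which is in C) of the cache-hit path described above: a sequential scan reuses already-cached buffers without pulling them into its private ring, and only cache misses go through the ring.

```python
class BufferPool:
    """Minimal sketch of the seq-scan buffer-ring behavior."""
    def __init__(self):
        self.cache = {}   # page -> usage_count
        self.ring = []    # pages this scan read in through its ring

    def read_for_seqscan(self, page):
        if page in self.cache:
            # Hit: bump usage only if it was at zero, i.e. about to be
            # evicted as unused; otherwise this doesn't count as a use.
            if self.cache[page] == 0:
                self.cache[page] = 1
            return "hit"          # returned without joining the ring
        # Miss: read into the scan's ring; usage stays at zero so the
        # page is cheap to evict once the scan has moved past it.
        self.cache[page] = 0
        self.ring.append(page)
        return "miss"

pool = BufferPool()
pool.cache[7] = 0                  # page 7 already cached, unused
print(pool.read_for_seqscan(7))    # hit  (usage bumped 0 -> 1, not ringed)
print(pool.read_for_seqscan(8))    # miss (goes through the ring)
print(pool.ring)                   # [8]
```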


You can read a high-level summary of the algorithm used for ring 
management (with an intro to buffer management in general) in my "Inside 
the PostgreSQL Buffer Cache" presentation at 
http://www.westnet.com/~gsmith/content/postgresql/ on P10, "Optimizations 
for problem areas".  That doesn't specifically cover the "what if it's in 
the cache already?" case though.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Craig James

Xia Qingran wrote:

Hi,
I have a big performance problem in my SQL select query:


select * from event where user_id in
(500,499,498, ... ,1,0);


The above SELECT always spends 1200ms.


If your user_id is always in a narrow range like this, or even in any range 
that is a small fraction of the total, then add a range condition, like this:

select * from event where user_id <= 500 and user_id >= 0 and user_id in (...)

I did this exact same thing in my application and it worked well.

Craig



Re: [PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Paul Ooi


On 26-Sep-2009, at 10:16 PM, Claus Guttesen wrote:


I have a big performance problem in my SQL select query:


select * from event where user_id in
(500,499,498, ... ,1,0);



What happens if you change the query to

select * from event where user_id >= 0 and user_id <= 500;


or select * from event where user_id <= 500; :)

Besides, your index seems quite huge (2G), and it usually takes some  
time to process the result, even though it's already indexed with a btree.










Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Jeff Janes
On Sat, Sep 26, 2009 at 9:57 AM, Gerhard Wiesinger li...@wiesinger.com wrote:
 On Sat, 26 Sep 2009, Greg Smith wrote:

 On Fri, 25 Sep 2009, Jeff Janes wrote:

 Does it do this even if the block was already in shared_buffers?

 Usually not.  The buffer ring algorithm is used to manage pages that are
 read in specifically to satisfy a sequential scan (there's a slightly
 different ring method used for VACUUM too).  If the buffer you need is
 already available and not pinned (locked by someone else), it's not read
 from disk again.  Instead, its usage count is incremented only if it's at
 zero (this doesn't count as a use unless it's about to be evicted as
 unused), and it's returned without being added to the ring.


 Hello Greg,

 What happens when a postmaster dies (e.g. core dump, kill -9, etc.). How is
 reference counting cleaned up and the lock removed?

If a backend dies in disgrace, the master detects this and the whole
cluster is taken down and brought back up.

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.

(The DETAIL is technically accurate, but somewhat misleading.  If the
crash of another backend happens while your backend is waiting on the
commit record WAL fsync to return, then while the postmaster may have
commanded your session to rollback, it is too late to actually do so
and when the server comes back up and finishes recovery, you will
probably find that your transaction has indeed committed, assuming you
have some way to accurately deduce this)


Jeff



Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Jeff Janes
On Sat, Sep 26, 2009 at 8:19 AM, Greg Smith gsm...@gregsmith.com wrote:

 Another problem spot is checkpoints.  If you dirty a very large buffer
 cache, that whole thing will have to get dumped to disk eventually, and on
 some workloads people have found they have to reduce shared_buffers
 specifically to keep this from being too painful.

Hi Greg,

Is this the case even if checkpoint_completion_target is set close to 1.0?

If you dirty buffers fast enough to dirty most of a huge
shared_buffers area between checkpoints, then it seems like lowering
the shared_buffers wouldn't reduce the amount of I/O needed, it would
just shift the I/O from checkpoints to the backends themselves.

It looks like checkpoint_completion_target was introduced in 8.3.0

Cheers,

Jeff



Re: [PERFORM] PG 8.3 and large shared buffer settings

2009-09-26 Thread Greg Smith

On Sat, 26 Sep 2009, Jeff Janes wrote:


On Sat, Sep 26, 2009 at 8:19 AM, Greg Smith gsm...@gregsmith.com wrote:


Another problem spot is checkpoints.  If you dirty a very large buffer
cache, that whole thing will have to get dumped to disk eventually, and on
some workloads people have found they have to reduce shared_buffers
specifically to keep this from being too painful.


Is this the case even if checkpoint_completion_target is set close to 1.0?


Sure.  checkpoint_completion_target spreads the checkpoint writes out 
over more of the interval between checkpoints, but it alone doesn't 
change the fact that those intervals are only so long.  By 
default, you're going to get one every five minutes, and on active systems 
they can come every few seconds if you're not aggressive with increasing 
checkpoint_segments.


Some quick math gives an idea of the scale of the problem.  A single cheap 
disk can write random I/O (which checkpoint writes often are) at 1-2MB/s; 
let's call it 100MB/minute.  That means that in 5 minutes, a single disk 
system might be hard pressed to write even 500MB of data out.  But you can 
easily dirty 500MB in seconds nowadays.  Now imagine shared_buffers is 
40GB and you've dirtied a bunch of it; how long will that take to clear 
even on a fast RAID system?  It won't be quick, and the whole system will 
grind to a halt at the end of the checkpoint as all the buffered writes 
queued up are forced out.
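The arithmetic above can be sketched out; the 5-minute interval and 100MB/minute figure come from the paragraph, while the RAID throughput is an assumed number for illustration only:

```python
# How much dirty data can a single cheap disk absorb per checkpoint?
random_write_mb_per_min = 100          # ~1.7 MB/s of random writes
checkpoint_interval_min = 5            # default checkpoint spacing
max_dirty_mb_per_checkpoint = random_write_mb_per_min * checkpoint_interval_min
print(max_dirty_mb_per_checkpoint)     # 500 MB per 5-minute window

# Now the 40GB-dirty scenario, on an assumed fast RAID doing 50 MB/s
# of random writes (hypothetical figure, not from the thread):
dirty_gb = 40
raid_mb_per_s = 50
seconds = dirty_gb * 1024 / raid_mb_per_s
print(round(seconds / 60, 1))          # ~13.7 minutes just to flush it
```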


If you dirty buffers fast enough to dirty most of a huge shared_buffers 
area between checkpoints, then it seems like lowering the shared_buffers 
wouldn't reduce the amount of I/O needed, it would just shift the I/O 
from checkpoints to the backends themselves.


What's even worse is that backends can be writing data and filling the OS 
buffer cache in between checkpoints too, but all of that is forced to 
complete before the checkpoint can finish too.  You can easily start the 
checkpoint process with the whole OS cache filled with backend writes that 
will slow the checkpoint's own writes if you're not careful.


Because disks are slow, you need to get things that are written to disk as 
soon as feasible, so the OS has more time to work on them, reorder for 
efficient writing, etc.


Ultimately, the sooner you get I/O to the OS cache to write, the better, 
*unless* you're going to write that same block over again before it must 
go to disk.  Normally you want buffers that aren't accessed often to get 
written out to disk early rather than linger until checkpoint time, 
there's nothing wrong with a backend doing a write if that block wasn't 
going to be used again soon. The ideal setup from a latency perspective is 
that you size shared_buffers just large enough to hold the things you 
write to regularly, but not so big that it caches every write.



It looks like checkpoint_completion_target was introduced in 8.3.0


Correct.  Before then, you had no hope for reducing checkpoint overhead 
but to use very small settings for shared_buffers, particularly if you 
cranked the old background writer up so that it wrote lots of redundant 
information too (that was the main result of tuning it on versions 
before 8.3 as well).


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
