Re: [PERFORM] a heavy duty operation on an "unused" table kills my server

2010-01-20 Thread Matthew Wakeling

On Fri, 15 Jan 2010, Greg Smith wrote:
>> It seems to me that CFQ is simply bandwidth limited by the extra
>> processing it has to perform.
>
> I'm curious what you are doing when you see this.


16 disc 15kRPM RAID0, when using fadvise with more than 100 simultaneous 
8kB random requests. I sent an email to the mailing list on 29 Jan 2008, 
but it got thrown away by the mailing list spam filter because it had an 
image in it (the graph showing interesting information). Gregory Stark 
replied to it in 
http://archives.postgresql.org/pgsql-performance/2008-01/msg00285.php


I was using his synthetic test case program.

> My theory has been that the "extra processing it has to perform" you
> describe just doesn't matter in the context of a fast system where
> physical I/O is always the bottleneck.


Basically, to an extent, that's right. However, when you get 16 drives or 
more into a system, then it starts being an issue.


Matthew

--
For every complex problem, there is a solution that is simple, neat, and wrong.
 -- H. L. Mencken 


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Change query join order

2010-01-20 Thread Kaloyan Iliev Iliev




Thank you,
 I changed random_page_cost to 2, and the query plan has changed and the
query now runs faster.
 I will check the other queries, but I think I will leave it at this
value.

Thank you again.
  Kaloyan Iliev
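
For reference, a fully-cached configuration along the lines Robert
describes below might look something like this sketch; the numbers are
illustrative assumptions, not values measured for this workload:

# postgresql.conf sketch for a database that fits entirely in cache
seq_page_cost = 0.005       # cached page fetches cost little more than CPU work
random_page_cost = 0.005    # equal to seq_page_cost once the disk is out of the picture
effective_cache_size = 6GB  # hypothetical; set to roughly the size of the OS cache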


Robert Haas wrote:

On Fri, Jan 8, 2010 at 2:23 PM, Tom Lane  wrote:
>> If the other plan does turn out to be faster (and I agree with Tom
>> that there is no guarantee of that), then one thing to check is
>> whether seq_page_cost and random_page_cost are set too high.  If the
>> data is all cached, the default values of 4 and 1 are three orders of
>> magnitude too large, and they should also be set to equal rather than
>> unequal values.
>
> Tweaking the cost parameters to suit your local situation is the
> recommended cure for planner misjudgments; but I'd recommend against
> changing them on the basis of only one example.  You could easily
> find yourself making other cases worse.  Get a collection of common
> queries for your app and look at the overall effects.

No argument, and well said -- just trying to point out that the
default values really are FAR too high for people with databases that
fit in OS cache.

...Robert





Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-20 Thread Carlo Stonebanks

> yeah, the values are at the end.  Sounds like your vacuum settings are
> too non-aggressive.  Generally this is the vacuum cost delay being too
> high.


Of course, I have to ask: what's the down side?


> Yes!  You can run vacuum verbose against the regular old postgres
> database (or just create one for testing with nothing in it) and
> you'll still get the fsm usage numbers from that!  So, no need to run
> it against the big db.  However, if regular vacuum verbose couldn't
> finish in a week, then you've likely got vacuum and autovacuum set to
> be too timid in their operation, and may be getting pretty bloated as
> we speak.  Once the fsm gets too blown out of the water, it's quicker
> to dump and reload the whole DB than to try and fix it.


My client reports this is what they actually do on a monthly basis.

And the numbers are in:

NOTICE:  number of page slots needed (4090224) exceeds max_fsm_pages 
(204800)
HINT:  Consider increasing the configuration parameter "max_fsm_pages" to 
a value over 4090224.
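
(Those numbers come from the cluster-wide free space map summary printed
at the end of a database-wide VACUUM VERBOSE; per the suggestion quoted
above, any small database in the cluster is enough to get them. A minimal
sketch:

-- the FSM summary at the end covers the whole cluster, so the small
-- built-in "postgres" database is sufficient
\c postgres
VACUUM VERBOSE;
)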


Gee, only off by a factor of 20. What happens if I go for this number (once 
again, what's the down side)?


Carlo 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] a heavy duty operation on an "unused" table kills my server

2010-01-20 Thread Greg Smith

Matthew Wakeling wrote:

> On Fri, 15 Jan 2010, Greg Smith wrote:
>> My theory has been that the "extra processing it has to perform" you
>> describe just doesn't matter in the context of a fast system where
>> physical I/O is always the bottleneck.
>
> Basically, to an extent, that's right. However, when you get 16 drives
> or more into a system, then it starts being an issue.


I guess if I test a system with *only* 16 drives in it one day, maybe 
I'll find out.


Seriously though, there is some difference between a completely
synthetic test like the one you noted issues with here, and anything you
can see when running the database.  I was commenting more on the state of
things from the perspective of a database app, where I just haven't seen
any of the CFQ issues I hear reports of in other contexts.  I'm sure there
are plenty of low-level tests where the differences between the schedulers
are completely obvious and CFQ doesn't look as good anymore, and I'll take
a look at whether I can replicate the test case you saw a specific
concern with here.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-20 Thread Kevin Grittner
"Carlo Stonebanks"  wrote:
 
>>  yeah, the values are at the end.  Sounds like your vacuum
>> settings are too non-aggressive.  Generally this is the vacuum
>> cost delay being too high.
> 
> Of course, I have to ask: what's the down side?
 
If you make it too aggressive, it could impact throughput or
response time.  Odds are that the bloat from having it not
aggressive enough is currently having a worse impact.
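 
As a rough sketch of what "aggressive enough but not too aggressive"
might look like in an 8.3-era postgresql.conf (the specific values are
assumptions to be tested against your own workload, not prescriptions):
 
autovacuum = on
vacuum_cost_delay = 0                  # manual VACUUMs run at full speed
autovacuum_vacuum_cost_delay = 10ms    # less timid than a high delay, still throttled
autovacuum_vacuum_scale_factor = 0.2   # vacuum a table once ~20% of it is dead rows
autovacuum_naptime = 1min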
 
>> Once the fsm gets too blown out of the water, it's quicker
>> to dump and reload the whole DB than to try and fix it.
> 
> My client reports this is what they actually do on a monthly
> basis.
 
They probably won't need to do that with proper configuration and
vacuum policies.
 
>>> NOTICE:  number of page slots needed (4090224) exceeds
>>> max_fsm_pages (204800)
>>> HINT:  Consider increasing the configuration parameter
>>> "max_fsm_pages" to a value over 4090224.
> 
> Gee, only off by a factor of 20. What happens if I go for this
> number (once again, what's the down side)?
 
It costs six bytes of shared memory per entry.
 
http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM
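 
Taking the NOTICE above at face value, 4090224 slots at six bytes each is
roughly 24 MB of shared memory, so covering it with some headroom is
cheap. A sketch (8.3 syntax; changing it requires a server restart):
 
max_fsm_pages = 4500000    # ~26 MB of shared memory at 6 bytes per slot
max_fsm_relations = 10000  # hypothetical; should exceed the ~6000 tables mentioned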
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ext4 finally doing the right thing

2010-01-20 Thread Greg Smith

Jeff Davis wrote:

>> On one side, we might finally be able to use regular drives with their
>> caches turned on safely, taking advantage of the cache for other writes
>> while doing the right thing with the database writes.
>
> That could be good news. What's your opinion on the practical
> performance impact? If it doesn't need to be fsync'd, the kernel
> probably shouldn't have written it to the disk yet anyway, right (I'm
> assuming here that the OS buffer cache is much larger than the disk
> write cache)?


I know they just tweaked this area recently so this may be a bit out of 
date, but kernels starting with 2.6.22 allow you to get up to 10% of 
memory dirty before getting really aggressive about writing things out, 
with writes starting to go heavily at 5%.  So even with a 1GB server, 
you could easily find 100MB of data sitting in the kernel buffer cache 
ahead of a database write that needs to hit disc.  Once you start 
considering the case with modern hardware, where even my desktop has 8GB 
of RAM and most serious servers I see have 32GB, you can easily have 
gigabytes of such data queued in front of the write that now needs to 
hit the platter.
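
(For reference, the knobs being described here are the kernel's dirty-page
ratios; on 2.6.22 and later the defaults are roughly the following, and
they can be adjusted through /etc/sysctl.conf. Shown only to make the
percentages above concrete:

vm.dirty_background_ratio = 5   # background write-out starts at 5% of RAM dirty
vm.dirty_ratio = 10             # processes generating writes are throttled at 10%
)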


The dream is that a proper barrier implementation will then shuffle your 
important write to the front of that queue, without waiting for 
everything else to clear first.  The exact performance impact depends on 
how many non-database writes happen.  But even on a dedicated database 
disk, it should still help because there are plenty of non-sync'd writes 
coming out of the background writer via its routine work and the checkpoint
writes.  And the ability to fully utilize the write cache on the 
individual drives, on commodity hardware, without risking database 
corruption would make life a lot easier.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)

2010-01-20 Thread Carlo Stonebanks

> * A database that is of small to medium size (5 - 10 GB)?
> * Around 10 clients that perform constant write operations to the
>   database (UPDATE/INSERT)
> * Around 10 clients that occasionally read from the database
> * Around 6000 tables in your database
> * A problem with tuning it all
> * Migration to new hardware and/or OS
>
> Is this all correct?


Actually, the tablespace is very large, over 500GB. However, the actual
production DB is 200GB.


> First thing that is noticeable is that you seem to have way too few
> drives in the server - not because of disk space required but because
> of speed.  You didn't say what type of drives you have and you didn't
> say what you would consider desirable performance levels, but off hand
> (because of the "10 clients perform constant writes" part) you will
> probably want at least 2x-4x more drives.



> With only 4 drives, RAID 10 is the only thing usable here.


What would be the optimum RAID level and number of disks?


>> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
>
> Would not recommend Windows OS.


We may be stuck, as my client is only considering Red Hat Linux (still
waiting to find out which version). If it turns out that the switch
doesn't give better than a marginal improvement, then there is no incentive
to create more complications in what is basically a Windows shop (although
the project manager is a Linux advocate).


> Most importantly, you didn't say what you would consider desirable
> performance. The hardware and the setup you described will work, but
> not necessarily fast enough.


Once again, it seems as though we are down to the number of drives...

> Have you tried decreasing random_page_cost in postgresql.conf? Or
> setting (as a last resort) enable_seqscan = off?


In critical code sections, we do - we have stored procedures and code
segments which save the current enable_seqscan value, set it to off (local
to the transaction), then restore it after the code has run. Our current
"planner cost" values are all default. Is this what you would choose for a
server with Intel Core 2 Quad CPUs and 48 GB RAM?


# - Planner Cost Constants -
#seq_page_cost = 1.0   # measured on an arbitrary scale
#random_page_cost = 4.0   # same scale as above
#cpu_tuple_cost = 0.01   # same scale as above
#cpu_index_tuple_cost = 0.005  # same scale as above
#cpu_operator_cost = 0.0025  # same scale as above
#effective_cache_size = 128MB
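
For what it's worth, the transaction-local toggle described above can be
as simple as this sketch (the query name is hypothetical):

BEGIN;
SET LOCAL enable_seqscan = off;      -- only affects the current transaction
SELECT * FROM some_critical_report;  -- hypothetical query in the critical section
COMMIT;                              -- the setting reverts automatically here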

Thanks for the help,

Carlo 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-20 Thread Robert Haas
On Wed, Jan 20, 2010 at 3:03 PM, Carlo Stonebanks wrote:
>> Yes!  You can run vacuum verbose against the regular old postgres
>> database (or just create one for testing with nothing in it) and
>> you'll still get the fsm usage numbers from that!  So, no need to run
>> it against the big db.  However, if regular vacuum verbose couldn't
>> finish in a week, then you've likely got vacuum and autovacuum set to
>> be too timid in their operation, and may be getting pretty bloated as
>> we speak.  Once the fsm gets too blown out of the water, it's quicker
>> to dump and reload the whole DB than to try and fix it.
>
> My client reports this is what they actually do on a monthly basis.

Something is deeply wrong with your client's vacuuming policies.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ext4 finally doing the right thing

2010-01-20 Thread Greg Stark
That doesn't sound right. The kernel having 10% of memory dirty doesn't mean
there's a queue you have to jump at all. You don't get into any queue until
the kernel initiates write-out, which will be based on the usage counters --
basically an LRU. fsync and cousins like sync_file_range and
posix_fadvise(DONT_NEED) initiate write-out right away.

How many pending write-out requests for how much data the kernel should keep
active is another question, but I imagine it has more to do with the storage
hardware than with how much memory your system has. And for most hardware
it's probably on the order of megabytes or less.

greg



Re: [PERFORM] ext4 finally doing the right thing

2010-01-20 Thread Greg Smith

Greg Stark wrote:


> That doesn't sound right. The kernel having 10% of memory dirty
> doesn't mean there's a queue you have to jump at all. You don't get
> into any queue until the kernel initiates write-out, which will be
> based on the usage counters -- basically an LRU. fsync and cousins
> like sync_file_range and posix_fadvise(DONT_NEED) initiate write-out
> right away.




Most of the safe ways ext3 knows how to initiate a write-out on something
that must go (because it's gotten an fsync on data there) require flushing
every outstanding write to that filesystem along with it.  So as soon as
a single WAL write shows up, bam!  The whole cache is emptied (or at
least everything associated with that filesystem), and the caller who
asked for that little write is stuck waiting for everything to clear
before their fsync returns success.


This particular issue absolutely killed Firefox when they switched to 
using SQLite not too long ago; high-level discussion at 
http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ and 
confirmation/discussion of the issue on lkml at 
https://kerneltrap.org/mailarchive/linux-fsdevel/2008/5/26/1941354 . 

Note the comment from the first article saying "those delays can be 30 
seconds or more".  On multiple occasions, I've measured systems with 
dozens of disks in a high-performance RAID1+0 with battery-backed 
controller that could grind to a halt for 10, 20, or more seconds in 
this situation, when running pgbench on a big database.  As was the case 
on the latest one I saw, if you've got 32GB of RAM and have let 3.2GB of 
random I/O from background writer/checkpoint writes back up because 
Linux has been lazy about getting to them, that takes a while to clear
no matter how good the underlying hardware is.


Write barriers were supposed to improve all this when added to ext3, but 
they just never seemed to work right for many people.  After reading 
that lkml thread, among others, I know I was left not trusting anything 
beyond the simplest path through this area of the filesystem.  Slow is 
better than corrupted.


So the good news I was relaying is that it looks like this finally works
on ext4, giving it the behavior you described and expected, but that
behavior hasn't actually been there until now.  I was hoping someone with
more free time than me might be interested in investigating further if I
pointed this advance out.  I'm stuck with too many production systems to
play with new kernels at the moment, but am quite curious.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance