Hi, I have started to implement your suggestions. I have a small error so far
though. Setting "vm.dirty_writeback_ratio = 1" returns:
  error: "vm.dirty_writeback_ratio" is an unknown key
I'm on Ubuntu 12.04.
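
For reference, here are the dirty_* keys that actually exist on this kernel
(output from my box, so treat it as illustrative):

  $ ls /proc/sys/vm | grep dirty
  dirty_background_bytes
  dirty_background_ratio
  dirty_bytes
  dirty_expire_centisecs
  dirty_ratio
  dirty_writeback_centisecs

So I'm guessing vm.dirty_background_ratio = 1 is what was meant?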


Den 28/11/2012 kl. 17.54 skrev Shaun Thomas <stho...@optionshouse.com>:

> On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote:
> 
>> https://rpm.newrelic.com/public/charts/h2dtedghfsv
> 
> Doesn't this answer your question?
> 
> That iowait is crushing your server into the ground. It's no surprise updates 
> are taking several seconds. That update you sent us *should* execute on the 
> order of only a few milliseconds.
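> 
> If you want to see which device is actually saturated, something like this
> will show per-device utilization and wait times (iostat ships in the
> sysstat package; the 5 means 5-second samples):
> 
>   iostat -x 5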
> 
> So I'll reiterate that you *must* move your pg_xlog location elsewhere. 
> You've got row lookup bandwidth conflicting with writes. There are a couple 
> other changes you should probably make to your config:
> 
>> checkpoint_segments = 16
> 
> This is not enough for the workload you describe. Every time the database 
> checkpoints, all of those changes in pg_xlog are applied to the backend data 
> files. You should set these values:
> 
> checkpoint_segments = 100
> checkpoint_timeout = 10m
> checkpoint_completion_target = 0.9
> 
> This will reduce your overall write workload and make it much less bursty.
> Too many checkpoints massively reduce write throughput. With the settings
> you have, it's probably checkpointing constantly while your load runs.
> Start with this, but experiment with increasing checkpoint_segments further.
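> 
> All three go in postgresql.conf, and a reload is enough to pick them up; no
> restart is needed for these. On Ubuntu that's something like:
> 
>   sudo service postgresql reload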
> 
> If you check your logs now, you probably see a ton of "checkpoint starting: 
> xlog" in there. That's very bad. It should say "checkpoint starting: time" 
> meaning it's keeping up with your writes naturally.
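> 
> A quick way to check (this assumes log_checkpoints = on in postgresql.conf,
> and Ubuntu's default log location for 9.1; adjust the path as needed):
> 
>   grep "checkpoint starting" /var/log/postgresql/postgresql-9.1-main.log | tail -20
> 
> The matching "checkpoint complete" lines also report sync= times, which
> you'll want to look at later.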
> 
>> work_mem = 160MB
> 
> This is probably way too high. work_mem is allocated for every sort
> operation in a query, so each connection could have several of these active
> at once, starving your system of memory and reducing what's available for
> the page cache. Change it to 8MB, and increase it in small increments only
> if necessary.
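> 
> If a few specific queries genuinely need a big sort area, raise it just for
> those sessions instead of globally. A sketch (the value is made up):
> 
>   SET work_mem = '256MB';
>   -- ... run the one big reporting query here ...
>   RESET work_mem;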
> 
>> So correct me if I'm wrong here: my theory is, that I have too many
>> too slow update queries, that then often end up in a situation, where
>> they "wait" for each other to finish, hence the sometimes VERY long
>> execution times.
> 
> Sometimes this is the case, but here you're running into IO contention, not
> lock contention. Your 3TB RAID-1 is simply insufficient for this workload.
> 
> If you check your logs after making the changes I've suggested, take a look 
> at your checkpoint sync times. That will tell you how long it took the kernel 
> to physically commit those blocks to disk and get a confirmation back from 
> the controller. If those take longer than a second or two, you're probably 
> running into controller buffer overflows. You have a large amount of RAM, so 
> you should also make these two kernel changes to sysctl.conf:
> 
> vm.dirty_ratio = 10
> vm.dirty_writeback_ratio = 1
> 
> Then run this:
> 
> sysctl -p
> 
> This will help prevent the large IO write spikes that occur when the kernel
> decides to write out dirty memory. Those spikes can make checkpoints take
> minutes to commit in some cases, which basically stops all write traffic to
> your database entirely.
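> 
> You can watch the effect directly; after the change, the numbers on these
> two lines should stay much smaller while your load runs:
> 
>   grep -E 'Dirty|Writeback' /proc/meminfo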
> 
> That should get you going, anyway. You still need more/better disks so you 
> can move your pg_xlog directory. With your write load, that will make a huge 
> difference.
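> 
> When you do get the extra disks, the relocation itself is simple. A sketch,
> assuming a stock Ubuntu 9.1 install and the new volume at /mnt/xlogdisk
> (both paths are placeholders for whatever your system actually uses):
> 
>   sudo service postgresql stop
>   sudo mv /var/lib/postgresql/9.1/main/pg_xlog /mnt/xlogdisk/pg_xlog
>   sudo ln -s /mnt/xlogdisk/pg_xlog /var/lib/postgresql/9.1/main/pg_xlog
>   sudo chown -h postgres:postgres /var/lib/postgresql/9.1/main/pg_xlog
>   sudo service postgresql start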
> 
> -- 
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> stho...@optionshouse.com
> 



