Re: [PERFORM] : Performance Improvement Strategy

2011-10-10 Thread Leonardo Francalanci
>>         * Allow CLUSTER to sort the table rather than scanning the index
>>           when it seems likely to be cheaper (Leonardo Francalanci)
>
> Looks like I owe Leonardo Francalanci a pizza.



Well, the patch started from earlier work by Gregory Stark, and Tom fixed
a nasty bug; but I'll take a slice ;)


Leonardo


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Adding more memory = high cpu load

2011-10-10 Thread alexandre - aldeia digital

Hi,

Yesterday, a customer increased the server memory from 16GB to 48GB.

Today, the load of the server hit 40 ~ 50 points.
With 16 GB, the load never surpassed 5 ~ 8 points.

The only parameters that I changed are effective_cache_size (from 14 GB
to 40 GB) and shared_buffers (from 1 GB to 5 GB). Setting the values
back has no effect.


This server runs CentOS 5.5 (2.6.18-194.3.1.el5.centos.plus - x86_64).
Should I change some vm parameters for this specific kernel?

Thanks for any help.



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Kevin Grittner
alexandre - aldeia digital  wrote:
 
> Yesterday, a customer increased the server memory from 16GB to
> 48GB.
 
That's usually for the better, but be aware that on some hardware
adding RAM beyond a certain point causes slower RAM access.  Without
knowing more details, it's impossible to say whether that's the case
here.
 
> Today, the load of the server hit 40 ~ 50 points.
> With 16 GB, the load never surpassed 5 ~ 8 points.
 
Are you talking about "load average", CPU usage, or something else?
 
> The only parameters that I changed are effective_cache_size (from 14
> GB to 40GB) and shared_buffers (from 1 GB to 5 GB). Setting the
> values back has no effect.
 
What version of PostgreSQL is this?  What settings are in effect? 
How many user connections are active at one time?  How many cores
are there, of what type?  What's the storage system?  What kind of
load is this?
 
http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
http://wiki.postgresql.org/wiki/Server_Configuration
 
-Kevin



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Shaun Thomas

On 10/10/2011 08:26 AM, alexandre - aldeia digital wrote:


Yesterday, a customer increased the server memory from 16GB to 48GB.

Today, the load of the server hit 40 ~ 50 points.
With 16 GB, the load never surpassed 5 ~ 8 points.


That's not entirely surprising. The problem with having lots of memory 
is... that you have lots of memory. The operating system likes to cache, 
and this includes writes. Normally this isn't a problem, but with 48GB 
of RAM, the defaults (for CentOS 5.5 in particular) are to use up to 40% 
of that to cache writes.


The settings you're looking for are in:

/proc/sys/vm/dirty_background_ratio
/proc/sys/vm/dirty_ratio

You can set these by putting lines in your /etc/sysctl.conf file:

vm.dirty_background_ratio = 1
vm.dirty_ratio = 10

And then calling:

sudo sysctl -p

The first number, the background ratio, tells the memory manager to 
start writing to disk as soon as 1% of memory is dirty. The second is 
the hard ceiling on how much memory can hold dirty data. If the amount 
of pending writes exceeds this, the system goes into synchronous write 
mode, and blocks all other write activity until it can flush everything 
out to disk. You really, really want to avoid this.


The defaults in older Linux systems were this high mostly to optimize 
for desktop performance. For CentOS 5.5, the defaults are 10% and 40%, 
which doesn't seem like a lot. But for servers with tons of ram, 10% of 
48GB is almost 5GB. That's way bigger than all but the largest RAID or 
controller cache, which means IO waits, and thus high load. Those high 
IO waits cause a kind of cascade that slowly causes writes to back up, 
making it more likely you'll reach the hard 40% limit, which triggers a 
system-wide flush, and then you're in trouble.


You can actually monitor this by checking /proc/meminfo:

grep -A1 Dirty /proc/meminfo

The 'Dirty' line tells you how much memory *could* be written to disk, 
and the 'Writeback' line tells you how much the system is trying to 
write. You want that second line to be 0 or close to it, as much as 
humanly possible. It's also good to keep Dirty low; if it climbs too 
high, that's an indicator the system is about to start flushing 
uncontrollably.
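For monitoring, those two values can be pulled out with a short shell snippet. The sample meminfo lines below are invented for illustration; on a live system you would read /proc/meminfo directly (e.g. inside a watch loop):

```shell
# Sample /proc/meminfo lines (invented values, for illustration only).
# On a live system, replace the variable with: sample=$(cat /proc/meminfo)
sample='Dirty:            153600 kB
Writeback:             0 kB'

# Extract the kB figures with awk; wrap in `watch -n1` to track them over time.
dirty_kb=$(echo "$sample" | awk '/^Dirty:/ {print $2}')
writeback_kb=$(echo "$sample" | awk '/^Writeback:/ {print $2}')
echo "dirty=${dirty_kb}kB writeback=${writeback_kb}kB"
```

The same extraction works unchanged against the real file, since awk keys on the field labels rather than line positions.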


Generally it's good practice to keep dirty_ratio lower than the size of 
your disk controller cache, but even high-end systems only give 256MB to 
1GB of controller cache. Newer kernels have introduced dirty_bytes and 
dirty_background_bytes, which let you set a hard byte-specified limit 
instead of relying on some vague integer percentage of system memory. 
This is better for systems with vast amounts of memory that could cause 
these kinds of IO spikes. Of course, in order to use those settings, 
your client will have to either install a custom kernel, or upgrade to 
CentOS 6. Try the 1% first, and it may work out.
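On a kernel that supports the byte-based knobs, the tuning might look like the sketch below. The values are purely illustrative, sized to land near a typical controller cache; this is not a recommendation for any particular hardware:

```shell
# Illustrative only: start background writeback at ~256MB of dirty data,
# and hard-block writers at ~1GB. Requires a kernel with dirty_bytes
# support (CentOS 6 era or newer). Needs root.
sudo sysctl -w vm.dirty_background_bytes=268435456
sudo sysctl -w vm.dirty_bytes=1073741824

# To persist across reboots, add the equivalent lines to /etc/sysctl.conf:
#   vm.dirty_background_bytes = 268435456
#   vm.dirty_bytes = 1073741824
```

Note that setting the *_bytes variants automatically zeroes the corresponding *_ratio settings, and vice versa; only one form is in effect at a time.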


Some kernels have a hard 5% limit on dirty_background_ratio, but the one 
included in CentOS 5.5 does not. You can even set it to 0, but your IO 
throughput will take a nosedive, because at that point, it's always 
writing to disk without any effective caching at all. The reason we set 
dirty_ratio to 10% is that we want to reduce the total amount of 
time a synchronous IO block lasts. You can probably take that as low as 
5%, but be careful and test to find your best equilibrium point. You 
want it at a point it rarely blocks, but if it does, it's over quickly.


There's more info here:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

(I only went on about this because we had the same problem when we 
increased from 32GB to 72GB. It was a completely unexpected reaction, 
but a manageable one.)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See http://www.peak6.com/email-disclaimer/ for terms and conditions related to 
this email



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Leonardo Francalanci


> That's not entirely surprising. The problem with having lots of memory is... 
> that you have lots of memory. The operating system likes to cache, and this 
> includes writes. Normally this isn't a problem, but with 48GB of RAM, the 
> defaults (for CentOS 5.5 in particular) are to use up to 40% of that to cache 
> writes.


I don't understand: don't you want postgresql to issue the fsync calls when
it "makes sense" (and configure them), rather than having the OS decide
when it's best to flush to disk? That is: don't you want all the memory to
be used for caching, unless postgresql says otherwise (calling fsync), instead
of "as soon as 1% of memory is used"?




Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Shaun Thomas

On 10/10/2011 10:14 AM, Leonardo Francalanci wrote:


I don't understand: don't you want postgresql to issue the fsync
calls when it "makes sense" (and configure them), rather than having
the OS decide when it's best to flush to disk? That is: don't you
want all the memory to be used for caching, unless postgresql says
otherwise (calling fsync), instead of "as soon as 1% of memory is
used"?


You'd think that, which is why this bites so many people. That's not 
quite how it works in practice, though. OS cache is a lot more than 
altered database and WAL files, which do get fsync'd frequently. Beyond 
that, you need to worry about what happens *between* fsync calls.


On a highly loaded database, or even just a database experiencing heavy 
write volume due to some kind of ETL process, your amount of dirty 
memory may increase much more quickly than you expect. For example, say 
your checkpoint_timeout setting is the default of five minutes. An ETL 
process runs that loads 2GB of data into a table, and you're archiving 
transaction logs. So you now have three possible write vectors, not 
including temp tables and what not. And that's only for that connection; 
this gets more complicated if you have other OLTP connections on the 
same DB.


So your memory is now flooded with 2-6GB of data, and that's easy for 
memory to handle, and it can do so quickly. With 48GB of RAM, that's 
well within caching range, so the OS never writes anything until the 
fsync call. Then the database makes the fsync call, and suddenly the OS 
wants to flush 2-6GB of data straight to disk. Without that background 
trickle, you now have a flood that only the highest-end disk controller 
or a backing-store full of SSDs or PCIe NVRAM could ever hope to absorb.


That write flood massively degrades your read IOPS, and degrades future 
writes until it's done flushing, so all of your disk IO is saturated, 
further worsening the situation. Now you're getting closer and closer to 
your dirty_ratio setting, at which point the OS will effectively stop 
responding to anything, so it can finally finish flushing everything to 
disk. This can take a couple minutes, but it's not uncommon for these IO 
storms to last over half an hour depending on the quality of the 
disks/controller in question. During this time, system load is climbing 
precipitously, and clients are getting query timeouts.


Adding more memory can actually make your system performance worse if 
you don't equally increase the capability of your RAID/SAN/whatever to 
compensate for increased size of write chunks.


This is counter-intuitive, but completely borne out by tests. The kernel 
developers agree, or we wouldn't have dirty_bytes and 
dirty_background_bytes, and they wouldn't have changed the defaults to 
5% and 10% instead of 10% and 40%. It's just one of those things nobody 
expected until machines with vast amounts of RAM started becoming common.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Greg Smith

On 10/10/2011 10:04 AM, Shaun Thomas wrote:
The problem with having lots of memory is... that you have lots of 
memory. The operating system likes to cache, and this includes writes. 
Normally this isn't a problem, but with 48GB of RAM, the defaults (for 
CentOS 5.5 in particular) are to use up to 40% of that to cache writes.


I make the same sort of tuning changes Shaun suggested on every CentOS 5 
system I come across.  That said, you should turn on log_checkpoints in 
your postgresql.conf and see whether the "sync=" numbers are high.  That 
will help prove or disprove that the slowdown you're seeing is from too 
much write caching.  You also may be able to improve that by adjusting 
checkpoint_segments/checkpoint_timeout, or *decreasing* shared_buffers.  
More about this at 
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
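A minimal sketch of the postgresql.conf knobs Greg mentions (values illustrative for an 8.3-9.0 era server; tune against your own workload rather than copying these):

```ini
log_checkpoints = on                 # log write/sync/total times per checkpoint
checkpoint_timeout = 15min           # spread checkpoints out rather than shrinking them
checkpoint_segments = 32             # allow more WAL between checkpoints
checkpoint_completion_target = 0.9   # spread the write phase across the interval
```

With log_checkpoints on, watch the "sync=" figure in the log lines; consistently high sync times point at the OS write-cache flooding discussed earlier in the thread.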


There are some other possibilities, too, such as the memory addition 
actually causing average memory speed to drop, as Kevin mentioned.  I 
always benchmark with stream-scaling:  
https://github.com/gregs1104/stream-scaling before and after a RAM size 
change, to see whether things are still as fast or not.  It's hard to do 
that in the position you're in now though.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Leonardo Francalanci
> Then the database makes the fsync call, and suddenly the OS wants to
> flush 2-6GB of data straight to disk. Without that background trickle,
> you now have a flood that only the highest-end disk controller or a
> backing-store full of SSDs or PCIe NVRAM could ever hope to absorb.


Isn't checkpoint_completion_target supposed to deal exactly with that problem?

Plus: if 2-6GB is too much, why not decrease checkpoint_segments? Or
checkpoint_timeout?

> The kernel developers agree, or we wouldn't have dirty_bytes and
> dirty_background_bytes, and they wouldn't have changed the defaults
> to 5% and 10% instead of 10% and 40%.


I'm not saying that those kernel parameters are "useless"; I'm saying 
they are used in the same way as checkpoint_segments, checkpoint_timeout 
and checkpoint_completion_target are used by postgresql; and on a 
postgresql-only system I would rather have postgresql look after the 
fsync calls, not the OS.



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread alexandre - aldeia digital

Em 10-10-2011 11:04, Shaun Thomas wrote:

That's not entirely surprising. The problem with having lots of memory
is... that you have lots of memory. The operating system likes to cache,
and this includes writes. Normally this isn't a problem, but with 48GB
of RAM, the defaults (for CentOS 5.5 in particular) are to use up to 40%
of that to cache writes.


Hi Shaun and all,

After changing the parameters in sysctl.conf, the load average dropped 
for a while, but then the system load grew again.


Dirty memory in meminfo is about 150MB and Writeback is mostly 0 kB.

I dropped checkpoint_timeout to 1min and turned on log_checkpoints:

<2011-10-10 14:18:48 BRT >LOG:  checkpoint complete: wrote 6885 buffers 
(1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=29.862 s, sync=28.466 s, total=58.651 s

<2011-10-10 14:18:50 BRT >LOG:  checkpoint starting: time
<2011-10-10 14:19:40 BRT >LOG:  checkpoint complete: wrote 6415 buffers 
(1.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=29.981 s, sync=19.960 s, total=50.111 s

<2011-10-10 14:19:50 BRT >LOG:  checkpoint starting: time
<2011-10-10 14:20:45 BRT >LOG:  checkpoint complete: wrote 6903 buffers 
(1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=29.653 s, sync=25.504 s, total=55.477 s

<2011-10-10 14:20:50 BRT >LOG:  checkpoint starting: time
<2011-10-10 14:21:45 BRT >LOG:  checkpoint complete: wrote 7231 buffers 
(1.1%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=29.911 s, sync=24.899 s, total=55.037 s

<2011-10-10 14:21:50 BRT >LOG:  checkpoint starting: time
<2011-10-10 14:22:45 BRT >LOG:  checkpoint complete: wrote 6569 buffers 
(1.0%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=29.947 s, sync=25.303 s, total=55.342 s

<2011-10-10 14:22:50 BRT >LOG:  checkpoint starting: time
<2011-10-10 14:23:44 BRT >LOG:  checkpoint complete: wrote 5711 buffers 
(0.9%); 0 transaction log file(s) added, 0 removed, 1 recycled; 
write=30.036 s, sync=24.299 s, total=54.507 s

<2011-10-10 14:23:50 BRT >LOG:  checkpoint starting: time
<2011-10-10 14:24:50 BRT >LOG:  checkpoint complete: wrote 6744 buffers 
(1.0%); 0 transaction log file(s) added, 0 removed, 2 recycled; 
write=29.946 s, sync=29.792 s, total=60.223 s

<2011-10-10 14:24:50 BRT >LOG:  checkpoint starting: time

[root@servernew data]# vmstat 1 30 -w
procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
22  0   26968290280 117852   3843154000   32859    9   17  17  3  79  1  0
34  0   26968289288 117852   3843226800 8  2757 2502 4148  80 20   0  0  0
39  1   26968286128 117852   384323480024   622 2449 4008  80 20   0  0  0
41  0   26968291100 117852   384337920064   553 2487 3419  83 17   0  0  0
42  1   26968293596 117852   3843455600   232   776 2372 2779  83 17   0  0  0
44  1   26968291984 117852   384352520056   408 2388 3012  82 18   0  0  0
26  0   26968289884 117856   384359240064   698 2486 3283  83 17   0  0  0
31  0   26968286788 117856   384370520088   664 2452 3385  82 18   0  0  0
42  0   26968284500 117868   3843751600   176   804 2492 3876  83 17   0  0  0
44  0   26968281392 117868   384388600024   504 2338 2916  80 20   0  0  0
44  0   26968278540 117868   384391520032   568 2337 2937  83 17   0  0  0
45  0   26968280280 117868   384403480072   402 2492 3635  84 16   0  0  0
35  2   26968279928 117868   3844038800   184   600 2492 3835  84 16   0  0  0
41  0   26968275948 117872   3844171200   136   620 2624 4187  79 21   0  0  0
37  0   26968274392 117872   384423720024   640 2492 3824  84 16   0  0  0
40  0   26968268548 117872   3844312000 0   624 2421 3584  81 19   0  0  0
32  0   26968268308 117872   384436520016   328 2384 3767  81 19   0  0  0
38  0   26968281820 117872   384274720072   344 2505 3810  81 19   0  0  0
41  0   26968279776 117872   384279760016   220 2496 3428  84 16   0  0  0
27  0   26968283252 117872   3842850800   112   312 2563 4279  81 19   0  0  0
36  0   26968280332 117872   384292880048   544 2626 4406  80 20   0  0  0
30  0   26968274372 117872   384293720024   472 2442 3646  80 19   0  0  0
38  0   26968272144 117872   3842995600   152   256 2465 4039  83 16   0  0  0
41  2   26968266496 117872   384303240056   304 2414 3206  82 18   0  0  0
[remaining rows truncated]

Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Kevin Grittner
alexandre - aldeia digital  wrote:
 
> Notice that we have no idle % in cpu column.
 
So they're making full use of all the CPUs they paid for.  That in
itself isn't a problem.  Unfortunately you haven't given us nearly
enough information to know whether there is indeed a problem, or if
so, what.  What was throughput before?  What is it now?  How has
latency been affected?  And all those unanswered questions from my
first email
 
The problem *might* be something along the lines of most of the
discussion on the thread.  It might not be.  I just don't know yet,
myself.
 
>   14:26:47 up 2 days,  3:26,  4 users,  load average: 48.61,
> 46.12, 40.47
 
This has me wondering again about your core count and your user
connections.
 
> My client wants to remove the extra memory... :/
 
Maybe we should identify the problem.  It might be that a connection
pooler is the solution.   On the other hand, if critical production
applications are suffering, it might make sense to take this out of
production and figure out a safer place to test things and sort this
out.
 
-Kevin



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread alexandre - aldeia digital

Em 10-10-2011 14:46, Kevin Grittner escreveu:

alexandre - aldeia digital  wrote:


Notice that we have no idle % in cpu column.


So they're making full use of all the CPUs they paid for.  That in
itself isn't a problem.  Unfortunately you haven't given us nearly
enough information to know whether there is indeed a problem, or if
so, what.  What was throughput before?  What is it now?  How has
latency been affected?  And all those unanswered questions from my
first email

The problem *might* be something along the lines of most of the
discussion on the thread.  It might not be.  I just don't know yet,
myself.


From the point of view of the client, the question is simple: until 
last Friday (with 16 GB of RAM), the load average of the server rarely 
surpassed 4. Nothing changed in normal database use.


Tonight, we will remove the extra memory. :/

Best regards.






Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Kevin Grittner
alexandre - aldeia digital  wrote:
 
> From the point of view of the client, the question is simple:
> until last Friday (with 16 GB of RAM), the load average of the
> server rarely surpassed 4. Nothing changed in normal database use.
 
Really?  The application still performs as well or better, and it's
the load average they care about?  How odd.
 
If they were happy with performance before the RAM was added, why
did they add it?  If they weren't happy with performance, what led
them to believe that adding more RAM would help?  If there's a
performance problem, there's generally one bottleneck which is the
limit, with one set of symptoms.  When you remove that bottleneck
and things get faster, you may well have a new bottleneck with
different symptoms.  (These symptoms might include high load average
or CPU usage, for example.)  You then figure out what is causing
*that* bottleneck, and you can make things yet faster.
 
In this whole thread you have yet to give enough information to know
for sure whether there was or is any performance problem, or what
the actual bottleneck is.  I think you'll find that people happy to
help identify the problem and suggest solutions if you provide that
information.
 
-Kevin



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Shaun Thomas

On 10/10/2011 12:31 PM, alexandre - aldeia digital wrote:


<2011-10-10 14:18:48 BRT >LOG: checkpoint complete: wrote 6885 buffers
(1.1%); 0 transaction log file(s) added, 0 removed, 1 recycled;
write=29.862 s, sync=28.466 s, total=58.651 s


28.466s sync time?! That's horrifying. At this point, I want to say the 
increase in effective_cache_size or shared_buffers triggered the planner 
to change one of your plans significantly enough that it's doing a ton 
disk IO and starving out your writes. Except you said you changed it 
back and it's still misbehaving.


This also reminds me somewhat of an issue Greg mentioned a while back 
with xlog storms in 9.0 databases. I can't recall how he usually "fixed" 
these, though.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread alexandre - aldeia digital

Em 10-10-2011 16:39, Kevin Grittner escreveu:

alexandre - aldeia digital  wrote:


 From the point of view of the client, the question is simple:
until last Friday (with 16 GB of RAM), the load average of the
server rarely surpassed 4. Nothing changed in normal database use.


Really?  The application still performs as well or better, and it's
the load average they care about?  How odd.

If they were happy with performance before the RAM was added, why
did they add it?  If they weren't happy with performance, what led
them to believe that adding more RAM would help?  If there's a
performance problem, there's generally one bottleneck which is the
limit, with one set of symptoms.  When you remove that bottleneck
and things get faster, you may well have a new bottleneck with
different symptoms.  (These symptoms might include high load average
or CPU usage, for example.)  You then figure out what is causing
*that* bottleneck, and you can make things yet faster.


Calm down: if the client plans to add, for example, another database 
to this server in a couple of weeks, should he only upgrade the memory 
when the new database comes to life, adding another point of doubt? 
IMHO, the reason for adding memory does not matter in this case. I came 
to the list to see if anyone else had experienced the same problem, 
which is not necessarily related to Postgres. Shaun and Greg apparently 
had the same problems on CentOS, and the information they provided 
helped a lot...







Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Kevin Grittner
alexandre - aldeia digital  wrote:
 
> I came to the list to see if anyone else has experienced the same
> problem
 
A high load average or low idle CPU isn't a problem, it's a
potentially useful bit of information in diagnosing a problem.  I
was hoping to hear what the actual problem was, since I've had a few
problems in high RAM situations, but the solutions depend on what
the actual problems are.  I don't suppose you saw periods where
queries which normally run very quickly (say in a millisecond or
less) were suddenly taking tens of seconds to run -- "stalling" and
then returning to normal?  Because if I knew you were having a
problem like *that* I might have been able to help.  Same for other
set of symptoms; it's just the suggestions would have been
different.  And the suggestions would have depended on what your
system looked like besides the RAM.
 
If you're satisfied with how things are running with less RAM,
though, there's no need.
 
-Kevin



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Samuel Gendler
On Mon, Oct 10, 2011 at 1:52 PM, Kevin Grittner  wrote:

> alexandre - aldeia digital  wrote:
>
> > I came to the list to see if anyone else has experienced the same
> > problem
>
> A high load average or low idle CPU isn't a problem, it's a
> potentially useful bit of information in diagnosing a problem.  I
> was hoping to hear what the actual problem was, since I've had a few
> problems in high RAM situations, but the solutions depend on what
> the actual problems are.  I don't suppose you saw periods where
> queries which normally run very quickly (say in a millisecond or
> less) were suddenly taking tens of seconds to run -- "stalling" and
> then returning to normal?  Because if I knew you were having a
> problem like *that* I might have been able to help.  Same for other
> set of symptoms; it's just the suggestions would have been
> different.  And the suggestions would have depended on what your
> system looked like besides the RAM.
>
> If you're satisfied with how things are running with less RAM,
> though, there's no need.
>

The original question doesn't actually say that performance has gone down,
only that cpu utilization has gone up. Presumably, with lots more RAM, it is
blocking on I/O a lot less, so it isn't necessarily surprising that CPU
utilization has gone up.  The only problem would be if db performance has
gotten worse. Maybe I missed a message where that was covered?  I don't see
it in the original query to the list.


Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Claudio Freire
On Tue, Oct 11, 2011 at 12:02 AM, Samuel Gendler
 wrote:
> The original question doesn't actually say that performance has gone down,
> only that cpu utilization has gone up. Presumably, with lots more RAM, it is
> blocking on I/O a lot less, so it isn't necessarily surprising that CPU
> utilization has gone up.  The only problem would be if db performance has
> gotten worse. Maybe I missed a message where that was covered?  I don't see
> it in the original query to the list.

Load average (which is presumably the metric in question) includes
both processes using the CPU and processes waiting for I/O.
So it *would* be strange for load average to go up like that if the
database configuration remains the same (i.e., equal query plans).



Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Greg Smith

On 10/10/2011 01:31 PM, alexandre - aldeia digital wrote:

I drop checkpoint_timeout to 1min and turn on log_checkpoint:

<2011-10-10 14:18:48 BRT >LOG:  checkpoint complete: wrote 6885 
buffers (1.1%); 0 transaction log file(s) added, 0 removed, 1 
recycled; write=29.862 s, sync=28.466 s, total=58.651 s

<2011-10-10 14:18:50 BRT >LOG:  checkpoint starting: time


Sync times that go to 20 seconds suggest there's a serious problem here 
somewhere.  But it would have been better to do these changes one at a 
time:  turn on log_checkpoints, collect some data, then try lowering 
checkpoint_timeout.  A checkpoint every minute is normally a bad idea, 
so that change may have caused this other issue.


procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
34  0   26968289288 117852   3843226800 8  2757 2502 4148  80 20   0  0  0
39  1   26968286128 117852   384323480024   622 2449 4008  80 20   0  0  0
41  0   26968291100 117852   384337920064   553 2487 3419  83 17   0  0  0

...Notice that we have no idle % in cpu column.


You also have no waiting for I/O!  This is just plain strange; 
checkpoint sync time spikes with no I/O waits are something I've never 
seen before.  System time going to 20% isn't normal either.


I don't know what's going on with this server.  What I would normally do 
in this case is use "top -c" to see what processes are taking up so much 
runtime, and then look at what they are doing with pg_stat_activity.  
You might see the slow processes in the log files by setting 
log_min_duration_statement instead.  I'd be suspicious of Linux given 
your situation though.


I wonder if increasing the memory is a coincidence, and the real cause 
is something related to the fact that you had to reboot to install it.  
You might have switched to a newer kernel in the process too, for 
example; I'd have to put a kernel bug on the list of suspects with this 
unusual vmstat output.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [PERFORM] Adding more memory = high cpu load

2011-10-10 Thread Greg Smith

On 10/10/2011 12:14 PM, Leonardo Francalanci wrote:



Then the database makes the fsync call, and suddenly the OS wants to flush 2-6GB of data
straight to disk. Without that background trickle, you now have a flood that
only the highest-end disk controller or a backing-store full of SSDs or PCIe
NVRAM could ever hope to absorb.

Isn't checkpoint_completion_target supposed to deal exactly with that problem?


checkpoint_completion_target spreads out the writes to disk.  
PostgreSQL doesn't make any attempt yet to spread out the sync calls.  
On a busy server, what can happen is that the whole OS write cache fills 
with dirty data--none of which is written out to disk because of the 
high kernel threshold--and then it all slams onto disk fast once the 
checkpoint starts executing sync calls.  Lowering the size of the Linux 
write cache helps with that a lot, but can't quite eliminate the problem.



Plus: if 2-6GB is too much, why not decrease checkpoint_segments? Or
checkpoint_timeout?


Making checkpoints really frequent increases total disk I/O, both to the 
database and to the WAL, significantly.  You don't want to do that if 
there's another way to achieve the same goal without those costs, which 
is what some kernel tuning can do here.  Just need to be careful not to 
go too far; some write caching at the OS level helps a lot, too.



I'm not saying that those kernel parameters are "useless"; I'm saying
they are used in the same way as checkpoint_segments, checkpoint_timeout
and checkpoint_completion_target are used by postgresql; and on a
postgresql-only system I would rather have postgresql look after the
fsync calls, not the OS.


Except that PostgreSQL doesn't look after the fsync calls yet.  I wrote 
a patch for 9.1 that spread out the sync calls, similarly to how the 
writes are spread out now.  I wasn't able to prove an improvement 
sufficient to commit the result.  In the Linux case, the OS has more 
information to work with about how to schedule I/O efficiently given how 
the hardware is acting, and it's not possible for PostgreSQL to know all 
that--not without duplicating a large portion of the kernel development 
work at least.  Right now, relying on the kernel means that any 
improvements there magically apply to any PostgreSQL version.  So far 
the results there have been beating out improvements made to the 
database fast enough that it's hard to innovate in this area within 
Postgres.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

