Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-11 Thread Markus Shorty Uckelmann

On 04.06.2015 at 22:18, Markus Shorty Uckelmann wrote:

Hi all,


Thanks for all your help!

I just found a few additional things one can or should do when 
investigating swap-related issues:


* Run dmesg - always do that!
* Look for a RAM disk. RAM disks are kernel memory, so they don't
show up in smem et al.

* Take a look at some of the swap's contents:
strings -f /dev/sda3 | shuf -n 10 | less
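
The device name in the strings command above is specific to that machine. As a rough sketch of how one might look up the active swap areas instead of hard-coding one, the following parses the text of /proc/swaps. The function name parse_swaps is made up for illustration, and the column layout (Filename, Type, Size, Used, Priority, with sizes in KiB) is assumed to match what current kernels print.

```python
def parse_swaps(text):
    """Return (device, size_kib, used_kib) tuples from /proc/swaps content."""
    entries = []
    # The first line is the column header, so skip it.
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            # Columns: Filename, Type, Size, Used, Priority
            entries.append((fields[0], int(fields[2]), int(fields[3])))
    return entries
```

On a live system one would call it as parse_swaps(open("/proc/swaps").read()) and feed the resulting device names to strings.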


Cheers, Shorty
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-08 Thread Kwan Lowe
On Thu, Jun 4, 2015 at 4:18 PM, Markus Shorty Uckelmann sho...@koeln.de
wrote:

 I have lots of C6 & C7 machines in use and all of them have the default
 swappiness of 60. The problem now is that a lot of those machines do
 swap although there is no memory pressure. I'm now thinking about
 lowering swappiness to 1. But I'd still like to find out why this
 happens.


 Thanks for this thread. I'm actually looking at the same settings for a
different reason.  Most of our environment is VMware-based and one major
difference between the Linux and Windows clients is how they use free
memory. Linux grabs it for cache ("Free memory is wasted memory.") but
Windows doesn't appear to touch it at all. This means the VMware hypervisor
can over-commit memory.


Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-08 Thread Jonathan Billings
On Fri, Jun 05, 2015 at 08:40:27AM -0700, Greg Lindahl wrote:
 Linux does not treat various kinds of memory pages differently. If you
 want a daemon to be fully in core, call mlockall(). Here's one way to
 do that without changing the daemon's source:

Another way to do this is to put the services into a named cgroup and
set memory.swappiness=1 for that cgroup.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html

Not necessarily as effective as mlock() but you might want to set some
of the other cgroup features as well.
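
A minimal sketch of the cgroup approach, assuming the cgroup v1 memory controller and an already created group directory. The mount point differs between releases (for example /cgroup/memory or /sys/fs/cgroup/memory), so the directory is passed in rather than guessed. The function names and the group name used in the usage note are hypothetical.

```python
import os


def set_cgroup_swappiness(cgroup_dir, value):
    """Write a swappiness value into an existing memory cgroup (v1 layout assumed)."""
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    path = os.path.join(cgroup_dir, "memory.swappiness")
    with open(path, "w") as f:
        f.write(str(value))
    return path


def add_pid_to_cgroup(cgroup_dir, pid):
    """Move a whole process (all of its threads) into the cgroup."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

As root, something like set_cgroup_swappiness("/sys/fs/cgroup/memory/critical", 1) followed by add_pid_to_cgroup(...) for each service PID would apply the setting, where "critical" is a made-up group name.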

-- 
Jonathan Billings billi...@negate.org


Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-06 Thread Markus Shorty Uckelmann

On 05.06.2015 at 23:32, Gordon Messmer wrote:

Those two things can't really both be true.  If the pages
swapped out are unused, then the application won't suffer as a
result.


Why not? If you have an application which sees action only every
12 to 24 hours, I think this can happen.


Well, that's not unused, then.


In a manner of speaking it's not unused. But with rarely used code
it is possible that parts of the program which are needed end up
in swap.


To measure the swap use of your processes, install smem.  It will
show you the amount of swap that each process is using.


Brilliant! Until now I was using the script under [1].


For more specific information, make a copy of /proc/pid/smaps.

To quantify your problem, let bacula run then save the output of
smem, or /proc/pid/smaps for each of your critical services, or
both, and then access each of the services and quantify the latency
relative to the normal latency.  Finally, after collecting latency
information, get the output of smem and/or /proc/pid/smaps again.
You can compare swap use before and after accessing the service to
see how much was swapped out beforehand (presumably because of the
backup), and how much had to be recovered for your test query.

I'd suggest collecting that information at the normal swappiness
setting and at 0.


Thank you, this will get me further.


If the kernel is swapping out processes in favor of filesystem
cache when swappiness is 0, I believe that would be a bug, and
should be reported to the kernel developers.


Because of what I read in [2] I'm not planning to use 0, rather 1. But
please correct me if I'm wrong.


"Timeouts" is pretty vague.  Very generally, it's possible that you
have a timeout configured somewhere that is failing on the first
run because the filesystem cache now contains content from your
backup, and your process only completes in time when the files
needed for the deployment are in the filesystem cache.  That's a
stretch as far as explanations go, but if that is the case, then
swappiness isn't going to fix the problem.  You need to fix your
timeout so that it allows enough time for the deployment to finish
when the server is cold booted (using no cache), or prime your
caches before doing deployments.


By timeouts I meant that the salt master tries to contact the
salt-minion to send it the payload. That is the point at which the
timeout happens: the minion doesn't get back to the master within the
configured timeout, which is currently set to 20 seconds. When we
start a job for the first time after several hours we get a lot of
timeouts. A second run mostly helps. I think it is possible that parts
of the minion process which are needed for the payload we send it are
swapped out. On the first run it takes too long to get the pages back
into RAM. But eventually all pages are paged in, so the second run
works. But this is just an assumption. On Monday I'll try to find out
if I'm right or wrong.
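
One cheap way to probe that assumption, sketched here rather than taken from the thread, is to compare the minion's major page fault counter before and after a slow first run. Field 12 (majflt) of /proc/<pid>/stat counts faults that needed disk I/O. That includes swap-ins but also reads of file-backed pages, so a jump is a hint, not proof. The function name is illustrative.

```python
def parse_major_faults(stat_text):
    """Return the majflt counter (field 12) from /proc/<pid>/stat content."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    rest = stat_text[stat_text.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field 12 is rest[9].
    return int(rest[9])
```

Reading the file once before the job and once after, then subtracting, gives the number of major faults the process took during the run.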

BTW: Is there a way to find out which parts of a program are swapped
out without using monsters like Valgrind? Damn, sounds like an
interesting start of the week...


[1] http://northernmost.org/blog/find-out-what-is-using-your-swap/
[2]
http://www.mysqlperformanceblog.com/2014/04/28/oom-relation-vm-swappiness0-new-kernel/


Cheers to all for the feedback and help,
Shorty


Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-06 Thread Markus Shorty Uckelmann

On 06.06.2015 at 05:06, Dennis Jacobfeuerborn wrote:

That's true but it also means that if you lock that page so it cannot be
swapped out then this page is not available for the page cache, so you
incur the I/O hit either way and it's probably going to be worse because
the system no longer has the option to optimize the memory management.
I wouldn't worry about it until there's actually permanent swap activity
going on, and then you have to decide if you want to add more RAM to the
system or maybe find a way to tell e.g. Bacula to use direct I/O and not
pollute the page cache.
For applications that do not allow this to be specified, a wrapper such
as this one could be used:
http://arighi.blogspot.de/2007/04/how-to-bypass-buffer-cache-in-linux.html


Actually I found better links:
https://code.google.com/p/pagecache-mangagement/
http://lwn.net/Articles/224653/

It is to address the "waah, backups fill my memory with pagecache" and
the "waah, updatedb swapped everything out" and the "waah, copying a DVD
gobbled all my memory" problems.


Dennis, thanks for the links. I hope I can get by without using these
tools, but it's good to have them in my arsenal ;)


Cheers, Shorty




Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-06 Thread Gordon Messmer

On 06/06/2015 02:23 AM, Markus Shorty Uckelmann wrote:


When we
start a job the first time after several hours we get a lot of
timeouts. A second run mostly helps.


In addition to capturing swap use before and after a run that times out, 
I'd cold boot all of the systems involved and see if that job times out 
as well.  If that times out, it's likely that you need to prime your 
caches before a job, or break the job into smaller bits, or extend your 
timeout.



BTW: Is there a way to find out which parts of a program are swapped
out without using monsters like Valgrind? Damn, sounds like an
interesting start of the week...


The smaps file has that information, in a general sense.  If you want to 
know what variables hold references to the areas that are swapped out, 
you'll need a debugger.
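
As an illustration of what "in a general sense" can look like, here is a small sketch that sums the Swap: entries in /proc/<pid>/smaps per mapping, so one can see whether the swapped pages belong to the heap, the stack, a library or an anonymous region. The function name and the "[anonymous]" label for unnamed mappings are choices made for this example.

```python
import re

# A mapping header looks like: "00400000-0040b000 r-xp 00000000 08:01 1234  /path"
_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def swap_by_mapping(smaps_text):
    """Return {mapping name: swapped KiB} for mappings with pages in swap."""
    totals = {}
    current = None
    for line in smaps_text.splitlines():
        if _HEADER.match(line):
            fields = line.split(None, 5)
            # The sixth column (pathname) is empty for anonymous mappings.
            current = fields[5].strip() if len(fields) > 5 else "[anonymous]"
        elif line.startswith("Swap:") and current is not None:
            kib = int(line.split()[1])
            if kib:
                totals[current] = totals.get(current, 0) + kib
    return totals
```

Called on the content of /proc/<pid>/smaps for a service, it gives a quick per-region breakdown; it cannot say which variables or functions live in those pages.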



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Markus Shorty Uckelmann

On 05.06.2015 at 00:23, Dennis Jacobfeuerborn wrote:

If I had to venture a guess then I'd say there are memory pages that
are never touched by any processes and as a result the algorithm has
decided that it's more effective to swap out these pages to disk and use
the freed RAM for the page cache.


That's my guess too.

[...]

impact. If however these numbers are consistently larger than 0 then
that means the system is under acute memory pressure and has to
constantly move pages between RAM and disk and that will have a large
negative performance impact on the system. This is the moment when swap
usage becomes bad.


Fortunately I don't have constant paging on all systems. And if there is
paging activity it's very low. AFAIK it's, as you already suggested,
just that some (probably unused) parts are swapped out. But, some of
those parts are the salt-minion, php-fpm or mysqld. All services which
are important for us and which suffer badly from being swapped out. I
already ran some tests with swappiness 10, which improved things slightly,
but there was still swap usage. So I'm inclined to set swappiness to 1,
which I don't like to do, since those default values aren't there for nothing.
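
For comparing runs at different settings it helps to record the value that was actually in effect. A small sketch, assuming the usual /proc/sys/vm/swappiness interface and the 0 to 100 range used by the kernels discussed here; both function names are illustrative.

```python
def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the swappiness value currently in effect."""
    with open(path) as f:
        return int(f.read().strip())


def sysctl_conf_line(value):
    """Return the line one would add to /etc/sysctl.conf to persist a value."""
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    return "vm.swappiness = %d" % value
```

Logging read_swappiness() next to each measurement avoids mixing up results taken at 60, 10 and 1.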

Is it possible that this happens because the servers are VMs on an
ESX server? How could that affect this? How can I further debug this
problem and find out what's the culprit? I will go back to our metrics
and see if I can find any patterns/correlations.

Cheers, Shorty




Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Greg Lindahl
On Fri, Jun 05, 2015 at 12:29:04PM +0200, Markus Shorty Uckelmann wrote:

 How can I further debug this
 problem and find out what's the culprit?

It's working as designed.

Linux does not treat various kinds of memory pages differently. If you
want a daemon to be fully in core, call mlockall(). Here's one way to
do that without changing the daemon's source:

http://superuser.com/questions/196725/how-to-mlock-all-pages-of-a-process-tree-from-console

(I've always only done this with my own code explicitly calling mlock)

If you don't explicitly lock things into memory, file I/O can and will
cause idle pages to get pushed out. It happens less often if you
manipulate swappiness.
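
For a daemon whose source you control, the explicit call is short. The sketch below uses ctypes to call mlockall() from a Python process. The flag values are assumptions taken from <sys/mman.h> on Linux x86 and x86_64 and differ on some other architectures, and the call only succeeds with CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK. It locks the calling process only, so it is not a substitute for the wrapper approach linked above.

```python
import ctypes
import ctypes.util
import os

# Assumed values from <sys/mman.h> on Linux x86/x86_64.
MCL_CURRENT = 1  # lock all pages that are currently mapped
MCL_FUTURE = 2   # also lock pages mapped later


def lock_all_memory():
    """Lock every page of the calling process into RAM; raise OSError on failure."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

A daemon would call lock_all_memory() once at startup, before it begins serving requests.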

-- greg



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Greg Lindahl
On Fri, Jun 05, 2015 at 09:33:11AM -0700, Gordon Messmer wrote:
 On 06/05/2015 03:29 AM, Markus Shorty Uckelmann wrote:
 some (probably unused) parts are swapped out. But, some of
 those parts are the salt-minion, php-fpm or mysqld. All services which
 are important for us and which suffer badly from being swapped out.
 
 Those two things can't really both be true.  If the pages swapped
 out are unused, then the application won't suffer as a result.

No.

Let's say the application only uses the page once per hour. If there
is also I/O going on, then it's easy to see that the kernel could
decide to page the page out after 50 minutes, leaving the application
having to page it back in 10 minutes later.

-- greg




Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Markus Shorty Uckelmann

On 05.06.2015 at 18:33, Gordon Messmer wrote:

On 06/05/2015 03:29 AM, Markus Shorty Uckelmann wrote:

some (probably unused) parts are swapped out. But, some of
those parts are the salt-minion, php-fpm or mysqld. All services which
are important for us and which suffer badly from being swapped out.


Those two things can't really both be true.  If the pages swapped out
are unused, then the application won't suffer as a result.


Why not? If you have an application which sees action only every 12 to
24 hours, I think this can happen. Our salt-minion would be a candidate
for this. Although we constantly check if it's alive, we only do
something heavy like a deployment once or twice a day. And very often we
have to run those deployments twice, because the first time we get a lot
of timeouts. Sure, it might be the software itself. But I think it could
be possible that it is because of swapped out pages.


I can't be sure about this. That's why I want to find out what and why 
things are happening. But first I need to find the right tools to do this ;)


Cheers, Shorty


Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Gordon Messmer

On 06/05/2015 03:29 AM, Markus Shorty Uckelmann wrote:

some (probably unused) parts are swapped out. But, some of
those parts are the salt-minion, php-fpm or mysqld. All services which
are important for us and which suffer badly from being swapped out.


Those two things can't really both be true.  If the pages swapped out 
are unused, then the application won't suffer as a result.



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Greg Lindahl
On Fri, Jun 05, 2015 at 09:21:43PM +0200, Markus Shorty Uckelmann wrote:
 If you don't explicitly lock things into memory, file I/O can and will
 cause idle pages to get pushed out. It happens less often if you
 manipulate swappiness.
 
 So, is a swappiness value of 60 not recommended for servers?

It's probably a fine default. For my most recent purposes, a web-scale
search engine, I locked a ton of daemons into memory with mlockall (on
latency-optimized clusters) and set swappiness to 0 on all clusters
(including batch-optimized clusters.)

This last bit was because I don't expect my systems to ever swap... I
only have a small amount of swap configured to reduce the mayhem
caused by OOMs, and give my home-grown oom daemon (which is locked
into memory, of course) time to open fire on my choice of offending
process.

-- greg



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Markus Shorty Uckelmann

On 05.06.2015 at 17:40, Greg Lindahl wrote:

On Fri, Jun 05, 2015 at 12:29:04PM +0200, Markus Shorty Uckelmann wrote:


How can I further debug this
problem and find out what's the culprit?


It's working as designed.


Sadly. It's just the first time I've seen this behaviour to this extent and
on so many servers. So you could say I'm something of a newbie when it comes to swapping ;)



If you don't explicitly lock things into memory, file I/O can and will
cause idle pages to get pushed out. It happens less often if you
manipulate swappiness.


So, is a swappiness value of 60 not recommended for servers? I worked 
with hundreds of servers (swappiness 60) on a social platform and 
swapping very rarely happened and then only on databases (which had 
swappiness set to 0). The only two differences (that I can see) to my 
current servers are that I used Debian and there was no extra I/O from 
backups.


I might be overstating the swapping thing. That's what I'm trying to 
find out.


Cheers, Shorty



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Gordon Messmer

On 06/05/2015 12:09 PM, Markus Shorty Uckelmann wrote:

On 05.06.2015 at 18:33, Gordon Messmer wrote:

On 06/05/2015 03:29 AM, Markus Shorty Uckelmann wrote:

some (probably unused) parts are swapped out. But, some of
those parts are the salt-minion, php-fpm or mysqld. All services which
are important for us and which suffer badly from being swapped out.


Those two things can't really both be true.  If the pages swapped out
are unused, then the application won't suffer as a result.


Why not? If you have an application which sees action only every 12 to
24 hours, I think this can happen.


Well, that's not unused, then.

To measure the swap use of your processes, install smem.  It will show 
you the amount of swap that each process is using.


For more specific information, make a copy of /proc/pid/smaps.

To quantify your problem, let bacula run then save the output of smem, 
or /proc/pid/smaps for each of your critical services, or both, and 
then access each of the services and quantify the latency relative to 
the normal latency.  Finally, after collecting latency information, get 
the output of smem and/or /proc/pid/smaps again.  You can compare swap 
use before and after accessing the service to see how much was swapped 
out beforehand (presumably because of the backup), and how much had to 
be recovered for your test query.
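
The before/after comparison can be scripted. This is a sketch under the assumption that the kernel exposes a VmSwap: line in /proc/<pid>/status; where it does not, summing the Swap: lines from smaps gives the same figure. Both function names are made up for illustration.

```python
def parse_vmswap(status_text):
    """Return the VmSwap value in KiB from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0  # no VmSwap line, e.g. for kernel threads


def swapped_back_in(before, after):
    """Given {service: KiB in swap} snapshots, return how much each service recovered."""
    return {
        name: before[name] - after.get(name, 0)
        for name in before
        if before[name] > after.get(name, 0)
    }
```

Take one snapshot after the backup has run, query each service while timing it, take a second snapshot, and pass both dictionaries to swapped_back_in().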


I'd suggest collecting that information at the normal swappiness setting 
and at 0.


If the kernel is swapping out processes in favor of filesystem cache 
when swappiness is 0, I believe that would be a bug, and should be 
reported to the kernel developers.



Our salt-minion would be a candidate
for this. Although we constantly check if it's alive, we only do
something heavy like a deployment once or twice a day. And very often we
have to run those deployments twice, because the first time we get a lot
of timeouts. Sure, it might be the software itself. But I think it could
be possible that it is because of swapped out pages.


"Timeouts" is pretty vague.  Very generally, it's possible that you have
a timeout configured somewhere that is failing on the first run because 
the filesystem cache now contains content from your backup, and your 
process only completes in time when the files needed for the deployment 
are in the filesystem cache.  That's a stretch as far as explanations 
go, but if that is the case, then swappiness isn't going to fix the 
problem.  You need to fix your timeout so that it allows enough time for 
the deployment to finish when the server is cold booted (using no 
cache), or prime your caches before doing deployments.




Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Dennis Jacobfeuerborn
On 05.06.2015 19:47, Greg Lindahl wrote:
 On Fri, Jun 05, 2015 at 09:33:11AM -0700, Gordon Messmer wrote:
 On 06/05/2015 03:29 AM, Markus Shorty Uckelmann wrote:
 some (probably unused) parts are swapped out. But, some of
 those parts are the salt-minion, php-fpm or mysqld. All services which
 are important for us and which suffer badly from being swapped out.

 Those two things can't really both be true.  If the pages swapped
 out are unused, then the application won't suffer as a result.
 
 No.
 
 Let's say the application only uses the page once per hour. If there
 is also I/O going on, then it's easy to see that the kernel could
 decide to page the page out after 50 minutes, leaving the application
 having to page it back in 10 minutes later.

That's true but it also means that if you lock that page so it cannot be
swapped out then this page is not available for the page cache, so you
incur the I/O hit either way and it's probably going to be worse because
the system no longer has the option to optimize the memory management.
I wouldn't worry about it until there's actually permanent swap activity
going on, and then you have to decide if you want to add more RAM to the
system or maybe find a way to tell e.g. Bacula to use direct I/O and not
pollute the page cache.
For applications that do not allow this to be specified, a wrapper such
as this one could be used:
http://arighi.blogspot.de/2007/04/how-to-bypass-buffer-cache-in-linux.html

Regards,
  Dennis



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-05 Thread Dennis Jacobfeuerborn
On 06.06.2015 04:48, Dennis Jacobfeuerborn wrote:
 On 05.06.2015 19:47, Greg Lindahl wrote:
 On Fri, Jun 05, 2015 at 09:33:11AM -0700, Gordon Messmer wrote:
 On 06/05/2015 03:29 AM, Markus Shorty Uckelmann wrote:
 some (probably unused) parts are swapped out. But, some of
 those parts are the salt-minion, php-fpm or mysqld. All services which
 are important for us and which suffer badly from being swapped out.

 Those two things can't really both be true.  If the pages swapped
 out are unused, then the application won't suffer as a result.

 No.

 Let's say the application only uses the page once per hour. If there
 is also I/O going on, then it's easy to see that the kernel could
 decide to page the page out after 50 minutes, leaving the application
 having to page it back in 10 minutes later.
 
 That's true but it also means that if you lock that page so it cannot be
 swapped out then this page is not available for the page cache, so you
 incur the I/O hit either way and it's probably going to be worse because
 the system no longer has the option to optimize the memory management.
 I wouldn't worry about it until there's actually permanent swap activity
 going on, and then you have to decide if you want to add more RAM to the
 system or maybe find a way to tell e.g. Bacula to use direct I/O and not
 pollute the page cache.
 For applications that do not allow this to be specified, a wrapper such
 as this one could be used:
 http://arighi.blogspot.de/2007/04/how-to-bypass-buffer-cache-in-linux.html

Actually I found better links:
https://code.google.com/p/pagecache-mangagement/
http://lwn.net/Articles/224653/

It is to address the "waah, backups fill my memory with pagecache" and
the "waah, updatedb swapped everything out" and the "waah, copying a DVD
gobbled all my memory" problems.
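
To illustrate the general idea of reading a file without leaving it in the page cache, here is a sketch that is not the implementation of either linked tool. It reads a file in chunks and tells the kernel after each chunk that those pages will not be needed again, using posix_fadvise() with POSIX_FADV_DONTNEED. The function name and chunk size are arbitrary. The hint is advisory, and it also drops cached pages of that file that another process may have wanted.

```python
import os


def read_without_caching(path, chunk_size=1 << 20):
    """Read a file sequentially, asking the kernel to drop each chunk from the page cache."""
    total = 0
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
            # Advise the kernel that the range just read will not be reused.
            os.posix_fadvise(fd, offset, len(chunk), os.POSIX_FADV_DONTNEED)
            offset += len(chunk)
    finally:
        os.close(fd)
    return total
```

A backup job written this way streams large files past the cache instead of evicting other data to make room for them.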

Regards,
  Dennis



Re: [CentOS] Effectiveness of CentOS vm.swappiness

2015-06-04 Thread Dennis Jacobfeuerborn
On 04.06.2015 22:18, Markus Shorty Uckelmann wrote:
 Hi all,
 
 This might not be CentOS related at all. Sorry about that.
 
 I have lots of C6 & C7 machines in use and all of them have the default
 swappiness of 60. The problem now is that a lot of those machines do
 swap although there is no memory pressure. I'm now thinking about
 lowering swappiness to 1. But I'd still like to find out why this
 happens. The only common thing between all those machines is that there
 are nightly backups done with Bacula. I once came across issues with the
 fs-cache bringing Linux to start paging out. Any hints, explanations and
 suggestions would be much appreciated.

If I had to venture a guess then I'd say there are memory pages that
are never touched by any processes and as a result the algorithm has
decided that it's more effective to swap out these pages to disk and use
the freed RAM for the page cache.
Swap usage isn't inherently evil and what you really want to check for
is the si/so columns in the output of the vmstat command. If the
system is using swap space but these columns are mostly 0 then that
means memory has been swapped out in the past but there is no actual
swap activity happening right now and there should be no performance
impact. If however these numbers are consistently larger than 0 then
that means the system is under acute memory pressure and has to
constantly move pages between RAM and disk and that will have a large
negative performance impact on the system. This is the moment when swap
usage becomes bad.
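
The same check can be automated from /proc/vmstat, whose pswpin and pswpout counters give the number of pages swapped in and out since boot. The sketch below takes two samples and turns them into rates. The function names are illustrative, and note that the unit is pages per second, while vmstat's si/so columns report an amount of memory per second.

```python
def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat content."""
    counters = {}
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name in ("pswpin", "pswpout"):
            counters[name] = int(value)
    return counters


def swap_rates(first, second, interval_s):
    """Return pages swapped in and out per second between two samples."""
    return {key: (second[key] - first[key]) / interval_s for key in first}
```

Sampling every few seconds and alerting only when both rates stay above zero for a while matches the distinction made above between swap being in use and swap being active.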

Regards,
  Dennis
