Re: kswapd0 causing read timeouts
On Mon, 18 Jun 2012 11:57:17 -0700, Gurpreet Singh wrote:
> Thanks for all the information Holger. Will do the jvm updates, kernel updates will be slow to come by. I see that with disk access mode standard, the performance is stable and better than in mmap mode, so i will probably stick to that.

Please let us know how things work out.

> Are you suggesting i try out mongodb?

Uhm, no. :) I meant that it also uses mmap exclusively (!), and consequently can also have pretty bad/irregular performance when the (active) data set grows much larger than RAM. To be fair, that is a pretty hard problem in general.

-h
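For reference, a rough way to compare the on-disk data size against the memory left over for the page cache (the data path below is the stock default; substitute your own data_file_directories):

    du -sh /var/lib/cassandra/data    # total sstable footprint on disk
    free -m                           # the 'cached' column is roughly the page cache available to back it

If the first number is several times the second, the kind of churn described above is hard to avoid.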
Re: kswapd0 causing read timeouts
JNA is installed. swappiness was 0. vfs_cache_pressure was 100.

2 questions on this:
1. Is there a way to find out if mlockall really worked, other than just the "mlockall successful" log message?
2. Does cassandra only mlock the jvm heap, or also the mmapped memory?

I disabled mmap completely, and things look so much better. Latency is surprisingly half of what I see when I have mmap enabled. It's funny that I keep reading tall claims about mmap, but in practice a lot of people have problems with it, especially when it uses up all the memory. We have tried mmap for different purposes in our company before, and had finally ended up disabling it, because it just doesn't handle things right when memory is low. Maybe /proc/sys/vm needs to be configured right, but that's not the easiest of configurations to get right.

Right now, I am handling only 80 gigs of data. Kernel version is 2.6.26. Java version is 1.6.21.

/G
Re: kswapd0 causing read timeouts
Upgrade Java to the latest 1.6.32 (version 1.6.21 has memory leaks). It is abnormal that with 80 gigs of data you have 15 gigs of index. vfs_cache_pressure is used for inodes and dentries. Also, to check whether you have memory leaks, use the drop_caches sysctl.
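For reference, the drop_caches knob mentioned above is a sysctl that can be exercised roughly like this (run as root, and only on a test box, since it throws away the page cache and dentry/inode caches):

    sync                                  # flush dirty pages first
    echo 1 > /proc/sys/vm/drop_caches     # drop the page cache
    echo 3 > /proc/sys/vm/drop_caches     # drop page cache plus dentries and inodes

If resident memory stays high even after this, the growth is not reclaimable cache and a leak becomes more likely.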
Re: kswapd0 causing read timeouts
2012/6/14 Gurpreet Singh gurpreet.si...@gmail.com:
> JNA is installed. swappiness was 0. vfs_cache_pressure was 100. 2 questions on this..
> 1. Is there a way to find out if mlockall really worked other than just the mlockall successful log message?

Yes, you should see something like this (from our test server):
INFO [main] 2012-06-14 02:03:14,745 DatabaseDescriptor.java (line 233) Global memtable threshold is enabled at 512MB

> 2. Does cassandra only mlock the jvm heap or also the mmaped memory?

Cassandra mlocks only the heap; it doesn't mlock the mmapped sstables.
Re: kswapd0 causing read timeouts
Sorry, I was mistaken; here is the right string:
INFO [main] 2012-06-14 02:03:14,520 CLibrary.java (line 109) JNA mlockall successful
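Besides that log line, one OS-side way to confirm the lock actually took effect is to look at the locked-memory counter of the Cassandra process (the pgrep pattern is illustrative and may need adjusting for a jsvc-based install):

    CASS_PID=$(pgrep -f CassandraDaemon | head -1)
    grep VmLck /proc/$CASS_PID/status     # a non-zero kB value means pages are really locked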
Re: kswapd0 causing read timeouts
Alright, here it goes again... Even with mmap_index_only, once the RES memory hit 15 gigs, the read latency went berserk. This happens in 12 hours if disk access mode is mmap, about 48 hours if it is mmap_index_only. Only reads are happening, at 50 reads/second.

row cache size: 730 mb, row cache hit ratio: 0.75
key cache size: 400 mb, key cache hit ratio: 0.4
heap size (max 8 gigs): used 6.1-6.9 gigs
No messages about reducing cache sizes in the logs.

stats:
vmstat 1: no swapping here, however high sys cpu utilization
iostat (looks great): avg-qu-sz = 8, avg await = 7 ms, svc time = 0.6, util = 15-30%
top: VIRT - 19.8g, SHR - 6.1g, RES - 15g, high cpu, buffers - 2mb
cfstats: 70-100 ms. This number used to be 20-30 ms.

The value of SHR keeps increasing (owing to mmap, I guess), while at the same time buffers keep decreasing. Buffers start as high as 50 mb and go down to 2 mb. This is very easily reproducible for me. Every time the RES memory hits about 15 gigs, the client starts getting timeouts from cassandra and the sys cpu jumps a lot. All this even though my row cache hit ratio is almost 0.75.

Other than just turning off mmap completely, is there any other solution or setting to avoid a cassandra restart every couple of days? Something to keep the RES memory from hitting such a high number. I have been constantly monitoring RES, and was not seeing issues when RES was at 14 gigs.

/G
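For reference, one way to break that 15g RES down into "mmapped sstables" versus "java heap" is to look at the per-mapping resident sizes (the pgrep pattern is illustrative; adjust it to however the jsvc/java process shows up on your box):

    pmap -x $(pgrep -f jsvc | head -1) | sort -k3 -n | tail -20
    # Data.db/Index.db entries are page cache mapped into the process; the big anon region is the heap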
Re: kswapd0 causing read timeouts
Hm, that's very strange. What is the amount of your data? Your Linux kernel version? Java version?

PS: I can suggest switching disk_access_mode to standard in your case.
PS PS: Also upgrade your Linux to the latest, and Java HotSpot to 1.6.32 (from the Oracle site).
Re: kswapd0 causing read timeouts
I would check /etc/sysctl.conf and get the values of /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure. If you don't have JNA enabled (which Cassandra uses to fadvise) and swappiness is at its default of 60, the Linux kernel will happily swap out your heap for cache space. Set swappiness to 1 or 'swapoff -a' and kswapd shouldn't be doing much unless you have a too-large heap or some other app using up memory on the system.
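For reference, checking and applying those two knobs looks like this (the values follow the suggestion above; persist anything that helps in /etc/sysctl.conf):

    cat /proc/sys/vm/swappiness /proc/sys/vm/vfs_cache_pressure   # current values
    sysctl -w vm.swappiness=1                                      # strongly prefer dropping cache over swapping out the heap
    sysctl -w vm.vfs_cache_pressure=100                            # the default; lower values hold on to dentries/inodes longer
    swapoff -a                                                     # or take swap out of the picture entirely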
kswapd0 causing read timeouts
Hi,
I am testing cassandra 1.1 on a 1 node cluster.
8 core, 16 gb ram, 6 data disks raid0, no swap configured
cassandra 1.1.1
heap size: 8 gigs
key cache size in mb: 800 (used only 200mb till now)
memtable_total_space_in_mb: 2048

I am running a read workload.. about 30 reads/second, no writes at all. The system runs fine for roughly 12 hours. jconsole shows that my heap size has hardly touched 4 gigs.

top shows:
SHR increasing slowly from 100 mb to 6.6 gigs in these 12 hrs
RES increases slowly from 6 gigs all the way to 15 gigs
buffers are at a healthy 25 mb at some point, and that goes down to 2 mb in these 12 hrs
VIRT stays at 85 gigs

I understand that SHR goes up because of mmap, and RES goes up because it includes the SHR value as well. After around 10-12 hrs, the cpu utilization of the system starts increasing, and I notice that the kswapd0 process starts becoming more active. Gradually, the system cpu goes up to almost 70%, and the client starts getting continuous timeouts. The fact that the buffers went down from 20 mb to 2 mb suggests that kswapd0 is probably evicting the pagecache.

Is there a way to avoid kswapd0 kicking in even when there is no swap configured? This is very easily reproducible for me, and I would like a way out of this situation. Do I need to adjust vm memory management stuff like pagecache, vfs_cache_pressure.. things like that?

Just some extra information: jna is installed, mlockall is successful, and there is no compaction running.

Would appreciate any help on this.
Thanks
Gurpreet
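For reference, a few ways to watch whether kswapd0 is busy reclaiming the page cache even with no swap configured (sar is part of the sysstat package; exact column names vary a little between versions):

    vmstat 1                                 # si/so stay 0 without swap; watch 'buff'/'cache' and sys cpu instead
    sar -B 1                                 # pgscank/s and pgsteal/s show kswapd scanning and reclaiming
    grep -E 'pgscan|pgsteal' /proc/vmstat    # cumulative reclaim counters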
Re: kswapd0 causing read timeouts
disk_access_mode: mmap?? Set it to disk_access_mode: mmap_index_only in cassandra.yaml.
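For reference, the setting lives in cassandra.yaml (add the line if your yaml does not already have it) and in 1.1 accepts auto, mmap, mmap_index_only and standard; the suggestion above corresponds to:

    # disk_access_mode: auto             # default: mmap both data and index files where possible
    disk_access_mode: mmap_index_only    # mmap only the index files, use buffered i/o for data files
    # disk_access_mode: standard         # no mmap at all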
Re: kswapd0 causing read timeouts
Thanks Ruslan. I will try the mmap_index_only. Is there any guideline as to when to leave it to auto and when to use mmap_index_only?

/G
Re: kswapd0 causing read timeouts
2012/6/8 aaron morton aa...@thelastpickle.com:
> Ruslan,
> Why did you suggest changing the disk_access_mode ?

Because it brings problems out of nowhere; in any case, mmap brought a similar problem for me and I haven't found any way to resolve it other than changing disk_access_mode :-((. It will also be interesting for me to hear the results from the author of this thread.

> Gurpreet,
> I would leave the disk_access_mode with the default until you have a reason to change it.
>
>> 8 core, 16 gb ram, 6 data disks raid0, no swap configured
> is swap disabled ?
>
>> Gradually, the system cpu becomes high almost 70%, and the client starts getting continuous timeouts
> 70% of one core or 70% of all cores ?
> Check the server logs, is there GC activity ?
> Check nodetool cfstats to see the read latency for the cf.
> Take a look at vmstat to see if you are swapping, and look at iostat to see if io is the problem.
> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>
> Cheers
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
Re: kswapd0 causing read timeouts
Aaron, Ruslan,
I changed the disk access mode to mmap_index_only, and it has been stable ever since, well at least for the past 20 hours. Previously, in about 10-12 hours, as soon as the resident memory was full, the client would start timing out on all its reads. It looks fine for now; I am going to let it continue, to see how long it lasts and whether the problem comes again.

Aaron, yes, I had turned swap off. The total cpu utilization was at roughly 700%. It looked like kswapd0 was using just 1 cpu, but cassandra (jsvc) cpu utilization increased quite a bit. top was reporting high system cpu and low user cpu. vmstat was not showing swapping. The java heap size max is 8 gigs, while only 4 gigs was in use, so the java heap was doing great; no gc in the logs. iostat was doing ok from what I remember; I will have to reproduce the issue for the exact numbers. cfstats latency had gone very high, but that is partly due to the high cpu usage.

One thing was clear: SHR was inching higher (due to the mmap) while the buffer cache, which started at about 20-25 mb, reduced to 2 mb by the end, which probably means that pagecache was being evicted by kswapd0. Is there a way to fix the size of the buffer cache and not let the system evict it in favour of mmap?

Also, mmapping data files would basically cause not only the data (asked for) to be read into main memory, but also a bunch of extra pages (readahead), which would not be very useful, right? The same thing for the index would actually be more useful, as there would be more index entries in the readahead part.. and the index files, being small, wouldn't cause enough memory pressure for the page cache to be evicted. mmapping the data files would make sense if the data size is smaller than the RAM, or the hot data set is smaller than the RAM; otherwise just the index would probably be a better thing to mmap, no? In my case the data size is 85 gigs, while available RAM is 16 gigs (only 8 gigs after heap).

/G
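For reference, on the readahead question above: the extra pages pulled in around each mmap fault come from the block device's readahead window, which can be checked and shrunk like this (the device name is illustrative for a raid0 md array; the value is something to experiment with, not a recommendation):

    blockdev --getra /dev/md0      # current readahead, in 512-byte sectors
    blockdev --setra 256 /dev/md0  # 128 KB; smaller readahead means fewer unrequested pages per random read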