RE: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.
So if my calculations are correct, a terabyte-sized database would require a minimum of 15 nodes (RF = 3). Does that sound about right? 2000 / 400 * RF

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, December 06, 2012 9:43 PM
To: user@cassandra.apache.org
Subject: Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

> Meaning terabyte size databases.
Lots of people have TB-sized systems. Just add more nodes. 300 to 400 GB is just a rough guideline. The bigger picture is considering how routine and non-routine maintenance tasks are going to be carried out.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 7/12/2012, at 4:38 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

http://wiki.apache.org/cassandra/LargeDataSetConsiderations

On Thu, Dec 6, 2012 at 9:53 AM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

> Having so much data on each node is a potential bad day.
Is this discussed somewhere in the Cassandra documentation (limits, practices, etc.)? We are also trying to load up quite a lot of data and have hit memory issues (bloom filters etc.) in 1.0.10. I would like to read up on big-data usage of Cassandra, meaning terabyte-sized databases. I do get your point about the amount of time required to recover a downed node. But this 300-400GB business is interesting to me. Thanks in advance.

Wade

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, December 05, 2012 9:23 PM
To: user@cassandra.apache.org
Subject: Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

> Basically we were successful on two of the nodes. They both took ~2 days and 11 hours to complete and at the end we saw one very large file ~900GB and the rest much smaller (the overall size decreased). This is what we expected!
I would recommend having up to 300GB to 400GB per node on a regular HDD with 1Gb networking.

> But on the 3rd node, we suspect major compaction didn't actually finish its job...
The file list looks odd. Check the timestamps on the files. You should not have files older than when the compaction started.

> 8GB heap
The default is 4GB max nowadays.

> 1) Do you expect problems with the 3rd node during 2 weeks more of operations, in the conditions seen below?
I cannot answer that.

> 2) Should we restart with leveled compaction next year?
I would run some tests to see how it works for your workload.

> 4) Should we consider increasing the cluster capacity?
IMHO yes. You may also want to do some experiments with turning compression on if it is not already enabled.

Having so much data on each node is a potential bad day. If instead you had to move or repair one of those nodes, how long would it take for Cassandra to stream all the data over? (Or to rsync the data over.) How long does it take to run nodetool repair on the node?

With RF 3, if you lose a node you have lost your redundancy. It's important to have a plan for how to get it back and how long it may take.

Hope that helps.

-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 3:40 AM, Alexandru Sicoe <adsi...@gmail.com> wrote:

Hi guys,
Sorry for the late follow-up, but I waited to run major compactions on all 3 nodes at a time before replying with my findings.

Basically we were successful on two of the nodes. They both took ~2 days and 11 hours to complete, and at the end we saw one very large file ~900GB and the rest much smaller (the overall size decreased). This is what we expected!

But on the 3rd node, we suspect major compaction didn't actually finish its job. First of all, nodetool compact returned much earlier than the rest - after one day and 15 hrs. Secondly, of the 1.4TB initially on the node, only about 36GB were freed up (almost the same size as before). Saw nothing in the server log (debug not enabled). Below I pasted some more details about file sizes before and after compaction on this third node, and disk occupancy.

The situation is maybe not so dramatic for us, because in less than 2 weeks we will have a downtime till after the new year. During this we can completely delete all the data in the cluster and start fresh with TTLs of 1 month (as suggested by Aaron) and an 8GB heap (as suggested by Alain - thanks).

Questions:
1) Do you expect problems with the 3rd node during 2 weeks more of operations, in the conditions seen below? [Note: we expect the minor compactions to continue building up files but never really getting to compacting the large file, and thus not needing much temporary extra disk space.]
2) Should we restart with leveled compaction next year? [Note: Aaron was right, we have 1 week rows which get
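Aaron's "check the timestamps on the files" advice is easy to script. A minimal sketch, assuming GNU find and touch (for `-newermt` and `touch -d`); the directory, SSTable names, and dates below are made up for the demo - in practice DATA_DIR would point at the real keyspace data directory:

```shell
# After a successful major compaction, no live *-Data.db file should be
# older than the moment the compaction started. Demo against a
# throwaway directory with one stale and one fresh file.
DATA_DIR=$(mktemp -d)
touch -d '2012-11-20 00:00' "$DATA_DIR/ks-cf-he-1-Data.db"   # predates compaction: suspicious
touch -d '2012-12-05 00:00' "$DATA_DIR/ks-cf-he-2-Data.db"   # written during compaction: fine
COMPACTION_STARTED='2012-12-01 00:00'
# Print any data file NOT newer than the compaction start time:
find "$DATA_DIR" -name '*-Data.db' ! -newermt "$COMPACTION_STARTED"
```

On the suspect 3rd node, any path this prints is a file that major compaction should have replaced but apparently did not.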
RE: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.
duh, sorry. That estimate is: 2 TB would be 15 nodes at RF = 3.

From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com]
Sent: Friday, December 07, 2012 7:15 AM
To: user@cassandra.apache.org
Subject: RE: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

So if my calculations are correct, a terabyte-sized database would require a minimum of 15 nodes (RF = 3). Does that sound about right? 2000 / 400 * RF

[...]
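The 2000 / 400 * RF arithmetic above can be written out as a small sketch (assuming, as Wade does, that the 300-400 GB guideline applies per replica, so the replication factor multiplies the node count):

```shell
# Back-of-envelope node count for the per-node data-size guideline:
# nodes = ceil(total_gb / per_node_gb) * replication_factor
total_gb=2000     # ~2 TB of raw data
per_node_gb=400   # upper end of the 300-400 GB guideline
rf=3              # replication factor
nodes=$(( (total_gb + per_node_gb - 1) / per_node_gb * rf ))
echo "$nodes"     # 15 for these inputs
```

Which matches the corrected estimate: 2 TB at RF = 3 is 15 nodes.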
RE: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.
> Having so much data on each node is a potential bad day.
Is this discussed somewhere in the Cassandra documentation (limits, practices, etc.)? We are also trying to load up quite a lot of data and have hit memory issues (bloom filters etc.) in 1.0.10. I would like to read up on big-data usage of Cassandra, meaning terabyte-sized databases. I do get your point about the amount of time required to recover a downed node. But this 300-400GB business is interesting to me. Thanks in advance.

Wade

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, December 05, 2012 9:23 PM
To: user@cassandra.apache.org
Subject: Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

[...]

On 6/12/2012, at 3:40 AM, Alexandru Sicoe <adsi...@gmail.com> wrote:

[...]

2) Should we restart with leveled compaction next year? [Note: Aaron was right, we have 1 week rows which get deleted after 1 month, which means older rows end up in big files => to free up space with Size-Tiered we will have no choice but to run major compactions, which we don't know will work given that we get ~1TB / node / month. You can see we are at the limit!]
3) In case we keep Size-Tiered:
- How can we improve the performance of our major compactions? (We left all config parameters at their defaults.) Would increasing compaction throughput interfere with writes and reads? What about multi-threaded compaction?
- Do we still need to run regular repair operations as well? Do these also do a major compaction, or are they completely separate operations? [Note: we have 3 nodes with RF=2, inserting at consistency level ONE and reading at consistency level ALL. We read primarily for exporting reasons - we export 1 week worth of data at a time.]
4) Should we consider increasing the cluster capacity? [We generate ~5 million new rows every week, which shouldn't come close to the hundreds of millions of rows on a node mentioned by Aaron, which are the volumes that would
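On the compaction-throughput question: the knobs live in cassandra.yaml. A sketch from memory of the 1.1-era settings - verify the exact names and defaults against your own cassandra.yaml, and the value for concurrent_compactors below is illustrative:

```yaml
# cassandra.yaml (1.1-era names; verify against your version)
compaction_throughput_mb_per_sec: 16   # throttle across all compactions; 0 = unthrottled
multithreaded_compaction: false        # parallelize a single compaction across cores
concurrent_compactors: 2               # how many compactions may run at once (illustrative)
```

The throughput cap can also be changed at runtime with nodetool setcompactionthroughput; raising it (or setting 0) makes a major compaction finish sooner at the cost of more I/O contention with reads and writes. Repair is a separate operation from compaction - it builds and exchanges Merkle trees and streams differing ranges; it does not major-compact the data.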
RE: which high level Java client
I use Pelops and have been very happy. In my opinion the interface is cleaner than Hector's. I personally don't like the serializer business.

-----Original Message-----
From: Radim Kolar [mailto:h...@filez.com]
Sent: Thursday, June 28, 2012 5:06 AM
To: user@cassandra.apache.org
Subject: Re: which high level Java client

i do not have experience with other clients, only hector. But timeout management in hector is really broken. If you expect your nodes to time out often (for example, if you are using a WAN), better to try something else first.
RE: Much more native memory used by Cassandra than the configured JVM heap size
Just to close this item: with CASSANDRA-4314 applied I see no memory errors (either Java heap or native heap). Cassandra appears to be a hog with its memory-mapped files, which caused us to wrongly think it was the culprit in a severe native memory leak. However, our leaky process was a different jsvc process altogether. I wanted to set the record straight and not leave the idea out there that Cassandra may have a memory problem.

Wade Poziombka
Intel Americas, Inc.

From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com]
Sent: Wednesday, June 13, 2012 10:53 AM
To: user@cassandra.apache.org
Subject: RE: Much more native memory used by Cassandra than the configured JVM heap size

Seems like my only recourse is to remove jna.jar and just take the performance/swapping pain? Obviously can't have the entire box lock up. I can provide a pmap etc. if needed.

From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com]
Sent: Wednesday, June 13, 2012 10:28 AM
To: user@cassandra.apache.org
Subject: RE: Much more native memory used by Cassandra than the configured JVM heap size

I have experienced the same issue. The Java heap seems fine, but eventually the OS runs out of memory. In my case it renders the entire box unusable without a hard reboot. Is there a way to limit the native heap usage? Console shows:

xfs invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Call Trace: [800c9d3a] out_of_memory+0x8e/0x2f3 [8002dfd7] __wake_up+0x38/0x4f [8000f677] __alloc_pages+0x27f/0x308 [80013034] __do_page_cache_readahead+0x96/0x17b [80013971] filemap_nopage+0x14c/0x360 [8000896c] __handle_mm_fault+0x1fd/0x103b [8002dfd7] __wake_up+0x38/0x4f [800671f2] do_page_fault+0x499/0x842 [800b8f39] audit_filter_syscall+0x87/0xad [8005dde9] error_exit+0x0/0x84
Node 0 DMA per-cpu: empty
Node 0 DMA32 per-cpu: empty
Node 0 Normal per-cpu:
cpu 0 hot: high 186, batch 31 used:23
cpu 0 cold: high 62, batch 15 used:14
...
cpu 23 cold: high 62, batch 15 used:8
Node 1 HighMem per-cpu: empty
Free pages: 158332kB (0kB HighMem)
Active:16225503 inactive:1 dirty:0 writeback:0 unstable:0 free:39583 slab:21496
Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
lowmem_reserve[]: 0 0 32320 32320
Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0
lowmem_reserve[]: 0 0 32320 32320
Node 0 Normal free:16136kB min:16272kB low:20340kB high:24408kB active:3255624

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, June 12, 2012 4:08 AM
To: user@cassandra.apache.org
Subject: Re: Much more native memory used by Cassandra than the configured JVM heap size

see http://wiki.apache.org/cassandra/FAQ#mmap

> which cause the OS low memory.
If the memory is used for mmapped access, the OS can get it back later. Is the low free memory causing a problem?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/06/2012, at 5:52 PM, Jason Tang wrote:

Hi

I found some information on this issue. It seems we can use another strategy for data access to reduce mmap usage, in order to use less memory. But I didn't find a document describing the parameters for Cassandra 1.x. Is it a good way to use this parameter to reduce shared memory usage, and what's the impact? (BTW, our data model is dynamic; although the throughput is high, the life cycle of the data is short - one hour or less.)

# Choices are auto, standard, mmap, and mmap_index_only.
disk_access_mode: auto

http://comments.gmane.org/gmane.comp.db.cassandra.user/7390

2012/6/12 Jason Tang <ares.t...@gmail.com>

See my post: I limit the JVM heap to 6G, but Cassandra actually uses more memory, which is not counted in the JVM heap. I use top to monitor the total memory used by Cassandra.

= -Xms6G -Xmx6G -Xmn1600M

2012/6/12 Jeffrey Kesselman <jef...@gmail.com>

Btw. I suggest you spin up JConsole, as it will give you much more detail on what your VM is actually doing.

On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang <ares.t...@gmail.com> wrote:

Hi

We have a problem with Cassandra memory usage. We configured the JVM heap at 6G, but after running Cassandra for several hours (insert, update, delete), the total memory used by Cassandra goes up to 15G, which causes the OS to run low on memory. So I wonder: is it normal for Cassandra to use this much memory, and how can we limit the native memory used by Cassandra?

===
Cassandra 1.0.3, 64-bit JDK.
Memory occupied by Cassandra: 15G

PID  USER   PR NI VIRT  RES SHR S %CPU %MEM  TIME+     COMMAND
9567 casadm 20 0  28.3g 15g 9.1g S 269  65.1 385:57.65 java
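For anyone wanting to confirm where the memory sits, the split can be inspected from /proc on Linux. A sketch - the PID defaults to the current shell so it runs anywhere; on a real box substitute the Cassandra PID (9567 in the top output above):

```shell
# Resident set size from /proc/<pid>/status, and a count of mmapped
# SSTable regions from /proc/<pid>/smaps (Cassandra maps *-Data.db and
# *-Index.db files when disk_access_mode uses mmap).
PID=${PID:-$$}
awk '/^VmRSS/ {print "resident:", $2, $3}' "/proc/$PID/status"
mapped=$(grep -c -e 'Data.db' -e 'Index.db' "/proc/$PID/smaps" || true)
echo "mmapped SSTable regions: $mapped"
```

A large gap between VmRSS and the configured -Xmx, together with many Data.db/Index.db mappings, points at mmapped SSTables rather than a heap problem - which is memory the OS can reclaim, as Aaron notes.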
is this something to be concerned about - MUTATION message dropped
INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java (line 615) 15 MUTATION message dropped in last 5000ms

It is at INFO level, so I'm inclined to think not, but it seems like whenever messages are dropped there may be some issue?
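One way to keep an eye on these is to tally the dropped-message log lines by verb. A sketch run against sample lines built here; in practice LOG would point at Cassandra's system.log (path varies by install):

```shell
# Extract "<count> <VERB> message dropped" occurrences from the log.
LOG=$(mktemp)
cat >> "$LOG" <<'EOF'
INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java (line 615) 15 MUTATION message dropped in last 5000ms
INFO [ScheduledTasks:1] 2012-06-14 07:55:12,001 MessagingService.java (line 615) 3 READ message dropped in last 5000ms
EOF
grep -oE '[0-9]+ [A-Z_]+ message dropped' "$LOG"
```

nodetool tpstats also reports dropped-message counters per verb, which avoids grepping logs on a live node.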
RE: Much more native memory used by Cassandra than the configured JVM heap size
I have experienced the same issue. The Java heap seems fine, but eventually the OS runs out of memory; in my case it renders the entire box unusable without a hard reboot. Is there a way to limit the native heap usage? The console shows the same xfs/oom-killer trace quoted in full earlier in this thread.

[...]

# ps -ef | grep 9567
casadm 9567 1 55 Jun11 ? 05:59:44 /opt/jdk1.6.0_29/bin/java -ea -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=6080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Daccess.properties=/opt/dve/cassandra/conf/access.properties -Dpasswd.properties=/opt/dve/cassandra/conf/passwd.properties -Dpasswd.mode=MD5 -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -cp
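The top output makes the split concrete: the heap is capped by -Xms6G/-Xmx6G, so the remainder of the 15G resident set is native memory, and the ~9.1g SHR column is largely the mmapped SSTables. As arithmetic:

```shell
# Numbers from the thread's `top` output for pid 9567:
heap_gb=6        # -Xms6G / -Xmx6G from the command line
res_gb=15        # RES: total resident set
shr_gb=9         # SHR: ~9.1g, mostly mmapped files
echo "outside the Java heap: $(( res_gb - heap_gb )) GB"
echo "of which shared/mmapped: ~$shr_gb GB"
```

So roughly 9 GB sits outside the heap, and almost all of it is shared, file-backed memory the OS can reclaim under pressure - consistent with Aaron's mmap explanation rather than a leak.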
RE: Much more native memory used by Cassandra than the configured JVM heap size
actually, this is without jna.jar. I will add and see if still have same issue From: Poziombka, Wade L Sent: Wednesday, June 13, 2012 10:53 AM To: user@cassandra.apache.org Subject: RE: Much more native memory used by Cassandra then the configured JVM heap size Seems like my only recourse is to remove jna.jar and just take the performance/swapping pain? Obviously can't have the entire box lock up. I can provide a pmap etc. if needed. From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com] Sent: Wednesday, June 13, 2012 10:28 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: RE: Much more native memory used by Cassandra then the configured JVM heap size I have experienced the same issue. The Java heap seems fine but eventually the OS runs out of heap. In my case it renders the entire box unusable without a hard reboot. Console shows: is there a way to limit the native heap usage? xfs invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0 Call Trace: [800c9d3a] out_of_memory+0x8e/0x2f3 [8002dfd7] __wake_up+0x38/0x4f [8000f677] __alloc_pages+0x27f/0x308 [80013034] __do_page_cache_readahead+0x96/0x17b [80013971] filemap_nopage+0x14c/0x360 [8000896c] __handle_mm_fault+0x1fd/0x103b [8002dfd7] __wake_up+0x38/0x4f [800671f2] do_page_fault+0x499/0x842 [800b8f39] audit_filter_syscall+0x87/0xad [8005dde9] error_exit+0x0/0x84 Node 0 DMA per-cpu: empty Node 0 DMA32 per-cpu: empty Node 0 Normal per-cpu: cpu 0 hot: high 186, batch 31 used:23 cpu 0 cold: high 62, batch 15 used:14 ... 
cpu 23 cold: high 62, batch 15 used:8 Node 1 HighMem per-cpu: empty Free pages: 158332kB (0kB HighMem) Active:16225503 inactive:1 dirty:0 writeback:0 unstable:0 free:39583 slab:21496 Node 0 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB lowmem_reserve[]: 0 0 32320 32320 Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0 lowmem_reserve[]: 0 0 32320 32320 Node 0 Normal free:16136kB min:16272kB low:20340kB high:24408kB active:3255624 From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, June 12, 2012 4:08 AM To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: Much more native memory used by Cassandra then the configured JVM heap size see http://wiki.apache.org/cassandra/FAQ#mmap which cause the OS low memory. If the memory is used for mmapped access the os can get it back later. Is the low free memory causing a problem ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/06/2012, at 5:52 PM, Jason Tang wrote: Hi I found some information of this issue And seems we can have other strategy for data access to reduce mmap usage, in order to use less memory. But I didn't find the document to describe the parameters for Cassandra 1.x, is it a good way to use this parameter to reduce shared memory usage and what's the impact? (btw, our data model is dynamical, which means the although the through put is high, but the life cycle of the data is short, one hour or less). # Choices are auto, standard, mmap, and mmap_index_only. disk_access_mode: auto http://comments.gmane.org/gmane.comp.db.cassandra.user/7390 2012/6/12 Jason Tang ares.t...@gmail.commailto:ares.t...@gmail.com See my post, I limit the HVM heap 6G, but actually Cassandra will use more memory which is not calculated in JVM heap. I use top to monitor total memory used by Cassandra. = -Xms6G -Xmx6G -Xmn1600M 2012/6/12 Jeffrey Kesselman jef...@gmail.commailto:jef...@gmail.com Btw. 
I suggest you spin up JConsole as it will give you much more detail on what your VM is actually doing. On Mon, Jun 11, 2012 at 9:14 PM, Jason Tang ares.t...@gmail.com wrote: Hi We have a problem with Cassandra memory usage. We configure the JVM heap to 6G, but after running Cassandra for several hours (insert, update, delete), the total memory used by Cassandra goes up to 15G, which causes the OS to run low on memory. So I wonder if it is normal for Cassandra to use so much memory, and how can we limit the native memory used by Cassandra? === Cassandra 1.0.3, 64 bit jdk. Memory occupied by Cassandra 15G PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 9567 casadm 20 0 28.3g 15g 9.1g S 269 65.1 385:57.65 java = -Xms6G -Xmx6G -Xmn1600M # ps -ef | grep 9567 casadm 9567 1 55 Jun11 ? 05:59:44 /opt/jdk1.6.0_29/bin/java -ea -javaagent:/opt/dve/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
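Aaron's mmap point can be sanity-checked against the top line Jason quoted (28.3g VIRT / 15g RES / 9.1g SHR with a 6G heap). A rough back-of-the-envelope sketch, reading the GiB figures off that line (approximations only):

```python
# Figures read off the `top` output quoted above (approximate).
res_gib = 15.0   # resident set size reported by top
shr_gib = 9.1    # shared pages: for Cassandra, mostly mmapped SSTables
heap_gib = 6.0   # -Xmx6G

private_gib = res_gib - shr_gib
print(f"private (non-mmapped) resident: ~{private_gib:.1f} GiB")
# Close to the 6 GiB heap: the "extra" memory is page cache over mmapped
# SSTables, which the kernel can reclaim under pressure (Aaron's point).
```

So the 15G figure is not a leak by itself: most of the gap between RES and the heap is file-backed and reclaimable.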
how to compact an index CF?
I have an index on a column IX in column family A. How would I go about compacting that? I have tried nodetool compact keyspace A.IX but that complains "Unknown table/cf pair". I'm sure there must be some simple magic to make this happen; I just cannot tell what it is.
RE: how to compact an index CF?
This is in reference to https://issues.apache.org/jira/browse/CASSANDRA-4314 in which Jonathan Ellis instructed me (I think me) to "If you compact the index CF with this patch applied, that should get rid of the tombstones. (compacting the data CF won't do anything.)" However, after much looking I cannot see a way to actually do this. Is it automatic? From: Poziombka, Wade L Sent: Friday, June 08, 2012 2:22 PM To: 'user@cassandra.apache.org' Subject: how to compact an index CF? I have an index on a column IX in column family A. How would I go about compacting that? I have tried nodetool compact keyspace A.IX but that complains "Unknown table/cf pair". I'm sure there must be some simple magic to make this happen; I just cannot tell what it is.
RE: Cassandra not retrieving the complete data on 2 nodes
what is your consistency level? From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 4:58 AM To: user@cassandra.apache.org Subject: RE: Cassandra not retrieving the complete data on 2 nodes Could anyone please reply to my query? Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com From: Prakrati Agrawal [mailto:prakrati.agra...@mu-sigma.com] Sent: Wednesday, June 06, 2012 2:34 PM To: user@cassandra.apache.org Subject: Cassandra not retrieving the complete data on 2 nodes Dear all I originally had a 1 node cluster. Then I added one more node to it with the initial token configured appropriately. Now when I run my queries I am not getting all my data, i.e. all columns. Output on 2 nodes: Time taken to retrieve columns 43707 of key range is 1276 Time taken to retrieve columns 2084199 of all tickers is 54334 Time taken to count is 230776 Total number of rows in the database are 183 Total number of columns in the database are 7903753 Output on 1 node: Time taken to retrieve columns 43707 of key range is 767 Time taken to retrieve columns 382 of all tickers is 52793 Time taken to count is 268135 Total number of rows in the database are 396 Total number of columns in the database are 16316426 Please help me. Where is my data going, and how should I retrieve it? Thanks and Regards Prakrati Agrawal | Developer - Big Data(ID) | 9731648376 | www.mu-sigma.com This email message may contain proprietary, private and confidential information. The information transmitted is intended only for the person(s) or entities to which it is addressed. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited and may be illegal. If you received this in error, please contact the sender and delete the message from your system.
Mu Sigma takes all reasonable steps to ensure that its electronic communications are free from viruses. However, given Internet accessibility, the Company cannot accept liability for any virus introduced by this e-mail or any attachment and you are advised to use up-to-date virus checking software.
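The consistency-level question above matters because, after adding a node, reads only reflect every write when the read and write replica counts overlap. A generic sketch of the R + W > N rule (not tied to the poster's schema or client library):

```python
# Replica-overlap rule: a read sees the latest write only if the number of
# replicas read (R) plus the number written (W) exceeds the total (N).
def quorum(replicas: int) -> int:
    """Smallest majority of the replica count."""
    return replicas // 2 + 1

def overlapping(n: int, r: int, w: int) -> bool:
    # True if every read replica set must intersect every write replica set.
    return r + w > n

n = 3  # e.g. RF=3
assert quorum(n) == 2
assert overlapping(n, quorum(n), quorum(n))  # QUORUM reads + QUORUM writes
assert not overlapping(n, 1, 1)              # CL.ONE both ways can miss data
```

Reading and writing at CL.ONE on an under-repaired cluster is exactly the situation where counts come back short, which is why the first question asked here is "what is your consistency level?".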
RE: memory issue on 1.1.0
I believe so. There are no warnings on startup. So is there a preferred way to completely eliminate a column family? From: Tyler Hobbs [mailto:ty...@datastax.com] Sent: Wednesday, June 06, 2012 1:17 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Just to check, do you have JNA set up correctly? (You should see a couple of log messages about it shortly after startup.) Truncate also performs a snapshot by default. On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out of memory condition then. -- Tyler Hobbs DataStax http://datastax.com/
RE: memory issue on 1.1.0
Alas, upgrading to 1.1.1 did not solve my issue. -Original Message- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Monday, June 04, 2012 11:24 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741 -Brandon On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached. Curiously, when I only add new data I have never seen this; I have in the past sent hundreds of millions of new transactions. It seems to happen when I modify. My process is as follows: key slice to get columns to modify in batches of 100; in separate threads, modify those columns. I advance the slice by setting the start key each time to the last key in the previous batch. The mutations done are: update a column value in one column family (token), and delete a column and add a new column in another (pan). It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small. WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory.
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512 Keyspace: keyspace Read Count: 50042632 Read Latency: 0.23157864418482224 ms. Write Count: 44948323 Write Latency: 0.019460829472992797 ms. Pending Tasks: 0 Column Family: pan SSTable count: 5 Space used (live): 1977467326 Space used (total): 1977467326 Number of Keys (estimate): 16334848 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 74 Read Count: 14985122 Read Latency: 0.408 ms. Write Count: 19972441 Write Latency: 0.022 ms. Pending Tasks: 0 Bloom Filter False Postives: 829 Bloom Filter False Ratio: 0.00073 Bloom Filter Space Used: 37048400 Compacted row minimum size: 125 Compacted row maximum size: 149 Compacted row mean size: 149 Column Family: token SSTable count: 4 Space used (live): 1250973873 Space used (total): 1250973873 Number of Keys (estimate): 14217216 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 49 Read Count: 30059563 Read Latency: 0.167 ms. Write Count: 14985488 Write Latency: 0.014 ms. 
Pending Tasks: 0 Bloom Filter False Postives: 13642 Bloom Filter False Ratio: 0.00322 Bloom Filter Space Used: 28002984 Compacted row minimum size: 150 Compacted row maximum size: 258 Compacted row mean size: 224 Column Family: counters SSTable count: 2 Space used (live): 561549994 Space used (total): 561549994 Number of Keys (estimate): 9985024 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 38 Read Count: 4997947 Read Latency: 0.092 ms. Write Count: 9990394 Write Latency: 0.023 ms. Pending Tasks: 0 Bloom Filter False Postives: 191 Bloom Filter False Ratio: 0.37525 Bloom Filter Space Used: 18741152 Compacted row minimum size: 125 Compacted row maximum size: 179 Compacted row mean size: 150
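The WARN/INFO pair in the log above is Cassandra's emergency-flush path. A hedged sketch of the trigger arithmetic, assuming the 1.1-era default flush_largest_memtables_at of 0.75 (the real logic lives in GCInspector/StorageService; this only illustrates the threshold check):

```python
# Emergency flush fires when heap occupancy measured after a full GC
# crosses flush_largest_memtables_at (cassandra.yaml, default 0.75 here
# as an assumption about the 1.1-era default).
def should_emergency_flush(used_bytes: int, max_bytes: int,
                           flush_largest_memtables_at: float = 0.75) -> bool:
    return used_bytes / max_bytes > flush_largest_memtables_at

# Figures from the GC log lines quoted above:
assert should_emergency_flush(7345969520, 8506048512)      # ParNew line, ~0.86 full
assert not should_emergency_flush(5714800208, 8506048512)  # after CMS, ~0.67 full
```

The "Unable to reduce heap usage since there are no dirty column families" line that follows is the degenerate case: the threshold is crossed but there is no memtable left to flush.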
RE: memory issue on 1.1.0
Thank you. I do have some of the same observations. Do you do deletes? My observation is that without deletes (or column updates I guess) I can run forever happily, but when I run (what for me is a batch process) operations that delete and modify column values I run into this. Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741 the advice is to NOT do deletes individually but to truncate. I am scrambling to try to do this but am curious if it will be worth the effort. Wade -Original Message- From: Mina Naguib [mailto:mina.nag...@bloomdigital.com] Sent: Tuesday, June 05, 2012 2:24 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Hi Wade I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit. Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem. I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation. What I have found, however, is a band-aid, which you can see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole.
It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim. On my production cluster I have a full-GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency. On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote: Alas, upgrading to 1.1.1 did not solve my issue. -Original Message- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Monday, June 04, 2012 11:24 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741 -Brandon On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached. Curiously when I add new data I have never seen this have in past sent hundreds of millions new transactions. It seems to be when I modify. my process is as follows key slice to get columns to modify in batches of 100, in separate threads modify those columns. I advance the slice with the start key each with last key in previous batch. Mutations done are update a column value in one column family(token), delete column and add new column in another (pan). Runs well until after about 5 million rows then it seems to run out of memory. Note that these column families are quite small. WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. 
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512 Keyspace: keyspace Read Count: 50042632 Read Latency: 0.23157864418482224 ms. Write Count: 44948323 Write Latency: 0.019460829472992797 ms. Pending Tasks: 0 Column Family: pan SSTable count: 5 Space used (live): 1977467326 Space used (total): 1977467326
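Mina's band-aid above is a full GC per node, staggered so only one node pauses at a time. A minimal sketch of the scheduling side only; actually triggering the collection would mean invoking the java.lang:type=Memory MBean's gc() operation over JMX, which is not shown here:

```python
# Stagger full GCs across nodes within a rolling window so that at most
# one node is paused (20-40 s, per Mina's numbers) at any moment.
def rolling_gc_schedule(nodes, interval_hours=4):
    """Return (node, hour offset) pairs spreading GCs over the interval."""
    step = interval_hours / len(nodes)
    return [(node, round(i * step, 2)) for i, node in enumerate(nodes)]

print(rolling_gc_schedule(["node1", "node2", "node3", "node4"]))
# four nodes in a 4-hour window: one GC per hour, never two at once
```

This is only the spacing arithmetic; any real deployment would wire the offsets into cron or a scheduler and call the JMX operation at each slot.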
RE: memory issue on 1.1.0
Ok, so I have completely refactored to remove deletes and it still fails. So it is completely unrelated to deletes. I guess I need to go back to 1.0.10? When I originally evaluated I ran 1.0.8... perhaps I went a bridge too far with 1.1. I don't think I am doing anything exotic here. Here is my column family. KsDef(name:TB_UNIT, strategy_class:org.apache.cassandra.locator.SimpleStrategy, strategy_options:{replication_factor=3}, cf_defs:[ CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, comparator_type:BytesType, column_metadata:[ColumnDef(name:70 61 6E 45 6E 63, validation_class:BytesType), ColumnDef(name:63 72 65 61 74 65 54 73, validation_class:DateType), ColumnDef(name:63 72 65 61 74 65 44 61 74 65, validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate), ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, validation_class:UTF8Type, index_type:KEYS, index_name:EncryptionSettingsID)], caching:keys_only), CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, column_type:Standard, comparator_type:BytesType, caching:keys_only), CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, comparator_type:BytesType, column_metadata:[ColumnDef(name:75 73 65 43 6F 75 6E 74, validation_class:CounterColumnType)], default_validation_class:CounterColumnType, caching:keys_only) ]) -Original Message- From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com] Sent: Tuesday, June 05, 2012 3:09 PM To: user@cassandra.apache.org Subject: RE: memory issue on 1.1.0 Thank you. I do have some of the same observations. Do you do deletes? My observation is that without deletes (or column updates I guess) I can run forever happy. but when I run (what for me is a batch process) operations that delete and modify column values I run into this. Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741 the advice is to NOT do deletes individually and to truncate. 
I am scrambling to try to do this but curious if it will be worth the effort. Wade -Original Message- From: Mina Naguib [mailto:mina.nag...@bloomdigital.com] Sent: Tuesday, June 05, 2012 2:24 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Hi Wade I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit. Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.conf it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exasperate the problem. I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation. What I have found however, is a band-aid which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit Perform GC button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim. On my production cluster I have a full-GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency. 
On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote: Alas, upgrading to 1.1.1 did not solve my issue. -Original Message- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Monday, June 04, 2012 11:24 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741 -Brandon On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached. Curiously when I add new data I have never seen this have in past sent hundreds of millions new transactions. It seems to be when I modify. my process is as follows key slice to get columns to modify in batches of 100, in separate threads modify those columns. I advance the slice with the start key each with last key in previous batch. Mutations done are update
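A readability note on the KsDef posted earlier in this thread: the BytesType column names are printed as hex-encoded ASCII bytes. Decoding them recovers the actual column names:

```python
# Decode the hex-dumped BytesType column names from the KsDef above.
def decode_name(hex_bytes: str) -> str:
    return bytes.fromhex(hex_bytes.replace(" ", "")).decode("ascii")

assert decode_name("70 61 6E 45 6E 63") == "panEnc"
assert decode_name("63 72 65 61 74 65 54 73") == "createTs"
assert decode_name("63 72 65 61 74 65 44 61 74 65") == "createDate"
assert decode_name("75 73 65 43 6F 75 6E 74") == "useCount"
```

So the token CF carries panEnc, createTs, createDate (indexed as TokenCreateDate) and encryptionSettingsID (indexed), and the counters CF carries useCount.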
my devious QA - how to recover from a power loss situation
They 1) set up a two node cluster and loaded 500K rows or something 2) added a third node and ran nodetool move 3) while moving, pulled the plug on the node. Cassandra won't start with the exception below. Now, this is obviously a very exceptional situation, but the question is posed: how best to recover from this? Observation: after the machine was up, the cassandra service failed to start. The following exception was observed in the cassandra logs:
java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input length = 1
at org.apache.cassandra.cql3.ColumnIdentifier.<init>(ColumnIdentifier.java:50)
at org.apache.cassandra.cql3.CFDefinition.getKeyId(CFDefinition.java:125)
at org.apache.cassandra.cql3.CFDefinition.<init>(CFDefinition.java:59)
at org.apache.cassandra.config.CFMetaData.updateCfDef(CFMetaData.java:1278)
at org.apache.cassandra.config.CFMetaData.keyAlias(CFMetaData.java:221)
at org.apache.cassandra.config.CFMetaData.fromSchemaNoColumns(CFMetaData.java:1162)
at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1190)
at org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:291)
at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:272)
at org.apache.cassandra.db.DefsTable.loadFromTable(DefsTable.java:158)
at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:533)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:182)
at org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:254)
at com.intel.soae.cassandra.server.SOAEDaemon.init(SOAEDaemon.java:435)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:207)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(Unknown Source)
at java.nio.charset.CharsetDecoder.decode(Unknown Source)
at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:163)
at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:120)
at org.apache.cassandra.cql3.ColumnIdentifier.<init>(ColumnIdentifier.java:46)
... 18 more
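The root cause in the trace is CharsetDecoder rejecting bytes that are no longer valid UTF-8 when the schema row is read back after the power loss. A minimal analogue of ByteBufferUtil.string() failing on corrupt bytes (the byte sequence is illustrative only):

```python
# Strict UTF-8 decoding of corrupted bytes fails the same way the
# Java CharsetDecoder does in the stack trace above.
corrupted = b"\xed\xa0\x80key"   # not valid UTF-8 (illustrative bytes)
try:
    corrupted.decode("utf-8")
    print("decoded fine")
except UnicodeDecodeError as exc:
    print("schema bytes undecodable:", exc.reason)
```

This is why the node cannot even load its schema: the failure happens in DatabaseDescriptor.loadSchemas, before any user data is touched, so recovery has to address the damaged schema sstables/commitlog rather than the data CFs.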
RE: memory issue on 1.1.0
What JVM settings do you have? -Xms8G -Xmx8G -Xmn800m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.rmi.server.hostname=127.0.0.1 -Djava.net.preferIPv4Stack=true -Dcassandra-pidfile=cassandra.pid What is the machine spec? It is an RH AS5 x64 box: 16 GB memory, 2 CPU cores at 2.8 GHz. As it turns out it is somewhat wimpier than I thought; while weak on CPU it does have a good amount of memory. It is paired with a larger machine. What settings do you have for key and row cache? A: All the defaults (yaml template attached). Do the CF's have secondary indexes? A: Yes, one has two. One of them is used in the key slice used to get the row keys used to do the further mutations. How many clients / requests per second? A: One client process with 10 threads connected to one of the two nodes in the cluster. One thread is reading the slice and putting work in a queue; 9 others are reading from this queue and applying the mutations. Mutations are completing at roughly 20,000/minute. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, June 04, 2012 4:17 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Had a look at the log; this message INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families appears correct: it happens after some flush activity and there are no CF's with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better. What JVM settings do you have? What is the machine spec? What settings do you have for key and row cache? Do the CF's have secondary indexes? How many clients / requests per second?
Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote: Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached. Curiously when I add new data I have never seen this have in past sent hundreds of millions new transactions. It seems to be when I modify. my process is as follows key slice to get columns to modify in batches of 100, in separate threads modify those columns. I advance the slice with the start key each with last key in previous batch. Mutations done are update a column value in one column family(token), delete column and add new column in another (pan). Runs well until after about 5 million rows then it seems to run out of memory. Note that these column families are quite small. WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512 Keyspace: keyspace Read Count: 50042632 Read Latency: 0.23157864418482224 ms. Write Count: 44948323 Write Latency: 0.019460829472992797 ms. 
Pending Tasks: 0 Column Family: pan SSTable count: 5 Space used (live): 1977467326 Space used (total): 1977467326 Number of Keys (estimate): 16334848 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 74 Read Count: 14985122 Read Latency: 0.408 ms. Write Count: 19972441 Write Latency: 0.022 ms. Pending Tasks: 0 Bloom Filter False Postives: 829 Bloom Filter False Ratio: 0.00073 Bloom Filter Space Used: 37048400 Compacted row minimum size: 125 Compacted row maximum size: 149 Compacted row mean size: 149 Column Family: token SSTable count: 4 Space used (live): 1250973873 Space used (total
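The batch process Wade describes (a key slice of 100, advancing the start key to the last key of the previous batch) is the classic inclusive-range paging pattern. A language-neutral sketch, where fetch_range is a hypothetical stand-in for a get_range_slices-style call (not real client API):

```python
# Page through rows with an inclusive-start range query: after the first
# page, each request re-fetches the start key, so drop it from the result.
def page_rows(fetch_range, batch=100):
    start = ""
    while True:
        # ask for one extra row after the first page to cover the
        # inclusive start key
        rows = fetch_range(start, batch + (1 if start else 0))
        if start:
            rows = rows[1:]   # drop the repeated start key
        if not rows:
            return
        yield from rows
        start = rows[-1]
```

Forgetting the inclusive-start adjustment is a common source of either duplicated or skipped rows when paging this way.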
RE: memory issue on 1.1.0
I have repeated the test on two quite large machines (12 core, 64 GB AS5 boxes) and still observed the problem, interestingly at about the same point. Anything I can monitor... perhaps I'll hook the YourKit profiler up to it to see if there is some kind of leak? Wade From: Poziombka, Wade L Sent: Monday, June 04, 2012 7:23 PM To: user@cassandra.apache.org Subject: RE: memory issue on 1.1.0 What JVM settings do you have? -Xms8G -Xmx8G -Xmn800m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.rmi.server.hostname=127.0.0.1 -Djava.net.preferIPv4Stack=true -Dcassandra-pidfile=cassandra.pid What is the machine spec? It is an RH AS5 x64 box: 16 GB memory, 2 CPU cores at 2.8 GHz. As it turns out it is somewhat wimpier than I thought; while weak on CPU it does have a good amount of memory. It is paired with a larger machine. What settings do you have for key and row cache? A: All the defaults (yaml template attached). Do the CF's have secondary indexes? A: Yes, one has two. One of them is used in the key slice used to get the row keys used to do the further mutations. How many clients / requests per second? A: One client process with 10 threads connected to one of the two nodes in the cluster. One thread is reading the slice and putting work in a queue; 9 others are reading from this queue and applying the mutations. Mutations are completing at roughly 20,000/minute. From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, June 04, 2012 4:17 PM To: user@cassandra.apache.org Subject: Re: memory issue on 1.1.0 Had a look at the log; this message INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families appears correct: it happens after some flush activity and there are no CF's with memtable data.
But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better. What JVM settings do you have? What is the machine spec ? What settings do you have for key and row cache ? Do the CF's have secondary indexes ? How many clients / requests per second ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote: Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached. Curiously when I add new data I have never seen this have in past sent hundreds of millions new transactions. It seems to be when I modify. my process is as follows key slice to get columns to modify in batches of 100, in separate threads modify those columns. I advance the slice with the start key each with last key in previous batch. Mutations done are update a column value in one column family(token), delete column and add new column in another (pan). Runs well until after about 5 million rows then it seems to run out of memory. Note that these column families are quite small. WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. 
Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512 Keyspace: keyspace Read Count: 50042632 Read Latency: 0.23157864418482224 ms. Write Count: 44948323 Write Latency: 0.019460829472992797 ms. Pending Tasks: 0 Column Family: pan SSTable count: 5 Space used (live): 1977467326 Space used (total): 1977467326 Number of Keys (estimate): 16334848 Memtable Columns Count: 0 Memtable Data Size: 0 Memtable Switch Count: 74 Read Count: 14985122 Read Latency: 0.408 ms. Write Count: 19972441 Write Latency: 0.022 ms. Pending Tasks: 0 Bloom Filter False
RE: nodetool move 0 gets stuck in moving state forever
Let me elaborate a bit. Two node cluster: node1 has token 0, node2 has token 85070591730234615865843651857942052864. node1 goes down permanently. Do a nodetool move 0 on node2, then monitor with nodetool ring... it is in the Moving state forever, it seems. From: Poziombka, Wade L Sent: Tuesday, May 29, 2012 4:29 PM To: user@cassandra.apache.org Subject: nodetool move 0 gets stuck in moving state forever If the node with token 0 dies and we just want it gone from the cluster, we would do a nodetool move 0. Then when we monitor using nodetool ring, it seems to be stuck on Moving forever. Any ideas?
nodetool move 0 gets stuck in moving state forever
If the node with token 0 dies and we just want it gone from the cluster, we would do a nodetool move 0. Then when we monitor using nodetool ring, it seems to be stuck on Moving forever. Any ideas?
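For reference, node2's token in the elaboration above is exactly 2**127 / 2, i.e. the second of two evenly spaced initial tokens on the RandomPartitioner ring. A small sketch of that calculation:

```python
# Evenly spaced initial tokens on the RandomPartitioner ring (0 .. 2**127).
def initial_tokens(node_count: int):
    return [i * (2**127 // node_count) for i in range(node_count)]

assert initial_tokens(2) == [0, 85070591730234615865843651857942052864]
```

Moving the surviving node to token 0 would leave the ring balanced for a single node, which is presumably the intent of the nodetool move 0.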
Re: Migrating a column family from one cluster to another
How do counters affect this? Why would it be different? Sent from my iPhone On May 18, 2012, at 15:40, Rob Coli rc...@palominodb.com wrote: On Thu, May 17, 2012 at 9:37 AM, Bryan Fernandez bfernande...@gmail.com wrote: What would be the recommended approach to migrating a few column families from a six node cluster to a three node cluster? The easiest way (if you are not using counters) is: 1) make sure all filenames of sstables are unique [1] 2) copy all sstable files from the 6 nodes to all 3 nodes 3) run a cleanup compaction on the 3 nodes =Rob [1] https://issues.apache.org/jira/browse/CASSANDRA-1983 -- =Robert Coli AIMGTALK - rc...@palominodb.com YAHOO - rcoli.palominob SKYPE - rcoli_palominodb
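Step 1 of Rob's recipe (unique sstable filenames) can be scripted. Assuming the 1.x component naming of roughly ks-cf-version-generation-Component.db (verify against your actual filenames), a hypothetical helper that offsets the generation number per source node so files from six nodes never collide:

```python
# Bump the sstable generation number by a per-source-node offset so that
# files copied from several nodes into one data directory stay unique.
# renumber() is a hypothetical helper, not project code; the filename
# layout is an assumption to verify against your own data directory.
def renumber(filename: str, offset: int) -> str:
    ks, cf, version, gen, component = filename.split("-", 4)
    return "-".join([ks, cf, version, str(int(gen) + offset), component])

# e.g. give each of the six source nodes its own offset (1000, 2000, ...):
assert renumber("ks1-users-hd-3-Data.db", 1000) == "ks1-users-hd-1003-Data.db"
```

All components of one sstable (Data, Index, Filter, ...) share a generation, so the same offset must be applied to every component of a given table before the cleanup compaction in step 3.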