Re: replaced node keeps returning in gossip
Hi,

When I sent the mail I'd had the new node on for about an hour, and the old node had died about an hour before that. The weirdness in the log files stopped yesterday afternoon, about 4 or 5 hours after I replaced the node, so it seems to have resolved itself. Seeing as there's no error left in the log files I'm not sure if you still want the output of my gossipinfo, but I've pasted it below anyway. Thanks!

/10.16.96.212
  LOAD:7.8018521345E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,113427455640312821154458202477256070484
  SCHEMA:9b152e00-fd90-11e1--2d22988ca597
/10.16.128.211
  LOAD:7.8416250275E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,85070591730234615865843651857942052863
  SCHEMA:9b152e00-fd90-11e1--2d22988ca597
/10.16.32.210
  LOAD:1.29054735121E11
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,56713727820156407428984779325531226112
  SCHEMA:9b152e00-fd90-11e1--2d22988ca597
/10.16.32.211
  LOAD:7.2937725831E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,141784319550391032739561396922763706368
  SCHEMA:9b152e00-fd90-11e1--2d22988ca597
ip-10-16-128-197.localdomain/10.16.128.197
  LOAD:6.5571879526E10
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,0
  SCHEMA:9b152e00-fd90-11e1--2d22988ca597
/10.16.96.211
  LOAD:1.0633383453E11
  RPC_ADDRESS:0.0.0.0
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,28356863910078203714492389662765613056
  SCHEMA:9b152e00-fd90-11e1--2d22988ca597

On Fri, Oct 19, 2012 at 2:56 AM, aaron morton aa...@thelastpickle.com wrote:

I replaced it with a new node, IP 10.16.128.197 and again token 0 with a -Dcassandra.replace_token=0 at startup

Good Good. How long ago did you bring the new node on? There is a fail safe to remove 128.210 after 3 days if it does not gossip to other nodes.

I *thought* that remove_token would remove the old IP from the ring. Can you post the output from nodetool gossipinfo from the 128.197 node?

Thanks
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2012, at 2:44 AM, Thomas van Neerijnen t...@bossastudios.com wrote:

Hi all,

I'm running Cassandra 1.0.11 on Ubuntu 11.10. I've got a ghost node which keeps showing up on my ring. A node living on IP 10.16.128.210 and token 0 died and had to be replaced. I replaced it with a new node, IP 10.16.128.197, again at token 0, with -Dcassandra.replace_token=0 at startup. This all went well but now I'm seeing the following weirdness constantly reported in the log files around the ring:

INFO [GossipTasks:1] 2012-10-18 13:39:22,441 Gossiper.java (line 632) FatClient /10.16.128.210 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-10-18 13:40:25,933 Gossiper.java (line 838) Node /10.16.128.210 is now part of the cluster
INFO [GossipStage:1] 2012-10-18 13:40:25,934 Gossiper.java (line 804) InetAddress /10.16.128.210 is now UP
INFO [GossipStage:1] 2012-10-18 13:40:25,937 StorageService.java (line 1017) Nodes /10.16.128.210 and /10.16.128.197 have the same token 0. Ignoring /10.16.128.210
INFO [GossipTasks:1] 2012-10-18 13:40:37,509 Gossiper.java (line 818) InetAddress /10.16.128.210 is now dead.
INFO [GossipTasks:1] 2012-10-18 13:40:56,526 Gossiper.java (line 632) FatClient /10.16.128.210 has been silent for 3ms, removing from gossip
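For anyone hitting this later: a quick way to confirm the ghost endpoint really has gone everywhere is to grep every node's gossip state for the dead IP. A rough sketch, substituting your own host list (a count of 0 means that node no longer sees the ghost):

for h in 10.16.96.212 10.16.128.211 10.16.32.210 10.16.32.211 10.16.128.197 10.16.96.211; do
  echo "== $h =="
  nodetool -h $h gossipinfo | grep -c 10.16.128.210
done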
unexpected behaviour on seed nodes when using -Dcassandra.replace_token
Hi all,

I recently tried to replace a dead node using -Dcassandra.replace_token=token, an option which has so far been good to me. On one of my nodes, however, the option was ignored and the node simply picked a different token to live at and started up there. It was a foolish mistake on my part because the node was configured as a seed node, which results in this message in the log file:

INFO [main] 2012-10-19 12:41:00,886 StorageService.java (line 518) This node will not auto bootstrap because it is configured to be a seed node.

But it seems a little scary that being a seed means the node will just ignore the fact that you want to replace a token and put itself somewhere else in the cluster. Surely it should behave the same way as an attempt to replace a live node and throw some kind of exception?
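For reference, the way around this is to leave the replacement node out of its own seed list before starting it with replace_token, then put it back afterwards. A sketch of the relevant cassandra.yaml section; the IPs are placeholders, not our real seeds:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # list only the *other* seeds; a node that finds itself in this
          # list will not auto bootstrap and the replace is skipped
          - seeds: "10.0.0.1,10.0.0.2"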
replaced node keeps returning in gossip
Hi all,

I'm running Cassandra 1.0.11 on Ubuntu 11.10. I've got a ghost node which keeps showing up on my ring. A node living on IP 10.16.128.210 and token 0 died and had to be replaced. I replaced it with a new node, IP 10.16.128.197, again at token 0, with -Dcassandra.replace_token=0 at startup. This all went well but now I'm seeing the following weirdness constantly reported in the log files around the ring:

INFO [GossipTasks:1] 2012-10-18 13:39:22,441 Gossiper.java (line 632) FatClient /10.16.128.210 has been silent for 3ms, removing from gossip
INFO [GossipStage:1] 2012-10-18 13:40:25,933 Gossiper.java (line 838) Node /10.16.128.210 is now part of the cluster
INFO [GossipStage:1] 2012-10-18 13:40:25,934 Gossiper.java (line 804) InetAddress /10.16.128.210 is now UP
INFO [GossipStage:1] 2012-10-18 13:40:25,937 StorageService.java (line 1017) Nodes /10.16.128.210 and /10.16.128.197 have the same token 0. Ignoring /10.16.128.210
INFO [GossipTasks:1] 2012-10-18 13:40:37,509 Gossiper.java (line 818) InetAddress /10.16.128.210 is now dead.
INFO [GossipTasks:1] 2012-10-18 13:40:56,526 Gossiper.java (line 632) FatClient /10.16.128.210 has been silent for 3ms, removing from gossip
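For reference, the replacement node was started along these lines. On the Debian/Ubuntu packages the option can be appended to JVM_OPTS in /etc/cassandra/cassandra-env.sh rather than passed on the command line; this is a sketch, not our exact config:

# /etc/cassandra/cassandra-env.sh on the fresh replacement node, before first start
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_token=0"

Then start Cassandra as normal and remove the option again once the node has joined.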
Re: java.lang.NoClassDefFoundError when trying to do anything on one CF on one node
Hi Aaron,

Removing NodeInfo did do the trick, thanks. I see the ticket is already resolved, good news. Thanks for the help.

On Fri, Sep 7, 2012 at 12:26 AM, aaron morton aa...@thelastpickle.com wrote:

This is a problem…

[default@system] list NodeIdInfo ;
Using default limit of 100
...
-------------------
RowKey: 43757272656e744c6f63616c
=> (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2, timestamp=1344414498989)
=> (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2, timestamp=1345386691897)

There is an assertion that we should only have one column for the CurrentLocal row. The exception is occurring in a static class member initialisation and so is getting lost. I'm not sure how two ended up there.

Deleting the NodeIdInfo CF SSTables should fix it.

I created https://issues.apache.org/jira/browse/CASSANDRA-4626; please add more information there if you can, and/or watch the ticket in case there are other questions.

Thanks
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/09/2012, at 10:18 PM, Thomas van Neerijnen t...@bossastudios.com wrote:

Forgot to answer your first question. I see this:

INFO 14:31:31,896 No saved local node id, using newly generated: 92109b80-ea0a-11e1--51be601cd0af

On Wed, Sep 5, 2012 at 8:41 AM, Thomas van Neerijnen t...@bossastudios.com wrote:

Thanks for the help Aaron.

I've checked NodeIdInfo and LocationInfo as below. What am I looking at? I'm guessing the first row in NodeIdInfo represents the ring with the node ids, but the second row perhaps dead nodes with old schemas? That's a total guess, so I'd be very interested to know what it and the LocationInfo are. If there's anything else you'd like me to check let me know, otherwise I'll attempt your workaround later today.

[default@system] list NodeIdInfo ;
Using default limit of 100
-------------------
RowKey: 4c6f63616c
=> (column=b10552c0-ea0f-11e0--cb1f02ccbcff, value=0a1020d2, timestamp=1317241393645)
=> (column=e64fc8f0-595b-11e1--51be601cd0d7, value=0a1020d2, timestamp=1329478703871)
=> (column=732d4690-a596-11e1--51be601cd09f, value=0a1020d2, timestamp=1337860139385)
=> (column=bffd9d40-aa45-11e1--51be601cd0fe, value=0a1020d2, timestamp=1338375234836)
=> (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2, timestamp=1344414498989)
=> (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2, timestamp=1345386691897)
-------------------
RowKey: 43757272656e744c6f63616c
=> (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2, timestamp=1344414498989)
=> (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2, timestamp=1345386691897)

2 Rows Returned.
Elapsed time: 128 msec(s).
[default@system] list LocationInfo ;
Using default limit of 100
-------------------
RowKey: 52696e67
=> (column=00, value=0a1080d2, timestamp=134104900)
=> (column=04a7128b6c83505dcd618720f92028f4, value=0a1020b7, timestamp=1332360971660)
=> (column=09249249249249249249249249249249, value=0a1080cd, timestamp=1341136002862)
=> (column=12492492492492492492492492492492, value=0a1020d3, timestamp=1341135999465)
=> (column=1500, value=0a1060d3, timestamp=134104671)
=> (column=1555, value=0a1020d3, timestamp=1344530188382)
=> (column=1b6db6db6db6db6db6db6db6db6db6db, value=0a1020b1, timestamp=1341135997643)
=> (column=1c71c71c71c71bff, value=0a1080d2, timestamp=1317241889689)
=> (column=24924924924924924924924924924924, value=0a1060d3, timestamp=1341135996555)
=> (column=29ff, value=0a1020d3, timestamp=1317241534292)
=> (column=2aaa, value=0a1060d3, timestamp=1344530187539)
=> (column=38e38e38e38e37ff, value=0a1060d3, timestamp=1317241257569)
=> (column=38e38e38e38e38e38e38e38e38e38e38, value=0a1060d3, timestamp=1343136501647)
=> (column=393170e0207a17d8519f0c1bfe325d51, value=0a1020d3, timestamp=1345381375120)
=> (column=3fff, value=0a1080d3, timestamp=134104939)
=> (column=471c71c71c71c71c71c71c71c71c71c6, value=0a1080d3, timestamp=1343133153701)
=> (column=471c71c71c71c7ff, value=0a1080d3, timestamp=1317241786636)
=> (column=49249249249249249249249249249249, value=0a1080d3, timestamp=1341136002693)
=> (column=52492492492492492492492492492492, value=0a106010, timestamp=1341136002626)
=> (column=53ff, value=0a1020d4, timestamp=1328473688357)
=> (column=5554, value=0a1060d4, timestamp=134104910)
=> (column=5b6db6db6db6db6db6db6db6db6db6da, value=0a1060d4, timestamp=1332389784945)
=> (column=5b6db6db6db6db6db6db6db6db6db6db, value=0a1060d4, timestamp=1341136001027)
=> (column=638e38e38e38e38e38e38e38e38e38e2, value=0a1060d4, timestamp=1343125208462)
=> (column
Re: java.lang.NoClassDefFoundError when trying to do anything on one CF on one node
Can you list the contents of the NodeIdInfo and LocationInfo CF's from system?

I *think* a work around may be to:
* stop the node
* remove the LocationInfo and NodeInfo CFs
* restart

Note this will read the token from the yaml file again, so make sure it's right.

cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/09/2012, at 9:51 PM, Thomas van Neerijnen t...@bossastudios.com wrote:

Hi,

I have a single node in a 6 node Cassandra 1.0.11 cluster that seems to have a single column family in a weird state. Repairs, upgradesstables, anything that touches this CF crashes. I've drained the node, removed every file for this CF from said node, removed the commit log, started it up, and as soon as data is written to this CF on this node I'm in the same situation again. Anyone have any suggestions for how to fix this? I'm tempted to remove the node and re-add it but I was hoping for something a little less disruptive.

$ nodetool -h localhost upgradesstables Player PlayerCounters
Error occured while upgrading the sstables for keyspace Player
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.utils.NodeId$LocalIds
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:219)
    at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:235)
    at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:999)
    at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:1652)
    at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
    at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
    at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.utils.NodeId$LocalIds
    at org.apache.cassandra.utils.NodeId.localIds(NodeId.java:49)
    at org.apache.cassandra.utils.NodeId.getOldLocalNodeIds(NodeId.java:79)
    at org.apache.cassandra.db.CounterColumn.computeOldShardMerger(CounterColumn.java:251)
    at org.apache.cassandra.db.CounterColumn.mergeAndRemoveOldShards(CounterColumn.java:297)
    at org.apache.cassandra.db.CounterColumn.mergeAndRemoveOldShards(CounterColumn.java:271)
    at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:81
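Spelled out as shell steps, the workaround looks something like the following. This assumes the stock package layout under /var/lib/cassandra; moving rather than deleting the files leaves a way back if anything goes wrong:

sudo service cassandra stop
mkdir -p /tmp/system-cf-backup
# the system keyspace lives alongside the user keyspaces
sudo mv /var/lib/cassandra/data/system/NodeIdInfo-* /tmp/system-cf-backup/
sudo mv /var/lib/cassandra/data/system/LocationInfo-* /tmp/system-cf-backup/
# double-check initial_token in cassandra.yaml before restarting
sudo service cassandra start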
Re: java.lang.NoClassDefFoundError when trying to do anything on one CF on one node
Forgot to answer your first question. I see this:

INFO 14:31:31,896 No saved local node id, using newly generated: 92109b80-ea0a-11e1--51be601cd0af

On Wed, Sep 5, 2012 at 8:41 AM, Thomas van Neerijnen t...@bossastudios.com wrote:

Thanks for the help Aaron.

I've checked NodeIdInfo and LocationInfo as below. What am I looking at? I'm guessing the first row in NodeIdInfo represents the ring with the node ids, but the second row perhaps dead nodes with old schemas? That's a total guess, so I'd be very interested to know what it and the LocationInfo are. If there's anything else you'd like me to check let me know, otherwise I'll attempt your workaround later today.

[default@system] list NodeIdInfo ;
Using default limit of 100
-------------------
RowKey: 4c6f63616c
=> (column=b10552c0-ea0f-11e0--cb1f02ccbcff, value=0a1020d2, timestamp=1317241393645)
=> (column=e64fc8f0-595b-11e1--51be601cd0d7, value=0a1020d2, timestamp=1329478703871)
=> (column=732d4690-a596-11e1--51be601cd09f, value=0a1020d2, timestamp=1337860139385)
=> (column=bffd9d40-aa45-11e1--51be601cd0fe, value=0a1020d2, timestamp=1338375234836)
=> (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2, timestamp=1344414498989)
=> (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2, timestamp=1345386691897)
-------------------
RowKey: 43757272656e744c6f63616c
=> (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2, timestamp=1344414498989)
=> (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2, timestamp=1345386691897)

2 Rows Returned.
Elapsed time: 128 msec(s).

[default@system] list LocationInfo ;
Using default limit of 100
-------------------
RowKey: 52696e67
=> (column=00, value=0a1080d2, timestamp=134104900)
=> (column=04a7128b6c83505dcd618720f92028f4, value=0a1020b7, timestamp=1332360971660)
=> (column=09249249249249249249249249249249, value=0a1080cd, timestamp=1341136002862)
=> (column=12492492492492492492492492492492, value=0a1020d3, timestamp=1341135999465)
=> (column=1500, value=0a1060d3, timestamp=134104671)
=> (column=1555, value=0a1020d3, timestamp=1344530188382)
=> (column=1b6db6db6db6db6db6db6db6db6db6db, value=0a1020b1, timestamp=1341135997643)
=> (column=1c71c71c71c71bff, value=0a1080d2, timestamp=1317241889689)
=> (column=24924924924924924924924924924924, value=0a1060d3, timestamp=1341135996555)
=> (column=29ff, value=0a1020d3, timestamp=1317241534292)
=> (column=2aaa, value=0a1060d3, timestamp=1344530187539)
=> (column=38e38e38e38e37ff, value=0a1060d3, timestamp=1317241257569)
=> (column=38e38e38e38e38e38e38e38e38e38e38, value=0a1060d3, timestamp=1343136501647)
=> (column=393170e0207a17d8519f0c1bfe325d51, value=0a1020d3, timestamp=1345381375120)
=> (column=3fff, value=0a1080d3, timestamp=134104939)
=> (column=471c71c71c71c71c71c71c71c71c71c6, value=0a1080d3, timestamp=1343133153701)
=> (column=471c71c71c71c7ff, value=0a1080d3, timestamp=1317241786636)
=> (column=49249249249249249249249249249249, value=0a1080d3, timestamp=1341136002693)
=> (column=52492492492492492492492492492492, value=0a106010, timestamp=1341136002626)
=> (column=53ff, value=0a1020d4, timestamp=1328473688357)
=> (column=5554, value=0a1060d4, timestamp=134104910)
=> (column=5b6db6db6db6db6db6db6db6db6db6da, value=0a1060d4, timestamp=1332389784945)
=> (column=5b6db6db6db6db6db6db6db6db6db6db, value=0a1060d4, timestamp=1341136001027)
=> (column=638e38e38e38e38e38e38e38e38e38e2, value=0a1060d4, timestamp=1343125208462)
=> (column=638e38e38e38e3ff, value=0a1060d4, timestamp=1317241257577)
=> (column=6c00, value=0a1020d3, timestamp=134104789)
-------------------
RowKey: 4c
=> (column=436c75737465724e616d65, value=4d6f6e737465724d696e642050726f6420436c7573746572, timestamp=1317241251097000)
=> (column=47656e65726174696f6e, value=50447e78, timestamp=134104152000)
=> (column=50617274696f6e6572, value=6f72672e6170616368652e63617373616e6472612e6468742e52616e646f6d506172746974696f6e6572, timestamp=1317241251097000)
=> (column=546f6b656e, value=2a00, timestamp=134104214)
-------------------
RowKey: 436f6f6b696573
=> (column=48696e7473207075726765642061732070617274206f6620757067726164696e672066726f6d20302e362e7820746f20302e37, value=6f68207965732c20697420746865792077657265207075726765642e, timestamp=1317241251249)
=> (column=5072652d312e302068696e747320707572676564, value=6f68207965732c2074686579207765726520707572676564, timestamp=1326274339337)
-------------------
RowKey: 426f6f747374726170
=> (column=42, value=01, timestamp=134104213)

4 Rows Returned.
Elapsed time: 34 msec(s).

On Wed, Sep 5, 2012 at 2:42 AM
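For anyone else reading these listings: the row keys (and many column names) in the system keyspace are hex-encoded ASCII, so they can be decoded with any hex tool, for example:

$ echo 4c6f63616c | xxd -r -p; echo
Local
$ echo 43757272656e744c6f63616c | xxd -r -p; echo
CurrentLocal

Decoded this way, the NodeIdInfo rows are "Local" and "CurrentLocal", and the LocationInfo rows are "Ring", "L", "Cookies" and "Bootstrap".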
java.lang.NoClassDefFoundError when trying to do anything on one CF on one node
Hi,

I have a single node in a 6 node Cassandra 1.0.11 cluster that seems to have a single column family in a weird state. Repairs, upgradesstables, anything that touches this CF crashes. I've drained the node, removed every file for this CF from said node, removed the commit log, started it up, and as soon as data is written to this CF on this node I'm in the same situation again. Anyone have any suggestions for how to fix this? I'm tempted to remove the node and re-add it but I was hoping for something a little less disruptive.

$ nodetool -h localhost upgradesstables Player PlayerCounters
Error occured while upgrading the sstables for keyspace Player
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.utils.NodeId$LocalIds
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:219)
    at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:235)
    at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:999)
    at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:1652)
    at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
    at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
    at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:303)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.utils.NodeId$LocalIds
    at org.apache.cassandra.utils.NodeId.localIds(NodeId.java:49)
    at org.apache.cassandra.utils.NodeId.getOldLocalNodeIds(NodeId.java:79)
    at org.apache.cassandra.db.CounterColumn.computeOldShardMerger(CounterColumn.java:251)
    at org.apache.cassandra.db.CounterColumn.mergeAndRemoveOldShards(CounterColumn.java:297)
    at org.apache.cassandra.db.CounterColumn.mergeAndRemoveOldShards(CounterColumn.java:271)
    at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:81)
    at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:97)
    at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:137)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:97)
    at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:82)
    at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:207)
    at
Why does a large compaction on one node affect the entire cluster?
Hi all,

I am running Cassandra 1.0.10, installed from the Apache debs, on Ubuntu 11.10 on a 7 node cluster. I moved some tokens around my cluster and now have one node compacting a large Leveled compaction column family. It has done about 5k out of 10k outstanding compactions today. The other nodes have all finished.

The weird thing is that when it hits a big-ish chunk to compact, for example:

pending tasks: 4555
compaction type   keyspace   column family   bytes compacted   bytes total   progress
Compaction        Player     PlayerDetail    213097            4286616517    0.00%

I see heap usage on it AND all other nodes go insane. Normal operation on all nodes is a leisurely saw-toothed climb to a CMS at just below 3/4 heap size every 10 minutes or so. During the big-ish compaction, all nodes in the cluster CMS multiple times in a minute, with the peaks getting close to heap size.

So my question is: why does one node compacting put so much memory pressure on all the other nodes in the cluster and ruin my day?
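One knob worth checking while digging into this is compaction I/O throttling in cassandra.yaml; 16 is only an example value here, and 0 disables throttling entirely:

# cassandra.yaml
compaction_throughput_mb_per_sec: 16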
Cassandra CF merkle tree
Hi all,

Is there a way I can easily retrieve a Merkle tree for a CF, like the one created during a repair? I didn't see anything about this in the Thrift API docs, so I'm assuming it's a data structure made available only to internal Cassandra functions.

I would like to explore using the Merkle trees as a method for data integrity checks after config changes, version upgrades, and probably loads of other scenarios I haven't even thought of that may result in data loss going initially unnoticed.
Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears
The main issue turned out to be a bug in our code whereby we were writing a lot of new columns to the same row key instead of a new row key, turning what we expected to be a skinny-rowed CF into a CF with one very, very wide row. The writes on that single key were putting pressure on the 3 nodes holding its replicas. One of the replicas would eventually fail under the pressure, and the rest of the cluster would try holding hints for the bad key's writes, which would cause the same problem on the rest of the cluster.

On Thu, Mar 22, 2012 at 1:55 AM, Thomas van Neerijnen t...@bossastudios.com wrote:

Hi,

I'm going with yes to all three of your questions. I found a very heavily hit index which we have since reworked to remove the secondary index entirely. This fixed a large portion of the problem, but during the panic of the overloaded cluster we did the simple scaling-out trick of doubling the cluster; in the rush, two out of the 7 new nodes accidentally ended up on EC2 EBS volumes instead of the usual ephemeral RAID10. So, same error, but this time all nodes reporting only the two EBS-backed nodes as down instead of the whole cluster getting weird. I'm rsyncing the data off the EBS volume onto an ephemeral RAID10 array as I type, so in the next hour or so I'll know if this fixed the issue.

On Wed, Mar 21, 2012 at 5:24 PM, aaron morton aa...@thelastpickle.com wrote:

The node is overloaded with hints. I'll just grab the comments from code…

// avoid OOMing due to excess hints. we need to do this check even for live nodes, since we can
// still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
// The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
// a small number of nodes causing problems, so we should avoid shutting down writes completely to
// healthy nodes. Any node with no hintsInProgress is considered healthy.

Are the nodes going up and down a lot? Are they under GC pressure? The other possibility is that you have overloaded the cluster.

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/03/2012, at 3:20 AM, Thomas van Neerijnen wrote:

Hi all,

I'm running into a weird error on Cassandra 1.0.7. As my cluster's load gets heavier, many of the nodes seem to hit the same error around the same time, resulting in MutationStage backing up and never clearing down. The only way to recover the cluster is to kill all the nodes and start them up again. The error is as below and is repeated continuously until I kill the Cassandra process.

ERROR [ReplicateOnWriteStage:57] 2012-03-21 14:02:05,099 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReplicateOnWriteStage:57,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1227)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:301)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:544)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1223)
    ... 3 more
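For anyone else chasing something similar: a cheap way to spot a runaway wide row is the per-CF row size stats in cfstats, e.g. (the grep pattern is just a convenience):

nodetool -h localhost cfstats | grep -E 'Column Family:|Compacted row maximum size'

A maximum row size wildly above the mean is a good hint that a handful of keys are taking all the writes.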
ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears
Hi all,

I'm running into a weird error on Cassandra 1.0.7. As my cluster's load gets heavier, many of the nodes seem to hit the same error around the same time, resulting in MutationStage backing up and never clearing down. The only way to recover the cluster is to kill all the nodes and start them up again. The error is as below and is repeated continuously until I kill the Cassandra process.

ERROR [ReplicateOnWriteStage:57] 2012-03-21 14:02:05,099 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReplicateOnWriteStage:57,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1227)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:301)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:544)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1223)
    ... 3 more
Re: ReplicateOnWriteStage exception causes a backlog in MutationStage that never clears
Hi,

I'm going with yes to all three of your questions. I found a very heavily hit index which we have since reworked to remove the secondary index entirely. This fixed a large portion of the problem, but during the panic of the overloaded cluster we did the simple scaling-out trick of doubling the cluster; in the rush, two out of the 7 new nodes accidentally ended up on EC2 EBS volumes instead of the usual ephemeral RAID10. So, same error, but this time all nodes reporting only the two EBS-backed nodes as down instead of the whole cluster getting weird. I'm rsyncing the data off the EBS volume onto an ephemeral RAID10 array as I type, so in the next hour or so I'll know if this fixed the issue.

On Wed, Mar 21, 2012 at 5:24 PM, aaron morton aa...@thelastpickle.com wrote:

The node is overloaded with hints. I'll just grab the comments from code…

// avoid OOMing due to excess hints. we need to do this check even for live nodes, since we can
// still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
// The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
// a small number of nodes causing problems, so we should avoid shutting down writes completely to
// healthy nodes. Any node with no hintsInProgress is considered healthy.

Are the nodes going up and down a lot? Are they under GC pressure? The other possibility is that you have overloaded the cluster.

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/03/2012, at 3:20 AM, Thomas van Neerijnen wrote:

Hi all,

I'm running into a weird error on Cassandra 1.0.7. As my cluster's load gets heavier, many of the nodes seem to hit the same error around the same time, resulting in MutationStage backing up and never clearing down. The only way to recover the cluster is to kill all the nodes and start them up again. The error is as below and is repeated continuously until I kill the Cassandra process.

ERROR [ReplicateOnWriteStage:57] 2012-03-21 14:02:05,099 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReplicateOnWriteStage:57,5,main]
java.lang.RuntimeException: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1227)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.TimeoutException
    at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:301)
    at org.apache.cassandra.service.StorageProxy$7$1.runMayThrow(StorageProxy.java:544)
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1223)
    ... 3 more
Re: Network, Compaction, Garbage collection and Cache monitoring in cassandra
Collectd with GenericJMX pushing data into Graphite is what we use. You can monitor the Graphite graphs directly instead of having an extra JMX interface on the Cassandra nodes just for monitoring.

On Wed, Mar 21, 2012 at 8:16 PM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote:

You can also use any network/server monitoring tool which can talk to JMX. We are currently using vFabric Hyperic's JMX plugin for this. IIRC there are some Cacti and Nagios scripts on GitHub for getting the data into those.

-Jeremiah

From: R. Verlangen [ro...@us2.nl]
Sent: Wednesday, March 21, 2012 10:40 AM
To: user@cassandra.apache.org
Subject: Re: Network, Compaction, Garbage collection and Cache monitoring in cassandra

Hi Rishabh,

Please take a look at OpsCenter: http://www.datastax.com/products/opscenter

It provides most of the details you request. Good luck!

2012/3/21 Rishabh Agrawal rishabh.agra...@impetus.co.in

Hello,

Can someone help me with how to proactively monitor network, compaction, garbage collection and cache use in Cassandra?

Regards
Rishabh
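In case it helps anyone get started, the collectd side looks roughly like this. This is only a sketch: the jar paths and the single heap MBean are examples, and 7199 is the default Cassandra JMX port; in practice you'd collect many more Cassandra MBeans than this.

LoadPlugin java
<Plugin "java">
  JVMArg "-Djava.class.path=/usr/share/collectd/java/collectd-api.jar:/usr/share/collectd/java/generic-jmx.jar"
  LoadPlugin "org.collectd.java.GenericJMX"
  <Plugin "GenericJMX">
    # JVM heap usage as one example metric
    <MBean "heap">
      ObjectName "java.lang:type=Memory"
      <Value>
        Type "memory"
        Table true
        Attribute "HeapMemoryUsage"
      </Value>
    </MBean>
    <Connection>
      ServiceURL "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi"
      Collect "heap"
    </Connection>
  </Plugin>
</Plugin>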
Re: Single Node Cassandra Installation
You'll need to either read or write at at least quorum to get consistent data from the cluster, so you may as well do both. Now that you mention it, I was wrong about downtime: with a two node cluster, reads or writes at quorum mean both nodes need to be online. Perhaps you could have an emergency switch in your application which flips to a consistency level of 1 if one of your Cassandra servers goes down? Just make sure it's set back to quorum when the second one returns, or again you could end up with inconsistent data.

On Fri, Mar 16, 2012 at 2:04 AM, Drew Kutcharian d...@venarc.com wrote:

Thanks for the comments. I guess I will end up doing a 2 node cluster with replica count 2 and read consistency 1.

-- Drew

On Mar 15, 2012, at 4:20 PM, Thomas van Neerijnen wrote:

So long as data loss and downtime are acceptable risks, a one node cluster is fine. Personally this is usually only acceptable on my workstation; even my dev environment is redundant, because servers fail, usually when you least want them to, like for example when you've decided to save costs by waiting before implementing redundancy. Could a failure end up costing you more than you've saved? I'd rather get cheaper servers (maybe even used off eBay??) so I could have at least two of them.

If you do go with a one node solution, Priam looks like a good place to start for backups, although I haven't tried it myself; otherwise roll your own with incremental snapshotting turned on and a watch on the snapshot directory. Storage on something like S3 or Cloud Files is very cheap so there's no good excuse for no backups.

On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen ro...@us2.nl wrote:

Hi Drew,

One other disadvantage is the lack of consistency levels and replication. Both are part of the high availability / redundancy. So you would really need to back up your single-node cluster to some other external location. Good luck!

2012/3/15 Drew Kutcharian d...@venarc.com

Hi,

We are working on a project that initially is going to have very little data, but we would like to use Cassandra to ease future scalability. Due to budget constraints, we were thinking of running a single node Cassandra for now and then adding more nodes as required. I was wondering if it is recommended to run a single node Cassandra in production. Are there any other issues besides the lack of high availability?

Thanks,
Drew
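To spell out the arithmetic behind that:

quorum = floor(replication_factor / 2) + 1

RF = 2  ->  quorum = 2, so every quorum read or write needs both replicas up
RF = 3  ->  quorum = 2, so one replica can be down and quorum still succeeds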
Re: 1.0.8 with Leveled compaction - Possible issues
Heya,

I'd suggest staying away from Leveled Compaction until 1.0.9. For the why, see this great explanation I got from Maki Watanabe on the list:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201203.mbox/%3CCALqbeQbQ=d-hORVhA-LHOo_a5j46fQrsZMm+OQgfkgR=4rr...@mail.gmail.com%3E
Keep an eye on that one, because I'm busy testing one of his suggestions and I'll post back with the results soon.

My understanding is that after a change in compaction or compression settings, the current sstables keep the old settings until you run an upgradesstables on all the nodes; only new sstables get the new format. Obviously this compounds the issue I mentioned above though. Be warned, an upgradesstables can take a long time, so maybe keep an eye on the number of files at the 5MB target size vs those still over it to get an idea of progress. Maybe someone else knows a better way?

You can change back and forth between compression and compaction options quite safely, but again you need an upgradesstables to rewrite the current sstables. In my experience I've safely applied compression and leveled compaction to the same CF at the same time without issue, so I guess it's ok :)

On Thu, Mar 15, 2012 at 10:05 PM, Johan Elmerfjord jelme...@adobe.com wrote:

Hi,

I'm testing the community version of Cassandra 1.0.8. We are currently on 0.8.7 in our production setup. We have 3 column families that each take between 20 and 35 GB on disk per node (8*2 nodes total). We would like to change to Leveled Compaction, and even try compression as well, to reduce the space needed for compactions. We are running on SSD drives as latency is a key issue.

As a test I imported one column family from 3 production nodes to a 3 node test cluster. The data on the 3 nodes ranges from 19-33GB (with at least one large SSTable (tiered size, recently compacted)). After loading this data to the 3 test nodes, and running scrub and repair, I took a backup of the data so I have a good test set to work on.

Then I changed to leveled compaction, using the cassandra-cli:

UPDATE COLUMN FAMILY TestCF1 WITH compaction_strategy=LeveledCompactionStrategy;

I could see the change being written to the logfile on all nodes. Then I don't know for sure if I need to run anything else to make the change happen, or if it's just to wait. My test cluster does not receive new data. For this KS/CF, on each of the nodes I have tried some or all of: upgradesstables, scrub, compact, cleanup and repair, each task taking between 40 minutes and 4 hours, with the exception of compact, which returns almost immediately with no visible compactions made. On some nodes I ended up with over 3 files with the default 5MB size for leveled compaction; on another node it didn't look like anything had been done and I still have a 19GB SSTable.

I then made another change:

UPDATE COLUMN FAMILY TestCF1 WITH compaction_strategy=LeveledCompactionStrategy AND compaction_strategy_options=[{sstable_size_in_mb: 64}];
WARNING: [{}] strategy_options syntax is deprecated, please use {}

which is probably wrong in the documentation, and should be:

UPDATE COLUMN FAMILY TestCF1 WITH compaction_strategy=LeveledCompactionStrategy AND compaction_strategy_options={sstable_size_in_mb: 64};

I think that we will be able to find the data in 3 searches with a 64MB size, still only use around 700MB while doing compactions, and keep the number of files ~3000 per CF. A few days later it looks like I still have a mix between the original huge SSTables, 5MB ones, and some nodes have 64MB files as well.

Do I need to do something special to clean this up? I have tried another scrub/upgradesstables/cleanup, but nothing seems to make any change.

Finally I have also tried to enable compression:

UPDATE COLUMN FAMILY TestCF1 WITH compression_options=[{sstable_compression:SnappyCompressor, chunk_length_kb:64}];

which results in the same [{}] warning. As you can see below, this created CompressionInfo.db files on some nodes, but not on all.

Is there a way I can force tiered sstables to be converted into leveled ones, and then to compression as well?
Why are the original files (tiered-size SSTables) still present on testnode1, and when is it supposed to delete them?
Can I change back and forth between compression (on/off, or chunk sizes) and between leveled vs size-tiered compaction?
Is there a way to see if the node is done, or waiting for something?
When is it safe to apply another setting? Does it have to complete one reorg before moving on to the next?

Any input or own experiences are warmly welcome.

Best regards,
Johan

Some lines of example directory listings below.

Some files for testnode 3 (looks like it still has the original size-tiered files around, and a mixture of compressed 64MB files and 5MB files?):

total 19G
drwxr-xr-x 3 cass cass 4.0K Mar 13 17:11 snapshots
-rw-r--r-- 1 cass cass 6.0G Mar 13
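A quick-and-dirty way to watch the conversion progress mentioned above is to count Data files at the leveled target size versus the big originals. A sketch assuming a stock data directory layout; the keyspace name here is a placeholder:

cd /var/lib/cassandra/data/TestKS
find . -maxdepth 1 -name 'TestCF1-*-Data.db' -size -6M | wc -l    # roughly-5MB leveled files
find . -maxdepth 1 -name 'TestCF1-*-Data.db' -size +6M | wc -l    # files still awaiting conversion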
Re: Single Node Cassandra Installation
So long as data loss and downtime are acceptable risks, a one node cluster is fine. Personally this is usually only acceptable on my workstation; even my dev environment is redundant, because servers fail, usually when you least want them to, like for example when you've decided to save costs by waiting before implementing redundancy. Could a failure end up costing you more than you've saved? I'd rather get cheaper servers (maybe even used off eBay??) so I could have at least two of them.

If you do go with a one node solution, Priam looks like a good place to start for backups, although I haven't tried it myself; otherwise roll your own with incremental snapshotting turned on and a watch on the snapshot directory. Storage on something like S3 or Cloud Files is very cheap so there's no good excuse for no backups.

On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen ro...@us2.nl wrote:

Hi Drew,

One other disadvantage is the lack of consistency levels and replication. Both are part of the high availability / redundancy. So you would really need to back up your single-node cluster to some other external location. Good luck!

2012/3/15 Drew Kutcharian d...@venarc.com

Hi,

We are working on a project that initially is going to have very little data, but we would like to use Cassandra to ease future scalability. Due to budget constraints, we were thinking of running a single node Cassandra for now and then adding more nodes as required. I was wondering if it is recommended to run a single node Cassandra in production. Are there any other issues besides the lack of high availability?

Thanks,
Drew
cleanup crashing with java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 8
Hi all,

I am trying to run a cleanup on a column family and am getting the following error returned after about 15 seconds. A cleanup on a slightly smaller column family completes in about 21 minutes. This is on the Apache packaged version of Cassandra on Ubuntu 11.10, version 1.0.7.

~# nodetool -h localhost cleanup Player PlayerDetail
Error occured during cleanup
java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 8
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:203)
    at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:237)
    at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:984)
    at org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1635)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427)
    at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788)
    at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305)
    at sun.rmi.transport.Transport$1.run(Transport.java:159)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:155)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.cassandra.db.compaction.LeveledManifest.add(LeveledManifest.java:298)
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:186)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:494)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:234)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:1006)
    at org.apache.cassandra.db.compaction.CompactionManager.doCleanupCompaction(CompactionManager.java:791)
    at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:63)
    at org.apache.cassandra.db.compaction.CompactionManager$5.perform(CompactionManager.java:241)
    at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:182)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    ... 3 more
Re: LeveledCompaction and/or SnappyCompressor causing memory pressure during repair
Thanks for the suggestions, but I'd already removed the compression when your message came thru. That alleviated the problem but didn't solve it. I'm still looking at a few other possible causes and I'll post back if I work out what's going on; for now I am running rolling repairs to avoid another outage.

On Sun, Mar 11, 2012 at 6:32 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

One thing you may want to look at is the meanRowSize from nodetool cfstats and your compression block size. In our case the mean compacted size is 560 bytes, and a 64KB block size caused CPU tickets and a lot of short-lived memory. I have brought my block size down to 16K. The resulting tables are not noticeably larger and there is less memory pressure on the young gen. I might try going down to 4K next.

On Sat, Mar 10, 2012 at 5:38 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

The only downside of compression is that it does cause more memory pressure. I can imagine something like repair could compound this, since it would seem that building the merkle tree involves decompressing every block on disk. I have been attempting to determine whether the block size being larger or smaller has any effect on memory pressure.

On Sat, Mar 10, 2012 at 4:50 PM, Peter Schuller peter.schul...@infidyne.com wrote:

However, when I run a repair my CMS usage graph no longer shows sudden drops but rather gradual slopes, and only manages to clear around 300MB each GC. This seems to occur on 2 other nodes in my cluster around the same time; I assume this is because they're the replicas (we use 3 replicas). Parnew collections look about the same on my graphs with or without repair running, so no trouble there so far as I can tell.

I don't know why leveled/snappy would affect it, but disregarding that, I would have been suggesting that you are seeing additional heap usage because of long-running repairs retaining sstables and delaying their unload/removal (index sampling/bloom filters filling your heap). If it really only happens for leveled/snappy however, I don't know what that might be caused by.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
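By rolling repairs I just mean repairing one node at a time rather than several in parallel, something along these lines (host and keyspace names are placeholders):

for h in node1 node2 node3 node4 node5 node6 node7; do
  nodetool -h $h repair MyKeyspace
done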
LeveledCompaction and/or SnappyCompressor causing memory pressure during repair
Hi all,

Running Cassandra 1.0.7, I recently changed a few read-heavy column families from SizeTieredCompactionStrategy to LeveledCompactionStrategy and added in SnappyCompressor, all with defaults, so 5MB files and, if memory serves me correctly, 64k chunk size for compression. The results were amazingly good: my data size halved and my heap usage and performance stabilised nicely, until it came time to run a repair.

When a repair isn't running I'm seeing a saw-toothed pattern on my heap graphs, with CMS clearing out about 1.5GB each GC run. The CMS GC appears as a sudden vertical drop on the Old Gen usage graph. In addition to what I consider a healthy-looking heap usage, my par new and CMS collections are running far quicker than before I made the changes.

However, when I run a repair my CMS usage graph no longer shows sudden drops but rather gradual slopes, and only manages to clear around 300MB each GC. This seems to occur on 2 other nodes in my cluster around the same time; I assume this is because they're the replicas (we use 3 replicas). Parnew collections look about the same on my graphs with or without repair running, so no trouble there so far as I can tell.

The symptom of the memory pressure during repair is that either the node running the repair or one of the two replicas tends to perform badly, with read stage backing up into the thousands at times. If I run a repair on more than one or two nodes at the same time (it's a 7 node cluster), the memory pressure is so bad that half the cluster ends up OOMing, and this happened during off-peak when it's doing about half the reads we handle during peak, so not particularly loaded.

The question I'm asking is: has anyone run into this behaviour before, and if so, how was it dealt with? Once I have nursed the cluster thru the repair it's currently running I will be turning off compression on one of my larger CFs to see if it makes a difference; I'll send the results of that test tomorrow.
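For anyone wanting to capture the same GC behaviour, the JVM can log every collection to a file, which makes the CMS slopes easy to see. These are standard HotSpot flags that can be appended in conf/cassandra-env.sh; the log path is just an example:

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"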
Final buffer length 4690 to accomodate data size of 2347 for RowMutation error caused node death
Hi all,

I am running the Apache packaged Cassandra 1.0.7 on Ubuntu 11.10. It had been running fine for over a month; however, I encountered the below error yesterday, which almost immediately resulted in heap usage rising quickly to almost 100% and client requests timing out on the affected node. I gave up waiting for the init script to stop Cassandra and killed it myself after about 3 minutes, restarted it, and it has been fine since. Anyone seen this before? Here is the error in the output.log:

ERROR 10:51:44,282 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError: Final buffer length 4690 to accomodate data size of 2347 (predicted 2344) for RowMutation(keyspace='Player', key='36336138643338652d366162302d343334392d383466302d356166643863353133356465', modifications=[ColumnFamily(PlayerCity [SuperColumn(owneditem_1019 []),SuperColumn(owneditem_1024 []),SuperColumn(owneditem_1026 []),SuperColumn(owneditem_1074 []),SuperColumn(owneditem_1077 []),SuperColumn(owneditem_1084 []),SuperColumn(owneditem_1094 []),SuperColumn(owneditem_1130 []),SuperColumn(owneditem_1136 []),SuperColumn(owneditem_1141 []),SuperColumn(owneditem_1142 []),SuperColumn(owneditem_1145 []),SuperColumn(owneditem_1218 [636f6e6e6563746564:false:5@1329648704269002,63757272656e744865616c7468:false:3@1329648704269006,656e64436f6e737472756374696f6e54696d65:false:13@1329648704269007,6964:false:4@1329648704269000,6974656d4964:false:15@1329648704269001,6c61737444657374726f79656454696d65:false:1@1329648704269008,6c61737454696d65436f6c6c6563746564:false:13@1329648704269005,736b696e4964:false:7@1329648704269009,78:false:4@1329648704269003,79:false:3@1329648704269004,]),SuperColumn(owneditem_133 []),SuperColumn(owneditem_134 []),SuperColumn(owneditem_135 []),SuperColumn(owneditem_141 []),SuperColumn(owneditem_147 []),SuperColumn(owneditem_154 []),SuperColumn(owneditem_159 []),SuperColumn(owneditem_171 []),SuperColumn(owneditem_253 []),SuperColumn(owneditem_422 []),SuperColumn(owneditem_438 []),SuperColumn(owneditem_515 []),SuperColumn(owneditem_521 []),SuperColumn(owneditem_523 []),SuperColumn(owneditem_525 []),SuperColumn(owneditem_562 []),SuperColumn(owneditem_61 []),SuperColumn(owneditem_634 []),SuperColumn(owneditem_636 []),SuperColumn(owneditem_71 []),SuperColumn(owneditem_712 []),SuperColumn(owneditem_720 []),SuperColumn(owneditem_728 []),SuperColumn(owneditem_787 []),SuperColumn(owneditem_797 []),SuperColumn(owneditem_798 []),SuperColumn(owneditem_838 []),SuperColumn(owneditem_842 []),SuperColumn(owneditem_847 []),SuperColumn(owneditem_849 []),SuperColumn(owneditem_851 []),SuperColumn(owneditem_852 []),SuperColumn(owneditem_853 []),SuperColumn(owneditem_854 []),SuperColumn(owneditem_857 []),SuperColumn(owneditem_858 []),SuperColumn(owneditem_874 []),SuperColumn(owneditem_884 []),SuperColumn(owneditem_886 []),SuperColumn(owneditem_908 []),SuperColumn(owneditem_91 []),SuperColumn(owneditem_911 []),SuperColumn(owneditem_930 []),SuperColumn(owneditem_934 []),SuperColumn(owneditem_937 []),SuperColumn(owneditem_944 []),SuperColumn(owneditem_945 []),SuperColumn(owneditem_962 []),SuperColumn(owneditem_963 []),SuperColumn(owneditem_964 []),])])
    at org.apache.cassandra.utils.FBUtilities.serialize(FBUtilities.java:682)
    at org.apache.cassandra.db.RowMutation.getSerializedBuffer(RowMutation.java:279)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:122)
    at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:599)
    at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:49)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.lang.Thread.run(Thread.java:662)
WARN 10:51:54,302 Heap is 0.764063958911146 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN 10:51:54,303 Flushing CFS(Keyspace='Player', ColumnFamily='PlayerDetail') to relieve memory pressure
INFO 11:00:41,162 Started hinted handoff for token: 121529416757478022665490931225631504090 with IP: /10.16.96.212
INFO 11:00:41,163 Finished hinted handoff of 0 rows to endpoint /10.16.96.212
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor192]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor165]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor202]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor232]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor146]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor181]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor190]
[Unloading class
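The WARN lines above come from Cassandra's emergency memory valves. For reference, these are the cassandra.yaml settings involved; the values shown are, as far as I know, the 1.0 defaults:

flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6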