4/20 nodes get disproportionate amount of mutations
We've recently been having issues where, as soon as we start doing heavy writes (via hadoop), it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've replaced dead nodes.

When I say hammers, I look at nodetool tpstats: those 4 nodes have completed something like 70 million mutation stage events whereas the rest of the cluster have completed from 2-20 million mutation stage events. On those 4 nodes, the logs show the mutation stage backing up and a lot of read repair messages being dropped. It looks like quite a bit of flushing is going on and, consequently, automatic minor compactions.

We are running 0.7.8 and have about 34 column families (when counting secondary indexes as column families) so we can't get too large with our memtable throughput in mb. We would like to upgrade to 0.8.4 (not least because of JAMM) but it seems that something else is going on with our cluster if we are using RP and balanced initial tokens and still have 4 hot nodes.

Do these symptoms and context sound familiar to anyone? Does anyone have any suggestions as to how to address this kind of case - disproportionate write load?

Thanks,
Jeremy
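For readers unfamiliar with the "general spacing formula" mentioned above: for RandomPartitioner it is usually given as token_i = i * 2^127 / N for node i of N. The following is a minimal sketch under that assumption; the function name initial_tokens is made up for illustration, and it does not model the token offsets used when replacing dead nodes.

```python
def initial_tokens(node_count):
    """Evenly spaced initial tokens over RandomPartitioner's 0..2**127 range."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Integer arithmetic only: Python ints are arbitrary precision,
    # so there is no rounding error for 127-bit values.
    return [i * (2 ** 127) // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print one initial_token value per node for a 20-node ring.
    for index, token in enumerate(initial_tokens(20)):
        print(index, token)
```

For a 2-node ring this gives 0 and 85070591730234615865843651857942052864, which are the two tokens visible in the ring output quoted in the "Completely removing a node from the cluster" thread.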
Re: Avoid Simultaneous Minor Compactions?
Change one thing at a time and work out which metric you want to improve. I would start by reducing compaction_throughput_mb_per_sec.

Have a look in your logs for the Enqueuing flush of Memtable… messages, count up how many serialised bytes you are flushing and then check it against the advice in the yaml file.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 11:36 AM, Hefeng Yuan wrote:
Shall I lower this or increase it? Put another way: should we let compaction run longer while using less CPU, or let it finish faster with more CPU usage? The problem we're facing is that, with the default setting, compactions run slowly and also eat a lot of CPU in the meantime. I'm thinking about the following changes; does this make sense?
1. lower the compaction thread priority
2. shorten the compaction threshold to 2~20
3. lower compaction_throughput_mb_per_sec to 10
Thanks,
Hefeng

On Aug 22, 2011, at 8:09 AM, Jonathan Ellis wrote:
Specifically, look at compaction_throughput_mb_per_sec in cassandra.yaml

On Mon, Aug 22, 2011 at 12:39 AM, Ryan King r...@twitter.com wrote:
You should throttle your compactions to a sustainable level.
-ryan

On Sun, Aug 21, 2011 at 10:22 PM, Hefeng Yuan hfy...@rhapsody.com wrote:
We just noticed that at one time, 4 nodes were doing minor compaction together, and each of them took 20~60 minutes. We're on 0.8.1, 6 nodes, RF5. These simultaneous compactions slowed down the whole cluster; we use the local_quorum consistency level, so dynamic_snitch is not helping us. Aside from lowering the compaction thread priority, is there any other way to tell the cluster to hold off on compacting if other nodes are already compacting?
Thanks,
Hefeng

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: run Cassandra tutorial example
Did you sort this out? The #cassandra IRC room is a good place to get help as well.

I tried to build it first using mvn compile and got this different error:

[ERROR] Failed to execute goal on project cassandra-tutorial: Could not resolve dependencies for project com.datastax.tutorial:cassandra-tutorial:jar:1.0-SNAPSHOT: Could not find artifact me.prettyprint:hector-core:jar:0.8.0-2-SNAPSHOT - [Help 1]

There is a fix here...
https://github.com/zznate/cassandra-tutorial/pull/1

After that I did mvn compile and it worked. So I added the schema…

path-to-cassandra/bin/cassandra-cli --host localhost ~/code/github/datastax/cassandra-tutorial/npanxx_script.txt

And ran

mvn -e exec:java -Dexec.args=get -Dexec.mainClass=com.datastax.tutorial.TutorialRunner

Which output

HColumn(city=Austin)

Hope that helps.

- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 10:30 AM, Alvin UW wrote:
Hello,

I'd like to try the cassandra tutorial example: https://github.com/zznate/cassandra-tutorial by following the readme. After typing

mvn -e exec:java -Dexec.args=get -Dexec.mainClass=com.datastax.tutorial.TutorialRunner

I got the following errors. Should I do something before the above command? Thanks.

+ Error stacktraces are turned on.
[INFO] Scanning for projects...
[INFO]
[INFO] Building cassandra-tutorial
[INFO]    task-segment: [exec:java]
[INFO]
[INFO] Preparing exec:java
[INFO] No goals needed for project - skipping
[INFO] [exec:java {execution: default-cli}]
[INFO]
[ERROR] BUILD ERROR
[INFO]
[INFO] An exception occured while executing the Java class. com.datastax.tutorial.TutorialRunner
[INFO]
[INFO] Trace
org.apache.maven.lifecycle.LifecycleExecutionException: An exception occured while executing the Java class. com.datastax.tutorial.TutorialRunner
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:719)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:569)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:539)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
    at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
    at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
    at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
    at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.plugin.MojoExecutionException: An exception occured while executing the Java class. com.datastax.tutorial.TutorialRunner
    at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:346)
    at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
    ... 17 more
Caused by: java.lang.ClassNotFoundException: com.datastax.tutorial.TutorialRunner
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:284)
    at java.lang.Thread.run(Thread.java:619)
[INFO]
[INFO] Total time: 1 second
[INFO] Finished at: Mon Aug 22 18:24:05 EDT 2011
[INFO] Final Memory:
Re: 4/20 nodes get disproportionate amount of mutations
We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've replaced dead nodes.

Is the hadoop job iterating over keys in the cluster in token order perhaps, and you're generating writes to those keys? That would explain a moving hotspot along the cluster.

--
/ Peter Schuller (@scode on twitter)
Re: Completely removing a node from the cluster
I'm running low on ideas for this one. Anyone else?

If the phantom node is not listed in the ring, other nodes should not be storing hints for it. You can see which nodes they are storing hints for via JConsole.

You can try a rolling restart passing the JVM opt -Dcassandra.load_ring_state=false. However, if the phantom node is being passed around in the gossip state it will probably just come back again.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23/08/2011, at 3:49 PM, Bryce Godfrey wrote:
Could this ghost node be causing my hints column family to grow to this size? I also crash after about 24 hours due to commit log growth taking up all the drive space. A manual nodetool flush keeps it under control though.

Column Family: HintsColumnFamily
SSTable count: 6
Space used (live): 666480352
Space used (total): 666480352
Number of Keys (estimate): 768
Memtable Columns Count: 1043
Memtable Data Size: 461773
Memtable Switch Count: 3
Read Count: 38
Read Latency: 131.289 ms.
Write Count: 582108
Write Latency: 0.019 ms.
Pending Tasks: 0
Key cache capacity: 7
Key cache size: 6
Key cache hit rate: 0.8334
Row cache: disabled
Compacted row minimum size: 2816160
Compacted row maximum size: 386857368
Compacted row mean size: 120432714

Is there a way for me to manually remove this dead node?

-Original Message-
From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Sunday, August 21, 2011 9:09 PM
To: user@cassandra.apache.org
Subject: RE: Completely removing a node from the cluster

It's been at least 4 days now.

-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Sunday, August 21, 2011 3:16 PM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

I see the mistake I made about ring: it gets the endpoint list from the same place but uses the tokens to drive the whole process.

I'm guessing here, don't have time to check all the code. But there is a 3 day timeout in the gossip system. Not sure if it applies in this case.

Anyone know?

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 22/08/2011, at 6:23 AM, Bryce Godfrey wrote:
Both .2 and .3 list the same from the mbean: Unreachable is an empty collection, and Live nodes still lists all 3 nodes:
192.168.20.2
192.168.20.3
192.168.20.1

The removetoken was done a few days ago, and I believe the remove was done from .2

Here is what the ring output looks like; not sure why I get that token on the empty first line either:

Address         DC           Rack   Status  State   Load      Owns    Token
                                                                      85070591730234615865843651857942052864
192.168.20.2    datacenter1  rack1  Up      Normal  79.53 GB  50.00%  0
192.168.20.3    datacenter1  rack1  Up      Normal  42.63 GB  50.00%  85070591730234615865843651857942052864

Yes, both nodes show the same thing when doing a describe cluster, that .1 is unreachable.

-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Sunday, August 21, 2011 4:23 AM
To: user@cassandra.apache.org
Subject: Re: Completely removing a node from the cluster

Unreachable nodes either did not respond to the message or were known to be down and were not sent a message.

The way the node lists are obtained for the ring command and describe cluster is the same. So it's a bit odd.

Can you connect to JMX and have a look at the o.a.c.db.StorageService MBean? What do the LiveNodes and UnreachableNodes attributes say?

Also, how long ago did you remove the token and on which machine? Do both 20.2 and 20.3 think 20.1 is still around?

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 20/08/2011, at 9:48 AM, Bryce Godfrey wrote:
I'm on 0.8.4

I have removed a dead node from the cluster using the nodetool removetoken command, and moved one of the remaining nodes to rebalance the tokens. Everything looks fine when I run nodetool ring now, as it only lists the remaining 2 nodes and they both look fine, owning 50% of the tokens.

However, I can still see it being considered as part of the cluster from the Cassandra-cli (192.168.20.1 being the removed node) and I'm worried that the cluster is still queuing up hints for the node, or any other issues it may cause:

Cluster Information:
Re: Completely removing a node from the cluster
I ran into this. I also tried log_ring_state=false which also did not help. The way I got through this was to stop the entire cluster and start the nodes one-by-one. I realize this is not a practical solution for everyone, but if you can afford to stop the cluster for a few minutes, it's worth a try.
Re: 4/20 nodes get disproportionate amount of mutations
On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote:
We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've replaced dead nodes.

Is the hadoop job iterating over keys in the cluster in token order perhaps, and you're generating writes to those keys? That would explain a moving hotspot along the cluster.

Yes - we're iterating over all the keys of particular column families, doing joins using pig as we enrich and perform measure calculations. When we write, we're usually writing out for a certain small subset of keys which shouldn't have hotspots with RandomPartitioner afaict.

--
/ Peter Schuller (@scode on twitter)
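The expectation that writes for arbitrary keys should not concentrate on a few nodes under RandomPartitioner can be checked roughly offline. The sketch below is a simplified model, not Cassandra's code: it assumes the token is the absolute value of the key's MD5 digest read as a signed 128-bit integer, assumes evenly spaced node tokens, counts only the primary replica (replication is ignored), and uses made-up row keys and helper names.

```python
import bisect
import hashlib
from collections import Counter


def initial_tokens(node_count):
    """Evenly spaced node tokens over the 0..2**127 range."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


def key_token(key):
    """MD5-based token: absolute value of the digest as a signed 128-bit integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def owner_index(tokens, token):
    """Index of the first node whose token is >= the key token, wrapping to node 0."""
    return bisect.bisect_left(tokens, token) % len(tokens)


def writes_per_node(keys, node_count=20):
    """Number of keys whose primary replica is each node, in ring order."""
    tokens = initial_tokens(node_count)
    counts = Counter(owner_index(tokens, key_token(k)) for k in keys)
    return [counts.get(i, 0) for i in range(node_count)]


if __name__ == "__main__":
    # Hypothetical row keys; real ones would come from the job's output.
    sample_keys = [("user-%d" % i).encode("utf-8") for i in range(20000)]
    for node, count in enumerate(writes_per_node(sample_keys)):
        print(node, count)
```

If the real keys written by the job spread evenly in a check like this, the hot nodes are more likely explained by something other than key placement, for example a few very wide rows or read repair traffic.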
preloading entire CF with SEQ access on startup
Is there a way to preload an entire CF into the cache with sequential access when the server starts? I think the standard cache preloader uses random access, and because of that it's so slow that we can't use it.
Re: 4/20 nodes get disproportionate amount of mutations
Dropped messages in ReadRepair is odd. Are you also dropping mutations?

There are two tasks performed on the ReadRepair stage. The digests are compared on this stage, and secondly the repair happens on the stage. Comparing digests is quick. Doing the repair could take a bit longer: all the CFs returned are collated, filtered and deletes removed.

We don't do background Read Repair on range scans, they do have foreground digest checking though.

What CL are you using?

begin crazy theory:

Could there be a very big row that is out of sync? The increased RR would result in mutations being sent back to the replicas, which would give you a hot spot in mutations.

Check max compacted row size on the hot nodes.

Turn the logging up to DEBUG on the hot machines for o.a.c.service.RowRepairResolver and look for the resolve:… message; it has the time taken.

Cheers
- Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
Re: Completely removing a node from the cluster
I normally link to the data stax article to avoid having to actually write those words :) http://www.datastax.com/docs/0.8/troubleshooting/index#view-of-ring-differs-between-some-nodes A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com
multi-node cassandra config doubt
Hi All,

This is a question about multi-node cluster configuration. I have configured a 3-node cluster using Cassandra-0.8.4 and I get an error when I run a Map/Reduce job which uploads records from HDFS to Cassandra.

Here are the cluster config files (cassandra.yaml) for my 3 nodes:

node01:
seeds: node01,node02,node03
auto_bootstrap: false
listen_address: 192.168.0.1
rpc_address: 192.168.0.1

node02:
seeds: node01,node02,node03
auto_bootstrap: true
listen_address: 192.168.0.2
rpc_address: 192.168.0.2

node03:
seeds: node01,node02,node03
auto_bootstrap: true
listen_address: 192.168.0.3
rpc_address: 192.168.0.3

When I run the M/R program, I get the error below:

11/08/23 04:37:00 INFO mapred.JobClient: map 100% reduce 11%
11/08/23 04:37:06 INFO mapred.JobClient: map 100% reduce 22%
11/08/23 04:37:09 INFO mapred.JobClient: map 100% reduce 33%
11/08/23 04:37:14 INFO mapred.JobClient: Task Id : attempt_201104211044_0719_r_00_0, Status : FAILED
java.lang.NullPointerException
    at org.apache.cassandra.client.RingCache.getRange(RingCache.java:130)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:125)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter.write(ColumnFamilyRecordWriter.java:60)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:90)
    at CassTblUploader$TblUploadReducer.reduce(CassTblUploader.java:1)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:563)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Is anything wrong in my cassandra.yaml file? I followed http://wiki.apache.org/cassandra/MultinodeCluster for cluster configuration.

Regards,
Thamizhannal
Re: nodetool repair caused high disk space usage
On Sat, 20-08-2011 at 01:22 +0200, Peter Schuller wrote:
Is there any chance that the entire file from the source node got streamed to the destination node even though only a small amount of data in the file from the source node is supposed to be streamed to the destination node?

Yes, but the thing that's annoying me is that even if so - you should not be seeing a 40 gb - hundreds of gig increase even if all neighbors sent all their data.

I'm having the very same issue. Trying to repair a node with 90GB of data fills up the 1.5TB drive, and it's still trying to send more data. This is on 0.8.1.
Re: 4/20 nodes get disproportionate amount of mutations
On Aug 23, 2011, at 3:43 AM, aaron morton wrote:
Dropped messages in ReadRepair is odd. Are you also dropping mutations ? There are two tasks performed on the ReadRepair stage. The digests are compared on this stage, and secondly the repair happens on the stage. Comparing digests is quick. Doing the repair could take a bit longer, all the cf's returned are collated, filtered and deletes removed. We don't do background Read Repair on range scans, they do have foreground digest checking though. What CL are you using ?

CL.ONE for hadoop writes, CL.QUORUM for hadoop reads

begin crazy theory: Could there be a very big row that is out of sync ? The increased RR would be resulting in mutations been sent back to the replicas. Which would give you a hot spot in mutations. Check max compacted row size on the hot nodes. Turn the logging up to DEBUG on the hot machines for o.a.c.service.RowRepairResolver and look for the resolve:… message it has the time taken.

The max compacted size didn't seem unreasonable - about a MB. I turned up logging to DEBUG for that class and I get plenty of dropped READ_REPAIR messages, but nothing coming out of DEBUG in the logs to indicate the time taken that I can see.
Re: how to know if nodetool cleanup is safe?
On 21 August 2011 12:34, Yan Chunlu springri...@gmail.com wrote:
since nodetool cleanup could remove hinted handoff, will it cause data loss?

Hi Yan,

Hints are not guaranteed to be delivered and nodetool cleanup is one of the reasons for that. This will only cause data loss if you are writing at CL.ANY, where a hint counts as a write. If you are writing at CL.ONE or above then at least one replica must receive the data for the write to succeed, so losing hints will not cause data loss.

If a hint is not delivered then the replica for which it was intended will become consistent after a read repair, or after a manual anti-entropy repair.

Sam

--
Sam Overton
Acunu | http://www.acunu.com | @acunu
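The reasoning above about which consistency levels can lose data when hints are dropped can be written down as a small model. This is an illustrative sketch only, not Cassandra's implementation: the function names are invented, only four consistency levels are modelled, and timeouts and multi-datacenter levels are ignored.

```python
def required_acks(consistency_level, replication_factor):
    """Replica acknowledgements needed before a write is reported as successful."""
    if consistency_level == "ANY":
        return 0  # a stored hint alone counts as a write
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def write_succeeds(consistency_level, replication_factor, live_replicas):
    """True if the client is told the write succeeded in this simplified model."""
    return live_replicas >= required_acks(consistency_level, replication_factor)


def lost_if_hints_dropped(consistency_level, replication_factor, live_replicas):
    """True if an acknowledged write exists only as a hint, so dropping hints loses it."""
    acknowledged = write_succeeds(consistency_level, replication_factor, live_replicas)
    return acknowledged and live_replicas == 0


if __name__ == "__main__":
    for level in ("ANY", "ONE", "QUORUM"):
        for live in (0, 1, 2):
            print(level, live, lost_if_hints_dropped(level, 3, live))
```

In this model the only acknowledged write that can vanish with its hint is one written at CL.ANY while no replica was reachable; at CL.ONE or above the same situation makes the write fail instead, so the client knows to retry.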
Different Load values after stress test runs....
Hi, we're running some performance tests against some clusters and I'm curious about some of the numbers I see.

I'm running the stress test against two identically configured clusters, but after I run a stress test, I get different Load values across the clusters. The difference between the two clusters is that one uses standard EC2 interfaces, but the other runs on a virtual network. Are these differences indicating something that I should be aware of?

Here is a sample of the kinds of results I'm seeing.

Address         DC   Rack  Status  State   Load       Owns    Token
                                                              12760588759xxx
10.0.0.17       DC1  RAC1  Up      Normal  94 MB      25.00%  0
10.0.0.18       DC1  RAC1  Up      Normal  104.52 MB  25.00%  42535295865xxx
10.0.0.19       DC1  RAC1  Up      Normal  78.58 MB   25.00%  85070591730xxx
10.0.0.20       DC1  RAC1  Up      Normal  78.58 MB   25.00%  12760588759xxx

Address         DC   Rack  Status  State   Load       Owns    Token
                                                              12760588759xxx
10.120.35.52    DC1  RAC1  Up      Normal  103.74 MB  25.00%  0
10.120.6.124    DC1  RAC1  Up      Normal  118.99 MB  25.00%  42535295865xxx
10.127.90.142   DC1  RAC1  Up      Normal  104.26 MB  25.00%  85070591730xxx
10.94.69.237    DC1  RAC1  Up      Normal  75.74 MB   25.00%  12760588759xxx

The first cluster with the vNet (10.0.0.0/28 addresses) consistently shows smaller Load values: a total Load of 355MB vs. 402MB with native EC2 interfaces. Is a total Load value even meaningful? The stress test is the very first thing that's run against the clusters.

[I'm also a little puzzled that these numbers are not uniform within the clusters, but I suspect that's because the stress test is using a key distribution that is Gaussian. I'm not 100% sure of this either since I've seen conflicting documentation. Haven't tried 'random' keys, but I presume that would change them to be uniform]

Except for these curious Load numbers, things seem to be running just fine. Getting good fast results. Over 10 iterations I'm getting more than 10-12K inserts per sec. (default values for the stress test).

Should I expect the Load to be the same across different clusters? What might explain the differences I'm seeing?

Thanks in advance.

CM
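The totals quoted in the message can be reproduced from the per-node Load values in the two ring listings. The sketch below only redoes that arithmetic; the dictionary and function names are illustrative, and it says nothing about why the clusters differ.

```python
# Per-node Load in MB, copied from the two ring listings in the message.
vnet_loads = {
    "10.0.0.17": 94.0,
    "10.0.0.18": 104.52,
    "10.0.0.19": 78.58,
    "10.0.0.20": 78.58,
}
ec2_loads = {
    "10.120.35.52": 103.74,
    "10.120.6.124": 118.99,
    "10.127.90.142": 104.26,
    "10.94.69.237": 75.74,
}


def total_load(loads):
    """Sum of the per-node Load values, in MB."""
    return sum(loads.values())


def relative_difference(baseline, other):
    """How much larger `other` is than `baseline`, as a fraction of `baseline`."""
    return (other - baseline) / baseline


if __name__ == "__main__":
    vnet_total = total_load(vnet_loads)
    ec2_total = total_load(ec2_loads)
    print("vNet total MB: %.2f" % vnet_total)
    print("EC2 total MB:  %.2f" % ec2_total)
    print("EC2 vs vNet:   %.1f%%" % (100 * relative_difference(vnet_total, ec2_total)))
```

The sums come to about 355.7 MB for the vNet cluster and 402.7 MB for the native EC2 cluster, so the EC2 cluster reports roughly 13% more Load in this sample.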
Re: run Cassandra tutorial example
Thanks. With your solution, the problem is fixed. But my output looks like this. I don't understand what "Error stacktraces are turned on" means. Also, I would like only the results to be output, not the INFO lines.

mvn -e exec:java -Dexec.args=get -Dexec.mainClass=com.datastax.tutorial.TutorialRunner
+ Error stacktraces are turned on.
[INFO] Scanning for projects...
[INFO]
[INFO] Building cassandra-tutorial
[INFO]    task-segment: [exec:java]
[INFO]
[INFO] Preparing exec:java
[INFO] No goals needed for project - skipping
[INFO] [exec:java {execution: default-cli}]
HColumn(city=Austin)
[INFO]
[INFO] BUILD SUCCESSFUL
[INFO]
[INFO] Total time: 1 second
[INFO] Finished at: Tue Aug 23 12:02:27 EDT 2011
[INFO] Final Memory: 11M/100M
Re: Different Load values after stress test runs....
Have you run repair on the nodes? Maybe some data was lost and not repaired yet? Philippe 2011/8/23 Chris Marino ch...@vcider.com Hi, we're running some performance tests against some clusters and I'm curious about some of the numbers I see. I'm running the stress test against two identically configured clusters, but after I run the stress test, I get different Load values across the clusters. The difference between the two clusters is that one uses standard EC2 interfaces, but the other runs on a virtual network. Are these differences indicating something that I should be aware of? Here is a sample of the kinds of results I'm seeing.
Address         DC   Rack  Status  State   Load       Owns    Token
                                                              12760588759xxx
10.0.0.17       DC1  RAC1  Up      Normal  94 MB      25.00%  0
10.0.0.18       DC1  RAC1  Up      Normal  104.52 MB  25.00%  42535295865xxx
10.0.0.19       DC1  RAC1  Up      Normal  78.58 MB   25.00%  85070591730xxx
10.0.0.20       DC1  RAC1  Up      Normal  78.58 MB   25.00%  12760588759xxx
Address         DC   Rack  Status  State   Load       Owns    Token
                                                              12760588759xxx
10.120.35.52    DC1  RAC1  Up      Normal  103.74 MB  25.00%  0
10.120.6.124    DC1  RAC1  Up      Normal  118.99 MB  25.00%  42535295865xxx
10.127.90.142   DC1  RAC1  Up      Normal  104.26 MB  25.00%  85070591730xxx
10.94.69.237    DC1  RAC1  Up      Normal  75.74 MB   25.00%  12760588759xxx
The first cluster, with the vNet (10.0.0.0/28 addresses), consistently shows smaller Load values: a total Load of 355MB vs. 402MB with native EC2 interfaces. Is a total Load value even meaningful? The stress test is the very first thing that's run against the clusters. [I'm also a little puzzled that these numbers are not uniform within the clusters, but I suspect that's because the stress test is using a key distribution that is Gaussian. I'm not 100% sure of this either since I've seen conflicting documentation. Haven't tried 'random' keys, but I presume that would change them to be uniform.] Except for these curious Load numbers, things seem to be running just fine. Getting good fast results. Over 10 iterations I'm getting more than 10-12K inserts per sec (default values for the stress test). Should I expect the Load to be the same across different clusters? What might explain the differences I'm seeing? Thanks in advance. CM
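To put numbers on the comparison in this message, here is a small sketch that totals the Load column from the two ring listings quoted in it and reports the spread within each cluster. The values are copied from the quoted nodetool output; the variable and function names are made up for the illustration, and the script says nothing about why the totals differ.

```python
# Load values in MB, copied from the two ring listings quoted in the message.
vnet_load_mb = {
    "10.0.0.17": 94.0,
    "10.0.0.18": 104.52,
    "10.0.0.19": 78.58,
    "10.0.0.20": 78.58,
}
ec2_load_mb = {
    "10.120.35.52": 103.74,
    "10.120.6.124": 118.99,
    "10.127.90.142": 104.26,
    "10.94.69.237": 75.74,
}


def summarize(load_mb):
    """Return (total, spread) in MB, where spread is max minus min."""
    values = list(load_mb.values())
    total = round(sum(values), 2)
    spread = round(max(values) - min(values), 2)
    return total, spread


vnet_total, vnet_spread = summarize(vnet_load_mb)
ec2_total, ec2_spread = summarize(ec2_load_mb)
difference = round(ec2_total - vnet_total, 2)

print(f"vNet cluster: total {vnet_total} MB, spread {vnet_spread} MB")
print(f"EC2 cluster:  total {ec2_total} MB, spread {ec2_spread} MB")
print(f"difference between totals: {difference} MB")
```

The totals match the rounded 355MB and 402MB figures in the message. The spread inside each cluster (about 26 MB and 43 MB) is of the same order as the gap between the two totals, so a single run may not be enough to tell a network effect from ordinary variation.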
Customized Secondary Index Schema
Hello, As mentioned by Ed Anuff in his blog and slides, one way to build a customized secondary index is to use one CF in which each row represents a secondary index, with the secondary index name as the row key. For example:
Indexes = {
  User_Keys_By_Last_Name : {
    adams : e5d61f2b-…,
    alden : e80a17ba-…,
    anderson : e5d61f2b-…,
    davis : e719962b-…,
    doe : e78ece0f-…,
    franks : e66afd40-…,
    … : …,
  }
}
But because placement is determined by the row key, the whole secondary index is stored on a single node (plus its replicas). All the queries against this secondary index will go to that node or its replicas. Do you think this is a scalability problem, and is there a better way to solve it? Thanks.
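One commonly discussed way to spread such an index is to split the single wide row into several rows by appending a bucket number to the row key. The sketch below only illustrates the key layout with plain Python dictionaries; it does not talk to Cassandra. The bucket count, the `bucket_for` helper, and the `User_Keys_By_Last_Name:<n>` key format are assumptions made up for this example, not something taken from Ed Anuff's material.

```python
import hashlib

NUM_BUCKETS = 4  # hypothetical; chosen only for the illustration

# Stand-in for the index CF: row key -> {column name: column value}.
indexes = {}


def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Map an indexed value to a stable bucket number."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def index_row_key(index_name, value):
    """Row key for the bucket that holds this indexed value."""
    return f"{index_name}:{bucket_for(value)}"


def add_entry(index_name, value, target_key):
    """Store value -> target_key in the bucket row for this value."""
    row = indexes.setdefault(index_row_key(index_name, value), {})
    row[value] = target_key


def lookup(index_name, value):
    """Exact-match lookup: only one bucket row has to be read."""
    return indexes.get(index_row_key(index_name, value), {}).get(value)


# Entries from the example in the message (truncated IDs kept as posted).
for last_name, user_key in [
    ("adams", "e5d61f2b-…"),
    ("alden", "e80a17ba-…"),
    ("doe", "e78ece0f-…"),
]:
    add_entry("User_Keys_By_Last_Name", last_name, user_key)

print(lookup("User_Keys_By_Last_Name", "adams"))
print(sorted(indexes))
```

The trade-off in this layout is that an exact-match lookup reads one bucket row, while a range scan over last names has to query every bucket and merge the results on the client.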
Re: how to know if nodetool cleanup is safe?
On Tue, Aug 23, 2011 at 11:56 AM, Sam Overton sover...@acunu.com wrote: On 21 August 2011 12:34, Yan Chunlu springri...@gmail.com wrote: Since nodetool cleanup could remove hinted handoff, will it cause data loss? Hi Yan, Hints are not guaranteed to be delivered and nodetool cleanup is one of the reasons for that. This will only cause data loss if you are writing at CL.ANY, where a hint counts as a write. If you are writing at CL.ONE or above then at least one replica must receive the data for the write to succeed, so losing hints will not cause data loss. If a hint is not delivered then the replica for which it was intended will become consistent after a read repair, or after a manual anti-entropy repair. Sam -- Sam Overton Acunu | http://www.acunu.com | @acunu If you run nodetool tpstats on each node in your cluster and none of them has active or pending threads in the Hinted stage, then no hints are currently being delivered. But as pointed out above, Hinted Handoff is a best-effort system.
Re: Completely removing a node from the cluster
On Tue, Aug 23, 2011 at 2:26 AM, aaron morton aa...@thelastpickle.com wrote: I'm running low on ideas for this one. Anyone else ? If the phantom node is not listed in the ring, other nodes should not be storing hints for it. You can see what nodes they are storing hints for via JConsole. I think I found it in https://issues.apache.org/jira/browse/CASSANDRA-3071 --Brandon
Re: How can I patch a single issue
Thanks Jonathan, and thanks Peter. How do you guys use the mailing list? I'm using a mail client and this e-mail didn't get grouped into the thread until I found it today... On Aug 19, 2011, at 12:27 PM, Jonathan Ellis wrote: I think this is what you want: https://github.com/stuhood/cassandra/tree/file-format-and-promotion On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller peter.schul...@infidyne.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-674 But when I downloaded the patch file I can't find the correct trunk to patch... Check it out from git (or svn) and apply to trunk. I'm not sure whether it still applies cleanly; given the size of the patch I wouldn't be surprised if some rebasing is necessary. You might try a trunk from further back in time (around the time Stu submitted the patch). I'm not quite sure what your actual problem is though; if it's source code access then the easiest route is probably to check it out from https://github.com/apache/cassandra -- / Peter Schuller (@scode on twitter) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Solandra error - spaces in search
We are getting an error in our Solandra search when the search string contains a space. Is anyone else seeing this? *Net::HTTPFatalError*: 500 null java. lang.ArrayIndexOutOfBoundsException null java.lang.ArrayIndexOutOfBoundsException request: http://10.103.1.70:8983/solandra/users~27/selectorg.apache.solr.common.SolrException: null java.lang.ArrayIndexOutOfBoundsException null java.lang.ArrayIndexOutOfBoundsException request: http://10.103.1.70:8983/solandra/users~27/select \tat org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435) \tat org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) \tat org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) \tat org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) \tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) \tat java.util.concurrent.FutureTask.run(FutureTask.java:138) \tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) \tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) \tat java.util.concurrent.FutureTask.run(FutureTask.java:138) \tat java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) \tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) \tat java.lang.Thread.run(Thread.java:662)
RE: Completely removing a node from the cluster
Taking the cluster down completely did remove the phantom node. The HintsColumnFamily is causing a lot of commit logs to back up, and the commit log drive is in danger of running out of space. A manual flush of that column family always clears out the files though. -Original Message- From: Brandon Williams [mailto:dri...@gmail.com] Sent: Tuesday, August 23, 2011 10:42 AM To: user@cassandra.apache.org Subject: Re: Completely removing a node from the cluster On Tue, Aug 23, 2011 at 2:26 AM, aaron morton aa...@thelastpickle.com wrote: I'm running low on ideas for this one. Anyone else ? If the phantom node is not listed in the ring, other nodes should not be storing hints for it. You can see what nodes they are storing hints for via JConsole. I think I found it in https://issues.apache.org/jira/browse/CASSANDRA-3071 --Brandon
checksumming
Are checksum errors detected in Cassandra and if so how are they resolved?
cassandra unexpected shutdown
Hi, I'm running a 16-node cassandra cluster, with a reasonably large amount of data per node (~1TB). Nodes have 16G ram, but heap is set to 8G. The nodes keep stopping with this output in the log. Any ideas? ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-85,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [ReadStage:568] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:568,5,main] java.lang.OutOfMemoryError: Java heap space INFO [HintedHandoff:1] 2011-08-23 21:00:38,720 HintedHandOffManager.java (line 320) Started hinted handoff for endpoint /10.28.0.184 INFO [GossipStage:2] 2011-08-23 21:00:50,751 Gossiper.java (line 606) InetAddress /10.29.20.67 is now UP ERROR [Thread-34] 2011-08-23 21:00:50,525 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-34,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) ERROR [Thread-36] 2011-08-23 21:00:50,518 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-36,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at 
org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) INFO [GossipTasks:1] 2011-08-23 21:00:50,466 Gossiper.java (line 620) InetAddress /10.29.20.67 is now dead. INFO [HintedHandoff:1] 2011-08-23 21:00:50,751 HintedHandOffManager.java (line 376) Finished hinted handoff of 0 rows to endpoint /10.28.0.184 ERROR [Thread-33] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-33,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) ERROR [Thread-128] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-128,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) root@cass1:~#
Re: Could Not connect to cassandra-cli on windows
Hi Aaron, We are using Thrift 5. This is how we connect:
    TSocket _tr = new TSocket(server.Host, server.Port); // localhost, 9160
    _transport = new TFramedTransport(_tr);
    _protocol = new TBinaryProtocol(_transport);
    _client = new Cassandra.Client(_protocol);
Do you have any clue on what could cause the first exception? Thanks and Regards. Alaa On 8/18/2011 3:59 AM, aaron morton wrote: IIRC cassandra 0.7 needs thrift 0.5, are you using that version ? Perhaps try grabbing the cassandra 0.7 version for one of the pre built clients (pycassa, hector etc) to check things work and then check you are using the same thrift version. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 18/08/2011, at 4:03 PM, Alaa Zubaidi wrote: Hi Aaron, Thanks for the reply. I am running 0.7.4 and no client library. The error was reported by the application when it failed to connect, and it happens when 2 threads are trying to connect at the same time. When I checked the cassandra log I found these errors. Thanks Alaa On 8/17/2011 4:29 PM, aaron morton wrote: What client, what version, what version of cassandra are you using ? Looks like you are connecting with an old version of thrift, like the message says. Check the client you are using was made for cassandra 0.8. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 18/08/2011, at 7:27 AM, Alaa Zubaidi wrote: Hi, I see this error while the application tries to connect to cassandra at the same time from 2 different threads. Any clues? ERROR [pool-1-thread-13] 2011-07-29 06:46:45,718 CustomTThreadPoolServer.java (line 222) Error occurred during processing of message. 
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111 at java.lang.String.checkBounds(String.java:397) at java.lang.String.init(String.java:442) at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:210) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2543) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) ERROR [pool-1-thread-11] 2011-07-29 06:53:21,921 CustomTThreadPoolServer.java (line 218) Thrift error occurred during processing of message. org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client? at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213) at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2543) at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Thanks, Alaa -- Alaa Zubaidi PDF Solutions, Inc. 333 West San Carlos Street, Suite 700 San Jose, CA 95110 USA Tel: 408-283-5639 (or 408-280-7900 x5639) fax: 408-938-6479 email: alaa.zuba...@pdf.com -- Alaa Zubaidi PDF Solutions, Inc. 333 West San Carlos Street, Suite 700 San Jose, CA 95110 USA Tel: 408-283-5639 (or 408-280-7900 x5639) fax: 408-938-6479 email: alaa.zuba...@pdf.com
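A note on the number in the first exception: reinterpreted as an unsigned 32-bit value, -2147418111 is the bit pattern that TBinaryProtocol writes at the start of a strict-mode message (version marker plus message type). The sketch below just does that arithmetic; the helper name `decode_header_word` is made up for the illustration. What the result implies is a guess, not something confirmed in this thread: the server seems to have read a message header where it expected a string length, which could happen if bytes from two requests were interleaved on one connection.

```python
import struct

# Constants as defined by Thrift's TBinaryProtocol and TMessageType.
VERSION_1 = 0x80010000
VERSION_MASK = 0xFFFF0000
TYPE_MASK = 0x000000FF
MESSAGE_TYPES = {1: "CALL", 2: "REPLY", 3: "EXCEPTION", 4: "ONEWAY"}


def decode_header_word(signed_value):
    """Reinterpret a signed 32-bit int as a TBinaryProtocol header word."""
    (unsigned,) = struct.unpack(">I", struct.pack(">i", signed_value))
    return {
        "hex": f"0x{unsigned:08X}",
        "is_version_1": (unsigned & VERSION_MASK) == VERSION_1,
        "message_type": MESSAGE_TYPES.get(unsigned & TYPE_MASK, "unknown"),
    }


# The "String index out of range" value from the server log above.
decoded = decode_header_word(-2147418111)
print(decoded)
```

If the interleaving guess is right, giving each thread its own TSocket, TFramedTransport, and Cassandra.Client, rather than sharing one set between threads, would be the thing to check first.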
Re: cassandra unexpected shutdown
Thanks, We had already been running cassandra with a larger heap size, but it meant that java took way too long between garbage collections. The advice I'd found was to set the heap size at the 8 we're running at. It was ok for a while, but now some nodes crash. It's definitely our experience that adding more memory per node actually makes things worse eventually, as java starts eating up too many resources for it to handle. On 8/23/11 5:28 PM, Adi wrote: 2011/8/23 Ernst D Schoen-Renéer...@peoplebrowsr.com: Hi, I'm running a 16-node cassandra cluster, with a reasonably large amount of data per node (~1TB). Nodes have 16G ram, but heap is set to 8G. The nodes keep stopping with this output in the log. Any ideas? ERROR [Thread-85] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-85,5,main] java.lang.OutOfMemoryError: Java heap space ERROR [ReadStage:568] 2011-08-23 21:00:38,723 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:568,5,main] java.lang.OutOfMemoryError: Java heap space INFO [HintedHandoff:1] 2011-08-23 21:00:38,720 HintedHandOffManager.java (line 320) Started hinted handoff for endpoint /10.28.0.184 INFO [GossipStage:2] 2011-08-23 21:00:50,751 Gossiper.java (line 606) InetAddress /10.29.20.67 is now UP ERROR [Thread-34] 2011-08-23 21:00:50,525 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-34,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) ERROR 
[Thread-36] 2011-08-23 21:00:50,518 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-36,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) INFO [GossipTasks:1] 2011-08-23 21:00:50,466 Gossiper.java (line 620) InetAddress /10.29.20.67 is now dead. INFO [HintedHandoff:1] 2011-08-23 21:00:50,751 HintedHandOffManager.java (line 376) Finished hinted handoff of 0 rows to endpoint /10.28.0.184 ERROR [Thread-33] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-33,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) ERROR [Thread-128] 2011-08-23 21:01:05,048 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-128,5,main] java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73) at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:444) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117) root@cass1:~# You can try the cargo cult solution of upping the heap to 12GB and see if the nodes stabilize. We have a 4-node cluster with 2-3 TB of data per node, and that was the heap size at which the nodes managed to serve requests without running out of memory. Ultimately we ordered more memory and are running it with a 24 GB heap, and the cluster has been stable without complaints. Other things you can do to reduce memory usage, if they are appropriate for your read/write profile: a) reduce memtable throughput (largest reduction in memory footprint) b) disable row caching c) reduce/disable key caching (smallest reduction) Ultimately you will have to tune
Re: How can I patch a single issue
@Jonathan: I applied the CASSANDRA-2530 patch on this version, and tested it for our finance-related use case. It greatly reduced disk consumption, using only 20% of the original space for our finance-related data. The performance is better than MySQL, and it consumes only 1x more space than MySQL, much better than previous versions. On Aug 19, 2011, at 12:27 PM, Jonathan Ellis wrote: I think this is what you want: https://github.com/stuhood/cassandra/tree/file-format-and-promotion On Fri, Aug 19, 2011 at 1:28 PM, Peter Schuller peter.schul...@infidyne.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-674 But when I downloaded the patch file I can't find the correct trunk to patch... Check it out from git (or svn) and apply to trunk. I'm not sure whether it still applies cleanly; given the size of the patch I wouldn't be surprised if some rebasing is necessary. You might try a trunk from further back in time (around the time Stu submitted the patch). I'm not quite sure what your actual problem is though; if it's source code access then the easiest route is probably to check it out from https://github.com/apache/cassandra -- / Peter Schuller (@scode on twitter) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Solandra error - spaces in search
INFO [769787724@qtp-311722089-9825] 2011-08-23 22:07:53,750 SolrCore.java (line 1370) [users] webapp=/solandra path=/select params={fl=*,scorestart=0q=+(+(first_name:hatice^1.2)+(first_name:hatice~0.9^1.0)++)+AND+(+(last_name:ali^3.0)+(last_name:ali~0.9^2.1)++)+wt=rubyqt=standardrows=1} status=500 QTime=465 ERROR [1745740420@qtp-311722089-9795] 2011-08-23 22:07:53,875 SolrException.java (line 151) java.lang.ArrayIndexOutOfBoundsException: 4 at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:310) at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:116) at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:179) at org.apache.lucene.search.DisjunctionSumScorer.advance(DisjunctionSumScorer.java:229) at org.apache.lucene.search.BooleanScorer2.advance(BooleanScorer2.java:320) at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:99) at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:72) at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:33) at org.apache.lucene.search.BooleanScorer2$2.init(BooleanScorer2.java:173) at org.apache.lucene.search.BooleanScorer2.countingConjunctionSumScorer(BooleanScorer2.java:173) at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorerSomeReq(BooleanScorer2.java:234) at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorer(BooleanScorer2.java:211) at org.apache.lucene.search.BooleanScorer2.init(BooleanScorer2.java:101) at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:328) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:527) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:323) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1177) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1065) at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:171) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:137) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Single node cassandra
I'm looking for advice on running Cassandra 0.8+ on a single node. Would love to hear stories about how much RAM you succeeded with, etc. Currently we are running with a 4GB heap size. Hardware is 4 cores and 8GB physical memory. We're not opposed to going to 16GB of memory or even 32GB. We have fast 6 Gb/s SAS drives, although we currently do not have a dedicated drive for the commit log; we are thinking about trying this out. Our application has time series data, which means our data will get larger over time, though not in all CFs. We have 7 hot CFs, but they are probably not that hot to someone running a sizable cluster. We have 15 CFs total in our keyspace. I've already tweaked the key cache and in some cases added a large enough row cache to keep an entire CF in memory where appropriate. Basically, I'm just curious whether, under this sort of load, we can expect decent results with a single node as the data gets larger. I know this isn't necessarily a binary answer, just curious about people's experiences on a single node. Thanks, -Derek
cfstats:
Keyspace: test
  Read Count: 2869777
  Read Latency: 1.2266526890416922 ms.
  Write Count: 7672036
  Write Latency: 0.016663120975970395 ms.
  Pending Tasks: 0
    Column Family: DocData
    SSTable count: 1
    Space used (live): 425090
    Space used (total): 425090
    Number of Keys (estimate): 128
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 0
    Read Count: 162
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Tasks: 0
    Key cache capacity: 1000
    Key cache size: 27
    Key cache hit rate: NaN
    Row cache: disabled
    Compacted row minimum size: 771
    Compacted row maximum size: 315852
    Compacted row mean size: 18290
    Column Family: User
    SSTable count: 1
    Space used (live): 5368
    Space used (total): 5368
    Number of Keys (estimate): 128
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 0
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 0
    Write Latency: NaN ms.
    Pending Tasks: 0
    Key cache capacity: 1000
    Key cache size: 0
    Key cache hit rate: NaN
    Row cache capacity: 1000
    Row cache size: 0
    Row cache hit rate: NaN
    Compacted row minimum size: 447
    Compacted row maximum size: 535
    Compacted row mean size: 535
    Column Family: SMapping
    SSTable count: 1
    Space used (live): 536900
    Space used (total): 536900
    Number of Keys (estimate): 768
    Memtable Columns Count: 9528
    Memtable Data Size: 4646940
    Memtable Switch Count: 0
    Read Count: 763
    Read Latency: NaN ms.
    Write Count: 763
    Write Latency: NaN ms.
    Pending Tasks: 0
    Key cache capacity: 1000
    Key cache size: 763
    Key cache hit rate: NaN
    Row cache capacity: 1000
    Row cache size: 763
    Row cache hit rate: NaN
    Compacted row minimum size: 447
    Compacted row maximum size: 73457
    Compacted row mean size: 713
    Column Family: LObject
    SSTable count: 2
    Space used (live): 6945430
    Space used (total): 6945430
    Number of Keys (estimate): 10752
    Memtable Columns Count: 729638
    Memtable Data Size: 400799230
    Memtable Switch Count: 2
    Read Count: 316756
    Read Latency: NaN ms.
    Write Count: 147028
    Write Latency: NaN ms.
    Pending Tasks: 0
    Key cache capacity: 2
    Key cache size: 2
    Key cache hit rate: NaN
    Row cache capacity: 2
    Row cache size: 5304
    Row cache hit rate: NaN
    Compacted row minimum size: 259
    Compacted row maximum size: 182785
    Compacted row mean size: 661
    Column Family: SMeta
    SSTable count: 2
    Space used (live): 173059487
    Space used (total): 343520951
    Number of Keys (estimate): 165760
    Memtable Columns
Re: Solandra error - spaces in search
Thx for the info. I'll try to reproduce.

On Aug 23, 2011, at 9:28 PM, Ashley Martens amart...@ngmoco.com wrote:

INFO [769787724@qtp-311722089-9825] 2011-08-23 22:07:53,750 SolrCore.java (line 1370) [users] webapp=/solandra path=/select params={fl=*,score&start=0&q=+(+(first_name:hatice^1.2)+(first_name:hatice~0.9^1.0)++)+AND+(+(last_name:ali^3.0)+(last_name:ali~0.9^2.1)++)+&wt=ruby&qt=standard&rows=1} status=500 QTime=465

ERROR [1745740420@qtp-311722089-9795] 2011-08-23 22:07:53,875 SolrException.java (line 151) java.lang.ArrayIndexOutOfBoundsException: 4
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:310)
    at org.apache.lucene.util.ScorerDocQueue.topScore(ScorerDocQueue.java:116)
    at org.apache.lucene.search.DisjunctionSumScorer.advanceAfterCurrent(DisjunctionSumScorer.java:179)
    at org.apache.lucene.search.DisjunctionSumScorer.advance(DisjunctionSumScorer.java:229)
    at org.apache.lucene.search.BooleanScorer2.advance(BooleanScorer2.java:320)
    at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:99)
    at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:72)
    at org.apache.lucene.search.ConjunctionScorer.init(ConjunctionScorer.java:33)
    at org.apache.lucene.search.BooleanScorer2$2.init(BooleanScorer2.java:173)
    at org.apache.lucene.search.BooleanScorer2.countingConjunctionSumScorer(BooleanScorer2.java:173)
    at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorerSomeReq(BooleanScorer2.java:234)
    at org.apache.lucene.search.BooleanScorer2.makeCountingSumScorer(BooleanScorer2.java:211)
    at org.apache.lucene.search.BooleanScorer2.init(BooleanScorer2.java:101)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:328)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:527)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:323)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1177)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1065)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at solandra.SolandraDispatchFilter.execute(SolandraDispatchFilter.java:171)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at solandra.SolandraDispatchFilter.doFilter(SolandraDispatchFilter.java:137)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Memory overhead of vector clocks…. how often are they pruned?
I had a thread going the other day about vector clock memory usage, where it came up that a vector clock is a series of (client id, clock) entries with a timestamp, and that old entries can be pruned. I'm specifically curious how often old entries are pruned.

If you're storing small columns within Cassandra, say just an integer, the vector clock overhead could easily take up far more space than the data that is actually in your database. However, if the clocks are pruned, then this shouldn't really be a problem.

How much memory is this wasting? Thoughts?

Jonathan Ellis jbel...@gmail.com wrote on Aug 19:

The problem with naive last write wins is that writes don't always arrive at each replica in the same order. So no, that's a non-starter.

Vector clocks are a series of (client id, clock) entries, and usually a timestamp so you can prune old entries. Obviously implementations can vary, but to pick a specific example, Voldemort [1] uses 2 bytes per client id, a variable number (at least one) of bytes for the clock, and 8 bytes for the timestamp.

[1] https://github.com/voldemort/voldemort/blob/master/src/java/voldemort/versioning/VectorClock.java

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
Skype-in: *(415) 871-0687*
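
To put rough numbers on the overhead question, here is a minimal Python sketch based on the Voldemort figures quoted above (2 bytes per client id, at least 1 byte per clock, 8 bytes for the timestamp). It assumes a single timestamp per vector clock and ignores any length or header bytes. The function names are hypothetical, and this is not Voldemort's or Cassandra's actual serialization code.

```python
def vector_clock_bytes(num_entries, clock_bytes=1, id_bytes=2, timestamp_bytes=8):
    """Estimate the size in bytes of one vector clock.

    Assumption: each (client id, clock) entry costs id_bytes + clock_bytes,
    and the clock as a whole carries one timestamp. Header bytes are ignored.
    """
    if num_entries < 0:
        raise ValueError("num_entries must be non-negative")
    if clock_bytes < 1:
        raise ValueError("a clock counter needs at least one byte")
    return num_entries * (id_bytes + clock_bytes) + timestamp_bytes


def overhead_ratio(value_bytes, num_entries, **sizes):
    """Return the vector clock size divided by the size of the stored value."""
    if value_bytes <= 0:
        raise ValueError("value_bytes must be positive")
    return vector_clock_bytes(num_entries, **sizes) / value_bytes


if __name__ == "__main__":
    # Compare against a 4-byte integer column for a few clock sizes.
    for entries in (1, 3, 10):
        size = vector_clock_bytes(entries)
        ratio = overhead_ratio(4, entries)
        print(f"{entries} entries: {size} bytes, {ratio:.2f}x a 4-byte value")
```

Under these assumptions, a clock with three entries takes 17 bytes, roughly four times the size of a 4-byte integer, and the cost grows linearly with the number of unpruned entries.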