Re: Stress test using Java-based stress utility
Have you checked the logs on the nodes to see if there are any errors? On 7/21/11 10:43 PM, Nilabja Banerjee wrote: Hi All, I am following this link http://www.datastax.com/docs/0.7/utilities/stress_java for a stress test. I am getting this output after running the command below (xxx.xxx.xxx.xx = my IP):

contrib/stress/bin/stress -d xxx.xxx.xxx.xx
Created keyspaces. Sleeping 1s for propagation.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
Operation [44] retried 10 times - error inserting key 044 ((UnavailableException))
Operation [49] retried 10 times - error inserting key 049 ((UnavailableException))
Operation [7] retried 10 times - error inserting key 007 ((UnavailableException))
Operation [6] retried 10 times - error inserting key 006 ((UnavailableException))

Any idea why I am getting these errors? Thank You -- Kirk True, Founder, Principal Engineer
Re: b-tree
In order to split the nodes: SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; when the count reaches 1,000 they split the node. To ensure that no more than one process edits/splits the node, a transaction is needed. On Jul 22, 2011 1:01 AM, aaron morton aa...@thelastpickle.com wrote: But how will you be able to maintain it while it evolves and new data is added without transactions? What is the situation you think you need transactions for? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 00:06, Eldad Yamin wrote: Aaron, Nested set is exactly what I had in mind. But how will you be able to maintain it while it evolves and new data is added without transactions? Thanks! On Thu, Jul 21, 2011 at 1:44 AM, aaron morton aa...@thelastpickle.com wrote: Just throwing out a (half-baked) idea: perhaps the Nested Set Model of trees would work http://en.wikipedia.org/wiki/Nested_set_model * Every row would represent a set, with a left and right encoded into the key * Members are inserted as columns into *every* set / row they are a member of. So we are de-normalising and trading space for time. * May need to maintain a custom secondary index of the materialised sets, e.g. slice a row to get the first column >= the left value you are interested in; that is the key for the set. I've not thought it through much further than that; a lot would depend on your data. The top sets may get very big. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote: I'm not sure if I have an answer for you, but I'm curious. A b-tree and a binary tree are not the same thing. A binary tree is a basic, fundamental data structure; a b-tree is an approach to storing and indexing data on disk for a database. Which do you mean? On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin elda...@gmail.com wrote: Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving)? I'm asking because I want to store geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks! -- It's always darkest just before you are eaten by a grue.
Re: cassandra fatal error when compaction
ERROR [pool-2-thread-3] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) Internal error processing insert
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:73)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:360)
    at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:241)
    at org.apache.cassandra.service.StorageProxy.access$000(StorageProxy.java:62)
    at org.apache.cassandra.service.StorageProxy$1.apply(StorageProxy.java:99)
    at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:210)
    at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:154)
    at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:560)
    at org.apache.cassandra.thrift.CassandraServer.internal_insert(CassandraServer.java:436)
    at org.apache.cassandra.thrift.CassandraServer.insert(CassandraServer.java:444)
    at org.apache.cassandra.thrift.Cassandra$Processor$insert.process(Cassandra.java:3286)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
The identical error and stack trace were logged again for pool-2-thread-6 and once more for pool-2-thread-3.
eliminate need to repair by using column TTL??
One of the main reasons for regularly running repair is to make sure deletes are propagated in the cluster, i.e., that data is not resurrected if a node never received the delete call. And read repair takes care of repairing inconsistencies on the fly. So if I were to set a universal TTL on all columns, so everything would only live to a certain age, would I be able to get away without having to do regular repairs with nodetool? I realize this scenario would not be applicable for everyone, but our data model would allow us to do this. So could this be an alternative to running the (resource-intensive, long-running) repairs with nodetool? Thanks.
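For illustration, a minimal sketch of the TTL write being described, using the raw Thrift API from the 0.8 line (the keyspace, column family, names and the 7-day TTL are hypothetical; the client is assumed to be connected and already bound to a keyspace):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class TtlInsertExample {
        public static void insertWithTtl(Cassandra.Client client) throws Exception {
            Column col = new Column();
            col.setName("status".getBytes("UTF-8"));
            col.setValue("active".getBytes("UTF-8"));
            col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds
            col.setTtl(7 * 24 * 3600); // column expires after 7 days, no delete call needed

            ByteBuffer key = ByteBuffer.wrap("user42".getBytes("UTF-8"));
            client.insert(key, new ColumnParent("MyColumnFamily"), col, ConsistencyLevel.QUORUM);
        }
    }

Once the TTL elapses the column is treated as deleted locally on every replica that has it, which is what makes the no-delete-propagation argument possible in the first place.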
Re: Stress test using Java-based stress utility
UnavailableException is raised server side when there are fewer than CL nodes UP when the request starts. It seems odd to get it in this case because the default replication factor used by the stress test is 1. How many nodes do you have, and have you made any changes to the RF? Also check the server side logs as Kirk says. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 18:37, Kirk True wrote: Have you checked the logs on the nodes to see if there are any errors? On 7/21/11 10:43 PM, Nilabja Banerjee wrote: (original message and stress output quoted above) -- Kirk True, Founder, Principal Engineer
Re: b-tree
You can use something like ZooKeeper to coordinate processes doing page splits. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 19:05, Eldad Yamin wrote: In order to split the nodes: SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; when the count reaches 1,000 they split the node. To ensure that no more than one process edits/splits the node, a transaction is needed. (earlier messages quoted above)
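As a sketch of that coordination (assuming a running ZooKeeper ensemble; the znode path is hypothetical and its parent path is assumed to exist; a production recipe would add watches, retries and session handling), a process attempts the page split only if it wins an ephemeral lock node:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SplitLock {
        // Returns true if this process may perform the split for the given tree node.
        public static boolean tryLockSplit(ZooKeeper zk, String nodeId) throws Exception {
            String lockPath = "/btree/splits/" + nodeId; // hypothetical path
            try {
                // Ephemeral: the lock disappears automatically if this process dies.
                zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                return false; // another process is already splitting this node
            }
        }
        // After a successful split, release with: zk.delete(lockPath, -1);
    }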
Re: cassandra fatal error when compaction
Something has shut down the mutation stage thread pool. This happens during drain or decommission / move. Restart the service and it should be OK. If it happens again without anyone running something like drain, decommission or move, let us know. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 19:41, lebron james wrote: ERROR [pool-2-thread-3] 2011-07-22 10:34:59,102 Cassandra.java (line 3294) Internal error processing insert java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down (same stack traces as quoted above)
Re: cassandra fatal error when compaction
It happened again. I turned off compaction by setting the max and min compaction thresholds to zero and ran 5 threads of inserts; after the database reached 27 GB, Cassandra fell over with the same error. OS: Windows Server 2008 Datacenter. The JVM has a 1.5 GB heap. Cassandra version 0.8.1; all parameters in the conf file are default. On Fri, Jul 22, 2011 at 12:18 PM, aaron morton aa...@thelastpickle.com wrote: Something has shut down the mutation stage thread pool. This happens during drain or decommission / move. Restart the service and it should be OK. If it happens again without anyone running something like drain, decommission or move, let us know. Cheers
Re: eliminate need to repair by using column TTL??
Read repair will only repair data that is read, on the nodes that are up at that time, and it does not guarantee that any changes it detects will be written back to the nodes. The diff mutations are async fire-and-forget messages which may go missing, or be dropped or ignored by the recipient, just like any other message. Also, getting hit with a bunch of read repair operations is pretty painful. The normal read runs, the coordinator detects the digest mismatch, the read runs again from all nodes and they all have to return their full data (no digests this time), the coordinator detects the diffs, and mutations are sent back to each node that needs them. All this happens sync to the read request when the CL is above ONE. That's 2 reads with more network IO, and up to RF mutations. The delete thing is important, but repair also reduces the chance of reads getting hit with RR, and gives me confidence when it's necessary to nuke a bad node. Your plan may work but it feels risky to me. You may end up with worse read performance and unpleasant emotions if you ever have to nuke a node. Others may disagree. Not ignoring the fact that repair can take a long time, fail, hurt performance etc. There are plans to improve it though. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 19:55, jonathan.co...@gmail.com wrote: (original question as above)
Re: Stress test using Java-based stress utility
Running only one node. I don't think it is coming from the replication factor... I will try to sort this out. Any other suggestions from your side are always helpful. :) Thank you. On 22 July 2011 14:36, aaron morton aa...@thelastpickle.com wrote: UnavailableException is raised server side when there are fewer than CL nodes UP when the request starts. It seems odd to get it in this case because the default replication factor used by the stress test is 1. How many nodes do you have, and have you made any changes to the RF? Also check the server side logs as Kirk says. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 18:37, Kirk True wrote: (earlier messages quoted above) -- Kirk True, Founder, Principal Engineer http://www.mustardgrain.com/ Expert Engineering Firepower About us: http://www.twitter.com/mustardgraininc http://www.linkedin.com/company/mustard-grain-inc.
Re: Re: eliminate need to repair by using column TTL??
Good points, Aaron. I realize now how expensive repair-on-read is. I'm going to keep doing repairs regularly, but still have a max TTL on all columns to make sure we don't have really old data we no longer need getting buried in the cluster. On 22 Jul 2011, aaron morton aa...@thelastpickle.com wrote: (Aaron's reply as above)
Re: Stress test using Java-based stress utility
What does nodetool ring say? On Fri, Jul 22, 2011 at 12:43 AM, Nilabja Banerjee nilabja.baner...@gmail.com wrote: (original message and stress output as above) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Is it safe to stop a read repair and any suggestion on speeding up repairs
Short answer: yes, it's safe to kill cassandra during a repair. It's one of the nice things about never mutating data. Longer answer: If nodetool compactionstats says there are no Validation compactions running (and the compaction queue is empty), and netstats says there is nothing streaming, there is a good chance the repair is finished or dead. If a neighbour dies during a repair, the node it was started on will wait for 48 hours(?) until it times out. Check the logs on the machines for errors, particularly from the AntiEntropyService, and see what compactionstats says on all the nodes involved in the repair. Thanks Aaron. One of the neighboring nodes did go down due to running out of memory, so I will make sure the repair is dead and start it again per column family. Even longer: um, 3 TB of data is *way* too much data per node; generally happy people have up to about 200 to 300 GB per node. The reason for this recommendation is so that things like repair, compaction, and node moves are manageable, and because the loss of a single node has less of an impact. I would not recommend running a live system with that much data per node. Thanks for the advice; this can be a separate discussion, but that would make a Cassandra cluster way too costly: we would have to buy 16 systems for the same amount of data as opposed to the 4 that we have now, and my IT director will strangle me. -Adi
Equalizing nodes storage load
Hi everyone. I've been struggling trying to get the data volume (load) to equalize across a balanced cluster, and I'm not sure what else I can try. Background: this was originally a 5-node cluster. We re-balanced the 3 faster machines across the ring and decommissioned the 2 older ones. We also upgraded cassandra a few times, from 0.7.4 through 0.7.5 and 0.7.6-2 to 0.7.7. The ring currently looks like so:

Address      Status State   Load      Owns    Token
                                             151236607520417094872610936636341427313
xx.xx.x.105  Up     Normal  41.98 GB  33.33%  37809151880104273718152734159085356828
xx.xx.x.107  Up     Normal  59.4 GB   33.33%  94522879700260684295381835397713392071
xx.xx.x.18   Up     Normal  74.65 GB  33.33%  151236607520417094872610936636341427313

What I've tried so far: 1. Running repair on each node (sequentially, of course). 2. Running cleanup on the largest node (.18), hoping it would shed unneeded data. The repairs helped a bit by slightly bumping up the load of the first 2 machines, but the cleanup on the 3rd failed to reduce its data volume. So, at this point, I'm out of ideas. In terms of tpstats metrics, each of the 3 nodes is serving roughly the same volume of ReadStage and MutationStage, so they're balanced in that respect. However, I'm concerned about the imbalance of the data load (24% / 34% / 42%) and being unable to equalize it. For the record, there's only 1 keyspace of meaningful data in the cluster, with the following schema settings:

Keyspace: ZZ:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Options: [DCMTL:2]
  Column Families:
    ColumnFamily: AA
      default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 256000.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 0.88125/1440/188 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      Built indexes: []
    ColumnFamily: B (Super)
      default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type/org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 75000.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 0.88125/1440/188 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.25
      Built indexes: []

Any tips or ideas to help get the nodes' load equalized would be highly appreciated. If this is normal behaviour and I shouldn't be trying too hard to get it equalized, I'd appreciate any notes/links explaining why. Thank you.
Counter consistency - are counters idempotent?
As of Cassandra 0.8.1, are counter increments and decrements idempotent? If, for example, a client sends an increment request and the increment occurs, but the network subsequently fails and reports a failure to the client, will Cassandra retry the increment (thus leading to an overcount and inconsistent data)? I have done some reading and I am getting conflicting sources about counter consistency. In this source ( http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/clarification-of-the-consistency-guarantees-of-Counters-td6421010.html), it states that counters now have the same consistency as regular columns--does this imply that the above example will not lead to an overcount? If counters are not idempotent, are there examples of effective uses of counters that will prevent inconsistent counts? Thank you for your help.
Re: Equalizing nodes storage load
are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens? On Fri, Jul 22, 2011 at 4:37 PM, Mina Naguib mina.nag...@bloomdigital.com wrote:

Address      Status State   Load      Owns    Token
xx.xx.x.105  Up     Normal  41.98 GB  33.33%  37809151880104273718152734159085356828
xx.xx.x.107  Up     Normal  59.4 GB   33.33%  94522879700260684295381835397713392071
xx.xx.x.18   Up     Normal  74.65 GB  33.33%  151236607520417094872610936636341427313
Re: Counter consistency - are counters idempotent?
On Fri, Jul 22, 2011 at 4:52 PM, Kenny Yu kenny...@knewton.com wrote: As of Cassandra 0.8.1, are counter increments and decrements idempotent? If, for example, a client sends an increment request and the increment occurs, but the network subsequently fails and reports a failure to the client, will Cassandra retry the increment (thus leading to an overcount and inconsistent data)? I have done some reading and I am getting conflicting sources about counter consistency. In this source (http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/clarification-of-the-consistency-guarantees-of-Counters-td6421010.html), it states that counters now have the same consistency as regular columns--does this imply that the above example will not lead to an overcount? That email thread was arguably a bit imprecise with its use of the word 'consistency', but what it was talking about is really the consistency level. That is, counters support all the usual consistency levels (ONE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM) except ANY. Counters are still not idempotent. And one small clarification: if you get a TimeoutException, Cassandra never retries the increment on its own (your sentence suggests it does), but you won't know in that case whether the increment was persisted or not, and thus you won't know whether you should retry. And yes, this is still a limitation of counters. If counters are not idempotent, are there examples of effective uses of counters that will prevent inconsistent counts? Thank you for your help.
Re: Equalizing nodes storage load
I'm trying to balance Load (41.98 GB vs 59.4 GB vs 74.65 GB). Owns looks ok; they're all 33.33%, which is what I want. It was calculated simply by 2^127 / num_nodes. The only reason the first one doesn't start at 0 is that I've actually carved the ring planning for 9 machines (2 new data centers of 3 machines each). However, only 1 data center (DCMTL) is currently up. On 2011-07-22, at 10:56 AM, Sasha Dolgy wrote: are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens? (ring output quoted above)
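For reference, a small sketch of that token calculation for the RandomPartitioner (evenly spaced initial tokens at i * 2^127 / N; the 9-node count below just mirrors the planned ring in this thread):

    import java.math.BigInteger;

    public class TokenGenerator {
        public static void main(String[] args) {
            int numNodes = 9; // planned ring size from the example above
            BigInteger ringSize = BigInteger.valueOf(2).pow(127);
            for (int i = 0; i < numNodes; i++) {
                // Evenly spaced initial_token values for RandomPartitioner
                BigInteger token = ringSize.multiply(BigInteger.valueOf(i))
                                           .divide(BigInteger.valueOf(numNodes));
                System.out.println("node " + i + ": " + token);
            }
        }
    }

Note that equal token spacing balances key ownership, not bytes on disk; a few large rows or an uneven write pattern can still skew Load while Owns stays at 33.33%.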
CompositeType for row Keys
With the current implementation of CompositeType in Cassandra 0.8.1, is it recommended practice to use a CompositeType as the key? Or are both column and key equally well supported? The documentation on CompositeType is light (well, non-existent really). With key_validation_class set to CompositeType(UUIDType, IntegerType), can we query all matching rows just by using CompositeType(UUIDType)? In my specific use case, what would work best is rows keyed by a CompositeType, each with thousands of columns.
Re: CompositeType for row Keys
If you are using OPP, then you can use CompositeType for both the key and the column name; otherwise (RandomPartitioner), just use it for columns. On 22/07/2011 17:10, Patrick Julien wrote: (question as above) -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
Re: CompositeType for row Keys
I can still use it for keys if I don't need ranges then? Because for what we are doing, we can always re-assemble keys. On Fri, Jul 22, 2011 at 11:38 AM, Donal Zang zan...@ihep.ac.cn wrote: If you are using OPP, then you can use CompositeType for both the key and the column name; otherwise (RandomPartitioner), just use it for columns. (earlier messages quoted above)
Fwd: Counter consistency - are counters idempotent?
btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters; for regular columns it's the same: if the client reports a write failure, the value may well be replicated to all replicas later. This is the same in all other systems too (ZooKeeper, Paxos), ultimately due to the FLP theoretical result that there is no guarantee of consensus in async systems. ---------- Forwarded message ---------- From: Sylvain Lebresne sylv...@datastax.com Date: Fri, Jul 22, 2011 at 8:03 AM Subject: Re: Counter consistency - are counters idempotent? To: user@cassandra.apache.org (Sylvain's reply as above)
Re: Counter consistency - are counters idempotent?
If that's the case, your client is being misleading. Cassandra distinguishes between Unavailable (we knew we couldn't achieve CL before we started, and nothing changed) and TimedOut (didn't get a reply in a timely fashion; it may or may not have gone through). TimedOut != Failed. On Fri, Jul 22, 2011 at 11:08 AM, Yang tedd...@gmail.com wrote: (Yang's message and the forwarded reply as above) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
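A sketch of how that distinction plays out client-side, using the raw 0.8 Thrift API and a counter, since that is where a blind retry is dangerous (the column family and key here are hypothetical):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.CounterColumn;
    import org.apache.cassandra.thrift.TimedOutException;
    import org.apache.cassandra.thrift.UnavailableException;

    public class CounterIncrementExample {
        public static void increment(Cassandra.Client client) throws Exception {
            CounterColumn col = new CounterColumn();
            col.setName("page_views".getBytes("UTF-8"));
            col.setValue(1L);
            ByteBuffer key = ByteBuffer.wrap("page42".getBytes("UTF-8"));
            try {
                client.add(key, new ColumnParent("Counters"), col, ConsistencyLevel.QUORUM);
            } catch (UnavailableException e) {
                // Rejected up front: nothing was applied, so retrying is safe.
            } catch (TimedOutException e) {
                // Outcome unknown: the increment may have been applied.
                // Retrying risks an overcount, because counter adds are not idempotent.
            }
        }
    }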
Re: CompositeType for row Keys
On 22/07/2011 17:56, Patrick Julien wrote: I can still use it for keys if I don't need ranges then? Because for what we are doing, we can always re-assemble keys. Yes, but why would you use CompositeType if you don't need range queries? (earlier messages quoted above) -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
Re: [SPAM] Fwd: Counter consistency - are counters idempotent?
On 22/07/2011 18:08, Yang wrote: btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters; for regular columns it's the same... Yes, but with regular columns a retry is OK, while with counters it is not. (Sylvain's forwarded reply as above) -- Donal Zang Computing Center, IHEP 19B YuquanLu, Shijingshan District, Beijing, 100049 zan...@ihep.ac.cn 86 010 8823 6018
Re: CompositeType for row Keys
Yes, but why would you use CompositeType if you don't need range queries? If you were doing composite keys anyway (a common approach with time series data, for example), you would not have to write parsing and concatenation code. Particularly useful if you have mixed types in the key.
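To make that concrete, a sketch of the parsing/concatenation work CompositeType absorbs: each component in its encoding is a 2-byte big-endian length, the raw component bytes, then one end-of-component byte. Hand-rolled composite keys would have to reproduce something like this (plain JDK, no client library assumed):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class CompositeEncoding {
        // Encode components in the CompositeType wire format:
        // <2-byte length><component bytes><end-of-component byte> per component.
        public static byte[] encode(byte[]... components) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            for (byte[] c : components) {
                out.writeShort(c.length); // big-endian, as DataOutputStream writes
                out.write(c);
                out.writeByte(0);         // 0 = exact match; slice bounds use other values
            }
            return bos.toByteArray();
        }
    }

Higher-level clients grew helpers for this (Hector's Composite class, for instance), so in practice you rarely write it by hand.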
Re: CompositeType for row Keys
Exactly. In any case, I just answered my own question: if I need ranges, I can just make another column family where the column names are these keys. On Fri, Jul 22, 2011 at 12:37 PM, Nate McCall n...@datastax.com wrote: (Nate's reply as above)
Predictable low RW latency, SLABS and STW GC
In order to be predictable at big-data scale, the intensity and periodicity of stop-the-world garbage collection has to be brought down. Assume that slab allocation (CASSANDRA-2252) will be available in the main line at some point, and assume that it will have the impact that other projects (HBase etc.) are reporting. I wonder whether avoiding GC by restarting the servers before a GC would be a feasible approach (of course, while knowing the workload). Regards, Milind
[RELEASE] Apache Cassandra 0.7.8 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.7.8. This version is a bug fix release[1]; in particular, it fixes a regression in Cassandra 0.7.7 that caused hinted handoff delivery to not be triggered automatically (you could still force delivery through JMX). For that reason, upgrading is highly encouraged. Please always pay attention to the release notes[2] before upgrading, though. This release can be downloaded as usual from http://cassandra.apache.org/. If you encounter any problem, please let us know[3]. Have fun! [1]: http://goo.gl/LrBBY (CHANGES.txt) [2]: http://goo.gl/rX1q0 (NEWS.txt) [3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: b-tree
On Fri, Jul 22, 2011 at 12:05 AM, Eldad Yamin elda...@gmail.com wrote: In order to split the nodes: SimpleGeo has a max of 1,000 records (i.e. places) on each node in the tree; when the count reaches 1,000 they split the node. To ensure that no more than one process edits/splits the node, a transaction is needed. You don't need a transaction, you just need consensus and/or idempotence. In this case both can be achieved fairly easily. Mike (earlier messages quoted above)
Re: how to stop the whole cluster, start the whole cluster like in hadoop/hbase?
Yes, I am wondering more about the yaml file and settings like the autobootstrap setting and such. I guess I will find out once they enable my amazon service and I can get running with it. NOTE: anyone doing 1.0 or a prototype, I think, constantly starts/stops the whole cluster to upgrade/install new stuff onto all the nodes in cassandra... yes, we don't plan on using that in production of course, as then we would prefer to do a rolling restart to get new code into the data grid if needed. thanks, Dean On Thu, Jul 21, 2011 at 2:24 PM, Eldad Yamin elda...@gmail.com wrote: I wonder if it won't cause problems... Anyone done it already? On Jul 21, 2011 10:39 PM, Jonathan Ellis jbel...@gmail.com wrote: dsh -c -g cassandra /etc/init.d/cassandra stop http://www.netfort.gr.jp/~dancer/software/dsh.html.en P.S. mostly people are concerned about making sure their entire cluster does NOT stop at the same time :) On Thu, Jul 21, 2011 at 2:23 PM, Dean Hiller d...@alvazan.com wrote: Is there a framework for stopping all nodes / starting all nodes for cassandra? I am okay with something like the password-less ssh setup that the hadoop scripts did... just something that allows me to start and stop the whole cluster. thanks, Dean -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
CL=N-1?
Is there such an option? In some cases I want to distribute some small lookup tables to all the nodes, so that every node has a local copy loaded in memory and lookups are fast. Supposedly I want to write to all N nodes, but that exposes me to failure if just one node is down, so I'd like to declare success at N-1 nodes. thanks Yang
Re: CL=N-1?
On Fri, Jul 22, 2011 at 3:24 PM, Yang tedd...@gmail.com wrote: (question as above) There is no N-1 CL. The numbered levels are ONE, TWO, THREE. There has been some talk of this. By pairing READ/WRITE levels you can normally get an effect close enough to what you are looking for. In your case, QUORUM will write and read with a single FAILED node. Also, if your lookup tables are not changing often, not having N-1 is negligible.
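A sketch of the pairing described above, with the raw 0.8 Thrift API (the column family and key are hypothetical): writing and reading at QUORUM makes the read and write replica sets overlap, so with RF = 3 both operations tolerate one failed node.

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class QuorumPairingExample {
        public static void writeThenRead(Cassandra.Client client) throws Exception {
            ByteBuffer key = ByteBuffer.wrap("lookup-row".getBytes("UTF-8"));
            Column col = new Column();
            col.setName("code".getBytes("UTF-8"));
            col.setValue("42".getBytes("UTF-8"));
            col.setTimestamp(System.currentTimeMillis() * 1000);
            // W = QUORUM and R = QUORUM gives R + W > RF: reads see the latest write.
            client.insert(key, new ColumnParent("Lookup"), col, ConsistencyLevel.QUORUM);

            ColumnPath path = new ColumnPath("Lookup");
            path.setColumn("code".getBytes("UTF-8"));
            ColumnOrSuperColumn result = client.get(key, path, ConsistencyLevel.QUORUM);
            System.out.println(new String(result.getColumn().getValue(), "UTF-8"));
        }
    }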
Re: Repair fails with java.io.IOError: java.io.EOFException
I don't see a JVM crash log (hs_err_pid[pid].log) in ~/brisk/resources/cassandra/bin or /tmp, so maybe the JVM didn't crash? We're running a pretty up-to-date Sun Java:

ubuntu@ip-10-2-x-x:/tmp$ java -version
java version 1.6.0_24
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

I'm gonna restart the Repair process in a few more hours. If there are any additional debug or troubleshooting logs you'd like me to enable first, please let me know. - Sameer On Thu, Jul 21, 2011 at 5:31 PM, Jonathan Ellis jbel...@gmail.com wrote: Did you check for a JVM crash log? You should make sure you're running the latest Sun JVM; older versions, and OpenJDK in particular, are prone to segfaulting. On Thu, Jul 21, 2011 at 6:53 PM, Sameer Farooqui cassandral...@gmail.com wrote: We are starting Cassandra with brisk cassandra, so as a stand-alone process, not a service. The syslog on the node doesn't show anything regarding the Cassandra Java process around the time the last entries were made in the Cassandra system.log (2011-07-21 13:01:51):

Jul 21 12:35:01 ip-10-2-206-127 CRON[12826]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 12:45:01 ip-10-2-206-127 CRON[13420]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 12:55:01 ip-10-2-206-127 CRON[14021]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jul 21 14:26:07 ip-10-2-206-127 kernel: imklog 4.2.0, log source = /proc/kmsg started.
Jul 21 14:26:07 ip-10-2-206-127 rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="663" x-info="http://www.rsyslog.com"] (re)start

The last thing in the Cassandra log before "INFO ... Logging initialized" is: INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144. I can start Repair again, but am worried that it will crash Cassandra again, so I want to turn on any debugging or helpful logs to diagnose the crash if it happens again. - Sameer On Thu, Jul 21, 2011 at 4:30 PM, aaron morton aa...@thelastpickle.com wrote: The default init.d script will direct stdout/err to that file; how are you starting brisk / cassandra? Check the syslog and other logs in /var/log to see if the OS killed cassandra. Also, what was the last thing in the cassandra log before INFO [main] 2011-07-21 15:48:07,233 AbstractCassandraDaemon.java (line 78) Logging initialised? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 10:50, Sameer Farooqui wrote: Hey Aaron, I don't have any output.log files in that folder:

ubuntu@ip-10-2-x-x:~$ cd /var/log/cassandra
ubuntu@ip-10-2-x-x:/var/log/cassandra$ ls
system.log    system.log.1   system.log.2  system.log.3
system.log.4  system.log.5   system.log.6  system.log.7
system.log.8  system.log.9   system.log.10 system.log.11

On Thu, Jul 21, 2011 at 3:40 PM, aaron morton aa...@thelastpickle.com wrote: Check /var/log/cassandra/output.log (assuming the default init scripts). A - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jul 2011, at 10:13, Sameer Farooqui wrote: Hmm. Just looked at the log more closely. So, what actually happened is that while Repair was running on this specific node, the Cassandra java process terminated itself. The last entries in the log are:

INFO [ScheduledTasks:1] 2011-07-21 13:00:20,285 GCInspector.java (line 128) GC for ParNew: 214 ms, 162748656 reclaimed leaving 1845274888 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:00:27,375 GCInspector.java (line 128) GC for ParNew: 266 ms, 158835624 reclaimed leaving 1864471688 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:00:57,658 GCInspector.java (line 128) GC for ParNew: 251 ms, 148861328 reclaimed leaving 193120 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:19,358 GCInspector.java (line 128) GC for ParNew: 260 ms, 157638152 reclaimed leaving 1955746368 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:22,729 GCInspector.java (line 128) GC for ParNew: 325 ms, 154157352 reclaimed leaving 1969361176 used; max is 4030726144
INFO [ScheduledTasks:1] 2011-07-21 13:01:51,187 GCInspector.java (line 128) GC for ParNew: 202 ms, 153219160 reclaimed leaving 2040879600 used; max is 4030726144

When we came in this morning, nodetool ring from another node showed the 1st node as down, and OpsCenter also reported it as down. Next we ran sudo netstat -anp | grep 7199 from the 1st node to see the status of the Cassandra PID, and it was not running. We then started Cassandra: INFO [main]
Re: b-tree
I've not tried this, but a speculative implementation schema would probably look something like the following:

Super col family for structure:
  hash(nodeId): {
    root:    { left=nodeId1, right=nodeId2 }
    nodeId1: { left=nodeId3, right=nodeId4 }
    nodeId2: { left=nodeId5, right=nodeId6 }
  }

Col family for data:
  root:    { meta=blob, col1=val1, col2=val2 }
  nodeId1: { meta=blob, col3=val1, col4=val3 }
  etc.

Col family for the index (column names are node ids, values empty):
  val1: { root=, nodeId1= }
  val2: { root= }
  val3: { nodeId1= }

I think 3 multiget queries cover most of the logic, and splitting the tree etc. should be reasonably easy to manage, though you'd probably pull more data back to the client than is strictly needed for the logic. Something along these lines would probably work.
p

On 20/07/11 21:30, Eldad Yamin wrote:
Hello, Is there any good way of storing a binary tree in Cassandra? I wonder if someone has already implemented something like that, and how they accomplished it without transaction support (while the tree keeps evolving)? I'm asking because I want to save geospatial data, and SimpleGeo did it using a b-tree: http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php Thanks!
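For the curious, here is a toy, untested sketch of that read pattern in plain Java, with HashMaps standing in for the column families so it isn't tied to any particular client library (all names are illustrative, and the comparison rule is a placeholder):

    import java.util.HashMap;
    import java.util.Map;

    public class TreeSketch {
        // Stand-ins for the structure and data column families above.
        static Map<String, Map<String, String>> structureCf = new HashMap<String, Map<String, String>>();
        static Map<String, Map<String, String>> dataCf = new HashMap<String, Map<String, String>>();

        // One lookup per level: fetch a node's children, pick a side, descend.
        // With a real client, each step would be one of the multigets mentioned above.
        static String descend(String nodeId, String wanted) {
            Map<String, String> children = structureCf.get(nodeId);
            if (children == null) {
                return nodeId; // leaf: caller then reads dataCf.get(nodeId)
            }
            // Placeholder comparison; a real tree would compare against keys
            // stored in the data column family to choose left vs right.
            String next = wanted.compareTo(nodeId) < 0 ? children.get("left") : children.get("right");
            return next == null ? nodeId : descend(next, wanted);
        }

        public static void main(String[] args) {
            Map<String, String> root = new HashMap<String, String>();
            root.put("left", "nodeId1");
            root.put("right", "nodeId2");
            structureCf.put("root", root);
            System.out.println(descend("root", "aValue")); // prints nodeId1
        }
    }

Splitting a full node then becomes: write the two new rows, then update the parent's left/right columns, which is exactly the multi-step change the thread is worried about doing without transactions.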
Re: [SPAM] Fwd: Counter consistency - are counters idempotent?
On Fri, Jul 22, 2011 at 9:27 AM, Donal Zang zan...@ihep.ac.cn wrote:
On 22/07/2011 18:08, Yang wrote:
btw, this issue of not knowing whether a write was persisted when the client reports an error is not limited to counters; for regular columns it's the same: if the client reports a write failure, the value may well be replicated to all replicas later. This is the same in all other systems (ZooKeeper, Paxos), ultimately due to the FLP impossibility result: there is no guaranteed consensus in an asynchronous system.

yes, but with regular columns a retry is OK, while with counters it is not.

I know I've heard that fixing this issue is hard. I've assumed this to mean "don't expect a fix anytime soon". Is that accurate? I'm beginning to have second thoughts about whether Cassandra is the right fit for my project, which would rely heavily on counters to roll up aggregates.

--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." -- Benjamin Franklin
carpe diem quam minimum credula postero
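To make the retry distinction concrete, a toy sketch in plain Java (nothing Cassandra-specific; the maps just stand in for a counter column and a regular column) showing why a blind retry after a timeout is safe for an overwrite but not for an increment:

    import java.util.HashMap;
    import java.util.Map;

    public class RetrySketch {
        static Map<String, Long> counterCf = new HashMap<String, Long>();
        static Map<String, String> regularCf = new HashMap<String, String>();

        // Counter add: NOT idempotent -- applying it twice changes the result.
        static void add(String key, long delta) {
            Long current = counterCf.get(key);
            counterCf.put(key, (current == null ? 0L : current) + delta);
        }

        // Regular column write: idempotent -- applying it twice is harmless.
        static void set(String key, String value) {
            regularCf.put(key, value);
        }

        public static void main(String[] args) {
            // Scenario: the first attempt actually applied, but the client
            // saw a timeout and retried blindly.
            add("pageviews", 1);
            add("pageviews", 1); // retry double-counts
            set("name", "bob");
            set("name", "bob"); // retry converges to the same state
            System.out.println(counterCf.get("pageviews")); // 2, for one logical event
            System.out.println(regularCf.get("name"));      // "bob" either way
        }
    }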
question on setup for writes into 2 datacenters
Ideally, we would want to have a replication factor of 4 and a minimum write consistency of 2 (which, looking at the default in cassandra.yaml, is to memory first with async to disk... perfect so far!!!). Now, obviously, I can get the partitioner set up to make sure I get 2 replicas in each data center. The next thing I would want to guarantee, however, is that if a write came into datacenter 1, it would write to the two nodes in datacenter 1 and asynchronously replicate to datacenter 2. Is this possible? Does Cassandra already handle that, or is there something I could do to get Cassandra to do that? In this mode, I believe I can have both datacenters be live as well as act as a backup for the other, without wasting resources.
thanks,
Dean
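For what it's worth, this is roughly what NetworkTopologyStrategy plus LOCAL_QUORUM gives you: two replicas per DC, local writes acknowledged by the local replicas, and the mutation shipped to the other DC without being waited on. A cassandra-cli sketch, assuming a snitch configured with datacenters named DC1 and DC2 (the keyspace and DC names are placeholders; they must match whatever your snitch reports):

    create keyspace MyKS
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = [{DC1:2, DC2:2}];

With RF 2 in each DC, a write at consistency level LOCAL_QUORUM blocks on 2 acks in the coordinator's own datacenter and replicates to the remote datacenter asynchronously, which matches the "both live, each a backup for the other" setup described above.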
CQL COUNT Not Accurate?
Hi, I just noticed that COUNT(*) in CQL seems to be giving the wrong answer: when I have only one row, COUNT(*) returns two. Below are the commands I tried:

cqlsh> SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
(2,)
cqlsh> select * from UserProfile where key = '00D760DB1730482D81BC6845F875A97D';
u'00D760DB1730482D81BC6845F875A97D' | u'ScreenName','3D5C78CAE2E143FBBD1F539A8496D472'

Is this a known bug? Or did I do something wrong?
Thanks,
Hefeng
Re: CQL COUNT Not Accurate?
On Fri, 2011-07-22 at 14:18 -0700, Hefeng Yuan wrote:
Hi, I just noticed that COUNT(*) in CQL seems to be giving the wrong answer: when I have only one row, COUNT(*) returns two. Below are the commands I tried:

cqlsh> SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
(2,)
cqlsh> select * from UserProfile where key = '00D760DB1730482D81BC6845F875A97D';
u'00D760DB1730482D81BC6845F875A97D' | u'ScreenName','3D5C78CAE2E143FBBD1F539A8496D472'

Is this a known bug? Or did I do something wrong?

It's the equivalent of get_count(), so it returns a column count instead of a row count.

--
Eric Evans
eev...@rackspace.com
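A quick illustration of that: with a hypothetical column family and three columns written to a single row, COUNT(*) over that one key reports 3, not 1, because it counts columns within the matched row(s), the same number get_count() would return (syntax per early CQL; untested, so treat it as a sketch):

    cqlsh> UPDATE Users SET 'a' = '1', 'b' = '2', 'c' = '3' WHERE KEY = 'k1';
    cqlsh> SELECT COUNT(*) FROM Users WHERE KEY IN ('k1');
    (3,)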
select * from A join B using(common_id) where A.id == a and B.id == b
this is a common pattern used in an RDBMS; is there an existing idiom for it in Cassandra? If the result of "select * from A where id == a" is very large, and similarly for B, while the join of A.id == a and B.id == b is small, then doing a get() for both and then merging client-side seems excessively slow. My fall-back approach is, since A and B do not change a lot, to pre-generate the join of A and B (not very large) keyed on A.id + B.id, then do a get(a+b).
thanks
Yang
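A toy sketch of that fall-back in plain Java, with in-memory maps standing in for the column families: pre-compute the join once (offline), key it on the pair of ids, and the read path collapses to a single get:

    import java.util.HashMap;
    import java.util.Map;

    public class JoinSketch {
        // Stand-ins for CFs A and B: row id -> common_id
        static Map<String, String> a = new HashMap<String, String>();
        static Map<String, String> b = new HashMap<String, String>();
        // Materialised join, keyed on "aId:bId"
        static Map<String, String> joined = new HashMap<String, String>();

        // Offline pass over A and B. (A nested loop for brevity; a real pass
        // would group rows by common_id instead of scanning all pairs.)
        static void materialise() {
            for (Map.Entry<String, String> ea : a.entrySet()) {
                for (Map.Entry<String, String> eb : b.entrySet()) {
                    if (ea.getValue().equals(eb.getValue())) {
                        joined.put(ea.getKey() + ":" + eb.getKey(), ea.getValue());
                    }
                }
            }
        }

        public static void main(String[] args) {
            a.put("a1", "c1");
            b.put("b1", "c1");
            materialise();
            // Read side: one get(a+b) instead of two large slices plus a merge.
            System.out.println(joined.get("a1:b1")); // c1
        }
    }

The cost, as with any materialised view, is re-generating (or incrementally updating) the joined CF when A or B changes.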
Re: CQL COUNT Not Accurate?
Yes, this is broken. We'll fix this for https://issues.apache.org/jira/browse/CASSANDRA-2474

On Fri, Jul 22, 2011 at 4:18 PM, Hefeng Yuan hfy...@rhapsody.com wrote:
Hi, I just noticed that COUNT(*) in CQL seems to be giving the wrong answer: when I have only one row, COUNT(*) returns two. Below are the commands I tried:

cqlsh> SELECT COUNT(*) FROM UserProfile USING CONSISTENCY QUORUM WHERE KEY IN ('00D760DB1730482D81BC6845F875A97D');
(2,)
cqlsh> select * from UserProfile where key = '00D760DB1730482D81BC6845F875A97D';
u'00D760DB1730482D81BC6845F875A97D' | u'ScreenName','3D5C78CAE2E143FBBD1F539A8496D472'

Is this a known bug? Or did I do something wrong?
Thanks,
Hefeng

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
Re: Equalizing nodes storage load
Hi Peter,
That was precisely it. Thank you :) Doing a major compaction on the heaviest node (74.65 GB) reduced it to 33.55 GB. I'll compact the other 2 nodes as well; I anticipate they will also settle around that size.

On 2011-07-22, at 5:00 PM, Peter Tillotson wrote:
I'm not sure if this is the answer, but try a major compaction on each node for each column family. I suspect the data shuffle has left quite a few deleted keys which may get cleaned out on major compaction. As I remember, major compaction doesn't run automatically in 0.7.x; I'm not sure if it is triggered by repair.
p

On 22/07/11 16:08, Mina Naguib wrote:
I'm trying to balance Load (41.98 GB vs 59.4 GB vs 74.65 GB). Owns looks ok; they're all 33.33%, which is what I want. It was calculated simply by 2^127 / num_nodes. The only reason the first one doesn't start at 0 is that I've actually carved the ring planning for 9 machines (2 new data centers of 3 machines each). However, only 1 data center (DCMTL) is currently up.

On 2011-07-22, at 10:56 AM, Sasha Dolgy wrote:
are you trying to balance load or owns? owns looks fine ... 33.33% each ... which to me says balanced. how did you calculate your tokens?

On Fri, Jul 22, 2011 at 4:37 PM, Mina Naguib mina.nag...@bloomdigital.com wrote:

Address      Status  State   Load      Owns    Token
xx.xx.x.105  Up      Normal  41.98 GB  33.33%  37809151880104273718152734159085356828
xx.xx.x.107  Up      Normal  59.4 GB   33.33%  94522879700260684295381835397713392071
xx.xx.x.18   Up      Normal  74.65 GB  33.33%  151236607520417094872610936636341427313
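For anyone following along, the command in question is (run one node at a time; <host>, <keyspace>, and <cf> are placeholders):

    nodetool -h <host> compact <keyspace> [<cf> ...]

One caveat worth knowing: a major compaction merges everything into a single large SSTable, which minor compactions will then leave alone until several similarly sized SSTables accumulate, so some people prefer to rely on minor compactions plus repair instead of making major compactions routine.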