Re: slow read
2012/3/5 Jeesoo Shin bsh...@gmail.com:

Hi all. I have very SLOW READ here. :-( I made a cluster with three nodes (AWS xlarge, replication = 3). Cassandra version is 1.0.6. I have inserted 1,000,000 rows (standard columns). Each row has 200 columns; each column has a 16-byte key and a 512-byte value. I used Hector's createSliceQuery to get one column in a row. This basic query (random row, fixed column) is issued from multiple threads against Cassandra. I only get up to 140 requests per second. Is this all I can get for reads, or am I doing something wrong? Interestingly, when I request rows which don't exist, it goes up to 1600 per second. ANY insight you can share will be extremely helpful. Thank you. Regards, Jeesoo.

You must test read performance with a parallel test (i.e. multiple threads). Reads of non-existent rows are faster because of the bloom filter: a negative bloom filter check answers the read without touching the SSTables on disk.
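The bloom-filter effect mentioned above can be seen with a toy filter: a read for a key the filter has never seen is usually answered from memory, while a stored key always passes the filter and goes on to the (slow) disk read. A minimal, illustrative Python sketch follows; this is not Cassandra's implementation, and the sizes and hash counts are made up:

```python
import hashlib

class BloomFilter:
    """Tiny illustrative bloom filter (not Cassandra's implementation)."""
    def __init__(self, size_bits=8192, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a big int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from the key.
        for i in range(self.num_hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= (1 << p)

    def might_contain(self, key):
        # False means "definitely absent" -> no disk read needed.
        # True means "probably present" -> must read the SSTable.
        return all(self.bits & (1 << p) for p in self._positions(key))

bf = BloomFilter()
for i in range(1000):
    bf.add(f"row-{i}")

print(bf.might_contain("row-42"))        # True: stored keys always match
print(bf.might_contain("no-such-row"))   # usually False: answered from memory
```

A stored key can never produce a false negative, which is why misses are the cheap case: most reads for non-existent rows never touch disk at all.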
Re: Secondary indexes don't go away after metadata change
The secondary index CFs are marked as no longer required / marked as compacted. Under 1.x they would then be deleted reasonably quickly, and definitely deleted after a restart. Is there a zero-length .Compacted file there?

Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don't exist any longer. Is this supposed to happen? Would this have happened if I had deleted the old SSTables from the previously existing nodes?

Check you have a consistent schema using describe cluster in the CLI, and check the schema is what you think it is using show schema. Another trick is to do a snapshot: only the files in use are included in the snapshot.

Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/03/2012, at 2:53 AM, Frisch, Michael wrote:

I have a few column families that I decided to get rid of the secondary indexes on. I see that there aren't any new index SSTables being created, but all of the old ones remain (some from as far back as September). Is it safe to just delete them when the node is offline? Should I run clean-up or scrub?

Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don't exist any longer. Is this supposed to happen? Would this have happened if I had deleted the old SSTables from the previously existing nodes?

The nodes in question have either been upgraded from v0.8.1 -> v1.0.2 (scrubbed at this time) -> v1.0.6, or from v1.0.2 -> v1.0.6. The secondary index was dropped when the nodes were version 1.0.6. The new node added was also 1.0.6.

- Mike
Re: slow read
Thank you for the reply. :) Yes, I did use multiple threads; 160 and 320 threads gave me the same result.

On 3/5/12, ruslan usifov ruslan.usi...@gmail.com wrote: You must test read performance with a parallel test (i.e. multiple threads). Reads of non-existent rows are faster because of the bloom filter.
Re: can't find rows
I am guessing a lot here, but I would check if auto_bootstrap is enabled (it is by default). When a new node joins, reads are not directed to it until it is marked as UP (writes are sent to it as it is joining). So reads should continue to go to the original UP node.

Sounds like it's all running now. If you see it again, can you provide some detailed steps such as the type of query and the CL level?

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/03/2012, at 7:35 AM, Casey Deccio wrote:

On Thu, Mar 1, 2012 at 9:33 AM, aaron morton aa...@thelastpickle.com wrote: What RF were you using, and had you been running repair regularly?

RF 1 *sigh*. Waiting until I have more/better resources to use RF > 1. Hopefully soon. In the mean time...

Oddly (to me), when I removed the most recently added node, all my rows re-appeared, but were only up to date as of 10 days ago (a few days before I added the node). None of the supercolumns since then show up. But when I look at the SSTable files on the different nodes, I see large files with timestamps between that date and today's, which makes me think the data is still there. Also, if I re-add the troublesome new node (not having run cleanup), all rows are again inaccessible until I again decommission it.

Casey
Re: Schema change causes exception when adding data
I don't have a lot of Hector experience, but it sounds like the way to go. The CLI and cqlsh will take care of this.

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/03/2012, at 10:12 AM, Tharindu Mathew wrote:

There are 2. I'd like to wait till there is one before I insert the value. Going through the code, calling client.describe_schema_versions() seems to give a good answer to this, and I discovered that if I wait till there is only 1 version, I will not get this error. Is this the best practice if I want to check this programmatically?

On Thu, Mar 1, 2012 at 11:15 PM, aaron morton aa...@thelastpickle.com wrote: Use describe cluster in the CLI to see how many schema versions there are.

On 2/03/2012, at 12:25 AM, Tharindu Mathew wrote:

Jeremiah, thanks for the reply. This is what we have been doing, but it's not reliable as we don't know a definite time by which the schema will have been replicated. Is there any way I can know for sure that changes have propagated? Then I can block the insertion of data until then.

On Thu, Mar 1, 2012 at 4:33 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote:

The error is that the specified column family doesn't exist. If you connect with the CLI and describe the keyspace, does it show up? Also, after adding a new column family programmatically you can't use it immediately; you have to wait for it to propagate. You can use calls to describe schema to do so: keep calling it until every node is on the same schema. -Jeremiah

From: Tharindu Mathew [mailto:mcclou...@gmail.com]
Sent: Wednesday, February 29, 2012 8:27 AM
To: user
Subject: Schema change causes exception when adding data

Hi, I have a 3 node cluster and I'm dynamically updating a keyspace with a new column family. Then, when I try to write records to it, I get the exception shown at [1]. How do I avoid this? I'm using Hector and the default consistency level of QUORUM. Cassandra version is 0.7.8; replication factor is 1. How can I solve my problem?

[1] me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily proxySummary)
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:42)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:397)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl$10.execute(KeyspaceServiceImpl.java:383)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
    at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:156)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:129)
    at me.prettyprint.cassandra.service.KeyspaceServiceImpl.multigetSlice(KeyspaceServiceImpl.java:401)
    at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:67)
    at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery$1.doInKeyspace(ThriftMultigetSliceQuery.java:59)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
    at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:72)
    at me.prettyprint.cassandra.model.thrift.ThriftMultigetSliceQuery.execute(ThriftMultigetSliceQuery.java:58)

--
Regards,
Tharindu
blog: http://mackiemathew.com/
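The "keep calling describe schema until every node is on the same version" advice above can be sketched as a polling loop. The StubClient below is a stand-in for a real Thrift client (included only so the sketch runs on its own); the `describe_schema_versions()` name and its `{version: [hosts]}` return shape come from the actual Thrift API, and the `UNREACHABLE` key is how that call reports down nodes:

```python
import time

def wait_for_schema_agreement(client, timeout=30.0, poll_interval=0.5):
    """Poll until every reachable node reports the same schema version.

    client.describe_schema_versions() is assumed to return a dict of
    {schema_version: [node_ips]}; an 'UNREACHABLE' entry is ignored.
    Returns True on agreement, False if the timeout expires first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        versions = client.describe_schema_versions()
        live = [v for v in versions if v != "UNREACHABLE"]
        if len(live) == 1:
            return True
        time.sleep(poll_interval)
    return False

# Stub standing in for a real Thrift client, for illustration only.
class StubClient:
    def __init__(self, responses):
        self.responses = list(responses)
    def describe_schema_versions(self):
        # Serve responses in order; repeat the last one forever.
        return self.responses.pop(0) if len(self.responses) > 1 else self.responses[0]

client = StubClient([
    {"v1": ["10.0.0.1"], "v2": ["10.0.0.2", "10.0.0.3"]},  # still disagreeing
    {"v2": ["10.0.0.1", "10.0.0.2", "10.0.0.3"]},          # converged
])
print(wait_for_schema_agreement(client, poll_interval=0.01))  # → True
```

Blocking inserts behind `wait_for_schema_agreement(...)` after a schema change is exactly the "wait till there is only 1 version" check discussed in the thread.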
Re: slow read
And is the sum over all threads 160 requests per second?

2012/3/5 Jeesoo Shin bsh...@gmail.com: Thank you for the reply. :) Yes, I did use multiple threads; 160 and 320 threads gave me the same result.
Re: composite types in CQL
It's not currently supported in CQL: https://issues.apache.org/jira/browse/CASSANDRA-3761. You can do it using the CLI; see the online help.

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/03/2012, at 10:39 AM, Bayle Shanks wrote:

Hi, I'm wondering how to do composite data storage types in CQL. I am trying to mimic the composite types functionality of the pycassa client: http://pycassa.github.com/pycassa/assorted/composite_types.html

In short, in pycassa you can do something like:

---
itemTimeCompositeType = CompositeType(UTF8Type(), LongType())
pycassa.system_manager.SystemManager().create_column_family(
    keyspaceName, columnFamilyName,
    key_validation_class=itemTimeCompositeType)
...
columnFamily.insert(self._makeKey(item, time_as_integer), {field : value})
---

and then your primary key for this column family is a pair of a string and an integer. This is important because I am using ByteOrderedPartitioner and doing range scans among keys which share the same string 'item' but have different values for the integer.

My motivation is that I am trying to port https://github.com/bshanks/cassandra-timeseries-py to Ruby and I thought I might try CQL.

thanks, bayle
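For anyone porting this to a client without composite support: Cassandra's CompositeType serializes each component as a 2-byte big-endian length, the raw component bytes, and a trailing end-of-component byte. A Python sketch of that packing, with hypothetical component values, is below (for production use you would rely on your client library's own marshalling):

```python
import struct

def pack_composite(components):
    """Encode components using Cassandra's CompositeType layout:
    for each component, a 2-byte big-endian length, the raw bytes,
    and an end-of-component byte (0x00)."""
    out = b""
    for c in components:
        if isinstance(c, str):
            raw = c.encode("utf-8")          # UTF8Type
        elif isinstance(c, int):
            raw = struct.pack(">q", c)       # LongType: 8-byte big-endian
        else:
            raw = c                          # already raw bytes
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out

# A (string, long) key like the pycassa example's (item, time) pair.
key = pack_composite(["item-1", 1330861061454])
print(len(key))  # 2+6+1 for the string plus 2+8+1 for the long = 20
```

Because each component is length-prefixed and the long is big-endian, byte-wise comparison of packed keys orders first by item and then by time, which is what makes the ByteOrderedPartitioner range scans in the message above work.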
Re: Test Data creation in Cassandra
Try tools/stress in the source distribution.

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/03/2012, at 6:01 AM, A J wrote:

What is the best way to create millions of rows of test data in Cassandra? I would like to have some script where I first insert say 100 rows in a CF, then reinsert the same data on the 'server side' with new unique keys. That will make it 200 rows. Then continue the exercise a few times till I get a lot of records. I don't care if the column names and values are identical between the different rows; I just want a lot of records generated from a few seed records. The rows are very fat, so I don't want to use any client-side scripting that would push individual or batched rows to Cassandra. Thanks for any tips.
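There is no server-side "copy rows under new keys" operation in this version of Cassandra, so besides tools/stress, the doubling idea has to be done client-side before writing. A hedged sketch of just the data-generation step (`multiply_rows` is a hypothetical helper; feed the result to your client's batch insert):

```python
import uuid

def multiply_rows(seed_rows, doublings):
    """Start from a few seed rows and repeatedly duplicate them under
    fresh keys, mimicking the 'reinsert with a new unique key' idea."""
    rows = dict(seed_rows)
    for _ in range(doublings):
        # Each pass adds one copy of every existing row under a new key.
        rows.update({uuid.uuid4().hex: cols for cols in list(rows.values())})
    return rows

seeds = {f"seed-{i}": {"col": f"value-{i}"} for i in range(100)}
data = multiply_rows(seeds, 4)
print(len(data))  # 100 seeds doubled 4 times -> 1600 rows
```

Note the copies share column dicts, which is fine for write-only test data; with fat rows the client-side bandwidth cost is unavoidable either way, which is why tools/stress (which generates data at the client as well, but efficiently) is the usual answer.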
RE: cli question
I faced the same issue some time back. The solution which fit my bill is as follows:

CREATE COLUMN FAMILY aaa with comparator = 'CompositeType(UTF8Type,UTF8Type)' and default_validation_class = 'UTF8Type' and key_validation_class = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)';

Notice I have mentioned three datatypes, or validators, in key_validation_class under CompositeType. Now if I have to insert with key aaa:bbb:ccc it will work smoothly, and even if I wish to insert with just aaa:bbb it will work just fine. Do let me know if it solves your problem.

Regards
Rishabh Agrawal

From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: Monday, March 05, 2012 1:19 PM
To: cassandra-u...@incubator.apache.org
Subject: cli question

Hi! I have a CF with the following definition:

CREATE COLUMN FAMILY a_b_indx with comparator = 'CompositeType(LongType,UUIDType)' and default_validation_class = 'UTF8Type' and key_validation_class = 'CompositeType(UTF8Type,UTF8Type)';

where the key may be a composite of the following two strings: 'AAA' and 'BBB:CCC'. Notice that the second string has ':' in it. I try to query for rows I know exist in the CF but can't. I tried these and many more :)

* get a_b_indx ['AAA:BBB:CCC'];
* get a_b_indx ['AAA:BBB\:CCC'];
* get a_b_indx [utf8('AAA'):utf8('BBB:CCC')];

Is it possible? Does anyone know how?

Thanks,
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

Impetus' Head of Innovation labs, Vineet Tyagi will be presenting on 'Big Data Big Costs?' at the Strata Conference, CA (Feb 28 - Mar 1) http://bit.ly/bSMWd7. Listen to our webcast 'Hybrid Approach to Extend Web Apps to Tablets Smartphones' available at http://bit.ly/yQC1oD.

NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee.
If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
running two rings on the same subnet
Hi! I have a Cassandra cluster with two nodes:

nodetool ring -h localhost
Address    DC          Rack   Status  State   Load       Owns    Token
                                                                 85070591730234615865843651857942052864
10.0.0.19  datacenter1 rack1  Up      Normal  488.74 KB  50.00%  0
10.0.0.28  datacenter1 rack1  Up      Normal  504.63 KB  50.00%  85070591730234615865843651857942052864

I want to create a second ring with the same name but two different nodes. Using tokengentool I get the same tokens, since they depend only on the number of nodes in a ring. My question is this: say I create two new VMs, with IPs 10.0.0.31 and 10.0.0.11.

In 10.0.0.31's cassandra.yaml I will set:
initial_token: 0
seeds: 10.0.0.31
listen_address: 10.0.0.31
rpc_address: 0.0.0.0

In 10.0.0.11's cassandra.yaml I will set:
initial_token: 85070591730234615865843651857942052864
seeds: 10.0.0.31
listen_address: 10.0.0.11
rpc_address: 0.0.0.0

Would the rings be separate?

Thanks,
Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
Re: cli question
Thanks! I decided to just replace all ':' with '^', and then I can simply run:

get a_b_indx ['AAA:BBB^CCC'];

Tamar Fraenkel
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

On Mon, Mar 5, 2012 at 11:58 AM, Rishabh Agrawal rishabh.agra...@impetus.co.in wrote: I faced the same issue some time back. The solution which fit my bill is as follows: ...
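Replacing ':' with '^' works until some value contains '^'. For application code that packs several strings into one key component, backslash-escaping the separator gives an unambiguous round trip. A generic sketch (this is client-side string handling, not CLI syntax; whether the CLI itself accepts escaped colons in composite keys I can't confirm):

```python
def escape(component):
    # Escape the escape character first, then the separator, so that
    # joined keys can always be parsed back unambiguously.
    return component.replace("\\", "\\\\").replace(":", "\\:")

def join_key(components):
    return ":".join(escape(c) for c in components)

def split_key(key):
    parts, cur, i = [], "", 0
    while i < len(key):
        ch = key[i]
        if ch == "\\" and i + 1 < len(key):
            cur += key[i + 1]  # take the escaped character literally
            i += 2
        elif ch == ":":
            parts.append(cur)  # unescaped ':' separates components
            cur = ""
            i += 1
        else:
            cur += ch
            i += 1
    parts.append(cur)
    return parts

k = join_key(["AAA", "BBB:CCC"])
print(k)             # AAA:BBB\:CCC
print(split_key(k))  # ['AAA', 'BBB:CCC']
```

Unlike the '^' substitution, this round-trips for any input, including values that contain the replacement character itself.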
Re: Maximum Row Size in Cassandra : Potential Bottleneck
Is there any way in which the writes can be made pretty slow on different nodes? Ideally I would like data to be written on one node and eventually replicated across the other nodes. I don't really need a real-time update, so I can pretty much live with slow writes.

Replicating inside the mutation request is a core feature of Cassandra. You can hack something by disabling gossip on a node and doing the inserts on it (at CL ONE), then re-enabling gossip and letting HH send the data to the other nodes, or disabling HH and using repair to distribute the changes. HH will be less resource intensive.

1250.188: [Full GC [PSYoungGen: 76825K->0K(571648K)] [PSOldGen: 7356362K->2356764K(7569408K)] 7433188K->2356764K(8141056K) [PSPermGen: 32579K->32579K(61056K)], 8.1019330 secs] [Times: user=8.10 sys=0.00, real=8.10 secs]

What JVM are you using and what are the JVM options? (The info I can find about PSYoungGen suggests it's pretty old.)

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 1:13 AM, Shubham Srivastava wrote:

Tried all the possible options and nothing actually seems to work. I was trying to get insight into where exactly the problem arises when writes are done on one node and reads on another. I found that GC gets triggered when writes are done on the other node through RowMutationVerbHandler.

Settings:
1. I tested this on a two node setup with RF:2 and read CL:1.
2. Heap is 8G and Xmn:800M on both nodes, with 4 cores.
3. I am using concurrent_writes:32 as default (8 * cores).
4. -Dcassandra.compaction.priority=1
5. No explicit GC settings; commented out the one in solandra-env.sh.
6. in_memory_compaction_limit_in_mb: 1
7. read repair: 0.1
8. concurrent_compactors: 1

Is there any way in which the writes can be made pretty slow on different nodes? Ideally I would like data to be written on one node and eventually replicated across the other nodes. I don't really need a real-time update, so I can pretty much live with slow writes.
Sharing the cassandra and GC logs as below:

DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 RowMutationVerbHandler.java (line 56) RowMutation(keyspace='L', key='686f74656c737e30efbfbf7365617263686669656c64efbfbf64657320', modifications=[ColumnFamily(TI [357025:false:4@1330861061454,])]) applied. Sending response to 14541197@/10.86.29.21
DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='L', key='686f74656c737e30efbfbf686f74656c666163696c69747964657461696cefbfbf49726f6e696e672053657276696365', modifications=[ColumnFamily(TI [357025:false:2@1330861061454,])])
DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 Table.java (line 387) applying mutation of row 686f74656c737e30efbfbf686f74656c666163696c69747964657461696cefbfbf49726f6e696e672053657276696365
DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 RowMutationVerbHandler.java (line 56) RowMutation(keyspace='L', key='686f74656c737e30efbfbf686f74656c666163696c69747964657461696cefbfbf49726f6e696e672053657276696365', modifications=[ColumnFamily(TI [357025:false:2@1330861061454,])]) applied. Sending response to 14541198@/10.86.29.21
DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='L', key='686f74656c737e30efbfbf686f74656c666163696c697479efbfbf4c61756e647279205365727669636573', modifications=[ColumnFamily(TI [357025:false:2@1330861061454,])])
DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 Table.java (line 387) applying mutation of row 686f74656c737e30efbfbf686f74656c666163696c697479efbfbf4c61756e647279205365727669636573
DEBUG [MutationStage:5] 2012-03-04 17:07:52,014 RowMutationVerbHandler.java (line 56) RowMutation(keyspace='L', key='686f74656c737e30efbfbf686f74656c666163696c697479efbfbf4c61756e647279205365727669636573', modifications=[ColumnFamily(TI [357025:false:2@1330861061454,])]) applied. Sending response to 14541199@/10.86.29.21
DEBUG [248896865@qtp-1257398760-2] 2012-03-04 17:07:51,572 SolrIndexReader.java (line 928) getCoreCacheKey() - start
DEBUG [MutationStage:32] 2012-03-04 17:07:51,426 RowMutationVerbHandler.java (line 56) RowMutation(keyspace='L', key='686f74656c737e30efbfbf636f756e74727953636f7265', modifications=[ColumnFamily(FC [356996:false:4@1330861059172,])]) applied. Sending response to 14532390@/10.86.29.21
DEBUG [1568261127@qtp-1257398760-14] 2012-03-04 17:07:51,425 SolrIndexSearcher.java (line 557) doc(int, SetString) - start
DEBUG [MutationStage:2] 2012-03-04 17:07:51,425 Table.java (line 387) applying mutation of row 686f74656c737e30efbfbf6e616d65efbfbf6c61
DEBUG [1568261127@qtp-1257398760-14] 2012-03-04 17:07:52,015 SolrIndexSearcher.java (line 557) doc(int, SetString) - start
DEBUG [1861954021@qtp-1257398760-11] 2012-03-04 17:07:51,425 StorageProxy.java (line 696) Read: 0 ms.
INFO
Mutation Dropped Messages
Hi All,

While benchmarking Cassandra I found "Mutation Dropped" messages in the logs. Now I know this is a good old question, but it would be really great if someone could provide a check list for recovering when such a thing happens. I am looking for answers to the following questions:

1. Which parameters to tune in the config files? Especially looking at heavy writes.
2. What is the difference between a TimedOutException and silently dropping mutation messages while operating at a CL of QUORUM?

Regards,
Dushyant

--
NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers. If you cannot access these links, please notify us by reply message and we will send the contents to you. By messaging with Morgan Stanley you consent to the foregoing.
Re: slow read
Where is the client running from?

To see if a node is keeping up with requests, look at nodetool tpstats and check if the read stage is backing up. To see how long a read takes, use nodetool cfstats and look at the read latency (this is the latency of a read on that node, not cluster wide). To see how long a read takes cluster wide, use the StorageProxyMBean via JConsole.

Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 10:46 PM, ruslan usifov wrote: And is the sum over all threads 160 requests per second?
Re: running two rings on the same subnet
Would the rings be separate?

Yes. But I would recommend you give them different cluster names; it's a good protection against nodes accidentally joining the wrong cluster.

cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:06 PM, Tamar Fraenkel wrote: Hi! I have a Cassandra cluster with two nodes... Would the rings be separate?
Re: running two rings on the same subnet
You have to use PropertyFileSnitch and NetworkTopologyStrategy to create a multi-datacenter setup with two rings. You can start reading from this page: http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy

Moreover, all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what the rationale behind this is.

By the way, can someone enlighten me about the first line in the output of nodetool? Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role.

On 2012.03.05. 11:06, Tamar Fraenkel wrote: Hi! I have a Cassandra cluster with two nodes... Would the rings be separate?
Re: Mutation Dropped Messages
1. Which parameters to tune in the config files? Especially looking at heavy writes.

The node is overloaded. It may be because there are not enough nodes, or because the node is under temporary stress such as GC or repair. If you have spare IO / CPU capacity you could increase concurrent_writes to increase throughput on the write stage. You then need to ensure the commit log and, to a lesser degree, the data volumes can keep up.

2. What is the difference between a TimedOutException and silently dropping mutation messages while operating at a CL of QUORUM?

A TimedOutException means CL nodes did not respond to the coordinator before rpc_timeout. Dropping messages happens when a message is removed from the queue in a thread pool after rpc_timeout has already passed; it is a feature of the architecture, and correct behaviour under stress. Inconsistencies created by dropped messages are repaired via reads at high CL, Hinted Handoff (in 1.0+), Read Repair, or Anti-Entropy repair.

Cheers

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote: Hi All, While benchmarking Cassandra I found "Mutation Dropped" messages in the logs...
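The distinction drawn in the answer above can be caricatured in a few lines: the coordinator raises a timeout when fewer than CL replicas acknowledge within rpc_timeout, while a replica silently drops a mutation that has already aged past rpc_timeout in its queue (the coordinator has given up on it, so applying it late buys nothing). A toy model, not Cassandra code; the function names and constants are illustrative:

```python
RPC_TIMEOUT_MS = 10_000  # cassandra.yaml rpc_timeout default-ish value

def replica_apply(mutation_age_ms):
    """A replica drops a mutation that sat in its queue past rpc_timeout:
    the coordinator already reported a timeout for it, and hinted handoff,
    read repair or anti-entropy will reconcile the miss later."""
    return "dropped" if mutation_age_ms > RPC_TIMEOUT_MS else "applied"

def coordinator_write(ack_latencies_ms, consistency_level):
    """TimedOutException: fewer than CL replicas acked within rpc_timeout."""
    acks = sum(1 for lat in ack_latencies_ms if lat <= RPC_TIMEOUT_MS)
    if acks < consistency_level:
        raise TimeoutError(f"TimedOutException: {acks}/{consistency_level} acks")
    return "ok"

# QUORUM on RF=3 needs 2 acks: the client sees success even though the
# third replica is so far behind that it will drop the mutation.
print(coordinator_write([50, 80, 12_000], consistency_level=2))  # ok
print(replica_apply(12_000))                                     # dropped
```

So at QUORUM the two outcomes are not alternatives for the same event: the client sees a TimedOutException only when CL itself cannot be met, whereas dropped mutations on a minority of replicas stay invisible to the client and surface only in the logs.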
Re: running two rings on the same subnet
Do you want to create two separate clusters or a single cluster with two data centres ? If it's the later, token selection is discussed here http://www.datastax.com/docs/1.0/install/cluster_init#token-gen-cassandra Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed. By the way, can someone enlighten me about the first line in the output of the nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. It's the exclusive lower bound token for the first node in the ring. This also happens to be the token for the last node in the ring. In your setup 10.0.0.19 owns (85070591730234615865843651857942052864+1) to 0 10.0.0.28 owns (0 + 1) to 85070591730234615865843651857942052864 (does not imply primary replica, just used to map keys to nodes.) - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/03/2012, at 11:38 PM, Hontvári József Levente wrote: You have to use PropertyFileSnitch and NetworkTopologyStrategy to create a multi-datacenter setup with two circles. You can start reading from this page: http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. By the way, can someone enlighten me about the first line in the output of the nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. On 2012.03.05. 11:06, Tamar Fraenkel wrote: Hi! 
I have a Cassandra cluster with two nodes.

nodetool ring -h localhost
Address    DC          Rack   Status  State   Load       Owns    Token
                                                                 85070591730234615865843651857942052864
10.0.0.19  datacenter1 rack1  Up      Normal  488.74 KB  50.00%  0
10.0.0.28  datacenter1 rack1  Up      Normal  504.63 KB  50.00%  85070591730234615865843651857942052864

I want to create a second ring with the same name but two different nodes. Using tokengentool I get the same tokens, as they are derived from the number of nodes in a ring. My question is this: let's say I create two new VMs, with IPs 10.0.0.31 and 10.0.0.11.

In 10.0.0.31 cassandra.yaml I will set
initial_token: 0
seeds: 10.0.0.31
listen_address: 10.0.0.31
rpc_address: 0.0.0.0

In 10.0.0.11 cassandra.yaml I will set
initial_token: 85070591730234615865843651857942052864
seeds: 10.0.0.31
listen_address: 10.0.0.11
rpc_address: 0.0.0.0

Would the rings be separate? Thanks, Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
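[Editor's note] The tokens in the ring output above are the standard evenly spaced values for the RandomPartitioner, whose token space runs from 0 to 2**127 - 1. A minimal sketch of what a token generator computes (assuming that partitioner; `initial_tokens` is an illustrative name, not a real tool):

```python
# Sketch: evenly spaced initial tokens for the RandomPartitioner,
# whose token space is 0 .. 2**127 - 1. A two-node ring yields 0 and
# 85070591730234615865843651857942052864, matching the output above.

def initial_tokens(node_count: int) -> list[int]:
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]

print(initial_tokens(2))
# [0, 85070591730234615865843651857942052864]
```

Any node count works the same way; four nodes would split the ring into quarters.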
Re: how stable is 1.0 these days?
1.0.7 is very stable, weeks in high-load production environment without any exception, 1.0.8 should be even more stable, check changes.txt for what was fixed. 2012/3/2 Marcus Eriksson krum...@gmail.com beware of https://issues.apache.org/jira/browse/CASSANDRA-3820 though if you have many keys per node other than that, yep, it seems solid /Marcus On Wed, Feb 29, 2012 at 6:20 PM, Thibaut Britz thibaut.br...@trendiction.com wrote: Thanks! We will test it on our test cluster in the coming weeks and hopefully put it into production on our 200 node main cluster. :) Thibaut On Wed, Feb 29, 2012 at 5:52 PM, Edward Capriolo edlinuxg...@gmail.comwrote: On Wed, Feb 29, 2012 at 10:35 AM, Thibaut Britz thibaut.br...@trendiction.com wrote: Any more feedback on larger deployments of 1.0.*? We are eager to try out the new features in production, but don't want to run into bugs as on former 0.7 and 0.8 versions. Thanks, Thibaut On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston ben.covers...@datastax.com wrote: I'm not sure what Carlo is referring to, but generally if you have done, thousands of migrations you can end up in a situation where the migrations take a long time to replay, and there are some race conditions that can be problematic in the case where there are thousands of migrations that may need to be replayed while a node is bootstrapped. If you get into this situation it can be fixed by copying migrations from a known good schema to the node that you are trying to bootstrap. Generally I would advise against frequent schema updates. Unlike rows in column families the schema itself is designed to be relatively static. On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham jnews...@referentia.com wrote: Could you also elaborate for creating/dropping column families? We're currently working on moving to 1.0 and using dynamically created tables, so I'm very interested in what issues we might encounter. 
So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is that dropping a cf may sometimes fail with UnavailableException. I think this happens when the cf is busy being compacted. When I sleep/retry within a loop it eventually succeeds. Thanks, Jim On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote: Can you elaborate on the composite types instabilities ? Is this specific to hector as Radim's posts suggest ? These one liner answers are quite stressful :) On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires carlopi...@gmail.com wrote: If you need to use composite types and create/drop column families on the fly you must be prepared for instabilities. -- Ben Coverston DataStax -- The Apache Cassandra Company I would call 1.0.7 rock fricken solid. Incredibly stable. It has been that way since I updated to 0.8.8 really. TBs of data, billions of requests a day, and thanks to JAMM, memtable type auto-tuning, and other enhancements I rarely, if ever, find a node in a state where it requires a restart. My clusters are beast-ing. There are always bugs in software, but this is coming from a guy who ran Cassandra 0.6.1. Administration on my Cassandra cluster is like a vacation now.
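[Editor's note] Jim's sleep/retry workaround for the UnavailableException can be sketched as below. `drop_with_retry`, `UnavailableError` and the drop callable are placeholders for whatever client call you actually use (Hector's dropColumnFamily in this thread), not a real API:

```python
import time

# Sketch of the sleep/retry loop for dropping a column family that
# intermittently fails while a compaction is running. UnavailableError
# and drop_fn are stand-ins for the client library's exception and
# drop call.

class UnavailableError(Exception):
    """Stand-in for the client's UnavailableException."""

def drop_with_retry(drop_fn, attempts=10, delay=5.0):
    """Call drop_fn, sleeping and retrying while it reports unavailable."""
    for _ in range(attempts):
        try:
            drop_fn()
            return True
        except UnavailableError:
            time.sleep(delay)  # CF likely mid-compaction; back off
    return False
```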
Re: Huge amount of empty files in data directory.
After running Cassandra for 2 years in production on Windows servers, starting from 0.7 beta2 up to 1.0.7, we have moved to Linux and forgot all the hell we had on Windows. Having JNA, off-heap row cache and normally working MMAP on Linux, you get much better performance and stability compared to Windows, and less maintenance. 2012/3/1 Henrik Schröder skro...@gmail.com Great, thanks! /Henrik On Thu, Mar 1, 2012 at 13:08, Sylvain Lebresne sylv...@datastax.com wrote: It's a bug, namely: https://issues.apache.org/jira/browse/CASSANDRA-3616 You'd want to upgrade. -- Sylvain On Thu, Mar 1, 2012 at 1:01 PM, Henrik Schröder skro...@gmail.com wrote: Hi, We're running Cassandra 1.0.6 on Windows, and noticed that the number of files in the data directory just keeps growing. We have about 60GB of data per node, we do a major compaction about once a week, but after compaction there's a lot of 0-byte temp files and old files that are kept for some reason. After 50 days of uptime there was around 5 files in each data directory, but when we restarted a server it deleted all the unnecessary files and it shrunk down to about 200 files. We're running without compression, and with the regular compaction strategy, not leveldb. I don't remember seeing this behaviour in older versions of Cassandra, shouldn't it delete temp files while running? Is it possible to force it to delete temp files while running? Is this fixed in a later version? Or do we have to periodically restart servers to clean up the data directories? /Henrik Schröder
Re: running two rings on the same subnet
I want two separate clusters.

Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956

On Mon, Mar 5, 2012 at 12:48 PM, aaron morton aa...@thelastpickle.com wrote: Do you want to create two separate clusters or a single cluster with two data centres ? If it's the latter, token selection is discussed here http://www.datastax.com/docs/1.0/install/cluster_init#token-gen-cassandra Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed. By the way, can someone enlighten me about the first line in the output of the nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. It's the exclusive lower bound token for the first node in the ring. This also happens to be the token for the last node in the ring. In your setup 10.0.0.19 owns (85070591730234615865843651857942052864+1) to 0 10.0.0.28 owns (0 + 1) to 85070591730234615865843651857942052864 (does not imply primary replica, just used to map keys to nodes.) - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/03/2012, at 11:38 PM, Hontvári József Levente wrote: You have to use PropertyFileSnitch and NetworkTopologyStrategy to create a multi-datacenter setup with two circles. You can start reading from this page: http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. By the way, can someone enlighten me about the first line in the output of the nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. On 2012.03.05. 11:06, Tamar Fraenkel wrote: Hi! 
I have a Cassandra cluster with two nodes nodetool ring -h localhost Address DC Rack Status State Load Owns Token 85070591730234615865843651857942052864 10.0.0.19 datacenter1 rack1 Up Normal 488.74 KB 50.00% 0 10.0.0.28 datacenter1 rack1 Up Normal 504.63 KB 50.00% 85070591730234615865843651857942052864 I want to create a second ring with the same name but two different nodes. Using tokengentool I get the same tokens, as they are derived from the number of nodes in a ring. My question is this: Let's say I create two new VMs, with IPs: 10.0.0.31 and 10.0.0.11 *In 10.0.0.31 cassandra.yaml I will set* initial_token: 0 seeds: 10.0.0.31 listen_address: 10.0.0.31 rpc_address: 0.0.0.0 *In 10.0.0.11 cassandra.yaml I will set* initial_token: 85070591730234615865843651857942052864 seeds: 10.0.0.31 listen_address: 10.0.0.11 rpc_address: 0.0.0.0 *Would the rings be separate?* Thanks, *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
Rationale behind incrementing all tokens by one in a different datacenter (was: running two rings on the same subnet)
I am thinking about the frequent example:

dc1 - node1: 0
dc1 - node2: large...number
dc2 - node1: 1
dc2 - node2: large...number + 1

In theory using the same tokens in dc2 as in dc1 does not significantly affect key distribution; specifically, the two keys on the border will move to the next node, but that is not much. However it seems that there is an unexplained requirement (at least I could not find an explanation) that all nodes must have a unique token, even if they are put into a different circle by NetworkTopologyStrategy.

On 2012.03.05. 11:48, aaron morton wrote: Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed.
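[Editor's note] The offset convention in the example above can be written out directly; a sketch (assuming the RandomPartitioner's 2**127 token space, with `dc_tokens` as an illustrative helper, not a real tool):

```python
# Sketch: reuse dc1's evenly spaced tokens in dc2, bumped by a small
# offset so every node in the global ring still has a unique token.

RING = 2 ** 127  # RandomPartitioner token space

def dc_tokens(node_count: int, dc_offset: int) -> list[int]:
    return [(i * RING // node_count + dc_offset) % RING
            for i in range(node_count)]

dc1 = dc_tokens(2, 0)  # [0, 85070591730234615865843651857942052864]
dc2 = dc_tokens(2, 1)  # [1, 85070591730234615865843651857942052865]
```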
Adding a second datacenter
Everything that I've read about data centers focuses on setting things up at the beginning of time. I have the following situation: 10 machines in a datacenter (DC1), with a replication factor of 2. I want to set up a second data center (DC2) with the following configuration: 20 machines with a replication factor of 4. What I've found is that if I initially start adding things, the first machine to join the network attempts to replicate all of the data from DC1 and fills up its disk drive. I've played with setting the storage_options to have a replication factor of 0; then I can bring up all 20 machines in DC2, but then start getting a huge number of read errors from reads on DC1. Is there a simple cookbook on how to add a second DC? I'm currently trying to set the replication factor to 1 and do a repair, but that doesn't feel like the right approach. Thanks,
Re: Adding a second datacenter
You need to make sure your clients are reading using LOCAL_* settings so that they don't try to get data from the other data center. But you shouldn't get errors while replication_factor is 0. Once you change the replication factor to 4, you may get missing data (until repair completes) if you are using LOCAL_* for reading. What version are you using? See the IRC logs at the beginning of this JIRA discussion thread for some info: https://issues.apache.org/jira/browse/CASSANDRA-3483 But you should be able to:

1. Set dc2:0 in the replication_factor.
2. Set bootstrap to false on the new nodes.
3. Start all of the new nodes.
4. Change replication_factor to dc2:4.
5. Run repair on the nodes in dc2.

Once the repairs finish you should be able to start using DC2. You are still going to need a bunch of extra space because the repair is going to get you a couple of copies of the data. Once 1.1 comes out it will have new nodetool commands for making this a little nicer per CASSANDRA-3483. -Jeremiah

On 03/05/2012 09:42 AM, David Koblas wrote: Everything that I've read about data centers focuses on setting things up at the beginning of time. I have the following situation: 10 machines in a datacenter (DC1), with a replication factor of 2. I want to set up a second data center (DC2) with the following configuration: 20 machines with a replication factor of 4. What I've found is that if I initially start adding things, the first machine to join the network attempts to replicate all of the data from DC1 and fills up its disk drive. I've played with setting the storage_options to have a replication factor of 0; then I can bring up all 20 machines in DC2, but then start getting a huge number of read errors from reads on DC1. Is there a simple cookbook on how to add a second DC? I'm currently trying to set the replication factor to 1 and do a repair, but that doesn't feel like the right approach. Thanks,
Division by zero
After upgrading from version 1.0.1 to 1.0.8 we started to get exception: ERROR [http-8095-1 WideEntityServiceImpl.java:142] - get: key1 - {type=RANGE, start=0, end=9223372036854775807, orderDesc=false, limit=1} me.prettyprint.hector.api.exceptions.HCassandraInternalException: Cassandra encountered an internal error processing this request: TApplicationError type: 6 message:Internal error processing get_slice at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:31) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:285) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:268) at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101) at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289) at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53) at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49) at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85) at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48) I already (not too soon?) created an issue in jira with more detailed description: https://issues.apache.org/jira/browse/CASSANDRA-4000 Any ideas? Thanks.
Re: Rationale behind incrementing all tokens by one in a different datacenter (was: running two rings on the same subnet)
There is a requirement that all nodes have a unique token. There is still one global cluster/ring that each node needs to be unique on. The logically separate rings that NetworkTopologyStrategy puts them into are hidden from the rest of the code. -Jeremiah

On 03/05/2012 05:13 AM, Hontvári József Levente wrote: I am thinking about the frequent example: dc1 - node1: 0 dc1 - node2: large...number dc2 - node1: 1 dc2 - node2: large...number + 1 In theory using the same tokens in dc2 as in dc1 does not significantly affect key distribution; specifically, the two keys on the border will move to the next node, but that is not much. However it seems that there is an unexplained requirement (at least I could not find an explanation) that all nodes must have a unique token, even if they are put into a different circle by NetworkTopologyStrategy. On 2012.03.05. 11:48, aaron morton wrote: Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed.
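[Editor's note] Jeremiah's point, that there is one global token-to-node map regardless of datacenter grouping, can be sketched as a toy model (not Cassandra's code; a key belongs to the first node whose token is >= the key's token, wrapping around the ring):

```python
import bisect

# Toy model of the single global ring: one token -> node map, regardless
# of how NetworkTopologyStrategy later groups nodes into datacenters.
# Two nodes with the same token would claim the same ring position.

ring = {}  # token -> node (the one global ring)

def join(token: int, node: str) -> None:
    if token in ring:
        raise ValueError(f"token {token} already taken by {ring[token]}")
    ring[token] = node

def find_owner(key_token: int) -> str:
    # First node token >= the key's token, wrapping at the top of the ring.
    tokens = sorted(ring)
    idx = bisect.bisect_left(tokens, key_token) % len(tokens)
    return ring[tokens[idx]]

join(0, "10.0.0.19")
join(85070591730234615865843651857942052864, "10.0.0.28")
```

This matches Aaron's earlier description: 10.0.0.28 owns (0 + 1) through 85070591730234615865843651857942052864.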
Re: how stable is 1.0 these days?
Thanks for the feedback. I will certainly execute scrub after the update. On Mon, Mar 5, 2012 at 11:55 AM, Viktor Jevdokimov vjevdoki...@gmail.comwrote: 1.0.7 is very stable, weeks in high-load production environment without any exception, 1.0.8 should be even more stable, check changes.txt for what was fixed. 2012/3/2 Marcus Eriksson krum...@gmail.com beware of https://issues.apache.org/jira/browse/CASSANDRA-3820 though if you have many keys per node other than that, yep, it seems solid /Marcus On Wed, Feb 29, 2012 at 6:20 PM, Thibaut Britz thibaut.br...@trendiction.com wrote: Thanks! We will test it on our test cluster in the coming weeks and hopefully put it into production on our 200 node main cluster. :) Thibaut On Wed, Feb 29, 2012 at 5:52 PM, Edward Capriolo edlinuxg...@gmail.comwrote: On Wed, Feb 29, 2012 at 10:35 AM, Thibaut Britz thibaut.br...@trendiction.com wrote: Any more feedback on larger deployments of 1.0.*? We are eager to try out the new features in production, but don't want to run into bugs as on former 0.7 and 0.8 versions. Thanks, Thibaut On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston ben.covers...@datastax.com wrote: I'm not sure what Carlo is referring to, but generally if you have done, thousands of migrations you can end up in a situation where the migrations take a long time to replay, and there are some race conditions that can be problematic in the case where there are thousands of migrations that may need to be replayed while a node is bootstrapped. If you get into this situation it can be fixed by copying migrations from a known good schema to the node that you are trying to bootstrap. Generally I would advise against frequent schema updates. Unlike rows in column families the schema itself is designed to be relatively static. On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham jnews...@referentia.com wrote: Could you also elaborate for creating/dropping column families? 
We're currently working on moving to 1.0 and using dynamically created tables, so I'm very interested in what issues we might encounter. So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is that dropping a cf may sometimes fail with UnavailableException. I think this happens when the cf is busy being compacted. When I sleep/retry within a loop it eventually succeeds. Thanks, Jim On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote: Can you elaborate on the composite types instabilities ? Is this specific to hector as Radim's posts suggest ? These one liner answers are quite stressful :) On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pires carlopi...@gmail.com wrote: If you need to use composite types and create/drop column families on the fly you must be prepared for instabilities. -- Ben Coverston DataStax -- The Apache Cassandra Company I would call 1.0.7 rock fricken solid. Incredibly stable. It has been that way since I updated to 0.8.8 really. TBs of data, billions of requests a day, and thanks to JAMM, memtable type auto-tuning, and other enhancements I rarely, if ever, find a node in a state where it requires a restart. My clusters are beast-ing. There are always bugs in software, but this is coming from a guy who ran Cassandra 0.6.1. Administration on my Cassandra cluster is like a vacation now.
Re: Adding a second datacenter
Jeremiah, Thanks! I'm running 1.0.8. Two interesting things to note:
- I don't have sufficient disk space to handle the straight bump to a replication factor of 4, so I think I'm going to have to do it one by one (1, 2, 3 and 4) with a bunch of cleanups in between.
- Also, using LOCAL_QUORUM doesn't work since my application has a hard response time limit; my read speed ends up being the speed of the slowest node. What I want is LOCAL_ONE, which doesn't exist in the API (unless I missed something).
Yes, CASSANDRA-3483 is really what I'm looking for. --david

On 3/5/12 8:02 AM, Jeremiah Jordan wrote: You need to make sure your clients are reading using LOCAL_* settings so that they don't try to get data from the other data center. But you shouldn't get errors while replication_factor is 0. Once you change the replication factor to 4, you may get missing data (until repair completes) if you are using LOCAL_* for reading. What version are you using? See the IRC logs at the beginning of this JIRA discussion thread for some info: https://issues.apache.org/jira/browse/CASSANDRA-3483 But you should be able to: 1. Set dc2:0 in the replication_factor. 2. Set bootstrap to false on the new nodes. 3. Start all of the new nodes. 4. Change replication_factor to dc2:4. 5. Run repair on the nodes in dc2. Once the repairs finish you should be able to start using DC2. You are still going to need a bunch of extra space because the repair is going to get you a couple of copies of the data. Once 1.1 comes out it will have new nodetool commands for making this a little nicer per CASSANDRA-3483. -Jeremiah On 03/05/2012 09:42 AM, David Koblas wrote: Everything that I've read about data centers focuses on setting things up at the beginning of time. I have the following situation: 10 machines in a datacenter (DC1), with a replication factor of 2. 
I want to set up a second data center (DC2) with the following configuration: 20 machines with a replication factor of 4. What I've found is that if I initially start adding things, the first machine to join the network attempts to replicate all of the data from DC1 and fills up its disk drive. I've played with setting the storage_options to have a replication factor of 0; then I can bring up all 20 machines in DC2, but then start getting a huge number of read errors from reads on DC1. Is there a simple cookbook on how to add a second DC? I'm currently trying to set the replication factor to 1 and do a repair, but that doesn't feel like the right approach. Thanks,
Re: Issue with nodetool clearsnapshot
It seems that instead of removing the snapshot, clearsnapshot moved the data files from the snapshot directory to the parent directory and the size of the data for that keyspace has doubled.
That is not possible; there is only code there to delete the files in the snapshot. Note that the files in the snapshot are hard links to the files in the data dir. Deleting / clearing the snapshot will not delete the files from the data dir if they are still in use.

Many of the files are looking like duplicates. in Keyspace1 directory 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db 156987786084 Mar 4 01:33 Standard1-g-8850-Data.db
Under 0.8.x files are not immediately deleted. Did the data directory contain zero size -Compacted files with the same number ?

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 5/03/2012, at 11:50 PM, B R wrote: Version 0.8.9. We run a 2 node cluster with RF=2. We ran a scrub and after that ran clearsnapshot to remove the backup snapshot created by scrub. It seems that instead of removing the snapshot, clearsnapshot moved the data files from the snapshot directory to the parent directory and the size of the data for that keyspace has doubled. Many of the files are looking like duplicates, in the Keyspace1 directory:

156987786084 Jan 21 03:18 Standard1-g-7317-Data.db
156987786084 Mar  4 01:33 Standard1-g-8850-Data.db
118211555728 Jan 31 12:50 Standard1-g-7968-Data.db
118211555728 Mar  3 22:58 Standard1-g-8840-Data.db
116902342895 Feb 25 02:04 Standard1-g-8832-Data.db
116902342895 Mar  3 22:10 Standard1-g-8836-Data.db
93788425710 Feb 21 04:20 Standard1-g-8791-Data.db
93788425710 Mar  4 00:29 Standard1-g-8845-Data.db
.

Even though the nodetool ring command shows the correct data size for the node, the du -sh on the keyspace directory gives double the size. Can you guide us to proceed from this situation ? Thanks.
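[Editor's note] Aaron's point about hard links is easy to verify for yourself; a small sketch using plain files (the file names are only illustrative):

```python
import os
import tempfile

# Sketch: a snapshot is a hard link, so the same on-disk data appears
# under two names without doubling real disk usage. st_nlink counts how
# many directory entries point at the underlying inode; naive du
# double-counts them.

with tempfile.TemporaryDirectory() as d:
    data = os.path.join(d, "Standard1-g-7317-Data.db")  # illustrative name
    snap = os.path.join(d, "snapshot-link")
    with open(data, "wb") as f:
        f.write(b"sstable bytes")
    os.link(data, snap)                 # what taking a snapshot does
    assert os.stat(data).st_nlink == 2  # two names, one inode
    os.remove(snap)                     # what clearsnapshot does
    assert os.stat(data).st_nlink == 1  # the data file is untouched
```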
Re: running two rings on the same subnet
Create nodes that do not share seeds, and give the clusters different names as a safety measure.

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 6/03/2012, at 12:04 AM, Tamar Fraenkel wrote: I want two separate clusters. Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Mon, Mar 5, 2012 at 12:48 PM, aaron morton aa...@thelastpickle.com wrote: Do you want to create two separate clusters or a single cluster with two data centres ? If it's the latter, token selection is discussed here http://www.datastax.com/docs/1.0/install/cluster_init#token-gen-cassandra Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed. By the way, can someone enlighten me about the first line in the output of the nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. It's the exclusive lower bound token for the first node in the ring. This also happens to be the token for the last node in the ring. In your setup 10.0.0.19 owns (85070591730234615865843651857942052864+1) to 0 10.0.0.28 owns (0 + 1) to 85070591730234615865843651857942052864 (does not imply primary replica, just used to map keys to nodes.) - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/03/2012, at 11:38 PM, Hontvári József Levente wrote: You have to use PropertyFileSnitch and NetworkTopologyStrategy to create a multi-datacenter setup with two circles. You can start reading from this page: http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. 
By the way, can someone enlighten me about the first line in the output of the nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. On 2012.03.05. 11:06, Tamar Fraenkel wrote: Hi! I have a Cassandra cluster with two nodes nodetool ring -h localhost Address DC Rack Status State Load Owns Token 85070591730234615865843651857942052864 10.0.0.19 datacenter1 rack1 Up Normal 488.74 KB 50.00% 0 10.0.0.28 datacenter1 rack1 Up Normal 504.63 KB 50.00% 85070591730234615865843651857942052864 I want to create a second ring with the same name but two different nodes. Using tokengentool I get the same tokens, as they are derived from the number of nodes in a ring. My question is this: Let's say I create two new VMs, with IPs: 10.0.0.31 and 10.0.0.11 In 10.0.0.31 cassandra.yaml I will set initial_token: 0 seeds: 10.0.0.31 listen_address: 10.0.0.31 rpc_address: 0.0.0.0 In 10.0.0.11 cassandra.yaml I will set initial_token: 85070591730234615865843651857942052864 seeds: 10.0.0.31 listen_address: 10.0.0.11 rpc_address: 0.0.0.0 Would the rings be separate? Thanks, Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956
Re: Division by zero
(Commented in the ticket as well) What is the error in the server log ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/03/2012, at 5:04 AM, Vanger wrote: After upgrading from version 1.0.1 to 1.0.8 we started to get exception: ERROR [http-8095-1 WideEntityServiceImpl.java:142] - get: key1 - {type=RANGE, start=0, end=9223372036854775807, orderDesc=false, limit=1} me.prettyprint.hector.api.exceptions.HCassandraInternalException: Cassandra encountered an internal error processing this request: TApplicationError type: 6 message:Internal error processing get_slice at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:31) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:285) at me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:268) at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101) at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131) at me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289) at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53) at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49) at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85) at me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48) I already (not too soon?) created an issue in jira with more detailed description: https://issues.apache.org/jira/browse/CASSANDRA-4000 Any ideas? Thanks.
Re: Mutation Dropped Messages
I increased the size of the cluster and also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages.
Ensure all the nodes have the same spec, and the nodes have the same config. In a virtual environment consider moving the node.
Is this due to some improper load balancing?
What does nodetool ring say, and what sort of queries (and RF and CL) are you sending?

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 6/03/2012, at 3:58 AM, Tiwari, Dushyant wrote: Hey Aaron, I increased the size of the cluster and also the concurrent_writes parameter. Still there is a node which keeps on dropping the mutation messages. The other nodes are not dropping mutation messages. I am using the Hector API and had done nothing for load balancing so far; I just provided the host:port of the nodes in the CassandraHostConfigurator. Is this due to some improper load balancing? Also the physical host where the node is hosted is relatively heavier than the other nodes' hosts. What can I do to improve? PS: The node is a seed of the cluster. Thanks, Dushyant

From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, March 05, 2012 4:15 PM To: user@cassandra.apache.org Subject: Re: Mutation Dropped Messages

1. Which parameters to tune in the config files? – Especially looking for heavy writes
The node is overloaded. It may be because there are not enough nodes, or the node is under temporary stress such as GC or repair. If you have spare IO / CPU capacity you could increase concurrent_writes to increase throughput on the write stage. You then need to ensure the commit log and, to a lesser degree, the data volumes can keep up.
2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM.
TimedOutException means CL nodes did not respond to the coordinator before rpc_timeout. 
Dropping messages happens when a message is removed from the queue in a thread pool after rpc_timeout has occurred. It is a feature of the architecture, and correct behaviour under stress. Inconsistencies created by dropped messages are repaired via reads at high CL, HH (in 1.0+), Read Repair or Anti Entropy.

Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com

On 5/03/2012, at 11:32 PM, Tiwari, Dushyant wrote: Hi All, While benchmarking Cassandra I found “Mutation Dropped” messages in the logs. Now I know this is a good old question. It will be really great if someone can provide a check list to recover when such a thing happens. I am looking for answers to the following questions - 1. Which parameters to tune in the config files? – Especially looking for heavy writes 2. What is the difference between TimedOutException and silently dropping mutation messages while operating on a CL of QUORUM. Regards, Dushyant NOTICE: Morgan Stanley is not acting as a municipal advisor and the opinions or views contained herein are not intended to be, and do not constitute, advice within the meaning of Section 975 of the Dodd-Frank Wall Street Reform and Consumer Protection Act. If you have received this communication in error, please destroy all electronic and paper copies and notify the sender immediately. Mistransmission is not intended to waive confidentiality or privilege. Morgan Stanley reserves the right, to the extent permitted under applicable law, to monitor electronic communications. This message is subject to terms available at the following link: http://www.morganstanley.com/disclaimers. If you cannot access these links, please notify us by reply message and we will send the contents to you. By messaging with Morgan Stanley you consent to the foregoing. 
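[Editor's note] The drop-on-timeout behaviour Aaron describes can be modelled with a toy queue. This illustrates the idea only, not Cassandra's actual implementation; the 10-second value mirrors the default rpc_timeout_in_ms of this era's config:

```python
import collections

# Toy model: a mutation that has sat in a stage queue longer than
# rpc_timeout is dropped rather than applied, because the coordinator
# has already timed the request out anyway.

RPC_TIMEOUT = 10.0  # seconds, i.e. rpc_timeout_in_ms / 1000

queue = collections.deque()  # entries of (enqueue_time, mutation)

def drain(now):
    """Process the queue, returning (applied, dropped) counts."""
    applied = dropped = 0
    while queue:
        enqueued, _mutation = queue.popleft()
        if now - enqueued > RPC_TIMEOUT:
            dropped += 1  # surfaces as a "Mutation Dropped" message
        else:
            applied += 1  # written normally
    return applied, dropped
```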
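To make the drop rule above concrete, here is a toy Python sketch — an illustration of the idea, not Cassandra's actual code — of a write stage discarding mutations that have waited in its queue longer than rpc_timeout:

```python
from collections import deque

RPC_TIMEOUT = 10.0  # seconds; Cassandra 1.0's default rpc_timeout_in_ms is 10000

def drain(queue, now):
    """Apply queued mutations, dropping any that waited longer than RPC_TIMEOUT.

    Each entry is (enqueue_time, mutation). Returns (applied, dropped).
    """
    applied, dropped = [], []
    while queue:
        enqueued, mutation = queue.popleft()
        if now - enqueued > RPC_TIMEOUT:
            # The coordinator has already timed out waiting for this replica,
            # so applying the write now would be wasted work; drop it instead.
            dropped.append(mutation)
        else:
            applied.append(mutation)
    return applied, dropped

q = deque([(0.0, "m1"), (8.0, "m2"), (12.0, "m3")])
applied, dropped = drain(q, now=16.0)
# "m1" has waited 16s (> 10s) and is dropped; "m2" and "m3" are applied
```

This is why dropped mutations are correct behaviour under stress: the client already received a TimedOutException (or the write succeeded on enough other replicas), and the stale copy is later reconciled by Hinted Handoff, Read Repair, or Anti Entropy.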
Re: Issue with nodetool clearsnapshot
Hi Aaron, 1) Since you mentioned hard links, I would like to add that our data directory itself is a sym-link. Could that be causing an issue? 2) Yes, there are 0 byte files of the same numbers in the Keyspace1 directory 0 Mar 4 01:33 Standard1-g-7317-Compacted 0 Mar 3 22:58 Standard1-g-7968-Compacted 0 Mar 3 23:10 Standard1-g-8778-Compacted 0 Mar 3 23:47 Standard1-g-8782-Compacted ... I restarted the node and it went about deleting the files, and the disk space has been released. Can this be done using nodetool, and without restarting? Thanks. On Mon, Mar 5, 2012 at 10:59 PM, aaron morton aa...@thelastpickle.com wrote: It seems that instead of removing the snapshot, clearsnapshot moved the data files from the snapshot directory to the parent directory and the size of the data for that keyspace has doubled. That is not possible; there is only code there to delete files in the snapshot. Note that the snapshot contains hard links to the files in the data dir. Deleting / clearing the snapshot will not delete the files from the data dir if they are still in use. Many of the files are looking like duplicates. in Keyspace1 directory 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db 156987786084 Mar 4 01:33 Standard1-g-8850-Data.db Under 0.8.x files are not immediately deleted. Did the data directory contain zero size -Compacted files with the same number? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/03/2012, at 11:50 PM, B R wrote: Version 0.8.9 We run a 2 node cluster with RF=2. We ran a scrub and after that ran clearsnapshot to remove the backup snapshot created by scrub. It seems that instead of removing the snapshot, clearsnapshot moved the data files from the snapshot directory to the parent directory and the size of the data for that keyspace has doubled. Many of the files are looking like duplicates. 
in Keyspace1 directory 156987786084 Jan 21 03:18 Standard1-g-7317-Data.db 156987786084 Mar 4 01:33 Standard1-g-8850-Data.db 118211555728 Jan 31 12:50 Standard1-g-7968-Data.db 118211555728 Mar 3 22:58 Standard1-g-8840-Data.db 116902342895 Feb 25 02:04 Standard1-g-8832-Data.db 116902342895 Mar 3 22:10 Standard1-g-8836-Data.db 93788425710 Feb 21 04:20 Standard1-g-8791-Data.db 93788425710 Mar 4 00:29 Standard1-g-8845-Data.db . Even though the nodetool ring command shows the correct data size for the node, the du -sh on the keyspace directory gives double the size. Can you guide us to proceed from this situation ? Thanks.
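As a side note, one can identify the obsolete SSTables from their zero-length -Compacted markers before deciding what to remove. A small Python sketch (an illustration, not a Cassandra tool — it only lists candidates, it deletes nothing):

```python
import os

def compacted_sstables(data_dir):
    """List -Data.db files that have a matching zero-length -Compacted marker.

    Under 0.8.x such SSTables are obsolete and are removed on restart;
    this function only identifies them, it does not delete anything.
    """
    doomed = []
    for name in os.listdir(data_dir):
        if not name.endswith("-Compacted"):
            continue
        marker = os.path.join(data_dir, name)
        if os.path.getsize(marker) == 0:
            data_file = marker[: -len("-Compacted")] + "-Data.db"
            if os.path.exists(data_file):
                doomed.append(data_file)
    return sorted(doomed)
```

Running this over the Keyspace1 directory above should report Standard1-g-7317-Data.db and the other generations that have markers, which matches the files the restart cleaned up.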
hector connection pool
I just got this error: All host pools marked down. Retry burden pushed out to client. in a few clients recently. The clients could not recover; we had to restart the client applications. We are using Hector 0.8.0.3. At that time we were running a compaction on a CF; it took several hours and the server was busy. But I think the client should recover after the server load goes down. Any bug reported about this? I did search but could not find one. Thanks, Daning
RE: Secondary indexes don't go away after metadata change
Thank you very much for your response. It is true that the older, previously existing nodes are not snapshotting the indexes that I had removed. I'll go ahead and just delete those SSTables from the data directory. They may be around still because they were created back when we used 0.8. The more troubling issue is with adding new nodes to the cluster though. It built indexes for column families that have had all indexes dropped weeks or months in the past. It also will snapshot the index SSTables that it created. The index files are non-empty as well, some are hundreds of megabytes. All nodes have the same schema, none list themselves as having the rows indexed. I cannot drop the indexes via the CLI either because it says that they don't exist. It's quite perplexing. - Mike From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, March 05, 2012 3:58 AM To: user@cassandra.apache.org Subject: Re: Secondary indexes don't go away after metadata change The secondary index CF's are marked as no longer required / marked as compacted. under 1.x they would then be deleted reasonably quickly, and definitely deleted after a restart. Is there a zero length .Compacted file there ? Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don't exist any longer. Is this supposed to happen? Would this have happened if I had deleted the old SSTables from the previously existing nodes? Check you have a consistent schema using describe cluster in the CLI. And check the schema is what you think it is using show schema. Another trick is to do a snapshot. Only the files in use are included the snapshot. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 2/03/2012, at 2:53 AM, Frisch, Michael wrote: I have a few column families that I decided to get rid of the secondary indexes on. 
I see that there aren't any new index SSTables being created, but all of the old ones remain (some from as far back as September). Is it safe to just delete them when the node is offline? Should I run clean-up or scrub? Also, when adding a new node to the ring the new node will build indexes for the ones that supposedly don't exist any longer. Is this supposed to happen? Would this have happened if I had deleted the old SSTables from the previously existing nodes? The nodes in question have either been upgraded from v0.8.1 -> v1.0.2 (scrubbed at this time) -> v1.0.6 or from v1.0.2 -> v1.0.6. The secondary index was dropped when the nodes were version 1.0.6. The new node added was also 1.0.6. - Mike
Cassandra cache patterns with tiny and wide rows
I've asked this question already on Stack Overflow but got no answer - I will try again: My use case expects a heavy read load - there are two possible model design strategies: 1. Tiny rows with row cache: in this case a row is small enough to fit into RAM and all columns are cached. Read access should be fast. 2. Wide rows with key cache: wide rows with a large number of columns are too big for the row cache. Access to a subset of columns requires an HDD seek. As I understand it, using wide rows is a good design pattern. But we would need to disable the row cache - so what is the benefit of such a wide row (at least for read access)? Which approach is better, 1 or 2?
Re: hector connection pool
Have you tried changing me.prettyprint.cassandra.service.CassandraHostConfigurator#retryDownedHostsDelayInSeconds? Hector will ping downed hosts every xx seconds and recover the connection. Regards, Maciej On Mon, Mar 5, 2012 at 8:13 PM, Daning Wang dan...@netseer.com wrote: I just got this error: All host pools marked down. Retry burden pushed out to client. in a few clients recently. The clients could not recover; we had to restart the client applications. We are using Hector 0.8.0.3. At that time we were running a compaction on a CF; it took several hours and the server was busy. But I think the client should recover after the server load goes down. Any bug reported about this? I did search but could not find one. Thanks, Daning
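The mechanism that setting drives can be pictured with a toy Python model — a loose illustration of downed-host retry in a client pool, not Hector's actual implementation:

```python
class HostPool:
    """Toy model of a client-side connection pool with downed-host retry.

    Loosely mirrors what Hector's retryDownedHostsDelayInSeconds drives:
    a background task periodically re-pings downed hosts and moves
    responders back into the active pool.
    """

    def __init__(self, hosts, ping):
        self.up = set(hosts)
        self.down = set()
        self.ping = ping  # callable: host -> bool (is the host reachable?)

    def mark_down(self, host):
        self.up.discard(host)
        self.down.add(host)

    def retry_downed(self):
        """Re-ping every downed host; recover the ones that respond."""
        for host in list(self.down):
            if self.ping(host):
                self.down.discard(host)
                self.up.add(host)

pool = HostPool({"10.0.0.1:9160", "10.0.0.2:9160"}, ping=lambda h: True)
pool.mark_down("10.0.0.1:9160")
pool.retry_downed()  # host responds again, so it rejoins the up set
```

If all hosts land in the down set and the retry task never succeeds (or is configured with too long a delay), the client sees exactly the "all host pools marked down" condition described above.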
Re: Cassandra cache patterns with thiny and wide rows
Depends on how large the data set is - specifically the hot data - compared to the available RAM, what the heavy read load amounts to, and what the latency requirements are. 2012/3/6 Maciej Miklas mac.mik...@googlemail.com I've asked this question already on Stack Overflow but got no answer - I will try again: My use case expects a heavy read load - there are two possible model design strategies: 1. Tiny rows with row cache: in this case a row is small enough to fit into RAM and all columns are cached. Read access should be fast. 2. Wide rows with key cache: wide rows with a large number of columns are too big for the row cache. Access to a subset of columns requires an HDD seek. As I understand it, using wide rows is a good design pattern. But we would need to disable the row cache - so what is the benefit of such a wide row (at least for read access)? Which approach is better, 1 or 2?
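The trade-off can be put into a back-of-envelope expected-latency model. The numbers below are illustrative assumptions (cache hit ~0.1 ms, disk seek ~10 ms, and the hit rates are made up), not measurements:

```python
def expected_read_cost(hit_rate, cache_ms, miss_ms):
    """Expected per-read latency (ms) given a cache hit rate."""
    return hit_rate * cache_ms + (1.0 - hit_rate) * miss_ms

# Assumed figures for illustration only:
tiny_rows = expected_read_cost(0.95, 0.1, 10.0)  # row cache, high hit rate
wide_rows = expected_read_cost(0.50, 0.1, 10.0)  # key cache only, seeks on miss
```

The point of the exercise: whether option 1 beats option 2 hinges entirely on the hit rate you can sustain, which is why the answer depends on hot-data size versus RAM rather than on the schema shape alone.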
Re: running two rings on the same subnet
Works.. But during the night my setup encountered a problem. I have two VMs in my cluster (running on VMware ESXi). Each VM has 1 GB memory and two virtual disks of 16 GB. They are running on a small server with 4 CPUs (2.66 GHz) and 4 GB memory (together with two other VMs). I put the Cassandra data on the second disk of each machine. The VMs are running Ubuntu 11.10 and Cassandra 1.0.7. I left them running overnight and this morning when I came: On one node Cassandra was down, and the last thing in system.log is: INFO [CompactionExecutor:150] 2012-03-06 00:55:04,821 CompactionTask.java (line 113) Compacting [SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1243-Data.db'), SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1245-Data.db'), SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1242-Data.db'), SSTableReader(path='/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1244-Data.db')] INFO [CompactionExecutor:150] 2012-03-06 00:55:07,919 CompactionTask.java (line 221) Compacted to [/opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1246-Data.db,]. 32,424,771 to 26,447,685 (~81% of original) bytes for 58,938 keys at 8.144165MB/s. Time: 3,097ms. The other node was using all its CPU and I had to restart it. After that, I can see that the last lines in its system.log are that the other node is down... INFO [FlushWriter:142] 2012-03-06 00:55:02,418 Memtable.java (line 246) Writing Memtable-tk_vertical_tag_story_indx@1365852701(1122169/25154556 serialized/live bytes, 21173 ops) INFO [FlushWriter:142] 2012-03-06 00:55:02,742 Memtable.java (line 283) Completed flushing /opt/cassandra/data/tok/tk_vertical_tag_story_indx-hc-1244-Data.db (2075930 bytes) INFO [GossipTasks:1] 2012-03-06 08:02:18,584 Gossiper.java (line 818) InetAddress /10.0.0.31 is now dead. How can I trace why that happened? Also, I brought Cassandra up on both nodes. 
They both spent a long time reading commit logs, but now they seem to run. Any idea how to debug or improve my setup? Thanks, Tamar *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Mon, Mar 5, 2012 at 7:30 PM, aaron morton aa...@thelastpickle.com wrote: Create nodes that do not share seeds, and give the clusters different names as a safety measure. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 6/03/2012, at 12:04 AM, Tamar Fraenkel wrote: I want two separate clusters. *Tamar Fraenkel * Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Mon, Mar 5, 2012 at 12:48 PM, aaron morton aa...@thelastpickle.com wrote: Do you want to create two separate clusters or a single cluster with two data centres? If it's the latter, token selection is discussed here http://www.datastax.com/docs/1.0/install/cluster_init#token-gen-cassandra Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. Otherwise data is not evenly distributed. By the way, can someone enlighten me about the first line in the output of nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. It's the exclusive lower bound token for the first node in the ring. This also happens to be the token for the last node in the ring. In your setup 10.0.0.19 owns (85070591730234615865843651857942052864+1) to 0 10.0.0.28 owns (0 + 1) to 85070591730234615865843651857942052864 (does not imply primary replica, just used to map keys to nodes.) 
- Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/03/2012, at 11:38 PM, Hontvári József Levente wrote: You have to use PropertyFileSnitch and NetworkTopologyStrategy to create a multi-datacenter setup with two rings. You can start reading from this page: http://www.datastax.com/docs/1.0/cluster_architecture/replication#about-replica-placement-strategy Moreover all tokens must be unique (even across datacenters), although - from pure curiosity - I wonder what is the rationale behind this. By the way, can someone enlighten me about the first line in the output of nodetool. Obviously it contains a token, but nothing else. It seems like a formatting glitch, but maybe it has a role. On 2012.03.05. 11:06, Tamar Fraenkel wrote: Hi! I have a Cassandra cluster with two nodes nodetool ring -h localhost Address DC Rack Status State Load Owns Token 85070591730234615865843651857942052864 10.0.0.19 datacenter1 rack1 Up Normal 488.74 KB 50.00% 0 10.0.0.28 datacenter1 rack1 Up Normal 504.63 KB 50.00%
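The balanced tokens the DataStax token-generation page produces for the RandomPartitioner can be computed with a one-liner; a small Python sketch:

```python
def balanced_tokens(node_count):
    """Evenly spaced RandomPartitioner tokens over the 0..2**127 ring.

    token_i = i * 2**127 / node_count, the formula used for initial_token
    assignment with the RandomPartitioner.
    """
    return [i * 2**127 // node_count for i in range(node_count)]

balanced_tokens(2)
# -> [0, 85070591730234615865843651857942052864],
#    matching the two-node ring shown in this thread
```

For two separate clusters the token sets can be identical (each cluster has its own ring); it is only within one cluster, including across datacenters, that every token must be unique.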