Understanding Cassandra Architecture
Hi all,

1- I have 4 nodes A, B, C, and D (A is the seed node). I have modified the seed node address and the listen address in each node's cassandra.yaml file (I have modified nothing else).

2- Then, from A:

CREATE KEYSPACE my_keyspace WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};
CREATE TABLE my_table (text varchar PRIMARY KEY);
USE my_keyspace;
INSERT INTO my_table (text) VALUES ('text1');

I am trying to understand the architecture of Cassandra and what exactly happens in this case. As I understand it:

*Step 1:* Create the ring
1) The seed node A bootstraps the gossip protocol and waits, listening for the other clients (B, C, D).
2) The other nodes join (contact) the seed node and run the gossip process (to exchange information across the cluster every second).

*Question 1:* At this step, does the seed node store some details about the clients? Does it broadcast those details to the other clients? Does each node exchange information across the cluster every second?

*Step 2:* Creating the keyspace and storing rows (replication factor = 2)

Thank you so much for your help.
Best Regards.
auto_bootstrap=false broken?
Hi everyone,

I'll just ask my question as provocatively as possible ;-)

Isn't auto_bootstrap=false broken the way it is currently implemented?

What currently happens: a new node starts with auto_bootstrap=false and it starts serving reads immediately without having any data.

Would the following be more correct?
- New node should stay in a joining state
- Operator loads data (e.g. using nodetool rebuild or putting in backed-up files or whatever)
- Operator has to manually switch from joining into normal state using nodetool (only then will it start serving reads)

Wouldn't this behaviour be more consistent?

kind regards,
Christian
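The workflow proposed above can be pictured as a small state machine. What follows is only a toy model of the proposal written in Python, not Cassandra code: the names `State`, `ProposedNode`, `load_data` and `mark_ready` are made up for illustration and do not correspond to any real nodetool command or Cassandra API.

```python
from enum import Enum


class State(Enum):
    JOINING = "joining"
    NORMAL = "normal"


class ProposedNode:
    """Toy model of the proposed auto_bootstrap=false workflow (hypothetical, not Cassandra code)."""

    def __init__(self):
        # Proposal: a new node starts in a joining state instead of going straight to normal.
        self.state = State.JOINING
        self.rows = {}

    def accepts_writes(self):
        # In this model both joining and normal nodes receive writes.
        return True

    def serves_reads(self):
        # Reads are served only after the operator has marked the node as ready.
        return self.state is State.NORMAL

    def load_data(self, rows):
        # Stands in for the operator loading data (rebuild, copying in backups, ...).
        self.rows.update(rows)

    def mark_ready(self):
        # Stands in for the manual operator step that switches joining -> normal.
        self.state = State.NORMAL


node = ProposedNode()
print(node.serves_reads())  # False
node.load_data({"k1": "v1"})
node.mark_ready()
print(node.serves_reads())  # True
```

The point of the model is that serving reads depends on an explicit operator action rather than on the node having started.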
RE: auto_bootstrap=false broken?
I had problems with write_survey. I opened a bug: https://issues.apache.org/jira/browse/CASSANDRA-9934
Re: auto_bootstrap=false broken?
Hi Paulo,

thanks for your feedback, but I think this is not what I am looking for.

Starting with join_ring=false does not take any tokens in the ring, and the nodetool join afterwards will again do token selection and data loading in one step.

I would like to separate these steps:
1. assign tokens
2. have the node in a joining state, so that I can copy in data
3. mark the node as ready

I just saw that perhaps write_survey could be misused for that. Did anyone ever use write_survey for such a partial bootstrapping? Do I have to worry about data loss when using multiple write_survey nodes in one cluster?

kind regards,
Christian
Re: auto_bootstrap=false broken?
Hello Christian,

You may use the start-up parameter -Dcassandra.join_ring=false if you don't want the node to join the ring on startup. More about this parameter here: http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCUtility_t.html

You can later join the ring via the nodetool join command: http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsJoin.html

auto_bootstrap=false is typically used to bootstrap new datacenters or clusters, or nodes that already have data on them before starting the process.

Cheers,
Paulo
Re: Long joining node
That's the one. I set it to an hour to be safe (if a stream goes above the timeout it will get restarted) but it can probably be lower.

All the best,
Sebastián Estévez

On Tue, Aug 4, 2015 at 2:21 PM, Stan Lemon sle...@salesforce.com wrote:
> Sebastian,
> You're referring to streaming_socket_timeout_in_ms correct? What value do you recommend? All of my nodes are currently at the default 0.
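For reference, the setting discussed here is expressed in milliseconds, so the one-hour value works out as shown below. This is just arithmetic in Python; the helper name `to_milliseconds` is made up, and the printed line is what the matching cassandra.yaml entry would look like assuming the option name streaming_socket_timeout_in_ms from this thread.

```python
from datetime import timedelta


def to_milliseconds(timeout):
    """Convert a timedelta to whole milliseconds."""
    return int(timeout.total_seconds() * 1000)


one_hour_ms = to_milliseconds(timedelta(hours=1))
print(f"streaming_socket_timeout_in_ms: {one_hour_ms}")  # prints "streaming_socket_timeout_in_ms: 3600000"
```

A lower value makes a broken stream fail sooner, at the cost of restarting slow but healthy streams; which trade-off is right depends on the cluster.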
Re: auto_bootstrap=false broken?
Hi Robert,

sorry for the confusion. Perhaps write_survey is not my solution (unfortunately I can't get it to work, so I don't really know). I just thought that it *could* be my solution.

What I actually want: I want to be able to start a new node without it starting to serve reads prematurely. I want Cassandra to wait for me to confirm "everything is OK, now serve reads".

Possible solutions so far:
A) When starting a new node with auto_bootstrap=false, I get a node that has no data but serves reads. In my opinion it would be cleaner if it stayed in a joining state where it only receives writes.
B) Disabling join_ring on my new node does nothing. The new node will not have a token, and I can't see it in nodetool status. Therefore I assume it will not receive any writes.
C) write_survey unfortunately does not seem to work for me: my new node, which I start in survey mode, gets writes from other nodes and shows as joining in the ring. Which is good! But it does not get a schema, so it throws exceptions when receiving these writes. I assume it's just a bug in 2.0.

Disclaimer: I am using C* 2.0, with which I can't get the desired behaviour (or at least I don't know how).

kind regards,
Christian
Re: auto_bootstrap=false broken?
You're trying to solve a problem that doesn't exist. Cassandra only starts serving reads when it's ready.
Long joining node
Hello,

I have a cluster with 12 nodes in each of 2 datacenters, for a total of 24 nodes. I am attempting to add a 13th node in one of the datacenters. I have been monitoring this process from the node itself with nodetool netstats and from one of the existing nodes using nodetool status.

On the existing node I see the new node as UJ. I have watched the load steadily climb up to about 203.4gb, and then over the last two hours it has fluctuated a bit and has been steadily dropping to about 203.1gb.

On the node that I am adding I watched over several hours as nodetool netstats received data; however, for the last couple of hours nodetool netstats simply shows the IPs of the other nodes in the cluster. It looks something like this...

Mode: JOINING
Bootstrap 659153b0-3ab6-11e5-8c94-5dd79366f3d9
    /10.1.82.160
    /10.1.82.162
    /10.1.82.80
    /10.2.123.74
    /10.1.82.166
    /10.1.82.158
    /10.1.82.168
    /10.1.82.150
    /10.1.82.148
    /10.2.123.2
    /10.1.82.152
    /10.1.82.156
    /10.84.78.120
    /10.2.123.80
    /10.2.123.78
    /10.81.122.64
    /10.2.123.82
    /10.2.123.84
    /10.1.82.164
    /10.81.122.62
    /10.2.123.76
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0             24
Responses                       n/a         0        1090793

So I'm trying to figure out...
- What is the node doing?
- Why is it still joining?
- How long should I wait before being concerned?

Also... The UUID next to the word 'Bootstrap' is NOT the host ID of the node joining; it's actually the UUID of a different node already in the cluster. This seems concerning to me, but again I'm not sure if this is expected behavior or not.

ANY help would be greatly appreciated.

Thanks,
Stan
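One rough way to think about the "how long should I wait" question is to sample the joining node's load (as shown by nodetool status) at regular intervals and flag the bootstrap as possibly stalled once the load has stopped growing for several samples in a row. The Python sketch below only illustrates that idea: the function name, the window of three samples, and all sample values other than 203.4 and 203.1 are assumptions, and it does not call nodetool itself.

```python
def looks_stalled(load_samples_gb, window=3):
    """Return True if none of the last `window` samples exceeds the sample just before them."""
    if len(load_samples_gb) <= window:
        return False  # not enough history to judge
    baseline = load_samples_gb[-window - 1]
    recent = load_samples_gb[-window:]
    return all(sample <= baseline for sample in recent)


# Hypothetical samples: the load climbs, then hovers and drops slightly.
samples = [150.0, 180.2, 203.4, 203.3, 203.2, 203.1]
print(looks_stalled(samples))      # True
print(looks_stalled(samples[:3]))  # False
```

A flat or shrinking load is only a hint, not proof of a hung stream, so a check like this is best combined with the netstats output and the logs.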
Re: Long joining node
It helps to set the stream socket timeout in the yaml so that you don't hang forever on a lost / broken stream.

All the best,
Sebastián Estévez
Re: auto_bootstrap=false broken?
On Tue, Aug 4, 2015 at 6:19 AM, horschi hors...@gmail.com wrote:
> I would like to separate these steps:
> 1. assign tokens
> 2. have the node in a joining state, so that I can copy in data
> 3. mark the node as ready
> Did anyone ever use write_survey for such a partial bootstrapping?

What you're asking doesn't make sense to me. What does partial bootstrap mean? Where are you getting the data from? How are you copying in data and why do you need the node to be in a joining state to do that?

https://issues.apache.org/jira/browse/CASSANDRA-6961

Explains a method by which you can repair a partially joined node. In what way does this differ from what you want?

=Rob
Re: Long joining node
Sebastian,

You're referring to streaming_socket_timeout_in_ms, correct? What value do you recommend? All of my nodes are currently at the default 0.

Thanks,
Stan
Re: Long joining node
I'm running 2.0.11. These nodes are on bare metal at SoftLayer.

After I sent my first post, a RuntimeException popped up in the logs; not sure if this might be related?

ERROR 14:13:17,648 Exception in thread Thread[CompactionExecutor:505,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(7750448305743929847, 003076697369746f725f706167655f76696561316261636466313364626531633637326636336366646636636138333032314930303030616336342d303030302d303033302d343030302d3030303030303030616336343a64323065323964632d643836342d313165342d383030302d303030303063366500) >= current key DecoratedKey(-8457751561836812744, 002c6f626a6563745f617564697438616462666462383532646536373263393534633831326130663837326466375230303030326263322d303030302d303033302d343030302d3030303030303030326263323a30303030326263322d303030302d303033302d343030302d3030303030336331373630303a50726f737065637400) writing into /var/lib/cassandra/data/pi/__shardindex/pi-__shardindex-tmp-jb-657-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:170)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Re: auto_bootstrap=false broken?
Hi Aeljami,

thanks for the ticket. I'll keep an eye on it.

I can't get the survey to work at all on 2.0 (I am not getting any schema on the survey node). So I guess the survey is not going to be a solution for now.

kind regards,
Christian
Re: Long joining node
On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon sle...@salesforce.com wrote:
> I am attempting to add a 13th node in one of the datacenters. I have been monitoring this process from the node itself with nodetool netstats and from one of the existing nodes using nodetool status. On the existing node I see the new node as UJ. I have watched the load steadily climb up to about 203.4gb, and then over the last two hours it has fluctuated a bit and has been steadily dropping to about 203.1gb

It's probably hung. If I were you I'd probably wipe the node and re-bootstrap.

(what version of cassandra/what network are you on (AWS?)/etc.)

=Rob
Re: auto_bootstrap=false broken?
Hi Jonathan,

unless you specify auto_bootstrap=false :)

kind regards,
Christian

On Tue, Aug 4, 2015 at 7:54 PM, Jonathan Haddad j...@jonhaddad.com wrote:
> You're trying to solve a problem that doesn't exist. Cassandra only starts serving reads when it's ready.
Re: auto_bootstrap=false broken?
On Tue, Aug 4, 2015 at 11:40 AM, horschi hors...@gmail.com wrote:
> unless you specify auto_bootstrap=false :)

... so why are you doing that? Two experts are confused as to what you're trying to do; why do you think you need to do it?

=Rob
Re: Understanding Cassandra Architecture
Thouraya,

Assuming that you have started all 4 nodes in step 1, your cluster is set up and running. Gossip just relays state about the cluster members, not much else. The seed node isn't special in any way; it's just the node you have designated for the other nodes to contact if they need to get cluster information. Keep in mind, there are no master nodes or leaders. This is peer-to-peer, so each node should have a complete picture of the cluster stored in the system keyspace after joining the cluster.

One other thing. You will want to run

USE my_keyspace;

before

CREATE TABLE my_table (text varchar PRIMARY KEY);

In your question, the clients will ask the node they connect to for cluster information, but client state is not stored in Cassandra.

Patrick
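For step 2 of the question (storing a row with replication factor 2), the general idea behind SimpleStrategy is that the partition key is hashed to a token, the node owning that token stores the first copy, and the next node clockwise on the ring stores the second. The Python sketch below is a simplified illustration under stated assumptions, not Cassandra's implementation: it uses md5 as a stand-in for the real partitioner (such as Murmur3Partitioner), gives each node a single evenly spaced token (no vnodes), and the function names are made up.

```python
import hashlib
from bisect import bisect_left


def token_for(partition_key):
    # Stand-in hash for illustration only; a real cluster uses its configured partitioner.
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def replicas_for(partition_key, ring, replication_factor):
    """Pick replicas by walking the ring clockwise from the key's token.

    `ring` is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping around at the end.
    start = bisect_left(tokens, token_for(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


nodes = ["A", "B", "C", "D"]
step = 2**128 // len(nodes)
# Assumption: one evenly spaced token per node (no vnodes).
ring = [((i + 1) * step - 1, name) for i, name in enumerate(nodes)]

# Two distinct, adjacent nodes hold the row for key 'text1'.
print(replicas_for("text1", ring, 2))
```

Whichever node the client connects to acts as coordinator and forwards the write to those two replicas; no node plays a special role because of being the seed.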
TTLs on tables with *only* primary keys?
I have a table which just has primary keys. Basically: create table foo ( sequence bigint, signature text, primary key( sequence, signature ) ) I need these to eventually get GC’d; however, it doesn’t seem to work. If I then run: select ttl(sequence) from foo; I get: Cannot use selection function ttl on PRIMARY KEY part sequence … I get the same thing if I do it on the second column (signature). And the value doesn’t seem to be TTL’d. What’s the best way to proceed here? -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
Re: Cassandra Data Stax java driver Snappy Compression library
Janne, A little clarification: I found snappy-java-1.0.4.1.jar on the class path. But the other questions still remain. On Tue, Aug 4, 2015 at 8:24 PM, Sachin Nikam skni...@gmail.com wrote: Janne, Thanks for continuing to take the time to answer my queries. We noticed that write latency (tp99) from Services S1 and S2 is 50% of the write latency (tp99) for Service S3. I also noticed that S1 and S2, which use the Astyanax client library, also have compress-lzf.jar on their class path, although the table is defined to use Snappy compression. Is this compression library, or some other transitive dependency pulled in by Astyanax, enabling compression of the payload sent over the wire, and does that account for the difference in tp99? Regards Sachin On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: Correct. Note that you may lose some performance this way though; in a typical case saving bandwidth by increasing CPU usage is good. However, it always depends on your use case and whether you’re running your cluster to the max. It’s a good, low-hanging optimization to keep in mind though for production environments, if you choose not to enable compression now. /Janne On 3 Aug 2015, at 08:40, Sachin Nikam skni...@gmail.com wrote: Thanks Janne... To clarify, Service S3 should not run into any issues and I may choose not to fix the issue? Regards Sachin On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: No, this just tells you that your client (S3 using the Datastax driver) cannot communicate with the Cassandra cluster using a compressed protocol, since the necessary libraries are missing on the client side. Servers will still compress the data they receive when they write it to disk. In other words: Client -> [uncompressed data] -> Server -> [compressed data] -> Disk. To fix, make sure that the Snappy libraries are in the classpath of your S3 service application. 
As always, there’s no guarantee that this improves your performance, since if your app is already CPU-heavy, the extra CPU overhead of compression *may* be a problem. So measure :-) /Janne On 02 Aug 2015, at 02:17, Sachin Nikam skni...@gmail.com wrote: I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables, i.e. TableA and TableB. TableA is read and written to by Services S1 and S2, which use the Astyanax client library. TableB is read and written by Service S3, which uses the Datastax Java driver 2.1. S3 also reads data from TableA. Both TableA and TableB are defined on the Cassandra nodes to use SnappyCompressor. On start-up, Service S3 logs the following WARN message. The service is able to continue its normal operation thereafter. ** [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot find Snappy class, you should make sure the Snappy library is in the classpath if you intend to use it. Snappy compression will not be available for the protocol. *** My questions are as follows-- #1. Does the compression happen on the Cassandra client side or within the Cassandra server itself? #2. Does Service S3 need to pull in additional dependencies for Snappy compression as mentioned here -- http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error #3. What happens when this additional library is not present on the class path of Service S3? Will any data that S3 writes to TableB not be compressed? Regards Sachin
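Janne's advice to measure can be sketched roughly as follows: compress a representative payload and compare the bytes that would go over the wire against the CPU time spent. This is only an illustration under stated assumptions. zlib stands in for Snappy because it ships with Python, the function name measure_compression and the sample payload are hypothetical, and none of this reproduces the Datastax driver's actual frame compression.

```python
import time
import zlib


def measure_compression(payload: bytes, level: int = 6) -> dict:
    """Compress a payload once and report wire size and CPU-side cost.

    zlib is a stand-in here; Snappy trades a lower ratio for more speed.
    """
    if not payload:
        raise ValueError("payload must not be empty")
    started = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - started
    # Round-trip check: the receiver must recover the exact bytes.
    if zlib.decompress(compressed) != payload:
        raise RuntimeError("round trip failed")
    return {
        "raw_bytes": len(payload),
        "wire_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
        "seconds": elapsed,
    }


if __name__ == "__main__":
    # Hypothetical payload: repetitive text compresses well, while random
    # or already-compressed data would barely shrink.
    sample = b"INSERT INTO my_table (text) VALUES ('text1');" * 1000
    print(measure_compression(sample))
```

Whether the saved bandwidth outweighs the extra CPU depends on the real payloads and on how loaded the client already is, so numbers from a toy payload like this one should not be read as a prediction for Service S3.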
Retrieve all the columnfamily / tables of thrift and CQL from the keyspace in cassandra
Hi, I use Hector to work with Cassandra 2.1.8 and want to retrieve all the tables in a given keyspace from an application. I use KeyspaceDefinition.getCfDefs() to retrieve the column family list of a keyspace. However, I found that getCfDefs() only returns column families created through the Thrift API, such as with me.prettyprint.hector.api.Cluster.updateColumnFamily, and not tables created through CQL, such as from the cqlsh client. So how can I retrieve all the tables in a given keyspace? I asked the same question on Stack Overflow: http://stackoverflow.com/questions/31804797/retrieve-all-the-columnfamily-tables-of-thrift-and-cql-from-the-keyspace-in-ca Shuo Chen
Re: Cassandra Data Stax java driver Snappy Compression library
Janne, Thanks for continuing to take the time to answer my queries. We noticed that write latency (tp99) from Services S1 and S2 is 50% of the write latency (tp99) for Service S3. I also noticed that S1 and S2, which use the Astyanax client library, also have compress-lzf.jar on their class path, although the table is defined to use Snappy compression. Is this compression library, or some other transitive dependency pulled in by Astyanax, enabling compression of the payload sent over the wire, and does that account for the difference in tp99? Regards Sachin On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: Correct. Note that you may lose some performance this way though; in a typical case saving bandwidth by increasing CPU usage is good. However, it always depends on your use case and whether you’re running your cluster to the max. It’s a good, low-hanging optimization to keep in mind though for production environments, if you choose not to enable compression now. /Janne On 3 Aug 2015, at 08:40, Sachin Nikam skni...@gmail.com wrote: Thanks Janne... To clarify, Service S3 should not run into any issues and I may choose not to fix the issue? Regards Sachin On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote: No, this just tells you that your client (S3 using the Datastax driver) cannot communicate with the Cassandra cluster using a compressed protocol, since the necessary libraries are missing on the client side. Servers will still compress the data they receive when they write it to disk. In other words: Client -> [uncompressed data] -> Server -> [compressed data] -> Disk. To fix, make sure that the Snappy libraries are in the classpath of your S3 service application. As always, there’s no guarantee that this improves your performance, since if your app is already CPU-heavy, the extra CPU overhead of compression *may* be a problem. 
So measure :-) /Janne On 02 Aug 2015, at 02:17, Sachin Nikam skni...@gmail.com wrote: I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables, i.e. TableA and TableB. TableA is read and written to by Services S1 and S2, which use the Astyanax client library. TableB is read and written by Service S3, which uses the Datastax Java driver 2.1. S3 also reads data from TableA. Both TableA and TableB are defined on the Cassandra nodes to use SnappyCompressor. On start-up, Service S3 logs the following WARN message. The service is able to continue its normal operation thereafter. ** [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot find Snappy class, you should make sure the Snappy library is in the classpath if you intend to use it. Snappy compression will not be available for the protocol. *** My questions are as follows-- #1. Does the compression happen on the Cassandra client side or within the Cassandra server itself? #2. Does Service S3 need to pull in additional dependencies for Snappy compression as mentioned here -- http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error #3. What happens when this additional library is not present on the class path of Service S3? Will any data that S3 writes to TableB not be compressed? Regards Sachin