Re: setting up prod cluster
I might be misinterpreting you, but it seems you are only using one seed per node. Is there a specific reason for that? A node can have multiple seeds in its seed list. It is my understanding that typically, every node in a cluster has the same seed list.

On Sun, Jan 11, 2015 at 10:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

> Hey all,
>
> I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6 node cluster all hosted in one data center.
>
> I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds. And have each seed see the other as its own seed. Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files. Then I'll want to set the replication factor to 5. Since it'll be the total number of nodes - 1. I just want to make sure I have all that right.
>
> Another thing that will have to happen is that I will need to connect Cassandra into a 4 node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.
>
> And lastly I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads-up about what will be required.
>
> Thank you,
> Tim
>
> --
> GPG me!!
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
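To make the shared seed list concrete, here is a minimal Python sketch that picks up to three seeds, spreading them across racks where it can, and renders one seed_provider stanza that would be identical on every node. The host addresses, rack names, and the helper names pick_seeds and render_seed_stanza are all hypothetical. The stanza follows the stock cassandra.yaml layout with SimpleSeedProvider, so compare it against the file shipped with your version before relying on it.

```python
from collections import OrderedDict


def pick_seeds(hosts, max_seeds=3):
    """Pick up to max_seeds hosts, taking one host per rack before reusing a rack.

    hosts is a list of (ip, rack) tuples; input order is preserved.
    """
    by_rack = OrderedDict()
    for ip, rack in hosts:
        by_rack.setdefault(rack, []).append(ip)

    seeds = []
    depth = 0
    # Round-robin over racks so seeds land in different racks where possible.
    while len(seeds) < max_seeds:
        added = False
        for ips in by_rack.values():
            if depth < len(ips) and len(seeds) < max_seeds:
                seeds.append(ips[depth])
                added = True
        if not added:
            break  # ran out of hosts
        depth += 1
    return seeds


def render_seed_stanza(seeds):
    """Render the seed_provider block; the same text goes on every node."""
    return (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        '          - seeds: "{}"\n'
    ).format(",".join(seeds))


# Hypothetical six-node, three-rack layout (not taken from the thread).
HOSTS = [
    ("10.0.1.11", "rack1"), ("10.0.1.12", "rack1"),
    ("10.0.2.11", "rack2"), ("10.0.2.12", "rack2"),
    ("10.0.3.11", "rack3"), ("10.0.3.12", "rack3"),
]

if __name__ == "__main__":
    print(render_seed_stanza(pick_seeds(HOSTS)))
```

The point of the sketch is only that the seed list is computed once and copied everywhere, rather than each node getting its own single seed.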
Re: setting up prod cluster
Hi Tim, replies inline below.

On Sun, Jan 11, 2015 at 8:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

> Hey all, I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6 node cluster all hosted in one data center. I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds. And have each seed see the other as its own seed.

There isn't really a reason to have a seed host exclude itself from its own seeds list. All hosts in a cluster can share a common set of seeds. A typical configuration is to select three hosts from each data center, preferably from three different racks (or AWS availability zones). That way, a new host only has trouble coming online if all three seeds are offline at the same time. If a host which is coming online can talk to even one seed, it will query that seed to find the rest of the nodes in the cluster.

The one thing you *don't* want to do is have a host be in its own seeds list when joining a cluster with existing data. That's a hint that the host should consider itself authoritative on what data it already owns, and it will keep that host from bootstrapping: it'll join the cluster immediately without learning anything about the data it's now responsible for.

> Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files.

I'm not really sure what you mean by sub-group here. If all six hosts are in the same datacenter, do you maybe mean you're spreading the hosts out across several physical racks (or AWS availability zones)? There might be some cognitive dissonance here. Most if not all hosts in your cluster would typically share the same seeds list.

> Then I'll want to set the replication factor to 5. Since it'll be the total number of nodes - 1. I just want to make sure I have all that right.

RF=5 isn't necessarily *wrong*, but I have a feeling it's not what you want. RF doesn't usually consider how many nodes are in your cluster; it represents your fault tolerance. Replication factor says how many times a single piece of data (piece as determined by partition key in the table) is written to your cluster inside of a given datacenter, with each copy going to a different physical host, and preferring to place replicas in different physical racks if it's possible.

With RF=5, you can totally lose four nodes and still be able to access all your data (albeit at a read/write consistency level of ONE). You can simultaneously lose two nodes, and most clients (which tend to prefer a consistency level of QUORUM by default) wouldn't even notice, because a quorum of five replicas is three.

A more common RF is 3, regardless of cluster size. This lets you totally lose two nodes at the same time and not lose any data (though QUORUM reads and writes need two of the three replicas up).

> Another thing that will have to happen is that I will need to connect Cassandra into a 4 node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.

I have no first-hand experience on that front, but depending on your budget, DataStax Enterprise's integrated Solr might be a better fit (it'll be a lot less work and time).

> And lastly I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads-up about what will be required.

If this is your first Cassandra project, you should understand that effective data modeling for Cassandra focuses very, very heavily on knowing exactly what queries will be performed against the data. CQL looks like SQL, but ad hoc querying isn't practical, and typically you'll write the same business data multiple times in multiple layouts (tables with different partition/clustering keys), once to satisfy each specific query. For some of my business data, I write exactly the same data to 6 to 8 tables so I can answer different classes of question.

> Thank you,
> Tim
>
> --
> GPG me!!
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
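The replication-factor arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the counting, not anything Cassandra ships. The function names are made up for illustration, and the numbers apply to the replica set of a single partition, not to the cluster as a whole.

```python
def quorum(rf):
    """Number of replicas that must respond for a QUORUM read or write."""
    return rf // 2 + 1


def tolerated_failures(rf, required):
    """Replicas of one partition that can be down while `required` still respond."""
    return rf - required


if __name__ == "__main__":
    for rf in (3, 5):
        print("RF={}: quorum={}, down at QUORUM={}, down at ONE={}".format(
            rf,
            quorum(rf),
            tolerated_failures(rf, quorum(rf)),
            tolerated_failures(rf, 1),
        ))
    # RF=3: quorum=2, down at QUORUM=1, down at ONE=2
    # RF=5: quorum=3, down at QUORUM=2, down at ONE=4
```

Neither result depends on the cluster having six nodes, which is why RF is usually chosen from the fault tolerance you need rather than from the node count.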
setting up prod cluster
Hey all,

I've been experimenting with Cassandra on a small scale and in my own sandbox for a while now. I'm pretty used to working with it to get small clusters up and running and gossiping with each other. But I just had a new project at work drop into my lap that requires a NoSQL data store. And the developers have selected... you guessed it! Cassandra as their back end database. So I'll be asked to set up a 6 node cluster all hosted in one data center.

I want to just make sure that I understand the concept of seeds correctly. I think since we'll be dealing with 6 nodes, what I'll want to do is have 2 seeds. And have each seed see the other as its own seed. Then the other 2 nodes in each sub-group will have the IP for their seed in each of their cassandra.yaml files. Then I'll want to set the replication factor to 5. Since it'll be the total number of nodes - 1. I just want to make sure I have all that right.

Another thing that will have to happen is that I will need to connect Cassandra into a 4 node ElasticSearch cluster. I think there are a few options for doing that. I've seen names like Titan and Gremlin. And I was wondering if anyone has any recommendations there.

And lastly I'd like to point out that I know literally nothing about the data that will be stored there just as of yet. The first meeting about the project will be tomorrow. My manager gave me an advance heads-up about what will be required.

Thank you,
Tim

--
GPG me!!
gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
Re: trouble setting up initial cluster: Host ID collision between active endpoint
Hi Ben,

Thanks for the tip, I will certainly check it out. I really appreciate the information!

Tim

On Thu, Jan 24, 2013 at 6:32 PM, Ben Bromhead b...@instaclustr.com wrote:

> Hi Tim,
>
> If you want to check out Cassandra on AWS you should also have a look at www.instaclustr.com. We are still very much in beta (so if you come across anything, please let us know), but if you have a few minutes and want to deploy a cluster in just a few clicks I highly recommend trying Instaclustr out.
>
> Cheers
> Ben Bromhead
> *Instaclustr*
Re: trouble setting up initial cluster: Host ID collision between active endpoint
They both have 0 for their token, and this is stored in their System keyspace. Scrub them and start again.

> But I found that the tokens that were being generated would require way too much memory

Token assignments have nothing to do with memory usage.

> m1.micro instances

You are better off using your laptop than micro instances. For playing around try m1.large and terminate them when not in use. To make life easier use this to make the cluster for you: http://www.datastax.com/docs/1.2/install/install_ami

Cheers
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

> Hello list,
>
> I really do appreciate the advice I've gotten here as I start building familiarity with Cassandra. Aside from the single node instance I set up for a developer friend, I've just been playing with a single node in a VM on my laptop and playing around with the cassandra-cli and PHP.
>
> Well, I've decided to set up my first cluster on my Amazon EC2 account and I'm running into an issue getting the nodes to gossip.
>
> I've set the IPs of the 'node01' and 'node02' EC2 instances in their respective listen_address and rpc_address settings, and made sure that the 'cluster_name' on both was in agreement.
>
> I believe the problem may be in one of two places: either the seeds or the initial_token setting.
>
> For the seeds I have it set up as such. I put the IPs for both machines in the 'seeds' settings for each, thinking this would be how each node would discover the other:
>
>     - seeds: 10.xxx.xxx.248,10.xxx.xxx.123
>
> Initially I tried the tokengen script that I found in the documentation. But I found that the tokens that were being generated would require way too much memory for the m1.micro instances that I'm experimenting with on the Amazon free tier. And according to the docs in the config it is in some cases OK to leave that field blank. So that's what I did on both instances.
>
> Not sure how much/if this matters but I am using the setting
>
>     - endpoint_snitch: Ec2Snitch
>
> Finally, when I start up the first node all goes well. But when I start up the second node I see this exception on both hosts:
>
> node1
>
> INFO 11:02:32,231 Listening for thrift clients...
> INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
> INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
> ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
> java.lang.RuntimeException: Host ID collision between active endpoint /10..xxx.248 and /10.xxx.xxx.123 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
>     at org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
>     at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
>     at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
>     at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
>     at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
>     at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
>     at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> And on node02 I see:
>
> INFO 11:02:58,817 Starting Messaging Service on port 7000
> INFO 11:02:58,835 Using saved token [0]
> INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84 serialized/live bytes, 4 ops)
> INFO 11:02:58,838 Writing Memtable-local@672636645(84/84 serialized/live bytes, 4 ops)
> INFO 11:02:58,912 Completed flushing /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes) for commitlog position ReplayPosition(segmentId=1358956977628, position=49266)
> INFO 11:02:58,922 Enqueuing flush of Memtable-local@1007604537(32/32 serialized/live bytes, 2 ops)
> INFO 11:02:58,923 Writing Memtable-local@1007604537(32/32 serialized/live bytes, 2 ops)
> INFO 11:02:58,943 Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-40-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-42-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-43-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-44-Data.db' is not in this list; the fourth entry is SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-41-Data.db')]
> INFO 11:02:58,953 Node /10.192.179.248 is now part of the cluster
> INFO 11:02:58,961 InetAddress /10.192.179.248 is now UP
> INFO 11:02:59,003 Completed flushing /var/lib/cassandra/data/system/local/system-local-ia-44-Data.db (90 bytes) for
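On the token point in this thread: both nodes report token 0, so they claim the same position on the ring. Below is a minimal Python sketch of evenly spaced initial_token values. It assumes RandomPartitioner, whose token space runs from 0 to 2**127; the function name is hypothetical and this is not the tokengen script mentioned above. Murmur3Partitioner, the default for new clusters from 1.2 on, uses a different range, so the sketch would not apply there as written.

```python
def random_partitioner_tokens(node_count):
    """Evenly spaced initial_token values over the RandomPartitioner ring (0 .. 2**127)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Two nodes, as in the thread: one value per node, no duplicates.
    for node, token in enumerate(random_partitioner_tokens(2), start=1):
        print("node{:02d} initial_token: {}".format(node, token))
```

The values are plain integers written into each node's configuration. They determine which slice of the ring a node owns and, as the reply says, have nothing to do with memory usage.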
Re: trouble setting up initial cluster: Host ID collision between active endpoint
Cool, thanks for the advice, Aaron. I actually did get this working before I read your reply. The trick, apparently, for me was to use the IP of the first node in the seeds setting of each successive node. But I like the idea of using larges for an hour or so and terminating them for some basic experimentation. Also, thanks for pointing me to the DataStax AMIs; I'll be sure to check them out.

Tim

On Thu, Jan 24, 2013 at 3:45 AM, aaron morton aa...@thelastpickle.com wrote:

> They both have 0 for their token, and this is stored in their System keyspace. Scrub them and start again.
>
>> But I found that the tokens that were being generated would require way too much memory
>
> Token assignments have nothing to do with memory usage.
>
>> m1.micro instances
>
> You are better off using your laptop than micro instances. For playing around try m1.large and terminate them when not in use. To make life easier use this to make the cluster for you: http://www.datastax.com/docs/1.2/install/install_ami
>
> Cheers
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
Re: trouble setting up initial cluster: Host ID collision between active endpoint
Hi Tim,

If you want to check out Cassandra on AWS you should also have a look at www.instaclustr.com. We are still very much in beta (so if you come across anything, please let us know), but if you have a few minutes and want to deploy a cluster in just a few clicks I highly recommend trying Instaclustr out.

Cheers
Ben Bromhead
*Instaclustr*

On Fri, Jan 25, 2013 at 12:35 AM, Tim Dunphy bluethu...@gmail.com wrote:

> Cool, thanks for the advice, Aaron. I actually did get this working before I read your reply. The trick, apparently, for me was to use the IP of the first node in the seeds setting of each successive node. But I like the idea of using larges for an hour or so and terminating them for some basic experimentation. Also, thanks for pointing me to the DataStax AMIs; I'll be sure to check them out.
>
> Tim
Setting up a cluster
I am new to Cassandra and am setting up a cluster for the first time, with 1.1.1. There are three nodes. One acts as the seed node, and all three have its IP address as their seed. I have set the listen address to the address of each node and the RPC address to 0.0.0.0.

I turned tracing on on all three and see the GOSSIP messages between the seed node and the other two, but not between the two non-seed nodes. Sometimes I see a connection timeout between the seed node and the other nodes, but not very often. However, nodetool -h address ring only shows one node on each machine (the local host), and when I define a keyspace with any replication factor, the begin and end token of the keyspace is the local host's token.

P.S. I have generated tokens for each node.

What did I miss here?

Thanks,
Shahryar Sedghi

--
Life is what happens while you are making other plans. ~ John Lennon
Re: Setting up a cluster
Did you set the cluster name to be the same? Check the logs on the machines for errors or warnings. Finally, check that each node can telnet to port 7000 on the others.

Cheers
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/06/2012, at 6:29 AM, Shahryar Sedghi wrote:

> I am new to Cassandra and am setting up a cluster for the first time, with 1.1.1. There are three nodes. One acts as the seed node, and all three have its IP address as their seed. I have set the listen address to the address of each node and the RPC address to 0.0.0.0.
>
> I turned tracing on on all three and see the GOSSIP messages between the seed node and the other two, but not between the two non-seed nodes. Sometimes I see a connection timeout between the seed node and the other nodes, but not very often. However, nodetool -h address ring only shows one node on each machine (the local host), and when I define a keyspace with any replication factor, the begin and end token of the keyspace is the local host's token.
>
> P.S. I have generated tokens for each node.
>
> What did I miss here?
>
> Thanks,
> Shahryar Sedghi
>
> --
> Life is what happens while you are making other plans. ~ John Lennon
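The "telnet to port 7000" check suggested in this reply can also be scripted, which helps when every node has to be tested against every other node. A minimal Python sketch follows. The peer addresses are placeholders, can_connect is a hypothetical helper, and 7000 is assumed to be the inter-node storage port (the default). A successful TCP connect only shows the port is reachable; it says nothing about whether gossip itself is healthy.

```python
import socket


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Placeholder addresses; replace with the listen address of each node.
PEERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

if __name__ == "__main__":
    for peer in PEERS:
        status = "reachable" if can_connect(peer) else "NOT reachable"
        print("{}:7000 {}".format(peer, status))
```

Run it from each node in turn. In a three-node cluster, a non-seed node that cannot reach the other non-seed node would match the symptom of gossip flowing only through the seed.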
Re: Setting up a cluster
I did all you said. No errors and warnings.

On Mon, Jun 18, 2012 at 2:31 PM, aaron morton aa...@thelastpickle.com wrote:

> Did you set the cluster name to be the same? Check the logs on the machines for errors or warnings. Finally, check that each node can telnet to port 7000 on the others.
>
> Cheers
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com

--
Life is what happens while you are making other plans. ~ John Lennon
Re: Setting up a cluster
Are you sure all your settings are correct? If so, then please follow these steps:

    ./nodetool disablethrift
    ./nodetool disablegossip
    ./nodetool drain

Stop the service, then delete all the data, saved_caches, and commitlog files. Then restart your service. Repeat these steps for all the nodes. I hope it will work.

Regards,
--
Abhijit Chanda
VeHere Interactive Pvt. Ltd.
+91-974395
Re: setting up a cluster
This page may help: http://wiki.apache.org/cassandra/MultinodeCluster
It goes through the settings to change in storage-conf.xml.

Have not used it on EC2 so cannot help there.

Aaron

On 22 Jul, 2010, at 08:34 AM, S Ahmed sahmed1...@gmail.com wrote:

> Is this the only documentation on starting up a cluster?
> http://wiki.apache.org/cassandra/GettingStarted
>
> I got a single node up, pretty straightforward. But I have no idea how to set up a cluster.
>
> 1. Once I start a 2nd node, how do I tell it about the other nodes in the network?
> 2. Do I inform all other nodes via the nodetool or is there a config file?
> 3. When you have nodes in other EC2 zones, how do you set up security? Can you set up firewall rules when nodes are in different zones?
>
> Thanks a lot!