Re: setting up prod cluster

2015-01-12 Thread Philip Thompson
I might be misinterpreting you, but it seems you are only using one seed
per node. Is there a specific reason for that? A node can have multiple
seeds in its seed list. It is my understanding that typically, every node
in a cluster has the same seed list.





Re: setting up prod cluster

2015-01-12 Thread Eric Stevens
Hi Tim, replies inline below.

On Sun, Jan 11, 2015 at 8:03 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey all,

  I've been experimenting with Cassandra on a small scale and in my own
 sandbox for a while now. I'm pretty used to working with it to get small
 clusters up and running and gossiping with each other.

 But I just had a new project at work drop into my lap that requires a
 NoSQL data store. And the developers have selected... you guessed it!
 Cassandra as their back-end database.

 So I'll be asked to set up a 6-node cluster all hosted in one data center.
 I want to just make sure that I understand the concept of seeds correctly.
 I think since we'll be dealing with 6 nodes, what I'll want to do is have 2
 seeds, and have each seed see the other as its own seed.


There isn't really a reason to have a seed host exclude itself from its own
seeds list.  All hosts in a cluster can share a common set of seeds.  A
typical configuration is to select three hosts from each data center,
preferably from three different racks (or AWS availability zones).  A new
host then has trouble coming online only if all three seeds are offline at
the same time: if a host that is coming online can talk to even one seed, it
will query that seed to find the rest of the nodes in the cluster.
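To make the "three seeds, spread across racks" idea concrete, here is a minimal Python sketch. It assumes a hypothetical inventory of (ip, rack) pairs for the six nodes (the addresses and rack names are made up) and picks up to three seeds round-robin across racks. The single resulting list is what every node, seeds included, would carry in the seeds parameter of cassandra.yaml.

```python
def pick_seeds(nodes, count=3):
    """Pick up to `count` seed IPs, taking one per rack before doubling up.

    `nodes` is a hypothetical inventory: a list of (ip, rack) tuples.
    The returned list is meant to be shared by every node in the cluster.
    """
    by_rack = {}
    for ip, rack in nodes:
        by_rack.setdefault(rack, []).append(ip)  # dicts keep insertion order
    seeds = []
    # Round-robin over racks so the seeds land on different racks first.
    while len(seeds) < count and any(by_rack.values()):
        for ips in by_rack.values():
            if ips and len(seeds) < count:
                seeds.append(ips.pop(0))
    return seeds


# Hypothetical six-node, three-rack layout in one data center.
nodes = [
    ("10.0.1.10", "rack1"), ("10.0.1.11", "rack1"),
    ("10.0.2.10", "rack2"), ("10.0.2.11", "rack2"),
    ("10.0.3.10", "rack3"), ("10.0.3.11", "rack3"),
]
print(",".join(pick_seeds(nodes)))  # 10.0.1.10,10.0.2.10,10.0.3.10
```

The comma-separated string is the value that would go in the seeds entry under seed_provider, identically on all six nodes.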

The one thing you *don't* want to do is have a host be in its own seeds
list when joining a cluster with existing data.  That's a hint that the host
should consider itself authoritative on what data it already owns, and it
will keep that host from bootstrapping: it'll join the cluster immediately
without learning anything about the data it's now responsible for.
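A pre-flight check for that pitfall can be sketched in a few lines of Python. The function name and the comma-separated seeds string are assumptions for illustration, not part of Cassandra; the check simply flags a joining node whose own address appears in its seeds list, which per the rule above would skip bootstrapping.

```python
def check_joining_node(node_ip, seeds_csv):
    """Flag a node that is about to join an existing cluster while listing
    itself as a seed (hypothetical helper, not part of Cassandra)."""
    seeds = [s.strip() for s in seeds_csv.split(",") if s.strip()]
    if node_ip in seeds:
        return (f"WARNING: {node_ip} is in its own seeds list; it will join "
                "without bootstrapping the data it becomes responsible for")
    return "ok"


print(check_joining_node("10.0.1.11", "10.0.1.10,10.0.2.10,10.0.3.10"))  # ok
```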


 Then the other 2 nodes in each sub-group will have the IP for its seed in
 each of their cassandra.yaml files.


I'm not really sure what you mean by sub-group here.  If all six hosts are
in the same datacenter, do you maybe mean you're spreading the hosts out
across several physical racks (or AWS availability zones)?  There might be
some cognitive dissonance here.  Most if not all hosts in your cluster
would typically share the same seeds list.


 Then I'll want to set the replication factor to 5. Since it'll be the
 total number of nodes -1. I just want to make sure I have all that right.


RF=5 isn't necessarily *wrong*, but I have a feeling it's not what you
want.  RF doesn't usually depend on how many nodes are in your cluster; it
represents your fault tolerance.

Replication Factor says how many times a single piece of data (piece as
determined by partition key in the table) is written to your cluster inside
of a given datacenter, with each copy going to a different physical host,
and preferring to place replicas in different physical racks if it's
possible. With RF=5, you can totally lose four nodes and still be able to
access all your data (albeit at a read/write consistency level of ONE).
You can simultaneously lose two nodes, and most clients (which tend to
prefer consistency level of quorum by default) wouldn't even notice.  A
more common RF is 3, regardless of cluster size.  This lets you totally
lose two nodes at the same time, and not lose any data.
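The arithmetic behind those numbers can be sketched in Python. This assumes a single data center and counts how many replicas of one partition may be down while reads and writes at a given consistency level still succeed; QUORUM is a strict majority of RF, i.e. floor(RF/2) + 1. It deliberately ignores hinted handoff and other recovery mechanisms.

```python
def quorum(rf):
    """QUORUM: a strict majority of the RF replicas."""
    return rf // 2 + 1


def tolerable_failures(rf, level):
    """Replicas of a partition that can be down while requests at `level`
    still succeed (single data center)."""
    needed = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]
    return rf - needed


for rf in (3, 5):
    print(f"RF={rf}: QUORUM needs {quorum(rf)}, "
          f"tolerates {tolerable_failures(rf, 'QUORUM')} down at QUORUM, "
          f"{tolerable_failures(rf, 'ONE')} down at ONE")
# RF=3: QUORUM needs 2, tolerates 1 down at QUORUM, 2 down at ONE
# RF=5: QUORUM needs 3, tolerates 2 down at QUORUM, 4 down at ONE
```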


 Another thing that will have to happen is that I will need to connect
 Cassandra into a 4 node ElasticSearch cluster. I think there are a few
 options for doing that. I've seen names like Titan and Gremlin. And I was
 wondering if anyone has any recommendations there.


I have no first-hand experience on that front, but depending on your
budget, DataStax Enterprise's integrated Solr might be a better fit (it'll
be a lot less work and time).


 And lastly I'd like to point out that I know literally nothing about the
 data that will be stored there just as of yet. The first meeting about the
 project will be tomorrow. My manager gave me an advance heads-up about
 what will be required.


If this is your first Cassandra project, you should understand that
effective data modeling for Cassandra focuses very, very heavily on knowing
exactly what queries will be performed against the data.  CQL looks like
SQL, but ad hoc querying isn't practical, and typically you'll write the
same business data multiple times in multiple layouts (tables with
different partition/clustering keys), once to satisfy each specific query.
For some of my business data, I write exactly the same data to 6 to 8
tables so I can answer different classes of question.
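The "one table per query" pattern can be sketched with plain Python dictionaries standing in for tables. The table names, fields and queries are hypothetical; in Cassandra each dictionary would be its own table whose partition key matches the question it answers (for example a users_by_city table partitioned by city), and the application writes every record to all of them.

```python
from collections import defaultdict

# Hypothetical denormalized "tables", each keyed for one query.
users_by_id = {}                    # answers: user with id X?
users_by_email = {}                 # answers: user with email Y?
users_by_city = defaultdict(list)   # answers: users living in city Z?


def save_user(user):
    """Write the same record once per query pattern."""
    users_by_id[user["id"]] = user
    users_by_email[user["email"]] = user
    users_by_city[user["city"]].append(user)


save_user({"id": 1, "email": "a@example.com", "city": "Boston"})
save_user({"id": 2, "email": "b@example.com", "city": "Boston"})
save_user({"id": 3, "email": "c@example.com", "city": "Denver"})

# Each lookup is a single-key read against the table built for it.
print([u["id"] for u in users_by_city["Boston"]])  # [1, 2]
print(users_by_email["c@example.com"]["id"])       # 3
```

The cost of this design is extra writes and storage; the benefit is that every read is a direct lookup rather than an ad hoc scan.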






setting up prod cluster

2015-01-11 Thread Tim Dunphy
Hey all,

 I've been experimenting with Cassandra on a small scale and in my own
sandbox for a while now. I'm pretty used to working with it to get small
clusters up and running and gossiping with each other.

But I just had a new project at work drop into my lap that requires a NoSQL
data store. And the developers have selected... you guessed it! Cassandra
as their back-end database.

So I'll be asked to set up a 6-node cluster all hosted in one data center. I
want to just make sure that I understand the concept of seeds correctly. I
think since we'll be dealing with 6 nodes, what I'll want to do is have 2
seeds, and have each seed see the other as its own seed.

Then the other 2 nodes in each sub-group will have the IP for its seed in
each of their cassandra.yaml files.

Then I'll want to set the replication factor to 5. Since it'll be the total
number of nodes -1. I just want to make sure I have all that right.

Another thing that will have to happen is that I will need to connect
Cassandra into a 4 node ElasticSearch cluster. I think there are a few
options for doing that. I've seen names like Titan and Gremlin. And I was
wondering if anyone has any recommendations there.

And lastly I'd like to point out that I know literally nothing about the
data that will be stored there just as of yet. The first meeting about the
project will be tomorrow. My manager gave me an advance heads-up about
what will be required.

Thank you,
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-25 Thread Tim Dunphy
Hi Ben,

Thanks for the tip. I will certainly check it out. I really appreciate the
information!

Tim


Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-24 Thread aaron morton
They both have 0 for their token, and this is stored in their System keyspace. 
Scrub them and start again. 

 But I found that the tokens that were being generated would require way too 
 much memory
Token assignments have nothing to do with memory usage. 
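The numbers a token generator prints look enormous, but they are positions on the partitioner's hash ring, not byte counts, so their size has no bearing on memory. What matters is that each node gets its own; two nodes both holding token 0 is exactly the problem above. A minimal Python sketch of evenly spaced initial tokens, assuming the standard ring ranges of Murmur3Partitioner (the 1.2 default, -2**63 to 2**63 - 1) and RandomPartitioner (0 to 2**127); the log here doesn't say which partitioner is configured.

```python
def murmur3_tokens(n):
    """Evenly spaced initial tokens for n nodes on the Murmur3Partitioner
    ring, which spans -2**63 .. 2**63 - 1."""
    return [(2**64 // n) * i - 2**63 for i in range(n)]


def random_partitioner_tokens(n):
    """Evenly spaced initial tokens for n nodes on the RandomPartitioner
    ring, which spans 0 .. 2**127."""
    return [(2**127 // n) * i for i in range(n)]


print(murmur3_tokens(2))             # [-9223372036854775808, 0]
print(random_partitioner_tokens(2))  # [0, 85070591730234615865843651857942052864]
```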

  m1.micro instances
You are better off using your laptop than micro instances. 
For playing around try m1.large and terminate them when not in use. 
To make life easier use this to make the cluster for you 
http://www.datastax.com/docs/1.2/install/install_ami

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/01/2013, at 5:17 AM, Tim Dunphy bluethu...@gmail.com wrote:

 Hello list,
 
  I really do appreciate the advice I've gotten here as I start building
 familiarity with Cassandra. Aside from the single node instance I set up for a
 developer friend, I've just been playing with a single node in a VM on my
 laptop and playing around with the cassandra-cli and PHP.
 
 Well I've decided to set up my first cluster on my Amazon EC2 account and I'm
 running into an issue getting the nodes to gossip.
 
 I've set the IPs of the 'node01' and 'node02' EC2 instances in their respective
 listen_address and rpc_address settings and made sure that the 'cluster_name'
 on both was in agreement.
 
  I believe the problem may be in one of two places: either the seeds or the 
 initial_token setting. 
 
 For the seeds I have it set up as follows. I put the IPs for both machines in
 the 'seeds' setting for each, thinking this would be how the nodes would
 discover each other:
 
  - seeds: 10.xxx.xxx.248,10.xxx.xxx.123
 
 Initially I tried the tokengen script that I found in the documentation. But 
 I found that the tokens that were being generated would require way too much 
 memory for the m1.micro instances that I'm experimenting with on the Amazon 
 free tier. And according to the docs in the config it is in some cases ok to 
 leave that field blank. So that's what I did on both instances. 
 
 Not sure how much/if this matters but I am using the setting - 
 endpoint_snitch: Ec2Snitch
 
 Finally, when I start up the first node all goes well.
 
 But when I start up the second node I see this exception on both hosts:
 
 node1
 
 INFO 11:02:32,231 Listening for thrift clients...
  INFO 11:02:59,262 Node /10.xxx.xxx.123 is now part of the cluster
  INFO 11:02:59,268 InetAddress /10.xxx.xxx.123 is now UP
 ERROR 11:02:59,270 Exception in thread Thread[GossipStage:1,5,main]
 java.lang.RuntimeException: Host ID collision between active endpoint 
 /10..xxx.248 and /10.xxx.xxx.123 (id=54ce7ccd-1b1d-418e-9861-1c281c078b8f)
 at 
 org.apache.cassandra.locator.TokenMetadata.updateHostId(TokenMetadata.java:227)
 at 
 org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1296)
 at 
 org.apache.cassandra.service.StorageService.onChange(StorageService.java:1157)
 at 
 org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1895)
 at 
 org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:805)
 at 
 org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:883)
 at 
 org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:43)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
 
 And on node02 I see:
 
  INFO 11:02:58,817 Starting Messaging Service on port 7000
  INFO 11:02:58,835 Using saved token [0]
  INFO 11:02:58,837 Enqueuing flush of Memtable-local@672636645(84/84 
 serialized/live bytes, 4 ops)
  INFO 11:02:58,838 Writing Memtable-local@672636645(84/84 serialized/live 
 bytes, 4 ops)
  INFO 11:02:58,912 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-ia-43-Data.db (120 bytes) 
 for commitlog position ReplayPosition(segmentId=1358956977628, position=49266)
  INFO 11:02:58,922 Enqueuing flush of Memtable-local@1007604537(32/32 
 serialized/live bytes, 2 ops)
  INFO 11:02:58,923 Writing Memtable-local@1007604537(32/32 serialized/live 
 bytes, 2 ops)
  INFO 11:02:58,943 Compacting 
 [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-40-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-42-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-43-Data.db'),
  
 SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ia-41-Data.db')]
  INFO 11:02:58,953 Node /10.192.179.248 is now part of the cluster
  INFO 11:02:58,961 InetAddress /10.192.179.248 is now UP
  INFO 11:02:59,003 Completed flushing 
 /var/lib/cassandra/data/system/local/system-local-ia-44-Data.db (90 bytes) 
 for 

Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-24 Thread Tim Dunphy
Cool, thanks for the advice Aaron. I actually did get this working before I
read your reply. The trick apparently for me was to use the IP of the
first node in the seeds setting of each successive node. But I like the
idea of using m1.large instances for an hour or so and terminating them for
some basic experimentation. Also, thanks for pointing me to the DataStax
AMIs. I'll be sure to check them out.

Tim


Re: trouble setting up initial cluster: Host ID collision between active endpoint

2013-01-24 Thread Ben Bromhead
Hi Tim

If you want to check out Cassandra on AWS you should also have a look at
www.instaclustr.com.

We are still very much in Beta (so if you come across anything, please let
us know), but if you have a few minutes and want to deploy a cluster in
just a few clicks I highly recommend trying Instaclustr out.

Cheers

Ben Bromhead
*Instaclustr*


Setting up a cluster

2012-06-18 Thread Shahryar Sedghi
I am new to Cassandra, and I'm setting up a cluster for the first time with
1.1.1. There are three nodes; one acts as the seed node, and all three have
that node's IP address as their seed. I have set the listen address to the
address of each node and the rpc address to 0.0.0.0. I turned tracing on for
all three and I see GOSSIP messages between the seed node and the other two,
but not between the two non-seed nodes. Sometimes I see connection timeouts
between the seed node and the other nodes, but not very often. However,
nodetool -h address ring shows only one node on each machine (the localhost),
and when I define a keyspace with any replication factor, the begin and end
token of the keyspace is the localhost token.

P.S. I have generated tokens for each node.

What did I miss here?

Thanks

Shahryar Sedghi

-- 
Life is what happens while you are making other plans. ~ John Lennon


Re: Setting up a cluster

2012-06-18 Thread aaron morton
Did you set the cluster name to be the same?
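Checks like this are easy to script once each node's settings are collected in one place. A minimal Python sketch, assuming you have already pulled a few cassandra.yaml values per node into dicts (that structure and the node names are hypothetical): it flags a cluster_name mismatch, differing seeds lists, and a listen_address or initial_token shared by two nodes.

```python
def audit_cluster(configs):
    """Return a list of problems found across per-node settings.

    `configs` maps a node name to a dict of a few cassandra.yaml values
    (a hypothetical, already-parsed form; keys follow the yaml names).
    """
    problems = []
    names = {c["cluster_name"] for c in configs.values()}
    if len(names) > 1:
        problems.append(f"cluster_name differs: {sorted(names)}")
    if len({c["seeds"] for c in configs.values()}) > 1:
        problems.append("seeds lists differ between nodes")
    for key in ("listen_address", "initial_token"):
        seen = {}
        for node, c in configs.items():
            value = c.get(key)
            if value in (None, ""):
                continue  # blank value: nothing to compare
            if value in seen:
                problems.append(f"{node} and {seen[value]} share {key}={value}")
            else:
                seen[value] = node
    return problems


configs = {
    "node01": {"cluster_name": "Test Cluster", "seeds": "10.0.0.1",
               "listen_address": "10.0.0.1", "initial_token": "0"},
    "node02": {"cluster_name": "Test Cluster", "seeds": "10.0.0.1",
               "listen_address": "10.0.0.2", "initial_token": "0"},
}
print(audit_cluster(configs))  # ['node02 and node01 share initial_token=0']
```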

Check the logs on the machines for errors or warnings. 

Finally check that each node can telnet to port 7000 on the others. 
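That telnet check can be scripted. A minimal Python sketch: it only tests whether a TCP connection to the storage (gossip) port succeeds, which is all telnet tells you here. Port 7000 is the default storage_port; the host addresses are whatever you pass on the command line.

```python
import socket
import sys


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    This is the scripted equivalent of `telnet host 7000`: it proves the
    port is reachable, not that Cassandra is healthy behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure
        return False


if __name__ == "__main__":
    # Run on each node, passing the other nodes' addresses as arguments.
    for host in sys.argv[1:]:
        status = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}:7000 {status}")
```

Run it on each node with the other nodes' addresses as arguments; a "NOT reachable" line usually points at a firewall or security-group rule, or a listen_address that isn't bound where you expect.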

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com




Re: Setting up a cluster

2012-06-18 Thread Shahryar Sedghi
I did all you said. No errors or warnings.



Re: Setting up a cluster

2012-06-18 Thread Abhijit Chanda
Are you sure all your settings are correct? If so, then please follow these
steps:

./nodetool disablethrift
./nodetool disablegossip
./nodetool drain

Stop the service and then delete all the data, saved_caches and commitlog
files. Then restart your service.
Repeat these steps for all the nodes. I hope it will work.
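The procedure above can be outlined in Python too. Treat it as a hedged sketch, not a tested tool: the nodetool subcommands are the ones listed, but the /var/lib/cassandra root and the data, commitlog and saved_caches directory names are assumptions (common package defaults); check data_file_directories, commitlog_directory and saved_caches_directory in your cassandra.yaml. Wiping these directories destroys everything stored on the node.

```python
import shutil
from pathlib import Path

# Steps from the list above, run while the node is still up.
NODETOOL_STEPS = ["disablethrift", "disablegossip", "drain"]

# Assumed default layout under /var/lib/cassandra; verify against cassandra.yaml.
STATE_DIRS = ["data", "commitlog", "saved_caches"]


def nodetool_commands(nodetool="./nodetool"):
    """Argument lists suitable for subprocess.run(), in order."""
    return [[nodetool, step] for step in NODETOOL_STEPS]


def wipe_state(root="/var/lib/cassandra"):
    """Delete the contents of the data, commitlog and saved_caches
    directories under `root`. Only run this with Cassandra stopped."""
    removed = []
    for name in STATE_DIRS:
        directory = Path(root) / name
        if not directory.is_dir():
            continue
        for child in list(directory.iterdir()):
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
            removed.append(str(child))
    return removed
```

In practice you would run each entry of nodetool_commands() with subprocess.run(cmd, check=True), stop the service, call wipe_state(), start the service again, and then move on to the next node.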

Regards,
-- 
Abhijit Chanda
VeHere Interactive Pvt. Ltd.
+91-974395


Re: setting up a cluster

2010-07-21 Thread Aaron Morton
This page may help:
http://wiki.apache.org/cassandra/MultinodeCluster
It goes through the settings to change in storage-config.xml. I have not
used it on EC2 so cannot help there.

Aaron

On 22 Jul, 2010, at 08:34 AM, S Ahmed sahmed1...@gmail.com wrote:

 Is this the only documentation on starting up a cluster?
 http://wiki.apache.org/cassandra/GettingStarted
 I got a single node up, pretty straightforward. But I have no idea how to
 set up a cluster.
 1. Once I start a 2nd node, how do I tell it about the other nodes in the
 network?
 2. Do I inform all other nodes via the nodetool or is there a config file?
 3. When you have nodes in other EC2 zones, how do you set up security? Can
 you set up firewall rules when nodes are in different zones?
 Thanks a lot!