[ 
https://issues.apache.org/jira/browse/CASSANDRA-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062237#comment-14062237
 ] 

Joaquin Casares edited comment on CASSANDRA-7547 at 7/15/14 4:21 PM:
---------------------------------------------------------------------

Before we go any further with using the DataStax reflector, you should 
consider that its original purpose was to give individual nodes a simple way to 
cluster together on startup without initially knowing anything about each other.

The current reflector is built with a short-term memory: it only keeps the 
list of seeds for about 10 minutes. If a node is slow to boot and comes in on 
the 11th minute, it will never learn of its peers. If a pre-chosen seed node is 
slow to boot, the nodes may never properly cluster together.

This is important because the seed provider is queried multiple times during 
the lifetime of the cluster, mainly during periods of topological change: 
removal, bootstrap, replace, etc. If these happen outside of that 10-minute 
window for all nodes, you'll get an empty or incomplete list of seeds.
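
To make the failure mode concrete, here is a toy model of a service that 
forgets registrations after a fixed window. It is only a sketch: the class and 
method names are hypothetical, and it does not claim to mirror how the DataStax 
reflector is actually implemented. Only the 10-minute figure comes from the 
description above.

```python
# Toy model only: names and behaviour are assumptions for illustration.
WINDOW_SECONDS = 10 * 60  # the "about 10 minutes" of memory described above


class ShortMemoryReflector:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.seen = {}  # address -> time of last registration, in seconds

    def register_and_list(self, address, now):
        """Record the caller, then return every address still inside the window."""
        self.seen[address] = now
        return sorted(a for a, t in self.seen.items() if now - t <= self.window)


reflector = ShortMemoryReflector()
first = reflector.register_and_list("10.0.0.1", now=0)
second = reflector.register_and_list("10.0.0.2", now=30)
# A slow node arrives in the 11th minute: both earlier registrations have expired.
late = reflector.register_and_list("10.0.0.3", now=11 * 60)
```

In this model the late node only ever sees itself, and any later topology 
change on the first two nodes would hit the same empty window.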

Building on the concept of "a" reflector may be worth doing, but keep these 
things in mind:
* ensure you use a private service with long-term memory,
* rely on the reflector for assistance in configuring the seed list, rather 
than using it as the seed provider directly,
* always assume the service can and will go down, so write the seed list to 
disk appropriately, perhaps conf/seed-list.txt,
* you must account for topological changes that will occur in long-running 
clusters,
* and the seed lists on all nodes should be identical.
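
A minimal sketch of the "write to disk and assume the service goes down" 
points might look like the following. All function names are hypothetical, the 
`fetch` callable stands in for whatever private discovery service you use, and 
the file name simply follows the conf/seed-list.txt suggestion above.

```python
import os
import tempfile


def normalize(seeds):
    """Deduplicate and sort so the same input yields the same list on every node."""
    return sorted(set(s.strip() for s in seeds if s.strip()))


def write_seed_file(path, seeds):
    """Write to a temp file and rename, so a crash cannot leave a truncated list."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(seeds) + "\n")
    os.replace(tmp, path)


def read_seed_file(path):
    try:
        with open(path) as f:
            return normalize(f.read().splitlines())
    except FileNotFoundError:
        return []


def refresh_seeds(path, fetch):
    """Ask the service; on any failure or an empty answer, keep the cached list."""
    try:
        seeds = normalize(fetch())
    except Exception:  # broad on purpose in this sketch: any outage means "use the cache"
        seeds = []
    if seeds:
        write_seed_file(path, seeds)
        return seeds
    return read_seed_file(path)


def service_down():
    raise ConnectionError("discovery service unreachable")


path = os.path.join(tempfile.mkdtemp(), "seed-list.txt")
fresh = refresh_seeds(path, lambda: ["10.0.0.2", "10.0.0.1", "10.0.0.2"])
cached = refresh_seeds(path, service_down)  # falls back to the file on disk
```

Never overwriting the file with an empty answer is a deliberate choice here: 
an empty or failed response is treated as an outage, not as "there are no 
seeds".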

The last point is probably the hardest. I'm not sure if this infrastructure 
fits best inside of Cassandra or as external tools. However, in order to have 
more control of when seed lists get updated, instead of waiting for Cassandra 
services to kick in, external tools will probably be your best option.
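
As one example of what such an external tool could check, the sketch below 
groups nodes by a fingerprint of their seed list so that diverging nodes stand 
out. How the lists are collected from each node is left out, the node names 
are made up, and treating the list as an unordered set is an assumption of 
this sketch.

```python
import hashlib
from collections import defaultdict


def fingerprint(seeds):
    """Hash a canonical (deduplicated, sorted) form of the seed list."""
    canonical = "\n".join(sorted(set(seeds)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_by_seed_list(lists_by_node):
    """Return groups of node names that share an identical seed list."""
    groups = defaultdict(list)
    for node, seeds in lists_by_node.items():
        groups[fingerprint(seeds)].append(node)
    return sorted(sorted(nodes) for nodes in groups.values())


# Hypothetical data, as if each node's seed file had already been collected.
collected = {
    "node-a": ["10.0.0.1", "10.0.0.2"],
    "node-b": ["10.0.0.2", "10.0.0.1"],
    "node-c": ["10.0.0.1"],
}
groups = group_by_seed_list(collected)
consistent = len(groups) == 1  # more than one group means the lists have diverged
```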

I hope this helps you build what you have in mind. Cheers!


> EC2 seed provider using DataStax Reflector
> ------------------------------------------
>
>                 Key: CASSANDRA-7547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7547
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Pekka Enberg
>            Priority: Minor
>         Attachments: 0001-EC2-seed-provider.patch
>
>
> This is a request for comments. I am using this to build our EC2 AMIs but I 
> thought I'd ask if this makes sense as a generic feature for Cassandra.
> Cassandra cluster auto-configuration on EC2 uses the DataStax reflector 
> service for discovering seed nodes. Instead of relying on external scripts, 
> this patch implements an EC2 seed provider that uses the DataStax reflector 
> service.
> This is particularly useful for EC2 AMIs that don't include a complete 
> userspace (such as those built with OSv) where we ideally want to push as 
> much configuration to the application itself.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
