Hi, I've checked everything Alain suggested and set up a fresh 2-node cluster, and I still get the same result: each node lists only itself.
This time I made the following changes:

- I set listen_address to the public DNS name. Internally, AWS's DNS will map this to the 10.x IP, so this should work correctly if I understand right. These are new EC2 instances, and I did not trust the configured hostname and so on.
- I opened all ports between nodes in the security group.
- I kept the snitch at Ec2MultiRegionSnitch. This cluster is small now, but it will be very large and nationwide if I succeed and choose Cassandra for this purpose. Do I understand correctly that it is not possible to change this later, or at least not easy?
- I checked all of Alain's suggestions, for example that cluster_name is the same on all nodes.
- I set the seed list to the public DNS name of the first node. This is identical on both nodes.
- I checked Alain's suggestion about auto_bootstrap. The docs say it is not necessary to set this. Are the docs wrong? (I am looking at the DataStax 1.2 PDF docs.)

Here is some more debugging evidence. On node 1, the seed,

[root@ip-10-113-19-24 ~]# ifconfig | grep inet.addr
          inet addr:10.113.19.24  Bcast:10.113.19.255  Mask:255.255.254.0

[root@ip-10-113-19-24 ~]# nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns    Host ID                               Rack
UN  23.22.204.201  20.97 KB  256     100.0%  4fadd4fd-c57c-4172-95aa-092368ba5743  1a

[root@ip-10-113-19-24 ~]# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address        Foreign Address       State        PID/Program name
tcp        0      0 0.0.0.0:7199         0.0.0.0:*             LISTEN       1910/java
tcp        0      0 0.0.0.0:47298        0.0.0.0:*             LISTEN       1910/java
tcp        0      0 0.0.0.0:57030        0.0.0.0:*             LISTEN       1910/java
tcp        0      0 0.0.0.0:9160         0.0.0.0:*             LISTEN       1910/java
tcp        0      0 0.0.0.0:9042         0.0.0.0:*             LISTEN       1910/java
tcp        0      0 0.0.0.0:22           0.0.0.0:*             LISTEN       1231/sshd
tcp        0      0 10.113.19.24:7000    0.0.0.0:*             LISTEN       1910/java
tcp        0      1 10.113.19.24:38948   54.234.147.60:7000    SYN_SENT     1910/java
tcp        0      0 10.113.19.24:7000    10.113.19.24:45328    ESTABLISHED  1910/java
tcp        0      0 10.113.19.24:7000    10.114.205.157:47713  ESTABLISHED  1910/java
tcp        0      1 10.113.19.24:45597   23.22.204.201:7000    SYN_SENT     1910/java
tcp        0      0 10.113.19.24:45328   10.113.19.24:7000     ESTABLISHED  1910/java

And in the log,

 INFO 20:58:12,472 Node /23.22.204.201 state jump to normal
 INFO 20:58:12,482 Startup completed! Now serving reads.

Now, this looks similar to the earlier problem, with private IP addresses being used sometimes and public ones other times. By the way, the other node, whose internal IP address is 10.114.205.157, is connected to this seed node, as you can see.

I think I could understand this problem if I knew which types of network connections I should expect to see in netstat, and what output I should expect to see in the log. Can someone with more experience tell me what is wrong or unexpected above? And am I working against Amazon's architecture by using IPs the way I do?

While I wait for an answer, I will shut down, delete all data, and reconfigure with public IP addresses explicitly instead of DNS names :-) I have a feeling this is the problem. From within an Amazon EC2 server, a DNS lookup of a public DNS name returns the private IP address. (However, I still feel unsure about the right way to do this, because I do not know whether Cassandra will resolve DNS and end up trying to connect to a private IP that Cassandra is not listening on.)

Thanks,
- Boris


On Wed, Feb 13, 2013 at 10:37 AM, Boris Solovyov <boris.solov...@gmail.com> wrote:

> Thank you Alain. I will check the things you suggest and report my results.
>
> - Boris
>
>
> On Wed, Feb 13, 2013 at 7:54 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
>> Hi Boris.
>>
>> "I feel like I have made a beginner's mistake"
>> That's an horrible feeling :D. I'll try to help ;)
>>
>> "cluster_name: 'TS'"
>> Are you sure you used the same name for both node ?
>>
>> "I can connect to port 7000"
>> You can check all the ports needed there
>> http://www.datastax.com/docs/1.2/install/install_ami and open them in
>> security group once and for all so you won't be wondering this anymore.
>>
>> "listen_address: 10.145.232.190"
>> "INFO 19:36:32,710 Node /107.22.114.19 state jump to normal"
>> There is "10.145.232.190" defined as listen address and your logs say
>> that 107.22.114.19 joined the ring and your second ip seems to be
>> 23.21.11.193... When you stop an EC2 server, its internal ip may change.
>> So I recommend you not to do so, but restart them instead. Anyway you
>> should use instance stores and not EBS, and Instance Store can't be stopped
>> so you won't have this issue anymore. Don't trust ip-10-145-232-190
>> which is configured at first start in /etc/hostname.
>>
>> "endpoint_snitch: Ec2MultiRegionSnitch"
>> Maybe you should use endpoint_snitch: Ec2Snitch since all your servers
>> are in the same zone. You will have to use private IPs everywhere and
>> comment out the broadcast_address if you do so.
>>
>>
>> The first node has to start with auto_bootstrap: false, while the 2nd one
>> could use auto_bootstrap: true. Seeds node must be your first node only, a
>> bootstrapping node mustn't be defined as a seed.
>>
>> "my guess... certainly 30-second timeouts look suspicious"
>> This is not a timeout but rather a sleep and it is a normal wait while
>> adding a node.
>>
>> Since you're a new user, I guess you have no data. If you want to try some
>> conf you can always "reset" your cassandra node by removing .../cassandra/*
>> (commitlog, data and saved_caches) after stopping Cassandra.
>>
>> Good luck with this.
>>
>> Alain
>>
>>
>> 2013/2/12 Boris Solovyov <boris.solov...@gmail.com>
>>
>>> I've configured 2-node cluster in EC2, key settings as follows:
>>>
>>> cluster_name: 'TS'
>>> num_tokens: 256
>>> seed_provider:
>>>     - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>>       parameters:
>>>           - seeds: "ec2-23-21-11-193.compute-1.amazonaws.com, ec2-107-22-114-19.compute-1.amazonaws.com"
>>> listen_address: 10.145.232.190
>>> broadcast_address: ec2-23-21-11-193.compute-1.amazonaws.com
>>> rpc_address: 0.0.0.0
>>> endpoint_snitch: Ec2MultiRegionSnitch
>>>
>>> On other node, it is similar, but of course the listen and broadcast
>>> address are different. Now, when I start Cassandra, I see in the logs
>>>
>>> INFO 19:35:32,348 JOINING: waiting for ring information
>>>
>>> And then after 30 seconds, it says a bunch of things like this:
>>>
>>> JOINING: schema complete, ready to bootstrap
>>> JOINING: getting bootstrap token
>>> Enqueuing flush of Memtable...
>>> JOINING: sleeping 30000 ms for pending range setup
>>> JOINING: Starting to bootstrap...
>>> Bootstrap completed! for the tokens [....]
>>>
>>> Finally, after some more memtable flushing,
>>>
>>> INFO 19:36:32,710 Node /107.22.114.19 state jump to normal
>>> INFO 19:36:32,722 Startup completed! Now serving reads.
>>>
>>> Now, I start the other node, and I see basically the same thing in the
>>> logs.
>>>
>>> Running nodetool status, I see what looks like two single-node clusters!
>>>
>>> [root@ip-10-147-171-160 ~]# nodetool status
>>> Datacenter: us-east
>>> ===================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address        Load   Tokens  Owns    Host ID                               Rack
>>> UN  107.22.114.19  21 KB  256     100.0%  f7a24bd2-8cb9-499d-806c-d9e548f34b8d  1a
>>>
>>> [root@ip-10-145-232-190 ~]# nodetool status
>>> Datacenter: us-east
>>> ===================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address        Load   Tokens  Owns    Host ID                               Rack
>>> UN  23.21.11.193   21 KB  256     100.0%  9d70f022-03cf-488a-807d-22e991761483  1a
>>>
>>> It looks to me like nodes didn't communicate with each other like I
>>> thought they would, and timed out waiting for gossip to tell them which
>>> nodes are in the ring (I'm new to Cassandra, but this is my guess...
>>> certainly 30-second timeouts look suspicious). I checked with telnet, and
>>> from each node I can connect to port 7000 on the other node (both on
>>> internal and public IP). I feel like I have made a beginner's mistake.
>>> Anyone has a suggestion where to look next?
>>>
>>> - Boris
>>>
>>
>>
>
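
P.S. For anyone following the thread, here is a small Python sketch of the netstat check from the top message. It lists outbound connections to the storage port (7000) that are still in SYN_SENT, which is the pattern in the node 1 output: SYN_SENT towards the public addresses, ESTABLISHED only on the 10.x addresses. The function name and the SAMPLE text are made up for illustration and are not part of Cassandra or nodetool; the sample lines are copied from the netstat output in the top message. A connection that stays in SYN_SENT usually means the SYN got no reply, for example because a firewall or security group rule dropped it, but that is a guess about this cluster, not something the output proves.

```python
STORAGE_PORT = 7000  # storage (inter-node) port seen in the netstat output


def stuck_storage_connections(netstat_text, port=STORAGE_PORT):
    """Return (local, remote) pairs for TCP connections to `port` in SYN_SENT.

    Hypothetical helper for this thread; expects `netstat -antp` style rows:
    proto recv-q send-q local-address foreign-address state [pid/program]
    """
    stuck = []
    suffix = ":%d" % port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "SYN_SENT" and remote.endswith(suffix):
            stuck.append((local, remote))
    return stuck


# Rows copied from the node 1 output in this thread.
SAMPLE = """\
tcp 0 0 10.113.19.24:7000 0.0.0.0:* LISTEN 1910/java
tcp 0 1 10.113.19.24:38948 54.234.147.60:7000 SYN_SENT 1910/java
tcp 0 0 10.113.19.24:7000 10.114.205.157:47713 ESTABLISHED 1910/java
tcp 0 1 10.113.19.24:45597 23.22.204.201:7000 SYN_SENT 1910/java
"""

if __name__ == "__main__":
    for local, remote in stuck_storage_connections(SAMPLE):
        print("no reply yet: %s -> %s" % (local, remote))
```

To use it on a live node, the output of `netstat -antp` can be saved to a file and its contents passed to the function in place of SAMPLE.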