Hi all,

I am having some problems with two of a client's Cassandra 3.0.8 clusters that I want to share with you. These clusters are used for QA and DEV.
Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) sharing the same physical machine and one SSD. I know this is not the best environment, but it is only for testing purposes. The entire cluster runs very slowly, and sometimes inserts fail, which causes hints to be saved and replayed, and we see some data inconsistency with 2i (secondary index) queries.

Even so, it is very weird to me that exactly the same test case works like a charm in 3 Docker containers on my laptop (i7, 16G, SSD) but causes a lot of problems in their cluster.

*listen_address* and *rpc_address* are set to the external domain name (i.e. NODE_NAME.clientdomain.com).

I have activated TRACE logs and I get some strange messages. So, my questions:

*1.- Is it possible that one node (with ) sends a message to itself triggering READ_REPAIR?*

TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558 MessagingService.java:750 - Message-to-self TYPE:MUTATION VERB:READ_REPAIR going over MessagingService
TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513 MessagingService.java:747 -01a.clientdomain.com/10.63.24.238 sending READ_REPAIR to 3426@/10.63.24.238

*Do these log lines show one node asking itself for a portion of data that it does not have?*

*2.-* I have another suspicious log line, about slow VMs:

WARN [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287 - Not marking nodes down due to local pause of 11195193520 > 5000000000

*Does this line say that there was a JVM pause of 11 secs?* There are no garbage collector log lines. *Is it possible that this 11 sec pause is caused by a DNS lookup of the domain?*

*3.-* I know that listen_address should be the external IP (inter-node communication will be faster, with no need for a DNS lookup). *If I set listen_address to the external IP, is it necessary for that IP to be pingable from all the other datacenter nodes?* *Does inter-data-center communication use 'rpc_address' or 'listen_address'?*

Thank you in advance

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd <https://twitter.com/StratioBD>*
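
P.S. For question 2, here is a small Python sketch of the arithmetic behind the FailureDetector warning. It assumes both numbers in the warning are nanoseconds, which the log line itself does not state; nanos_to_seconds is only a hypothetical helper name used for illustration.

```python
# Assumption: both values in the FailureDetector warning are nanoseconds.
NANOS_PER_SECOND = 1_000_000_000


def nanos_to_seconds(nanos: int) -> float:
    """Convert a nanosecond count to seconds (hypothetical helper)."""
    return nanos / NANOS_PER_SECOND


# Values copied from the WARN line.
observed_pause = nanos_to_seconds(11195193520)
threshold = nanos_to_seconds(5000000000)

print(f"pause={observed_pause:.3f}s threshold={threshold:.3f}s")
```

Under that assumption, the reported local pause is a little over 11 seconds against a 5 second threshold.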