Very slow cluster

Eduardo Alonso Fri, 28 Apr 2017 04:26:48 -0700

Hi to all:

I am having some problems with two of a client's cassandra:3.0.8 clusters
that I want to share with you. These clusters are for QA and DEV.


Cluster 1 (1 DC) is composed of 3 VMs (heap=4G, RAM=8G) sharing the same
physical machine and a single SSD. I know this is not the best
environment, but it is only for testing purposes.

The entire cluster runs very slowly and inserts sometimes fail, which
causes hints to be saved and replayed and leads to some data inconsistency
with 2i queries.

I know it is not the best environment (virtual machines sharing a physical
machine and one physical disk), but it is very weird to me that the exact
same test case works like a charm in 3 Docker containers on my
laptop (i7, 16G, SSD) but causes a lot of problems in their cluster.

*listen_address* and *rpc_address* are set to the external domain name (i.e.
NODE_NAME.clientdomain.com). I have activated TRACE logs and get some
strange messages.

So, my questions:

*1.- Is it possible that one node (with ) sends a message to itself,
triggering READ_REPAIR?*

TRACE [SharedPool-Worker-1] 2017-04-24 08:58:28,558
MessagingService.java:750 - Message-to-self TYPE:MUTATION
VERB:READ_REPAIR going
over MessagingService

TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513
MessagingService.java:747 -01a.clientdomain.com/10.63.24.238 sending
READ_REPAIR to 3426@/10.63.24.238

*Does this log line show one node asking itself for a portion of data that
it does not have?*
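One way to check many such lines at once is to compare the sender and destination addresses mechanically. The following is a minimal sketch, assuming the line format matches the single "sending READ_REPAIR to" sample quoted above; the regular expression and the helper name `is_message_to_self` are illustrative, not part of Cassandra.

```python
import re

# Pattern inferred from one sample line only (an assumption):
#   ".../10.63.24.238 sending READ_REPAIR to 3426@/10.63.24.238"
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
SEND_RE = re.compile(
    r"/(?P<src>" + IPV4 + r")\s+sending\s+(?P<verb>\w+)\s+to\s+"
    r"(?P<msg_id>\d+)@/(?P<dst>" + IPV4 + r")"
)

def is_message_to_self(line):
    """Return True or False if the line parses, None if it does not match."""
    # The mail client wraps long log lines, so collapse whitespace first.
    flat = " ".join(line.split())
    m = SEND_RE.search(flat)
    if m is None:
        return None
    return m.group("src") == m.group("dst")

sample = """TRACE [SharedPool-Worker-1] 2017-04-16 04:38:47,513
MessagingService.java:747 -01a.clientdomain.com/10.63.24.238 sending
READ_REPAIR to 3426@/10.63.24.238"""
print(is_message_to_self(sample))  # prints True: both addresses are 10.63.24.238
```

Running this over the whole TRACE log would show how often a node targets its own address, which helps tell a one-off from a systematic pattern.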

*2.-* I have another suspicious log line about slow VMs:

-WARN  [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287 -
Not marking nodes down due to local pause of 11195193520 > 5000000000

*Does this line say that there was a JVM pause of 11 seconds?* There are no
garbage collector log lines. *Is it possible that this 11-second pause is
caused by a DNS lookup of the domain?*
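To make the numbers in that warning concrete, here is a small sketch that extracts both values and converts them to seconds. It assumes both figures are nanoseconds, which their magnitude suggests (5000000000 ns is exactly 5 s); the regular expression and the function name `parse_local_pause` are illustrative.

```python
import re

# Pattern based on the single WARN line quoted above (an assumption).
PAUSE_RE = re.compile(r"local pause of (\d+) > (\d+)")

def parse_local_pause(line):
    """Return (pause_seconds, threshold_seconds), or None if no match."""
    flat = " ".join(line.split())  # undo mail-client line wrapping
    m = PAUSE_RE.search(flat)
    if m is None:
        return None
    pause_ns, threshold_ns = (int(g) for g in m.groups())
    return pause_ns / 1e9, threshold_ns / 1e9

line = """WARN  [GossipTasks:1] 2017-04-14 00:32:44,371 FailureDetector.java:287 -
Not marking nodes down due to local pause of 11195193520 > 5000000000"""
pause_s, threshold_s = parse_local_pause(line)
print(f"pause={pause_s:.3f}s threshold={threshold_s:.1f}s")
# prints: pause=11.195s threshold=5.0s
```

Under that assumption the reported pause is about 11.2 seconds against a 5 second threshold.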


*3.-* I know that listen_address should be the external IP (inter-node
communication will be faster, with no need for DNS lookups).
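Whether DNS is actually slow on those VMs can be measured directly. The sketch below times a few name resolutions through the operating system resolver. It runs against "localhost" so that it works anywhere; on a node, the hostname would be the NODE_NAME.clientdomain.com placeholder from above. Note that this measures the OS resolver as seen from Python, not the JVM, which keeps its own DNS cache, so it is only a rough indicator. The function name `time_dns_lookup` is illustrative.

```python
import socket
import time

def time_dns_lookup(hostname, attempts=3):
    """Resolve hostname several times; return a list of (seconds, ok) tuples."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
            ok = True
        except socket.gaierror:
            # Resolution failed; the elapsed time is still informative.
            ok = False
        results.append((time.perf_counter() - start, ok))
    return results

# "localhost" keeps the example runnable; replace it with a node's hostname.
for seconds, ok in time_dns_lookup("localhost"):
    print(f"{seconds * 1000:.2f} ms ok={ok}")
```

Lookups that take seconds rather than milliseconds would make DNS a plausible suspect; consistently fast lookups would point elsewhere.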

*If I set listen_address to the external IP, is it necessary that the IP be
pingable from all the other datacenter nodes?*
*Does inter-datacenter communication use 'rpc_address' or
'listen_address'?*
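For the reachability part, what can be tested from each node is whether a TCP connection to another node's address succeeds, which is a different check from an ICMP ping. Below is a minimal sketch; the helper name `can_connect` is illustrative, and port 7000 in the comment is the default storage_port in cassandra.yaml, which may have been changed in this cluster. The demo uses a throwaway local listener so that it runs without access to the cluster.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False

# On a real node this would be something like can_connect("10.63.24.238", 7000),
# assuming the default storage_port. The demo below stays on the local machine.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
print(can_connect("127.0.0.1", port))
server.close()
```

Running the check from every node against every other node's listen_address gives a quick connectivity matrix.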

Thank you in advance

Eduardo Alonso
Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*
