Re: Replication delays due to issues with inter node communication across multiple data centers, hints are piling up

2017-10-22 Thread Jeff Jirsa
What consistency level do you use on writes?
Did this just start, or has it always happened?
Are you seeing GC pauses at all?

What’s your 99th-percentile write latency?

-- 
Jeff Jirsa


> On Oct 22, 2017, at 9:21 PM, "vbhang...@gmail.com" wrote:
> 
> This is for Cassandra 2.1.13. At times there are replication delays across
> multiple regions. Data is available (when queried from the command line) in
> one region but not visible in the other region(s). This does not happen
> consistently. The cluster spans multiple data centers, with more than 30
> nodes in total. The keyspace is configured to replicate to all the data
> centers.
>
> Hints are piling up in the source region. This happens especially with
> large payloads (blobs of approx. 1 KB to a few MB). Network-level congestion
> or saturation does not seem to be an issue. There is no memory/CPU pressure
> on individual nodes.
>
> I am sharing cassandra.yaml below; any pointers on what can be tuned are
> highly appreciated. Let me know if you need any other info.
>
> We tried bumping up hinted_handoff_throttle_in_kb to 30720 and
> max_hints_delivery_threads to 12 on one of the nodes to see if it speeds up
> hint delivery. There was some improvement, but not a whole lot.
> 
> Thanks
> 

Replication delays due to issues with inter node communication across multiple data centers, hints are piling up

2017-10-22 Thread vbhang...@gmail.com
This is for Cassandra 2.1.13. At times there are replication delays across
multiple regions. Data is available (when queried from the command line) in one
region but not visible in the other region(s). This does not happen
consistently. The cluster spans multiple data centers, with more than 30 nodes
in total. The keyspace is configured to replicate to all the data centers.

Hints are piling up in the source region. This happens especially with
large payloads (blobs of approx. 1 KB to a few MB). Network-level congestion or
saturation does not seem to be an issue. There is no memory/CPU pressure on
individual nodes.

I am sharing cassandra.yaml below; any pointers on what can be tuned are highly
appreciated. Let me know if you need any other info.

We tried bumping up hinted_handoff_throttle_in_kb to 30720 and
max_hints_delivery_threads to 12 on one of the nodes to see if it speeds up
hint delivery. There was some improvement, but not a whole lot.
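
For context on those two settings: the comment above hinted_handoff_throttle_in_kb in the config says the throttle is per delivery thread and is reduced in proportion to cluster size (two nodes: full rate; three nodes: half). The sketch below works through that arithmetic. It assumes the divisor is simply (nodes - 1), which fits both examples in the comment, that rounding is integer division, and that the cluster has exactly 30 nodes (the post only says "> 30"). The function names are hypothetical and are not part of Cassandra.

```python
def per_thread_throttle_kb(configured_kb: int, node_count: int) -> int:
    """Per-thread hint delivery rate in KB/s after scaling by cluster size.

    Assumption: the divisor is (node_count - 1), which matches the two
    examples in the cassandra.yaml comment (2 nodes -> full rate,
    3 nodes -> half). Integer division is a guess at the rounding.
    """
    if node_count < 2:
        return configured_kb  # no peers to share with, so nothing to scale
    return configured_kb // (node_count - 1)


def aggregate_ceiling_kb(configured_kb: int, node_count: int, threads: int) -> int:
    """Upper bound for one node if every delivery thread runs at its cap."""
    return per_thread_throttle_kb(configured_kb, node_count) * threads


if __name__ == "__main__":
    NODES = 30  # hypothetical; the post only says "> 30 nodes"
    settings = [("posted config", 1024, 6), ("tuned node", 30720, 12)]
    for label, kb, threads in settings:
        per_thread = per_thread_throttle_kb(kb, NODES)
        total = aggregate_ceiling_kb(kb, NODES, threads)
        print(f"{label}: {per_thread} KB/s per thread, at most {total} KB/s per node")
```

Under these assumptions, the posted value of 1024 comes to about 35 KB/s per delivery thread on a 30-node cluster, which would drain hints carrying multi-MB blobs slowly. The tuned value of 30720 comes to roughly 1 MB/s per thread.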

Thanks

=
# Cassandra storage config YAML

# NOTE:
#   See http://wiki.apache.org/cassandra/StorageConfiguration for
#   full explanations of configuration directives
# /NOTE

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: "central"

# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
# that this node will store. You probably want all nodes to have the same number
# of tokens assuming they have equal hardware capability.
#
# If you leave this unspecified, Cassandra will use the default of 1 token for
# legacy compatibility, and will use the initial_token as described below.
#
# Specifying initial_token will override this setting on the node's initial
# start, on subsequent starts, this setting will apply even if initial token
# is set.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
#num_tokens: 256

# initial_token allows you to specify tokens manually.  While you can use it
# with vnodes (num_tokens > 1, above) -- in which case you should provide a
# comma-separated list -- it's primarily used when adding nodes to legacy
# clusters that do not have vnodes enabled.
# initial_token:

initial_token: 

# See http://wiki.apache.org/cassandra/HintedHandoff
# May either be "true" or "false" to enable globally, or contain a list
# of data centers to enable per-datacenter.
# hinted_handoff_enabled: DC1,DC2
hinted_handoff_enabled: true
# this defines the maximum amount of time a dead host will have hints
# generated.  After it has been dead this long, new hints for it will not be
# created until it has been seen alive and gone down again.
max_hint_window_in_ms: 1080 # 3 hours
# Maximum throttle in KBs per second, per delivery thread.  This will be
# reduced proportionally to the number of nodes in the cluster.  (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024
# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 6

# Maximum throttle in KBs per second, total. This will be
# reduced proportionally to the number of nodes in the cluster.
batchlog_replay_throttle_in_kb: 1024

# Authentication backend, implementing IAuthenticator; used to identify users
# Out of the box, Cassandra provides
# org.apache.cassandra.auth.{AllowAllAuthenticator, PasswordAuthenticator}.
#
# - AllowAllAuthenticator performs no checks - set it to disable authentication.
# - PasswordAuthenticator relies on username/password pairs to authenticate
#   users. It keeps usernames and hashed passwords in system_auth.credentials
#   table. Please increase system_auth keyspace replication factor if you use
#   this authenticator.
authenticator: AllowAllAuthenticator

# Authorization backend, implementing IAuthorizer; used to limit access/provide
# permissions
# Out of the box, Cassandra provides
# org.apache.cassandra.auth.{AllowAllAuthorizer, CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable
#   authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table.
#   Please increase system_auth keyspace replication factor if you use this
#   authorizer.
authorizer: AllowAllAuthorizer

# Validity period for permissions cache (fetching permissions can be an
# expensive operation depending on the authorizer, CassandraAuthorizer is
# one example). Defaults to 2000, set to 0 to disable.
# Will be disabled automatically for AllowAllAuthorizer.