Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Venkat Rama
Hi,

We have a multi-DC Cassandra ring with two DCs.  We use LOCAL_QUORUM for
writes and reads.  The network between the DCs is sometimes flaky, with
outages lasting from a few minutes to a few tens of minutes.

I wanted to know what is the best way to measure/monitor either the lag or
replication latency between the data centers.  Are there any metrics I can
monitor to find the backlog of data that needs to be transferred?

Thanks in advance.

VR


Re: Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Venkat Rama
Thanks for the quick reply, Mohit.  Can we measure/monitor the size of the
hinted handoffs?  Would that be a good enough indicator of my backlog?

Although we know when the network is flaky, we are interested in knowing how
much data is piling up in the local DC that needs to be transferred.

Greatly appreciate your help.

VR


On Wed, Sep 5, 2012 at 8:33 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 As far as I know, Cassandra doesn't use an internal queueing mechanism
 specific to replication. Cassandra sends the write to the remote DC, and
 after that it's up to the TCP/IP stack to deal with buffering. If requests
 start to time out, Cassandra will use hinted handoff (HH) for up to a
 certain time. For a longer outage you would have to run repair.

 Also look at the TCP/IP tuning parameters that may help in your scenario:

 http://kaivanov.blogspot.com/2010/09/linux-tcp-tuning.html

 Run iperf and test the latency.
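
 For a quick latency check alongside iperf, a minimal Python sketch along
 the following lines could time TCP connects to a node in the remote DC. It
 is not from the thread: the function names are made up for illustration,
 and it only measures connection setup time, not replication lag or
 bandwidth.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Time TCP connects to host:port; None marks a failed attempt."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        try:
            # create_connection performs the full TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results.append(None)
    return results


def summarize(results):
    """Reduce a list of samples to attempt/failure counts and basic stats."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }
```

 Usage would look like summarize(tcp_connect_latency_ms("10.0.1.5", 7000)),
 where the address is a placeholder for a remote-DC node and 7000 is
 Cassandra's default storage_port. Run periodically, the failure count gives
 a rough picture of when the inter-DC link is flaky.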






Secondary index read/write explanation

2012-09-05 Thread Venkat Rama
Hi All,

I am a newbie to Cassandra and am trying to understand how secondary indexes
work.  I have been going over the discussion at
https://issues.apache.org/jira/browse/CASSANDRA-749 about local secondary
indexes, and an interesting question at
http://www.mail-archive.com/user@cassandra.apache.org/msg16966.html.  The
discussion seems to assume that the most common use cases are ones with range
queries.  Is this right?

I am trying to understand the low-cardinality reasoning and how a read
gets executed.  I have the following questions; I hope I can explain them
well :)

1.  When a write request is received, the data is written to the base CF and
the secondary index entry to a secondary (hidden) CF. If this is right, will
the index entry be written locally on the same node, or will it be placed on
nodes according to the partitioner (RP/OPP)?
2.  When a coordinator receives a read request with, say, the predicate x=y,
where column x has a secondary index, how does the coordinator decide which
node(s) to query? How does it avoid sending the request to all nodes if the
data is indexed locally?

If there is any article/blog that can help understand this better, please
let me know.

Thanks again in advance.

VR
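
The "locally indexed" case in the two questions above can be pictured with a
toy model. This is a sketch under simplifying assumptions, not Cassandra
code: the Node and Cluster classes are invented for illustration, there is
no replication, and an MD5-based owner() stands in for the partitioner. Each
node indexes only the rows it stores, so a query on the indexed column has
to be merged from several nodes.

```python
import hashlib
from collections import defaultdict


class Node:
    """One node: a slice of the rows plus an index over only those rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                  # row key -> {column: value}
        self.index = defaultdict(set)   # (column, value) -> local row keys

    def write(self, key, columns, indexed_column):
        old = self.rows.get(key)
        if old is not None and indexed_column in old:
            # Drop the stale index entry before overwriting the row.
            self.index[(indexed_column, old[indexed_column])].discard(key)
        self.rows[key] = dict(columns)
        if indexed_column in columns:
            self.index[(indexed_column, columns[indexed_column])].add(key)

    def lookup(self, column, value):
        return sorted(self.index.get((column, value), ()))


class Cluster:
    def __init__(self, names, indexed_column):
        self.nodes = [Node(n) for n in names]
        self.indexed_column = indexed_column

    def owner(self, key):
        # Stand-in for the partitioner: hash the row key to pick a node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def write(self, key, columns):
        # The index entry lands on the same node as the base row.
        self.owner(key).write(key, columns, self.indexed_column)

    def query(self, column, value):
        # The index is per node, so this toy coordinator asks every node.
        hits = []
        for node in self.nodes:
            hits.extend(node.lookup(column, value))
        return sorted(hits)
```

In this model the placement of an index entry follows the row key, not the
indexed value, which is why a low-cardinality value can have matches on
every node. How the real coordinator limits its fan-out is not covered by
this sketch.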


Re: Monitoring replication lag/latency in multi DC setup

2012-09-05 Thread Venkat Rama
Is there a specific metric you can recommend?

VR

On Wed, Sep 5, 2012 at 9:19 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 Cassandra exposes a lot of metrics over JMX, which you can browse with
 JConsole. You might be able to get some information from there.
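
 One candidate signal, offered here as a sketch and not as a recommendation
 made in the thread, is the Pending count of the HintedHandoff pool in the
 output of nodetool tpstats. The parser below assumes the usual column order
 (pool name, Active, Pending, ...), and the SAMPLE text is illustrative, not
 captured from a real cluster. The count is a number of queued tasks, not
 bytes, so at best it hints at whether a backlog exists.

```python
def pending_tasks(tpstats_text, pool="HintedHandoff"):
    """Return the Pending count for one pool from tpstats text, or None."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        # Pool rows are assumed to read: name, active, pending, completed...
        if len(fields) >= 3 and fields[0] == pool and fields[2].isdigit():
            return int(fields[2])
    return None


# Illustrative input only; real output has more pools and columns.
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ReadStage                         0         0          12345
MutationStage                     0         3          67890
HintedHandoff                     1         7             42
"""

if __name__ == "__main__":
    print(pending_tasks(SAMPLE))
```

 In practice the text would come from running nodetool tpstats against each
 node in the local DC and recording the value over time.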









Re: Adding a new node

2011-05-09 Thread Venkat Rama
Thanks for the pointer.  I restarted the entire cluster and started all nodes
at the same time. However, I still see the issue: the view is not consistent.
I am running 0.7.5.
In general, if a node with a bad ring view starts first, then I guess the
restart doesn't help either, as that node might propagate its view.  Is this
assumption correct?
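
To see which nodes disagree, the ring views can be compared mechanically. The
sketch below is an illustration only: ring_view_differences is a made-up
helper, the addresses are placeholders, and it assumes the set of addresses
each node lists in its nodetool ring output has already been collected
(parsing is left out because the output format varies by version).

```python
from collections import Counter


def ring_view_differences(views):
    """Compare each node's ring view against the most common view.

    views maps an observer node to the addresses it lists in its ring.
    Returns only the observers that differ, with what they miss or add.
    On a tie, the view seen first is treated as the reference.
    """
    if not views:
        return {}
    counts = Counter(frozenset(seen) for seen in views.values())
    majority, _ = counts.most_common(1)[0]
    report = {}
    for observer, seen in views.items():
        seen = set(seen)
        if seen != majority:
            report[observer] = {
                "missing": sorted(majority - seen),
                "extra": sorted(seen - majority),
            }
    return report


if __name__ == "__main__":
    # Placeholder addresses: 10.0.0.3 is the dead node, 10.0.0.9 replaces it.
    views = {
        "10.0.0.1": {"10.0.0.1", "10.0.0.2", "10.0.0.9"},
        "10.0.0.2": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
        "10.0.0.9": {"10.0.0.1", "10.0.0.2", "10.0.0.9"},
    }
    print(ring_view_differences(views))
```

The majority view is only a convenient reference point; it is not
necessarily the correct one.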



On Sun, May 8, 2011 at 9:02 PM, aaron morton aa...@thelastpickle.com wrote:

 It is possible to change the IP address of a node; for background, see
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/change-node-IP-address-td6197607.html


 If you have already brought a new node back with a different IP and the
 nodes in the cluster have different views of the ring (nodetool ring), you
 should see

 http://www.datastax.com/docs/0.7/troubleshooting/index#view-of-ring-differs-between-some-nodes


 What version are you on and what does nodetool ring say?

 Hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com







Adding a new node

2011-05-08 Thread Venkat Rama
Hi,

I am trying to bring up a new node (with a different IP) to replace a dead
node on Cassandra 0.7.5.  Rather than bootstrap, I am copying the SSTable
files (from backup) to the new node, as my data runs into several GB.
Although the node successfully joins the ring, some of the ring nodes still
seem to point to the old dead node, as seen from the ring command.  Is there a
way to notify all nodes about the new node?  I am looking for options that can
bring the cluster back to its original state in a faster and more reliable
manner, since I do have all the SSTable files.
One option I looked at was to remove all system tables and restart the entire
cluster.  But I lose the schemas with this approach.

Thanks in advance for your reply.

VR