Re: Checking replication status

2016-03-01 Thread Bryan Cheng
Hi Jimmy,

For more insight into the hint system, these two blog posts are great
resources: http://www.datastax.com/dev/blog/modern-hinted-handoff and
http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery.

For timeframes, that's going to differ based on your read/write patterns
and load. Although I haven't tried this before, I believe you can query
the system.hints table to see the status of hints queued by the local
machine.
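For example, a quick sketch of pulling that count out of cqlsh output. The sample output below is made up so the snippet stands alone, and note that in 3.0+ hints move to flat files and this table goes away:

```shell
# Count hints queued on this node. In practice you'd run:
#   cqlsh -e "SELECT count(*) FROM system.hints;"
# and filter its output; the sample here is hypothetical.
sample=' count
-------
     5

(1 rows)'
# The count sits on the third line of cqlsh's output.
count=$(printf '%s\n' "$sample" | awk 'NR==3 {print $1}')
echo "hints queued on this node: $count"
```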

--local and --dc are similar in that both restrict repair to a single
datacenter: --local targets the datacenter the node itself is in, while
--dc takes a datacenter by name, so for the local DC they differ only in
syntax. If you sustain a loss of inter-DC connectivity for longer than
max_hint_window_in_ms, you'll want to run a cross-DC repair, which is just
the standard full repair (with neither option specified).
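As a cheat sheet (a sketch only: flag spellings vary slightly across nodetool versions, and the keyspace name is just an example):

```shell
# Pick the repair invocation per scenario; this echoes the command
# rather than running it, so it stands alone without a cluster.
repair_cmd() {
  case "$1" in
    local-only) echo "nodetool repair -local mykeyspace" ;;  # this DC only
    named-dc)   echo "nodetool repair -dc DC1 mykeyspace" ;; # one DC by name
    cross-dc)   echo "nodetool repair mykeyspace" ;;         # full, all DCs
  esac
}
repair_cmd cross-dc   # prints: nodetool repair mykeyspace
```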


Re: Checking replication status

2016-02-29 Thread Jimmy Lin
Hi Bryan,
I guess I want to find out if there is any way to tell when data will
become consistent again in both cases.

If a node is down for less than max_hint_window (say 2 hours out of a
3 hr max), is there any way to check the logs, JMX, etc. to see whether
the hint queue size is back to zero or at least in a low range?
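e.g. I was imagining something like this, parsing the HintedHandoff line out of `nodetool tpstats` (just a sketch; I'm not sure the pool name and column order match our version, and the sample output below is made up):

```shell
# Watch the HintedHandoff pending count. Here we parse captured sample
# output; in practice you'd pipe real `nodetool tpstats` through the
# same awk.
sample='Pool Name                Active   Pending   Completed
HintedHandoff                 0         0          42
MutationStage                 1         3      123456'
# Column 3 is Pending in this layout.
pending=$(printf '%s\n' "$sample" | awk '/^HintedHandoff/ {print $3}')
echo "HintedHandoff pending: $pending"
```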


If a node goes down for longer than max_hint_window (say 4 hrs, past our
3 hr max), we run a repair job. What is the correct nodetool repair
syntax to use?
In particular, what is the difference between -local and -dc? Both seem
to repair nodes within a single datacenter, but after a cross-DC network
outage we want to repair nodes across DCs, right?

thanks





Re: Checking replication status

2016-02-26 Thread Bryan Cheng
Hi Jimmy,

If you sustain a long downtime, repair is almost always the way to go.

It seems like you're asking to what extent a cluster is able to
recover/resync a downed peer.

A peer will not attempt to reacquire all the data it has missed while being
down. Recovery happens in a few ways:

1) Hints: Assuming that there are enough peers to satisfy your quorum
requirements on write, the live peers will queue up these operations for up
to max_hint_window_in_ms (from cassandra.yaml). These hints will be
delivered once the peer recovers.
2) Read repair: There is a probability that read repair will happen,
meaning that a query will trigger data consistency checks and updates _on
the query being performed_.
3) Repair.
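(The relevant cassandra.yaml knobs look roughly like this; a sketch, since the 10800000 ms value is the usual 3-hour default but you should check your version:)

```yaml
# Hints are kept for at most this long for a down peer (3 h, in ms).
max_hint_window_in_ms: 10800000
hinted_handoff_enabled: true
```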

If a machine goes down for longer than max_hint_window_in_ms, AFAIK you
_will_ have missing data. If you cannot tolerate this situation, you need
to take a look at your tunable consistency and/or trigger a repair.
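(To make the quorum arithmetic concrete, a sketch with made-up RF and live-replica counts:)

```shell
# QUORUM needs floor(RF/2)+1 live replicas; check whether writes can
# still succeed with some replicas down. Pure arithmetic, no cluster.
quorum() { echo $(( $1 / 2 + 1 )); }
rf=3; live=2
needed=$(quorum "$rf")
if [ "$live" -ge "$needed" ]; then
  echo "QUORUM writes succeed ($live live >= $needed needed)"
else
  echo "QUORUM writes fail ($live live < $needed needed)"
fi
```

So with RF=3 a single down node is tolerable at QUORUM, but two down nodes are not.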



Re: Checking replication status

2016-02-25 Thread Jimmy Lin
So far they are not long, just some config changes and restarts.
If it were a 2 hr downtime for whatever reason, would a repair be a
better option than trying to figure out whether replication sync has
finished or not?



Re: Checking replication status

2016-02-25 Thread daemeon reiydelle
Hmm. What are your processes when a node comes back after "a long offline"?
Long enough to take the node offline and do a repair? Run the risk of
serving stale data? Parallel repairs? ???

So, what sort of time frames are "a long time"?


Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872



Checking replication status

2016-02-25 Thread Jimmy Lin
Hi all,
What are the better ways to check the overall replication status of a
Cassandra cluster?

Within a single DC, unless a node is down for a long time, most of the
time I feel it is pretty much a non-issue and things are replicated
pretty fast. But when a node comes back from a long offline period, is
there a way to check that the node has finished its data sync with the
other nodes?

Now across DCs: we have frequent VPN outages (sometimes short, sometimes
long) between DCs. I would also like to know if there is a way to find
out how replication between DCs is catching up under this condition.

Also, if I understand correctly, the only guaranteed way to make sure
data are synced is to run a complete repair job; is that correct? I am
trying to see if there is a way to "force a quick replication sync"
between DCs after a VPN outage. Or maybe this is unnecessary, as
Cassandra will catch up as fast as it can, and there is nothing else we
(system admins) can do to make it faster or better?


Sent from my iPhone