Re: Node Failure Scenario

2017-11-15 Thread Anshu Vajpayee
Thank you Jonathan and all.

On Tue, Nov 14, 2017 at 10:53 PM, Jonathan Haddad  wrote:

> Anthony’s suggestion of using replace_address_first_boot lets you avoid that
> requirement, and it’s specifically why it was added in 2.2.
> On Tue, Nov 14, 2017 at 1:02 AM Anshu Vajpayee 
> wrote:
>
>> Thanks guys,
>>
>> I think it is better to pass replace_address on the command line rather than
>> update the cassandra-env file, so that there is no requirement to remove it
>> later.
>>
>> On Tue, Nov 14, 2017 at 6:32 AM, Anthony Grasso wrote:
>>
>>> Hi Anshu,
>>>
>>> To add to Erick's comment, remember to remove the *replace_address* method
>>> from the *cassandra-env.sh* file once the node has rejoined
>>> successfully. The node will fail the next restart otherwise.
>>>
>>> Alternatively, use the *replace_address_first_boot* method, which works
>>> exactly the same way as *replace_address*; the only difference is that there
>>> is no need to remove it from the *cassandra-env.sh* file.
>>>
>>> Kind regards,
>>> Anthony
>>>
>>> On 13 November 2017 at 14:59, Erick Ramirez 
>>> wrote:
>>>
 Use the replace_address method with its own IP address. Make sure you
 delete the contents of the following directories:
 - data/
 - commitlog/
 - saved_caches/

 Forget rejoining with repair -- it will just cause more problems.
 Cheers!

 On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee <
 anshu.vajpa...@gmail.com> wrote:

> Hi All ,
>
> There was a node failure in one of our production clusters due to a disk
> failure. After h/w recovery that node is now ready to be part of the cluster,
> but it doesn't have any data due to the disk crash.
>
> I can think of the following options:
>
> 1. Replace the node with itself, using replace_address.
>
> 2. Set bootstrap=false, start the node and run repair to stream the data.
>
> Please suggest if both options are good and which is best as per your
> experience. This is a live production cluster.
>
>
> Thanks,
>
>
> --
> Cheers,
> Anshu V
>
>
>

>>>
>>
>>
>> --
>> Cheers,
>> Anshu V
>>
>>
>>


-- 
Cheers,
Anshu V


Re: Node Failure Scenario

2017-11-14 Thread Jonathan Haddad
Anthony’s suggestion of using replace_address_first_boot lets you avoid that
requirement, and it’s specifically why it was added in 2.2.
On Tue, Nov 14, 2017 at 1:02 AM Anshu Vajpayee 
wrote:

> Thanks guys,
>
> I think it is better to pass replace_address on the command line rather than
> update the cassandra-env file, so that there is no requirement to remove it
> later.
>
> On Tue, Nov 14, 2017 at 6:32 AM, Anthony Grasso 
> wrote:
>
>> Hi Anshu,
>>
>> To add to Erick's comment, remember to remove the *replace_address* method
>> from the *cassandra-env.sh* file once the node has rejoined
>> successfully. The node will fail the next restart otherwise.
>>
>> Alternatively, use the *replace_address_first_boot* method, which works
>> exactly the same way as *replace_address*; the only difference is that there
>> is no need to remove it from the *cassandra-env.sh* file.
>>
>> Kind regards,
>> Anthony
>>
>> On 13 November 2017 at 14:59, Erick Ramirez  wrote:
>>
>>> Use the replace_address method with its own IP address. Make sure you
>>> delete the contents of the following directories:
>>> - data/
>>> - commitlog/
>>> - saved_caches/
>>>
>>> Forget rejoining with repair -- it will just cause more problems. Cheers!
>>>
>>> On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee <
>>> anshu.vajpa...@gmail.com> wrote:
>>>
 Hi All ,

 There was a node failure in one of our production clusters due to a disk
 failure. After h/w recovery that node is now ready to be part of the cluster,
 but it doesn't have any data due to the disk crash.

 I can think of the following options:

 1. Replace the node with itself, using replace_address.

 2. Set bootstrap=false, start the node and run repair to stream the data.

 Please suggest if both options are good and which is best as per your
 experience. This is a live production cluster.


 Thanks,


 --
 Cheers,
 Anshu V



>>>
>>
>
>
> --
> Cheers,
> Anshu V
>
>
>


Re: Node Failure Scenario

2017-11-14 Thread Anshu Vajpayee
​Thanks  guys ,

I think it is better to pass replace_address on the command line rather than
update the cassandra-env file, so that there is no requirement to remove it
later.
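
For reference, a minimal sketch of what that looks like (the IP address and the
install path are placeholders, and this assumes a tarball-style install where the
launcher is invoked directly):

    # Sketch only -- 10.0.0.12 stands in for the dead node's old IP address.
    # Passing the property on the command line leaves nothing to clean up
    # in cassandra-env.sh after the node has rejoined.
    sudo -u cassandra /usr/local/cassandra/bin/cassandra -f -Dcassandra.replace_address=10.0.0.12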
​

On Tue, Nov 14, 2017 at 6:32 AM, Anthony Grasso 
wrote:

> Hi Anshu,
>
> To add to Erick's comment, remember to remove the *replace_address* method
> from the *cassandra-env.sh* file once the node has rejoined successfully.
> The node will fail the next restart otherwise.
>
> Alternatively, use the *replace_address_first_boot* method, which works
> exactly the same way as *replace_address*; the only difference is that there is
> no need to remove it from the *cassandra-env.sh* file.
>
> Kind regards,
> Anthony
>
> On 13 November 2017 at 14:59, Erick Ramirez  wrote:
>
>> Use the replace_address method with its own IP address. Make sure you
>> delete the contents of the following directories:
>> - data/
>> - commitlog/
>> - saved_caches/
>>
>> Forget rejoining with repair -- it will just cause more problems. Cheers!
>>
>> On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee wrote:
>>
>>> Hi All ,
>>>
>>> There was a node failure in one of our production clusters due to a disk
>>> failure. After h/w recovery that node is now ready to be part of the cluster,
>>> but it doesn't have any data due to the disk crash.
>>>
>>> I can think of the following options:
>>>
>>> 1. Replace the node with itself, using replace_address.
>>>
>>> 2. Set bootstrap=false, start the node and run repair to stream the data.
>>>
>>> Please suggest if both options are good and which is best as per your
>>> experience. This is a live production cluster.
>>>
>>>
>>> Thanks,
>>>
>>>
>>> --
>>> Cheers,
>>> Anshu V
>>>
>>>
>>>
>>
>


-- 
Cheers,
Anshu V


Re: Node Failure Scenario

2017-11-13 Thread Anthony Grasso
Hi Anshu,

To add to Erick's comment, remember to remove the *replace_address* method
from the *cassandra-env.sh* file once the node has rejoined successfully.
The node will fail the next restart otherwise.

Alternatively, use the *replace_address_first_boot* method, which works
exactly the same way as *replace_address*; the only difference is that there is
no need to remove it from the *cassandra-env.sh* file.
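
As a rough sketch, assuming a package install with *cassandra-env.sh* under
/etc/cassandra and 10.0.0.12 standing in for the node's own address:

    # replace_address_first_boot is only honoured on the node's first start
    # after the wipe, so the line can safely stay in the file afterwards.
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.12"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh
    sudo service cassandra start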

Kind regards,
Anthony

On 13 November 2017 at 14:59, Erick Ramirez  wrote:

> Use the replace_address method with its own IP address. Make sure you
> delete the contents of the following directories:
> - data/
> - commitlog/
> - saved_caches/
>
> Forget rejoining with repair -- it will just cause more problems. Cheers!
>
> On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee 
> wrote:
>
>> Hi All ,
>>
>> There was a node failure in one of our production clusters due to a disk
>> failure. After h/w recovery that node is now ready to be part of the cluster,
>> but it doesn't have any data due to the disk crash.
>>
>> I can think of the following options:
>>
>> 1. Replace the node with itself, using replace_address.
>>
>> 2. Set bootstrap=false, start the node and run repair to stream the data.
>>
>> Please suggest if both options are good and which is best as per your
>> experience. This is a live production cluster.
>>
>>
>> Thanks,
>>
>>
>> --
>> Cheers,
>> Anshu V
>>
>>
>>
>


Re: Node Failure Scenario

2017-11-12 Thread Erick Ramirez
Use the replace_address method with its own IP address. Make sure you
delete the contents of the following directories:
- data/
- commitlog/
- saved_caches/

Forget rejoining with repair -- it will just cause more problems. Cheers!
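
As a rough outline (the paths below are the common package-install defaults; check
data_file_directories, commitlog_directory and saved_caches_directory in
cassandra.yaml for the real locations, and substitute the node's own IP):

    sudo service cassandra stop
    sudo rm -rf /var/lib/cassandra/data/* \
                /var/lib/cassandra/commitlog/* \
                /var/lib/cassandra/saved_caches/*
    # Replace the node "with itself" by pointing replace_address at its own IP:
    sudo -u cassandra /usr/sbin/cassandra -Dcassandra.replace_address=10.0.0.12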

On Mon, Nov 13, 2017 at 2:54 PM, Anshu Vajpayee 
wrote:

> Hi All ,
>
> There was a node failure in one of our production clusters due to a disk
> failure. After h/w recovery that node is now ready to be part of the cluster,
> but it doesn't have any data due to the disk crash.
>
> I can think of the following options:
>
> 1. Replace the node with itself, using replace_address.
>
> 2. Set bootstrap=false, start the node and run repair to stream the data.
>
> Please suggest if both options are good and which is best as per your
> experience. This is a live production cluster.
>
>
> Thanks,
>
>
> --
> Cheers,
> Anshu V
>
>
>


Re: Node failure

2017-10-06 Thread Jon Haddad
I’ve had a few use cases for downgrading consistency over the years.  If you’re 
showing a customer dashboard w/ some Ad summary data, it’s great to be right, 
but showing a number that’s close is better than not being up.

> On Oct 6, 2017, at 1:32 PM, Jeff Jirsa  wrote:
> 
> I think it was Brandon that used to make a pretty compelling argument that 
> downgrading consistency on writes was always wrong, because if you can 
> tolerate the lower consistency, you should just use the lower consistency 
> from the start (because cassandra is still going to send the write to all 
> replicas, anyway). 
> 
> On Fri, Oct 6, 2017 at 12:51 PM, Jim Witschey wrote:
> > Modern client drivers also have ways to “downgrade” the CL of requests, in 
> > case they fail. E.g. for the Java driver: 
> > http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
> >  
> > 
> 
> Quick note from a driver dev's perspective: Mark, yours sounds like a
> bad use case for a downgrading retry policy. If your cluster has an RF
> of 2, and your app requires CL.QUORUM, a downgrading policy will, e.g.
> try at CL.QUORUM and downgrade below your required CL; or try at
> CL.ALL, then fail and downgrade to CL.QUORUM or an equivalent, which
> is what your app needs in the first place.
> 



Re: Node failure

2017-10-06 Thread Jeff Jirsa
I think it was Brandon that used to make a pretty compelling argument that
downgrading consistency on writes was always wrong, because if you can
tolerate the lower consistency, you should just use the lower consistency
from the start (because cassandra is still going to send the write to all
replicas, anyway).
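
For illustration, requesting the lower level up front is a one-liner; e.g. from
cqlsh, where CONSISTENCY is a cqlsh shell command and the keyspace/table below are
made up (client drivers expose the same thing per session or per statement):

    cqlsh -e "CONSISTENCY LOCAL_ONE;
              INSERT INTO my_ks.events (id, payload) VALUES (uuid(), 'example write');"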

On Fri, Oct 6, 2017 at 12:51 PM, Jim Witschey 
wrote:

> > Modern client drivers also have ways to “downgrade” the CL of requests,
> > in case they fail. E.g. for the Java driver:
> > http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html
>
> Quick note from a driver dev's perspective: Mark, yours sounds like a
> bad use case for a downgrading retry policy. If your cluster has an RF
> of 2, and your app requires CL.QUORUM, a downgrading policy will, e.g.
> try at CL.QUORUM and downgrade below your required CL; or try at
> CL.ALL, then fail and downgrade to CL.QUORUM or an equivalent, which
> is what your app needs in the first place.
>
>


Re: Node failure

2017-10-06 Thread Jim Witschey
> Modern client drivers also have ways to “downgrade” the CL of requests, in 
> case they fail. E.g. for the Java driver: 
> http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html

Quick note from a driver dev's perspective: Mark, yours sounds like a
bad use case for a downgrading retry policy. If your cluster has an RF
of 2, and your app requires CL.QUORUM, a downgrading policy will, e.g.
try at CL.QUORUM and downgrade below your required CL; or try at
CL.ALL, then fail and downgrade to CL.QUORUM or an equivalent, which
is what your app needs in the first place.




RE: Node failure

2017-10-06 Thread Mark Furlong
I’ll check to see what our app is using.

Thanks
Mark
801-705-7115 office

From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
Sent: Friday, October 6, 2017 12:25 PM
To: user@cassandra.apache.org
Subject: RE: Node failure

QUORUM should succeed with a RF=3 and 2 of 3 nodes available.

Modern client drivers also have ways to “downgrade” the CL of requests, in case 
they fail. E.g. for the Java driver: 
http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html


Thomas

From: Mark Furlong [mailto:mfurl...@ancestry.com]
Sent: Freitag, 06. Oktober 2017 19:43
To: user@cassandra.apache.org
Subject: RE: Node failure

Thanks for the detail. I’ll have to remove and then add one back in. It’s my 
consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra <user@cassandra.apache.org>
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove 
it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to 
replace it. To replace, you'll start a new server with 
-Dcassandra.replace_address=a.b.c.d ( 
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
 ) , and it'll stream data from the neighbors and eventually replace the dead 
node in the ring (the dead node will be removed from 'nodetool status', the new 
node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll do 
some combination of repair, 'nodetool removenode' or 'nodetool assassinate', 
and ALTERing the keyspace to set RF=2. The order matters, and so does the 
consistency level you use for reads/writes (so we can tell you whether or not 
you're likely to lose data in this process), so I'm not giving step-by-steps 
here because it's not very straight forward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong 
<mfurl...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs 
to be removed?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043






RE: Node failure

2017-10-06 Thread Steinmaurer, Thomas
QUORUM should succeed with a RF=3 and 2 of 3 nodes available.

Modern client drivers also have ways to “downgrade” the CL of requests, in case 
they fail. E.g. for the Java driver: 
http://docs.datastax.com/en/latest-java-driver-api/com/datastax/driver/core/policies/DowngradingConsistencyRetryPolicy.html


Thomas

From: Mark Furlong [mailto:mfurl...@ancestry.com]
Sent: Freitag, 06. Oktober 2017 19:43
To: user@cassandra.apache.org
Subject: RE: Node failure

Thanks for the detail. I’ll have to remove and then add one back in. It’s my 
consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra <user@cassandra.apache.org>
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove 
it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to 
replace it. To replace, you'll start a new server with 
-Dcassandra.replace_address=a.b.c.d ( 
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
 ) , and it'll stream data from the neighbors and eventually replace the dead 
node in the ring (the dead node will be removed from 'nodetool status', the new 
node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll do 
some combination of repair, 'nodetool removenode' or 'nodetool assassinate', 
and ALTERing the keyspace to set RF=2. The order matters, and so does the 
consistency level you use for reads/writes (so we can tell you whether or not 
you're likely to lose data in this process), so I'm not giving step-by-steps 
here because it's not very straight forward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong 
<mfurl...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs 
to be removed?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043






RE: Node failure

2017-10-06 Thread Mark Furlong
We are using quorum on our reads and writes.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, October 6, 2017 11:30 AM
To: cassandra <user@cassandra.apache.org>
Subject: Re: Node failure

If you write with CL:ANY, CL:ONE (or LOCAL_ONE), and one node fails, you may 
lose data that hasn't made it to other nodes.


On Fri, Oct 6, 2017 at 10:28 AM, Mark Furlong 
<mfurl...@ancestry.com> wrote:
The only time I’ll have a problem is if I have to do a read-all or write-all.
Any other gotchas I should be aware of?

Thanks
Mark
801-705-7115 office

From: Akshit Jain 
[mailto:akshit13...@iiitd.ac.in]
Sent: Friday, October 6, 2017 11:25 AM
To: user@cassandra.apache.org
Subject: Re: Node failure

You replace it with a new node and bootstrapping happens. The new node receives
data from the other two nodes.
The rest depends on the scenario you are asking for.

Regards
Akshit Jain
B-Tech,2013124
9891724697

On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong 
<mfurl...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs 
to be removed?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043







RE: Node failure

2017-10-06 Thread Mark Furlong
Thanks for the detail. I’ll have to remove and then add one back in. It’s my 
consistency levels that may bite me in the interim.

Thanks
Mark
801-705-7115 office

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Friday, October 6, 2017 11:29 AM
To: cassandra <user@cassandra.apache.org>
Subject: Re: Node failure

There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically remove 
it if it'll never be replaced, but in RF=3 with 3 nodes, you probably need to 
replace it. To replace, you'll start a new server with 
-Dcassandra.replace_address=a.b.c.d ( 
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
 ) , and it'll stream data from the neighbors and eventually replace the dead 
node in the ring (the dead node will be removed from 'nodetool status', the new 
node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll do 
some combination of repair, 'nodetool removenode' or 'nodetool assassinate', 
and ALTERing the keyspace to set RF=2. The order matters, and so does the 
consistency level you use for reads/writes (so we can tell you whether or not 
you're likely to lose data in this process), so I'm not giving step-by-steps 
here because it's not very straight forward and there are a lot of caveats.




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong 
<mfurl...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs 
to be removed?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043






Re: Node failure

2017-10-06 Thread Jeff Jirsa
If you write with CL:ANY, CL:ONE (or LOCAL_ONE), and one node fails, you
may lose data that hasn't made it to other nodes.


On Fri, Oct 6, 2017 at 10:28 AM, Mark Furlong <mfurl...@ancestry.com> wrote:

> The only time I’ll have a problem is if I have to do a read-all or write-all.
> Any other gotchas I should be aware of?
>
>
>
> Thanks
>
> Mark
>
> 801-705-7115 office
>
>
>
> *From:* Akshit Jain [mailto:akshit13...@iiitd.ac.in]
> *Sent:* Friday, October 6, 2017 11:25 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Node failure
>
>
>
> You replace it with a new node and bootstrapping happens. The new node
> receives data from the other two nodes.
>
> The rest depends on the scenario you are asking for.
>
>
> Regards
>
> Akshit Jain
>
> B-Tech,2013124
>
> 9891724697
>
>
>
>
> On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong <mfurl...@ancestry.com>
> wrote:
>
> What happens when I have a 3 node cluster with RF 3 and a node fails that
> needs to be removed?
>
>
>
> Mark Furlong
>
> Sr. Database Administrator
>
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
> 1300 W Traverse Pkwy
> Lehi, UT 84043
>
>
>
>
>
>
>


RE: Node failure

2017-10-06 Thread Mark Furlong
The only time I’ll have a problem is if I have to do a read-all or write-all.
Any other gotchas I should be aware of?

Thanks
Mark
801-705-7115 office

From: Akshit Jain [mailto:akshit13...@iiitd.ac.in]
Sent: Friday, October 6, 2017 11:25 AM
To: user@cassandra.apache.org
Subject: Re: Node failure

You replace it with a new node and bootstrapping happens. The new node receives
data from the other two nodes.
The rest depends on the scenario you are asking for.

Regards
Akshit Jain
B-Tech,2013124
9891724697

On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong 
<mfurl...@ancestry.com> wrote:
What happens when I have a 3 node cluster with RF 3 and a node fails that needs 
to be removed?

Mark Furlong

Sr. Database Administrator

mfurl...@ancestry.com
M: 801-859-7427
O: 801-705-7115
1300 W Traverse Pkwy
Lehi, UT 84043






Re: Node failure

2017-10-06 Thread Akshit Jain
You replace it with a new node and bootstrapping happens. The new node
receives data from the other two nodes.
The rest depends on the scenario you are asking for.

Regards
Akshit Jain
B-Tech,2013124
9891724697


On Fri, Oct 6, 2017 at 10:50 PM, Mark Furlong  wrote:

> What happens when I have a 3 node cluster with RF 3 and a node fails that
> needs to be removed?
>
>
>
> Mark Furlong
>
> Sr. Database Administrator
>
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
> 1300 W Traverse Pkwy
> Lehi, UT 84043
>
>
>
>
>


Re: Node failure

2017-10-06 Thread Jeff Jirsa
There's a lot to talk about here, what's your exact question?


- You can either remove it from the cluster or replace it. You typically
remove it if it'll never be replaced, but in RF=3 with 3 nodes, you
probably need to replace it. To replace, you'll start a new server with
-Dcassandra.replace_address=a.b.c.d (
http://cassandra.apache.org/doc/latest/operating/topo_changes.html#replacing-a-dead-node
) , and it'll stream data from the neighbors and eventually replace the
dead node in the ring (the dead node will be removed from 'nodetool
status', the new node will be there instead).

- If you're not going to replace it, things get a bit more complex - you'll
do some combination of repair, 'nodetool removenode' or 'nodetool
assassinate', and ALTERing the keyspace to set RF=2. The order matters, and
so does the consistency level you use for reads/writes (so we can tell you
whether or not you're likely to lose data in this process), so I'm not
giving step-by-steps here because it's not very straight forward and there
are a lot of caveats.
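
With that caveat in mind, the moving parts look roughly like this (host ID, address
and keyspace name are placeholders, and the ordering and repair considerations above
still apply -- this is not a complete runbook):

    nodetool status                              # note the Host ID of the dead node
    nodetool repair                              # repair the surviving replicas
    nodetool removenode <host-id-of-dead-node>   # reassign the dead node's ranges
    # nodetool assassinate <ip-of-dead-node>     # last resort if removenode cannot finish
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
              {'class': 'SimpleStrategy', 'replication_factor': 2};"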




On Fri, Oct 6, 2017 at 10:20 AM, Mark Furlong  wrote:

> What happens when I have a 3 node cluster with RF 3 and a node fails that
> needs to be removed?
>
>
>
> Mark Furlong
>
> Sr. Database Administrator
>
> mfurl...@ancestry.com
> M: 801-859-7427
> O: 801-705-7115
> 1300 W Traverse Pkwy
> Lehi, UT 84043
>
>
>
>
>


RE: Node failure Due To Very high GC pause time

2017-07-13 Thread Durity, Sean R
I like Bryan’s terminology of an “antagonistic use case.” If I am reading this 
correctly, you are putting 5 (or 10) million records in a partition and then 
trying to delete them in the same order they are stored. This is not a good 
data model for Cassandra, in fact a dangerous data model. That partition will 
reside completely on one node (and a number of replicas). Then, you are forcing 
the reads to wade through all the tombstones to get to the undeleted records – 
all on the same nodes. This cannot scale to the scope you want.

For a distributed data store, you want the data distributed across all of your 
cluster. And you want to delete whole partitions, if at all possible. (Or at 
least a reasonable number of deletes within a partition.)
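
One hedged sketch of that kind of remodel (the keyspace, table and bucketing scheme
below are invented for illustration): spread each branch/department over many smaller
partitions so that a whole bucket can be retired with a single partition-level delete.

    cqlsh -e "
    CREATE TABLE IF NOT EXISTS hr.employee_details_by_bucket (
        branch_id     text,
        department_id text,
        bucket        int,          -- e.g. emp_id / 100000
        emp_id        bigint,
        emp_details   text,
        PRIMARY KEY ((branch_id, department_id, bucket), emp_id)
    );
    DELETE FROM hr.employee_details_by_bucket
     WHERE branch_id = 'xxx' AND department_id = 'yyy' AND bucket = 0;"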


Sean Durity
From: Karthick V [mailto:karthick...@zohocorp.com]
Sent: Monday, July 03, 2017 12:47 PM
To: user <user@cassandra.apache.org>
Subject: Re: Node failure Due To Very high GC pause time

Hi Bryan,

Thanks for your quick response.  We have already tuned our memory 
and GC based on our hardware specification and it was working fine until 
yesterday, i.e before facing the below specified delete request. As you 
specified we will once again look into our GC & memory configuration.

FYKI: We are using memtable_allocation_type as offheap_objects.

Consider the following table

CREATE TABLE EmployeeDetails (
    branch_id text,
    department_id text,
    emp_id bigint,
    emp_details text,
    PRIMARY KEY (branch_id, department_id, emp_id)
) WITH CLUSTERING ORDER BY (department_id ASC, emp_id ASC)


In this table I have 10 million records for a particular branch_id and
department_id. The following are the operations which I perform in C*, in
chronological order:

  1. Delete 5 million records, from the start, in batches of 500 records per
request for the particular branch_id (say 'xxx') and department_id (say 'yyy').
  2. Read the next 500 records as soon as the above delete operation is
completed (Select * from EmployeeDetails where branch_id='xxx' and
department_id = 'yyy' and emp_id >5000 limit 500).

It's only after executing the above read request that there was a spike in memory,
and within a few minutes the node was marked down.

So my question here is: will the above read request load all the deleted
5 million records into my memory before it starts fetching, or will it jump
directly to the offset of the 5001st record (since we have specified the greater
than condition)? If it is the former case, then the read request will keep the
data in main memory and perform a merge operation before it delivers the data,
as per this wiki (https://wiki.apache.org/cassandra/ReadPathForUsers). If not,
let me know how the above specified read request will provide me the data.


Note: Also, while analyzing my heap dump it's clear that the majority of the
memory is being held by tombstones.


Thanks in advance
-- karthick



 On Mon, 03 Jul 2017 20:40:10 +0530 Bryan Cheng 
<br...@blockcypher.com> wrote 

This is a very antagonistic use case for Cassandra :P I assume you're familiar
with Cassandra and deletes? (e.g.
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html,
http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html)

That being said, are you giving enough time for your tables to flush to disk? 
Deletes generate markers which can and will consume memory until they have a 
chance to be flushed, after which they will impact query time and performance 
(but should relieve memory pressure). If you're saturating the capability of 
your nodes your tables will have difficulty flushing. See 
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_memtable_thruput_c.html.

RE: Node failure Due To Very high GC pause time

2017-07-03 Thread ZAIDI, ASAD A
>> Here my doubt is: will all the deleted 3.3 million rows be loaded into my
>> on-heap memory? If not, what objects are occupying that memory?

It depends on your queries and what data they’re fetching from your
database. Assuming you’re using the CMS garbage collector and you’ve enabled GC
logs with PrintGCDetails, PrintClassHistogramBeforeFullGC and
PrintClassHistogramAfterFullGC, your logs should tell you which Java classes
occupy most of your heap memory.

The system.log file can also give you some clues, e.g. if you see references to your
tables with [tombstones]. A quick [grep -i tombstone /path/to/system.log]
command would tell you which objects are suffering from tombstones!
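
For reference, those flags go into cassandra-env.sh along these lines (the log
paths are the usual package defaults and may differ on your nodes):

    # GC logging flags referenced above (CMS / HotSpot JVM):
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintClassHistogramAfterFullGC"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

    # Quick check for tables suffering from tombstone-heavy reads:
    grep -i tombstone /var/log/cassandra/system.log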


From: Karthick V [mailto:karthick...@zohocorp.com]
Sent: Monday, July 03, 2017 11:47 AM
To: user <user@cassandra.apache.org>
Subject: Re: Node failure Due To Very high GC pause time

Hi Bryan,

Thanks for your quick response.  We have already tuned our memory 
and GC based on our hardware specification and it was working fine until 
yesterday, i.e before facing the below specified delete request. As you 
specified we will once again look into our GC & memory configuration.

FYKI: We are using memtable_allocation_type as offheap_objects.

Consider the following table

CREATE TABLE EmployeeDetails (
    branch_id text,
    department_id text,
    emp_id bigint,
    emp_details text,
    PRIMARY KEY (branch_id, department_id, emp_id)
) WITH CLUSTERING ORDER BY (department_id ASC, emp_id ASC)


In this table I have 10 million records for a particular branch_id and
department_id. The following are the operations which I perform in C*, in
chronological order:

  1. Delete 5 million records, from the start, in batches of 500 records per
request for the particular branch_id (say 'xxx') and department_id (say 'yyy').
  2. Read the next 500 records as soon as the above delete operation is
completed (Select * from EmployeeDetails where branch_id='xxx' and
department_id = 'yyy' and emp_id >5000 limit 500).

It's only after executing the above read request that there was a spike in memory,
and within a few minutes the node was marked down.

So my question here is: will the above read request load all the deleted
5 million records into my memory before it starts fetching, or will it jump
directly to the offset of the 5001st record (since we have specified the greater
than condition)? If it is the former case, then the read request will keep the
data in main memory and perform a merge operation before it delivers the data,
as per this wiki (https://wiki.apache.org/cassandra/ReadPathForUsers). If not,
let me know how the above specified read request will provide me the data.


Note: Also, while analyzing my heap dump it's clear that the majority of the
memory is being held by tombstones.


Thanks in advance
-- karthick



 On Mon, 03 Jul 2017 20:40:10 +0530 Bryan Cheng 
<br...@blockcypher.com> wrote 

This is a very antagonistic use case for Cassandra :P I assume you're familiar
with Cassandra and deletes? (e.g.
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html,
http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html)

That being said, are you giving enough time for your tables to flush to disk? 
Deletes generate markers which can and will consume memory until they have a 
chance to be flushed, after which they will impact query time and performance 
(but should relieve memory pressure). If you're saturating the capability of 
your nodes your tables will have difficulty flushing. See 
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_memtable_thruput_c.html.

This could also be a heap/memory configuration issue as well or a GC tuning
issue (although unlikely if you've left those at default).

Re: Node failure Due To Very high GC pause time

2017-07-03 Thread Karthick V
Hi Bryan,



Thanks for your quick response.  We have already tuned our memory 
and GC based on our hardware specification and it was working fine until 
yesterday, i.e before facing the below specified delete request. As you 
specified we will once again look into our GC & memory configuration.



FYKI: We are using memtable_allocation_type as offheap_objects.



Consider the following table 



CREATE TABLE EmployeeDetails (
    branch_id text,
    department_id text,
    emp_id bigint,
    emp_details text,
    PRIMARY KEY (branch_id, department_id, emp_id)
) WITH CLUSTERING ORDER BY (department_id ASC, emp_id ASC)






In this table I have 10 million records for a particular branch_id and
department_id. The following are the operations which I perform in C*, in
chronological order:

1. Delete 5 million records, from the start, in batches of 500 records per
request for the particular branch_id (say 'xxx') and department_id (say 'yyy').

2. Read the next 500 records as soon as the above delete operation is completed
(Select * from EmployeeDetails where branch_id='xxx' and department_id = 'yyy'
and emp_id >5000 limit 500).



It's only after executing the above read request that there was a spike in memory,
and within a few minutes the node was marked down.



So my question here is: will the above read request load all the deleted
5 million records into my memory before it starts fetching, or will it jump
directly to the offset of the 5001st record (since we have specified the greater
than condition)? If it is the former case, then the read request will keep the
data in main memory and perform a merge operation before it delivers the data,
as per this wiki (https://wiki.apache.org/cassandra/ReadPathForUsers). If not,
let me know how the above specified read request will provide me the data.





Note: Also, while analyzing my heap dump it's clear that the majority of the
memory is being held by tombstones.





Thanks in advance 

-- karthick

 







 On Mon, 03 Jul 2017 20:40:10 +0530 Bryan Cheng 
br...@blockcypher.com wrote 




This is a very antagonistic use case for Cassandra :P I assume you're familiar 
with Cassandra and deletes? (eg. 
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html, 
http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html)



That being said, are you giving enough time for your tables to flush to disk? 
Deletes generate markers which can and will consume memory until they have a 
chance to be flushed, after which they will impact query time and performance 
(but should relieve memory pressure). If you're saturating the capability of 
your nodes your tables will have difficulty flushing. See 
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_memtable_thruput_c.html.



This could also be a heap/memory configuration issue as well or a GC tuning 
issue (although unlikely if you've left those at default)



--Bryan






On Mon, Jul 3, 2017 at 7:51 AM, Karthick V karthick...@zohocorp.com 
wrote:








Hi,



  Recently, in my test cluster, I faced outrageous GC activity which made
the node unreachable inside the cluster itself.

Scenario:

  In a partition of 5 million rows we read the first 500 (by giving the starting
range) and delete the same 500 again. The same has been done recursively by
changing the start range alone. Initially I didn't see any difference in the
query performance (up to 50,000), but later I observed a significant degradation
in performance; when it reached about 3.3 million the read request failed and the
node went unreachable. After analysing my GC logs it is clear that 99% of my
old-generation space is occupied and there is no more space for allocation,
which caused the machine to stall.

  Here my doubt is: will all the deleted 3.3 million rows be loaded into my
on-heap memory? If not, what objects are occupying that memory?



PS : I am using C* 2.1.13 in cluster. 

















Re: Node failure Due To Very high GC pause time

2017-07-03 Thread Bryan Cheng
This is a very antagonistic use case for Cassandra :P I assume you're
familiar with Cassandra and deletes? (eg.
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html,
http://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_deletes_c.html
)

That being said, are you giving enough time for your tables to flush to
disk? Deletes generate markers which can and will consume memory until they
have a chance to be flushed, after which they will impact query time and
performance (but should relieve memory pressure). If you're saturating the
capability of your nodes your tables will have difficulty flushing. See
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_memtable_thruput_c.html
.
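
As a quick check (keyspace and table names are placeholders), you can force a flush
and then look at memtable and tombstone statistics:

    nodetool flush my_keyspace employeedetails     # push memtables (and delete markers) to disk
    nodetool cfstats my_keyspace.employeedetails   # memtable sizes and tombstones per slice
                                                   # (2.1-era name; newer versions: tablestats)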

This could also be a heap/memory configuration issue as well or a GC tuning
issue (although unlikely if you've left those at default)

--Bryan


On Mon, Jul 3, 2017 at 7:51 AM, Karthick V  wrote:

> Hi,
>
>   Recently, in my test cluster, I faced outrageous GC activity which
> made the node unreachable inside the cluster itself.
>
> Scenario:
>   In a partition of 5 million rows we read the first 500 (by giving the
> starting range) and delete the same 500 again. The same has been done
> recursively by changing the start range alone. Initially I didn't see any
> difference in the query performance (up to 50,000), but later I observed a
> significant degradation in performance; when it reached about 3.3 million the
> read request failed and the node went unreachable. After analysing my GC
> logs it is clear that 99% of my old-generation space is occupied and there is
> no more space for allocation, which caused the machine to stall.
>   Here my doubt is: will all the deleted 3.3 million rows be loaded into
> my on-heap memory? If not, what objects are occupying that memory?
>
> PS : I am using C* 2.1.13 in cluster.
>
>
>
>
>


Re: node failure, and automatic decommission (or removetoken)

2011-03-01 Thread Mimi Aluminium
It helps, Thanks a lot,
miriam

On Mon, Feb 28, 2011 at 9:50 PM, Aaron Morton aa...@thelastpickle.com wrote:

  I thought there was more to it.

 The steps for move or removing nodes are outlined on the operations page
 wiki as you probably know.

 What approach are you considering for rebalancing the token distribution
 when removing a node? E.g. If you have 5 nodes and remove 1 the best long
 term solution is to spread that token range across the remaining 4. This
 will result in additional data streaming.

 My understanding is that Cassandra is designed for a relatively stable
 number of nodes in the cluster, with the assumption that failures are
 generally transitory. The features to handle permanent moves and removals are
 somewhat heavyweight and not designed to be used frequently.

 Hope that helps
 Aaron
   On 1/03/2011, at 2:22 AM, Mimi Aluminium mimi.alumin...@gmail.com
 wrote:

   Aaron,
 Thanks a lot,
 Actually I meant a larger number of nodes than 3 and replication factor of
 3.
 We are looking at a system that may shrink due to permanent failures, and
 then automatically detects the failure and streams the failed node's range to
 other nodes in the cluster so that there are again 3 replicas.
 I understand there is no such script.
 Thanks
 Miriam

On Mon, Feb 28, 2011 at 11:51 AM, aaron morton aa...@thelastpickle.com wrote:

  AFAIK the general assumption is that you will want to repair the node
 manually, within the GCGraceSeconds period. If this cannot be done then
 nodetool decommission and removetoken are the recommended approach.

 In your example though, with 3 nodes and an RF of 3 your cluster can
 sustain a single node failure and continue to operate at CL Quorum for reads
 and writes. So there is no immediate need to move data.

 Does that help?

 Aaron

  On 28 Feb 2011, at 07:41, Mimi Aluminium wrote:

  Hi,
 I have a question about a tool or a wrapper that performs automatic data
 movement upon node failure.
 Assuming I have 3 nodes with a replication factor of 3: in case of one
 node failure, does the third replica (that was previously located on the failed
 node) re-appear on one of the live nodes?
 I am looking for something that is similar to Hinted Handoff but with
 a viable copy that can be read.
 I know we can stream the data manually (using nodetool move or
 decommission), but is there something automatic?
 I also found an open ticket, 957, but was not sure this is what I am looking
 for.
 Thanks
 Miriam








Re: node failure, and automatic decommission (or removetoken)

2011-02-28 Thread aaron morton
AFAIK the general assumption is that you will want to repair the node manually, 
within the GCGraceSeconds period. If this cannot be done then nodetool 
decommission and removetoken are the recommended approach.
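
For completeness, with the 0.7-era tooling that looks something like this (host names
and the token are placeholders taken from nodetool ring):

    nodetool -h <live-node> ring                       # find the dead node's token
    nodetool -h <live-node> removetoken <dead-token>   # hand its range to the remaining nodes
    # for a node that is still up and being retired:
    nodetool -h <node-being-removed> decommission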

In your example though, with 3 nodes and an RF of 3 your cluster can sustain a 
single node failure and continue to operate at CL Quorum for reads and writes. 
So there is no immediate need to move data. 

Does that help? 

Aaron

On 28 Feb 2011, at 07:41, Mimi Aluminium wrote:

 Hi,
 I have a question about a tool or a wrapper that performs automatic data
 movement upon node failure.
 Assuming I have 3 nodes with a replication factor of 3: in case of one node
 failure, does the third replica (that was previously located on the failed node)
 re-appear on one of the live nodes?
 I am looking for something that is similar to Hinted Handoff but with a
 viable copy that can be read.
 I know we can stream the data manually (using nodetool move or
 decommission), but is there something automatic?
 I also found an open ticket, 957, but was not sure this is what I am looking
 for.
 Thanks
 Miriam
  
  
  
 Miriam Allalouf, PhD
 
 Storage  Network Research
 IBM, Haifa Research Labs
 Tel: 972-3-7689525
 Mobile: 972-52-3664129
 e-mail: miri...@il.ibm.com
 



Re: node failure, and automatic decommission (or removetoken)

2011-02-28 Thread Mimi Aluminium
Aaron,
Thanks a lot,
Actually I meant a larger number of nodes than 3 and replication factor of
3.
We are looking at a system that may shrink due to permanent failures, and
then automatically detects the failure and streams the failed node's range to
other nodes in the cluster so that there are again 3 replicas.
I understand there is no such script.
Thanks
Miriam

On Mon, Feb 28, 2011 at 11:51 AM, aaron morton aa...@thelastpickle.com wrote:

  AFAIK the general assumption is that you will want to repair the node
 manually, within the GCGraceSeconds period. If this cannot be done then
 nodetool decommission and removetoken are the recommended approach.

 In your example though, with 3 nodes and an RF of 3 your cluster can
 sustain a single node failure and continue to operate at CL Quorum for reads
 and writes. So there is no immediate need to move data.

 Does that help?

 Aaron

  On 28 Feb 2011, at 07:41, Mimi Aluminium wrote:

  Hi,
 I have a question about a tool or a wrapper that performs automatic data
 movement upon node failure.
 Assuming I have 3 nodes with a replication factor of 3: in case of one
 node failure, does the third replica (that was previously located on the failed
 node) re-appear on one of the live nodes?
 I am looking for something that is similar to Hinted Handoff but with
 a viable copy that can be read.
 I know we can stream the data manually (using nodetool move or
 decommission), but is there something automatic?
 I also found an open ticket, 957, but was not sure this is what I am looking
 for.
 Thanks
 Miriam