RE: Restart Cassandra automatically

2016-02-23 Thread SEAN_R_DURITY
What anti-pattern are you mocking me for exactly?


Sean Durity

From: daemeon reiydelle [mailto:daeme...@gmail.com]
Sent: Tuesday, February 23, 2016 11:21 AM
To: user@cassandra.apache.org
Subject: RE: Restart Cassandra automatically


Cassandra nodes do not go down "for no reason". They are not stateless. I would 
like to thank you for this marvelous example of a wonderful antipattern. 
Absolutely fantastic.

Thank you! I am not being a satirical smartass. I am sometimes challenged by 
clients in my presentations about SRE best practices around C*, Hadoop, and ELK 
on the grounds that "no one would ever do this in production". Now I have 
objective proof!

Daemeon

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Feb 23, 2016 7:53 AM, 
<sean_r_dur...@homedepot.com> wrote:
Yes, I can see the potential problem in theory. However, we never do your #2. 
Generally, we don’t have unused spare hardware. We just fix the host that is 
down and run repairs. (Side note: while I have seen nodes fight it out over who 
owns a particular token in earlier versions, it seems that 1.2+ doesn’t allow 
that to happen as easily. The second node will just not come up.)

For most of our use cases, I would agree with your Coli Conjecture.


Sean Durity

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, February 09, 2016 4:41 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra automatically

On Tue, Feb 9, 2016 at 6:20 AM, 
<sean_r_dur...@homedepot.com> wrote:
Call me naïve, but we do use an in-house built program for keeping nodes 
started (based on a flag-check). The program is something that was written for 
all kinds of daemon processes here, not Cassandra specifically. The basic idea 
is that it runs a status check. If that fails, and the flag is set, it starts 
Cassandra. In my opinion, it has helped more than hurt us – especially with the 
very fragile 1.1 releases that were prone to heap problems.

Ok, you're naïve.. ;P

But seriously, think of this scenario :

1) Node A, responsible for range A-M, goes down due to hardware failure of a 
disk in a RAID
2) Node B is put into service and is made responsible for A-M
3) Months pass
4) Node A comes back up, announces that it is responsible for A-M, and the 
cluster agrees

Consistency is now permanently broken for any involved rows. Why doesn't it 
(usually) matter?

It's not so much that you are naïve but that you are providing still more 
support for the Coli Conjecture : "If you are using a distributed database you 
probably do not care about consistency, even if you think you do." You have 
repeatedly chosen Availability over Consistency and it has never had a negative 
impact on your actual application.

=Rob






Re: Restart Cassandra automatically

2016-02-23 Thread Anuj Wadehra
Hi Subharaj,
Cassandra is built to be a fault-tolerant distributed database and is well 
suited for building HA systems. Because Cassandra keeps multiple replicas of 
the same data, a single node going down in production won't bring down the 
cluster.
In my opinion, if you aim to restart one or more failed Cassandra nodes without 
investigating the issue, you can damage system health rather than preserve it.
Please set RF and CL appropriately to ensure that the system can tolerate node 
failures.
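
(For illustration, a minimal sketch of what "set RF and CL appropriately" can 
look like; the keyspace, table, and datacenter names below are placeholders, 
not anything from this thread:)

-- in cqlsh: replication factor 3 in the local datacenter
ALTER KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- read/write at LOCAL_QUORUM (2 of 3 replicas), which still succeeds
-- with a single replica down
CONSISTENCY LOCAL_QUORUM;
SELECT * FROM my_ks.my_table LIMIT 1;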
Thanks,
Anuj

Sent from Yahoo Mail on Android

On Fri, 5 Feb, 2016 at 9:56 am, Debraj Manna wrote:
Hi,


What is the best way to keep Cassandra running? My requirement is that if for 
some reason Cassandra stops, it should be started automatically.

I tried to achieve this by adding Cassandra to supervisord. My supervisord 
config for Cassandra looks like this:
[program:cassandra]
command=/bin/bash -c 'sleep 10 && bin/cassandra'
directory=/opt/cassandra/
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/cassandra_supervisor.err.log
stdout_logfile=/var/log/cassandra_supervisor.out.log

But it does not seem to work properly. Even if I stop Cassandra from 
supervisor, the Cassandra process still appears to be running when I do:
ps -ef | grep cassandra
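
(One likely cause, as a side note: by default bin/cassandra forks into the 
background, so supervisord ends up supervising only the short-lived launcher 
process. A minimal sketch of the same config with Cassandra kept in the 
foreground via -f; the startsecs value is an assumption, adjust to taste:)

[program:cassandra]
; -f keeps Cassandra in the foreground so supervisord can track and stop it
command=/opt/cassandra/bin/cassandra -f
directory=/opt/cassandra/
autostart=true
autorestart=true
startretries=3
; give the node time to bootstrap before supervisord counts it as running
startsecs=30
stderr_logfile=/var/log/cassandra_supervisor.err.log
stdout_logfile=/var/log/cassandra_supervisor.out.log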


I also tried the configuration mentioned in this question but still no luck.

Can someone let me know the best way to keep Cassandra running in a production 
environment?
Environment:
- Cassandra 2.2.4
- Debian 8

Thanks,



  


RE: Restart Cassandra automatically

2016-02-23 Thread daemeon reiydelle
Cassandra nodes do not go down "for no reason". They are not stateless. I
would like to thank you for this marvelous example of a wonderful
antipattern. Absolutely fantastic.

Thank you! I am not being a satirical smartass. I am sometimes challenged
by clients in my presentations about SRE best practices around C*, Hadoop,
and ELK on the grounds that "no one would ever do this in production". Now I
have objective proof!

Daemeon

sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872
On Feb 23, 2016 7:53 AM, <sean_r_dur...@homedepot.com> wrote:

> Yes, I can see the potential problem in theory. However, we never do your
> #2. Generally, we don’t have unused spare hardware. We just fix the host
> that is down and run repairs. (Side note: while I have seen nodes fight it
> out over who owns a particular token in earlier versions, it seems that
> 1.2+ doesn’t allow that to happen as easily. The second node will just not
> come up.)
>
>
>
> For most of our use cases, I would agree with your Coli Conjecture.
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Robert Coli [mailto:rc...@eventbrite.com]
> *Sent:* Tuesday, February 09, 2016 4:41 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Restart Cassandra automatically
>
>
>
> On Tue, Feb 9, 2016 at 6:20 AM, <sean_r_dur...@homedepot.com> wrote:
>
> Call me naïve, but we do use an in-house built program for keeping nodes
> started (based on a flag-check). The program is something that was written
> for all kinds of daemon processes here, not Cassandra specifically. The
> basic idea is that it runs a status check. If that fails, and the flag is
> set, it starts Cassandra. In my opinion, it has helped more than hurt us –
> especially with the very fragile 1.1 releases that were prone to heap
> problems.
>
>
>
> Ok, you're naïve.. ;P
>
>
>
> But seriously, think of this scenario :
>
>
>
> 1) Node A, responsible for range A-M, goes down due to hardware failure of
> a disk in a RAID
>
> 2) Node B is put into service and is made responsible for A-M
>
> 3) Months pass
>
> 4) Node A comes back up, announces that it is responsible for A-M, and the
> cluster agrees
>
>
>
> Consistency is now permanently broken for any involved rows. Why doesn't
> it (usually) matter?
>
>
>
> It's not so much that you are naïve but that you are providing still more
> support for the Coli Conjecture : "If you are using a distributed database
> you probably do not care about consistency, even if you think you do." You
> have repeatedly chosen Availability over Consistency and it has never had a
> negative impact on your actual application.
>
>
>
> =Rob
>
>
>


RE: Restart Cassandra automatically

2016-02-23 Thread SEAN_R_DURITY
Yes, I can see the potential problem in theory. However, we never do your #2. 
Generally, we don’t have unused spare hardware. We just fix the host that is 
down and run repairs. (Side note: while I have seen nodes fight it out over who 
owns a particular token in earlier versions, it seems that 1.2+ doesn’t allow 
that to happen as easily. The second node will just not come up.)
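
(For reference, a minimal sketch of the "fix the host and run repairs" step, 
run on the recovered node once it has rejoined the ring; exact flags vary by 
version:)

# confirm the node shows as Up/Normal in the ring
nodetool status

# repair the token ranges this node is primarily responsible for
nodetool repair -pr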

For most of our use cases, I would agree with your Coli Conjecture.


Sean Durity

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, February 09, 2016 4:41 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra automatically

On Tue, Feb 9, 2016 at 6:20 AM, 
<sean_r_dur...@homedepot.com> wrote:
Call me naïve, but we do use an in-house built program for keeping nodes 
started (based on a flag-check). The program is something that was written for 
all kinds of daemon processes here, not Cassandra specifically. The basic idea 
is that it runs a status check. If that fails, and the flag is set, it starts 
Cassandra. In my opinion, it has helped more than hurt us – especially with the 
very fragile 1.1 releases that were prone to heap problems.

Ok, you're naïve.. ;P

But seriously, think of this scenario :

1) Node A, responsible for range A-M, goes down due to hardware failure of a 
disk in a RAID
2) Node B is put into service and is made responsible for A-M
3) Months pass
4) Node A comes back up, announces that it is responsible for A-M, and the 
cluster agrees

Consistency is now permanently broken for any involved rows. Why doesn't it 
(usually) matter?

It's not so much that you are naïve but that you are providing still more 
support for the Coli Conjecture : "If you are using a distributed database you 
probably do not care about consistency, even if you think you do." You have 
repeatedly chosen Availability over Consistency and it has never had a negative 
impact on your actual application.

=Rob






RE: Restart Cassandra automatically

2016-02-09 Thread SEAN_R_DURITY
Call me naïve, but we do use an in-house built program for keeping nodes 
started (based on a flag-check). The program is something that was written for 
all kinds of daemon processes here, not Cassandra specifically. The basic idea 
is that it runs a status check. If that fails, and the flag is set, it starts 
Cassandra. In my opinion, it has helped more than hurt us – especially with the 
very fragile 1.1 releases that were prone to heap problems.
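
(A rough sketch of what such a flag-check wrapper might look like; the flag 
path, status check, and start command here are hypothetical placeholders, not 
the actual in-house tool:)

#!/bin/sh
# Hypothetical watchdog, run periodically (e.g. from cron): if the
# "keep running" flag is present and the node is not answering, start it.
FLAG=/etc/cassandra/keep_running.flag

if [ -f "$FLAG" ]; then
    # status check: nodetool exits non-zero when it cannot reach the node
    if ! nodetool status > /dev/null 2>&1; then
        /opt/cassandra/bin/cassandra > /dev/null 2>&1
    fi
fi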

Sean Durity

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, February 05, 2016 1:28 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra automatically

On Thu, Feb 4, 2016 at 8:26 PM, Debraj Manna 
<subharaj.ma...@gmail.com> wrote:

What is the best way to keep Cassandra running? My requirement is that if for 
some reason Cassandra stops, it should be started automatically.

I recommend against this mode of operation. When automatically restarting, you 
have no idea how long Cassandra has been stopped and for what reason. In some 
cases, you really do not want it to start up and attempt to participate in 
whatever cluster it was formerly participating in.

I understand this creates a support overhead, especially with very large 
clusters, but it's difficult for me to accept the premise that net operational 
safety will be improved by naively restarting nodes.

=Rob






Re: Restart Cassandra automatically

2016-02-05 Thread Robert Coli
On Thu, Feb 4, 2016 at 8:26 PM, Debraj Manna 
wrote:

> What is the best way to keep Cassandra running? My requirement is that if
> for some reason Cassandra stops, it should be started automatically.
>
I recommend against this mode of operation. When automatically restarting,
you have no idea how long Cassandra has been stopped and for what reason.
In some cases, you really do not want it to start up and attempt to
participate in whatever cluster it was formerly participating in.

I understand this creates a support overhead, especially with very large
clusters, but it's difficult for me to accept the premise that net
operational safety will be improved by naively restarting nodes.

=Rob