RE: Restart Cassandra automatically
What anti-pattern are you mocking me for exactly?

Sean Durity

From: daemeon reiydelle [mailto:daeme...@gmail.com]
Sent: Tuesday, February 23, 2016 11:21 AM
To: user@cassandra.apache.org
Subject: RE: Restart Cassandra automatically

Cassandra nodes do not go down "for no reason". They are not stateless. I would like to thank you for this marvelous example of a wonderful antipattern. Absolutely fantastic. Thank you! I am not being a satirical smartass. I sometimes am challenged by clients in my presentations about SRE best practices around C*, Hadoop, and ELK on the grounds that "no one would ever do this in production". Now I have objective proof!

Daemeon
sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872

On Feb 23, 2016 7:53 AM, <sean_r_dur...@homedepot.com> wrote:

Yes, I can see the potential problem in theory. However, we never do your #2. Generally, we don’t have unused spare hardware. We just fix the host that is down and run repairs. (Side note: while I have seen nodes fight it out over who owns a particular token in earlier versions, it seems that 1.2+ doesn’t allow that to happen as easily. The second node will just not come up.)

For most of our use cases, I would agree with your Coli Conjecture.

Sean Durity

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, February 09, 2016 4:41 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra automatically

On Tue, Feb 9, 2016 at 6:20 AM, <sean_r_dur...@homedepot.com> wrote:

Call me naïve, but we do use an in-house built program for keeping nodes started (based on a flag-check). The program is something that was written for all kinds of daemon processes here, not Cassandra specifically. The basic idea is that it runs a status check. If that fails, and the flag is set, start Cassandra. In my opinion, it has helped more than hurt us – especially with the very fragile 1.1 releases that were prone to heap problems.

Ok, you're naïve.. ;P

But seriously, think of this scenario :

1) Node A, responsible for range A-M, goes down due to hardware failure of a disk in a RAID
2) Node B is put into service and is made responsible for A-M
3) Months pass
4) Node A comes back up, announces that it is responsible for A-M, and the cluster agrees

Consistency is now permanently broken for any involved rows. Why doesn't it (usually) matter?

It's not so much that you are naïve but that you are providing still more support for the Coli Conjecture : "If you are using a distributed database you probably do not care about consistency, even if you think you do." You have repeatedly chosen Availability over Consistency and it has never had a negative impact on your actual application.

=Rob

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
Re: Restart Cassandra automatically
Hi Subharaj,

Cassandra is built to be a fault-tolerant distributed DB and is suitable for building HA systems. As Cassandra provides multiple replicas for the same data, if a single node goes down in production, it won't bring down the cluster. In my opinion, if you aim to restart one or more failed Cassandra nodes without investigating the issue, you can damage system health rather than preserve it. Please set RF and CL appropriately to ensure that the system can afford node failures.

Thanks
Anuj

Sent from Yahoo Mail on Android

On Fri, 5 Feb, 2016 at 9:56 am, Debraj Manna wrote:

Hi,

What is the best way to keep cassandra running? My requirement is if for some reason cassandra stops then it should get started automatically. I tried to achieve this by adding cassandra to supervisord. My supervisor conf for cassandra looks like below:-

[program:cassandra]
command=/bin/bash -c 'sleep 10 && bin/cassandra'
directory=/opt/cassandra/
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/cassandra_supervisor.err.log
stdout_logfile=/var/log/cassandra_supervisor.out.log

But it does not seem to work properly. Even if I stop cassandra from supervisor, the cassandra process still seems to be running if I do

ps -ef | grep cassandra

I also tried the configuration mentioned in this question but still no luck. Can someone let me know what is the best way to keep cassandra running in a production environment?

Environment
- Cassandra 2.2.4
- Debian 8

Thanks,
RE: Restart Cassandra automatically
Cassandra nodes do not go down "for no reason". They are not stateless. I would like to thank you for this marvelous example of a wonderful antipattern. Absolutely fantastic. Thank you! I am not being a satirical smartass. I sometimes am challenged by clients in my presentations about SRE best practices around C*, Hadoop, and ELK on the grounds that "no one would ever do this in production". Now I have objective proof!

Daemeon
sent from my mobile
Daemeon C.M. Reiydelle
USA 415.501.0198
London +44.0.20.8144.9872

On Feb 23, 2016 7:53 AM, <sean_r_dur...@homedepot.com> wrote:

> Yes, I can see the potential problem in theory. However, we never do your
> #2. Generally, we don’t have unused spare hardware. We just fix the host
> that is down and run repairs. (Side note: while I have seen nodes fight it
> out over who owns a particular token in earlier versions, it seems that
> 1.2+ doesn’t allow that to happen as easily. The second node will just not
> come up.)
>
> For most of our use cases, I would agree with your Coli Conjecture.
>
> Sean Durity
>
> *From:* Robert Coli [mailto:rc...@eventbrite.com]
> *Sent:* Tuesday, February 09, 2016 4:41 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Restart Cassandra automatically
>
> On Tue, Feb 9, 2016 at 6:20 AM, <sean_r_dur...@homedepot.com> wrote:
>
> Call me naïve, but we do use an in-house built program for keeping nodes
> started (based on a flag-check). The program is something that was written
> for all kinds of daemon processes here, not Cassandra specifically. The
> basic idea is that it runs a status check. If that fails, and the flag is
> set, start Cassandra. In my opinion, it has helped more than hurt us –
> especially with the very fragile 1.1 releases that were prone to heap
> problems.
>
> Ok, you're naïve.. ;P
>
> But seriously, think of this scenario :
>
> 1) Node A, responsible for range A-M, goes down due to hardware failure of
> a disk in a RAID
> 2) Node B is put into service and is made responsible for A-M
> 3) Months pass
> 4) Node A comes back up, announces that it is responsible for A-M, and the
> cluster agrees
>
> Consistency is now permanently broken for any involved rows. Why doesn't
> it (usually) matter?
>
> It's not so much that you are naïve but that you are providing still more
> support for the Coli Conjecture : "If you are using a distributed database
> you probably do not care about consistency, even if you think you do." You
> have repeatedly chosen Availability over Consistency and it has never had a
> negative impact on your actual application.
>
> =Rob
RE: Restart Cassandra automatically
Yes, I can see the potential problem in theory. However, we never do your #2. Generally, we don’t have unused spare hardware. We just fix the host that is down and run repairs. (Side note: while I have seen nodes fight it out over who owns a particular token in earlier versions, it seems that 1.2+ doesn’t allow that to happen as easily. The second node will just not come up.)

For most of our use cases, I would agree with your Coli Conjecture.

Sean Durity

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Tuesday, February 09, 2016 4:41 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra automatically

On Tue, Feb 9, 2016 at 6:20 AM, <sean_r_dur...@homedepot.com> wrote:

Call me naïve, but we do use an in-house built program for keeping nodes started (based on a flag-check). The program is something that was written for all kinds of daemon processes here, not Cassandra specifically. The basic idea is that it runs a status check. If that fails, and the flag is set, start Cassandra. In my opinion, it has helped more than hurt us – especially with the very fragile 1.1 releases that were prone to heap problems.

Ok, you're naïve.. ;P

But seriously, think of this scenario :

1) Node A, responsible for range A-M, goes down due to hardware failure of a disk in a RAID
2) Node B is put into service and is made responsible for A-M
3) Months pass
4) Node A comes back up, announces that it is responsible for A-M, and the cluster agrees

Consistency is now permanently broken for any involved rows. Why doesn't it (usually) matter?

It's not so much that you are naïve but that you are providing still more support for the Coli Conjecture : "If you are using a distributed database you probably do not care about consistency, even if you think you do." You have repeatedly chosen Availability over Consistency and it has never had a negative impact on your actual application.

=Rob
RE: Restart Cassandra automatically
Call me naïve, but we do use an in-house built program for keeping nodes started (based on a flag-check). The program is something that was written for all kinds of daemon processes here, not Cassandra specifically. The basic idea is that it runs a status check. If that fails, and the flag is set, start Cassandra. In my opinion, it has helped more than hurt us – especially with the very fragile 1.1 releases that were prone to heap problems.

Sean Durity

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, February 05, 2016 1:28 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra automatically

On Thu, Feb 4, 2016 at 8:26 PM, Debraj Manna <subharaj.ma...@gmail.com> wrote:

What is the best way to keep cassandra running? My requirement is if for some reason cassandra stops then it should get started automatically.

I recommend against this mode of operation.

When automatically restarting, you have no idea how long Cassandra has been stopped and for what reason. In some cases, you really do not want it to start up and attempt to participate in whatever cluster it was formerly participating in.

I understand this creates a support overhead, especially with very large clusters, but it's difficult for me to accept the premise that net operational safety will be improved by naively restarting nodes.

=Rob
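
For readers who want to picture the flag-check idea described above, here is a minimal sketch in Python. It is not the in-house program, which the thread does not show. The flag-file path, the `pgrep`-based status check, and the `service cassandra start` command are all assumptions chosen for illustration.

```python
import os
import subprocess

# Hypothetical flag file: an operator deletes it to disable automatic starts.
FLAG_FILE = "/etc/cassandra/autostart.enabled"


def should_start(status_ok: bool, flag_set: bool) -> bool:
    """Start only when the status check failed AND the flag is set."""
    return (not status_ok) and flag_set


def cassandra_is_running() -> bool:
    """Assumed status check: look for a process whose command line
    mentions CassandraDaemon (Cassandra's main class)."""
    result = subprocess.run(
        ["pgrep", "-f", "CassandraDaemon"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_once(start_cmd=("service", "cassandra", "start")) -> bool:
    """Run one check cycle; return True if a start was attempted.

    start_cmd is an assumption and depends on how Cassandra was installed.
    """
    if should_start(cassandra_is_running(), os.path.exists(FLAG_FILE)):
        subprocess.run(list(start_cmd), check=False)
        return True
    return False
```

Calling `check_once()` from a scheduler such as cron would give the behaviour described: with the flag file removed, a stopped node stays stopped until someone has looked at it.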
Re: Restart Cassandra automatically
On Thu, Feb 4, 2016 at 8:26 PM, Debraj Manna wrote:

> What is the best way to keep cassandra running? My requirement is if for
> some reason cassandra stops then it should get started automatically.

I recommend against this mode of operation.

When automatically restarting, you have no idea how long Cassandra has been stopped and for what reason. In some cases, you really do not want it to start up and attempt to participate in whatever cluster it was formerly participating in.

I understand this creates a support overhead, especially with very large clusters, but it's difficult for me to accept the premise that net operational safety will be improved by naively restarting nodes.

=Rob
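
One way to make the "how long has it been stopped" concern concrete is a guard that refuses an automatic start once the node has been down too long. The Python sketch below is only an illustration, not something proposed in the thread. The heartbeat-file path and the three-hour limit are assumptions, and the guard says nothing about why the node stopped.

```python
import os
import time
from typing import Optional

# Hypothetical heartbeat file, touched periodically while the node is healthy.
HEARTBEAT_FILE = "/var/run/cassandra/last_seen_up"
# Assumed limit; choose a value that suits your own hint and repair settings.
MAX_DOWNTIME_SECONDS = 3 * 60 * 60


def read_last_seen_up(path: str = HEARTBEAT_FILE) -> Optional[float]:
    """Return the heartbeat file's modification time, or None if unreadable."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None


def safe_to_autostart(
    last_seen_up: Optional[float],
    now: float,
    max_downtime: float = MAX_DOWNTIME_SECONDS,
) -> bool:
    """Allow an automatic start only when the downtime is known and short.

    Unknown downtime (no heartbeat) or a clock that appears to have gone
    backwards is treated as unsafe, leaving the decision to a human.
    """
    if last_seen_up is None:
        return False
    downtime = now - last_seen_up
    return 0 <= downtime <= max_downtime
```

A supervisor script could call `safe_to_autostart(read_last_seen_up(), time.time())` before any restart and page an operator whenever it returns False.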