Re: 99.999% uptime - Operations Best Practices?
I think very high uptime and very low data loss are achievable in Cassandra, but for new users there are TONS of gotchas. You really have to know what you're doing, and I doubt that many people acquire that knowledge without making a lot of mistakes.

I see above that most people are talking about configuration issues. But the first thing that you will probably do, before you have any experience with Cassandra(!), is architect your system. Architecture is not easily changed when you bump into a gotcha, and for some reason you really have to search the literature well to find out about them. So, my contributions:

The too-many-CFs problem. Cassandra doesn't do well with many column families. If you come from a relational world, a real application can easily have hundreds of tables. Even if you combine them into entities (which is the Cassandra way), you can easily end up with dozens of entities. The most natural thing for someone with a relational background is to have one CF per entity, plus indexes according to your needs. Don't do it. You need to store multiple entities in the same CF. Group them together according to access patterns (i.e. when you use X, you probably also need Y), and distinguish them by adding a prefix to their keys (e.g. entityName@key).

Don't use supercolumns; use composite columns. Supercolumns are disfavored by the Cassandra community and are slowly being orphaned. For example, secondary indexes don't work on supercolumns. Nor does CQL. Bugs crop up with supercolumns that don't happen with regular columns, because internally there's a huge separate code base for supercolumns, and every new feature is designed and implemented for regular columns and then retrofitted for supercolumns (or not).

There should really be a database of gotchas somewhere, and how they were solved...

On Thu, Jun 23, 2011 at 6:57 AM, Les Hazlewood l...@katasoft.com wrote: Edward, Thank you so much for this reply - this is great stuff, and I really appreciate it.
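David's "entityName@key" convention can be sketched as a pair of helpers. This is my own illustration, not code from the thread; the helper names and separator choice are assumptions:

```python
# Sketch of the shared-CF row-key convention David describes:
# several entity types live in one column family, distinguished
# by a prefix on the row key (e.g. "user@42"). Names are my own.

def make_row_key(entity, key, sep="@"):
    """Build a shared-CF row key like 'user@42'."""
    if sep in entity:
        raise ValueError("entity name must not contain the separator")
    return f"{entity}{sep}{key}"

def parse_row_key(row_key, sep="@"):
    """Split a shared-CF row key back into (entity, key)."""
    entity, _, key = row_key.partition(sep)
    return entity, key
```

The payoff of grouping by access pattern is that entities fetched together ("when you use X, you probably also need Y") end up in the same CF, so one multiget can pull, say, `user@42` and `profile@42` in a single round trip.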
You'll be happy to know that I've already pre-ordered your book. I'm looking forward to it! (When is the ship date?) Best regards, Les

On Wed, Jun 22, 2011 at 7:03 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Wed, Jun 22, 2011 at 8:31 PM, Les Hazlewood l...@katasoft.com wrote: Hi Thoku, You were able to represent my intentions (and their reasoning) in this thread more concisely than I could myself. Thanks!

On Wed, Jun 22, 2011 at 5:14 PM, Thoku Hansen tho...@gmail.com wrote: I think that Les's question was reasonable. Why *not* ask the community for the 'gotchas'? Whether the info is already documented or not, it could be an opportunity to improve the documentation based on users' perception. The "you just have to learn" responses are fair also, but that reminds me of the days when running Oracle was a black art, and accumulated wisdom made DBAs irreplaceable.

Yes, this was my initial concern. I know that Cassandra is still young, and I expect this to be the norm for a while, but I was hoping to make that process a bit easier (for me and anyone else reading this thread in the future).

Some recommendations *are* documented, but they are dispersed / stale / contradictory / counter-intuitive. Others have not been documented in the wiki nor in DataStax's doco, and are instead learned anecdotally or The Hard Way. For example, whether documented or not, some of the 'gotchas' that I encountered when I first started working with Cassandra were:

* Don't use OpenJDK. Prefer the Sun JDK. (Wiki says this, Jira says that.)
* It's not viable to run without JNA installed.
* Disable swap memory.
* Need to run nodetool repair on a regular basis.

I'm looking forward to Edward Capriolo's Cassandra book, which Les will probably find helpful.

Thanks for linking to this. I'm pre-ordering right away. And thanks for the pointers, they are exactly the kind of enumerated things I was looking to elicit.
These are the kinds of things that are hard to track down in a single place. I think it'd be nice for the community to contribute this stuff to a single page ('best practices', 'checklist', whatever you want to call it). It would certainly make things easier when getting started. Thanks again, Les

Since I got a plug on the book I will chip in again to the thread :) Some things that were mentioned already:

Install JNA, absolutely (without JNA the snapshot command has to fork to hard-link the sstables; I have seen clients back off from this). Also, the performance-focused Cassandra devs always try to squeeze out performance by utilizing more native features.

OpenJDK vs Sun. I agree - almost always try to do what 'most others' do in production; this way you get surprised less.

Other stuff: RAID. You might want to go RAID 1+0 if you are aiming for uptime. RAID 0 has better performance, but if you lose a node your capacity is diminished, and rebuilding and rejoining a node involves more
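Edward's RAID tradeoff can be made concrete with back-of-the-envelope capacity math. This is my own sketch (the numbers and helper name are assumptions, not from the thread): RAID 0 stripes across all disks for full capacity but any disk loss takes the node down, while RAID 1+0 mirrors pairs, halving capacity in exchange for surviving a disk failure per mirror.

```python
# Rough usable-capacity comparison behind the RAID 0 vs RAID 1+0 advice.
# My own illustration; adjust disk counts/sizes for your hardware.

def usable_capacity_gb(disks, disk_gb, level):
    if level == "raid0":
        # Full capacity, but losing any one disk loses the node's data volume.
        return disks * disk_gb
    if level == "raid10":
        # Mirrored pairs: half the raw capacity, one-disk-per-mirror tolerance.
        return (disks // 2) * disk_gb
    raise ValueError(f"unknown RAID level: {level}")
```

So a 4 x 500 GB node gives 2 TB usable under RAID 0 but only 1 TB under RAID 1+0; the question is whether you'd rather spend that difference on uptime or on rebuild/rejoin time when a disk dies.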
Re: 99.999% uptime - Operations Best Practices?
On 06/23/11 09:43, David Boxenhorn wrote: [...] You need to store multiple entities in the same CF. Group them together according to access patterns (i.e. when you use X, you probably also need Y), and distinguish them by adding a prefix to their keys (e.g. entityName@key).

While avoiding too many CFs is a good idea, I would also advise against a very large CF. Keeping a CF's size down helps speed up repair and compaction. -- Karl
Re: 99.999% uptime - Operations Best Practices?
Les, Cassandra is a good system, but it has not reached version 1.0 yet, nor has HBase etc. It is cutting-edge technology, and therefore in practice you are unlikely to achieve five nines immediately - even if in theory, with perfect planning, perfect administration and so on, this should be achievable even now. The reasons you might choose Cassandra are:

1. A new, more flexible data model that may increase developer productivity and lead to a fast release cycle
2. Superior capability for *writing* large volumes of data, which is incredibly useful in many applications
3. Horizontal scalability, where you can add nodes rather than buying bigger machines
4. Data redundancy, which means you have a kind of live backup going on, a bit like RAID - we use replication factor 3, for example
5. Due to the redundancy of data across the cluster, the ability to perform rolling restarts to administer and upgrade your nodes while the cluster continues to run (yes, this is the feature that in theory allows for continual operation, but in practice, until we reach 1.0, I don't think five nines of uptime is always possible in every scenario yet, because of deficiencies that may present themselves unexpectedly)
6. The benefit of building your new product on a platform designed to solve many modern computing challenges, which will give you a better upgrade path - for example, when you grow you won't have to change over from SQL to NoSQL because you're already on it!

These are pretty compelling arguments, but you have to be realistic about where Cassandra is right now. For what it's worth, though, you might also consider how easy it is to screw up databases running on commercial production software that are handling very large amounts of data (just let the volumes holding the commit log run short of disk space, for example). Setting up a Cassandra cluster is the simplest way to handle big data I've seen, and this reduction in complexity will also contribute to uptime.
Best, Dominic

On 22 June 2011 22:24, Les Hazlewood l...@katasoft.com wrote: [...]
Re: 99.999% uptime - Operations Best Practices?
On 06/22/2011 10:03 PM, Edward Capriolo wrote: I have not read the original thread concerning the problem you mentioned. One way to avoid OOM is large amounts of RAM :) On a more serious note, most OOMs are caused by setting caches or memtables too large. If the OOM was caused by a software bug, the cassandra devs are on the ball and move fast. I still suggest not jumping into a release right away.

For what it's worth, that particular thread was about the kernel OOM killer, which is a good example of the kind of gotcha that has caused several people to chime in with the importance of monitoring both Cassandra and the OS.
Re: 99.999% uptime - Operations Best Practices?
On 06/22/2011 07:12 PM, Les Hazlewood wrote: Telling me to read the mailing lists and follow the issue tracker and use monitoring software is all great and fine - and I do all of these things today already - but this is a philosophical recommendation that does not actually address my question. So I chalk this up as an error on my side in not being clear in my question - my apologies. Let me reformulate it :)

For what it's worth, that was intended as a concrete suggestion. We adopted Cassandra a year ago, when (IMHO) it was a mistake to do so without the willingness to develop sufficient in-house expertise to internally patch/fork/debug if needed. Things are more mature now, best practices more widespread etc., but you should judge that yourself.

In the spirit of your re-formulated questions:
- Read-before-write is a Cassandra anti-pattern; avoid it if at all possible.
- Those optional lines in the env script about GC logging? Uncomment them on at least some of your boxes.
- Use MLOCKALL+mmap, or standard io, but not mmap without MLOCKALL.
Re: 99.999% uptime - Operations Best Practices?
Great stuff Chris - thanks so much for the feedback! Les
Re: 99.999% uptime - Operations Best Practices?
In the spirit of your re-formulated questions: - Read-before-write is a Cassandra anti-pattern, avoid it if at all possible.

This leads me to believe that Cassandra may not be a good idea for a primary OLTP data store. For example, 'only create a user object if email foo is not already in use' or, more generally, 'you can't create object X because one with an existing constraint already exists'. Is that a fair assumption? Actually, this may not be true, at least using Digg and Twitter as examples. I'd assume those apps are far more read-heavy than they are write-heavy, but I wouldn't know for sure.
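The unique-email example above is worth spelling out. Here is a toy illustration of why check-then-insert is racy without an atomic primitive (a plain dict stands in for a column family; all names are my own, and the interleaving is simulated by hand rather than with real concurrency):

```python
# Toy demo of the read-before-write race: two clients both pass the
# "is this email free?" check before either writes, so both "succeed"
# and last-write-wins silently drops one of the users.

users_by_email = {}

def create_user_racy(email, user):
    if email in users_by_email:        # read ...
        return False                   # "already in use"
    users_by_email[email] = user       # ... then write: not atomic
    return True

# Interleave two clients by hand: both read before either writes.
a_sees_free = "foo@example.com" not in users_by_email   # client A checks
b_sees_free = "foo@example.com" not in users_by_email   # client B checks
users_by_email["foo@example.com"] = "alice"             # client A writes
users_by_email["foo@example.com"] = "bob"               # client B overwrites; alice is lost
```

Both checks returned "free", yet only one user survives. In a single-node RDBMS a unique constraint closes this window; in Cassandra (at least as of this thread's era) nothing does, which is why the pattern is called an anti-pattern rather than merely slow.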
Re: 99.999% uptime - Operations Best Practices?
On 06/23/2011 01:56 PM, Les Hazlewood wrote: Is there a roadmap or time to 1.0? Even a ballpark time (e.g next year 3rd quarter, end of year, etc) would be great as it would help me understand where it may lie in relation to my production rollout. The C* devs are rather strongly inclined against putting too much meaning in version numbers. The next major release might be called 1.0. Or maybe it won't. Either way it won't be different code or support from something called 0.9 or 10.0. September 8th is the feature freeze for the next major release.
Re: 99.999% uptime - Operations Best Practices?
As an additional concrete detail to Edward's response, 'result pinning' can provide some performance improvements depending on topology and workload. See the conf file comments for details: https://github.com/apache/cassandra/blob/cassandra-0.8.0/conf/cassandra.yaml#L308-315 I would also advise taking the time to experiment with consistency levels (particularly in a multi-DC setup) and their effect on response times, and to weigh those against your consistency requirements. For the record, any performance twiddling will only provide useful results when comparable metrics are available for a similar workload (Les, it appears you have a good grasp of this already - just wanted to re-iterate).
99.999% uptime - Operations Best Practices?
I'm planning on using Cassandra as a product's core data store, and it is imperative that it never goes down or loses data, even in the event of a data center failure. This uptime requirement (five nines: 99.999% uptime) w/ WAN capabilities is largely what led me to choose Cassandra over other NoSQL products, given its history and 'from the ground up' design for such operational benefits.

However, in a recent thread, a user indicated that all 4 of his 4 Cassandra instances were down because the OS killed the Java processes due to memory starvation, and all 4 instances went down within a relatively short period of each other. Another user helped out and replied that running 0.8 and nodetool repair on each node regularly via a cron job (once a day?) seems to work for him. Naturally this was disconcerting to read, given our needs for a Highly Available product - we'd be royally screwed if this ever happened to us. But given Cassandra's history and its current production use, I'm aware that this HA/uptime is being achieved today, and I believe it is certainly achievable.

So, is there a collective set of guidelines or best practices to ensure this problem (or unavailability due to OOM) can be easily managed? Things like memory settings, initial GC recommendations, cron recommendations, ulimit settings, etc. that can be bundled up as a best-practices Production Kickstart? Could anyone share their nuggets of wisdom or point me to resources where this may already exist? Thanks! Best regards, Les
Re: 99.999% uptime - Operations Best Practices?
On Wed, Jun 22, 2011 at 2:24 PM, Les Hazlewood l...@katasoft.com wrote: [...] Things like memory settings, initial GC recommendations, cron recommendations, ulimit settings, etc. that can be bundled up as a best-practices Production Kickstart?

Unfortunately most of these are in the category of 'it depends'. -ryan
Re: 99.999% uptime - Operations Best Practices?
Just to be clear: I understand that resources like [1] and [2] exist, and I've read them. I'm just wondering if there are any 'gotchas' that might be missing from that documentation that should be considered and if there are any recommendations in addition to these documents. Thanks, Les [1] http://www.datastax.com/docs/0.8/operations/index [2] http://wiki.apache.org/cassandra/Operations
Re: 99.999% uptime - Operations Best Practices?
I understand that every environment is different and it always 'depends' :) But recommending settings and techniques based on an existing real production environment (like the user's suggestion to run nodetool repair as a regular cron job) is always a better starting point for a new Cassandra evaluator than having to start from scratch. Ryan, do you have any 'seed' settings that you guys use for nodes at Twitter? Are there any resources/write-ups beyond the two I've listed already that address some of these 'gotchas'? If those two links are in fact the ideal starting point, that's fine - but it appears that this may not be the case, based on the aforementioned user as well as the other user who helped him and saw similar warning signs. I'm hoping for someone to dispel these reports based on what people actually do in production today. Any info/settings/recommendations based on real production environments would be appreciated! Thanks again, Les
Re: 99.999% uptime - Operations Best Practices?
Implement monitoring and be proactive... that will stop you waking up to a big surprise. I'm sure there were symptoms leading up to all 4 nodes going down. Willing to wager that each node went down at different times and not all went down at once...

On Jun 22, 2011 11:50 PM, Les Hazlewood l...@katasoft.com wrote: [...]
Re: 99.999% uptime - Operations Best Practices?
Sadly, they all went down within minutes of each other. Sent from my iPhone

On Jun 22, 2011, at 6:16 PM, Sasha Dolgy sdo...@gmail.com wrote: [...]
Re: 99.999% uptime - Operations Best Practices?
On 06/22/2011 05:33 PM, Les Hazlewood wrote: Just to be clear: I understand that resources like [1] and [2] exist, and I've read them. I'm just wondering if there are any 'gotchas' that might be missing from that documentation that should be considered and if there are any recommendations in addition to these documents. Thanks, Les [1] http://www.datastax.com/docs/0.8/operations/index [2] http://wiki.apache.org/cassandra/Operations

Well, if they knew some secret gotcha, the dutiful cassandra operators of the world would update the wiki. The closest thing to a 'gotcha' is that neither Cassandra nor any other technology is going to get you those nines. Humans will need to commit to reading the mailing lists, following JIRA, and understanding what the code is doing. And humans will need to combine that understanding with monitoring and alerting to figure out all of the 'it depends' for your particular case.
Re: 99.999% uptime - Operations Best Practices?
Committing to that many 9s is going to be impossible since, as far as I know, no internet service provider will SLA you more than two 9s. You cannot have more uptime than your ISP.

On Wednesday, June 22, 2011, Chris Burroughs chris.burrou...@gmail.com wrote: [...]
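The arithmetic behind the nines being debated here is worth writing down. This is my own illustration (not from the thread): an availability percentage translates directly into an annual downtime budget.

```python
# Downtime budget implied by an availability target.
# 365 * 24 * 60 = 525,600 minutes in a (non-leap) year.

def downtime_minutes_per_year(availability_pct):
    minutes_per_year = 365 * 24 * 60
    return (1 - availability_pct / 100) * minutes_per_year
```

Five nines (99.999%) leaves roughly 5.3 minutes of total downtime per year, while the two nines Edward suggests an ISP will actually commit to (99%) allows about 5,256 minutes, i.e. 87.6 hours - which is why a single data center behind a single ISP cannot honestly carry a five-nines SLA on its own.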
Re: 99.999% uptime - Operations Best Practices?
You have to use multiple data centers to really deliver 4 or 5 9's of service.

On Wed, Jun 22, 2011 at 7:09 PM, Edward Capriolo edlinuxg...@gmail.com wrote: [...]
Re: 99.999% uptime - Operations Best Practices?
[1] http://www.datastax.com/docs/0.8/operations/index [2] http://wiki.apache.org/cassandra/Operations Well if they knew some secret gotcha the dutiful cassandra operators of the world would update the wiki.

As I am new to the Cassandra community, I don't know how 'dutifully' this is maintained. My questions were not unreasonable given the nature of open-source documentation. All I was looking for was what people thought were best practices based on their own production experience. Telling me to read the mailing lists and follow the issue tracker and use monitoring software is all great and fine - and I do all of these things today already - but this is a philosophical recommendation that does not actually address my question. So I chalk this up as an error on my side in not being clear in my question - my apologies. Let me reformulate it :)

Does anyone out there have any concrete recommended techniques or insights in maintaining an HA Cassandra cluster that you've gained based on production experience, beyond what is described in the 2 links above? Thanks, Les
Re: 99.999% uptime - Operations Best Practices?
On Wed, Jun 22, 2011 at 4:11 PM, Peter Lin wool...@gmail.com wrote: you have to use multiple data centers to really deliver 4 or 5 9's of service We do, hence my question, as well as my choice of Cassandra :) Best, Les
Re: 99.999% uptime - Operations Best Practices?
In my opinion, 5 9s don't matter. What matters is the number of impacted customers. You might be down during peak for 5 minutes, causing thousands of customer turn-aways, while you might be down during the night causing only a few. There is no magic bullet. It's all about learning and improving. You will not get HA right away, but over a period of time, as you learn and improve, you will do better. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/99-999-uptime-Operations-Best-Practices-tp6506227p6506511.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: 99.999% uptime - Operations Best Practices?
So having multiple data centers is step 1 of 4/5 9's. I've worked on some services that had 3-4 9's SLAs. Getting there is really tough, as others have stated. You have to have auditing built into your service, capacity metrics, capacity planning, some kind of real-time monitoring, staff to respond to alerts, a plan for handling system failures, training to handle outages, and a dozen other things. Your best choice is to hire someone that has built a system that supports 4-5 9's and patiently work to get there.

On Wed, Jun 22, 2011 at 7:16 PM, Les Hazlewood l...@katasoft.com wrote: On Wed, Jun 22, 2011 at 4:11 PM, Peter Lin wool...@gmail.com wrote: you have to use multiple data centers to really deliver 4 or 5 9's of service We do, hence my question, as well as my choice of Cassandra :) Best, Les
Re: 99.999% uptime - Operations Best Practices?
Forget the 5 9's - I apologize for even writing that. It was my shorthand way of saying 'this can never go down'. I'm not asking for philosophical advice - I've been doing large scale enterprise deployments for over 10 years. I 'get' the 'it depends' and 'do your homework' philosophy. All I'm asking for is concrete techniques that anyone might wish to share that they've found valuable beyond what is currently written in the existing operations documentation in [1] and [2]. If no one wants to share that, that's totally cool - no need to derail the thread into a different discussion. Thanks, Les
Re: 99.999% uptime - Operations Best Practices?
Start with reading the comments in cassandra.yaml and http://wiki.apache.org/cassandra/Operations As far as I know there is no comprehensive list for performance tuning - more specifically, no common settings applicable to everyone. For the most part, issues revolve around compactions and GC tuning.
Re: 99.999% uptime - Operations Best Practices?
I have architected, built and been responsible for systems that support 4-5 9s for years. This discussion is not about how to do that generally. It was intended to be about concrete techniques that have been found valuable when deploying Cassandra in HA environments beyond what is documented in [1] and [2]. Cheers, Les
Re: 99.999% uptime - Operations Best Practices?
Yep, that was [2] on my existing list. Thanks very much for actually addressing my question - it is greatly appreciated! If anyone else has examples they'd like to share (like their own cron techniques, or JVM settings and why, etc.), I'd love to hear them! Best regards, Les

On Wed, Jun 22, 2011 at 4:24 PM, mcasandra mohitanch...@gmail.com wrote: [...]
Re: 99.999% uptime - Operations Best Practices?
Les Hazlewood wrote: I have architected, built and been responsible for systems that support 4-5 9s for years.

So have most of us. But probably by now it should be clear that no technology can provide concrete recommendations. They can only provide what might be helpful, which varies from env to env. That's why I suggest looking at the comments in cassandra.yaml and seeing which are applicable to your scenario. I learn something new every time I read it. BTW: Can you be clear as to what kind of recommendations you are referring to? NetworkTopology, how many copies to store, uptime, load balancing, request routing when one DC is down? If you ask specific questions you might get a better response.
Re: 99.999% uptime - Operations Best Practices?
On Wed, Jun 22, 2011 at 4:35 PM, mcasandra mohitanch...@gmail.com wrote: might be helpful, which varies from env to env. That's why I suggest looking at the comments in cassandra.yaml and seeing which are applicable to your scenario. I learn something new every time I read it.

Yep, and this was awesome - thanks very much for the reply - very helpful.

BTW: Can you be clear as to what kind of recommendations you are referring to? NetworkTopology, how many copies to store, uptime, load balancing, request routing when one DC is down? If you ask specific questions you might get a better response.

Yes, this was my fault in not being specific, but I intentionally left it open to see if anyone wanted to bring up something specific to their environment that they thought would be valuable (e.g. 'when our nodes got to 95% memory utilization, we found that GC behavior was doing X. Setting the JVM option foo helped us reduce problem Y'). I was mainly looking initially for what folks thought were satisfactory initial JVM/GC and *nix OS settings for a production node (e.g. 8 cores w/ 64 gigs of RAM, or an EC2 'large' or 'XL' node) - e.g. what collector was used and why, whether folks have used the standard CMS collector or have tried the G1 collector, and what settings made them happy after testing... those kinds of things. Call it a tiny 'case study' if you will. Network topology I thought I'd leave for a whole 'nuther discussion :)

As an aside, I definitely plan to publish our actual JVM and OS settings and operational procedures once we find a happy medium based on our application, in the event that it might help someone else. Thanks again! Les
Re: 99.999% uptime - Operations Best Practices?
Hi Les, I wanted to offer a couple of thoughts on where to start and strategies for approaching development and deployment with reliability in mind. One way that we've found to more productively think about the reliability of our data tier is to shift our thinking away from a concept of uptime or x nines toward one of error rates. Ryan mentioned that it depends, and while brief, this is actually a very correct comment. Perhaps I can help elaborate. Failures in systems distributed across multiple machines in multiple datacenters can rarely be described in terms of binary uptime guarantees (e.g., either everything is up or everything is down). Instead, certain nodes may be unavailable at certain times, but given appropriate read and write parameters (and their implicit tradeoffs), these service interruptions may remain transparent. Cassandra provides a variety of tools to allow you to tune these, two of the most important of which are the consistency level for reads and writes and your replication factor. I'm sure you're familiar with these, but I mention them because thinking hard about the tradeoffs you're willing to make in terms of consistency and replication may heavily impact your operational experience if availability is of utmost importance. Of course, the single-node operational story is very important as well. Ryan's it depends comment here takes on painful significance for me, as we've found that the manner in which read and write loads vary, their duration, and their intensity can produce very different operational profiles and failure modes. If relaxed consistency is acceptable for your reads and writes, you'll likely find querying with CL.ONE to be more available than QUORUM or ALL, at the cost of reduced consistency. Similarly, if it is economical for you to provision extra nodes for a higher replication factor, you will increase your ability to continue reading and writing in the event of single- or multiple-node failures.
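The consistency-level/replication-factor tradeoff described above comes down to simple arithmetic (standard Cassandra quorum semantics, shown here for a typical RF of 3):

```shell
# A QUORUM read or write must reach floor(RF/2) + 1 replicas,
# so with RF=3 a quorum is 2 replicas and one node can be down.
RF=3
QUORUM=$(( RF / 2 + 1 ))       # replicas a QUORUM operation must reach
TOLERATED=$(( RF - QUORUM ))   # replicas that can be down while QUORUM succeeds
echo "RF=$RF QUORUM=$QUORUM survives $TOLERATED down replica(s)"
```

Because QUORUM writes plus QUORUM reads touch 2 + 2 replicas out of 3, the two sets always overlap in at least one node, which is what yields consistent reads; CL.ONE tolerates RF-1 failures but gives up that overlap guarantee.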
One of the prime challenges we've faced is reducing the frequency and intensity of full garbage collections in the JVM, which tend to result in single-node unavailability. Thanks to help from Jonathan Ellis and Peter Schuller (along with a fair amount of elbow grease ourselves), we've worked through several of these issues and have arrived at a steady state that leaves the ring happy even under load. We've not found GC tuning to bring night-and-day differences outside of resolving the STW collections, but the difference is noticeable. Occasionally, these issues will result from Cassandra's behavior itself; documented APIs such as querying for the count of all columns associated with a key will materialize the row across all nodes being queried. Once, when issuing a count query for a key that had around 300k columns at CL.QUORUM, we knocked three nodes out of our ring by triggering a stop-the-world collection that lasted about 30 seconds - so watch out for things like that. Some of the other tuning knobs available to you involve tradeoffs such as when to flush memtables or trigger compactions, both of which are somewhat intensive operations that can strain a cluster under heavy read or write load, but which are equally necessary for the cluster to remain in operation. If you find yourself pushing hard against these tradeoffs and attempting to navigate a path between icebergs, it's very likely that the best answer to the problem is more hardware, or more powerful hardware. But a lot of this is tacit knowledge, which often comes through a bit of pain but is hopefully operationally transparent to your users: things that you discover once the system is live and your monitoring is providing continuous feedback about the ring's health.
This is where Sasha's point becomes so critical -- having advanced early-warning systems in place, watching monitoring and graphs closely even when everything's fine, and beginning to understand how the ring likes to operate and what it tends to do will give you a huge leg up on your reliability and allow you to react to issues in the ring before they cause operational impact. You mention that you've been building HA systems for a long time -- indeed, far longer than I have, so I'm sure that you're also aware that good, solid up/down binaries are hard to come by, that none of this is easy, and that while some pointers are available (the defaults are actually quite good), it's essentially impossible to offer the best production defaults because they vary wildly based on your hardware, ring configuration, and read/write load and query patterns. To that end, you might find it more productive to begin with the defaults as you develop your system, and let the ring tell you how it's feeling as you begin load testing. Once you have stressed it to the point of failure, you'll see how it failed and either be able to isolate the cause and begin planning to handle that mode, or better yet, understand
Re: 99.999% uptime - Operations Best Practices?
I think that Les's question was reasonable. Why *not* ask the community for the 'gotchas'? Whether the info is already documented or not, it could be an opportunity to improve the documentation based on users' perception. The 'you just have to learn' responses are fair also, but that reminds me of the days when running Oracle was a black art, and accumulated wisdom made DBAs irreplaceable. Some recommendations *are* documented, but they are dispersed / stale / contradictory / counter-intuitive. Others have not been documented in the wiki nor in DataStax's doco, and are instead learned anecdotally or The Hard Way. For example, whether documented or not, some of the 'gotchas' that I encountered when I first started working with Cassandra were:
* Don't use OpenJDK. Prefer the Sun JDK. (The wiki says this, Jira says that.)
* It's not viable to run without JNA installed.
* Disable swap memory.
* You need to run nodetool repair on a regular basis.
I'm looking forward to Edward Capriolo's Cassandra book, which Les will probably find helpful. On Jun 22, 2011, at 7:12 PM, Les Hazlewood wrote: [1] http://www.datastax.com/docs/0.8/operations/index [2] http://wiki.apache.org/cassandra/Operations Well if they knew some secret gotcha the dutiful Cassandra operators of the world would update the wiki. As I am new to the Cassandra community, I don't know how 'dutifully' this is maintained. My questions were not unreasonable given the nature of open-source documentation. All I was looking for was what people thought were best practices based on their own production experience. Telling me to read the mailing lists and follow the issue tracker and use monitoring software is all great and fine - and I do all of these things today already - but this is a philosophical recommendation that does not actually address my question. So I chalk this up as an error on my side in not being clear in my question - my apologies.
Let me reformulate it :) Does anyone out there have any concrete recommended techniques or insights into maintaining an HA Cassandra cluster that you've gained from production experience, beyond what is described in the 2 links above? Thanks, Les
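The gotcha checklist above (Sun JDK, JNA, swap off, regular repair) can be sketched as shell commands. File locations and the repair schedule here are assumptions - adapt them to your install, and stagger the repair cron across nodes so the whole ring isn't repairing at once:

```shell
# 1. Confirm the JVM vendor (prefer the Sun/Oracle JDK over OpenJDK):
command -v java >/dev/null && java -version 2>&1 | head -1
# 2. Check that JNA loaded (Cassandra logs it at startup; path is an assumption):
LOG=/var/log/cassandra/system.log
[ -f "$LOG" ] && grep -ci jna "$LOG"
# 3. Disable swap so the JVM heap is never paged out (run as root,
#    and remove swap entries from /etc/fstab to make it permanent):
# swapoff -a
# 4. Run repair regularly, e.g. a weekly crontab entry per node:
REPAIR_CRON='0 3 * * 0 nodetool -h localhost repair'
echo "$REPAIR_CRON"
```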
Re: 99.999% uptime - Operations Best Practices?
Hi Scott, First, let me say that this email was amazing - I'm always appreciative of the time that anyone puts into mailing list replies, especially ones as thorough, well-thought-out and articulate as this one. I'm a firm believer that these types of replies reflect a strong and durable open-source community. You, sir, are a bad ass :) Thanks so much! As for the '5 9s' comment, I apologize for even writing that - it threw everyone off. It was a shorthand way of saying this data store is so critical to the product that if it ever goes down entirely (as it did for one user with 4 nodes, all at the same time), then we're screwed. I was hoping to trigger the 'hrm - what have we done ourselves to achieve that availability that wasn't easily represented in the documentation' train of thought. It proved to be a red herring however, so I apologize for even bringing it up. Thanks *very* much for the reply. I'll be sure to follow up with the list as I come across any particular issues, and I'll also report my own findings in the interest of (hopefully) being beneficial to anyone in the future. Cheers, Les On Wed, Jun 22, 2011 at 4:58 PM, C. Scott Andreas csco...@urbanairship.com wrote:
Re: 99.999% uptime - Operations Best Practices?
Hi Thoku, You were able to more concisely represent my intentions (and their reasoning) in this thread than I was able to do myself. Thanks! On Wed, Jun 22, 2011 at 5:14 PM, Thoku Hansen tho...@gmail.com wrote: I think that Les's question was reasonable. Why *not* ask the community for the 'gotchas'? Whether the info is already documented or not, it could be an opportunity to improve the documentation based on users' perception. The 'you just have to learn' responses are fair also, but that reminds me of the days when running Oracle was a black art, and accumulated wisdom made DBAs irreplaceable. Yes, this was my initial concern. I know that Cassandra is still young, and I expect this to be the norm for a while, but I was hoping to make that process a bit easier (for me and anyone else reading this thread in the future). Some recommendations *are* documented, but they are dispersed / stale / contradictory / counter-intuitive. Others have not been documented in the wiki nor in DataStax's doco, and are instead learned anecdotally or The Hard Way. For example, whether documented or not, some of the 'gotchas' that I encountered when I first started working with Cassandra were:
* Don't use OpenJDK. Prefer the Sun JDK. (The wiki says this: http://wiki.apache.org/cassandra/GettingStarted ; Jira says that: https://issues.apache.org/jira/browse/CASSANDRA-2441 .)
* It's not viable to run without JNA installed.
* Disable swap memory.
* You need to run nodetool repair on a regular basis.
I'm looking forward to Edward Capriolo's Cassandra book (https://www.packtpub.com/cassandra-apache-high-performance-cookbook/book), which Les will probably find helpful. Thanks for linking to this. I'm pre-ordering right away. And thanks for the pointers - they are exactly the kind of enumerated things I was looking to elicit. These are the kinds of things that are hard to track down in a single place.
I think it'd be nice for the community to contribute this stuff to a single page ('best practices', 'checklist', whatever you want to call it). It would certainly make things easier when getting started. Thanks again, Les
Re: 99.999% uptime - Operations Best Practices?
On Wed, Jun 22, 2011 at 8:31 PM, Les Hazlewood l...@katasoft.com wrote:
Since I got a plug on the book I will chip in again to the thread :) Some things that were mentioned already: Install JNA absolutely (without JNA the snapshot command has to fork to hard-link the sstables; I have seen clients back off from this). Also, the performance-focused Cassandra devs always try to squeeze out performance by utilizing more native features. OpenJDK vs Sun: I agree - almost always try to do what 'most others' do in production; this way you get surprised less. Other stuff: RAID. You might want to go RAID 1+0 if you are aiming for uptime. RAID 0 has better performance, but if you lose a node your capacity is diminished, and rebuilding and rejoining a node involves more manpower, more steps, and more chances for human error. Collect statistics on the normal system items: CPU, disk (size and utilization), memory. Then collect the JMX Cassandra counters and understand how they interact. For example, record ReadCount and WriteCount per column family, then try to determine how these affect disk utilization. You can use this for capacity planning. Then try using a key/row cache. Evaluate again. Check the hit-rate graph for your new cache. How did this affect your disk? You want to head off anything that can be a performance killer, like traffic patterns changing or data growing significantly. Do not be short on hardware. I do not want to say overbuy, but if uptime is important, have spare drives and servers and room to grow. Balance that ring :) I have not read the original thread concerning the problem you mentioned. One way to avoid OOM is large amounts of RAM :) On a more serious note, most OOMs are caused by setting caches or memtables too large.
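A minimal sketch of the counter collection described above: pull the per-column-family Read Count / Write Count lines out of `nodetool cfstats` output. The sample text below imitates 0.8-era cfstats formatting (an assumption - field labels vary by version); in practice you would pipe `nodetool -h localhost cfstats` through the same parser on a schedule and graph the deltas against disk utilization from iostat.

```shell
# Extract per-CF read/write counters from cfstats-style output.
parse_counts() { awk -F': ' '/Read Count|Write Count/ { print $2 }'; }

# Hypothetical sample mimicking `nodetool cfstats` output:
SAMPLE='		Column Family: Users
		Read Count: 1200
		Write Count: 3400'

printf '%s\n' "$SAMPLE" | parse_counts   # prints 1200, then 3400
```

Sampling twice with a sleep in between and diffing the numbers gives reads/writes per interval, which is the raw material for the capacity planning Edward describes.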
If the OOM was caused by a software bug, the Cassandra devs are on the ball and move fast. I still suggest not jumping onto a release right away. I know it's hard to live without counters or CQL, since new things are super cool, but if you want all those 9s you're going to have to stay disciplined. Unless a release has a fix for a problem you think you have, stay a minor version or revision back, or at least wait some time before upgrading, and do some internal confidence testing before pulling the trigger on an update. Almost all use cases demand that repair be run regularly due to the nature of distributed deletes. Other good tips: subscribe to all the mailing lists, and hang out in the IRC channels cassandra, cassandra-dev, and cassandra-ops. You get an osmosis learning effect, and you learn to fix or head off issues you never had.
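To make the repair requirement concrete: distributed deletes are tombstones, and tombstones are only safe to purge after gc_grace_seconds. Every node must therefore be repaired inside that window, or deleted data can resurrect when a node that missed the delete hands it back. The arithmetic, using Cassandra's default:

```shell
# gc_grace_seconds defaults to 864000 seconds (10 days); repair every
# node at least once inside that window.
GC_GRACE_SECONDS=864000
MAX_REPAIR_INTERVAL_DAYS=$(( GC_GRACE_SECONDS / 86400 ))
echo "run 'nodetool repair' on each node at least every $MAX_REPAIR_INTERVAL_DAYS days"
```

In practice you want a comfortable margin under that maximum (e.g. weekly), since a repair that fails or is skipped eats into the window.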
Re: 99.999% uptime - Operations Best Practices?
Edward, Thank you so much for this reply - this is great stuff, and I really appreciate it. You'll be happy to know that I've already pre-ordered your book. I'm looking forward to it! (When is the ship date?) Best regards, Les On Wed, Jun 22, 2011 at 7:03 PM, Edward Capriolo edlinuxg...@gmail.com wrote: