Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 5 Aug 2015, at 1:34 am, Joshua Harlow harlo...@outlook.com wrote:
> Philipp Marek wrote:
>>> If we end up using a DLM then we have to detect when the connection to the DLM is lost on a node and stop all ongoing operations to prevent data corruption. It may not be trivial to do, but we will have to do it in any solution we use; even in my last proposal, which only uses the DB in the Volume Manager, we would still need to stop all operations if we lose connection to the DB.
>> Well, is it already decided that Pacemaker would be chosen to provide HA in Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack", IIRC. I know that Pacemaker's been pushed aside in an earlier ML post, but IMO there's already *so much* been done for HA in Pacemaker that Openstack should just use it. All HA nodes need to participate in a Pacemaker cluster - and if one node loses connection, all services will get stopped automatically (by Pacemaker) - or the node gets fenced. No need to invent some sloppy scripts to do exactly the tasks (badly!) that the Linux HA Stack has been providing for quite a few years. Yes, Pacemaker needs learning - but not more than any other involved project, and there are already quite a few here, which have to be known to any operator or developer already. (BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes, I work for them ;)
> So just a piece of information, but yahoo (the company I work for, with vms in the tens of thousands, baremetal in the much more than that...) hasn't used pacemaker, and in all honesty this is the first project (openstack) that I have heard of that needs such a solution. I feel that we really should be building our services better so that they can be A-A vs having to depend on another piece of software to get around our 'sloppiness' (for lack of a better word).

HA is a deceptively hard problem. There is really no need for every project to attempt to solve it on their own. Having everyone consuming/calculating a different membership list is a very good way to go insane.

Aside from the usual bugs, the HA space lends itself to making simplifying assumptions early on, only to trap you with them down the road. It's even worse if you’re trying to bolt it on after the fact...

Perhaps try to think of pacemaker as a distributed finite state machine instead of a cluster manager. That is part of the value we bring to projects like galera and rabbitmq. Sure they are A-A, and once they’re up they can survive many failures, but bringing them up can be non-trivial. We also provide the additional context (e.g. quorum and fencing) that allows more kinds of failures to be safely recovered from.

Something to think about perhaps.

-- Andrew

> Nothing against pacemaker personally... IMHO it just doesn't feel like we are doing this right if we need such a product in the first place.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 4, 2015 at 7:47 PM, Morgan Fainberg morgan.fainb...@gmail.com wrote:
> On Tue, Aug 4, 2015 at 1:43 AM, Gorka Eguileor gegui...@redhat.com wrote:
>> On Tue, Aug 04, 2015 at 05:47:44AM +1000, Morgan Fainberg wrote:
>>> On Aug 4, 2015, at 01:42, Fox, Kevin M kevin@pnnl.gov wrote:
>>>> I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Let's clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then let's do that. Then the operator can choose. If Tooz can't and won't handle the problem space, then we're trying to fit a square peg in a round hole.
>>> +1 and specifically around tooz, it is narrow in comparison to the feature sets of some of the DLMs (since it has to mostly-implement to the lowest common denominator, as abstraction layers do). Defining the space we are trying to target will let us make the informed decision on what we use.
>> Again with this?
> Yes, I was reiterating that we should not talk about a specific choice but continue with the other discussion. Tooz, ZooKeeper, Consul, etc. are all irrelevant to the rest of the conversation we are having. The specific technology used can be discussed in an x-project spec, but I really would rather see a very opinionated choice. That can again be delayed until a later point.
>> We already know what we want to get out of Tooz, where we want it and for how long we'll be using it in each of those places.
> My response was also before the rest of the convo that occurred post Flavio's summary.
>> To answer those questions all that's needed is to read this thread and the links referred to in some conversations.
> I am fine with using a DLM. I see a significant benefit (without putting too fine a point on it, Keystone *will* benefit from a choice for a DLM to be available in OpenStack, and I like the idea). I was hoping to continue identifying (and we did) where we had DLM-like/DLM uses in OpenStack so we knew where to focus.

Hey all,

This thread is a mess. I'm going to put together facts with what projects are doing and why. I will present my findings at the session that I will be moderating in the cross project track of the summit [1], if accepted. Spec may follow.

[1] - https://etherpad.openstack.org/p/mitaka-cross-project-session-planning

--
Mike Perez
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 04/08/15 23:39 -0700, Mike Perez wrote:
> On Tue, Aug 4, 2015 at 7:47 PM, Morgan Fainberg morgan.fainb...@gmail.com wrote:
>> On Tue, Aug 4, 2015 at 1:43 AM, Gorka Eguileor gegui...@redhat.com wrote:
>>> On Tue, Aug 04, 2015 at 05:47:44AM +1000, Morgan Fainberg wrote:
>>>> On Aug 4, 2015, at 01:42, Fox, Kevin M kevin@pnnl.gov wrote:
>>>>> I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Let's clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then let's do that. Then the operator can choose. If Tooz can't and won't handle the problem space, then we're trying to fit a square peg in a round hole.
>>>> +1 and specifically around tooz, it is narrow in comparison to the feature sets of some of the DLMs (since it has to mostly-implement to the lowest common denominator, as abstraction layers do). Defining the space we are trying to target will let us make the informed decision on what we use.
>>> Again with this?
>> Yes, I was reiterating that we should not talk about a specific choice but continue with the other discussion. Tooz, ZooKeeper, Consul, etc. are all irrelevant to the rest of the conversation we are having. The specific technology used can be discussed in an x-project spec, but I really would rather see a very opinionated choice. That can again be delayed until a later point.
>>> We already know what we want to get out of Tooz, where we want it and for how long we'll be using it in each of those places.
>> My response was also before the rest of the convo that occurred post Flavio's summary.
>>> To answer those questions all that's needed is to read this thread and the links referred to in some conversations.
>> I am fine with using a DLM. I see a significant benefit (without putting too fine a point on it, Keystone *will* benefit from a choice for a DLM to be available in OpenStack, and I like the idea). I was hoping to continue identifying (and we did) where we had DLM-like/DLM uses in OpenStack so we knew where to focus.
> Hey all,
>
> This thread is a mess. I'm going to put together facts with what projects are doing and why. I will present my findings at the session that I will be moderating in the cross project track of the summit [1], if accepted. Spec may follow.
>
> [1] - https://etherpad.openstack.org/p/mitaka-cross-project-session-planning

FWIW, there are 2 threads now. This one that you just replied to is supposed to be related to Cinder and not to the cross-project discussion. It's a mess, I agree! :(

That said, you may want to sync with Joshua since he's going to work on a cross-project spec as well (as he mentioned in the other thread).[0]

Thanks for taking the time,
Flavio

[0] http://lists.openstack.org/pipermail/openstack-dev/2015-August/071400.html

> --
> Mike Perez

--
@flaper87
Flavio Percoco
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
>> Well, is it already decided that Pacemaker would be chosen to provide HA in Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack", IIRC. I know that Pacemaker's been pushed aside in an earlier ML post, but IMO there's already *so much* been done for HA in Pacemaker that Openstack should just use it. All HA nodes need to participate in a Pacemaker cluster - and if one node loses connection, all services will get stopped automatically (by Pacemaker) - or the node gets fenced. No need to invent some sloppy scripts to do exactly the tasks (badly!) that the Linux HA Stack has been providing for quite a few years.
> So just a piece of information, but yahoo (the company I work for, with vms in the tens of thousands, baremetal in the much more than that...) hasn't used pacemaker, and in all honesty this is the first project (openstack) that I have heard of that needs such a solution. I feel that we really should be building our services better so that they can be A-A vs having to depend on another piece of software to get around our 'sloppiness' (for lack of a better word).
>
> Nothing against pacemaker personally... IMHO it just doesn't feel like we are doing this right if we need such a product in the first place.

Well, Pacemaker is *the* Linux HA Stack.

So, before trying to achieve similar goals by self-written scripts (and having to re-discover all the gotchas involved), it would be much better to learn from previous experiences - even if they are not one's own.

Pacemaker has e.g. the concept of clones[1] - these define services that run multiple instances within a cluster. And behold! the instances get some Pacemaker-internal unique id[2], which can be used to do sharding.

Yes, that still means that upon service or node crash the failed instance has to be started on some other node; but as that'll typically be up and running already, the startup time should be in the range of seconds.

We'd instantly get
 * a supervisor to start/stop/restart/fence/monitor the service(s)
 * node/service failure detection
 * only small changes needed in the services
 * and all that in a tested software that's available in all distributions, and that already has its own testsuite...

If we decide that this solution won't fulfill all our expectations, fine - let's use something else. But I don't think it makes *any* sense to try to redo some (existing) High-Availability code in some quickly written scripts, just because it looks easy - there are quite a few traps for the unwary.

Ad 1: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-resource-clone.html
Ad 2: OCF_RESKEY_CRM_meta_clone; that's not guaranteed to be an unbroken sequence, though.
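To make the sharding idea concrete, here is a minimal sketch of how a service could map resources onto clone instances. It is only an illustration under assumptions: `shard_owner` and `my_clone_id` are hypothetical helper names, the list of active clone ids has to come from somewhere outside this snippet, and the service is assumed to inherit `OCF_RESKEY_CRM_meta_clone` from its resource agent. Because the ids are not guaranteed to be an unbroken sequence, the sketch indexes into the sorted list of ids rather than taking the hash modulo a clone count.

```python
import hashlib
import os
from typing import Mapping, Sequence


def shard_owner(resource_id: str, clone_ids: Sequence[int]) -> int:
    """Return the clone id that should handle resource_id.

    clone_ids may contain gaps (e.g. [0, 2, 5]), so the hash selects a
    position in the sorted list instead of being used as an id directly.
    """
    if not clone_ids:
        raise ValueError("at least one clone id is required")
    ordered = sorted(clone_ids)
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


def my_clone_id(environ: Mapping[str, str] = os.environ) -> int:
    """Read this instance's clone id from the environment.

    Assumption: the resource agent passes OCF_RESKEY_CRM_meta_clone on to
    the service; the fallback of 0 is only for running outside Pacemaker.
    """
    return int(environ.get("OCF_RESKEY_CRM_meta_clone", "0"))
```

An instance would then handle a volume only when `shard_owner(volume_id, active_ids) == my_clone_id()`. Note that when the set of active ids changes, most resources move to a different owner with this simple scheme.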
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 04, 2015 at 08:30:17AM -0700, Joshua Harlow wrote:
> Duncan Thomas wrote:
>> On 3 August 2015 at 20:53, Clint Byrum cl...@fewbar.com wrote:
>>> Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700:
>>>
>>> Also on a side note, I think Cinder's need for this is really subtle, and one could just accept that sometimes it's going to break when it does two things to one resource from two hosts. The error rate there might even be lower than the false-error rate that would be caused by a twitchy DLM with timeouts a little low. So there's a core cinder discussion that keeps losing to the shiny DLM discussion, and I'd like to see it played out fully: Could Cinder just not do anything, and let the few drivers that react _really_ badly, implement their own concurrency controls?
>> So the problem here is data corruption. Lots of our races can cause data corruption. Not 'my instance didn't come up', not 'my network is screwed and I need to tear everything down and do it again', but 'My 1tb of customer database is now missing the second half'. This means that we *really* need some confidence and understanding in whatever we do. The idea of locks timing out and being stolen without fencing is frankly scary and begging for data corruption unless we're very careful. I'd rather use a persistent lock (e.g. a db record change) and manual recovery than a lock timeout that might cause corruption.
> So perhaps start off using persistent locks, gain confidence that we have all the right fixes in to prevent that data corruption, and then slowly remove persistent locks as needed. Sounds like an iterative solution to me, and one that will build confidence (hopefully that confidence building can be automated via a chaos-monkey like test-suite) as we go :)

That was my suggestion as well. It is not that we cannot do without locks; it's that we have confidence in them and in the current code that uses them. So we can start with an initial solution with distributed locks, confirm that the rest of the code is running properly (as distributed locks are not the only change needed) and then, on a second iteration, proceed to remove locks in the Volume Manager, and lastly, on the next iteration, remove them in the drivers wherever it is possible; for those places where it isn't possible, maybe look for alternative solutions.

This way we can get a solution faster and avoid potential delays that may arise if we try to do everything at once.

But I can see the point of those who ask why we should put the ops through the DLM configuration process if we are probably going to remove the DLM in a couple of releases. But since we don't know how difficult it will get to remove all other locks, I think that a bird in the hand is worth two in the bush and we should still go with the distributed locks and at least make sure we have a solution.

Cheers,
Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 04, 2015 at 08:40:13AM -0700, Joshua Harlow wrote:
> Clint Byrum wrote:
>> Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700:
>>> On Mon, Aug 3, 2015 at 8:41 AM Joshua Harlow harlo...@outlook.com wrote:
>>>> Clint Byrum wrote:
>>>>> Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700:
>>>>>> On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote:
>>>>>>> On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote:
>>>>>>>> I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since the current plan is going to take a while (because it requires that we first finish fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect.
>>>>>>>>
>>>>>>>> And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1].
>>>>>>>>
>>>>>>>> Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution.
>>>>>>> Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like ZooKeeper as a result is worth it.
>>>>>>>
>>>>>>> **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves.
>>>>>>>
>>>>>>> Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it.
>>>>>>> I'd like to think the Cinder team is great at recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder.
>>>>>>>
>>>>>>> Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion.
>>>>>>>
>>>>>>> [1] - https://review.openstack.org/#/c/195366/
>>>>>>>
>>>>>>> --
>>>>>>> Mike Perez
>>>>>> Hi all,
>>>>>>
>>>>>> Since my original proposal was more complex than it needed to be, I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that.
>>>>>>
>>>>>> The solution description was more rushed than the previous one so I may have missed some things.
>>>>>>
>>>>>> http://gorka.eguileor.com/simpler-road-to-cinder-active-active/
>>>>> I like the idea of keeping it simpler Gorka. :)
>>>>>
>>>>> Note that this is punting back to use the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages.
>>>>>
>>>>> Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen.
>>>>>
>>>>> Anyway, I think what might make more sense than copying that directly, is implementing "Use the database and oslo.messaging to build a DLM" as a tooz backend. This way as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one to their specific backend needs.
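The Heat-style stale-lock handling described above can be sketched in a few lines. This is an in-memory illustration only, not Heat's implementation: `LockTable` is a hypothetical name, the `is_alive` callback stands in for sending a message to the owner's engine-specific queue and waiting for a reply, and a real version would need the read-check-write step to be an atomic update in the database.

```python
import uuid
from typing import Callable, Dict, Optional


class LockTable:
    """Locks that record their owner's engine UUID and can be stolen
    when that owner no longer answers."""

    def __init__(self, is_alive: Callable[[str], bool]) -> None:
        # is_alive(engine_id) is a stand-in for "message the engine's own
        # queue and wait for a response within some timeout".
        self._is_alive = is_alive
        self._owners: Dict[str, str] = {}

    def owner(self, resource: str) -> Optional[str]:
        return self._owners.get(resource)

    def acquire(self, resource: str, engine_id: str) -> bool:
        current = self._owners.get(resource)
        if current is None or current == engine_id:
            self._owners[resource] = engine_id
            return True
        if self._is_alive(current):
            return False  # the owner responded, so the lock stays put
        # No response from the recorded owner: steal the lock.
        self._owners[resource] = engine_id
        return True


def new_engine_id() -> str:
    """Each engine process gets its own UUID at startup."""
    return str(uuid.uuid4())
```

A missing response does not prove the old owner has stopped touching the resource, so without fencing this still carries the corruption risk raised elsewhere in the thread.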
>>>> Oh jeez, using 'the database and oslo.messaging to build a DLM' scares me :-/
>>>>
>>>> There are already N + 1 DLM like-systems out there (and more every day if u consider the list at https://raftconsensus.github.io/#implementations) so I'd really rather use one that is proven to work by academia vs make a frankenstein one.
>>> Joshua,
>>>
>>> As has been said on this thread, some projects (eg, Ironic) are already using a consistent hash ring backed by a database to meet the requirements they have. Could those requirements also be met with some other tech? Yes. Would that provide additional functionality or some other benefits? Maybe. But that's not what this thread was about.
>>>
>>> Distributed hash rings are a well understood technique, as are databases. There's no need to be insulting by calling not-your-favorite-technology-of-the-day a frankenstein one.
>>>
>>> The topic here, which I've been eagerly following, is whether or not Cinder needs to
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 2015-08-05 09:10:30 +0200 (+0200), Philipp Marek wrote:
> [...]
> Pacemaker is *the* Linux HA Stack.
> [...]

Can you expand on this assertion? It doesn't look to me like it's part of the Linux source tree and I see strong evidence to suggest it's released and distributed completely separately from the kernel.

Statements like this one make the rest of your messages look even more like a marketing campaign, so I'd love to understand what you really mean (I seriously doubt you're campaigning for this specific piece of software, after all, but that's the way it comes across).

--
Jeremy Stanley
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
>> [...]
>> Pacemaker is *the* Linux HA Stack.
>> [...]
> Can you expand on this assertion? It doesn't look to me like it's part of the Linux source tree and I see strong evidence to suggest it's released and distributed completely separately from the kernel.

If you read Linux as "GNU/Linux" or "Linux platform", instead of "Linux kernel", it's what I meant.

> Statements like this one make the rest of your messages look even more like a marketing campaign, so I'd love to understand what you really mean (I seriously doubt you're campaigning for this specific piece of software, after all, but that's the way it comes across).

Sorry for not being entirely clear. I thought that my message was good enough, as the OpenStack documentation itself already talks about Pacemaker:

http://docs.openstack.org/high-availability-guide/content/ch-pacemaker.html

"OpenStack infrastructure high availability relies on the Pacemaker cluster stack, the state-of-the-art high availability and load balancing stack for the Linux platform. Pacemaker is storage and application-agnostic, and is in no way specific to OpenStack."

Expanding on what we have, what GNU/Linux already has, and what is being used for Linux (platform) HA, I wanted to point out that most of the parts for _one_ possible solution already exist. Whether we want to go *that* route is yet to be decided, of course.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 17:03 Aug 05, Flavio Percoco wrote:
> <snip>
> That said, you may want to sync with Joshua since he's going to work on a cross-project spec as well (as he mentioned in the other thread).[0]

http://lists.openstack.org/pipermail/openstack-dev/2015-August/071441.html

--
Mike Perez
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 2015-08-05 14:36:37 +0200 (+0200), Philipp Marek wrote:
>>> [...]
>>> Pacemaker is *the* Linux HA Stack.
>>> [...]
>> Can you expand on this assertion? It doesn't look to me like it's part of the Linux source tree and I see strong evidence to suggest it's released and distributed completely separately from the kernel.
> If you read Linux as "GNU/Linux" or "Linux platform", instead of "Linux kernel", it's what I meant.
> [...]

Okay, that makes slightly more sense. So you're implying that Pacemaker is the only HA stack available for Linux-based platforms, or that it's the most popular, or... I guess I'm mostly thrown by your use of the definite article "the" (which you emphasized, so it seems like you must mean there are effectively no others?).

--
Jeremy Stanley
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
>>>> [...]
>>>> Pacemaker is *the* Linux HA Stack.
>>>> [...]
>>> Can you expand on this assertion? It doesn't look to me like it's part of the Linux source tree and I see strong evidence to suggest it's released and distributed completely separately from the kernel.
>> If you read Linux as "GNU/Linux" or "Linux platform", instead of "Linux kernel", it's what I meant.
>> [...]
> Okay, that makes slightly more sense. So you're implying that Pacemaker is the only HA stack available for Linux-based platforms, or that it's the most popular, or... I guess I'm mostly thrown by your use of the definite article "the" (which you emphasized, so it seems like you must mean there are effectively no others?).

Well, SUSE and Redhat (7) use Pacemaker by default, Debian/Ubuntu have it (along with others)... That gives it quite some market share, wouldn't you think?

Yes, I guess the "most popular" meaning is a good match here.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 2015-08-05 15:31:03 +0200 (+0200), Philipp Marek wrote:
>>>> [...]
>>>> Pacemaker is *the* Linux HA Stack.
>>>> [...]
>>> Can you expand on this assertion? It doesn't look to me like it's part of the Linux source tree and I see strong evidence to suggest it's released and distributed completely separately from the kernel.
>> If you read Linux as "GNU/Linux" or "Linux platform", instead of "Linux kernel", it's what I meant.
>> [...]
> Okay, that makes slightly more sense. So you're implying that Pacemaker is the only HA stack available for Linux-based platforms, or that it's the most popular, or... I guess I'm mostly thrown by your use of the definite article "the" (which you emphasized, so it seems like you must mean there are effectively no others?).

> Well, SUSE and Redhat (7) use Pacemaker by default, Debian/Ubuntu have it (along with others)... That gives it quite some market share, wouldn't you think?
>
> Yes, I guess the "most popular" meaning is a good match here.

I see, so in the same way that nano is *the* Linux text editor (Debian/Ubuntu configure it as the default, SUSE and Redhat have it packaged). Popularity alone doesn't seem like a great criterion for making these sorts of technology choices.

--
Jeremy Stanley
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
>> Well, SUSE and Redhat (7) use Pacemaker by default, Debian/Ubuntu have it (along with others)... That gives it quite some market share, wouldn't you think?
>>
>> Yes, I guess the "most popular" meaning is a good match here.
> I see, so in the same way that nano is *the* Linux text editor (Debian/Ubuntu configure it as the default, SUSE and Redhat have it packaged).

Along with quite a few alternatives. How many cluster stack alternatives can you see in SUSE? How many cluster stack alternatives are available in _every_ major distribution?

> Popularity alone doesn't seem like a great criterion for making these sorts of technology choices.

Popularity _alone_ is not the sole criterion, right. But to write something new just because of NIH is the wrong approach, IMO.

[[ I'm going to stop arguing now. ]]
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 2015-08-05 15:48:52 +0200 (+0200), Philipp Marek wrote:
> [...]
> How many cluster stack alternatives can you see in SUSE? How many cluster stack alternatives are available in _every_ major distribution?

I think it depends a lot on how you define "cluster stack" and whether the solution to the current dilemma needs one (for example, if the answer is simply a DLM then several examples have already surfaced elsewhere in this thread and the cross-project subthread).

> Popularity _alone_ is not the sole criterion, right. But to write something new just because of NIH is the wrong approach, IMO.

I couldn't agree more. If the need is already met by an available solution which can be reused, that seems better for everyone. I just get concerned when I see messages which state there is only one technology choice and that choice is <insert product my employer makes money selling/supporting>. Acknowledging alternatives makes for a much less fanatical discussion.

--
Jeremy Stanley
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Excerpts from Philipp Marek's message of 2015-08-05 00:10:30 -0700:
>>> Well, is it already decided that Pacemaker would be chosen to provide HA in Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack", IIRC. I know that Pacemaker's been pushed aside in an earlier ML post, but IMO there's already *so much* been done for HA in Pacemaker that Openstack should just use it. All HA nodes need to participate in a Pacemaker cluster - and if one node loses connection, all services will get stopped automatically (by Pacemaker) - or the node gets fenced. No need to invent some sloppy scripts to do exactly the tasks (badly!) that the Linux HA Stack has been providing for quite a few years.
>> So just a piece of information, but yahoo (the company I work for, with vms in the tens of thousands, baremetal in the much more than that...) hasn't used pacemaker, and in all honesty this is the first project (openstack) that I have heard of that needs such a solution. I feel that we really should be building our services better so that they can be A-A vs having to depend on another piece of software to get around our 'sloppiness' (for lack of a better word).
>>
>> Nothing against pacemaker personally... IMHO it just doesn't feel like we are doing this right if we need such a product in the first place.
> Well, Pacemaker is *the* Linux HA Stack.

I'm not sure it's wise to claim the definite article for anything in Open Source. :)

That said, it's certainly the most mature, and widely accepted.

> So, before trying to achieve similar goals by self-written scripts (and having to re-discover all the gotchas involved), it would be much better to learn from previous experiences - even if they are not one's own.
>
> Pacemaker has e.g. the concept of clones[1] - these define services that run multiple instances within a cluster. And behold! the instances get some Pacemaker-internal unique id[2], which can be used to do sharding.
>
> Yes, that still means that upon service or node crash the failed instance has to be started on some other node; but as that'll typically be up and running already, the startup time should be in the range of seconds.
>
> We'd instantly get
>  * a supervisor to start/stop/restart/fence/monitor the service(s)
>  * node/service failure detection
>  * only small changes needed in the services
>  * and all that in a tested software that's available in all distributions, and that already has its own testsuite...
>
> If we decide that this solution won't fulfill all our expectations, fine - let's use something else. But I don't think it makes *any* sense to try to redo some (existing) High-Availability code in some quickly written scripts, just because it looks easy - there are quite a few traps for the unwary.

I think Keystone's dev team agrees with you, and also doesn't want to get in the way of that with any half-baked solution. They give you all the CLI tools and filesystem layouts to make this work perfectly. It would be nice to even ship the pacemaker resources in a contrib directory and run tests in the gate on them.

But if users have some reason not to use it, they shouldn't be forced to use it.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 3 August 2015 at 20:53, Clint Byrum cl...@fewbar.com wrote: Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700: Also on a side note, I think Cinder's need for this is really subtle, and one could just accept that sometimes it's going to break when it does two things to one resource from two hosts. The error rate there might even be lower than the false-error rate that would be caused by a twitchy DLM with timeouts a little low. So there's a core cinder discussion that keeps losing to the shiny DLM discussion, and I'd like to see it played out fully: Could Cinder just not do anything, and let the few drivers that react _really_ badly, implement their own concurrency controls? So the problem here is data corruption. Lots of our races can cause data corruption. Not 'my instance didn't come up', not 'my network is screwed and I need to tear everything down and do it again', but 'My 1tb of customer database is now missing the second half'. This means that we *really* need some confidence and understanding in whatever we do. The idea of locks timing out and being stolen without fencing is frankly scary and begging for data corruption unless we're very careful. I'd rather use a persistent lock (e.g. a db record change) and manual recovery than a lock timeout that might cause corruption. -- Duncan Thomas __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
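
To illustrate what "a persistent lock (e.g. a db record change)" could look like, here is a small sketch using an in-memory SQLite database. It is not Cinder code: the `volumes` table, the `locked_by` column and the `acquire`/`release` helpers are all hypothetical. The point is that the lock is a conditional UPDATE, so it never times out and can only be cleared by its owner (or manually by an operator).

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical volumes table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, locked_by TEXT)")
    return conn


def acquire(conn, volume_id, owner):
    """Claim the volume; succeeds only if nobody holds it right now."""
    cur = conn.execute(
        "UPDATE volumes SET locked_by = ? WHERE id = ? AND locked_by IS NULL",
        (owner, volume_id),
    )
    conn.commit()
    # rowcount is 1 only when the conditional UPDATE matched the row.
    return cur.rowcount == 1


def release(conn, volume_id, owner):
    """Release the volume; succeeds only for the current owner."""
    cur = conn.execute(
        "UPDATE volumes SET locked_by = NULL WHERE id = ? AND locked_by = ?",
        (volume_id, owner),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because nothing expires, a crashed host leaves the record locked until someone clears it, which is the manual-recovery trade-off described in the message above.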
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 04/08/15 10:50 +0200, Philipp Marek wrote: If we end up using a DLM then we have to detect when the connection to the DLM is lost on a node and stop all ongoing operations to prevent data corruption. It may not be trivial to do, but we will have to do it in any solution we use, even on my last proposal that only uses the DB in Volume Manager we would still need to stop all operations if we lose connection to the DB. Well, is it already decided that Pacemaker would be chosen to provide HA in Openstack? There's been a talk Pacemaker: the PID 1 of Openstack IIRC. I know that Pacemaker's been pushed aside in an earlier ML post, but IMO there's already *so much* been done for HA in Pacemaker that Openstack should just use it. All HA nodes needs to participate in a Pacemaker cluster - and if one node looses connection, all services will get stopped automatically (by Pacemaker) - or the node gets fenced. No need to invent some sloppy scripts to do exactly the tasks (badly!) that the Linux HA Stack has been providing for quite a few years. Yes, Pacemaker needs learning - but not more than any other involved project, and there are already quite a few here, which have to be known to any operator or developer already. (BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes, I work for them ;) With all due respect, because I know you come with the best of intentions but, how is the above related to *anything* in this thread? Hasn't this thread digressed enough to throw pacemaker in the discussion just for the sake of it? I don't mean to come off harsh but seriously, let's stop trying to save the world in one thread and, instead, let's try to focus on what the real problem that is being solved is and how we can do it. 
The "to DLM or not" discussion was moved to a new thread (renamed from this one), and this one should be used to discuss *Cinder-specific issues* and hopefully come up with a solution - one that Gorka has already proposed - that would provide a better story for Cinder in the near future.

Flavio

--
@flaper87
Flavio Percoco
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 04, 2015 at 10:32:40AM +0300, Duncan Thomas wrote: On 3 August 2015 at 20:53, Clint Byrum cl...@fewbar.com wrote: Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700: Also on a side note, I think Cinder's need for this is really subtle, and one could just accept that sometimes it's going to break when it does two things to one resource from two hosts. The error rate there might even be lower than the false-error rate that would be caused by a twitchy DLM with timeouts a little low. So there's a core cinder discussion that keeps losing to the shiny DLM discussion, and I'd like to see it played out fully: Could Cinder just not do anything, and let the few drivers that react _really_ badly, implement their own concurrency controls? So the problem here is data corruption. Lots of our races can cause data corruption. Not 'my instance didn't come up', not 'my network is screwed and I need to tear everything down and do it again', but 'My 1tb of customer database is now missing the second half'. This means that we *really* need some confidence and understanding in whatever we do. The idea of locks timing out and being stolen without fencing is frankly scary and begging for data corruption unless we're very careful. I'd rather use a persistent lock (e.g. a db record change) and manual recovery than a lock timeout that might cause corruption. -- Duncan Thomas If we end up using a DLM then we have to detect when the connection to the DLM is lost on a node and stop all ongoing operations to prevent data corruption. It may not be trivial to do, but we will have to do it in any solution we use, even on my last proposal that only uses the DB in Volume Manager we would still need to stop all operations if we lose connection to the DB. Cheers, Gorka. 
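
One cooperative way to "stop all ongoing operations" when the connection to the DLM or DB is lost can be sketched as follows. This is an assumption-laden illustration, not a proposal from the thread: `ConnectionGuard`, `CoordinationLost` and `run_steps` are hypothetical names, and something else is assumed to call `heartbeat_ok()` each time the coordination backend answers.

```python
import time


class CoordinationLost(Exception):
    """Raised when the coordination backend has been silent for too long."""


class ConnectionGuard:
    def __init__(self, max_silence, clock=time.monotonic):
        self._max_silence = max_silence
        self._clock = clock
        self._last_ok = clock()

    def heartbeat_ok(self):
        """Record that the backend (DLM or DB) just answered."""
        self._last_ok = self._clock()

    def check(self):
        """Abort the caller if the backend has been silent for too long."""
        silence = self._clock() - self._last_ok
        if silence > self._max_silence:
            raise CoordinationLost("no heartbeat for %.1f seconds" % silence)


def run_steps(guard, steps):
    """Run an operation step by step, checking the guard before each step."""
    results = []
    for step in steps:
        guard.check()
        results.append(step())
    return results
```

The check only happens between steps, so a step that is already talking to the storage backend is not interrupted; that gap is the non-trivial part the message refers to.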
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 04, 2015 at 05:47:44AM +1000, Morgan Fainberg wrote:

On Aug 4, 2015, at 01:42, Fox, Kevin M kevin@pnnl.gov wrote:

I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Let's clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then let's do that. Then the operator can choose. If Tooz can't and won't handle the problem space, then we're trying to fit a square peg in a round hole.

+1 and specifically around tooz, it is narrow in comparison to the feature sets of some of the DLMs (since it has to mostly-implement to the lowest common denominator, as abstraction layers do). Defining the space we are trying to target will let us make the informed decision on what we use.

Again with this? We already know what we want to get out of Tooz, where we want it and for how long we'll be using it in each of those places. To answer those questions all that's needed is to read this thread and the links referenced in some of the conversations.

Gorka.

Thanks, Kevin

From: Gorka Eguileor [gegui...@redhat.com]
Sent: Monday, August 03, 2015 1:43 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active

On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote:

Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session.

+1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases.
Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first.

-- Thierry Carrez (ttx)

I don't see those as different solutions from the point of view of Cinder, they are different implementations of the same solution case, using a DLM to lock resources.

We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up in discussions of which one of those is best.

If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced by the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz.

Cheers, Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 04, 2015 at 09:28:58AM +, Fox, Kevin M wrote: Its been explained for Cinder, but not for other OpenStack projects that also have needs in this area. For that, Flavio started a new thread Does OpenStack need a common solution for DLM? We are discussing Cinder specifics on this thread. Cheers, Gorka. Thanks, Kevin From: Gorka Eguileor Sent: Tuesday, August 04, 2015 1:39:07 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 06:12:23PM +, Fox, Kevin M wrote: For example, to parallel the conversation with databases: We want a database. Well, that means mongodb, postgres, mysql, berkeleydb, etc Oh, well, I need it to be a relational db, Well, that means postgresq, mysql, etc Oh, and I need recursive queries... that excludes even more. We are pretty sure We want a distributed lock manager. What problems are we trying to solve using it, and what features do they require in the DLM/DLM Abstraction of choice? That will exclude some of them. It also may exclude abstraction layers that don't expose the features needed. (Recursive queries for example) Thanks, Kevin Kevin all that has already been explained: http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ As well as on IRC and this thread, I don't see the point of repeating it over and over again, at some point people need to start doing their homework and read what's already been said to get up to speed on the topic so we can move on. Gorka. 
From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 10:48 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 03:42:48PM +, Fox, Kevin M wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Lets clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then lets do that. Then the operator can choose. If Tooz cant and wont handle the problem space, then we're trying to fit a square peg in a round hole. What do you mean with clearly define the problem space? We know what we want, we just need to agree on the compromises we are willing to make, use a DLM and make admins' life a little harder (only for those that deploy A-A) but have an A-A solution earlier, or postpone A-A functionality but make their life easier. And we already know that Tooz is not the Holy Grail and will not perform the miracle of giving Cinder HA A-A. It is only a piece of the problem, so there's nothing to discuss there, and it's not a square peg on a round hole, because it fits perfectly for what it is intended. But once you have filled that square hole you need another peg, the round one for the round hole. If people are expecting to find one thing that fixes everything and gives us HA A-A on its own, then I believe they are a little bit lost. Gorka. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. 
But, as others have mentioned, I'd like us to take a step back, run this accross teams and come up with an opinonated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mikata. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder, they are different implementations to the same solution case, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 03, 2015 at 06:12:23PM +, Fox, Kevin M wrote: For example, to parallel the conversation with databases: We want a database. Well, that means mongodb, postgres, mysql, berkeleydb, etc Oh, well, I need it to be a relational db, Well, that means postgresq, mysql, etc Oh, and I need recursive queries... that excludes even more. We are pretty sure We want a distributed lock manager. What problems are we trying to solve using it, and what features do they require in the DLM/DLM Abstraction of choice? That will exclude some of them. It also may exclude abstraction layers that don't expose the features needed. (Recursive queries for example) Thanks, Kevin Kevin all that has already been explained: http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ As well as on IRC and this thread, I don't see the point of repeating it over and over again, at some point people need to start doing their homework and read what's already been said to get up to speed on the topic so we can move on. Gorka. From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 10:48 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 03:42:48PM +, Fox, Kevin M wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Lets clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then lets do that. Then the operator can choose. If Tooz cant and wont handle the problem space, then we're trying to fit a square peg in a round hole. What do you mean with clearly define the problem space? 
We know what we want, we just need to agree on the compromises we are willing to make, use a DLM and make admins' life a little harder (only for those that deploy A-A) but have an A-A solution earlier, or postpone A-A functionality but make their life easier. And we already know that Tooz is not the Holy Grail and will not perform the miracle of giving Cinder HA A-A. It is only a piece of the problem, so there's nothing to discuss there, and it's not a square peg on a round hole, because it fits perfectly for what it is intended. But once you have filled that square hole you need another peg, the round one for the round hole. If people are expecting to find one thing that fixes everything and gives us HA A-A on its own, then I believe they are a little bit lost. Gorka. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this accross teams and come up with an opinonated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mikata. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. 
-- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder, they are different implementations to the same solution case, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up on discussions of which one of those is best. If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced with the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka. __ OpenStack Development Mailing List (not for usage
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
If we end up using a DLM then we have to detect when the connection to the DLM is lost on a node and stop all ongoing operations to prevent data corruption. It may not be trivial to do, but we will have to do it in any solution we use, even on my last proposal that only uses the DB in Volume Manager we would still need to stop all operations if we lose connection to the DB.

Well, is it already decided that Pacemaker would be chosen to provide HA in Openstack? There's been a talk "Pacemaker: the PID 1 of Openstack" IIRC.

I know that Pacemaker's been pushed aside in an earlier ML post, but IMO *so much* has already been done for HA in Pacemaker that Openstack should just use it.

All HA nodes need to participate in a Pacemaker cluster - and if one node loses connection, all services will get stopped automatically (by Pacemaker) - or the node gets fenced.

No need to invent some sloppy scripts to do exactly the tasks (badly!) that the Linux HA Stack has been providing for quite a few years.

Yes, Pacemaker needs learning - but not more than any other involved project, and there are already quite a few here, which have to be known to any operator or developer already.

(BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes, I work for them ;)
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Its been explained for Cinder, but not for other OpenStack projects that also have needs in this area. Thanks, Kevin From: Gorka Eguileor Sent: Tuesday, August 04, 2015 1:39:07 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 06:12:23PM +, Fox, Kevin M wrote: For example, to parallel the conversation with databases: We want a database. Well, that means mongodb, postgres, mysql, berkeleydb, etc Oh, well, I need it to be a relational db, Well, that means postgresq, mysql, etc Oh, and I need recursive queries... that excludes even more. We are pretty sure We want a distributed lock manager. What problems are we trying to solve using it, and what features do they require in the DLM/DLM Abstraction of choice? That will exclude some of them. It also may exclude abstraction layers that don't expose the features needed. (Recursive queries for example) Thanks, Kevin Kevin all that has already been explained: http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ As well as on IRC and this thread, I don't see the point of repeating it over and over again, at some point people need to start doing their homework and read what's already been said to get up to speed on the topic so we can move on. Gorka. From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 10:48 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 03:42:48PM +, Fox, Kevin M wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Lets clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then lets do that. Then the operator can choose. 
If Tooz cant and wont handle the problem space, then we're trying to fit a square peg in a round hole. What do you mean with clearly define the problem space? We know what we want, we just need to agree on the compromises we are willing to make, use a DLM and make admins' life a little harder (only for those that deploy A-A) but have an A-A solution earlier, or postpone A-A functionality but make their life easier. And we already know that Tooz is not the Holy Grail and will not perform the miracle of giving Cinder HA A-A. It is only a piece of the problem, so there's nothing to discuss there, and it's not a square peg on a round hole, because it fits perfectly for what it is intended. But once you have filled that square hole you need another peg, the round one for the round hole. If people are expecting to find one thing that fixes everything and gives us HA A-A on its own, then I believe they are a little bit lost. Gorka. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this accross teams and come up with an opinonated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mikata. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... 
Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder, they are different implementations to the same solution case, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up on discussions of which one of those is best. If we end up deciding to use a DLM, which is unlikely, then we can
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Duncan Thomas wrote: On 3 August 2015 at 20:53, Clint Byrum cl...@fewbar.com mailto:cl...@fewbar.com wrote: Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700: Also on a side note, I think Cinder's need for this is really subtle, and one could just accept that sometimes it's going to break when it does two things to one resource from two hosts. The error rate there might even be lower than the false-error rate that would be caused by a twitchy DLM with timeouts a little low. So there's a core cinder discussion that keeps losing to the shiny DLM discussion, and I'd like to see it played out fully: Could Cinder just not do anything, and let the few drivers that react _really_ badly, implement their own concurrency controls? So the problem here is data corruption. Lots of our races can cause data corruption. Not 'my instance didn't come up', not 'my network is screwed and I need to tear everything down and do it again', but 'My 1tb of customer database is now missing the second half'. This means that we *really* need some confidence and understanding in whatever we do. The idea of locks timing out and being stolen without fencing is frankly scary and begging for data corruption unless we're very careful. I'd rather use a persistent lock (e.g. a db record change) and manual recovery than a lock timeout that might cause corruption. So perhaps start off using persistent locks, gain confidence that we have all the right fixes in to prevent that data corruption, and then slowly remove persistent locks as needed. 
Sounds like an iterative solution to me, and one that will build confidence (hopefully that confidence building can be automated via a chaos-monkey like test-suite) as we go :) -- Duncan Thomas
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Philipp Marek wrote: If we end up using a DLM then we have to detect when the connection to the DLM is lost on a node and stop all ongoing operations to prevent data corruption. It may not be trivial to do, but we will have to do it in any solution we use, even on my last proposal that only uses the DB in Volume Manager we would still need to stop all operations if we lose connection to the DB. Well, is it already decided that Pacemaker would be chosen to provide HA in Openstack? There's been a talk Pacemaker: the PID 1 of Openstack IIRC. I know that Pacemaker's been pushed aside in an earlier ML post, but IMO there's already *so much* been done for HA in Pacemaker that Openstack should just use it. All HA nodes needs to participate in a Pacemaker cluster - and if one node looses connection, all services will get stopped automatically (by Pacemaker) - or the node gets fenced. No need to invent some sloppy scripts to do exactly the tasks (badly!) that the Linux HA Stack has been providing for quite a few years. Yes, Pacemaker needs learning - but not more than any other involved project, and there are already quite a few here, which have to be known to any operator or developer already. (BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes, I work for them ;) So just a piece of information, but yahoo (the company I work for, with vms in the tens of thousands, baremetal in the much more than that...) hasn't used pacemaker, and in all honesty this is the first project (openstack) that I have heard that needs such a solution. I feel that we really should be building our services better so that they can be A-A vs having to depend on another piece of software to get around our 'sloppiness' (for lack of a better word). Nothing against pacemaker personally... IMHO it just doesn't feel like we are doing this right if we need such a product in the first place. 
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Clint Byrum wrote: Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700: On Mon, Aug 3, 2015 at 8:41 AM Joshua Harlowharlo...@outlook.com wrote: Clint Byrum wrote: Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileorgegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. 
Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people consider Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex that it needed be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ I like the idea of keeping it simpler Gorka. :) Note that this is punting back to use the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages. Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen. Anyway, I think what might make more sense than copying that directly, is implementing Use the database and oslo.messaging to build a DLM as a tooz backend. This way as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one to their specific backend needs. 
Oh jeez, using 'the database and oslo.messaging to build a DLM' scares me :-/ There are already N + 1 DLM like-systems out there (and more every day if u consider the list at https://raftconsensus.github.io/#implementations) so I'd really rather use one that is proven to work by academia vs make a frankenstein one. Joshua, As has been said on this thread, some projects (eg, Ironic) are already using a consistent hash ring backed by a database to meet the requirements they have. Could those requirements also be met with some other tech? Yes. Would that provide additional functionality or some other benefits? Maybe. But that's not what this thread was about. Distributed hash rings are a well understood technique, as are databases. There's no need to be insulting by calling not-your-favorite-technology-of-the-day a frankenstein one. The topic here, which I've been eagerly following, is whether or not Cinder needs to use a DLM *at all*. Until that is addressed, discussing specific DLM or distributed KVS is not necessary. The hash ring has its own set of problems
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Ah. Ok. Thanks. Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Tuesday, August 04, 2015 6:33 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Tue, Aug 04, 2015 at 09:28:58AM +, Fox, Kevin M wrote: It's been explained for Cinder, but not for other OpenStack projects that also have needs in this area. For that, Flavio started a new thread, "Does OpenStack need a common solution for DLM?" We are discussing Cinder specifics on this thread. Cheers, Gorka. Thanks, Kevin From: Gorka Eguileor Sent: Tuesday, August 04, 2015 1:39:07 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 06:12:23PM +, Fox, Kevin M wrote: For example, to parallel the conversation with databases: "We want a database." Well, that means mongodb, postgres, mysql, berkeleydb, etc. "Oh, well, I need it to be a relational db." Well, that means postgresql, mysql, etc. "Oh, and I need recursive queries..." That excludes even more. We are pretty sure we want a distributed lock manager. What problems are we trying to solve using it, and what features do they require in the DLM/DLM abstraction of choice? That will exclude some of them. It also may exclude abstraction layers that don't expose the features needed (recursive queries, for example). Thanks, Kevin Kevin, all that has already been explained: http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ as well as on IRC and this thread. I don't see the point of repeating it over and over again; at some point people need to start doing their homework and read what's already been said to get up to speed on the topic so we can move on. Gorka. 
From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 10:48 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 03:42:48PM +, Fox, Kevin M wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Lets clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then lets do that. Then the operator can choose. If Tooz cant and wont handle the problem space, then we're trying to fit a square peg in a round hole. What do you mean with clearly define the problem space? We know what we want, we just need to agree on the compromises we are willing to make, use a DLM and make admins' life a little harder (only for those that deploy A-A) but have an A-A solution earlier, or postpone A-A functionality but make their life easier. And we already know that Tooz is not the Holy Grail and will not perform the miracle of giving Cinder HA A-A. It is only a piece of the problem, so there's nothing to discuss there, and it's not a square peg on a round hole, because it fits perfectly for what it is intended. But once you have filled that square hole you need another peg, the round one for the round hole. If people are expecting to find one thing that fixes everything and gives us HA A-A on its own, then I believe they are a little bit lost. Gorka. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. 
But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Excerpts from Duncan Thomas's message of 2015-08-04 00:32:40 -0700: On 3 August 2015 at 20:53, Clint Byrum cl...@fewbar.com wrote: Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700: Also on a side note, I think Cinder's need for this is really subtle, and one could just accept that sometimes it's going to break when it does two things to one resource from two hosts. The error rate there might even be lower than the false-error rate that would be caused by a twitchy DLM with timeouts a little low. So there's a core cinder discussion that keeps losing to the shiny DLM discussion, and I'd like to see it played out fully: Could Cinder just not do anything, and let the few drivers that react _really_ badly, implement their own concurrency controls? So the problem here is data corruption. Lots of our races can cause data corruption. Not 'my instance didn't come up', not 'my network is screwed and I need to tear everything down and do it again', but 'My 1tb of customer database is now missing the second half'. This means that we *really* need some confidence and understanding in whatever we do. The idea of locks timing out and being stolen without fencing is frankly scary and begging for data corruption unless we're very careful. I'd rather use a persistent lock (e.g. a db record change) and manual recovery than a lock timeout that might cause corruption. Thanks Duncan. Can you be more specific about a known data-corrupting race that a) isn't handled simply by serialization in the database, and b) isn't specific to a single driver? __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
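As an illustration of the "persistent lock (e.g. a db record change)" option Duncan mentions, here is a minimal sketch, not Cinder's actual code. It assumes a hypothetical `volumes` table with `status` and `owner` columns and uses SQLite only to stay self-contained. The point is that the claim is a single conditional UPDATE, so two hosts cannot both win, and nothing times out: a crashed owner leaves the row claimed until someone recovers it by hand.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one hypothetical volume row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE volumes ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, owner TEXT)"
    )
    conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available', NULL)")
    conn.commit()
    return conn


def try_claim(conn, volume_id, worker, new_status):
    """Atomically move an 'available' volume to new_status.

    Returns True only for the caller whose UPDATE matched the row.
    """
    cur = conn.execute(
        "UPDATE volumes SET status = ?, owner = ? "
        "WHERE id = ? AND status = 'available'",
        (new_status, worker, volume_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn, volume_id, worker):
    """Give the volume back; only the recorded owner may do so."""
    cur = conn.execute(
        "UPDATE volumes SET status = 'available', owner = NULL "
        "WHERE id = ? AND owner = ?",
        (volume_id, worker),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this shape, a second host calling `try_claim` on the same volume simply gets `False` back and can fail fast or retry. The trade-off is the one discussed in the thread: there is no automatic recovery if the owner dies mid-operation.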
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Tue, Aug 4, 2015 at 1:43 AM, Gorka Eguileor gegui...@redhat.com wrote: On Tue, Aug 04, 2015 at 05:47:44AM +1000, Morgan Fainberg wrote: On Aug 4, 2015, at 01:42, Fox, Kevin M kevin@pnnl.gov wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Let's clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then let's do that. Then the operator can choose. If Tooz can't and won't handle the problem space, then we're trying to fit a square peg in a round hole. +1 and specifically around tooz, it is narrow in comparison to the feature sets of some of the DLMs (since it has to mostly-implement to the lowest common denominator, as abstraction layers do). Defining the space we are trying to target will let us make the informed decision on what we use. Again with this? Yes, I was reiterating that we should not talk about a specific choice but continue with the other discussion. Tooz, ZooKeeper, Consul, etc., are all irrelevant to the rest of the conversation we are having. The specific technology used can be discussed in an x-project spec, but I really would rather see a very opinionated choice. That can again be delayed until a later point. We already know what we want to get out of Tooz, where we want it and for how long we'll be using it in each of those places. My response was also before the rest of the convo that occurred post Flavio's summary. To answer those questions all that's needed is to read this thread and the links referred to in some conversations. I am fine with using a DLM. I see a significant benefit (without putting too fine a point on it, Keystone *will* benefit from a choice for a DLM to be available in OpenStack, and I like the idea). I was hoping to continue (and we did) identifying where we had DLM-like/DLM uses in OpenStack so we knew where to focus. Gorka. 
Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder; they are different implementations of the same solution, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that; we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up in discussions of which one of those is best. If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced by the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka. 
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Morgan Fainberg wrote: Let's step back from tooz. Tooz, for the sake of this conversation, is much the same as saying zookeeper or consul or etcd, etc. We should be focused (as both Flavio and Thierry said) on whether we need a DLM and what it will solve. So IMHO part of the problem with just thinking about a DLM is that most of the above (zk, consul, etcd) are much more than simple DLMs; that likely leads to over-thinking solutions... I'm all for a cross-project spec and chat and all that, as long as it goes somewhere; maybe the TC can try to work together with the community and make that happen (if they can free themselves from picking tags). Once we have all of that defined, the use of an abstraction such as tooz (or just the direct bindings for some specific choice) can be made. I want to voice that we should be very picky about the solution (if we decide on a DLM) so that we are implementing to the strengths of the solution rather than trying to make everything work seamlessly. --Morgan Sent via mobile On Aug 3, 2015, at 18:49, Julien Danjou jul...@danjou.info wrote: On Mon, Aug 03 2015, Thierry Carrez wrote: The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. Or just start using Tooz – like some OpenStack projects have already been doing for months – and let the operators pick the backend that they are the most comfortable with? 
:) -- Julien Danjou -- Free Software hacker -- http://julien.danjou.info
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
+1. From: Flavio Percoco [fla...@redhat.com] Sent: Monday, August 03, 2015 12:30 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On 03/08/15 00:49 +0200, Gorka Eguileor wrote: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. 
I'd like to think the Cinder team is great at recognizing potential cross-project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex than it needed to be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than the previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ First and foremost, thanks for collecting the feedback and working on a different proposal that integrates what's been discussed so far - or at least proposes a way forward and gives enough time to make the right call. Now, let's please stop for two seconds and say no to adding a DLM for now. This thread has already branched out to several discussions on whether we should use a DLM or not and whether it should be one specifically or tooz. I'll take the chance and reply here directly to collect what's been said so far. I'm always down for avoiding new services in the stack because that makes it harder to deploy, maintain and reason about. However, in the case of DLMs, they are an essential part of distributed systems. We've been able to avoid them long enough but we're getting to the point where we might not be able to do that anymore. 
Therefore, I believe we should start discussing, carefully, what/how/when to do it. This is definitely not a decision that should be rushed. Let's start by mentioning some of the services that use or could use a DLM - not an exhaustive list: - Nova - Cinder - Ceilometer - Keystone - Zaqar Each one of these has a specific use-case for a DLM, some of them even share it (cinder, nova). Therefore, I believe this deserves a cross-project spec where we'd be able to mention *different* use cases that would lead us to pick the right technology (or the one that seems saner ;). As of now, whether it's Zookeeper, etcd, consul, put_here_the_new_cool_thing I don't really care. What I care about is that we pick a single technology that works well for all services. I'm starting to grow worried about the excessive lack of opinion we have in cases - like this one - where we should simply be opinionated. A strong opinion here would help us be consistent, make it simpler to understand issues and share knowledge, it'll make OPs
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Clint Byrum wrote: Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileorgegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. 
That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people consider Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex that it needed be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ I like the idea of keeping it simpler Gorka. :) Note that this is punting back to use the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages. Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen. Anyway, I think what might make more sense than copying that directly, is implementing Use the database and oslo.messaging to build a DLM as a tooz backend. This way as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one to their specific backend needs. 
Oh jeez, using 'the database and oslo.messaging to build a DLM' scares me :-/ There are already N + 1 DLM like-systems out there (and more every day if u consider the list at https://raftconsensus.github.io/#implementations) so I'd really rather use one that is proven to work by academia vs make a frankenstein one.
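For readers following the Heat approach Clint describes in the quoted text, the steal rule can be sketched roughly as below. This is an in-memory illustration only, not Heat's code: `LockTable`, `is_alive` and the timeout value are hypothetical names, and `is_alive` stands in for "send a message to the owner's per-engine queue and wait for a reply". A real implementation would also need the final overwrite to be an atomic compare-and-swap in the database, and it does not address the fencing concern raised elsewhere in the thread.

```python
import time


class LockTable:
    """Toy lock table: a lock is stolen only from an owner that is both
    past the timeout and no longer answering on its own queue."""

    def __init__(self, is_alive, timeout=60.0, clock=time.monotonic):
        self._locks = {}          # resource_id -> (owner_uuid, acquired_at)
        self._is_alive = is_alive  # hypothetical liveness probe (queue ping)
        self._timeout = timeout
        self._clock = clock

    def acquire(self, resource_id, engine_id):
        now = self._clock()
        held = self._locks.get(resource_id)
        if held is None:
            # Nobody holds the lock: record our engine UUID as owner.
            self._locks[resource_id] = (engine_id, now)
            return True
        owner, acquired_at = held
        if owner == engine_id:
            return True
        if now - acquired_at < self._timeout:
            # Lock is still fresh; do not even try to steal it.
            return False
        if self._is_alive(owner):
            # Owner answered on its queue, so it is slow, not dead.
            return False
        # Owner is silent past the timeout: steal the lock.
        self._locks[resource_id] = (engine_id, now)
        return True
```

The design choice worth noting is that the timeout alone never causes a steal; it only triggers the liveness check, which is what reduces false steals from a merely slow owner.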
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Let's clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then let's do that. Then the operator can choose. If Tooz can't and won't handle the problem space, then we're trying to fit a square peg in a round hole. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder; they are different implementations of the same solution, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. 
I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up on discussions of which one of those is best. If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced with the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 3, 2015 at 8:41 AM Joshua Harlow harlo...@outlook.com wrote: Clint Byrum wrote: Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileorgegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. 
He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people consider Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex that it needed be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ I like the idea of keeping it simpler Gorka. :) Note that this is punting back to use the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages. Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen. Anyway, I think what might make more sense than copying that directly, is implementing Use the database and oslo.messaging to build a DLM as a tooz backend. This way as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one to their specific backend needs. 
Oh jeez, using 'the database and oslo.messaging to build a DLM' scares me :-/ There are already N + 1 DLM-like systems out there (and more every day if you consider the list at https://raftconsensus.github.io/#implementations), so I'd really rather use one that is proven to work by academia than make a Frankenstein one.
Joshua,
As has been said on this thread, some projects (e.g., Ironic) are already using a consistent hash ring backed by a database to meet the requirements they have. Could those requirements also be met with some other tech? Yes. Would that provide additional functionality or some other benefits? Maybe. But that's not what this thread was about.
Distributed hash rings are a well-understood technique, as are databases. There's no need to be insulting by calling not-your-favorite-technology-of-the-day a Frankenstein one.
The topic here, which I've been eagerly following, is whether or not Cinder needs to use a DLM *at all*. Until that is addressed, discussing a specific DLM or distributed KVS is not necessary.
Thanks,
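For readers who have not met the technique mentioned above, here is a minimal sketch of a consistent hash ring in Python. It is an illustration only, not Ironic's implementation: the class name `HashRing`, the `replicas` parameter and the host names are assumptions made for this example, and the database-backed membership list that the message refers to is replaced by a plain list of hosts.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping resource ids to hosts (hypothetical API)."""

    def __init__(self, hosts, replicas=64):
        if not hosts:
            raise ValueError("at least one host is required")
        # Each host is placed on the ring several times ("virtual nodes")
        # so that resources spread more evenly across hosts.
        self._ring = sorted(
            (self._hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def get_host(self, resource_id):
        # The owner is the first ring point clockwise from the resource's hash.
        idx = bisect.bisect(self._keys, self._hash(resource_id)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["host-a", "host-b", "host-c"])
owner = ring.get_host("volume-1234")  # always the same host for the same id
```

The property that matters for this discussion is that when a host leaves the ring, only the resources it owned are reassigned; every other resource keeps its owner, so each manager can decide locally which resources it is responsible for.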
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 03/08/15 08:32 -0700, Joshua Harlow wrote: Morgan Fainberg wrote: Let's step back away from tooz. Tooz, for the sake of this conversation, is much the same as saying zookeeper or consul or etcd, etc. We should be focused (as both Flavio and Thierry said) on whether we need a DLM and what it will solve.
So IMHO part of the problem with just thinking about a DLM is that most of the above (zk, consul, etcd) are much more than a simple DLM; that likely leads to over-thinking solutions... I'm all for a cross-project spec and chat and all that, as long as it goes somewhere; maybe the TC can try to work together with the community and make that happen (if they can free themselves from picking tags).
I believe that is exactly what we're trying to do in this thread. The TC won't just go and chat for an hour about whether other projects need a DLM or not. This is a community decision, which is why I asked for a cross-project spec where this discussion can be held. Once the problem has been laid out and we all have a clear understanding of what we need, then we can move forward.
Just to be clear, the discussion I'm looking forward to seeing is on whether a DLM is something we need, what problems we're trying to solve and how it would benefit projects at large rather than a single project.
I think Gorka did an amazing job defining what the problem for *Cinder* is and how a DLM would help there. Now we need to expand that to other projects that we already know would benefit from having one around. This should not, by any means, block other work in Cinder w.r.t. HA A/A.
Cheers, Flavio
Once we have all of that defined, the choice of an abstraction such as tooz (or just the direct bindings for some specific choice) can be made. I want to voice that we should be very picky about the solution (if we decide on a DLM) so that we are implementing to the strengths of the solution rather than trying to make everything work seamlessly.
--Morgan Sent via mobile
On Aug 3, 2015, at 18:49, Julien Danjou jul...@danjou.info wrote: On Mon, Aug 03 2015, Thierry Carrez wrote: The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first.
Or just start using Tooz – like some OpenStack projects have already been doing for months – and let the operators pick the backend that they are the most comfortable with? :) -- Julien Danjou -- Free Software hacker -- http://julien.danjou.info
-- @flaper87 Flavio Percoco
__ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 03, 2015 at 12:28:27AM -0700, Clint Byrum wrote: Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. 
He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people consider Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex that it needed be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ I like the idea of keeping it simpler Gorka. :) Note that this is punting back to use the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages. Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen. I don't think that's a good idea for Cinder, if the node that is holding the lock is doing a long running CPU bound operation (like backups) it may not be fast enough to reply to that message, and that would end up with multiple nodes accessing the same data. Using the service heartbeat and the startup of the volume nodes we can do automatic cleanups on failed nodes. And lets be realistic, failed nodes will not be the norm, so we should prioritize normal operations over failure cleanup. 
And having inter-node operations like that will not only increase our message broker workload, it will also place stricter constraints on our Volume Node responsiveness, which could put us in a pinch in some operations and would require a careful and thorough empirical study to confirm that we don't have false positives on lock steals.
Anyway, I think what might make more sense than copying that directly is implementing "Use the database and oslo.messaging to build a DLM" as a tooz backend. This way, as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one for their specific backend needs.
I have no problem implementing a locking variant in Tooz using the DB (not DB locks). As far as I've seen, the Tooz community moves really fast with reviews and we could probably
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 03/08/15 00:49 +0200, Gorka Eguileor wrote: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. 
That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez
Hi all, Since my original proposal was more complex than it needed to be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than the previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/
First and foremost, thanks for collecting the feedback and working on a different proposal that integrates what's been discussed so far, or at least proposes a way forward and gives enough time to make the right call.
Now, let's please stop for two seconds and say no to adding a DLM for now. This thread has already branched out into several discussions on whether we should use a DLM or not and whether it should be one specifically or tooz. I'll take the chance and reply here directly to collect what's been said so far.
I'm always down for avoiding adding new services to the stack because that makes it harder to deploy, maintain and reason about. However, DLMs are an essential part of distributed systems. We've been able to avoid them long enough but we're getting to the point where we might not be able to do that anymore. Therefore, I believe we should start discussing, carefully, what/how/when to do it. This is definitely not a decision that should be rushed.
Let's start by mentioning some of the services that use or could use a DLM (not an exhaustive list):
- Nova
- Cinder
- Ceilometer
- Keystone
- Zaqar
Each one of these has a specific use case for a DLM, and some of them even share it (cinder, nova). Therefore, I believe this deserves a cross-project spec where we'd be able to mention *different* use cases that would lead us to pick the right technology (or the one that seems saner ;).
As of now, whether it's Zookeeper, etcd, consul, put_here_the_new_cool_thing I don't really care. What I care about is that we pick a single technology that works well for all services. I'm starting to grow worried about the excessive lack of opinion we have in cases, like this one, where we should simply be opinionated. A strong opinion here would help us be consistent, make it simpler to understand issues and share knowledge, make ops' lives simpler (as in there's just one thing they can deploy), etc.
IMHO, OpenStack is confusing enough without us adding abstractions over abstractions. The topic we're discussing here will impact all deployments out there and we had better try to do one thing and do it right.
So, to summarize, I love the effort
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Let's step back away from tooz. Tooz, for the sake of this conversation, is much the same as saying zookeeper or consul or etcd, etc. We should be focused (as both Flavio and Thierry said) on whether we need a DLM and what it will solve. Once we have all of that defined, the choice of an abstraction such as tooz (or just the direct bindings for some specific choice) can be made. I want to voice that we should be very picky about the solution (if we decide on a DLM) so that we are implementing to the strengths of the solution rather than trying to make everything work seamlessly. --Morgan Sent via mobile
On Aug 3, 2015, at 18:49, Julien Danjou jul...@danjou.info wrote: On Mon, Aug 03 2015, Thierry Carrez wrote: The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first.
Or just start using Tooz – like some OpenStack projects have already been doing for months – and let the operators pick the backend that they are the most comfortable with? :) -- Julien Danjou -- Free Software hacker -- http://julien.danjou.info
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. 
That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people consider Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex that it needed be I have a new proposal of a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ I like the idea of keeping it simpler Gorka. :) Note that this is punting back to use the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages. Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen. Anyway, I think what might make more sense than copying that directly, is implementing Use the database and oslo.messaging to build a DLM as a tooz backend. This way as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one to their specific backend needs. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
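The stale-lock handling described above (an owner UUID recorded with the lock, and a ping to the owner's queue before stealing) can be sketched as follows. This is a toy in-memory model under stated assumptions, not Heat's code: `Engine`, `LockTable` and `ping` are hypothetical names, the dictionary stands in for the database table, and the synchronous `ping()` call stands in for a request sent over the message bus with a timeout.

```python
import uuid


class Engine:
    """Stand-in for an engine process that listens on its own per-UUID queue."""

    def __init__(self):
        self.engine_id = str(uuid.uuid4())
        self.alive = True

    def ping(self):
        # A real engine would answer over the message bus within a timeout.
        return self.alive


class LockTable:
    """Stand-in for a database table of (resource_id, owner engine_id) rows."""

    def __init__(self, engines):
        self._owners = {}
        self._engines = {engine.engine_id: engine for engine in engines}

    def owner(self, resource_id):
        return self._owners.get(resource_id)

    def acquire(self, resource_id, engine):
        current = self._owners.get(resource_id)
        if current is None or current == engine.engine_id:
            self._owners[resource_id] = engine.engine_id
            return True
        holder = self._engines.get(current)
        if holder is not None and holder.ping():
            return False  # the owner answered, so the lock is still valid
        # No answer: steal the lock, but only if the owner is unchanged
        # (in a database this would be a conditional UPDATE).
        if self._owners.get(resource_id) == current:
            self._owners[resource_id] = engine.engine_id
            return True
        return False


first, second = Engine(), Engine()
locks = LockTable([first, second])
locks.acquire("stack-1", first)    # first engine takes the lock
locks.acquire("stack-1", second)   # refused while the first engine answers pings
```

The sketch also makes the weak point visible: the decision to steal depends entirely on whether the holder answers in time, so a holder that is busy rather than dead looks the same as a failed one.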
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Fri, Jul 31, 2015 at 03:18:39PM -0700, Clint Byrum wrote: Excerpts from Mike Perez's message of 2015-07-31 10:40:04 -0700: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow harlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. If we finally implement the DLM with Tooz then we wouldn't really need to use different DLM solution in each project, they could all be using the same. It's just a matter of choice for the admin. If admin wants to use the same tool he will set the same configuration in Ceilometer and Cinder, and he wants more work and use different tools then he will set different configurations. The discussion is not about choosing ZooKeeper over Redis or anything like that, the discussion is more in the lines of we need/don't need a DLM for HA A-A. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? So in the Ironic case, if two conductors decide they both own one IPMI controller, _chaos_ can ensue. 
They may, at different times, read that the power is up, or down, and issue power control commands that may take many seconds, and thus the next status run of the other conductor may cause it to react by reversing, and they'll just fight over the node in a tug-o-war fashion. Oh wait, except, that's not true. Instead, they use the database as a locking mechanism, and AFAIK, no nodes have been torn limb from limb by two conductors thus far.
One thing we must not forget is that we are not talking about using locks just for mutual exclusion or for the sake of it; we are doing it to keep current Cinder behavior intact. That is something people keep forgetting.
Right now, if you are using volume A for reading, let's say for cloning into a new volume, and a request that changes the resource status comes in for that same volume A, like backup or delete, the API will accept that request, and when the message arrives at the Volume Node it will be queued on the lock; once that lock is released the operation will be performed.
When locking using only the DB, which I believe is mostly possible, with the possible exception of some driver sections of software-based storage systems, you will probably be changing this behavior and no longer allowing this. Depending on who you ask, this is a big deal (breaking an implicit contract based on current behavior) or not.
Keeping things backward compatible is a pain. And I think that's one of the differences between Ironic and Cinder: at this point we cannot just decide to change Cinder however we like; we have to take into account Nova interactions as well as external client interactions.
But, a DLM would be more efficient, and actually simplify failure recovery for Ironic's operators. The database locks suffer from being a little too conservative, and sometimes you just have to go into the DB and delete a lock after something explodes (this was true 6 months ago, it may have better automation now, I don't know).
Anyway, I'm all for the simplest possible solution. But, don't make it _too_ simple. I agree, like Einstein said, everything should be made as simple as possible, but no simpler. Cheers, Gorka.
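To make the "database as a locking mechanism" point in this message concrete, here is a minimal sketch using a conditional UPDATE (compare-and-swap) on a status column. It assumes a hypothetical `volumes` table with `id` and `status` columns and uses an in-memory SQLite database for illustration; it is not Cinder's or Ironic's actual schema or code.

```python
import sqlite3


def try_transition(conn, volume_id, expected, new):
    """Atomically move a volume from `expected` to `new`; return True on success."""
    cur = conn.execute(
        "UPDATE volumes SET status = ? WHERE id = ? AND status = ?",
        (new, volume_id, expected),
    )
    conn.commit()
    # rowcount is 1 only if the row was still in the expected state.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO volumes VALUES ('vol-a', 'available')")
conn.commit()

cloning = try_transition(conn, "vol-a", "available", "cloning")    # succeeds
deleting = try_transition(conn, "vol-a", "available", "deleting")  # refused
```

Note what happens to the second request: it is refused immediately instead of waiting for the clone to finish. That is the behavior change discussed above, where today the delete would be accepted and queued on the lock.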
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session.
+1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx)
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session.
+1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx)
I don't see those as different solutions from the point of view of Cinder; they are different implementations of the same solution, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that; we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up in discussions of which one of those is best.
If we end up deciding to use a DLM, which is unlikely, then we can look into the available drivers in Tooz, and if we are not convinced by the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz.
Cheers, Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 03 2015, Thierry Carrez wrote: The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. Or just start using Tooz – as some OpenStack projects have already been doing for months – and let the operators pick the backend that they are the most comfortable with? :) -- Julien Danjou -- Free Software hacker -- http://julien.danjou.info
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Excerpts from Devananda van der Veen's message of 2015-08-03 08:53:21 -0700: On Mon, Aug 3, 2015 at 8:41 AM Joshua Harlow harlo...@outlook.com wrote: Clint Byrum wrote: Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700: On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileorgegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. 
Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex than it needed to be, I have a new proposal for a simpler solution, and I describe how we can do it with or without a DLM since we don't seem to reach an agreement on that. The solution description was more rushed than the previous one so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ I like the idea of keeping it simpler Gorka. :) Note that this is punting back to using the database for coordination, which is what most projects have done thus far, and has a number of advantages and disadvantages. Note that the stale-lock problem was solved in an interesting way in Heat: each engine process gets an instance-of-engine uuid that it adds to the topic queues it listens to. If it holds a lock, it records this UUID in the owner field. When somebody wants to steal the lock (due to timeout) they send to this queue, and if there's no response, the lock is stolen. Anyway, I think what might make more sense than copying that directly is implementing "Use the database and oslo.messaging to build a DLM" as a tooz backend. This way, as the negative aspects of that approach impact an operator, they can pick a tooz driver that satisfies their needs, or even write one to their specific backend needs.
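The Heat-style lock stealing described above can be sketched roughly as follows. This is only an illustration of the idea as stated in the thread, not Heat's actual code: the `LockTable` and `Engine` classes are hypothetical, the dictionary stands in for a database table with an owner column, and the `alive` set stands in for "the owner answers on its per-engine queue".

```python
import uuid


class LockTable:
    """Hypothetical in-memory stand-in for a DB table of (resource, owner)."""

    def __init__(self):
        self._owners = {}  # resource id -> engine id of the current holder

    def try_acquire(self, resource, engine_id):
        # Take the lock if it is free; succeed if we hold it afterwards.
        owner = self._owners.setdefault(resource, engine_id)
        return owner == engine_id

    def steal(self, resource, old_owner, new_owner):
        # Compare-and-swap: only replace the owner we believed to be dead.
        if self._owners.get(resource) == old_owner:
            self._owners[resource] = new_owner
            return True
        return False

    def owner(self, resource):
        return self._owners.get(resource)


class Engine:
    """Hypothetical engine process with its own per-instance UUID."""

    def __init__(self, table, alive):
        self.engine_id = str(uuid.uuid4())
        self.table = table
        self.alive = alive  # assumption: models engines that still answer
        self.alive.add(self.engine_id)

    def ping(self, engine_id):
        # Stand-in for messaging the owner's queue and waiting for a reply.
        return engine_id in self.alive

    def acquire(self, resource):
        if self.table.try_acquire(resource, self.engine_id):
            return True
        owner = self.table.owner(resource)
        if owner is not None and not self.ping(owner):
            # No response from the recorded owner: steal the stale lock.
            return self.table.steal(resource, owner, self.engine_id)
        return False
```

In this sketch a second engine only takes over a lock once the recorded owner stops answering; while the owner is alive, the second `acquire` simply fails and the caller has to retry or give up.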
Oh jeez, using 'the database and oslo.messaging to build a DLM' scares me :-/ There are already N + 1 DLM like-systems out there (and more every day if u consider the list at https://raftconsensus.github.io/#implementations) so I'd really rather use one that is proven to work by academia vs make a frankenstein one. Joshua, As has been said on this thread, some projects (eg, Ironic) are already using a consistent hash ring backed by a database to meet the requirements they have. Could those requirements also be met with some other tech? Yes. Would that provide additional functionality or some other benefits? Maybe. But that's not what this thread was about. Distributed hash rings are a well understood technique, as are databases. There's no need to be insulting by calling not-your-favorite-technology-of-the-day a frankenstein one. The topic here,
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 03, 2015 at 07:01:45PM +1000, Morgan Fainberg wrote: Lets step back away from tooz. Tooz for the sake of this conversation is as much the same as saying zookeeper or consul or etcd, etc. We should be focused (as both Flavio and Thierry said) on if we need DLM and what it will solve. What do you mean we should be focused on if we need a DLM and what it will solve? I don't know what you mean, as those answers are quite clear: - The DLM replaces our current local file locks and extends them among nodes, it does not provide any additional functionality. - Do we need a DLM? Need is a strong word, if you are asking if we can do it without a DLM, then the answer is yes, we can do it without it. And if you ask if it will take more time than using a DLM and has the potential to introduce more bugs, then the answer is yes as well. - Will we keep using a DLM forever? No, we will change the DLM locks with DB mutual exclusion at the API nodes later. Gorka. Once we have all of that defined, the use of an abstraction such as tooz (or just the direct bindings for some specific choice) can be made. I want to voice that we should be very picky about the solution (if we decide on a DLM) so that we are implementing to the strengths of the solution rather than try and make everything work seamlessly. --Morgan Sent via mobile On Aug 3, 2015, at 18:49, Julien Danjou jul...@danjou.info wrote: On Mon, Aug 03 2015, Thierry Carrez wrote: The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. 
Or just start using Tooz – like some of OpenStack are already doing for months – and let the operators pick the backend that they are the most comfortable with? :) -- Julien Danjou -- Free Software hacker -- http://julien.danjou.info
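Gorka's point in this message, that a DLM would only replace the current local file locks and extend them across nodes, can be illustrated with a small sketch. It assumes a Unix host (it uses `fcntl`), and the names `local_file_lock` and `delete_volume` are hypothetical, not Cinder functions. The idea shown is that the calling code stays the same and only the lock implementation passed in would change.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def local_file_lock(name, lock_dir=None):
    """Exclusive lock that only protects against processes on this host."""
    lock_dir = lock_dir or tempfile.gettempdir()
    path = os.path.join(lock_dir, name + ".lock")
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield path
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def delete_volume(volume_id, lock=local_file_lock):
    # `lock` is the only piece that would be swapped for a distributed
    # lock (for example one obtained through a coordination library).
    with lock("volume-" + volume_id):
        return "deleted " + volume_id
```

Any context manager taking a lock name could be passed as `lock`, which is the sense in which a distributed lock adds reach across nodes but no new functionality.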
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
We'll have a chance to discuss DB mutual exclusion at the API nodes at the Cinder mid-cycle, which starts tomorrow. The details, issues, and realistic schedule for that will be a key piece to this whole puzzle, since anything else is seen as a temporary solution. -Original Message- From: Gorka Eguileor [mailto:gegui...@redhat.com] Sent: Monday, August 03, 2015 11:38 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 07:01:45PM +1000, Morgan Fainberg wrote: Let's step back away from tooz. Tooz for the sake of this conversation is much the same as saying zookeeper or consul or etcd, etc. We should be focused (as both Flavio and Thierry said) on whether we need a DLM and what it will solve. What do you mean we should be focused on if we need a DLM and what it will solve? I don't know what you mean, as those answers are quite clear: - The DLM replaces our current local file locks and extends them among nodes; it does not provide any additional functionality. - Do we need a DLM? Need is a strong word. If you are asking if we can do it without a DLM, then the answer is yes, we can do it without it. And if you ask if it will take more time than using a DLM and has the potential to introduce more bugs, then the answer is yes as well. - Will we keep using a DLM forever? No, we will replace the DLM locks with DB mutual exclusion at the API nodes later. Gorka. Once we have all of that defined, the use of an abstraction such as tooz (or just the direct bindings for some specific choice) can be made. I want to voice that we should be very picky about the solution (if we decide on a DLM) so that we are implementing to the strengths of the solution rather than try and make everything work seamlessly.
--Morgan Sent via mobile On Aug 3, 2015, at 18:49, Julien Danjou jul...@danjou.info wrote: On Mon, Aug 03 2015, Thierry Carrez wrote: The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. Or just start using Tooz – like some of OpenStack are already doing for months – and let the operators pick the backend that they are the most comfortable with? :) -- Julien Danjou -- Free Software hacker -- http://julien.danjou.info
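The thread does not spell out what "DB mutual exclusion at the API nodes" would look like. One common way to get mutual exclusion out of a relational database is a conditional UPDATE that only succeeds for one caller; the sketch below shows that general technique with SQLite. It is an assumption for illustration, not the design the Cinder team settled on, and the `volumes` table, its columns, and the `claim` helper are all hypothetical.

```python
import sqlite3


def make_db():
    # Hypothetical, minimal schema: one row per volume with a status column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available')")
    conn.commit()
    return conn


def claim(conn, volume_id, expected, new_status):
    """Move a volume from `expected` to `new_status` in one statement.

    Returns True only for the caller whose UPDATE actually matched the row,
    so two requests racing on the same volume cannot both succeed.
    """
    cur = conn.execute(
        "UPDATE volumes SET status = ? WHERE id = ? AND status = ?",
        (new_status, volume_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this pattern the request that loses the race sees `False` and can reject the operation at the API layer, without any lock being held while the backend work runs.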
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
For example, to parallel the conversation with databases: We want a database. Well, that means mongodb, postgres, mysql, berkeleydb, etc. Oh, well, I need it to be a relational db. Well, that means postgresql, mysql, etc. Oh, and I need recursive queries... that excludes even more. We are pretty sure we want a distributed lock manager. What problems are we trying to solve using it, and what features do they require in the DLM/DLM abstraction of choice? That will exclude some of them. It also may exclude abstraction layers that don't expose the features needed. (Recursive queries for example) Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 10:48 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 03:42:48PM +, Fox, Kevin M wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Let's clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then let's do that. Then the operator can choose. If Tooz can't and won't handle the problem space, then we're trying to fit a square peg in a round hole. What do you mean by clearly define the problem space? We know what we want; we just need to agree on the compromises we are willing to make: use a DLM and make admins' life a little harder (only for those that deploy A-A) but have an A-A solution earlier, or postpone A-A functionality but make their life easier. And we already know that Tooz is not the Holy Grail and will not perform the miracle of giving Cinder HA A-A. It is only a piece of the problem, so there's nothing to discuss there, and it's not a square peg in a round hole, because it fits perfectly for what it is intended. But once you have filled that square hole you need another peg, the round one for the round hole.
If people are expecting to find one thing that fixes everything and gives us HA A-A on its own, then I believe they are a little bit lost. Gorka. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this accross teams and come up with an opinonated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mikata. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder, they are different implementations to the same solution case, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up on discussions of which one of those is best. 
If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced with the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Aug 03, 2015 at 03:42:48PM +, Fox, Kevin M wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Lets clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then lets do that. Then the operator can choose. If Tooz cant and wont handle the problem space, then we're trying to fit a square peg in a round hole. What do you mean with clearly define the problem space? We know what we want, we just need to agree on the compromises we are willing to make, use a DLM and make admins' life a little harder (only for those that deploy A-A) but have an A-A solution earlier, or postpone A-A functionality but make their life easier. And we already know that Tooz is not the Holy Grail and will not perform the miracle of giving Cinder HA A-A. It is only a piece of the problem, so there's nothing to discuss there, and it's not a square peg on a round hole, because it fits perfectly for what it is intended. But once you have filled that square hole you need another peg, the round one for the round hole. If people are expecting to find one thing that fixes everything and gives us HA A-A on its own, then I believe they are a little bit lost. Gorka. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this accross teams and come up with an opinonated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mikata. 
This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder, they are different implementations to the same solution case, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up on discussions of which one of those is best. If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced with the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Agree with Thierry. Let's define requirements. We are trying to solve HA, not to scale to an infinite number of running Cinder instances. Thanks, Arkady -Original Message- From: Gorka Eguileor [mailto:gegui...@redhat.com] Sent: Monday, August 03, 2015 3:44 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this across teams and come up with an opinionated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mitaka. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions brings something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. -- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder; they are different implementations of the same solution: using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that; we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up in discussions of which one of those is best.
If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced with the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Aug 4, 2015, at 01:42, Fox, Kevin M kevin@pnnl.gov wrote: I'm usually for abstraction layers, but they don't always pay off very well due to catering to the lowest common denominator. Lets clearly define the problem space first. IFF the problem space can be fully implemented using Tooz, then lets do that. Then the operator can choose. If Tooz cant and wont handle the problem space, then we're trying to fit a square peg in a round hole. +1 and specifically around tooz, it is narrow in comparison to the feature sets of some the DLMs (since it has to mostly-implement to the lowest common denominator, as abstraction layers do). Defining the space we are trying to target will let us make the informed decision on what we use. Thanks, Kevin From: Gorka Eguileor [gegui...@redhat.com] Sent: Monday, August 03, 2015 1:43 AM To: OpenStack Development Mailing List (not for usage questions) Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active On Mon, Aug 03, 2015 at 10:22:42AM +0200, Thierry Carrez wrote: Flavio Percoco wrote: [...] So, to summarize, I love the effort behind this. But, as others have mentioned, I'd like us to take a step back, run this accross teams and come up with an opinonated solution that would work for everyone. Starting this discussion now would allow us to prepare enough material to reach an agreement in Tokyo and work on a single solution for Mikata. This sounds like a good topic for a cross-project session. +1 The last thing we want is to rush a solution that would only solve a particular project use case. Personally I'd like us to pick the simplest solution that can solve most of the use cases. Each of the solutions bring something to the table -- Zookeeper is mature, Consul is featureful, etcd is lean and simple... Let's not dive into the best solution but clearly define the problem space first. 
-- Thierry Carrez (ttx) I don't see those as different solutions from the point of view of Cinder, they are different implementations to the same solution case, using a DLM to lock resources. We keep circling back to the fancy names like moths to a flame, when we are still discussing whether we need or want a DLM for the solution. I think we should stop doing that, we need to decide on the solution from an abstract point of view (like you say, define the problem space) and not get caught up on discussions of which one of those is best. If we end up deciding to use a DLM, which is unlikely, then we can look into available drivers in Tooz and if we are not convinced with the ones we have (Redis, ZooKeeper, etc.) then we discuss which one we should be using instead and just add it to Tooz. Cheers, Gorka. __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Daniel Comnea wrote: From an operator's point of view I'd love to see less technology proliferation in OpenStack; if you wear the developer hat please don't be selfish, take into account the others :) ZK is a robust technology but hey, it is a beast like Rabbit, there is a lot to massage and over 2 data centers ZK is not very efficient. Very much understand the operator view here. IMHO, in its current state according to http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/ I'd say the operators of cinder are in a much worse boat right now, and adding in a robust technology that could help the current state doesn't exactly seem that bad. IMHO if you are planning to run (or are running) a cloud you are likely going to have to be running zookeeper or a similar service soon if you aren't already anyway, because most cloudy projects already depend on such services (for service discovery, configuration discovery/management, DLM locking, fault detection, leader election...) As for the 2 data centers, afaik the following is making this better: https://zookeeper.apache.org/doc/trunk/zookeeperObservers.html On Sat, Aug 1, 2015 at 4:27 AM, Joshua Harlow harlo...@outlook.com wrote: Monty Taylor wrote: On 08/01/2015 03:40 AM, Mike Perez wrote: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow harlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason is imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions.
I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? I'd like to take that one step further and say that we should also look holistically at the other things that such technology are often used for in distributed systems and see if, in addition to Does Cinder need a DLM - ask does Cinder need service discover and does Cinder need distributed KV store and does anyone else? Adding something like zookeeper or etcd or consul has the potential to allow us to design an OpenStack that works better. Adding all of them in an ad-hoc and uncoordinated manner is a bit sledgehammery. The Java community uses zookeeper a lot The container orchestration community seem to all love etcd I hear tell that there a bunch of ops people who are in love with consul I'd suggest we look at more than lock management. Oh I very much agree, but gotta start somewhere :) __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev __ OpenStack
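The hash ring idea raised in this message (several Cinder manager instances, each owning a share of the resources, with no DLM to deploy) can be sketched as below. This is a generic consistent hash ring, not Ironic's implementation; the `HashRing` class, the `replicas` parameter, and the host and volume names in the usage are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring mapping resource ids to hosts."""

    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring at `replicas` pseudo-random points
        # so that load spreads evenly and moves little when hosts change.
        self._ring = []
        for host in hosts:
            for i in range(replicas):
                self._ring.append((self._hash("%s-%d" % (host, i)), host))
        self._ring.sort()
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def get_host(self, resource_id):
        # The owner is the first ring point at or after the resource's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._keys, self._hash(resource_id))
        return self._ring[idx % len(self._ring)][1]
```

Because every node computes the same mapping from the same host list, two managers never pick up the same volume as long as they agree on ring membership; keeping that membership consistent is the part that still needs a database or some other shared source of truth.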
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Aug 1, 2015, at 09:51, Monty Taylor mord...@inaugust.com wrote: On 08/01/2015 03:40 AM, Mike Perez wrote: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow harlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? I'd like to take that one step further and say that we should also look holistically at the other things that such technology are often used for in distributed systems and see if, in addition to Does Cinder need a DLM - ask does Cinder need service discover and does Cinder need distributed KV store and does anyone else? Adding something like zookeeper or etcd or consul has the potential to allow us to design an OpenStack that works better. Adding all of them in an ad-hoc and uncoordinated manner is a bit sledgehammery. The Java community uses zookeeper a lot The container orchestration community seem to all love etcd I hear tell that there a bunch of ops people who are in love with consul I'd suggest we look at more than lock management. 
From the perspective of what zookeeper, consul, or etcd (no particular order of preference) bring to the table, I would like to see a hard look taken at incorporating at least one of them this way. I see it as a huge win (especially from the keystone side and with distributed key-value-store capabilities). There are so many things we can do to really improve openstack across the board. Utilizing consul or similar to help manage the keystone catalog, or to source the individual endpoint policy.json without needing to copy it to horizon, is just a start beyond the proposed DLM uses in this thread. There is a lot we can benefit from with one of these tools being generally available for openstack deployments. --morgan __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
From Operators point of view i'd love to see less technology proliferation in OpenStack, if you wear the developer hat please don't be selfish, take into account the others :) ZK is a robust technology but hey is a beast like Rabbit, there is a lot to massage and over 2 data centers ZK is not very efficient. On Sat, Aug 1, 2015 at 4:27 AM, Joshua Harlow harlo...@outlook.com wrote: Monty Taylor wrote: On 08/01/2015 03:40 AM, Mike Perez wrote: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlowharlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? I'd like to take that one step further and say that we should also look holistically at the other things that such technology are often used for in distributed systems and see if, in addition to Does Cinder need a DLM - ask does Cinder need service discover and does Cinder need distributed KV store and does anyone else? 
Adding something like zookeeper or etcd or consul has the potential to allow us to design an OpenStack that works better. Adding all of them in an ad-hoc and uncoordinated manner is a bit sledgehammery. The Java community uses zookeeper a lot. The container orchestration community seem to all love etcd. I hear tell that there are a bunch of ops people who are in love with consul. I'd suggest we look at more than lock management. Oh I very much agree, but gotta start somewhere :)
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
I couldn't put it better, nice write up Morgan!! +1 On Sun, Aug 2, 2015 at 10:28 AM, Morgan Fainberg morgan.fainb...@gmail.com wrote: On Aug 2, 2015, at 16:00, Daniel Comnea comnea.d...@gmail.com wrote: From an operator's point of view I'd love to see less technology proliferation in OpenStack; if you wear the developer hat please don't be selfish, take into account the others :) ZK is a robust technology but hey, it is a beast like Rabbit: there is a lot to massage, and across 2 data centers ZK is not very efficient. Sure, let's evaluate the more far-reaching benefits of running the new service for all openstack deployments. This is not a "hey, neat tech" debate; it is a "let's see if this tool solves enough issues that it is worth using an 'innovation token' on" debate. I think it is worth it personally, but it should be a consistent choice with a strong reason and added value beyond a single one-off use case. --morgan
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Aug 2, 2015, at 16:00, Daniel Comnea comnea.d...@gmail.com wrote: From an operator's point of view I'd love to see less technology proliferation in OpenStack; if you wear the developer hat please don't be selfish, take into account the others :) ZK is a robust technology but hey, it is a beast like Rabbit: there is a lot to massage, and across 2 data centers ZK is not very efficient. Sure, let's evaluate the more far-reaching benefits of running the new service for all openstack deployments. This is not a "hey, neat tech" debate; it is a "let's see if this tool solves enough issues that it is worth using an 'innovation token' on" debate. I think it is worth it personally, but it should be a consistent choice with a strong reason and added value beyond a single one-off use case. --morgan
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I'd like to think the Cinder team is a great in recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. 
That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez Hi all, Since my original proposal was more complex than it needed to be, I have a new proposal for a simpler solution, and I describe how we can do it with or without a DLM, since we don't seem to be reaching an agreement on that. The solution description was more rushed than the previous one, so I may have missed some things. http://gorka.eguileor.com/simpler-road-to-cinder-active-active/ Cheers, Gorka.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Sat, Aug 1, 2015 at 2:51 AM, Monty Taylor mord...@inaugust.com wrote: I hear tell that there a bunch of ops people who are in love with consul At my company we love Consul. We found it to be very scalable and performant, gives us an easy-to-use k/v store, membership service, DNS, etc. We use it to load balance requests to our services and route requests to active instances, including to openstack and mariadb+galera. That said, I don't know if something like Consul, etcd, or zookeeper need to be part of openstack itself, or just part of the deployment (unless we decide to store metadata in a kv store in place of the SQL DB - which is entirely possible with some adjustments to openstack). I find it hard to believe that Cinder really needs distributed locks. AFAIU, there is one lock in the non-driver Cinder code to solve a race between deleting a volume and creating a snapshot/clone from it. You can solve that with other methods. I already proposed to use garbage collection for deleting volumes - you can delete offline and before deleting easily check the DB if there is an ongoing operation with the given volume as a source. If yes, just wait. The bulk of the locks seem to be in the drivers. I find it hard to believe that the management APIs of so many storage products cannot be called concurrently. I think we could solve many issues in Cinder with some requirements on drivers, such as that they need to be able to run active-active with no distributed locks. Another requirement of idempotency would significantly ease recovery pains I believe. I very much agree with Mike's statement that Cinder isn't as complex as people are making it. Well maybe it is, but it doesn't need to be. 
:-) -- Avishay Traeger, PhD, Architect, Stratoscale
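For illustration only, the garbage-collection idea above (mark the volume, delete it offline, and first check the DB for operations that still use it as a source) could be sketched as follows. This is not Cinder code: the `volumes` table, its `status` and `source_volid` columns, and the helpers `start_clone`, `request_delete` and `collect_garbage` are hypothetical names, and SQLite stands in for whatever database a deployment really uses.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory DB with a hypothetical volumes table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE volumes ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " source_volid TEXT)"
    )
    return db


def start_clone(db, new_id, source_id):
    """Register a clone only if the source is still 'available'.

    The check and the insert are one SQL statement, so no lock is needed.
    """
    with db:
        cur = db.execute(
            "INSERT INTO volumes (id, status, source_volid) "
            "SELECT ?, 'creating', id FROM volumes "
            "WHERE id = ? AND status = 'available'",
            (new_id, source_id),
        )
    return cur.rowcount == 1


def request_delete(db, vol_id):
    """API side: only mark the volume; the backend is not touched here."""
    with db:
        db.execute(
            "UPDATE volumes SET status = 'deleting' WHERE id = ?", (vol_id,)
        )


def collect_garbage(db, backend_delete):
    """Offline pass: delete marked volumes that nothing is still copying from."""
    deleted = []
    marked = db.execute(
        "SELECT id FROM volumes WHERE status = 'deleting'"
    ).fetchall()
    for (vol_id,) in marked:
        busy = db.execute(
            "SELECT COUNT(*) FROM volumes "
            "WHERE source_volid = ? AND status = 'creating'",
            (vol_id,),
        ).fetchone()[0]
        if busy:
            continue  # a clone is still in flight: wait for the next pass
        backend_delete(vol_id)
        with db:
            db.execute("DELETE FROM volumes WHERE id = ?", (vol_id,))
        deleted.append(vol_id)
    return deleted
```

Because `start_clone` refuses a source that is already marked `deleting`, the number of in-flight users of a marked volume can only shrink, which is why the collector can simply retry later instead of holding a lock.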
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Mike Perez wrote: On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileorgegui...@redhat.com wrote: I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since current plan is going to take a while (because it requires that we finish first fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected on our Etherpad [2], and it can help us not miss anything when deciding on a different solution. Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced the complexity that a distributed lock manager adds to Cinder for both developers and the operators who ultimately are going to have to learn to maintain things like Zoo Keeper as a result is worth it. **Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves. Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it. I agree with 'cinder isn't as complex as people are making it' and that is very likely a good thing to keep in mind, whether zookeeper can help or not is a different question. 
Zookeeper imho is just another tool in your toolset/belt, and as with any tool you have to know when to use it (of course you can also just continue using chisels and such too); I'd rather people see that it is just that and avoid getting caught up on the other aspects prematurely. ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd like to think the Cinder team is great at recognizing potential cross project initiatives. Look at what Thang Pham has done with Nova's version object solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder. Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion. [1] - https://review.openstack.org/#/c/195366/ -- Mike Perez
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 31 July 2015 at 20:40, Mike Perez thin...@gmail.com wrote: Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? There's a lot of circling around here about what we're trying to achieve with 'H/A'. Some people are interested in performance. For them, a hash ring solution (deterministic load balancing) is fine. If the aim is availability (as mine is) then I can't see how it helps. I might be missing something, of course - if so, I'm happy to be corrected. To be clear, my aim with H/A is to remove the situation where a single node failure removes the control path for my storage. Currently, the only way to avoid this is to use something like pacemaker to monitor the c-vol services. Extensive experience suggests that pacemaker is a complex, fragile piece of software. Every component of cinder except c-vol can be deployed active/active[/active/...] - I'm aiming for consistency of approach if nothing else. If it ends up that trying to fix this adds too much complexity and/or fragility to cinder itself, then I can accept that - once whatever we do ends up being worse than pacemaker, we've taken a significant step backwards. Regardless of how H/A discussions go, the first part of Gorka's patch can certainly be used to fix a few of the API races we have, and can do so with rather nice, elegant, easy to understand code, so I think the whole process has been productive whatever the H/A outcome. -- Duncan Thomas
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Mike Perez wrote: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlowharlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. +1 Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? All very good questions, although IMHO a hash-ring is just a piece of the puzzle, and is more equivalent to sharding resources, which yes is one way to scale as long as each shard never touches anything from the other shards. If those shards ever start to need to touch anything shared then u get back into this same situation again for a DLM (and at that point u really do need the 'distributed' part of DLM, because each shard is distributed). And an few (maybe obvious) questions: - How would re-sharding work? 
- If sharding (the hash-ring partitioning) is based on entities (conductors/other) owning a 'bucket' of resources (i.e. entity 1 manages resources A-F, entity 2 manages resources G-M...), what happens if an entity dies? Does some other entity take over that bucket? What happens if that entity hasn't really 'died' but is just disconnected from the network (partition tolerance...)? (If the answer is that there is a lock on the resource/s being used by each entity, then you get back into the LM question.) I'm unsure about how ironic handles these problems (although I believe they have a hash-ring and still have a locking scheme as well, so maybe that's their answer for the problem of dual entities manipulating the same bucket). -- Mike Perez
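To make the re-sharding question concrete, here is a minimal consistent-hash-ring sketch. It is an illustration under assumptions, not Ironic's or Cinder's implementation: the `HashRing` class, the `replicas` parameter and the `c-vol-N` node names are made up for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping resource ids to owning nodes."""

    def __init__(self, nodes, replicas=64):
        self._replicas = replicas  # virtual points per node, smooths the split
        self._ring = []            # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}-{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owner(self, resource_id):
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._ring, (self._hash(resource_id), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["c-vol-1", "c-vol-2", "c-vol-3"])
    before = {f"vol-{i}": ring.owner(f"vol-{i}") for i in range(1000)}
    ring.remove("c-vol-2")  # pretend this node was declared dead
    moved = [v for v in before if ring.owner(v) != before[v]]
    # Only resources that belonged to the removed node change owner.
    print(len(moved), "of", len(before), "resources re-sharded")
```

The ring only answers "who should own this resource given the current membership". It says nothing about whether a removed node has really stopped working on its bucket, which is exactly the partition case raised above and the point where some form of locking or fencing comes back in.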
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow harlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? -- Mike Perez __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Excerpts from Mike Perez's message of 2015-07-31 10:40:04 -0700: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow harlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? So in the Ironic case, if two conductors decide they both own one IPMI controller, _chaos_ can ensue. They may, at different times, read that the power is up, or down, and issue power control commands that may take many seconds, and thus on the next status run of the other command may cause the conductor to react by reversing, and they'll just fight over the node in a tug-o-war fashion. Oh wait, except, thats not true. Instead, they use the database as a locking mechanism, and AFAIK, no nodes have been torn limb from limb by two conductors thus far. But, a DLM would be more efficient, and actually simplify failure recovery for Ironic's operators. 
The database locks suffer from being a little too conservative, and sometimes you just have to go into the DB and delete a lock after something explodes (this was true 6 months ago, it may have better automation sometimes now, I don't know). Anyway, I'm all for the simplest possible solution. But, don't make it _too_ simple.
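As a rough sketch of what "using the database as a locking mechanism" can look like, the following uses a compare-and-swap style UPDATE on a reservation column. It is an assumption-laden illustration, not Ironic's actual schema or code: the `nodes` table, the `reservation` column and the `reserve`/`release` helpers are hypothetical, and SQLite is used only so the example runs on its own.

```python
import sqlite3


def make_db():
    """Throwaway in-memory DB with a hypothetical nodes table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, reservation TEXT)")
    return db


def reserve(db, node_id, owner):
    """Atomically claim a node; succeeds only if nobody holds it."""
    with db:
        cur = db.execute(
            "UPDATE nodes SET reservation = ? "
            "WHERE id = ? AND reservation IS NULL",
            (owner, node_id),
        )
    return cur.rowcount == 1


def release(db, node_id, owner):
    """Release the node, but only if this owner actually holds it."""
    with db:
        cur = db.execute(
            "UPDATE nodes SET reservation = NULL "
            "WHERE id = ? AND reservation = ?",
            (node_id, owner),
        )
    return cur.rowcount == 1
```

The "too conservative" behaviour described above shows up here as well: if the holder crashes before calling `release`, the row stays reserved until an operator or a cleanup job clears it. A DLM lock tied to a client session goes away when the session does, at the cost of running and understanding the DLM.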
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Monty Taylor wrote: On 08/01/2015 03:40 AM, Mike Perez wrote: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlowharlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? I'd like to take that one step further and say that we should also look holistically at the other things that such technology are often used for in distributed systems and see if, in addition to Does Cinder need a DLM - ask does Cinder need service discover and does Cinder need distributed KV store and does anyone else? Adding something like zookeeper or etcd or consul has the potential to allow us to design an OpenStack that works better. Adding all of them in an ad-hoc and uncoordinated manner is a bit sledgehammery. The Java community uses zookeeper a lot The container orchestration community seem to all love etcd I hear tell that there a bunch of ops people who are in love with consul I'd suggest we look at more than lock management. 
Oh I very much agree, but gotta start somewhere :)
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Clint Byrum wrote: Excerpts from Mike Perez's message of 2015-07-31 10:40:04 -0700: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlowharlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? So in the Ironic case, if two conductors decide they both own one IPMI controller, _chaos_ can ensue. They may, at different times, read that the power is up, or down, and issue power control commands that may take many seconds, and thus on the next status run of the other command may cause the conductor to react by reversing, and they'll just fight over the node in a tug-o-war fashion. Oh wait, except, thats not true. Instead, they use the database as a locking mechanism, and AFAIK, no nodes have been torn limb from limb by two conductors thus far. But, a DLM would be more efficient, and actually simplify failure recovery for Ironic's operators. 
The database locks suffer from being a little too conservative, and sometimes you just have to go into the DB and delete a lock after something explodes (this was true 6 months ago, it may have better automation sometimes now, I don't know).

A point of data, using kazoo and zk-shell (a Python library and a Python ZooKeeper shell-like interface), just to show how much introspection can be done with zookeeper when a kazoo lock is created (tooz locks, when used with zookeeper, use this same/similar code).

(session #1)

    >>> from kazoo import client
    >>> c = client.KazooClient()
    >>> c.start()
    >>> lk = c.Lock('/resourceX')
    >>> lk.acquire()
    True

(session #2)

    $ zk-shell
    Welcome to zk-shell (1.1.0)
    (DISCONNECTED) /> connect localhost:2181
    (CLOSED) />
    (CONNECTED) /> ls /resourceX
    75ef011db92a44bfabf5dbf25fe2965c__lock__00
    (CONNECTED) /> stat /resourceX/75ef011db92a44bfabf5dbf25fe2965c__lock__00
    Stat(
      czxid=8103
      mzxid=8103
      ctime=1438383904513
      mtime=1438383904513
      version=0
      cversion=0
      aversion=0
      ephemeralOwner=0x14ed0a76f850002
      dataLength=0
      numChildren=0
      pzxid=8103
    )
    (CONNECTED) /> stat /resourceX/
    Stat(
      czxid=8102
      mzxid=8102
      ctime=1438383904494
      mtime=1438383904494
      version=0
      cversion=1
      aversion=0
      ephemeralOwner=0x0
      dataLength=0
      numChildren=1
      pzxid=8103
    )

(back in session #1, releasing the lock)

    >>> lk.release()

(session #2 again)

    (CONNECTED) /> ls /resourceX/
    (CONNECTED) />

The above shows creation times, who is waiting on the lock, modification times, and the owner.

Anyways I digress, if anyone really wants to know more about zookeeper let me know or drop into the #zookeeper channel on freenode (I'm one of the core maintainers of kazoo). -Josh

Anyway, I'm all for the simplest possible solution. But, don't make it _too_ simple.
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On 08/01/2015 03:40 AM, Mike Perez wrote: On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow harlo...@outlook.com wrote: ...random thought here, skip as needed... in all honesty orchestration solutions like mesos (http://mesos.apache.org/assets/img/documentation/architecture3.jpg), map-reduce solutions like hadoop, stream processing systems like apache storm (...), are already using zookeeper and I'm not saying we should just use it cause they are, but the likelihood that they just picked it for no reason are imho slim. I'd really like to see focus cross project. I don't want Ceilometer to depend on Zoo Keeper, Cinder to depend on etcd, etc. This is not ideal for an operator to have to deploy, learn and maintain each of these solutions. I think this is difficult when you consider everyone wants options of their preferred DLM. If we went this route, we should pick one. Regardless, I want to know if we really need a DLM. Does Ceilometer really need a DLM? Does Cinder really need a DLM? Can we just use a hash ring solution where operators don't even have to know or care about deploying a DLM and running multiple instances of Cinder manager just works? I'd like to take that one step further and say that we should also look holistically at the other things that such technology are often used for in distributed systems and see if, in addition to Does Cinder need a DLM - ask does Cinder need service discover and does Cinder need distributed KV store and does anyone else? Adding something like zookeeper or etcd or consul has the potential to allow us to design an OpenStack that works better. Adding all of them in an ad-hoc and uncoordinated manner is a bit sledgehammery. The Java community uses zookeeper a lot The container orchestration community seem to all love etcd I hear tell that there a bunch of ops people who are in love with consul I'd suggest we look at more than lock management. 
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote:

I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since the current plan is going to take a while (because it requires that we first finish fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected in our Etherpad [2], and it can help us not miss anything when deciding on a different solution.

Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced that the complexity a distributed lock manager adds to Cinder is worth it, both for developers and for the operators who ultimately are going to have to learn to maintain things like ZooKeeper as a result.

**Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves.

Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it.

I'd like to think the Cinder team is great at recognizing potential cross-project initiatives. Look at what Thang Pham has done with Nova's versioned objects solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder.

Have people considered Ironic's hash ring solution?
The project Akanda is now adopting it [1], and I think it might have potential. I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion.

[1] - https://review.openstack.org/#/c/195366/

-- Mike Perez
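For readers unfamiliar with the hash ring idea mentioned above, here is a minimal sketch of a consistent hash ring in plain Python. It is not Ironic's implementation and does not use any Cinder or Ironic API; the `HashRing` class, the `replicas` parameter, and the host and volume names are all hypothetical and exist only for illustration. The point it tries to show is that every service instance can compute, without a lock manager, which host owns a given volume, and that removing a host only reassigns the volumes that host owned.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping resource IDs to service hosts.

    Illustrative only: names and defaults are assumptions, not taken
    from Ironic or Cinder.
    """

    def __init__(self, hosts, replicas=64):
        # Each host is placed on the ring `replicas` times to even out load.
        self._replicas = replicas
        self._ring = []  # sorted list of (position, host) tuples
        for host in hosts:
            self.add_host(host)

    @staticmethod
    def _position(key):
        # Map an arbitrary string to a large integer position on the ring.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add_host(self, host):
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._position(f"{host}-{i}"), host))

    def remove_host(self, host):
        self._ring = [entry for entry in self._ring if entry[1] != host]

    def get_host(self, resource_id):
        if not self._ring:
            raise LookupError("hash ring has no hosts")
        pos = self._position(resource_id)
        # First ring entry at or after `pos`, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (pos, ""))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["cinder-vol-1", "cinder-vol-2", "cinder-vol-3"])
    volumes = [f"volume-{n}" for n in range(1000)]
    before = {v: ring.get_host(v) for v in volumes}

    # Simulate one volume service going away.
    ring.remove_host("cinder-vol-2")
    after = {v: ring.get_host(v) for v in volumes}

    moved = [v for v in volumes if before[v] != after[v]]
    # Only volumes previously owned by the removed host change owner.
    assert all(before[v] == "cinder-vol-2" for v in moved)
    print(f"{len(moved)} of {len(volumes)} volumes were reassigned")
```

Note that a ring like this only decides ownership. It does not by itself tell the instances which hosts are alive, so some form of membership or heartbeat information is still assumed to come from elsewhere.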
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote:
On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor gegui...@redhat.com wrote:

I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since the current plan is going to take a while (because it requires that we first finish fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect. And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1]. Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected in our Etherpad [2], and it can help us not miss anything when deciding on a different solution.

Based on IRC conversations in the Cinder room and hearing people's opinions in the spec reviews, I'm not convinced that the complexity a distributed lock manager adds to Cinder is worth it, both for developers and for the operators who ultimately are going to have to learn to maintain things like ZooKeeper as a result.

Hi Mike,

I think you are right in bringing up the cost that adding a DLM to the solution brings to operators, as it is something important to take into consideration. I would like to say that Ceilometer is already using Tooz, so operators are already familiar with these DLMs, but unfortunately that would be stretching the truth: Cinder is present in 73% of OpenStack production workloads while Ceilometer is only in 33% of them, so we would certainly be disturbing some operators.
But we must not forget that the only operators that would need to worry about deploying and maintaining the DLM are those wanting to deploy Active-Active configurations (for Active-Passive configurations Tooz will be working with local file locks like we are doing now), and some of those may think like Duncan does: "I already have to administer rabbit, mysql, backends, horizon, load balancers, rate limiters... adding redis isn't going to make it that much harder."

That's why I don't think this is such a big deal for the vast majority of operators.

On the developer side I have to disagree: there is no difference between using Tooz and using the current oslo synchronization mechanism for non-Active-Active deployments.

**Key point**: We're not scaling Cinder itself, it's about scaling to avoid build up of operations from the storage backend solutions themselves.

You must also consider that an Active-Active solution will help deployments where downtime is not an option, or that have SLAs with uptime or operational requirements. It's not only about increasing the volume of operations and reducing times.

Whatever people think ZooKeeper scaling level is going to accomplish is not even a question. We don't need it, because Cinder isn't as complex as people are making it.

I'd like to think the Cinder team is great at recognizing potential cross-project initiatives. Look at what Thang Pham has done with Nova's versioned objects solution. He made a generic solution into an Oslo solution for all, and Cinder is using it. That was awesome, and people really appreciated that there was a focus for other projects to get better, not just Cinder.

To be fair, Tooz is just one of those cross-project initiatives you are describing: it's a generic solution that can be used in all projects, not just Ceilometer.

Have people considered Ironic's hash ring solution? The project Akanda is now adopting it [1], and I think it might have potential.
I'd appreciate it if interested parties could have this evaluated before the Cinder midcycle sprint next week, to be ready for discussion.

I will have a look at the hash ring solution you mention and see if it makes sense to use it.

And I would really love to see the HA A-A discussion enabled for remote people, as some of us are interested in the discussion but won't be able to attend. In my case, problems with living in the Old World :-(

In a way I have to agree with you that sometimes we make Cinder look more complex than it really is, and in my case the solution I proposed in the post was way too complex, as has been pointed out. I just tried to solve the A-A problem and fix some other issues, like recovering lost jobs (those waiting for locks), at the same time.

There is an alternative solution I am considering that will be much simpler and will align with Walter's efforts to remove locks from the Volume Manager. I just need to give it a hard think to make sure the solution has all bases covered.

The main reason why I am suggesting using Tooz and a DLM is because I think it will allow us to reach Active-Active faster and with less effort, not because I
[openstack-dev] [Cinder] A possible solution for HA Active-Active
Hi all,

I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since the current plan is going to take a while (because it requires that we first finish fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect.

And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1].

Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected in our Etherpad [2], and it can help us not miss anything when deciding on a different solution.

Cheers,
Gorka.

[1]: http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/
[2]: https://etherpad.openstack.org/p/cinder-active-active-vol-service-issues
Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active
Thanks for this work Gorka. Even if we don't end up taking the approach you suggest, there are parts that are undoubtedly useful pieces of quality, well-thought-out code, posted in clean patches, that can be used to easily try out ideas that were not possible previously. I'm both impressed and enthusiastic about moving forward on this for the first time in a while. Appreciated.

-- Duncan Thomas

On 27 July 2015 at 22:35, Gorka Eguileor gegui...@redhat.com wrote:

Hi all,

I know we've all been looking at the HA Active-Active problem in Cinder and trying our best to figure out possible solutions to the different issues, and since the current plan is going to take a while (because it requires that we first finish fixing Cinder-Nova interactions), I've been looking at alternatives that allow Active-Active configurations without needing to wait for those changes to take effect.

And I think I have found a possible solution, but since the HA A-A problem has a lot of moving parts I ended up upgrading my initial Etherpad notes to a post [1].

Even if we decide that this is not the way to go, which we'll probably do, I still think that the post brings a little clarity on all the moving parts of the problem, even some that are not reflected in our Etherpad [2], and it can help us not miss anything when deciding on a different solution.

Cheers,
Gorka.

[1]: http://gorka.eguileor.com/a-cinder-road-to-activeactive-ha/
[2]: https://etherpad.openstack.org/p/cinder-active-active-vol-service-issues