Re: [Openstack-operators] expanding to 2nd location
Just to add my $0.02: we run in multiple sites as well, using regions. Cells have a lot going for them, but we felt they weren't there yet, and we don't have the resources to carry our own changes the way a few other places do. Given that, the one hard requirement we set was that items such as tenant and user IDs remain the same across sites. That allows us to do show-back reporting, and it makes things easier on the user base when they want to deploy from one region to another.

With that requirement, we used Galera in the same manner many others have mentioned, and deployed Keystone pointing at that Galera DB. That is the only database replicated across sites; everything else (Nova, Neutron, etc.) stays within its own location.

The only really confusing piece for our users is the dashboard. When you first go to the dashboard, there is a dropdown to select a region. Many users think it will send them to a particular location, so that their information from that location will show up. It really selects which region to authenticate against; once you are in the dashboard, you can select which project you want to see. That has been a major point of confusion, and I think our solution is just to rename that text.

On Tue, May 5, 2015 at 11:46 AM, Clayton O'Neill clay...@oneill.net wrote:

On Tue, May 5, 2015 at 11:33 AM, Curtis serverasc...@gmail.com wrote:
Do people have any comments or strategies on dealing with Galera replication across the WAN using regions? Seems like something to try to avoid if possible, though it might not be possible. Any thoughts on that?

We're doing this with good luck. A few things I'd recommend being aware of:

- Set galera_group_segment (the underlying Galera option is gmcast.segment, set via wsrep_provider_options) so that each site is in a separate segment. This makes Galera smarter about replication and about state transfer.
- Look at the timers and tunables in Galera and make sure they make sense for your network.
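Joseph's layout above comes down to pointing every region's Keystone at the one replicated database while all other services stay local. A minimal sketch of the relevant keystone.conf fragment; the VIP hostname and credentials are placeholders, not anything from the thread:

```ini
# keystone.conf -- the same in every region. Only Keystone's database is
# replicated across sites, so only Keystone points at the Galera VIP.
# "galera-vip.example.com" and the credentials below are hypothetical.
[database]
connection = mysql+pymysql://keystone:KEYSTONE_DBPASS@galera-vip.example.com/keystone

# Nova, Neutron, etc. keep their [database] connection strings pointed
# at a database that lives entirely within their own region.
```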
We've got lots of bandwidth and lowish latency (37ms), so the defaults have worked pretty well for us.
- Make sure that when you do provisioning in one site, you don't have CM tools in the other site breaking things. We ran into issues during our first deploy like this where Puppet made a change to a user in one site, and Puppet in the other site reverted the change almost immediately. You may have to tweak your deployment process to deal with that sort of thing.
- Make sure you're running Galera or Galera Arbitrator in enough sites to maintain quorum if you have issues. We run 3 nodes in one DC and 3 nodes in another DC for Horizon, Keystone and Designate, with a Galera arbitrator in a third DC to settle ties.
- Lastly, the obvious one: stay up to date on patches. Galera is pretty stable, but we have run into bugs that we had to get fixes for.
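Clayton's segment and timer advice maps onto the Galera provider options. A hedged my.cnf sketch; the evs.* values are illustrative WAN-minded examples, not Clayton's actual settings:

```ini
# my.cnf on each Galera node. gmcast.segment groups nodes by site so that
# inter-site replication traffic is relayed between segments rather than
# sent node-to-node, and state-transfer donors are preferred from the
# local segment. Use a different segment number per site (e.g. 0 and 1).
# The evs.* timeouts below are example values for a higher-latency link.
[mysqld]
wsrep_provider_options = "gmcast.segment=1; evs.keepalive_period=PT3S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.install_timeout=PT1M"
```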
Re: [Openstack-operators] expanding to 2nd location
+1 to second site = second region. I would not recommend using cells unless you have a real Nova scalability problem; there are a lot of caveats/gotchas. Cells v2 should come as an experimental feature in Liberty, and past that point cells will be the default mode of operation. It will probably be much easier to go from no cells to cells v2 than from cells v1 to v2.

Mike

From: Joseph Bajin
Date: Wednesday, May 6, 2015 at 8:06 AM
Cc: OpenStack Operators
Subject: Re: [Openstack-operators] expanding to 2nd location
Re: [Openstack-operators] expanding to 2nd location
On Mon, May 4, 2015 at 9:42 PM, Tom Fifield t...@openstack.org wrote:
Do you need users to be able to see it as one cloud, with a single API endpoint?

Define need :) As many of you know, my cloud is a university system, and researchers are nothing if not lazy, in the best possible sense of course :) So having a single API and scheduler so users don't *need* to think about placement, while using AZs so that they can (as Tim mentions a little further down the thread), is very attractive. Managing complexity is also important, since we have about 1 FTE equivalent (split between two or three actual humans) to manage our cloud.

For partially technical, partially political reasons we will not have the same IP networks available at the second location. With a bit of heavy lifting on my end I could probably change this, but if I did, all the L3 would need to be routed for one of the sites (because $reasons, trust me). So given that users would need to pick which network to use, which would in fact be picking which site to launch in, it sounds like it should rather be a region. So Joe's model, where Region2 slaves off Region1 for Keystone and Glance, is looking like the best fit. We could force users to balance across regions by splitting their quota, using the non-unified quota model to our advantage.

Though I still have a bit of reading to do, apparently, since I'd forgotten about the Architecture Design Guide Jesse pointed out: http://docs.openstack.org/arch-design/content/multi_site.html

Thanks all,
-Jon

___ OpenStack-operators mailing list OpenStack-operators@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [Openstack-operators] expanding to 2nd location
We do it with some of our databases (Horizon, Designate, and Keystone), and we run an arbitrator process (garbd) in a third DC. We have lots of low-latency bandwidth, but you still have to be careful with it. My recommendation: know your network well and have good monitoring in place.

On Tue, May 5, 2015 at 9:33 AM, Curtis serverasc...@gmail.com wrote:
Do people have any comments or strategies on dealing with Galera replication across the WAN using regions? Seems like something to try to avoid if possible, though it might not be possible. Any thoughts on that? Thanks, Curtis.

On Mon, May 4, 2015 at 3:11 PM, Jesse Keating j...@bluebox.net wrote:
I agree with Subbu. You'll want that to be a region so that the control plane is mostly contained. Only Keystone (and Swift, if you have that) would be doing lots of site-to-site communication to keep databases in sync. http://docs.openstack.org/arch-design/content/multi_site.html is a good read on the topic. - jlk
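The garbd arrangement above is really quorum arithmetic: Galera stays writable only while a strict majority of voters is reachable, and a symmetric two-site cluster can never guarantee that on its own. A small sketch of the majority rule (not Galera code, just the arithmetic behind the third-DC arbitrator):

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """True if the reachable members form a strict majority of voters."""
    return reachable > cluster_size / 2

# Two sites with 3 Galera nodes each: if the inter-site link fails, each
# side sees only 3 of 6 voters -- neither has a majority, so both stall.
assert not has_quorum(3, 6)

# Adding garbd in a third DC makes 7 voters: the site that can still
# reach the arbitrator sees 4 of 7 and keeps serving writes.
assert has_quorum(4, 7)
```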
Re: [Openstack-operators] expanding to 2nd location
CERN runs two data centres, in Geneva (3.5MW) and Budapest (2.7MW), around 1,200 km apart. We have two 100Gb/s links between the two sites and latency of around 22ms. We run this as a single cloud with 13 cells; each cell is in only one data centre. We wanted a single API endpoint from the user perspective, and thus we did not use regions.

There are things to consider, such as:
- Availability zone set-up, so that people can choose which centre to place work in (for disaster recovery, say)
- Scheduling of work for projects, and localisation of the volumes for those VMs (we've not found a good solution for this one)

In an ideal world, we'd have a high-availability API layer for the cells across the two sites. We've not got that far yet.

Tim

On 5/5/15, 3:42 AM, Tom Fifield t...@openstack.org wrote:
Do you need users to be able to see it as one cloud, with a single API endpoint?
Re: [Openstack-operators] expanding to 2nd location
I suggest building a new AZ (“region” in OpenStack parlance) in the new location. In general I would avoid setting up the control plane to operate across multiple facilities unless the cloud is very large.
[Openstack-operators] expanding to 2nd location
Hi All,

We're about to expand our OpenStack cloud to a second datacenter. Anyone have opinions they'd like to share as to what I should be worrying about, or how to structure this? Should I be thinking cells or regions (or maybe both)? Any obvious or not-so-obvious pitfalls I should try to avoid?

Current scale is about 75 hypervisors, running Juno on Ubuntu 14.04, using Ceph for volume storage, ephemeral block devices, and image storage (as well as object store). Bulk data storage for most (but by no means all) of our workloads is at the current location (not that that matters, I suppose).

The second location is about 150km away and we'll have 10G (at least) between sites. The expansion will be approximately the same size as the existing cloud, maybe slightly larger, and given site capacities the new location is also more likely to be where any future growth goes.

Thanks,
-Jon