Hi all,

I’ve been looking lately at image upgrades for Octavia. This covers using new
images for new load balancers, as well as for existing ones.

For the first case, the amp_image_tag option that I added in Mitaka seems to
do the job: all new balancers are created from the latest image carrying the
configured tag.
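
To make that concrete, here is a minimal sketch of how an operator might
publish a new image for that option to pick up, using openstacksdk (the cloud
name, image name, and tag value are made up; the tag just has to match
amp_image_tag in the [controller_worker] section of octavia.conf):

    import openstack

    # Assumes a 'mycloud' entry in clouds.yaml and
    # amp_image_tag = amphora in octavia.conf [controller_worker].
    conn = openstack.connect(cloud='mycloud')

    # Tag the freshly uploaded amphora image so that new balancers
    # are booted from it.
    image = conn.image.find_image('amphora-x64-haproxy-v2')
    conn.image.add_tag(image, 'amphora')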

As for balancers that already exist, the only way to get them to use a new
image is to trigger an instance failure, which should rebuild the failed nova
instance using the new image. AFAIU the failover process is not currently
automated: the user has to set the corresponding port to DOWN and wait for the
failover to be detected. I’ve heard there are plans to introduce a dedicated
command to trigger a quick failover, which would streamline the process and
cut the time it takes, because the failover would be detected and processed
immediately instead of waiting for the keepalived failure mode to kick in. Is
it on the horizon? Patches to review?
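
For illustration, a rough sketch of that manual port-to-DOWN trigger with
openstacksdk; the port ID here is hypothetical, and how you locate the
amphora’s port is deployment specific:

    import openstack

    conn = openstack.connect(cloud='mycloud')

    # Hypothetical ID of the amphora port to fail; looking it up is
    # deployment specific (e.g. via the octavia database).
    port = conn.network.get_port('11111111-2222-3333-4444-555555555555')

    # Administratively downing the port makes the amphora unreachable,
    # so the failure is eventually detected and the instance is
    # rebuilt from the newly tagged image.
    conn.network.update_port(port, is_admin_state_up=False)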

While failover seems rather promising and may be applicable in some
environments, I have several concerns about that approach that we may want to
address.

1. HA assumption. The approach assumes there is another node up and available
to serve requests while the instance is rebuilding. For non-HA amphorae,
that’s not the case, so the image upgrade comes with significant downtime.

2. Even with HA, the balancer cluster is degraded to a single node for the
duration of the rebuild.

3. (minor) During the upgrade, instances that belong to the same HA amphora
may run different versions of the image.

What’s the alternative?

One idea I’ve been toying with for some time is moving the upgrade complexity
one level up. Instead of making Octavia aware of upgrade intricacies, let it
do its job (load balancing), and use a neutron floating IP resource to flip
the switch from the old image to the new one. Let me elaborate.

Let’s say we have a load balancer LB1 running Image1. In this scenario, we
assume that access to the LB1 VIP is proxied through a floating IP (FIP) that
points to the LB1 VIP. Now the operator uploads a new Image2 to the glance
registry and tags it for octavia usage. The user wants to migrate the load
balancing function to the new image. To achieve this, the user follows these
steps (sketched in code below):

1. Create an independent clone of LB1 (let’s call it LB2) with exactly the
same attributes (members) as LB1.
2. Once LB2 is up and ready to process requests arriving at its VIP, redirect
the FIP to the LB2 VIP.
3. All new flows are now immediately redirected to the LB2 VIP, with no
downtime (for new flows) thanks to the atomic nature of the FIP update on the
backend (we use iptables-save/iptables-restore to update FIP rules on the
router).
4. Since LB1 no longer handles any flows, we can deprovision it. LB2 is now
the only balancer serving the members.
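
Here is a rough sketch of steps 1-4 using today’s openstacksdk proxies. The
names and the FIP address are made up, and recreating LB1’s listeners, pools
and members in step 1 is elided for brevity; the same flow can be driven
through the neutron and octavia clients instead:

    import openstack

    conn = openstack.connect(cloud='mycloud')

    # Step 1: clone LB1 as LB2 on the same VIP subnet. Copying over
    # listeners, pools and members is elided here.
    lb1 = conn.load_balancer.find_load_balancer('LB1')
    lb2 = conn.load_balancer.create_load_balancer(
        name='LB2', vip_subnet_id=lb1.vip_subnet_id)

    # Step 2: wait until LB2 is ACTIVE and ready to serve its VIP.
    lb2 = conn.load_balancer.wait_for_load_balancer(lb2.id)

    # Step 3: atomically repoint the FIP at the LB2 VIP port; all
    # new flows land on LB2 from here on.
    fip = conn.network.find_ip('203.0.113.10')  # the FIP fronting LB1
    conn.network.update_ip(fip, port_id=lb2.vip_port_id)

    # Step 4: LB1 no longer handles new flows; deprovision it.
    conn.load_balancer.delete_load_balancer(lb1.id, cascade=True)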

With that approach: 1) we provide consistent downtime expectations regardless
of the amphora architecture chosen (HA or not); 2) we flip the switch only
once the clone is up and ready, so the balancer function never runs in a
degraded state; 3) all instances in an HA amphora run the same image.

Of course, it won’t provide zero downtime for existing flows that the balancer
function may already be handling. That’s a limitation I believe is shared by
all approaches currently on the table.

As a side note, the approach would also work for other lbaas drivers, like
namespaces, e.g. in case we want to update haproxy.

A few questions regarding the topic:

1. Are there any drawbacks to the approach? Can we consider it an alternative
way of doing image upgrades that could find its way into the official
documentation?

2. If the answer is yes, how can I contribute this piece? Should I sync with
the other doc-related work that I know is currently ongoing in the team?

Ihar