Mark,

thanks for responding, i think you might be wrong. the master/slave
state is only the initial state, and current BACKUP/BACKUP/MASTER
settings are just to illustrate the issue.
yes you're right machine 3 won't get master in a stationary state, i
configured it as master just to show the problem more clear/reproducible
without to have cutting cables etc.
it's to demonstrate, that when 2 vrrp machines (1 & 2) are just running
fine, it needs only 1 (yes one) vrrp "i'm master" packet from machine 3.
so as soon as you start keepalived on machine 3 you will produce that
packet, and even if you stop machine 3 (immediate) the harm is done, and
machine 1 and 2 get and keep haystack (vrrp loop like crazy and flood
your network with multi-cast).
So basicaly a blackhat which is able to drop a matching (vrrp
password/etc) packet in the local network with a regular dual
vrrp/keepalived config, will get your network down. our case was
triggered in a quad vrrp setup.
My patch, perhaps not the ultimate one (but upstream keepalived mailing
list seems to be dead), prevents that.

regards,

On Tue, 2010-11-02 at 16:30 +0000, Mark Foster wrote:

> Noting that the original configuration shown looks like a problem. The
> prio settings on the MASTER (machine3) are too low. In keepalived,
> higher priority wins, so machine1 and machine3 are probably trying to
> fight it out. I suggest the situation can be avoided by using state
> MASTER on all three machines, or by swapping the priority settings of
> machines 1 and 3 so that 3 has prio 250 and 1 has 150.
> 
> -- 
> keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0 but perhaps 
> all?)
> https://bugs.launchpad.net/bugs/619712
> You received this bug notification because you are a direct subscriber
> of the bug.
> 
> Status in Release Notes for Ubuntu: Invalid
> Status in “keepalived” package in Ubuntu: Incomplete
> Status in “keepalived” source package in Maverick: Won't Fix
> 
> Bug description:
> Binary package hint: keepalived
> 
> Hello,
> 
> About a keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0)
> and probably all other versions.
> 
> Impact: possible complete/partial datacenter disruptions due to gratious
> arp/multicast floods
> 
> 
> 
> The situation to reproduce the race condition
> 
> The race condition got triggered in our situation with 1 cluster of 4
> machines, sharing  3 (vlan) interfaces.
> 
> Reason for doing doing so, not really importand, using vrrp and redundant
> cluster (2x2) for changing the default gateway, in case of a lease line
> failure.
> 
> 
> The most basic setup, and easyest way to trigger is a subset of above
> -3 machines, with keepalived
> -2 interfaces/machine, defined in keepalived as vrrp_instance
> -1 vrrp_sync_group, with those interfaces included
> 
> some key settings:
> 
> machine1:
>       initial state: state BACKUP
>       int 1, prio 250
>       int 2, prio 250
> machine2:
>       initial state: state BACKUP
>       int 1, prio 200
>       int 2, prio 200
> machine3:
>       initial state: state MASTER
>       int 1, prio 150
>       int 2, prio 150
> 
> 
> 
> Machine3 is set to master, just to trigger the race condition, but in
> initial state BACKUP, a connection failure (without a link down) will do the
> same, which is the situation we encountered.
> 
> 
> WARNING: below may/will cause network disruption, so be prepared.
> 
> The way to trigger, is having started keepalived on machine 1 & 2, where
> machine 1 gently becomes a master as meant/configured.
> 
> Now, start machine3, and here it happens, as soon machine sends a single
> vrrp message (as it want's to be master), machine 1 & 2 get's crazy(haywire), 
> and
> come in a race condition which will never stop, until you stop one of
> machine 1 or 2. Stopping machine 3 will definitly NOT stop the race.
> 
> 
> A sample of the relevant logging of the race:
> 
> machine 1:
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV610) Received 
> lower prio advert, forcing new election
> Aug 16 13:22:27 saaplsprd01 Keepalived_vrrp: VRRP_Instance(AMV910) Received 
> lower prio advert, forcing new election
> 
> 
> machine 2:
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV610) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Instance(AMV910) forcing a 
> new MASTER election
> Aug 16 13:22:27 saaplsprd02 Keepalived_vrrp: VRRP_Group(GROUP1) Transition to 
> MASTER state
> 
> 
> 
> After some review of the keepalived code, and analyzed traces, and
> researching the situation, i came to a fix of the issue.  Be aware i've
> never looked at the keepalived code before, so perhaps, there may be a more
> ultimate fix.
> 
> This patch prevents sending another vrrp message in the
> vrrp_sync.c: vrrp_sync_master_election() when already in the isync->wantstate 
> is in a
> VRRP_STATE_GOTO_MASTER  state.
> 
> old code in vrrp_sync.c:
> void
> vrrp_sync_master_election(vrrp_rt * vrrp)
> {
>         vrrp_rt *isync;
>         vrrp_sgroup *vgroup = vrrp->sync;
>         list l = vgroup->index_list;
>         element e;
> 
>         if (vrrp->wantstate != VRRP_STATE_GOTO_MASTER)
>                 return;
>         if (GROUP_STATE(vgroup) == VRRP_STATE_FAULT)
>                 return;
> 
>         log_message(LOG_INFO, "VRRP_Group(%s) Transition to MASTER state",
>                GROUP_NAME(vgroup));
> 
>         /* Perform sync index */
>         for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
>                 isync = ELEMENT_DATA(e);
>                 if (isync != vrrp) {
>                         /* Force a new protocol master election */
>                         isync->wantstate = VRRP_STATE_GOTO_MASTER;
>                         log_message(LOG_INFO,
>                                "VRRP_Instance(%s) forcing a new MASTER 
> election",
>                                isync->iname);
>                         vrrp_send_adv(isync, isync->effective_priority);
>                 }
>         }
> }
> 
> 
> patched code in vrrp_sync.c:
> void
> vrrp_sync_master_election(vrrp_rt * vrrp)
> {
>         vrrp_rt *isync;
>         vrrp_sgroup *vgroup = vrrp->sync;
>         list l = vgroup->index_list;
>         element e;
> 
>         if (vrrp->wantstate != VRRP_STATE_GOTO_MASTER)
>                 return;
>         if (GROUP_STATE(vgroup) == VRRP_STATE_FAULT)
>                 return;
> 
>         log_message(LOG_INFO, "VRRP_Group(%s) Transition to MASTER state",
>                GROUP_NAME(vgroup));
> 
>         /* Perform sync index */
>         for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
>                 isync = ELEMENT_DATA(e);
>                 if (isync != vrrp) {
>                         if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) {   
>   // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 
> or more shared interfaces in 1 group 
>                         /* Force a new protocol master election */
>                         isync->wantstate = VRRP_STATE_GOTO_MASTER;
>                         log_message(LOG_INFO,
>                                "VRRP_Instance(%s) forcing a new MASTER 
> election",
>                                isync->iname);
>                         vrrp_send_adv(isync, isync->effective_priority);
>                         }
>                 }
>         }
> }
> 
> 
> diff keepalived-1.1.17:
> 
> # diff  -u vrrp_sync.c.orig_ vrrp_sync.c
> --- vrrp_sync.c.orig_ 2010-08-17 14:12:30.944634215 +0200
> +++ vrrp_sync.c       2010-08-17 14:15:05.964634624 +0200
> @@ -157,12 +157,14 @@
>       for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
>               isync = ELEMENT_DATA(e);
>               if (isync != vrrp) {
> +                     if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) {     
> // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or 
> more shared interfaces in 1 group 
>                       /* Force a new protocol master election */
>                       isync->wantstate = VRRP_STATE_GOTO_MASTER;
>                       log_message(LOG_INFO,
>                              "VRRP_Instance(%s) forcing a new MASTER 
> election",
>                              isync->iname);
>                       vrrp_send_adv(isync, isync->effective_priority);
> +                     }
>               }
>       }
>  }
> 
> 
> diff keepalived-1.2.0:
> 
> # diff -u vrrp_sync.c.orig_ vrrp_sync.c
> --- vrrp_sync.c.orig_ 2010-08-18 08:17:52.766910325 +0200
> +++ vrrp_sync.c       2010-08-18 08:20:23.216968929 +0200
> @@ -155,12 +155,14 @@
>       for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
>               isync = ELEMENT_DATA(e);
>               if (isync != vrrp) {
> +                     if ( isync->wantstate != VRRP_STATE_GOTO_MASTER ) {     
> // Fix ArFi 2010/08/16 to prevent VRRP loops with 3 or more members and 2 or 
> more shared interfaces in 1 group
>                       /* Force a new protocol master election */
>                       isync->wantstate = VRRP_STATE_GOTO_MASTER;
>                       log_message(LOG_INFO,
>                              "VRRP_Instance(%s) forcing a new MASTER 
> election",
>                              isync->iname);
>                       vrrp_send_adv(isync, isync->effective_priority);
> +                     }
>               }
>       }
>  }
> 
> 
> 
> 
> 
> 
> 
> This solves our issue, and we did some basic testing with good results, but 
> it may be wise an
> experienced keepalived developer should have a look at it too.
> 
> 
> PS: marked this as a security vulnerability as it _may_ bring a 
> system/datacenter
> down, by just a single (vrrp) packet.
> Posted this bugreport as well on the keepalived-de...@lists.sourceforge.net 
> ML.
> 
> 
> Regards,
> 
> Arjan Filius
> iafil...@xs4all.nl
> 
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/ubuntu-release-notes/+bug/619712/+subscribe
>

-- 
keepalived vrrp race condition and fix (versions 1.1.17 and 1.2.0 but perhaps 
all?)
https://bugs.launchpad.net/bugs/619712
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to keepalived in ubuntu.

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs

Reply via email to