Hi Michael,

Thanks a lot for all the details about the transitions between states. As you said, my LB did move from PENDING_UPDATE to ACTIVE, but its operating status was still OFFLINE this morning because the health manager kept dropping the UDP heartbeat packets it received.
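Since the drops come from the heartbeat HMAC check, here is the rough simulator I had in mind for reproducing the envelope offline. This is only a sketch based on my reading of status_message.py, not Octavia code: the digest serialization differs between releases, and the key, the health manager address and the message fields below are placeholders.

    # toy_heartbeat.py - rough simulation of an amphora heartbeat envelope.
    # My reading of octavia/amphorae/backends/health_daemon/status_message.py,
    # not a copy of it: the payload appears to be zlib-compressed JSON with an
    # HMAC-SHA256 of the compressed bytes appended, keyed with heartbeat_key.
    # Double-check against the status_message.py shipped in your amphora image.
    import hashlib
    import hmac
    import json
    import socket
    import zlib

    HEARTBEAT_KEY = b"insecure"          # must match [health_manager] heartbeat_key
    HM_ADDR = ("172.27.201.100", 5555)   # placeholder health manager IP and port

    def wrap(obj, key):
        """Build an envelope the way I understand the amphora agent does."""
        payload = zlib.compress(json.dumps(obj).encode("utf-8"))
        digest = hmac.new(key, payload, hashlib.sha256).digest()  # 32 raw bytes
        # Some releases append hexdigest() (64 ASCII chars) instead, which is
        # what the compatibility code Michael linked has to cope with.
        return payload + digest

    def unwrap(envelope, key):
        """Mimic the health manager check: recompute the HMAC and compare."""
        payload, msg_hmac = envelope[:-32], envelope[-32:]
        calculated = hmac.new(key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(calculated, msg_hmac):
            raise ValueError("calculated hmac not equal to msg hmac, dropping")
        return json.loads(zlib.decompress(payload))

    if __name__ == "__main__":
        msg = {"id": "fake-amphora-id", "ver": 1, "listeners": {}, "pools": {}}
        envelope = wrap(msg, HEARTBEAT_KEY)
        print(unwrap(envelope, HEARTBEAT_KEY))  # local round-trip sanity check
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(envelope, HM_ADDR)

A local round trip that passes while the real health manager still drops the packet would point at a key or digest-format mismatch rather than a connectivity problem.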
When I mentioned the health manager reaching the amphora on port 9443, I of course didn't mean that this connection uses the heartbeat key.

I just compared my amphora and Octavia control plane versions, and they were out of sync: the amphora agent reports "%prog 3.0.0.0b4.dev6" while my Octavia control plane services were on "%prog 2.0.1". I updated the control plane to stable/rocky this morning, which brings it to "%prog 3.0.1".

I'll keep watching for the issue, but for now it seems to have vanished, as I now get the following messages:

2018-10-24 11:58:54.620 24 DEBUG futurist.periodics [-] Submitting periodic callback 'octavia.cmd.health_manager.periodic_health_check' _process_scheduled /usr/lib/python2.7/site-packages/futurist/periodics.py:639
2018-10-24 11:58:57.620 24 DEBUG futurist.periodics [-] Submitting periodic callback 'octavia.cmd.health_manager.periodic_health_check' _process_scheduled /usr/lib/python2.7/site-packages/futurist/periodics.py:639
2018-10-24 11:59:00.620 24 DEBUG futurist.periodics [-] Submitting periodic callback 'octavia.cmd.health_manager.periodic_health_check' _process_scheduled /usr/lib/python2.7/site-packages/futurist/periodics.py:639
2018-10-24 11:59:03.620 24 DEBUG futurist.periodics [-] Submitting periodic callback 'octavia.cmd.health_manager.periodic_health_check' _process_scheduled /usr/lib/python2.7/site-packages/futurist/periodics.py:639
2018-10-24 11:59:04.557 23 DEBUG octavia.amphorae.drivers.health.heartbeat_udp [-] Received packet from ('172.27.201.105', 48342) dorecv /usr/lib/python2.7/site-packages/octavia/amphorae/drivers/health/heartbeat_udp.py:187
2018-10-24 11:59:04.619 45 DEBUG octavia.controller.healthmanager.health_drivers.update_db [-] Health Update finished in: 0.0600640773773 seconds update_health /usr/lib/python2.7/site-packages/octavia/controller/healthmanager/health_drivers/update_db.py:93

I'll update you as I keep investigating, but so far the issue seems to be resolved. I'll also tune the retry timeouts down, as my LB takes a very long time to create listeners/pools and reach an ONLINE status.

Thanks a lot!
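For reference, these are the retry knobs I plan to lower first. This is only a sketch: the option names are from the [haproxy_amphora] section as I understand it, and the values are just what I intend to try, not recommendations.

    [haproxy_amphora]
    # If I read the sample config right, the defaults (roughly 300 retries
    # x 5 s) are where the ~25 minutes Michael mentions comes from; lower
    # them for a production deployment where amphorae come up quickly.
    connection_max_retries = 60
    connection_retry_interval = 5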
On Tue, Oct 23, 2018 at 7:09 PM Michael Johnson <johnso...@gmail.com> wrote:
>
> Are the controller and the amphora using the same version of Octavia?
>
> We had a python3 issue where we had to change the HMAC digest used. If
> your controller is running an older version of Octavia than your
> amphora images, it may not have the compatibility code to support the
> new format. The compatibility code is here:
> https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/health_daemon/status_message.py#L56
>
> There is also a release note about the issue here:
> https://docs.openstack.org/releasenotes/octavia/rocky.html#upgrade-notes
>
> If that is not the issue, I would double-check the heartbeat_key in
> the health manager configuration files and inside one of the amphorae.
>
> Note that this key is only used for health heartbeats and stats; it
> is not used for the controller-to-amphora communication on port 9443.
>
> Also, load balancers cannot get "stuck" in PENDING_* states unless
> someone has killed the controller process that was actively working on
> that load balancer. By killed I mean a non-graceful shutdown of the
> process that was in the middle of working on the load balancer.
> Otherwise all code paths lead back to ACTIVE or ERROR status after it
> finishes the work or gives up retrying the requested action.
> Check your controller logs to make sure this load balancer is not still
> being worked on by one of the controllers. The default retry timeouts
> (some are up to 25 minutes) are very long (it will keep trying to
> accomplish the request) to accommodate very slow (VirtualBox) hosts
> and the test gates. You will want to tune those down for a production
> deployment.
>
> Michael
>
> On Tue, Oct 23, 2018 at 7:09 AM Gaël THEROND <gael.ther...@gmail.com> wrote:
> >
> > Hi guys,
> >
> > I'm wrapping up my POC for Octavia and, after solving a few issues with
> > my configuration, I'm close to having a properly working setup.
> > However, I'm facing a small but annoying bug: the health manager receives
> > the amphora heartbeat UDP packets, considers them invalid, and drops them.
> >
> > Here are the messages that can be found in the logs:
> >
> > 2018-10-23 13:53:21.844 25 WARNING octavia.amphorae.backends.health_daemon.status_message [-] calculated hmac: faf73e41a0f843b826ee581c3995b7f7e56b5e5a294fca0b84eda426766f8415 not equal to msg hmac: 6137613337316432636365393832376431343337306537353066626130653261 dropping packet
> >
> > which comes from this part of the health manager code:
> > https://docs.openstack.org/octavia/pike/_modules/octavia/amphorae/backends/health_daemon/status_message.html#get_payload
> >
> > The annoying thing is that I don't understand why the UDP packet is
> > considered stale, nor how I can reproduce the payload that is sent to
> > the health manager.
> > I'm willing to write a simple Python program to simulate the heartbeat
> > payload, but I don't know exactly what the message contains and I think
> > I'm missing some information.
> >
> > Both the health manager and the amphora use the same heartbeat_key, and
> > they can reach each other on the network, as the initial health manager
> > to amphora connection on port 9443 is validated.
> >
> > As a result, my load balancer is stuck in PENDING_UPDATE.
> >
> > Do you have any idea how I can handle this, or has anyone else out there
> > already seen something like it?
> >
> > Kind regards,
> > G.
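PS: for anyone else who lands on this thread, the digest change Michael mentions is easy to see in isolation. If I understand it right, the two formats his compatibility code has to accept differ only in how the HMAC is serialized (toy values below, nothing Octavia-specific):

    import hashlib
    import hmac

    key, payload = b"toy-key", b"toy-compressed-payload"
    mac = hmac.new(key, payload, hashlib.sha256)
    print(len(mac.digest()))     # 32 raw bytes
    print(len(mac.hexdigest()))  # 64 ASCII hex characters

    # A receiver expecting one serialization while the sender appends the
    # other ends up with the "calculated hmac ... not equal to msg hmac ..."
    # warning even when the heartbeat_key matches on both sides.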
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators