Hi Kris,
I'm adding Shi Han Zhang to the thread.
I was involved in some refactors during Kilo, and Han Zhang in some
extra fixes during Liberty [1] [2] [3].
Could you get us some logs of such failures so we can see what was
happening around the failure time? At a minimum, we should
post the error traces from the logs to a bug at https://bugs.launchpad.net/neutron
We will be glad to use that information to make the ipset handling more
fault tolerant, and to try to identify the cause of the
possible race conditions.
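In case it helps with collecting those traces, here is a minimal sketch (not
an official tool) that pulls the lines around each ipset or iptables-restore
mention out of an agent log. The function name, the default pattern and the
context sizes are assumptions made for illustration; pass the path to your
own agent log on the command line.

```python
import re
import sys
from typing import Iterable, List


def extract_context(lines: Iterable[str], pattern: str,
                    before: int = 3, after: int = 3) -> List[str]:
    """Return every line matching `pattern` plus its surrounding context.

    Overlapping context windows are merged, so no line is returned twice.
    """
    buf = list(lines)
    rx = re.compile(pattern)
    keep = set()
    for i, line in enumerate(buf):
        if rx.search(line):
            start = max(0, i - before)
            end = min(len(buf), i + after + 1)
            keep.update(range(start, end))
    return [buf[i] for i in sorted(keep)]


if __name__ == "__main__":
    # Usage: python extract_context.py <path-to-agent-log>
    # The pattern below is an assumption; adjust it to the errors you see.
    with open(sys.argv[1], errors="replace") as fh:
        for line in extract_context(fh, r"ipset|iptables-restore",
                                    before=20, after=20):
            sys.stdout.write(line)
```

The output can be attached to the bug report as-is; widening `before` and
`after` gives more of the surrounding agent activity.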
[1] https://review.openstack.org/#/c/187483/
[2] https://review.openstack.org/190991
[3] https://review.openstack.org/#/c/187433/
Kris G. Lindgren wrote:
We have been using ipsets since juno. Twice now since our kilo
upgrade we have had issues with ipsets blowing up on a compute node.
The first time, iptables was referencing an ipset that was either
no longer there or was never added, and it was trying to apply the iptables
config every second and dumping the full iptables-restore output into
the log at TRACE level each time it failed.
The second time, ipsets was failing to remove an element that was
no longer there.
I solved #1 by restarting the neutron-openvswitch-agent. For #2
we just added the entry that ipsets was trying to remove. It seems
like we are having some race conditions under kilo that were not
present under juno (or we managed to run it for 6+ months without it
biting us).
Is anyone else seeing the same problems? I am noticing some commits
reverting/re-adding changes around ipsets in kilo and liberty, so I am trying to
confirm whether I need to open a new bug on this.
____________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy, LLC.
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators