Re: [Openstack-operators] Neutron getting stuck creating namespaces
On Tue, Nov 24, 2015 at 8:26 AM, Bajin, Joseph wrote:
> We haven’t seen the bad namespaces issue, but we have experienced an issue
> where our node eventually started to see soft lockups like these:
>
> kernel: BUG: soft lockup - CPU#0 stuck for 22s!
>
> We noticed it once we hit a high amount of namespaces. It was definitely over
> 400, as we didn’t realize that the option to delete namespaces was reverted
> from true to false a few releases ago.

Note that since the Liberty release the option is set to True by default, so DHCP and router namespaces are deleted whenever their respective resources are.

> We cleaned up the namespaces and those errors would stop showing up, then
> eventually over time those namespaces rose again to a high level, and this
> time we were lucky to have the soft lockup not on the neutron process, but
> on the kernel scheduler. That is where our reboot happened as the system
> realized that it was dead and restarted it.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
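To see how a given agent is configured with respect to namespace deletion, something like the following sketch may help. The thread never names the option, so the names `dhcp_delete_namespaces` and `router_delete_namespaces`, and their placement in the `[DEFAULT]` section of the DHCP and L3 agent config files, are assumptions here and should be checked against the configuration reference for your release. The helper name `namespace_cleanup_settings` is made up for this example.

```python
import configparser

# Assumed option names (not stated in the thread); verify them for your release.
NAMESPACE_OPTIONS = ("dhcp_delete_namespaces", "router_delete_namespaces")


def namespace_cleanup_settings(ini_text, default=False):
    """Return {option: bool} for the namespace-deletion options in ini_text.

    `default` is the value assumed when an option is absent from the file:
    False models the pre-Liberty behaviour described in the thread,
    True models Liberty.
    """
    # interpolation=None and strict=False keep the parser tolerant of '%'
    # characters and repeated keys, which agent config files may contain.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return {
        name: parser.getboolean("DEFAULT", name, fallback=default)
        for name in NAMESPACE_OPTIONS
    }


if __name__ == "__main__":
    sample = "[DEFAULT]\nrouter_delete_namespaces = True\n"
    print(namespace_cleanup_settings(sample))
```

Passing `default=True` gives the effective values under the Liberty default mentioned above; passing `default=False` shows what an older release would do with the same file.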
Re: [Openstack-operators] Neutron getting stuck creating namespaces
We haven’t seen the bad namespaces issue, but we have experienced an issue where our node eventually started to see soft lockups like these:

kernel: BUG: soft lockup - CPU#0 stuck for 22s!

We noticed it once we hit a high number of namespaces. It was definitely over 400, as we didn’t realize that the option to delete namespaces was reverted from true to false a few releases ago.

We cleaned up the namespaces and those errors stopped showing up. Then, over time, the number of namespaces rose again to a high level, and this time we were lucky to have the soft lockup not on the neutron process, but on the kernel scheduler. That is where our reboot happened, as the system realized that it was dead and restarted itself.
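One way to keep an eye on this symptom is to count soft-lockup reports per CPU in the kernel log. The sketch below matches only the message format quoted above; the function name `soft_lockups_per_cpu` is made up for this example, and where the log lines come from (syslog file, `dmesg` output, etc.) is left to the operator.

```python
import re
from collections import Counter

# Matches the message format quoted in the thread, e.g.
# "kernel: BUG: soft lockup - CPU#0 stuck for 22s!"
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def soft_lockups_per_cpu(log_lines):
    """Count soft-lockup reports per CPU number in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "kernel: BUG: soft lockup - CPU#0 stuck for 22s!",
        "kernel: some unrelated message",
    ]
    print(dict(soft_lockups_per_cpu(sample)))
```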
Re: [Openstack-operators] Neutron getting stuck creating namespaces
Hello Xav,

We also had problems with namespaces in Juno, maybe a little different from what you describe.

We are running about 250 namespaces on our network node. When we reboot the network node, we observe that some namespaces have their qr-* and qg-* interfaces missing.

We believe that is because the control plane in Neutron Juno performs very badly. This is probably fixed in Kilo.

To work around it, after the network node is up and running, we reset the namespaces that have interfaces missing:

    neutron router-update --admin-state-up false
    sleep 5
    neutron router-update --admin-state-up true

Saverio

2015-11-24 9:51 GMT+01:00 Xav Paice:
> Neutron is Juno, on Trusty boxes with the 3.19 LTS kernel. We're in the
> process of updating to Kilo, and onwards to Liberty.
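The workaround above could be scripted roughly as follows. This is only a sketch under several assumptions: router namespaces are named `qrouter-<router-id>`, the interface listing comes from running `ip -o link show` inside the namespace, and the router ID is passed as a positional argument to `neutron router-update` (the commands in the message do not show where the ID goes). The function names are made up for this example, and the code only builds the command lists; actually running them is left to the operator. Note also that a router with no gateway or no attached subnets legitimately lacks qg-* or qr-* ports, so a missing prefix is a hint, not proof of a broken namespace.

```python
ROUTER_NS_PREFIX = "qrouter-"  # assumed naming scheme: qrouter-<router-id>


def interface_names(ip_link_output):
    """Extract interface names from `ip -o link show` output (one link per line)."""
    names = []
    for line in ip_link_output.splitlines():
        fields = line.split(":", 2)
        if len(fields) < 3:
            continue
        # Drop any "@peer" suffix, e.g. "qr-1234@if5" -> "qr-1234".
        names.append(fields[1].strip().split("@", 1)[0])
    return names


def missing_router_ports(ip_link_output):
    """Return the port prefixes ("qr-", "qg-") with no matching interface."""
    names = interface_names(ip_link_output)
    return [
        prefix
        for prefix in ("qr-", "qg-")
        if not any(name.startswith(prefix) for name in names)
    ]


def reset_commands(namespace, pause_seconds=5):
    """Build the down / sleep / up command sequence for one router namespace."""
    if not namespace.startswith(ROUTER_NS_PREFIX):
        raise ValueError("not a router namespace: %r" % namespace)
    router_id = namespace[len(ROUTER_NS_PREFIX):]
    # The position of router_id on the command line is an assumption.
    return [
        ["neutron", "router-update", router_id, "--admin-state-up", "false"],
        ["sleep", str(pause_seconds)],
        ["neutron", "router-update", router_id, "--admin-state-up", "true"],
    ]
```

Each inner list is an argument vector suitable for something like `subprocess.run`, so a wrapper can decide per namespace whether to reset it based on what `missing_router_ports` returns.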
Re: [Openstack-operators] Neutron getting stuck creating namespaces
Hello Xav,

What version of OpenStack are you running?

Thank you,
Saverio
[Openstack-operators] Neutron getting stuck creating namespaces
Hi,

Over the last few months we've had a few incidents where the process to create network namespaces (Neutron, OVS) on the network nodes gets 'stuck' and prevents not only the router it's trying to create from finishing, but all further namespace operations too.

This has usually finished up with either us rebooting the node pretty fast afterwards, or the node rebooting itself.

It looks very much like we're affected by https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1403152 but the notes say it's fixed in the kernel we're running. I've asked the clever person who checked it to make some extra notes in the bug report.

It looks very much like when we have a bunch of load on the box the thing is more likely to trigger - I was wondering if other ops have a max ratio of routers per network node? I would have thought our current max of 150 routers per node would be pretty light, but with the dhcp namespaces as well that's ~450 namespaces on a box and maybe that's an issue?

Thanks
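For comparing numbers between sites, a per-node namespace count broken down by type is handy; the ~450 figure above implies roughly 300 DHCP namespaces alongside the 150 routers. The sketch below tallies the output of `ip netns list`, assuming the usual `qrouter-` and `qdhcp-` name prefixes. The function name `count_namespaces` is made up for this example, and collecting the command output is left to the caller.

```python
from collections import Counter

PREFIXES = ("qrouter-", "qdhcp-")  # assumed Neutron namespace name prefixes


def count_namespaces(netns_output):
    """Tally namespaces by prefix from the text output of `ip netns list`."""
    counts = Counter()
    for line in netns_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Newer iproute2 versions may append "(id: N)"; keep only the name.
        name = parts[0]
        for prefix in PREFIXES:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    sample = "qrouter-1111\nqdhcp-2222 (id: 0)\nqdhcp-3333\n"
    print(dict(count_namespaces(sample)))
```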