[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
** Tags added: cscc

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1805920

Title:
  iPXE ignores vlan 0 traffic

Status in MAAS:
  Invalid
Status in ipxe package in Ubuntu:
  Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu:
  Invalid
Status in linux package in Ubuntu:
  Invalid
Status in ipxe source package in Trusty:
  Won't Fix
Status in ipxe source package in Xenial:
  Won't Fix
Status in ipxe source package in Bionic:
  Fix Released
Status in ipxe source package in Cosmic:
  Fix Released
Status in ipxe source package in Disco:
  Fix Released

Bug description:
  [Impact]

   * VLAN 0 is special (for QoS actually, not a real VLAN)
   * Some components in the stack accidentally strip it, so does ipxe in this case.
   * Fix by porting a fix that is carried by other distributions as upstream didn't follow the suggestion but it is needed for the use case affected by the bug here (Thanks Andres)

  [Test Case]

   * Comment #42 contains a virtual test setup to understand the case but it does NOT trigger the issue. That requires special switch HW that adds VLAN 0 tags for QoS. Therefore Vern (reporter) will test that on a customer site with such hardware being affected by this issue.

  [Regression Potential]

   * The only reference to VLAN tags on iPXE boot that we found was on iBFT boot for SCSI, we tested that in comment #34 and it still worked fine.
   * We didn't see such cases on review, but there might be use cases that made some unexpected use of the headers which are now stripped. But that seems wrong.

  [Other Info]

   * n/a

  ---

  I have three MAAS rack/region nodes which are blades in a Cisco UCS chassis. This is an FCE deployment where MAAS has two DHCP servers, infra1 is the primary and infra3 is the secondary. The pod VMs on infra1 and infra3 PXE boot fine but the pod VMs on infra2 fail to PXE boot. If I reconfigure the subnet to provide DHCP on infra2 (either as primary or secondary) then the pod VMs on infra2 will PXE boot but the pod VMs on the demoted infra node (that no longer serves DHCP) now fail to PXE boot.

  While commissioning a pod VM on infra2 I captured network traffic with tcpdump on the vnet interface.

  Here is the dump when the PXE boot fails (no dhcp server on infra2):
  https://pastebin.canonical.com/p/THW2gTSv4S/

  Here is the dump when PXE boot succeeds (when infra2 is serving dhcp):
  https://pastebin.canonical.com/p/HH3XvZtTGG/

  The only difference I can see is that in the unsuccessful scenario, the reply is an 802.1q packet -- it's got a vlan tag for vlan 0. Normally vlan 0 traffic is passed as if it is not tagged and indeed, I can ping between the blades with no problem. Outgoing packets are untagged but incoming packets are tagged vlan 0 -- but the ping works.

  It seems vlan 0 is used as a part of 802.1p to set priority of packets. This is separate from vlan, it just happens to use that ethertype to do the priority tagging.

  Someone confirmed to me that, in the iPXE source, it drops all packets if they are vlan tagged.

  The customer is unable to figure out why the packets between blades are getting vlan tagged so we either need to figure out how to allow iPXE to accept vlan 0 or the customer will need to use different equipment for the MAAS nodes.

  I found a conversation on the ipxe-devel mailing list that suggested a commit was submitted and signed off but that was from 2016 so I'm not sure what became of it. Notable messages in the thread:

  http://lists.ipxe.org/pipermail/ipxe-devel/2016-April/004916.html
  http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html

  Would it be possible to install a local patch as part of the FCE deployment? I suspect the patch(es) mentioned in the above thread would require some modification to apply properly.

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1805920/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
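The difference the reporter saw between the two captures (an 802.1Q header carrying VLAN ID 0) can be illustrated with a small decoder. This is only a sketch for reading raw Ethernet frames by hand, assuming the standard single-tag 802.1Q layout; the function name `describe_frame` and the returned dictionary keys are made up for this illustration and are not part of iPXE, MAAS, or tcpdump.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag


def describe_frame(frame: bytes) -> dict:
    """Classify a raw Ethernet frame as untagged, priority-tagged, or VLAN-tagged.

    Hypothetical helper for illustration only.
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-11 are the destination and source MAC; bytes 12-13 are the EtherType.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_8021Q:
        return {"kind": "untagged", "ethertype": ethertype}
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    # The tag is 2 bytes of TCI followed by the real (inner) EtherType.
    tci, inner = struct.unpack_from("!HH", frame, 14)
    pcp = tci >> 13           # 3-bit priority code point (802.1p)
    dei = (tci >> 12) & 0x1   # drop eligible indicator
    vid = tci & 0x0FFF        # 12-bit VLAN ID
    # VID 0 means "no VLAN, priority information only".
    kind = "priority-tagged" if vid == 0 else "vlan-tagged"
    return {"kind": kind, "pcp": pcp, "dei": dei, "vid": vid, "ethertype": inner}
```

A DHCP reply like the one in the failing capture would be reported as `priority-tagged` with `vid` 0, while the reply in the working capture would be `untagged`.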
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu2.2) bionic; urgency=medium

  * d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0 priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

 -- Andres Rodriguez  Mon, 10 Dec 2018 16:26:42 -0500

** Changed in: ipxe (Ubuntu Bionic)
       Status: Fix Committed => Fix Released
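The changelog entry describes the fix as stripping 802.1Q VLAN 0 priority tags. The idea can be sketched as follows. This is not the actual patch (iPXE is written in C and the patch itself is not quoted in this thread); it is a minimal illustration of the technique, and the name `strip_priority_tag` is hypothetical.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID value that marks an 802.1Q tag


def strip_priority_tag(frame: bytes) -> bytes:
    """Return the frame with a VLAN 0 (priority-only) tag removed.

    Frames that are untagged, or tagged with a non-zero VLAN ID, are
    returned unchanged. Hypothetical helper for illustration only.
    """
    if len(frame) < 18:
        return frame  # too short to carry an 802.1Q tag
    ethertype, tci = struct.unpack_from("!HH", frame, 12)
    if ethertype != ETH_P_8021Q:
        return frame  # untagged
    if tci & 0x0FFF != 0:
        return frame  # real VLAN, leave it alone
    # Drop the 4 tag bytes (TPID + TCI); the inner EtherType that follows
    # them moves into the normal EtherType position.
    return frame[:12] + frame[16:]
```

After stripping, the rest of the receive path sees an ordinary untagged frame, which matches the observation in the bug description that VLAN 0 traffic is normally passed as if it were not tagged.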
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu4.1) cosmic; urgency=medium

  * d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0 priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

 -- Andres Rodriguez  Mon, 10 Dec 2018 16:26:42 -0500

** Changed in: ipxe (Ubuntu Cosmic)
       Status: Fix Committed => Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
I have two nodes, bionic and cosmic. I enabled the proposed repo on each. I installed ipxe:

  apt install ipxe ipxe-qemu grub-ipxe

On bionic, this gave me:

  # apt list --installed ipxe-qemu grub-ipxe ipxe
  Listing... Done
  grub-ipxe/bionic-proposed,bionic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 all [installed,automatic]
  ipxe/bionic-proposed,bionic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 all [installed]
  ipxe-qemu/bionic-proposed,bionic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 all [installed]

On cosmic, this gave me:

  # apt list --installed ipxe ipxe-qemu grub-ipxe
  Listing... Done
  grub-ipxe/cosmic-proposed,cosmic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 all [installed,automatic]
  ipxe-qemu/cosmic-proposed,cosmic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 all [installed]
  ipxe/cosmic-proposed,cosmic-proposed,now 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 all [installed]

I launched a VM configured to netboot on each system. The VM PXE booted properly on both bionic and cosmic.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
As negative confirmation: I tested a PXE boot with 1.0.0+git-20180124.fbe8c52d-0ubuntu2.1 on bionic and 1.0.0+git-20180124.fbe8c52d-0ubuntu4 on cosmic. As expected, the VMs failed to PXE boot.

** Tags removed: verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

** Tags removed: verification-needed
** Tags added: verification-done
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Any update on testing?
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
s/Not/Note/
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Yes, sorry. I will try to test tonight (in a few hours) when I'm back at the hotel. Note that I only have bionic and xenial to test with. I can try to upgrade one of those to cosmic.

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Fix Committed
Status in ipxe source package in Cosmic: Fix Committed
Status in ipxe source package in Disco: Fix Released
To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1805920/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
@Vern - ping: please test to unblock this from bionic-/cosmic-proposed.

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Fix Committed
Status in ipxe source package in Cosmic: Fix Committed
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
@Vern - if there is any ETA for when you will get to the real SRU verifications that Brian requested a week ago, let us know.

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Fix Committed
Status in ipxe source package in Cosmic: Fix Committed
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Hello Vern, or anyone else affected,

Accepted ipxe into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: ipxe (Ubuntu Bionic)
   Status: Incomplete => Fix Committed

** Tags added: verification-needed-bionic

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Fix Committed
Status in ipxe source package in Cosmic: Fix Committed
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Hello Vern, or anyone else affected,

Accepted ipxe into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

** Changed in: ipxe (Ubuntu Cosmic)
   Status: Incomplete => Fix Committed

** Tags added: verification-needed verification-needed-cosmic

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Incomplete
Status in ipxe source package in Cosmic: Fix Committed
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Ok, thanks for the precheck, Vern. Now that we know you will be able to SRU-verify this, I have uploaded it to the SRU queue. Once it is accepted by the SRU Team there will be updates here asking for verification. Please do that for Bionic and Cosmic then - if you need any help, let us know. Eventually that will get the fix released into the Ubuntu Archive.

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Incomplete
Status in ipxe source package in Cosmic: Incomplete
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
After realizing there are packages in the ci build [1] I installed the following version from there: 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1

I redefined the testipxe vm from the above test, and it also successfully pxe booted.

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3560/+packages

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Incomplete
Status in ipxe source package in Cosmic: Incomplete
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
I have run a test on cosmic. The test involved MAAS 2.4.3 installed on bionic on 3 of the blades of the UCS chassis in the customer's data center. I installed cosmic (18.10) on a 4th blade, installed libvirt and qemu-kvm, and defined a VM similar to how MAAS defines VMs, with this XML: https://pastebin.ubuntu.com/p/yCTRGDjx2H/

  $ lsb_release -d
  Description:    Ubuntu 18.10

The ipxe-qemu version installed from dist is: 1.0.0+git-20180124.fbe8c52d-0ubuntu4

The attached screenshot is of the failed pxe boot of the testipxe vm.

I added the ppa:andreserl/maas apt repo and installed ipxe-qemu, which gave me version: 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1

Note that I had to edit /etc/apt/sources.list.d/andreserl-ubuntu-maas-cosmic.list, replacing "cosmic" with "bionic", because that repo doesn't have cosmic packages. I then had to downgrade ipxe-qemu because the cosmic version is greater than the one in the fix repo:

  # apt install ipxe-qemu=1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1

Once I jumped through those hoops, I booted the exact same testipxe vm that failed to pxe boot above and it succeeded in getting an IP and commissioning in MAAS.

** Attachment added: "screenshot of failed boot"
   https://bugs.launchpad.net/maas/+bug/1805920/+attachment/5228521/+files/Screen%20Shot%202019-01-11%20at%207.36.20%20PM.png

Status in MAAS: Invalid
Status in ipxe package in Ubuntu: Fix Released
Status in ipxe-qemu-256k-compat package in Ubuntu: Invalid
Status in linux package in Ubuntu: Invalid
Status in ipxe source package in Trusty: Won't Fix
Status in ipxe source package in Xenial: Won't Fix
Status in ipxe source package in Bionic: Incomplete
Status in ipxe source package in Cosmic: Incomplete
Status in ipxe source package in Disco: Fix Released
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Ok, once you know you'll be able to verify it on Cosmic as well (by successfully testing from the PPA) let me know. We will then upload it as a real SRU which needs verification per [1].

[1]: https://wiki.ubuntu.com/StableReleaseUpdates
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
I was able to verify the fix works for bionic using maas 2.4.3. Am about to install cosmic on the customer hardware to verify the fix there too.

I have run into a maybe-related issue with maas 2.5.0 filed separately here:
https://bugs.launchpad.net/maas/+bug/1811021
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
No response yet. We really want to fix it but need some way to verify it. Sorry to ask once again, but we need to get this unblocked.

@Vern - can you use the setup to verify the two planned uploads for Bionic and Cosmic?
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
I'll upload once I get a confirmation that it will be tested on both releases on the affected HW.

** Description changed:

  [Test Case]

-  * TODO
+  * Comment #42 contains a virtual test setup to understand the case but it
+    does NOT trigger the issue. That requires special switch HW that adds
+    VLAN 0 tags for QoS. Therefore Vern (reporter) will test that on a
+    customer site with such hardware being affected by this issue.
Re: [Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Thanks for the details, but that means the MAAS team might have enough systems, yet not the right special HW.

Therefore @Vern, could you do the pre-checks with your associated customer, so that you can verify the bug on their setup before we push it as an SRU? As I said, you only need to have Bionic/Cosmic on the target KVM host that is supposed to spawn the pod and currently fails. IMHO all other components can stay as-is.

Please let me know if that would be possible.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
FWIW, Cisco documentation here states that priority tagging is enabled by default:

https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html

"Default Settings
VLAN 0 priority tagging is enabled by default."
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
It seems an important component to the failure scenario is the hardware. The customer equipment is a Cisco UCS chassis and the MAAS nodes are blades in that chassis. Even though we cannot find anything in configuration that specifically adds the vlan-0 tag (or priority tag), traffic between the blades goes out one node untagged and shows up tagged on the other node.

Some bugs/discussions around vlan-0 and UCS:
https://quickview.cloudapps.cisco.com/quickview/bug/CSCuu29425
https://quickview.cloudapps.cisco.com/quickview/bug/CSCuz83183
https://bugs.launchpad.net/opencontrail/+bug/1457805
https://arstechnica.com/civis/viewtopic.php?f=10&t=1442797
https://lists.linuxfoundation.org/pipermail/fds-dev/2017-May/000710.html
http://lists.openstack.org/pipermail/openstack-operators/2013-April/002777.html
https://linux.oracle.com/pls/apex/f?p=102:2:::NO::P2_VC_ID,P2_VERSION:606,1.0

As a note, Cisco seems to suggest it's a bug in Linux, citing these two old posts:
https://lists.openwall.net/netdev/2013/09/10/30
https://lists.linuxfoundation.org/pipermail/bridge/2015-July/009630.html

But I'm not convinced they are valid since this vlan-0 tag problem only shows up with this specific Cisco hardware. It seems like there are multiple network related software projects (like ipxe, vpp, probably others) that are forced to deal with the special case of vlan 0 (priority tagging) being added by Cisco UCS switches because Cisco's stance is that they're not adding the tags.
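To make the "goes out untagged, shows up tagged" observation concrete, here is a minimal sketch of how a receiver can treat a priority-tagged frame (802.1Q tag with VLAN ID 0) as if it were untagged. It is an illustration only, not the iPXE patch discussed in this bug; the function name and the sample frames are invented for the example.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID announcing an 802.1Q tag

def strip_priority_tag(frame: bytes) -> bytes:
    """Drop the 802.1Q tag when its VLAN ID is 0; leave any other frame as-is."""
    # 12 bytes of MAC addresses + 4-byte tag + 2-byte inner EtherType
    if len(frame) < 18:
        return frame
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != ETH_P_8021Q:
        return frame            # untagged frame
    if tci & 0x0FFF != 0:
        return frame            # real VLAN, keep the tag
    # VID 0: the tag only carries priority bits, so remove it
    return frame[:12] + frame[16:]

# Hypothetical sample frames: broadcast destination, QEMU-style source MAC
macs = bytes.fromhex("ffffffffffff") + bytes.fromhex("525400123456")
untagged = macs + struct.pack("!H", 0x0800) + b"payload"
priority_tagged = macs + struct.pack("!HH", ETH_P_8021Q, 5 << 13) + untagged[12:]
vlan42 = macs + struct.pack("!HH", ETH_P_8021Q, 42) + untagged[12:]

print(strip_priority_tag(priority_tagged) == untagged)
print(strip_priority_tag(vlan42) == vlan42)
```

Only the low 12 bits of the tag control information are the VLAN ID; the top 3 bits are the priority, which is why a frame can carry a tag without belonging to any VLAN.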
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
TL;DR:
- I really, really tried, but failed to recreate your case on a single system
- I need your real setup, as the VLAN-tag-0 addition in your case seems to be different
- That makes my request for one of you to commit to verifying this (and to check that you are able to) even more important - see comment #41

Details (of a failed test approach)

# Simple iPXE (without dhcp/tftp/...)
# get some virtualization that gives us a bridge with dhcp
$ sudo apt install uvtool-libvirt apache2
# re-logon for permissions
# copy host kernel there to boot from
$ sudo cp -v /boot/vmlinuz-$(uname -r) /boot/initrd.img-$(uname -r) /var/www/html/
$ sudo chown www-data:www-data /var/www/html/*
# prep qemu to tap on the libvirt bridge
$ sudo mkdir -p /etc/qemu
$ echo "allow all" | sudo tee /etc/qemu/bridge.conf
# start qemu and right at the start press Ctrl+B to get to the iPXE prompt
$ sudo qemu-system-x86_64 -cpu host -net nic -net bridge,br=virbr0 -m 1024 -enable-kvm -curses -boot n
# in iPXE then
iPXE> dhcp
# check your dhcp config to work on the expected network
iPXE> show ip
# use your IPs and kernel versions for this
iPXE> kernel http://192.168.122.1/vmlinuz-4.15.0-42-generic
iPXE> initrd http://192.168.122.1/initrd.img-4.15.0-42-generic
iPXE> boot

You can do the same with a config, by putting an iPXE config file on your Apache server:
$ cat << EOF | sudo tee /var/www/html/ipxe.config
#!ipxe
kernel http://192.168.122.1/vmlinuz-4.15.0-42-generic
initrd http://192.168.122.1/initrd.img-4.15.0-42-generic
boot
EOF

And then boot by chainloading:
iPXE> dhcp
iPXE> chain http://192.168.122.1/ipxe.config

# Try to use VLANs here
# Note: There would actually be a full VLAN feature, which no one ever requested.
# It is off atm, but per https://ipxe.org/cmd/vcreate we could do something like
iPXE> vcreate --tag 42 net0
iPXE> set net0-42/ip 192.168.123.100
iPXE> set net0-42/netmask 255.255.255.0
iPXE> set net0-42/gateway 192.168.123.1

So I wonder about your case: we have a non-VLAN-aware iPXE that gets 0-tagged packets - is that correct?
Most other network stacks would shrug the 0-tag off, but iPXE does not and acts as if the packet is not there (unless you'd configure it through vcreate, maybe).

Let's try to "simulate" that ...
# Add a "normal" VLAN tag 0 interface to the host bridge
$ sudo ip link add name virbr0.0 link virbr0 type vlan id 0
$ sudo ip addr add 192.168.124.1/24 broadcast 192.168.124.255 dev virbr0.0

# On boot we configure iPXE to use that IP range, but intentionally ignore any VLAN tagging
iPXE> set net0/ip 192.168.124.100
iPXE> set net0/netmask 255.255.255.0
iPXE> set net0/gateway 192.168.124.1
iPXE> set net0/dns 192.168.124.1
iPXE> ifopen net0
iPXE> chain http://192.168.124.1/ipxe.config

Without the fix this times out because the server cannot be reached:
iPXE> chain http://192.168.124.1/ipxe.config
http://192.168.124.1/ipxe.config.. Connection timed out (http://ipxe.org/4c0a6035)

I installed 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 from proposed, but it fails there as well.
Probably your VLAN-0-tag case is slightly different from what I had assumed here, but atm I have no way to know where.

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
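To make concrete what "shrugging off" a 0-tag means: the receiver removes the four tag bytes (TPID 0x8100 plus the TCI) when the VLAN ID is 0 and then handles the frame as if it had arrived untagged, while frames with a non-zero VLAN ID are left alone. The following is only a sketch of that idea on a hex string, not the iPXE patch itself (iPXE is written in C); the function name and sample frames are hypothetical, and the input is assumed to be valid hex:

```shell
#!/bin/sh
# strip_priority_tag HEX (hypothetical helper, for illustration only)
# Prints the frame with an 802.1Q tag removed if, and only if, its VLAN ID is 0.
# Output is lower-case hex; all other frames are printed unchanged.
strip_priority_tag() {
    frame=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    tpid=$(printf '%s' "$frame" | cut -c25-28)   # EtherType or 802.1Q TPID
    tci=$(printf '%s' "$frame" | cut -c29-32)    # PCP(3) DEI(1) VID(12)
    if [ "$tpid" = "8100" ] && [ "${#tci}" -eq 4 ] \
        && [ $(( 0x$tci & 0x0fff )) -eq 0 ]; then
        # Keep both MAC addresses (characters 1-24), skip the 4 tag bytes
        # (characters 25-32) and keep the inner EtherType plus payload.
        macs=$(printf '%s' "$frame" | cut -c1-24)
        rest=$(printf '%s' "$frame" | cut -c33-)
        printf '%s%s\n' "$macs" "$rest"
    else
        printf '%s\n' "$frame"
    fi
}

# Made-up sample frames: priority tag with PCP 5, then a real VLAN 42 tag.
strip_priority_tag ffffffffffff5254001234568100a000080045
strip_priority_tag ffffffffffff5254001234568100002a080045
```

The priority bits are discarded in this sketch; whether a given stack keeps them as metadata is a separate design decision.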
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
The virt stack tests are successful on 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2 and 1.0.0+git-20180124.fbe8c52d-0ubuntu4.1 from the PPA.
This includes various other tests, mostly related to migrations and upgrades between those versions.
Everything is fine on that end.
To manage notifications about this bug go to: https://bugs.launchpad.net/maas/+bug/1805920/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Thanks, Vern, for outlining your test case.

@Vern - Doesn't that also have to include some component that does QoS management and adds the VLAN-0 tag?

I don't have the systems to verify this case on Bionic and Cosmic.
As I read it, Vern can only test Bionic at the customer and is unsure whether he can test Cosmic the same way.

@MAAS team - Would you have a MAAS test environment that you can re-use to verify this? If so, could you confirm that you can locally trigger the bug as it is today, so that we can rely on it for SRU processing?

@Vern - if the above is NACKed by the MAAS team, could you ask whether you could get nodes to verify that on Cosmic as well?
As I read your test description, I think it would be enough to bump the node that is supposed to start the guest to Cosmic+Proposed - no need to update all systems to Cosmic.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
The customer has bionic installed on 3 of the blades and I have installed MAAS 2.4.3 on them using the Foundation Cloud Engine. I don't have access to do the OS install myself. I could request a pair of blades installed with cosmic but I'm unsure if I need all 3 or if I can get by with just 2. Easiest would probably be all 3 so that I can be sure at least one is not running dhcpd.

The test would consist of:
# Install MAAS on 3 nodes
# Install ipxe to be tested on all three nodes
# Configure subnet for PXE booting to have primary dhcp on first node and secondary on second node
# Provision Pods on each maas node
# Create and commission a VM on each pod
# The VMs on first and second will commission successfully
# The VM on the third node will fail DHCP/PXE
# You can use virt-viewer to view the console of the VM to verify PXE failure
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
FYI: virt stack regression tests have started but will still take a while.

To make it very, very clear: this is incomplete until some path to test it has been provided.
I am marking the bug that way and waiting on that.

Worst case (and only then), describe the test setup that you have on the customer site and volunteer to be willing and able to verify both Bionic and Cosmic on that setup.
While writing, remember that the intention is to make the SRU team feel confident about the change and the checks.

** Changed in: ipxe (Ubuntu Bionic)
Status: Triaged => Incomplete

** Changed in: ipxe (Ubuntu Cosmic)
Status: Triaged => Incomplete
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Prepped for Bionic and Cosmic in a PPA [2] for Bileto ticket [1].
Dependent autopkgtests are queued.
I'll run the usual virtualization regression checks on that overnight and into tomorrow.

MPs are up for review at [3][4], but since Andeas change applies on all of these as-is there isn't much difference.

The biggest blocker here is the lack of a more clearly outlined test case.
@Andres/@Vern - can you help to fill in the test case steps in the SRU template?

[1]: https://bileto.ubuntu.com/#/ticket/3560
[2]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3560/+packages
[3]: https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360678
[4]: https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360679

** Description changed:

+ [Impact]
+
+  * VLAN 0 is special (for QoS actually, not a real VLAN)
+  * Some components in the stack accidentially strip it, so does ipxe in
+this case.
+  * Fix by porting a fix that is carried by other distributions as upstrem
+didn't follow the sugegstion but it is needed for the use case affected
+by the bug here (Thanks Andres)
+
+ [Test Case]
+
+  * TODO
+
+ [Regression Potential]
+
+  * The onyl refernce to VLAN tags on iPXE boto that we found was on iBFT
+boot for SCSI, we tested that in comment #34 and it still worked fine.
+  * We didn't see such cases on reivew, but there might be use cases that
+made some unexpected use of the headers which are now stripped. But
+that seems wrong.
+
+
+ [Other Info]
+
+  * n/a
+
+ ---
+
+
  I have three MAAS rack/region nodes which are blades in a Cisco UCS chassis. This is an FCE deployment where MAAS has two DHCP servers, infra1 is the primary and infra3 is the secondary. The pod VMs on infra1 and infra3 PXE boot fine but the pod VMs on infra2 fail to PXE boot.
If I reconfigure the subnet to provide DHCP on infra2 (either as primary or secondary) then the pod VMs on infra2 will PXE boot but the pod VMs on the demoted infra node (that no longer serves DHCP) now fail to PXE boot. While commissioning a pod VM on infra2 I captured network traffic with tcpdump on the vnet interface. Here is the dump when the PXE boot fails (no dhcp server on infra2): https://pastebin.canonical.com/p/THW2gTSv4S/ Here is the dump when PXE boot succeeds (when infra2 is serving dhcp): https://pastebin.canonical.com/p/HH3XvZtTGG/ The only difference I can see is that in the unsuccessful scenario, the reply is an 802.1q packet -- it's got a vlan tag for vlan 0. Normally vlan 0 traffic is passed as if it is not tagged and indeed, I can ping between the blades with no problem. Outgoing packets are untagged but incoming packets are tagged vlan 0 -- but the ping works. It seems vlan 0 is used as a part of 802.1p to set priority of packets. This is separate from vlan, it just happens to use that ethertype to do the priority tagging. Someone confirmed to me that, in the iPXE source, it drops all packets if they are vlan tagged. The customer is unable to figure out why the packets between blades is getting vlan tagged so we either need to figure out how to allow iPXE to accept vlan 0 or the customer will need to use different equipment for the MAAS nodes. I found a conversation on the ipxe-devel mailing list that suggested a commit was submitted and signed off but that was from 2016 so I'm not sure what became of it. Notable messages in the thread: http://lists.ipxe.org/pipermail/ipxe-devel/2016-April/004916.html http://lists.ipxe.org/pipermail/ipxe-devel/2016-July/005099.html Would it be possible to install a local patch as part of the FCE deployment? I suspect the patch(es) mentioned in the above thread would require some modification to apply properly. 
** Description changed: [Impact] - * VLAN 0 is special (for QoS actually, not a real VLAN) - * Some components in the stack accidentially strip it, so does ipxe in -this case. - * Fix by porting a fix that is carried by other distributions as upstrem -didn't follow the sugegstion but it is needed for the use case affected -by the bug here (Thanks Andres) + * VLAN 0 is special (for QoS actually, not a real VLAN) + * Some components in the stack accidentally strip it, so does ipxe in + this case. + * Fix by porting a fix that is carried by other distributions as upstream + didn't follow the suggestion but it is needed for the use case affected + by the bug here (Thanks Andres) [Test Case] - * TODO + * TODO [Regression Potential] - * The onyl refernce to VLAN tags on iPXE boto that we found was on iBFT -boot for SCSI, we tested that in comment #34 and it still worked fine. - * We didn't see such cases on reivew, but there might be use cases that -made some unexpected use of the headers which are now stripped. But -that seems wrong. - + * The only reference to VLAN tags on i
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
** Merge proposal linked:
https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360678

** Merge proposal linked:
https://code.launchpad.net/~paelzer/ubuntu/+source/ipxe/+git/ipxe/+merge/360679
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
To keep potential regressions even lower, I'd for now only consider this for >=Bionic.
That also helps because, if someone intentionally spawns an old-type KVM machine (pre-Bionic) on a >=Bionic host, we don't have to care about this too much (machine type, not the release running IN the guest).
That lets us ignore ipxe-qemu-256k-compat-efi-roms with regard to this issue.

** Also affects: ipxe-qemu-256k-compat (Ubuntu)
Importance: Undecided
Status: New

** Changed in: linux (Ubuntu)
Status: Confirmed => Invalid

** Also affects: linux (Ubuntu Cosmic)
Importance: Undecided
Status: New

** Also affects: ipxe (Ubuntu Cosmic)
Importance: Undecided
Status: New

** Also affects: ipxe-qemu-256k-compat (Ubuntu Cosmic)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New

** Also affects: ipxe (Ubuntu Xenial)
Importance: Undecided
Status: New

** Also affects: ipxe-qemu-256k-compat (Ubuntu Xenial)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Disco)
Importance: Undecided
Status: Invalid

** Also affects: ipxe (Ubuntu Disco)
Importance: Undecided
Status: Fix Released

** Also affects: ipxe-qemu-256k-compat (Ubuntu Disco)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Trusty)
Importance: Undecided
Status: New

** Also affects: ipxe (Ubuntu Trusty)
Importance: Undecided
Status: New

** Also affects: ipxe-qemu-256k-compat (Ubuntu Trusty)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New

** Also affects: ipxe (Ubuntu Bionic)
Importance: Undecided
Status: New

** Also affects: ipxe-qemu-256k-compat (Ubuntu Bionic)
Importance: Undecided
Status: New

** No longer affects: linux (Ubuntu Trusty)

** No longer affects: linux (Ubuntu Xenial)

** No longer affects: linux (Ubuntu Bionic)

** No longer affects: linux (Ubuntu Cosmic)

** No longer affects: linux (Ubuntu Disco)

** Changed in: ipxe-qemu-256k-compat (Ubuntu Trusty)
Status: New => Invalid

** Changed in: ipxe-qemu-256k-compat (Ubuntu Xenial)
Status: New => Invalid

** No longer affects: ipxe-qemu-256k-compat (Ubuntu Trusty)

** No longer affects: ipxe-qemu-256k-compat (Ubuntu Xenial)

** No longer affects: ipxe-qemu-256k-compat (Ubuntu Bionic)

** No longer affects: ipxe-qemu-256k-compat (Ubuntu Cosmic)

** No longer affects: ipxe-qemu-256k-compat (Ubuntu Disco)

** Changed in: ipxe-qemu-256k-compat (Ubuntu)
Status: New => Won't Fix

** Changed in: ipxe (Ubuntu Trusty)
Status: New => Won't Fix

** Changed in: ipxe (Ubuntu Xenial)
Status: New => Won't Fix

** Changed in: ipxe-qemu-256k-compat (Ubuntu)
Status: Won't Fix => Invalid

** Changed in: ipxe (Ubuntu Bionic)
Status: New => Triaged

** Changed in: ipxe (Ubuntu Cosmic)
Status: New => Triaged
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu5

---
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu5) disco; urgency=medium

  * d/p/0005-strip-802.1Q-VLAN-0-priority-tags.patch: Strip 802.1Q VLAN 0 priority tags; Fixes PXE when VLAN tag is 0. (LP: #1805920)

 -- Andres Rodriguez  Mon, 10 Dec 2018 16:26:42 -0500

** Changed in: ipxe (Ubuntu) Status: Confirmed => Fix Released
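For readers less familiar with the tag layout that the changelog entry refers to, here is a minimal sketch in Python (not iPXE code; the function names decode_tci and is_priority_tag are made up for illustration). The 16-bit Tag Control Information (TCI) field of an 802.1Q tag splits into a 3-bit priority (PCP), a 1-bit drop-eligible indicator (DEI) and a 12-bit VLAN ID. A VLAN ID of 0 means the tag carries priority information only.

```python
def decode_tci(tci: int) -> dict:
    """Split a 16-bit 802.1Q TCI value into its three fields."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    return {
        "pcp": (tci >> 13) & 0x7,   # priority code point (802.1p)
        "dei": (tci >> 12) & 0x1,   # drop eligible indicator
        "vid": tci & 0x0FFF,        # VLAN identifier
    }


def is_priority_tag(tci: int) -> bool:
    """True when the tag has VLAN ID 0, i.e. it only carries a priority."""
    return decode_tci(tci)["vid"] == 0


if __name__ == "__main__":
    # Illustrative value: priority 5, VLAN ID 0.
    print(decode_tci(0xA000), is_priority_tag(0xA000))
```

A frame whose tag decodes to VLAN ID 0 is the kind of packet described in the bug report: it is not a member of any real VLAN, so a receiver can treat it as untagged.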
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
[1] seems reasonable, I'll give it a try with and without the PPA of Andres. It needs a slight modification, to not conflict with the default portal.

Install libvirt with all else it usually brings (for the bridge and dhcp on the bridge):
$ sudo apt install libvirt-daemon-system

So use these commands:
$ curl -O http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img
$ qemu-img convert -O raw cirros-0.3.4-x86_64-disk.img cirros.raw
$ sudo targetcli /backstores/fileio/ create cirros $PWD/cirros.raw 100M false
$ sudo targetcli /iscsi create iqn.2016-01.com.example:cirros
$ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/luns create /backstores/fileio/cirros
$ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/portals delete 0.0.0.0 ip_port=3260
$ sudo targetcli /iscsi/iqn.2016-01.com.example:cirros/tpg1/portals create 192.168.122.1

If you do that you'll end up with a targetcli config like this:
$ sudo targetcli
targetcli shell version 2.1.fb43
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> ls
o- / ... [...]
  o- backstores ... [...]
  | o- block ... [Storage Objects: 0]
  | o- fileio ... [Storage Objects: 1]
  | | o- cirros ... [/home/ubuntu/cirros.raw (39.2MiB) write-thru activated]
  | o- pscsi ... [Storage Objects: 0]
  | o- ramdisk ... [Storage Objects: 0]
  o- iscsi ... [Targets: 1]
  | o- iqn.2016-01.com.example:cirros ... [TPGs: 1]
  |   o- tpg1 ... [gen-acls, no-auth]
  |     o- acls ... [ACLs: 0]
  |     o- luns ... [LUNs: 1]
  |     | o- lun0 ... [fileio/cirros (/home/ubuntu/cirros.raw)]
  |     o- portals ... [Portals: 1]
  |       o- 192.168.122.1:3260 ... [OK]
  o- loopback ... [Targets: 0]
  o- srpt ... [Targets: 0]
  o- vhost ... [Targets: 0]

Do that and then on qemu start attach to the console early. To make that easier, instead of VNC use a local curses console with:
$ sudo qemu-system-x86_64 -smp cpus=2 -curses -boot order=n -netdev bridge,br=virbr0,id=virtio0 -device virtio-net-pci,netdev=virtio0

Hit CTRL+B early on boot for ipxe commands. With our virbr0 default setup having the host on 192.168.122.1 that would be:
iPXE> ifopen net0
iPXE> dhcp
iPXE> sanboot iscsi:192.168.122.1::::iqn.2016-01.com.example:cirros

Retested with 1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1 from the PPA. It boots just as well, so while it is not a perfect test (what if that were in a VLAN tagged network?) it is better than nothing.

That said, together with all that was discussed before, I think Andres could go on uploading it to Disco. For the SRUs we will need some extra for [2], but one thing at a time. Or is the assumption that I drive it from here and you only do verifications on the case?

[1]: https://medium.com/oracledevs/kvm-iscsi-part-i-iscsi-boot-with-ipxe-f533f2666075
[2]: https://packages.ubuntu.com/bionic/ipxe-qemu-256k-compat-efi-roms
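The sanboot argument used in this test follows, as far as I understand the iPXE documentation, the layout iscsi:<servername>:<protocol>:<port>:<LUN>:<targetname>, where fields left empty fall back to their defaults. A small Python sketch that assembles such a string (the helper name ipxe_iscsi_uri is hypothetical and only meant to show where the colons go):

```python
def ipxe_iscsi_uri(server: str, target: str,
                   protocol: str = "", port: str = "", lun: str = "") -> str:
    """Build an iSCSI SAN URI in the colon-separated layout iPXE expects.

    Empty protocol/port/lun fields are kept as empty slots so that the
    target name always ends up in the last position.
    """
    if not server or not target:
        raise ValueError("server and target are required")
    return ":".join(["iscsi", server, protocol, port, lun, target])


if __name__ == "__main__":
    # Values taken from the test setup in this thread.
    print(ipxe_iscsi_uri("192.168.122.1", "iqn.2016-01.com.example:cirros"))
```

Note that the target IQN itself contains a colon, which is why it has to come last and why the three optional slots in the middle must stay in place even when empty.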
Re: [Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
On Fri, Dec 7, 2018 at 8:21 PM Mike Pontillo wrote:
...
> So iPXE's iSCSI functionality wouldn't be exercised in this case.

So MAAS itself is no good test for that, thanks Mike for the clarification!

So the question is: does anyone have an iSCSI iPXE case, for the last bit of confidence before Andres' upload takes place?
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Sorry for the confusion. To be clear, the idea of using MAAS on Xenial was in order to test if the newly-modified iPXE (on Bionic) can support iSCSI boot.

But come to think of it, I don't think that's a good test. If I remember correctly, MAAS used TFTP to transfer the kernel and initrd, /then/ iSCSI was used in order to mount the rootfs. So iPXE's iSCSI functionality wouldn't be exercised in this case.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Just to clarify the above statements, as they have been a source of confusion: MAAS 2.3+ (which is the latest available in Xenial) no longer uses nor supports iSCSI. While the option to fall back to the old behavior does exist, it is not enabled by default, it's obscured and, given that it is not supported, it is to be used at the user's own risk.

That said, I'm not sure whether this change should be backported all the way to Xenial. It would seem to me that it should be backported to Bionic only.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Yes, MAAS 2.3 (the last revision of MAAS supported on Xenial) can support iSCSI when it is placed in backward compatibility mode; how to do this is documented in the changelog as follows:

maas $PROFILE maas set-config name=http_boot value=False

MAAS 2.2 and earlier (no longer supported) used iSCSI by default. So you could test this on any version of MAAS on Xenial.
Re: [Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
> Unfortunately, a modern MAAS no longer uses iSCSI; it will fetch all
> necessary data over HTTP.

Yeah, that matches what I heard: used in the past but no more. But you certainly still have the best knowledge of how to set it up from the time when it did. Checking that before and after an upgrade to that PPA would be a great verification to avoid regressions.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Unfortunately, a modern MAAS no longer uses iSCSI; it will fetch all necessary data over HTTP.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
I'm glad that the ipxe in the PPA seems to make it work. I have now read the discussions, and all questions that came up for me while doing so were already asked and clarified in later comments. Therefore I just reviewed the proposed change and it looks good to me (other than the version string, but that was just for the PPA, so that is ok).

Only one question to be sure: I was wondering if this might trigger any issues in iSCSI booting, since the change in src/net/netdevice.c adds the stripping to the generic net_poll. Now the (old) commit [1] reads as if that would be required to be set. I wonder if there would be any regression in that regard.

I remember the words iSCSI+MAAS being used together, but I'm unsure if the stack these days still uses it. When Vern confirmed that he could deploy with the modified ipxe, did that include an iSCSI boot? If not, could one of you just double-check that iSCSI boot didn't regress due to this change?

[1]: https://git.ipxe.org/ipxe.git/commit/7d64abbc5d0b5dfe4810883f372b905a359f2697
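To make the behaviour under review a bit more concrete, here is a rough Python sketch of what "stripping a VLAN 0 priority tag" from a received Ethernet frame means. This is only an illustration under my own assumptions, not the patch to src/net/netdevice.c; the function name strip_vlan0_tag is hypothetical.

```python
import struct

ETH_ALEN = 6            # length of one MAC address in bytes
ETH_P_8021Q = 0x8100    # EtherType announcing an 802.1Q tag
VLAN_TAG_LEN = 4        # TPID (2 bytes) + TCI (2 bytes)


def strip_vlan0_tag(frame: bytes) -> bytes:
    """Return the frame without its 802.1Q tag if that tag has VLAN ID 0.

    Frames that are untagged, too short, or tagged with a real VLAN ID
    are returned unchanged.
    """
    tag_offset = 2 * ETH_ALEN
    if len(frame) < tag_offset + VLAN_TAG_LEN + 2:
        return frame  # no room for a tag plus the inner EtherType
    tpid, tci = struct.unpack_from("!HH", frame, tag_offset)
    if tpid != ETH_P_8021Q or (tci & 0x0FFF) != 0:
        return frame
    # Drop the 4 tag bytes; the inner EtherType moves up behind the MACs.
    return frame[:tag_offset] + frame[tag_offset + VLAN_TAG_LEN:]
```

After the tag is removed the frame looks like ordinary untagged traffic, which is why the open question in this message is whether any consumer (such as iBFT/iSCSI boot) relied on still seeing the tag.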
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
FWIW, I patched ipxe with the CentOS patch, which Vern was going to test:

https://launchpadlibrarian.net/400355423/ipxe_1.0.0+git-20180124.fbe8c52d-0ubuntu2_1.0.0+git-20180124.fbe8c52d-0ubuntu2.2~18.04.1.diff.gz
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Success. I installed ipxe-qemu from andreserl's PPA and was able to PXE boot a pod VM from the infra node that wasn't running dhcpd.

# add-apt-repository ppa:andreserl/maas
# apt update
# apt install ipxe-qemu
# virsh list --all
# virsh start elastic-3

I watched the console of the VM; it got a DHCP lease and PXE booted successfully.

Subsequently, I used MAAS to deploy to VMs on all three infra nodes, just to be sure. All succeeded.
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
Setting to 'Confirmed' in the kernel, although it's not clear what the actual fix would entail.

It would certainly be nice to be able to tell a Linux virtual bridge to transparently strip off priority tags before L2 forwarding occurs. That would prevent the issue with iPXE.

** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

** Changed in: ipxe (Ubuntu)
       Status: New => Confirmed
[Kernel-packages] [Bug 1805920] Re: iPXE ignores vlan 0 traffic
It seems users of other distros are experiencing the same problem, and that a kernel update fixes the issue?

https://serverfault.com/questions/497391/

** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New