Re: [PATCH v7 0/4] New Microsemi PCI Switch Management Driver
This is great! Thanks Bjorn and Logan for getting this drivers upstreamed! -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 3/9/17, 11:56 AM, "Logan Gunthorpe" <log...@deltatee.com> wrote: On 09/03/17 12:55 PM, Bjorn Helgaas wrote: > Applied to pci/switchtec for v4.12, thanks! Great! Thanks so much! Logan
Re: [PATCH v7 0/4] New Microsemi PCI Switch Management Driver
This is great! Thanks Bjorn and Logan for getting this drivers upstreamed! -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 3/9/17, 11:56 AM, "Logan Gunthorpe" wrote: On 09/03/17 12:55 PM, Bjorn Helgaas wrote: > Applied to pci/switchtec for v4.12, thanks! Great! Thanks so much! Logan
Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver
Corrections to the email below. It should be Tested-by: Krishna Dhulipala krish...@fb.com Reviewed-by: Wei Zhang wzh...@fb.com Thanks, -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 2/9/17, 3:16 PM, "Wei Zhang" <wzh...@fb.com> wrote: Hi, The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the switchtec userland tool is used to communicate with the Microsemi 8536 PCIe Switch used on Facebook’s Lightning platform. The following essential driver and tool functions were successfully tested with it: · Retrieval of firmware and configuration information along with CRCs · Switch firmware and configuration upgrades · Switch PHY/Link error counter collection and the ability to reset them · Switch upstream and downstream ports’ link status reporting · Switch interface functioning · Retrieval of switch ASIC temperature · Exporting switch firmware log dump · Read the information of firmware and configuration binaries · Extract the firmware and configuration images stored in the switch EEPROM Tested-by: Krishna Dhulipala krish...@fb.com Reviewed-by: Wei Zhang wzh...@fb.com Thanks, -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 2/2/17, 10:05 AM, "Logan Gunthorpe" <log...@deltatee.com> wrote: Changes since v1: * Rebased onto 4.10-rc6 (cleanly) * Split the patch into a few more easily digestible patches (as suggested by Greg Kroah-Hartman) * Folded switchtec.c into switchtec.h (per Greg) * Fixed a bunch of 32bit build warnings caught by the kbuild test robot * Fixed some issues in the documentation so it has a proper reStructredText format (as noted by Jonathan Corbet) * Fixed padding and sizes in the IOCTL structures as noticed by Emil Velikov and used pahole to verify their consistency across 32 and 64 bit builds * Reworked one of the IOCTL interfaces to be more future proof (per Emil). Changes since RFC: * Fixed incorrect use of the drive model as pointed out by Greg Kroah-Hartman * Used devm functions as suggested by Keith Busch * Added a handful of sysfs attributes to the switchtec class * Added a handful of IOCTLs to the switchtec device * A number of miscellaneous bug fixes -- Hi, This is a continuation of the RFC we posted lasted month [1] which proposes a management driver for Microsemi's Switchtec line of PCI switches. This hardware is still looking to be used in the Open Compute Platform To make this entirely clear: the Switchtec products are compliant with the PCI specifications and are supported today with the standard in-kernel driver. However, these devices also expose a management endpoint on a separate PCI function address which can be used to perform some advanced operations. This is a driver for that function. See the patch for more information. Since the RFC, we've made the changes requested by Greg Kroah-Hartman and Keith Busch, and we've also fleshed out a number of features. We've added a couple of IOCTLs and sysfs attributes which are documented in the patch. Significant work has also been done on the userspace tool which is available under a GPL license at [2]. We've also had testing done by some of the interested parties. We hope to see this work included in either 4.11 or 4.12 assuming a smooth review process. The patch is based off of the v4.10-rc6 release. Thanks for your review, Logan [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A= [2] https://github.com/sbates130272/switchtec-user -- Logan Gunthorpe (4): MicroSemi Switchtec management interface driver switchtec: Add user interface documentation switchtec: Add sysfs attributes to the Switchtec driver switchtec: Add IOCTLs to the Switchtec driver Documentation/ABI/testing/sysfs-class-switchtec | 96 ++ Documentation/ioctl/ioctl-number.txt|1 + Documentation/switchtec.txt | 80 ++ MAINTAINERS | 11 + drivers/pci/Kconfig
Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver
Corrections to the email below. It should be Tested-by: Krishna Dhulipala krish...@fb.com Reviewed-by: Wei Zhang wzh...@fb.com Thanks, -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 2/9/17, 3:16 PM, "Wei Zhang" wrote: Hi, The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the switchtec userland tool is used to communicate with the Microsemi 8536 PCIe Switch used on Facebook’s Lightning platform. The following essential driver and tool functions were successfully tested with it: · Retrieval of firmware and configuration information along with CRCs · Switch firmware and configuration upgrades · Switch PHY/Link error counter collection and the ability to reset them · Switch upstream and downstream ports’ link status reporting · Switch interface functioning · Retrieval of switch ASIC temperature · Exporting switch firmware log dump · Read the information of firmware and configuration binaries · Extract the firmware and configuration images stored in the switch EEPROM Tested-by: Krishna Dhulipala krish...@fb.com Reviewed-by: Wei Zhang wzh...@fb.com Thanks, -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 2/2/17, 10:05 AM, "Logan Gunthorpe" wrote: Changes since v1: * Rebased onto 4.10-rc6 (cleanly) * Split the patch into a few more easily digestible patches (as suggested by Greg Kroah-Hartman) * Folded switchtec.c into switchtec.h (per Greg) * Fixed a bunch of 32bit build warnings caught by the kbuild test robot * Fixed some issues in the documentation so it has a proper reStructredText format (as noted by Jonathan Corbet) * Fixed padding and sizes in the IOCTL structures as noticed by Emil Velikov and used pahole to verify their consistency across 32 and 64 bit builds * Reworked one of the IOCTL interfaces to be more future proof (per Emil). Changes since RFC: * Fixed incorrect use of the drive model as pointed out by Greg Kroah-Hartman * Used devm functions as suggested by Keith Busch * Added a handful of sysfs attributes to the switchtec class * Added a handful of IOCTLs to the switchtec device * A number of miscellaneous bug fixes -- Hi, This is a continuation of the RFC we posted lasted month [1] which proposes a management driver for Microsemi's Switchtec line of PCI switches. This hardware is still looking to be used in the Open Compute Platform To make this entirely clear: the Switchtec products are compliant with the PCI specifications and are supported today with the standard in-kernel driver. However, these devices also expose a management endpoint on a separate PCI function address which can be used to perform some advanced operations. This is a driver for that function. See the patch for more information. Since the RFC, we've made the changes requested by Greg Kroah-Hartman and Keith Busch, and we've also fleshed out a number of features. We've added a couple of IOCTLs and sysfs attributes which are documented in the patch. Significant work has also been done on the userspace tool which is available under a GPL license at [2]. We've also had testing done by some of the interested parties. We hope to see this work included in either 4.11 or 4.12 assuming a smooth review process. The patch is based off of the v4.10-rc6 release. Thanks for your review, Logan [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A= [2] https://github.com/sbates130272/switchtec-user -- Logan Gunthorpe (4): MicroSemi Switchtec management interface driver switchtec: Add user interface documentation switchtec: Add sysfs attributes to the Switchtec driver switchtec: Add IOCTLs to the Switchtec driver Documentation/ABI/testing/sysfs-class-switchtec | 96 ++ Documentation/ioctl/ioctl-number.txt|1 + Documentation/switchtec.txt | 80 ++ MAINTAINERS | 11 + drivers/pci/Kconfig |1 +
Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver
Hi, The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the switchtec userland tool is used to communicate with the Microsemi 8536 PCIe Switch used on Facebook’s Lightning platform. The following essential driver and tool functions were successfully tested with it: · Retrieval of firmware and configuration information along with CRCs · Switch firmware and configuration upgrades · Switch PHY/Link error counter collection and the ability to reset them · Switch upstream and downstream ports’ link status reporting · Switch interface functioning · Retrieval of switch ASIC temperature · Exporting switch firmware log dump · Read the information of firmware and configuration binaries · Extract the firmware and configuration images stored in the switch EEPROM Tested-by: Krishna Dhulipala krish...@fb.com Tested-by: Wei Zhang wzh...@fb.com Thanks, -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 2/2/17, 10:05 AM, "Logan Gunthorpe" <log...@deltatee.com> wrote: Changes since v1: * Rebased onto 4.10-rc6 (cleanly) * Split the patch into a few more easily digestible patches (as suggested by Greg Kroah-Hartman) * Folded switchtec.c into switchtec.h (per Greg) * Fixed a bunch of 32bit build warnings caught by the kbuild test robot * Fixed some issues in the documentation so it has a proper reStructredText format (as noted by Jonathan Corbet) * Fixed padding and sizes in the IOCTL structures as noticed by Emil Velikov and used pahole to verify their consistency across 32 and 64 bit builds * Reworked one of the IOCTL interfaces to be more future proof (per Emil). Changes since RFC: * Fixed incorrect use of the drive model as pointed out by Greg Kroah-Hartman * Used devm functions as suggested by Keith Busch * Added a handful of sysfs attributes to the switchtec class * Added a handful of IOCTLs to the switchtec device * A number of miscellaneous bug fixes -- Hi, This is a continuation of the RFC we posted lasted month [1] which proposes a management driver for Microsemi's Switchtec line of PCI switches. This hardware is still looking to be used in the Open Compute Platform To make this entirely clear: the Switchtec products are compliant with the PCI specifications and are supported today with the standard in-kernel driver. However, these devices also expose a management endpoint on a separate PCI function address which can be used to perform some advanced operations. This is a driver for that function. See the patch for more information. Since the RFC, we've made the changes requested by Greg Kroah-Hartman and Keith Busch, and we've also fleshed out a number of features. We've added a couple of IOCTLs and sysfs attributes which are documented in the patch. Significant work has also been done on the userspace tool which is available under a GPL license at [2]. We've also had testing done by some of the interested parties. We hope to see this work included in either 4.11 or 4.12 assuming a smooth review process. The patch is based off of the v4.10-rc6 release. Thanks for your review, Logan [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A= [2] https://github.com/sbates130272/switchtec-user -- Logan Gunthorpe (4): MicroSemi Switchtec management interface driver switchtec: Add user interface documentation switchtec: Add sysfs attributes to the Switchtec driver switchtec: Add IOCTLs to the Switchtec driver Documentation/ABI/testing/sysfs-class-switchtec | 96 ++ Documentation/ioctl/ioctl-number.txt|1 + Documentation/switchtec.txt | 80 ++ MAINTAINERS | 11 + drivers/pci/Kconfig |1 + drivers/pci/Makefile|1 + drivers/pci/switch/Kconfig | 13 + drivers/pci/switch/Makefile |1 + drivers/pci/switch/switchtec.c | 1608 +++ include/uapi/linux/switchtec_ioctl.h| 132 ++ 10 files changed, 1944 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-switchtec create mode 100644 Documentation/switchtec.txt create mode 100644 drivers/pci/switch/Kconfig create mode 100644 drivers/pci/switch/Makefile create mode 100644 drivers/pci/switch/switchtec.c create mode 100644 in
Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver
Hi, The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the switchtec userland tool is used to communicate with the Microsemi 8536 PCIe Switch used on Facebook’s Lightning platform. The following essential driver and tool functions were successfully tested with it: · Retrieval of firmware and configuration information along with CRCs · Switch firmware and configuration upgrades · Switch PHY/Link error counter collection and the ability to reset them · Switch upstream and downstream ports’ link status reporting · Switch interface functioning · Retrieval of switch ASIC temperature · Exporting switch firmware log dump · Read the information of firmware and configuration binaries · Extract the firmware and configuration images stored in the switch EEPROM Tested-by: Krishna Dhulipala krish...@fb.com Tested-by: Wei Zhang wzh...@fb.com Thanks, -Wei -- wei zhang | software engineer | facebook wzh...@fb.com | (408) 460-4803 On 2/2/17, 10:05 AM, "Logan Gunthorpe" wrote: Changes since v1: * Rebased onto 4.10-rc6 (cleanly) * Split the patch into a few more easily digestible patches (as suggested by Greg Kroah-Hartman) * Folded switchtec.c into switchtec.h (per Greg) * Fixed a bunch of 32bit build warnings caught by the kbuild test robot * Fixed some issues in the documentation so it has a proper reStructredText format (as noted by Jonathan Corbet) * Fixed padding and sizes in the IOCTL structures as noticed by Emil Velikov and used pahole to verify their consistency across 32 and 64 bit builds * Reworked one of the IOCTL interfaces to be more future proof (per Emil). Changes since RFC: * Fixed incorrect use of the drive model as pointed out by Greg Kroah-Hartman * Used devm functions as suggested by Keith Busch * Added a handful of sysfs attributes to the switchtec class * Added a handful of IOCTLs to the switchtec device * A number of miscellaneous bug fixes -- Hi, This is a continuation of the RFC we posted lasted month [1] which proposes a management driver for Microsemi's Switchtec line of PCI switches. This hardware is still looking to be used in the Open Compute Platform To make this entirely clear: the Switchtec products are compliant with the PCI specifications and are supported today with the standard in-kernel driver. However, these devices also expose a management endpoint on a separate PCI function address which can be used to perform some advanced operations. This is a driver for that function. See the patch for more information. Since the RFC, we've made the changes requested by Greg Kroah-Hartman and Keith Busch, and we've also fleshed out a number of features. We've added a couple of IOCTLs and sysfs attributes which are documented in the patch. Significant work has also been done on the userspace tool which is available under a GPL license at [2]. We've also had testing done by some of the interested parties. We hope to see this work included in either 4.11 or 4.12 assuming a smooth review process. The patch is based off of the v4.10-rc6 release. Thanks for your review, Logan [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A= [2] https://github.com/sbates130272/switchtec-user -- Logan Gunthorpe (4): MicroSemi Switchtec management interface driver switchtec: Add user interface documentation switchtec: Add sysfs attributes to the Switchtec driver switchtec: Add IOCTLs to the Switchtec driver Documentation/ABI/testing/sysfs-class-switchtec | 96 ++ Documentation/ioctl/ioctl-number.txt|1 + Documentation/switchtec.txt | 80 ++ MAINTAINERS | 11 + drivers/pci/Kconfig |1 + drivers/pci/Makefile|1 + drivers/pci/switch/Kconfig | 13 + drivers/pci/switch/Makefile |1 + drivers/pci/switch/switchtec.c | 1608 +++ include/uapi/linux/switchtec_ioctl.h| 132 ++ 10 files changed, 1944 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-class-switchtec create mode 100644 Documentation/switchtec.txt create mode 100644 drivers/pci/switch/Kconfig create mode 100644 drivers/pci/switch/Makefile create mode 100644 drivers/pci/switch/switchtec.c create mode 100644 include/uapi/linux/switcht
[PATCH v2] net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon, since it come from lo which is not the interface dnsmasq bind to) Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO") Acked-by: David Ahern <d...@cumulusnetworks.com> Signed-off-by: Wei Zhang <asuka@163.com> --- v2: - add the missing Fixes line - better comment come from David Ahern net/ipv4/ip_sockglue.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 57e1405..53ae0c6 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1225,8 +1225,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) * which has interface index (iif) as the first member of the * underlying inet{6}_skb_parm struct. This code then overlays * PKTINFO_SKB_CB and in_pktinfo also has iif as the first -* element so the iif is picked up from the prior IPCB +* element so the iif is picked up from the prior IPCB. If iif +* is the loopback interface, then return the sending interface +* (e.g., process binds socket to eth0 for Tx which is +* redirected to loopback in the rtable/dst). */ + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) + pktinfo->ipi_ifindex = inet_iif(skb); + pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 1.8.3.1
[PATCH v2] net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon, since it come from lo which is not the interface dnsmasq bind to) Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO") Acked-by: David Ahern Signed-off-by: Wei Zhang --- v2: - add the missing Fixes line - better comment come from David Ahern net/ipv4/ip_sockglue.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 57e1405..53ae0c6 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1225,8 +1225,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) * which has interface index (iif) as the first member of the * underlying inet{6}_skb_parm struct. This code then overlays * PKTINFO_SKB_CB and in_pktinfo also has iif as the first -* element so the iif is picked up from the prior IPCB +* element so the iif is picked up from the prior IPCB. If iif +* is the loopback interface, then return the sending interface +* (e.g., process binds socket to eth0 for Tx which is +* redirected to loopback in the rtable/dst). */ + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) + pktinfo->ipi_ifindex = inet_iif(skb); + pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 1.8.3.1
[PATCH] net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon, since it come from lo which is not the interface dnsmasq bind to) Signed-off-by: Wei Zhang <asuka@163.com> --- net/ipv4/ip_sockglue.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index b8a2d63..76d78a7 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1202,8 +1202,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) * which has interface index (iif) as the first member of the * underlying inet{6}_skb_parm struct. This code then overlays * PKTINFO_SKB_CB and in_pktinfo also has iif as the first -* element so the iif is picked up from the prior IPCB +* element so the iif is picked up from the prior IPCB except +* iif is loopback interface which the packet should be +* considered as if it is being arrived via the sending +* interface */ + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) + pktinfo->ipi_ifindex = inet_iif(skb); + pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 1.8.3.1
[PATCH] net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon, since it come from lo which is not the interface dnsmasq bind to) Signed-off-by: Wei Zhang --- net/ipv4/ip_sockglue.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index b8a2d63..76d78a7 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1202,8 +1202,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) * which has interface index (iif) as the first member of the * underlying inet{6}_skb_parm struct. This code then overlays * PKTINFO_SKB_CB and in_pktinfo also has iif as the first -* element so the iif is picked up from the prior IPCB +* element so the iif is picked up from the prior IPCB except +* iif is loopback interface which the packet should be +* considered as if it is being arrived via the sending +* interface */ + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) + pktinfo->ipi_ifindex = inet_iif(skb); + pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 1.8.3.1
Re:[PATCH] net: fix incorrect original ingress device index in PKTINFO
At 2016-12-27 15:52:18, "Wei Zhang" <asuka@163.com> wrote: >When we send a packet for our own local address on a non-loopback interface >(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c >("net: original ingress device index in PKTINFO"), the original ingress >device index would be set as the loopback interface. However, the packet >should be considered as if it is being arrived via the sending interface >(eth0), otherwise it would break the expectation of the userspace >application (e.g. the DHCPRELEASE message from dhcp_release binary would >be ignored by the dnsmasq daemon) > >Signed-off-by: Wei Zhang <asuka@163.com> >--- > net/ipv4/ip_sockglue.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c >index b8a2d63..b6a6d35 100644 >--- a/net/ipv4/ip_sockglue.c >+++ b/net/ipv4/ip_sockglue.c >@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct >sk_buff *skb) >* which has interface index (iif) as the first member of the >* underlying inet{6}_skb_parm struct. This code then overlays >* PKTINFO_SKB_CB and in_pktinfo also has iif as the first >- * element so the iif is picked up from the prior IPCB >+ * element so the iif is picked up from the prior IPCB except >+ * iif is loopback interface which the packet should be >+ * considered as if it is being arrived via the sending >interface >*/ >+ if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) { >+ pktinfo->ipi_ifindex = inet_iif(skb); >+ } > pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); > } else { > pktinfo->ipi_ifindex = 0; >-- >1.8.3.1 > When I upgrade to the 4.9, the dhcp_release could not release the dhcp lease, the dnsmasq ignored all the DHCPRELEASE message which it think come from lo. I think this is due to the commit 0b922b7a829c, which set the IPCB(skb)->iif = skb->skb_iif in the ip_rcv()! And I'm very sorry about forgetting checkpatch, I will resend the patch, hope I'm not bothering you! Thanks, Wei Zhang
Re:[PATCH] net: fix incorrect original ingress device index in PKTINFO
At 2016-12-27 15:52:18, "Wei Zhang" wrote: >When we send a packet for our own local address on a non-loopback interface >(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c >("net: original ingress device index in PKTINFO"), the original ingress >device index would be set as the loopback interface. However, the packet >should be considered as if it is being arrived via the sending interface >(eth0), otherwise it would break the expectation of the userspace >application (e.g. the DHCPRELEASE message from dhcp_release binary would >be ignored by the dnsmasq daemon) > >Signed-off-by: Wei Zhang >--- > net/ipv4/ip_sockglue.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > >diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c >index b8a2d63..b6a6d35 100644 >--- a/net/ipv4/ip_sockglue.c >+++ b/net/ipv4/ip_sockglue.c >@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct >sk_buff *skb) >* which has interface index (iif) as the first member of the >* underlying inet{6}_skb_parm struct. This code then overlays >* PKTINFO_SKB_CB and in_pktinfo also has iif as the first >- * element so the iif is picked up from the prior IPCB >+ * element so the iif is picked up from the prior IPCB except >+ * iif is loopback interface which the packet should be >+ * considered as if it is being arrived via the sending >interface >*/ >+ if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) { >+ pktinfo->ipi_ifindex = inet_iif(skb); >+ } > pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); > } else { > pktinfo->ipi_ifindex = 0; >-- >1.8.3.1 > When I upgrade to the 4.9, the dhcp_release could not release the dhcp lease, the dnsmasq ignored all the DHCPRELEASE message which it think come from lo. I think this is due to the commit 0b922b7a829c, which set the IPCB(skb)->iif = skb->skb_iif in the ip_rcv()! And I'm very sorry about forgetting checkpatch, I will resend the patch, hope I'm not bothering you! Thanks, Wei Zhang
[PATCH] net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon) Signed-off-by: Wei Zhang <asuka@163.com> --- net/ipv4/ip_sockglue.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index b8a2d63..b6a6d35 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) * which has interface index (iif) as the first member of the * underlying inet{6}_skb_parm struct. This code then overlays * PKTINFO_SKB_CB and in_pktinfo also has iif as the first -* element so the iif is picked up from the prior IPCB +* element so the iif is picked up from the prior IPCB except +* iif is loopback interface which the packet should be +* considered as if it is being arrived via the sending interface */ + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) { + pktinfo->ipi_ifindex = inet_iif(skb); + } pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 1.8.3.1
[PATCH] net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon) Signed-off-by: Wei Zhang --- net/ipv4/ip_sockglue.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index b8a2d63..b6a6d35 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) * which has interface index (iif) as the first member of the * underlying inet{6}_skb_parm struct. This code then overlays * PKTINFO_SKB_CB and in_pktinfo also has iif as the first -* element so the iif is picked up from the prior IPCB +* element so the iif is picked up from the prior IPCB except +* iif is loopback interface which the packet should be +* considered as if it is being arrived via the sending interface */ + if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) { + pktinfo->ipi_ifindex = inet_iif(skb); + } pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 1.8.3.1
Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-04-05 07:05:59,"Jesse Gross" wrote: >On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang wrote: >> >> v2 -> v1: use the same logic of the gre_rcv() to distinguish which packet is >> intended to us! > >As a tip on kernel process: if you put the version information after >three dashes below the signed-off-by line then git will automatically >remove it when the final patch is applied. Thanks, should I modify it and send a v3 patch? > >> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c >> index a3d6951..f391df1 100644 >> --- a/net/openvswitch/vport-gre.c >> +++ b/net/openvswitch/vport-gre.c >> @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb, >> return PACKET_RCVD; >> } >> >> +/* Called with rcu_read_lock and BH disabled. */ >> +static int gre_err(struct sk_buff *skb, u32 info, >> + const struct tnl_ptk_info *tpi) >> +{ >> + struct ovs_net *ovs_net; >> + struct vport *vport; >> + >> + ovs_net = net_generic(dev_net(skb->dev), ovs_net_id); >> + vport = rcu_dereference(ovs_net->vport_net.gre_vport); >> + if (unlikely(!vport)) >> + return PACKET_REJECT; >> + else >> + return PACKET_RCVD; > >Sorry, I forgot to say this before - if we receive the packet then we >should also call consume_skb() on it. Maybe there is no need to call consume_skb()? The icmp_rcv() would call kfree_skb() for us. I also checked the ipgre_err(), it return PACKET_RCVD without call consume_skb() too. Regards, Wei Zhang
Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-04-05 07:05:59,Jesse Gross je...@nicira.com wrote: On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang asuka@163.com wrote: v2 - v1: use the same logic of the gre_rcv() to distinguish which packet is intended to us! As a tip on kernel process: if you put the version information after three dashes below the signed-off-by line then git will automatically remove it when the final patch is applied. Thanks, should I modify it and send a v3 patch? diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c index a3d6951..f391df1 100644 --- a/net/openvswitch/vport-gre.c +++ b/net/openvswitch/vport-gre.c @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb, return PACKET_RCVD; } +/* Called with rcu_read_lock and BH disabled. */ +static int gre_err(struct sk_buff *skb, u32 info, + const struct tnl_ptk_info *tpi) +{ + struct ovs_net *ovs_net; + struct vport *vport; + + ovs_net = net_generic(dev_net(skb-dev), ovs_net_id); + vport = rcu_dereference(ovs_net-vport_net.gre_vport); + if (unlikely(!vport)) + return PACKET_REJECT; + else + return PACKET_RCVD; Sorry, I forgot to say this before - if we receive the packet then we should also call consume_skb() on it. Maybe there is no need to call consume_skb()? The icmp_rcv() would call kfree_skb() for us. I also checked the ipgre_err(), it return PACKET_RCVD without call consume_skb() too. Regards, Wei Zhang
Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-04-02 02:27:56,"Jesse Gross" wrote: >On Tue, Apr 1, 2014 at 8:24 AM, wei zhang wrote: >> At 2014-04-01 08:49:53,"Jesse Gross" wrote: >>>On Sun, Mar 30, 2014 at 5:12 AM, wei zhang wrote: >>>> At 2014-03-29 06:02:25,"Jesse Gross" wrote: >> >>>> Maybe I misunderstand something? I think if we discard all packet pass to >>>>us >>>> when we use gre vport, new gre_cisco_protocol which has lower priority >>>>could >>>> not see the packet intended to it. >>> >>>That's true but in this case it would also not see any data packets, >>>so I don't think that situation would work well anyways. >>> >>>> I checked the implementation of the ipgre_err(), which has be called before >>>> the err_handler of gre vport. It use the the (local address, remote >>>>address, key) >>>> to distinguish the packet which is realy intended to it, although it could >>>>not >>>> always get the key from the icmp packet. Should we do as the same as it? >>>> I'm not sure this is feasible, any advice is appreciate. >>> >>>OVS does flow based matching rather than using a static set of >>>configuration parameters, so everything "matches" in some way >>>(although the result might be to drop). >> >> So the flow based match could dynamically determine by the ovs daemon, we >>could >> not find out the belonging of the packet as far as we call ovs_dp_upcall(), >>isn't it? > >That's right - and since the OVS flow table always has a default >behavior (even if it is drop or send to controller) there's never a >packet that isn't considered to be destined to OVS once it is >received. > >If this makes sense to you, would you mind submitting the patch you >had earlier formally with a commit message and signed off by line? Sure, thank you again as it's my first time to send patch to the kernel, your patient and kind help give me confidence to do it :) Regards, Wei ZhangN�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-04-01 08:49:53,"Jesse Gross" wrote: >On Sun, Mar 30, 2014 at 5:12 AM, wei zhang wrote: >> At 2014-03-29 06:02:25,"Jesse Gross" wrote: >> Maybe I misunderstand something? I think if we discard all packet pass to us >> when we use gre vport, new gre_cisco_protocol which has lower priority could >> not see the packet intended to it. > >That's true but in this case it would also not see any data packets, >so I don't think that situation would work well anyways. > >> I checked the implementation of the ipgre_err(), which has be called before >> the err_handler of gre vport. It use the the (local address, remote address, >>key) >> to distinguish the packet which is realy intended to it, although it could >>not >> always get the key from the icmp packet. Should we do as the same as it? >> I'm not sure this is feasible, any advice is appreciate. > >OVS does flow based matching rather than using a static set of >configuration parameters, so everything "matches" in some way >(although the result might be to drop). So the flow based match could dynamically determine by the ovs daemon, we could not find out the belonging of the packet as far as we call ovs_dp_upcall(), isn't it? >This generally means that OVS >is the receiver of last resort and nothing currently has a lower >priority. Thanks for your kind help, this clarify my misunderstanding! >That actually means the difference between the patches is >somewhat academic but it seems more robust for the logic to be >consistent. Regards, Wei Zhang
Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-04-01 08:49:53,Jesse Gross je...@nicira.com wrote: On Sun, Mar 30, 2014 at 5:12 AM, wei zhang asuka@163.com wrote: At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote: Maybe I misunderstand something? I think if we discard all packet pass to us when we use gre vport, new gre_cisco_protocol which has lower priority could not see the packet intended to it. That's true but in this case it would also not see any data packets, so I don't think that situation would work well anyways. I checked the implementation of the ipgre_err(), which has be called before the err_handler of gre vport. It use the the (local address, remote address, key) to distinguish the packet which is realy intended to it, although it could not always get the key from the icmp packet. Should we do as the same as it? I'm not sure this is feasible, any advice is appreciate. OVS does flow based matching rather than using a static set of configuration parameters, so everything matches in some way (although the result might be to drop). So the flow based match could dynamically determine by the ovs daemon, we could not find out the belonging of the packet as far as we call ovs_dp_upcall(), isn't it? This generally means that OVS is the receiver of last resort and nothing currently has a lower priority. Thanks for your kind help, this clarify my misunderstanding! That actually means the difference between the patches is somewhat academic but it seems more robust for the logic to be consistent. Regards, Wei Zhang
Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-04-02 02:27:56,Jesse Gross je...@nicira.com wrote: On Tue, Apr 1, 2014 at 8:24 AM, wei zhang asuka@163.com wrote: At 2014-04-01 08:49:53,Jesse Gross je...@nicira.com wrote: On Sun, Mar 30, 2014 at 5:12 AM, wei zhang asuka@163.com wrote: At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote: Maybe I misunderstand something? I think if we discard all packet pass to us when we use gre vport, new gre_cisco_protocol which has lower priority could not see the packet intended to it. That's true but in this case it would also not see any data packets, so I don't think that situation would work well anyways. I checked the implementation of the ipgre_err(), which has be called before the err_handler of gre vport. It use the the (local address, remote address, key) to distinguish the packet which is realy intended to it, although it could not always get the key from the icmp packet. Should we do as the same as it? I'm not sure this is feasible, any advice is appreciate. OVS does flow based matching rather than using a static set of configuration parameters, so everything matches in some way (although the result might be to drop). So the flow based match could dynamically determine by the ovs daemon, we could not find out the belonging of the packet as far as we call ovs_dp_upcall(), isn't it? That's right - and since the OVS flow table always has a default behavior (even if it is drop or send to controller) there's never a packet that isn't considered to be destined to OVS once it is received. If this makes sense to you, would you mind submitting the patch you had earlier formally with a commit message and signed off by line? Sure, thank you again as it's my first time to send patch to the kernel, your patient and kind help give me confidence to do it :) Regards, Wei ZhangN�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-03-29 06:02:25,"Jesse Gross" wrote: >I'm not sure that rejecting all ICMP packets is the correct thing do >here since it means that we could pass them onto a later caller even >though they are intended for us. We should probably use the same logic >as for receiving packets and just discard them here. Thank you very much for your advice, did you mean this logic? diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c index a3d6951..c183a56 100644 --- a/net/openvswitch/vport-gre.c +++ b/net/openvswitch/vport-gre.c @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb, return PACKET_RCVD; } +/* Called with rcu_read_lock and BH disabled. */ +static int gre_err(struct sk_buff *skb, u32 info, + const struct tnl_ptk_info *tpi) +{ + struct ovs_net *ovs_net; + struct vport *vport; + + ovs_net = net_generic(dev_net(skb->dev), ovs_net_id); + vport = rcu_dereference(ovs_net->vport_net.gre_vport); + if (unlikely(!vport)) + return PACKET_REJECT; + else + return PACKET_RCVD; +} Maybe I misunderstand something? I think if we discard all packet pass to us when we use gre vport, new gre_cisco_protocol which has lower priority could not see the packet intended to it. I checked the implementation of the ipgre_err(), which has be called before the err_handler of gre vport. It use the the (local address, remote address, key) to distinguish the packet which is realy intended to it, although it could not always get the key from the icmp packet. Should we do as the same as it? I'm not sure this is feasible, any advice is appreciate. Regards, Wei Zhang N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote: I'm not sure that rejecting all ICMP packets is the correct thing do here since it means that we could pass them onto a later caller even though they are intended for us. We should probably use the same logic as for receiving packets and just discard them here. Thank you very much for your advice, did you mean this logic? diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c index a3d6951..c183a56 100644 --- a/net/openvswitch/vport-gre.c +++ b/net/openvswitch/vport-gre.c @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb, return PACKET_RCVD; } +/* Called with rcu_read_lock and BH disabled. */ +static int gre_err(struct sk_buff *skb, u32 info, + const struct tnl_ptk_info *tpi) +{ + struct ovs_net *ovs_net; + struct vport *vport; + + ovs_net = net_generic(dev_net(skb-dev), ovs_net_id); + vport = rcu_dereference(ovs_net-vport_net.gre_vport); + if (unlikely(!vport)) + return PACKET_REJECT; + else + return PACKET_RCVD; +} Maybe I misunderstand something? I think if we discard all packet pass to us when we use gre vport, new gre_cisco_protocol which has lower priority could not see the packet intended to it. I checked the implementation of the ipgre_err(), which has be called before the err_handler of gre vport. It use the the (local address, remote address, key) to distinguish the packet which is realy intended to it, although it could not always get the key from the icmp packet. Should we do as the same as it? I'm not sure this is feasible, any advice is appreciate. Regards, Wei Zhang N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()
At 2014-03-25 03:14:17,"David Miller" wrote: >From: Wei Zhang >Date: Mon, 24 Mar 2014 15:34:31 +0800 > >> When use the gre vport, openvswitch register a gre_cisco_protocol but >> does not supply a err_handler with it. The gre_cisco_err() call the >> err_handler without existence check, cause the kernel crash. >> >> This patch base on v3.14-rc7. But the bug affect all kernel newer than >> 3.11! >> >> Signed-off-by: Wei Zhang > >Rather, openvswitch should provide an appropriate ->err_handler() that >returns PACKET_RCVD or PACKET_REJECT. Thank you for your explanation, I misunderstand it . I'm very sorry about this bothering! > >I'm not applying this patch.
Re: [PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()
At 2014-03-25 03:14:17,David Miller da...@davemloft.net wrote: From: Wei Zhang asuka@163.com Date: Mon, 24 Mar 2014 15:34:31 +0800 When use the gre vport, openvswitch register a gre_cisco_protocol but does not supply a err_handler with it. The gre_cisco_err() call the err_handler without existence check, cause the kernel crash. This patch base on v3.14-rc7. But the bug affect all kernel newer than 3.11! Signed-off-by: Wei Zhang asuka@163.com Rather, openvswitch should provide an appropriate -err_handler() that returns PACKET_RCVD or PACKET_REJECT. Thank you for your explanation, I misunderstand it . I'm very sorry about this bothering! I'm not applying this patch.
Re:[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()
The crash is occur on Centos 6.4, when we use gre vport of openvswitch! <1>BUG: unable to handle kernel NULL pointer dereference at (null) <1>IP: [<(null)>] (null) <4>PGD c2910b067 PUD c2927d067 PMD 0 <4>Oops: 0010 [#1] SMP <4>last sysfs file: /sys/devices/virtual/net/gretap0/flags <4>CPU 20 <4>Modules linked in: ip_gre ip_tunnel xt_conntrack act_police cls_basic sch_ingress veth ipt_REDIRECT ipmi_devintf ipv6 openvswitch vxlan iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_tables power_meter sg dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp tg3 ext4 jbd2 mbcache sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] <4> <4>Pid: 2358, comm: ovs-vswitchd Not tainted 2.6.32-358.123.2.openstack.el6.x86_64 #1 Dell Inc. PowerEdge R620/0D2D5F <4>RIP: 0010:[<>] [<(null)>] (null) <4>RSP: 0018:880053743c70 EFLAGS: 00010282 <4>RAX: a01cbe20 RBX: 880bda46ecc0 RCX: <4>RDX: 880053743c78 RSI: RDI: 880bda46ecc0 <4>RBP: 880053743ca8 R08: 5617f772 R09: <4>R10: R11: R12: 880053743c78 <4>R13: R14: 880be5a5b244 R15: <4>FS: 7fe5d5f837c0() GS:88005374() knlGS: <4>CS: 0010 DS: ES: CR0: 80050033 <4>CR2: CR3: 000c2a804000 CR4: 000407e0 <4>DR0: DR1: DR2: <4>DR3: DR6: 0ff0 DR7: 0400 <4>Process ovs-vswitchd (pid: 2358, threadinfo 880c284e, task 880c28d1f500) <4>Stack: <4> 814c6e61 aba73000 880053743cc8 <4> 002f 880bda46ecc0 881800deb200 880053743cb8 <4> 814c676b 880053743cf8 814afa91 880053743ce8 <4>Call Trace: <4> <4> [] ? gre_cisco_err+0x71/0x80 <4> [] gre_err+0x4b/0x50 <4> [] icmp_unreach+0x141/0x2e0 <4> [] icmp_rcv+0x290/0x330 <4> [] ? raw_local_deliver+0x221/0x250 <4> [] ip_local_deliver_finish+0xdd/0x2d0 <4> [] ip_local_deliver+0x98/0xa0 <4> [] ip_rcv_finish+0x12d/0x440 <4> [] ip_rcv+0x275/0x350 <4> [] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch] <4> [] __netif_receive_skb+0x4ab/0x750 <4> [] process_backlog+0x9a/0x100 <4> [] net_rx_action+0x103/0x2f0 <4> [] __do_softirq+0xc1/0x1e0 <4> [] ? call_softirq+0x1c/0x30 <4> [] call_softirq+0x1c/0x30 <4> <4> [] ? do_softirq+0x65/0xa0 <4> [] local_bh_enable+0x9a/0xb0 <4> [] ovs_packet_cmd_execute+0x20c/0x240 [openvswitch] <4> [] genl_rcv_msg+0x203/0x250 <4> [] ? genl_rcv_msg+0x0/0x250 <4> [] netlink_rcv_skb+0xa9/0xd0 <4> [] genl_rcv+0x25/0x40 <4> [] netlink_unicast+0x2db/0x320 <4> [] netlink_sendmsg+0x2c0/0x3d0 <4> [] sock_sendmsg+0x123/0x150 <4> [] ? sock_recvmsg+0x133/0x160 <4> [] ? autoremove_wake_function+0x0/0x40 <4> [] ? pollwake+0x0/0x60 <4> [] ? pollwake+0x0/0x60 <4> [] ? pipe_read+0x2a7/0x4e0 <4> [] ? pollwake+0x0/0x60 <4> [] __sys_sendmsg+0x406/0x420 <4> [] ? ep_scan_ready_list+0x194/0x1a0 <4> [] ? ep_poll+0x12e/0x330 <4> [] ? security_file_permission+0x16/0x20 <4> [] sys_sendmsg+0x49/0x90 <4> [] system_call_fastpath+0x16/0x1b <4>Code: Bad RIP value. <1>RIP [<(null)>] (null) <4> RSP <4>CR2: N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()
When use the gre vport, openvswitch register a gre_cisco_protocol but does not supply a err_handler with it. The gre_cisco_err() call the err_handler without existence check, cause the kernel crash. This patch base on v3.14-rc7. But the bug affect all kernel newer than 3.11! Signed-off-by: Wei Zhang --- net/ipv4/gre_demux.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index 1863422..56b0d67 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -250,7 +250,7 @@ static void gre_cisco_err(struct sk_buff *skb, u32 info) struct gre_cisco_protocol *proto; proto = rcu_dereference(gre_cisco_proto_list[i]); - if (!proto) + if (!proto || !proto->err_handler) continue; if (proto->err_handler(skb, info, ) == PACKET_RCVD) -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()
When use the gre vport, openvswitch register a gre_cisco_protocol but does not supply a err_handler with it. The gre_cisco_err() call the err_handler without existence check, cause the kernel crash. This patch base on v3.14-rc7. But the bug affect all kernel newer than 3.11! Signed-off-by: Wei Zhang asuka@163.com --- net/ipv4/gre_demux.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c index 1863422..56b0d67 100644 --- a/net/ipv4/gre_demux.c +++ b/net/ipv4/gre_demux.c @@ -250,7 +250,7 @@ static void gre_cisco_err(struct sk_buff *skb, u32 info) struct gre_cisco_protocol *proto; proto = rcu_dereference(gre_cisco_proto_list[i]); - if (!proto) + if (!proto || !proto-err_handler) continue; if (proto-err_handler(skb, info, tpi) == PACKET_RCVD) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re:[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()
The crash is occur on Centos 6.4, when we use gre vport of openvswitch! 1BUG: unable to handle kernel NULL pointer dereference at (null) 1IP: [(null)] (null) 4PGD c2910b067 PUD c2927d067 PMD 0 4Oops: 0010 [#1] SMP 4last sysfs file: /sys/devices/virtual/net/gretap0/flags 4CPU 20 4Modules linked in: ip_gre ip_tunnel xt_conntrack act_police cls_basic sch_ingress veth ipt_REDIRECT ipmi_devintf ipv6 openvswitch vxlan iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_tables power_meter sg dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp tg3 ext4 jbd2 mbcache sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] 4 4Pid: 2358, comm: ovs-vswitchd Not tainted 2.6.32-358.123.2.openstack.el6.x86_64 #1 Dell Inc. PowerEdge R620/0D2D5F 4RIP: 0010:[] [(null)] (null) 4RSP: 0018:880053743c70 EFLAGS: 00010282 4RAX: a01cbe20 RBX: 880bda46ecc0 RCX: 4RDX: 880053743c78 RSI: RDI: 880bda46ecc0 4RBP: 880053743ca8 R08: 5617f772 R09: 4R10: R11: R12: 880053743c78 4R13: R14: 880be5a5b244 R15: 4FS: 7fe5d5f837c0() GS:88005374() knlGS: 4CS: 0010 DS: ES: CR0: 80050033 4CR2: CR3: 000c2a804000 CR4: 000407e0 4DR0: DR1: DR2: 4DR3: DR6: 0ff0 DR7: 0400 4Process ovs-vswitchd (pid: 2358, threadinfo 880c284e, task 880c28d1f500) 4Stack: 4 814c6e61 aba73000 880053743cc8 4d 002f 880bda46ecc0 881800deb200 880053743cb8 4d 814c676b 880053743cf8 814afa91 880053743ce8 4Call Trace: 4 IRQ 4 [814c6e61] ? gre_cisco_err+0x71/0x80 4 [814c676b] gre_err+0x4b/0x50 4 [814afa91] icmp_unreach+0x141/0x2e0 4 [814b02e0] icmp_rcv+0x290/0x330 4 [814a8c71] ? raw_local_deliver+0x221/0x250 4 [8148222d] ip_local_deliver_finish+0xdd/0x2d0 4 [814824b8] ip_local_deliver+0x98/0xa0 4 [8148197d] ip_rcv_finish+0x12d/0x440 4 [81481f05] ip_rcv+0x275/0x350 4 [a01ca503] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch] 4 [81449e6b] __netif_receive_skb+0x4ab/0x750 4 [8144a1aa] process_backlog+0x9a/0x100 4 [8144f483] net_rx_action+0x103/0x2f0 4 [810770b1] __do_softirq+0xc1/0x1e0 4 [8100c1cc] ? call_softirq+0x1c/0x30 4 [8100c1cc] call_softirq+0x1c/0x30 4 EOI 4 [8100de05] ? do_softirq+0x65/0xa0 4 [81076f3a] local_bh_enable+0x9a/0xb0 4 [a01c3b9c] ovs_packet_cmd_execute+0x20c/0x240 [openvswitch] 4 [81476013] genl_rcv_msg+0x203/0x250 4 [81475e10] ? genl_rcv_msg+0x0/0x250 4 [81474ca9] netlink_rcv_skb+0xa9/0xd0 4 [81475df5] genl_rcv+0x25/0x40 4 [814748db] netlink_unicast+0x2db/0x320 4 [81475350] netlink_sendmsg+0x2c0/0x3d0 4 [81436b33] sock_sendmsg+0x123/0x150 4 [814387e3] ? sock_recvmsg+0x133/0x160 4 [81096da0] ? autoremove_wake_function+0x0/0x40 4 [81197af0] ? pollwake+0x0/0x60 4 [81197af0] ? pollwake+0x0/0x60 4 [8118c687] ? pipe_read+0x2a7/0x4e0 4 [81197af0] ? pollwake+0x0/0x60 4 [81438326] __sys_sendmsg+0x406/0x420 4 [811c7c54] ? ep_scan_ready_list+0x194/0x1a0 4 [811c7dae] ? ep_poll+0x12e/0x330 4 [8121cb26] ? security_file_permission+0x16/0x20 4 [81438549] sys_sendmsg+0x49/0x90 4 [8100b072] system_call_fastpath+0x16/0x1b 4Code: Bad RIP value. 1RIP [(null)] (null) 4 RSP 880053743c70 4CR2: N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i