Re: [PATCH v7 0/4] New Microsemi PCI Switch Management Driver

2017-03-09 Thread Wei Zhang
This is great!  Thanks Bjorn and Logan for getting this drivers upstreamed!

-Wei

--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 3/9/17, 11:56 AM, "Logan Gunthorpe" <log...@deltatee.com> wrote:



On 09/03/17 12:55 PM, Bjorn Helgaas wrote:
> Applied to pci/switchtec for v4.12, thanks!

Great! Thanks so much!

Logan





Re: [PATCH v7 0/4] New Microsemi PCI Switch Management Driver

2017-03-09 Thread Wei Zhang
This is great!  Thanks Bjorn and Logan for getting this drivers upstreamed!

-Wei

--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 3/9/17, 11:56 AM, "Logan Gunthorpe"  wrote:



On 09/03/17 12:55 PM, Bjorn Helgaas wrote:
> Applied to pci/switchtec for v4.12, thanks!

Great! Thanks so much!

Logan





Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver

2017-02-09 Thread Wei Zhang
Corrections to the email below.  It should be

Tested-by: Krishna Dhulipala krish...@fb.com
Reviewed-by: Wei Zhang wzh...@fb.com  

Thanks,
-Wei

--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 2/9/17, 3:16 PM, "Wei Zhang" <wzh...@fb.com> wrote:

Hi,

The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the 
switchtec userland tool is used to communicate with the Microsemi 8536 PCIe 
Switch used on Facebook’s Lightning platform. The following essential driver 
and tool functions were successfully tested with it:
· Retrieval of firmware and configuration information along with 
CRCs
· Switch firmware and configuration upgrades
· Switch PHY/Link error counter collection and the ability to reset 
them
· Switch upstream and downstream ports’ link status reporting
· Switch interface functioning
· Retrieval of switch ASIC temperature
· Exporting switch firmware log dump
· Read the information of firmware and configuration binaries
· Extract the firmware and configuration images stored in the 
switch EEPROM

Tested-by: Krishna Dhulipala krish...@fb.com
    Reviewed-by: Wei Zhang wzh...@fb.com  

Thanks,
-Wei
    
--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 2/2/17, 10:05 AM, "Logan Gunthorpe" <log...@deltatee.com> wrote:

Changes since v1:

* Rebased onto 4.10-rc6 (cleanly)
* Split the patch into a few more easily digestible patches (as
  suggested by Greg Kroah-Hartman)
* Folded switchtec.c into switchtec.h (per Greg)
* Fixed a bunch of 32bit build warnings caught by the kbuild test robot
* Fixed some issues in the documentation so it has a proper
  reStructredText format (as noted by Jonathan Corbet)
* Fixed padding and sizes in the IOCTL structures as noticed by Emil
  Velikov and used pahole to verify their consistency across 32 and 64
  bit builds
* Reworked one of the IOCTL interfaces to be more future proof (per
  Emil).

Changes since RFC:

* Fixed incorrect use of the drive model as pointed out by Greg
  Kroah-Hartman
* Used devm functions as suggested by Keith Busch
* Added a handful of sysfs attributes to the switchtec class
* Added a handful of IOCTLs to the switchtec device
* A number of miscellaneous bug fixes

--

Hi,

This is a continuation of the RFC we posted lasted month [1] which
proposes a management driver for Microsemi's Switchtec line of PCI
switches. This hardware is still looking to be used in the Open
Compute Platform

To make this entirely clear: the Switchtec products are compliant
with the PCI specifications and are supported today with the standard
in-kernel driver. However, these devices also expose a management 
endpoint
on a separate PCI function address which can be used to perform some
advanced operations. This is a driver for that function. See the patch
for more information.

Since the RFC, we've made the changes requested by Greg Kroah-Hartman
and Keith Busch, and we've also fleshed out a number of features. We've
added a couple of IOCTLs and sysfs attributes which are documented in
the patch. Significant work has also been done on the userspace tool
which is available under a GPL license at [2]. We've also had testing
done by some of the interested parties.

We hope to see this work included in either 4.11 or 4.12 assuming a
smooth review process.

The patch is based off of the v4.10-rc6 release.

Thanks for your review,

Logan

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A=
 
[2] https://github.com/sbates130272/switchtec-user

--

Logan Gunthorpe (4):
  MicroSemi Switchtec management interface driver
  switchtec: Add user interface documentation
  switchtec: Add sysfs attributes to the Switchtec driver
  switchtec: Add IOCTLs to the Switchtec driver

 Documentation/ABI/testing/sysfs-class-switchtec |   96 ++
 Documentation/ioctl/ioctl-number.txt|1 +
 Documentation/switchtec.txt |   80 ++
 MAINTAINERS |   11 +
 drivers/pci/Kconfig  

Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver

2017-02-09 Thread Wei Zhang
Corrections to the email below.  It should be

Tested-by: Krishna Dhulipala krish...@fb.com
Reviewed-by: Wei Zhang wzh...@fb.com  

Thanks,
-Wei

--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 2/9/17, 3:16 PM, "Wei Zhang"  wrote:

Hi,

The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the 
switchtec userland tool is used to communicate with the Microsemi 8536 PCIe 
Switch used on Facebook’s Lightning platform. The following essential driver 
and tool functions were successfully tested with it:
· Retrieval of firmware and configuration information along with 
CRCs
· Switch firmware and configuration upgrades
· Switch PHY/Link error counter collection and the ability to reset 
them
· Switch upstream and downstream ports’ link status reporting
· Switch interface functioning
· Retrieval of switch ASIC temperature
· Exporting switch firmware log dump
· Read the information of firmware and configuration binaries
· Extract the firmware and configuration images stored in the 
switch EEPROM

Tested-by: Krishna Dhulipala krish...@fb.com
Reviewed-by: Wei Zhang wzh...@fb.com  

Thanks,
-Wei

    --
    wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 2/2/17, 10:05 AM, "Logan Gunthorpe"  wrote:

Changes since v1:

* Rebased onto 4.10-rc6 (cleanly)
* Split the patch into a few more easily digestible patches (as
  suggested by Greg Kroah-Hartman)
* Folded switchtec.c into switchtec.h (per Greg)
* Fixed a bunch of 32bit build warnings caught by the kbuild test robot
* Fixed some issues in the documentation so it has a proper
  reStructredText format (as noted by Jonathan Corbet)
* Fixed padding and sizes in the IOCTL structures as noticed by Emil
  Velikov and used pahole to verify their consistency across 32 and 64
  bit builds
* Reworked one of the IOCTL interfaces to be more future proof (per
  Emil).

Changes since RFC:

* Fixed incorrect use of the drive model as pointed out by Greg
  Kroah-Hartman
* Used devm functions as suggested by Keith Busch
* Added a handful of sysfs attributes to the switchtec class
* Added a handful of IOCTLs to the switchtec device
* A number of miscellaneous bug fixes

--

Hi,

This is a continuation of the RFC we posted lasted month [1] which
proposes a management driver for Microsemi's Switchtec line of PCI
switches. This hardware is still looking to be used in the Open
Compute Platform

To make this entirely clear: the Switchtec products are compliant
with the PCI specifications and are supported today with the standard
in-kernel driver. However, these devices also expose a management 
endpoint
on a separate PCI function address which can be used to perform some
advanced operations. This is a driver for that function. See the patch
for more information.

Since the RFC, we've made the changes requested by Greg Kroah-Hartman
and Keith Busch, and we've also fleshed out a number of features. We've
added a couple of IOCTLs and sysfs attributes which are documented in
the patch. Significant work has also been done on the userspace tool
which is available under a GPL license at [2]. We've also had testing
done by some of the interested parties.

We hope to see this work included in either 4.11 or 4.12 assuming a
smooth review process.

The patch is based off of the v4.10-rc6 release.

Thanks for your review,

Logan

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A=
 
[2] https://github.com/sbates130272/switchtec-user

--

Logan Gunthorpe (4):
  MicroSemi Switchtec management interface driver
  switchtec: Add user interface documentation
  switchtec: Add sysfs attributes to the Switchtec driver
  switchtec: Add IOCTLs to the Switchtec driver

 Documentation/ABI/testing/sysfs-class-switchtec |   96 ++
 Documentation/ioctl/ioctl-number.txt|1 +
 Documentation/switchtec.txt |   80 ++
 MAINTAINERS |   11 +
 drivers/pci/Kconfig |1 +
 

Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver

2017-02-09 Thread Wei Zhang
Hi,

The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the 
switchtec userland tool is used to communicate with the Microsemi 8536 PCIe 
Switch used on Facebook’s Lightning platform. The following essential driver 
and tool functions were successfully tested with it:
· Retrieval of firmware and configuration information along with CRCs
· Switch firmware and configuration upgrades
· Switch PHY/Link error counter collection and the ability to reset them
· Switch upstream and downstream ports’ link status reporting
· Switch interface functioning
· Retrieval of switch ASIC temperature
· Exporting switch firmware log dump
· Read the information of firmware and configuration binaries
· Extract the firmware and configuration images stored in the switch 
EEPROM

Tested-by: Krishna Dhulipala krish...@fb.com
Tested-by: Wei Zhang wzh...@fb.com  

Thanks,
-Wei

--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 2/2/17, 10:05 AM, "Logan Gunthorpe" <log...@deltatee.com> wrote:

Changes since v1:

* Rebased onto 4.10-rc6 (cleanly)
* Split the patch into a few more easily digestible patches (as
  suggested by Greg Kroah-Hartman)
* Folded switchtec.c into switchtec.h (per Greg)
* Fixed a bunch of 32bit build warnings caught by the kbuild test robot
* Fixed some issues in the documentation so it has a proper
  reStructredText format (as noted by Jonathan Corbet)
* Fixed padding and sizes in the IOCTL structures as noticed by Emil
  Velikov and used pahole to verify their consistency across 32 and 64
  bit builds
* Reworked one of the IOCTL interfaces to be more future proof (per
  Emil).

Changes since RFC:

* Fixed incorrect use of the drive model as pointed out by Greg
  Kroah-Hartman
* Used devm functions as suggested by Keith Busch
* Added a handful of sysfs attributes to the switchtec class
* Added a handful of IOCTLs to the switchtec device
* A number of miscellaneous bug fixes

--

Hi,

This is a continuation of the RFC we posted lasted month [1] which
proposes a management driver for Microsemi's Switchtec line of PCI
switches. This hardware is still looking to be used in the Open
Compute Platform

To make this entirely clear: the Switchtec products are compliant
with the PCI specifications and are supported today with the standard
in-kernel driver. However, these devices also expose a management endpoint
on a separate PCI function address which can be used to perform some
advanced operations. This is a driver for that function. See the patch
for more information.

Since the RFC, we've made the changes requested by Greg Kroah-Hartman
and Keith Busch, and we've also fleshed out a number of features. We've
added a couple of IOCTLs and sysfs attributes which are documented in
the patch. Significant work has also been done on the userspace tool
which is available under a GPL license at [2]. We've also had testing
done by some of the interested parties.

We hope to see this work included in either 4.11 or 4.12 assuming a
smooth review process.

The patch is based off of the v4.10-rc6 release.

Thanks for your review,

Logan

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A=
 
[2] https://github.com/sbates130272/switchtec-user

--

Logan Gunthorpe (4):
  MicroSemi Switchtec management interface driver
  switchtec: Add user interface documentation
  switchtec: Add sysfs attributes to the Switchtec driver
  switchtec: Add IOCTLs to the Switchtec driver

 Documentation/ABI/testing/sysfs-class-switchtec |   96 ++
 Documentation/ioctl/ioctl-number.txt|1 +
 Documentation/switchtec.txt |   80 ++
 MAINTAINERS |   11 +
 drivers/pci/Kconfig |1 +
 drivers/pci/Makefile|1 +
 drivers/pci/switch/Kconfig  |   13 +
 drivers/pci/switch/Makefile |1 +
 drivers/pci/switch/switchtec.c  | 1608 
+++
 include/uapi/linux/switchtec_ioctl.h|  132 ++
 10 files changed, 1944 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-switchtec
 create mode 100644 Documentation/switchtec.txt
 create mode 100644 drivers/pci/switch/Kconfig
 create mode 100644 drivers/pci/switch/Makefile
 create mode 100644 drivers/pci/switch/switchtec.c
 create mode 100644 in

Re: [PATCH v2 0/4] New Microsemi PCI Switch Management Driver

2017-02-09 Thread Wei Zhang
Hi,

The switchtec driver patches [PATCH v2 (0-4)/4] in conjunction with the 
switchtec userland tool is used to communicate with the Microsemi 8536 PCIe 
Switch used on Facebook’s Lightning platform. The following essential driver 
and tool functions were successfully tested with it:
· Retrieval of firmware and configuration information along with CRCs
· Switch firmware and configuration upgrades
· Switch PHY/Link error counter collection and the ability to reset them
· Switch upstream and downstream ports’ link status reporting
· Switch interface functioning
· Retrieval of switch ASIC temperature
· Exporting switch firmware log dump
· Read the information of firmware and configuration binaries
· Extract the firmware and configuration images stored in the switch 
EEPROM

Tested-by: Krishna Dhulipala krish...@fb.com
Tested-by: Wei Zhang wzh...@fb.com  

Thanks,
-Wei

--
wei zhang | software engineer | facebook
wzh...@fb.com | (408) 460-4803

On 2/2/17, 10:05 AM, "Logan Gunthorpe"  wrote:

Changes since v1:

* Rebased onto 4.10-rc6 (cleanly)
* Split the patch into a few more easily digestible patches (as
  suggested by Greg Kroah-Hartman)
* Folded switchtec.c into switchtec.h (per Greg)
* Fixed a bunch of 32bit build warnings caught by the kbuild test robot
* Fixed some issues in the documentation so it has a proper
  reStructredText format (as noted by Jonathan Corbet)
* Fixed padding and sizes in the IOCTL structures as noticed by Emil
  Velikov and used pahole to verify their consistency across 32 and 64
  bit builds
* Reworked one of the IOCTL interfaces to be more future proof (per
  Emil).

Changes since RFC:

* Fixed incorrect use of the drive model as pointed out by Greg
  Kroah-Hartman
* Used devm functions as suggested by Keith Busch
* Added a handful of sysfs attributes to the switchtec class
* Added a handful of IOCTLs to the switchtec device
* A number of miscellaneous bug fixes

--

Hi,

This is a continuation of the RFC we posted lasted month [1] which
proposes a management driver for Microsemi's Switchtec line of PCI
switches. This hardware is still looking to be used in the Open
Compute Platform

To make this entirely clear: the Switchtec products are compliant
with the PCI specifications and are supported today with the standard
in-kernel driver. However, these devices also expose a management endpoint
on a separate PCI function address which can be used to perform some
advanced operations. This is a driver for that function. See the patch
for more information.

Since the RFC, we've made the changes requested by Greg Kroah-Hartman
and Keith Busch, and we've also fleshed out a number of features. We've
added a couple of IOCTLs and sysfs attributes which are documented in
the patch. Significant work has also been done on the userspace tool
which is available under a GPL license at [2]. We've also had testing
done by some of the interested parties.

We hope to see this work included in either 4.11 or 4.12 assuming a
smooth review process.

The patch is based off of the v4.10-rc6 release.

Thanks for your review,

Logan

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dpci_msg56897.html=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=LRFoLl92zWj5mkgkc_hRcg=VLDBJqzotzGkTj8-xjlfT-J0k2uFq6FcWg2nA_oKYJo=OkigHoSqH1Z3dnmLqN76lIQ_WxRJDj1uqIDl35SI58A=
 
[2] https://github.com/sbates130272/switchtec-user

--

Logan Gunthorpe (4):
  MicroSemi Switchtec management interface driver
  switchtec: Add user interface documentation
  switchtec: Add sysfs attributes to the Switchtec driver
  switchtec: Add IOCTLs to the Switchtec driver

 Documentation/ABI/testing/sysfs-class-switchtec |   96 ++
 Documentation/ioctl/ioctl-number.txt|1 +
 Documentation/switchtec.txt |   80 ++
 MAINTAINERS |   11 +
 drivers/pci/Kconfig |1 +
 drivers/pci/Makefile|1 +
 drivers/pci/switch/Kconfig  |   13 +
 drivers/pci/switch/Makefile |1 +
 drivers/pci/switch/switchtec.c  | 1608 
+++
 include/uapi/linux/switchtec_ioctl.h|  132 ++
 10 files changed, 1944 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-switchtec
 create mode 100644 Documentation/switchtec.txt
 create mode 100644 drivers/pci/switch/Kconfig
 create mode 100644 drivers/pci/switch/Makefile
 create mode 100644 drivers/pci/switch/switchtec.c
 create mode 100644 include/uapi/linux/switcht

[PATCH v2] net: fix incorrect original ingress device index in PKTINFO

2016-12-29 Thread Wei Zhang
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <d...@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka@163.com>
---
v2:
 - add the missing Fixes line
 - better comment come from David Ahern

 net/ipv4/ip_sockglue.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 57e1405..53ae0c6 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1225,8 +1225,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-* element so the iif is picked up from the prior IPCB
+* element so the iif is picked up from the prior IPCB. If iif
+* is the loopback interface, then return the sending interface
+* (e.g., process binds socket to eth0 for Tx which is
+* redirected to loopback in the rtable/dst).
 */
+   if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
+   pktinfo->ipi_ifindex = inet_iif(skb);
+
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1




[PATCH v2] net: fix incorrect original ingress device index in PKTINFO

2016-12-29 Thread Wei Zhang
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern 
Signed-off-by: Wei Zhang 
---
v2:
 - add the missing Fixes line
 - better comment come from David Ahern

 net/ipv4/ip_sockglue.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 57e1405..53ae0c6 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1225,8 +1225,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-* element so the iif is picked up from the prior IPCB
+* element so the iif is picked up from the prior IPCB. If iif
+* is the loopback interface, then return the sending interface
+* (e.g., process binds socket to eth0 for Tx which is
+* redirected to loopback in the rtable/dst).
 */
+   if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
+   pktinfo->ipi_ifindex = inet_iif(skb);
+
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1




[PATCH] net: fix incorrect original ingress device index in PKTINFO

2016-12-27 Thread Wei Zhang
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Signed-off-by: Wei Zhang <asuka@163.com>
---
 net/ipv4/ip_sockglue.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b8a2d63..76d78a7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1202,8 +1202,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-* element so the iif is picked up from the prior IPCB
+* element so the iif is picked up from the prior IPCB except
+* iif is loopback interface which the packet should be
+* considered as if it is being arrived via the sending
+* interface
 */
+   if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
+   pktinfo->ipi_ifindex = inet_iif(skb);
+
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1




[PATCH] net: fix incorrect original ingress device index in PKTINFO

2016-12-27 Thread Wei Zhang
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Signed-off-by: Wei Zhang 
---
 net/ipv4/ip_sockglue.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b8a2d63..76d78a7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1202,8 +1202,14 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-* element so the iif is picked up from the prior IPCB
+* element so the iif is picked up from the prior IPCB except
+* iif is loopback interface which the packet should be
+* considered as if it is being arrived via the sending
+* interface
 */
+   if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
+   pktinfo->ipi_ifindex = inet_iif(skb);
+
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1




Re:[PATCH] net: fix incorrect original ingress device index in PKTINFO

2016-12-27 Thread wei zhang
At 2016-12-27 15:52:18, "Wei Zhang" <asuka@163.com> wrote:
>When we send a packet for our own local address on a non-loopback interface
>(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c
>("net: original ingress device index in PKTINFO"), the original ingress
>device index would be set as the loopback interface. However, the packet
>should be considered as if it is being arrived via the sending interface
>(eth0), otherwise it would break the expectation of the userspace
>application (e.g. the DHCPRELEASE message from dhcp_release binary would
>be ignored by the dnsmasq daemon)
>
>Signed-off-by: Wei Zhang <asuka@163.com>
>---
> net/ipv4/ip_sockglue.c | 7 ++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
>diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>index b8a2d63..b6a6d35 100644
>--- a/net/ipv4/ip_sockglue.c
>+++ b/net/ipv4/ip_sockglue.c
>@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
>sk_buff *skb)
>* which has interface index (iif) as the first member of the
>* underlying inet{6}_skb_parm struct. This code then overlays
>* PKTINFO_SKB_CB and in_pktinfo also has iif as the first
>-   * element so the iif is picked up from the prior IPCB
>+   * element so the iif is picked up from the prior IPCB except
>+   * iif is loopback interface which the packet should be 
>+   * considered as if it is being arrived via the sending 
>interface
>*/
>+  if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) {
>+  pktinfo->ipi_ifindex = inet_iif(skb);
>+  }
>   pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
>   } else {
>   pktinfo->ipi_ifindex = 0;
>-- 
>1.8.3.1
>

  When I upgrade to the 4.9, the dhcp_release could not release the dhcp
lease, the dnsmasq ignored all the DHCPRELEASE message which it think come
from lo. I think this is due to the commit 0b922b7a829c, which set the
IPCB(skb)->iif = skb->skb_iif in the ip_rcv()!

  And I'm very sorry about forgetting checkpatch, I will resend the patch,
hope I'm not bothering you!

Thanks,
Wei Zhang

Re:[PATCH] net: fix incorrect original ingress device index in PKTINFO

2016-12-27 Thread wei zhang
At 2016-12-27 15:52:18, "Wei Zhang"  wrote:
>When we send a packet for our own local address on a non-loopback interface
>(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c
>("net: original ingress device index in PKTINFO"), the original ingress
>device index would be set as the loopback interface. However, the packet
>should be considered as if it is being arrived via the sending interface
>(eth0), otherwise it would break the expectation of the userspace
>application (e.g. the DHCPRELEASE message from dhcp_release binary would
>be ignored by the dnsmasq daemon)
>
>Signed-off-by: Wei Zhang 
>---
> net/ipv4/ip_sockglue.c | 7 ++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
>diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
>index b8a2d63..b6a6d35 100644
>--- a/net/ipv4/ip_sockglue.c
>+++ b/net/ipv4/ip_sockglue.c
>@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
>sk_buff *skb)
>* which has interface index (iif) as the first member of the
>* underlying inet{6}_skb_parm struct. This code then overlays
>* PKTINFO_SKB_CB and in_pktinfo also has iif as the first
>-   * element so the iif is picked up from the prior IPCB
>+   * element so the iif is picked up from the prior IPCB except
>+   * iif is loopback interface which the packet should be 
>+   * considered as if it is being arrived via the sending 
>interface
>*/
>+  if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) {
>+  pktinfo->ipi_ifindex = inet_iif(skb);
>+  }
>   pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
>   } else {
>   pktinfo->ipi_ifindex = 0;
>-- 
>1.8.3.1
>

  When I upgrade to the 4.9, the dhcp_release could not release the dhcp
lease, the dnsmasq ignored all the DHCPRELEASE message which it think come
from lo. I think this is due to the commit 0b922b7a829c, which set the
IPCB(skb)->iif = skb->skb_iif in the ip_rcv()!

  And I'm very sorry about forgetting checkpatch, I will resend the patch,
hope I'm not bothering you!

Thanks,
Wei Zhang

[PATCH] net: fix incorrect original ingress device index in PKTINFO

2016-12-26 Thread Wei Zhang
When we send a packet for our own local address on a non-loopback interface
(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c
("net: original ingress device index in PKTINFO"), the original ingress
device index would be set as the loopback interface. However, the packet
should be considered as if it is being arrived via the sending interface
(eth0), otherwise it would break the expectation of the userspace
application (e.g. the DHCPRELEASE message from dhcp_release binary would
be ignored by the dnsmasq daemon)

Signed-off-by: Wei Zhang <asuka@163.com>
---
 net/ipv4/ip_sockglue.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b8a2d63..b6a6d35 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-* element so the iif is picked up from the prior IPCB
+* element so the iif is picked up from the prior IPCB except
+* iif is loopback interface which the packet should be 
+* considered as if it is being arrived via the sending 
interface
 */
+   if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) {
+   pktinfo->ipi_ifindex = inet_iif(skb);
+   }
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1




[PATCH] net: fix incorrect original ingress device index in PKTINFO

2016-12-26 Thread Wei Zhang
When we send a packet for our own local address on a non-loopback interface
(e.g. eth0), due to the change had been introduced from commit 0b922b7a829c
("net: original ingress device index in PKTINFO"), the original ingress
device index would be set as the loopback interface. However, the packet
should be considered as if it is being arrived via the sending interface
(eth0), otherwise it would break the expectation of the userspace
application (e.g. the DHCPRELEASE message from dhcp_release binary would
be ignored by the dnsmasq daemon)

Signed-off-by: Wei Zhang 
---
 net/ipv4/ip_sockglue.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index b8a2d63..b6a6d35 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1202,8 +1202,13 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
-* element so the iif is picked up from the prior IPCB
+* element so the iif is picked up from the prior IPCB except
+* iif is loopback interface which the packet should be 
+* considered as if it is being arrived via the sending 
interface
 */
+   if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX) {
+   pktinfo->ipi_ifindex = inet_iif(skb);
+   }
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
1.8.3.1




Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-04 Thread wei zhang
At 2014-04-05 07:05:59,"Jesse Gross"  wrote:
>On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang  wrote:
>>
>> v2 -> v1: use the same logic of the gre_rcv() to distinguish which packet is
>> intended to us!
>
>As a tip on kernel process: if you put the version information after
>three dashes below the signed-off-by line then git will automatically
>remove it when the final patch is applied.

Thanks, should I modify it and send a v3 patch?

>
>> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
>> index a3d6951..f391df1 100644
>> --- a/net/openvswitch/vport-gre.c
>> +++ b/net/openvswitch/vport-gre.c
>> @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
>> return PACKET_RCVD;
>>  }
>>
>> +/* Called with rcu_read_lock and BH disabled. */
>> +static int gre_err(struct sk_buff *skb, u32 info,
>> +  const struct tnl_ptk_info *tpi)
>> +{
>> +   struct ovs_net *ovs_net;
>> +   struct vport *vport;
>> +
>> +   ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
>> +   vport = rcu_dereference(ovs_net->vport_net.gre_vport);
>> +   if (unlikely(!vport))
>> +   return PACKET_REJECT;
>> +   else
>> +   return PACKET_RCVD;
>
>Sorry, I forgot to say this before - if we receive the packet then we
>should also call consume_skb() on it.

Maybe there is no need to call consume_skb()? The icmp_rcv() would
call kfree_skb() for us. I also checked the ipgre_err(), it return 
PACKET_RCVD without call consume_skb() too.

Regards,
Wei Zhang

Re: [PATCH v2] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-04 Thread wei zhang
At 2014-04-05 07:05:59,Jesse Gross je...@nicira.com wrote:
On Tue, Apr 1, 2014 at 5:23 PM, Wei Zhang asuka@163.com wrote:

 v2 - v1: use the same logic of the gre_rcv() to distinguish which packet is
 intended to us!

As a tip on kernel process: if you put the version information after
three dashes below the signed-off-by line then git will automatically
remove it when the final patch is applied.

Thanks, should I modify it and send a v3 patch?


 diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
 index a3d6951..f391df1 100644
 --- a/net/openvswitch/vport-gre.c
 +++ b/net/openvswitch/vport-gre.c
 @@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
 return PACKET_RCVD;
  }

 +/* Called with rcu_read_lock and BH disabled. */
 +static int gre_err(struct sk_buff *skb, u32 info,
 +  const struct tnl_ptk_info *tpi)
 +{
 +   struct ovs_net *ovs_net;
 +   struct vport *vport;
 +
 +   ovs_net = net_generic(dev_net(skb-dev), ovs_net_id);
 +   vport = rcu_dereference(ovs_net-vport_net.gre_vport);
 +   if (unlikely(!vport))
 +   return PACKET_REJECT;
 +   else
 +   return PACKET_RCVD;

Sorry, I forgot to say this before - if we receive the packet then we
should also call consume_skb() on it.

Maybe there is no need to call consume_skb()? The icmp_rcv() would
call kfree_skb() for us. I also checked the ipgre_err(), it return 
PACKET_RCVD without call consume_skb() too.

Regards,
Wei Zhang

Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-01 Thread wei zhang
At 2014-04-02 02:27:56,"Jesse Gross"  wrote:
>On Tue, Apr 1, 2014 at 8:24 AM, wei zhang  wrote:
>> At 2014-04-01 08:49:53,"Jesse Gross"  wrote:
>>>On Sun, Mar 30, 2014 at 5:12 AM, wei zhang  wrote:
>>>> At 2014-03-29 06:02:25,"Jesse Gross"  wrote:
>>
>>>> Maybe I misunderstand something? I think if we discard all packet pass to 
>>>>us
>>>> when we use gre vport, new gre_cisco_protocol which has lower priority 
>>>>could
>>>> not see the packet intended to it.
>>>
>>>That's true but in this case it would also not see any data packets,
>>>so I don't think that situation would work well anyways.
>>>
>>>> I checked the implementation of the ipgre_err(), which has be called before
>>>> the err_handler of gre vport. It use the the (local address, remote 
>>>>address, key)
>>>> to distinguish the packet which is realy intended to it, although it could 
>>>>not
>>>> always get the key from the icmp packet. Should we do as the same as it?
>>>> I'm not sure this is feasible, any advice is appreciate.
>>>
>>>OVS does flow based matching rather than using a static set of
>>>configuration parameters, so everything "matches" in some way
>>>(although the result might be to drop).
>>
>> So the flow based match could dynamically determine by the ovs daemon, we 
>>could
>> not find out the belonging of the packet as far as we call ovs_dp_upcall(), 
>>isn't it?
>
>That's right - and since the OVS flow table always has a default
>behavior (even if it is drop or send to controller) there's never a
>packet that isn't considered to be destined to OVS once it is
>received.
>
>If this makes sense to you, would you mind submitting the patch you
>had earlier formally with a commit message and signed off by line?

Sure, thank you again as it's my first time to send patch to the kernel, your 
patient and kind
help give me confidence to do it :)

Regards,
Wei 
ZhangN�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-01 Thread wei zhang
At 2014-04-01 08:49:53,"Jesse Gross"  wrote:
>On Sun, Mar 30, 2014 at 5:12 AM, wei zhang  wrote:
>> At 2014-03-29 06:02:25,"Jesse Gross"  wrote:

>> Maybe I misunderstand something? I think if we discard all packet pass to us
>> when we use gre vport, new gre_cisco_protocol which has lower priority could
>> not see the packet intended to it.
>
>That's true but in this case it would also not see any data packets,
>so I don't think that situation would work well anyways.
>
>> I checked the implementation of the ipgre_err(), which has be called before
>> the err_handler of gre vport. It use the the (local address, remote address, 
>>key)
>> to distinguish the packet which is realy intended to it, although it could 
>>not
>> always get the key from the icmp packet. Should we do as the same as it?
>> I'm not sure this is feasible, any advice is appreciate.
>
>OVS does flow based matching rather than using a static set of
>configuration parameters, so everything "matches" in some way
>(although the result might be to drop). 

So the flow based match could dynamically determine by the ovs daemon, we could
not find out the belonging of the packet as far as we call ovs_dp_upcall(), 
isn't it?

>This generally means that OVS
>is the receiver of last resort and nothing currently has a lower
>priority. 

Thanks for your kind help, this clarify my misunderstanding!

>That actually means the difference between the patches is
>somewhat academic but it seems more robust for the logic to be
>consistent.

Regards,
Wei Zhang 



Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-01 Thread wei zhang
At 2014-04-01 08:49:53,Jesse Gross je...@nicira.com wrote:
On Sun, Mar 30, 2014 at 5:12 AM, wei zhang asuka@163.com wrote:
 At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote:

 Maybe I misunderstand something? I think if we discard all packet pass to us
 when we use gre vport, new gre_cisco_protocol which has lower priority could
 not see the packet intended to it.

That's true but in this case it would also not see any data packets,
so I don't think that situation would work well anyways.

 I checked the implementation of the ipgre_err(), which has be called before
 the err_handler of gre vport. It use the the (local address, remote address, 
key)
 to distinguish the packet which is realy intended to it, although it could 
not
 always get the key from the icmp packet. Should we do as the same as it?
 I'm not sure this is feasible, any advice is appreciate.

OVS does flow based matching rather than using a static set of
configuration parameters, so everything matches in some way
(although the result might be to drop). 

So the flow based match could dynamically determine by the ovs daemon, we could
not find out the belonging of the packet as far as we call ovs_dp_upcall(), 
isn't it?

This generally means that OVS
is the receiver of last resort and nothing currently has a lower
priority. 

Thanks for your kind help, this clarify my misunderstanding!

That actually means the difference between the patches is
somewhat academic but it seems more robust for the logic to be
consistent.

Regards,
Wei Zhang 



Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-04-01 Thread wei zhang
At 2014-04-02 02:27:56,Jesse Gross je...@nicira.com wrote:
On Tue, Apr 1, 2014 at 8:24 AM, wei zhang asuka@163.com wrote:
 At 2014-04-01 08:49:53,Jesse Gross je...@nicira.com wrote:
On Sun, Mar 30, 2014 at 5:12 AM, wei zhang asuka@163.com wrote:
 At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote:

 Maybe I misunderstand something? I think if we discard all packet pass to 
us
 when we use gre vport, new gre_cisco_protocol which has lower priority 
could
 not see the packet intended to it.

That's true but in this case it would also not see any data packets,
so I don't think that situation would work well anyways.

 I checked the implementation of the ipgre_err(), which has be called before
 the err_handler of gre vport. It use the the (local address, remote 
address, key)
 to distinguish the packet which is realy intended to it, although it could 
not
 always get the key from the icmp packet. Should we do as the same as it?
 I'm not sure this is feasible, any advice is appreciate.

OVS does flow based matching rather than using a static set of
configuration parameters, so everything matches in some way
(although the result might be to drop).

 So the flow based match could dynamically determine by the ovs daemon, we 
could
 not find out the belonging of the packet as far as we call ovs_dp_upcall(), 
isn't it?

That's right - and since the OVS flow table always has a default
behavior (even if it is drop or send to controller) there's never a
packet that isn't considered to be destined to OVS once it is
received.

If this makes sense to you, would you mind submitting the patch you
had earlier formally with a commit message and signed off by line?

Sure, thank you again as it's my first time to send patch to the kernel, your 
patient and kind
help give me confidence to do it :)

Regards,
Wei 
ZhangN�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-03-30 Thread wei zhang
At 2014-03-29 06:02:25,"Jesse Gross"  wrote:

>I'm not sure that rejecting all ICMP packets is the correct thing do
>here since it means that we could pass them onto a later caller even
>though they are intended for us. We should probably use the same logic
>as for receiving packets and just discard them here.

Thank you very much for your advice, did you mean this logic?  

diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index a3d6951..c183a56 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
        return PACKET_RCVD;
 }
 
+/* Called with rcu_read_lock and BH disabled. */
+static int gre_err(struct sk_buff *skb, u32 info,
+                  const struct tnl_ptk_info *tpi)
+{
+       struct ovs_net *ovs_net;
+       struct vport *vport;
+
+       ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
+       vport = rcu_dereference(ovs_net->vport_net.gre_vport);
+       if (unlikely(!vport))
+               return PACKET_REJECT;
+       else
+               return PACKET_RCVD;
+}

Maybe I misunderstand something? I think if we discard all packet pass to us
when we use gre vport, new gre_cisco_protocol which has lower priority could
not see the packet intended to it.

I checked the implementation of the ipgre_err(), which has be called before
the err_handler of gre vport. It use the the (local address, remote address, 
key)
to distinguish the packet which is realy intended to it, although it could not 
always get the key from the icmp packet. Should we do as the same as it?
I'm not sure this is feasible, any advice is appreciate.

Regards,
Wei Zhang
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash

2014-03-30 Thread wei zhang
At 2014-03-29 06:02:25,Jesse Gross je...@nicira.com wrote:

I'm not sure that rejecting all ICMP packets is the correct thing do
here since it means that we could pass them onto a later caller even
though they are intended for us. We should probably use the same logic
as for receiving packets and just discard them here.

Thank you very much for your advice, did you mean this logic?  

diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index a3d6951..c183a56 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -110,6 +110,21 @@ static int gre_rcv(struct sk_buff *skb,
        return PACKET_RCVD;
 }
 
+/* Called with rcu_read_lock and BH disabled. */
+static int gre_err(struct sk_buff *skb, u32 info,
+                  const struct tnl_ptk_info *tpi)
+{
+       struct ovs_net *ovs_net;
+       struct vport *vport;
+
+       ovs_net = net_generic(dev_net(skb-dev), ovs_net_id);
+       vport = rcu_dereference(ovs_net-vport_net.gre_vport);
+       if (unlikely(!vport))
+               return PACKET_REJECT;
+       else
+               return PACKET_RCVD;
+}

Maybe I misunderstand something? I think if we discard all packet pass to us
when we use gre vport, new gre_cisco_protocol which has lower priority could
not see the packet intended to it.

I checked the implementation of the ipgre_err(), which has be called before
the err_handler of gre vport. It use the the (local address, remote address, 
key)
to distinguish the packet which is realy intended to it, although it could not 
always get the key from the icmp packet. Should we do as the same as it?
I'm not sure this is feasible, any advice is appreciate.

Regards,
Wei Zhang
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i

Re: [PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()

2014-03-25 Thread wei zhang
At 2014-03-25 03:14:17,"David Miller"  wrote:
>From: Wei Zhang 
>Date: Mon, 24 Mar 2014 15:34:31 +0800
>
>> When use the gre vport, openvswitch register a gre_cisco_protocol but
>> does not supply a err_handler with it. The gre_cisco_err() call the
>> err_handler without existence check, cause the kernel crash.
>> 
>> This patch base on v3.14-rc7. But the bug affect all kernel newer than
>> 3.11!
>> 
>> Signed-off-by: Wei Zhang 
>
>Rather, openvswitch should provide an appropriate ->err_handler() that
>returns PACKET_RCVD or PACKET_REJECT.

Thank you for your explanation, I misunderstand it .
I'm very sorry about this bothering!

>
>I'm not applying this patch.

Re: [PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()

2014-03-25 Thread wei zhang
At 2014-03-25 03:14:17,David Miller da...@davemloft.net wrote:
From: Wei Zhang asuka@163.com
Date: Mon, 24 Mar 2014 15:34:31 +0800

 When use the gre vport, openvswitch register a gre_cisco_protocol but
 does not supply a err_handler with it. The gre_cisco_err() call the
 err_handler without existence check, cause the kernel crash.
 
 This patch base on v3.14-rc7. But the bug affect all kernel newer than
 3.11!
 
 Signed-off-by: Wei Zhang asuka@163.com

Rather, openvswitch should provide an appropriate -err_handler() that
returns PACKET_RCVD or PACKET_REJECT.

Thank you for your explanation, I misunderstand it .
I'm very sorry about this bothering!


I'm not applying this patch.

Re:[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()

2014-03-24 Thread wei zhang
The crash is occur on Centos 6.4, when we use gre vport of openvswitch!

<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<(null)>] (null)
<4>PGD c2910b067 PUD c2927d067 PMD 0 
<4>Oops: 0010 [#1] SMP 
<4>last sysfs file: /sys/devices/virtual/net/gretap0/flags
<4>CPU 20 
<4>Modules linked in: ip_gre ip_tunnel xt_conntrack act_police cls_basic 
sch_ingress veth ipt_REDIRECT ipmi_devintf ipv6 openvswitch vxlan 
iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 
iptable_mangle iptable_raw iptable_security ip_tables power_meter sg dcdbas 
microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp tg3 ext4 jbd2 
mbcache sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log 
dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 2358, comm: ovs-vswitchd Not tainted 
2.6.32-358.123.2.openstack.el6.x86_64 #1 Dell Inc. PowerEdge R620/0D2D5F
<4>RIP: 0010:[<>]  [<(null)>] (null)
<4>RSP: 0018:880053743c70  EFLAGS: 00010282
<4>RAX: a01cbe20 RBX: 880bda46ecc0 RCX: 
<4>RDX: 880053743c78 RSI:  RDI: 880bda46ecc0
<4>RBP: 880053743ca8 R08: 5617f772 R09: 
<4>R10:  R11:  R12: 880053743c78
<4>R13:  R14: 880be5a5b244 R15: 
<4>FS:  7fe5d5f837c0() GS:88005374() knlGS:
<4>CS:  0010 DS:  ES:  CR0: 80050033
<4>CR2:  CR3: 000c2a804000 CR4: 000407e0
<4>DR0:  DR1:  DR2: 
<4>DR3:  DR6: 0ff0 DR7: 0400
<4>Process ovs-vswitchd (pid: 2358, threadinfo 880c284e, task 
880c28d1f500)
<4>Stack:
<4> 814c6e61 aba73000  880053743cc8
<4> 002f 880bda46ecc0 881800deb200 880053743cb8
<4> 814c676b 880053743cf8 814afa91 880053743ce8
<4>Call Trace:
<4>  
<4> [] ? gre_cisco_err+0x71/0x80
<4> [] gre_err+0x4b/0x50
<4> [] icmp_unreach+0x141/0x2e0
<4> [] icmp_rcv+0x290/0x330
<4> [] ? raw_local_deliver+0x221/0x250
<4> [] ip_local_deliver_finish+0xdd/0x2d0
<4> [] ip_local_deliver+0x98/0xa0
<4> [] ip_rcv_finish+0x12d/0x440
<4> [] ip_rcv+0x275/0x350
<4> [] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
<4> [] __netif_receive_skb+0x4ab/0x750
<4> [] process_backlog+0x9a/0x100
<4> [] net_rx_action+0x103/0x2f0
<4> [] __do_softirq+0xc1/0x1e0
<4> [] ? call_softirq+0x1c/0x30
<4> [] call_softirq+0x1c/0x30
<4>  
<4> [] ? do_softirq+0x65/0xa0
<4> [] local_bh_enable+0x9a/0xb0
<4> [] ovs_packet_cmd_execute+0x20c/0x240 [openvswitch]
<4> [] genl_rcv_msg+0x203/0x250
<4> [] ? genl_rcv_msg+0x0/0x250
<4> [] netlink_rcv_skb+0xa9/0xd0
<4> [] genl_rcv+0x25/0x40
<4> [] netlink_unicast+0x2db/0x320
<4> [] netlink_sendmsg+0x2c0/0x3d0
<4> [] sock_sendmsg+0x123/0x150
<4> [] ? sock_recvmsg+0x133/0x160
<4> [] ? autoremove_wake_function+0x0/0x40
<4> [] ? pollwake+0x0/0x60
<4> [] ? pollwake+0x0/0x60
<4> [] ? pipe_read+0x2a7/0x4e0
<4> [] ? pollwake+0x0/0x60
<4> [] __sys_sendmsg+0x406/0x420
<4> [] ? ep_scan_ready_list+0x194/0x1a0
<4> [] ? ep_poll+0x12e/0x330
<4> [] ? security_file_permission+0x16/0x20
<4> [] sys_sendmsg+0x49/0x90
<4> [] system_call_fastpath+0x16/0x1b
<4>Code:  Bad RIP value.
<1>RIP  [<(null)>] (null)
<4> RSP 
<4>CR2: 
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a���
0��h���i

[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()

2014-03-24 Thread Wei Zhang
When use the gre vport, openvswitch register a gre_cisco_protocol but
does not supply a err_handler with it. The gre_cisco_err() call the
err_handler without existence check, cause the kernel crash.

This patch base on v3.14-rc7. But the bug affect all kernel newer than
3.11!

Signed-off-by: Wei Zhang 
---
 net/ipv4/gre_demux.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 1863422..56b0d67 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -250,7 +250,7 @@ static void gre_cisco_err(struct sk_buff *skb, u32 info)
struct gre_cisco_protocol *proto;
 
proto = rcu_dereference(gre_cisco_proto_list[i]);
-   if (!proto)
+   if (!proto || !proto->err_handler)
continue;
 
if (proto->err_handler(skb, info, ) == PACKET_RCVD)
-- 
1.7.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()

2014-03-24 Thread Wei Zhang
When use the gre vport, openvswitch register a gre_cisco_protocol but
does not supply a err_handler with it. The gre_cisco_err() call the
err_handler without existence check, cause the kernel crash.

This patch base on v3.14-rc7. But the bug affect all kernel newer than
3.11!

Signed-off-by: Wei Zhang asuka@163.com
---
 net/ipv4/gre_demux.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 1863422..56b0d67 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -250,7 +250,7 @@ static void gre_cisco_err(struct sk_buff *skb, u32 info)
struct gre_cisco_protocol *proto;
 
proto = rcu_dereference(gre_cisco_proto_list[i]);
-   if (!proto)
+   if (!proto || !proto-err_handler)
continue;
 
if (proto-err_handler(skb, info, tpi) == PACKET_RCVD)
-- 
1.7.1


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re:[PATCH] ipv4: gre: Fix null pointer dereference in gre_cisco_err()

2014-03-24 Thread wei zhang
The crash is occur on Centos 6.4, when we use gre vport of openvswitch!

1BUG: unable to handle kernel NULL pointer dereference at (null)
1IP: [(null)] (null)
4PGD c2910b067 PUD c2927d067 PMD 0 
4Oops: 0010 [#1] SMP 
4last sysfs file: /sys/devices/virtual/net/gretap0/flags
4CPU 20 
4Modules linked in: ip_gre ip_tunnel xt_conntrack act_police cls_basic 
sch_ingress veth ipt_REDIRECT ipmi_devintf ipv6 openvswitch vxlan 
iptable_filter iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 
iptable_mangle iptable_raw iptable_security ip_tables power_meter sg dcdbas 
microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support shpchp tg3 ext4 jbd2 
mbcache sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log 
dm_mod [last unloaded: speedstep_lib]
4
4Pid: 2358, comm: ovs-vswitchd Not tainted 
2.6.32-358.123.2.openstack.el6.x86_64 #1 Dell Inc. PowerEdge R620/0D2D5F
4RIP: 0010:[]  [(null)] (null)
4RSP: 0018:880053743c70  EFLAGS: 00010282
4RAX: a01cbe20 RBX: 880bda46ecc0 RCX: 
4RDX: 880053743c78 RSI:  RDI: 880bda46ecc0
4RBP: 880053743ca8 R08: 5617f772 R09: 
4R10:  R11:  R12: 880053743c78
4R13:  R14: 880be5a5b244 R15: 
4FS:  7fe5d5f837c0() GS:88005374() knlGS:
4CS:  0010 DS:  ES:  CR0: 80050033
4CR2:  CR3: 000c2a804000 CR4: 000407e0
4DR0:  DR1:  DR2: 
4DR3:  DR6: 0ff0 DR7: 0400
4Process ovs-vswitchd (pid: 2358, threadinfo 880c284e, task 
880c28d1f500)
4Stack:
4 814c6e61 aba73000  880053743cc8
4d 002f 880bda46ecc0 881800deb200 880053743cb8
4d 814c676b 880053743cf8 814afa91 880053743ce8
4Call Trace:
4 IRQ 
4 [814c6e61] ? gre_cisco_err+0x71/0x80
4 [814c676b] gre_err+0x4b/0x50
4 [814afa91] icmp_unreach+0x141/0x2e0
4 [814b02e0] icmp_rcv+0x290/0x330
4 [814a8c71] ? raw_local_deliver+0x221/0x250
4 [8148222d] ip_local_deliver_finish+0xdd/0x2d0
4 [814824b8] ip_local_deliver+0x98/0xa0
4 [8148197d] ip_rcv_finish+0x12d/0x440
4 [81481f05] ip_rcv+0x275/0x350
4 [a01ca503] ? ovs_netdev_frame_hook+0xb3/0x110 [openvswitch]
4 [81449e6b] __netif_receive_skb+0x4ab/0x750
4 [8144a1aa] process_backlog+0x9a/0x100
4 [8144f483] net_rx_action+0x103/0x2f0
4 [810770b1] __do_softirq+0xc1/0x1e0
4 [8100c1cc] ? call_softirq+0x1c/0x30
4 [8100c1cc] call_softirq+0x1c/0x30
4 EOI 
4 [8100de05] ? do_softirq+0x65/0xa0
4 [81076f3a] local_bh_enable+0x9a/0xb0
4 [a01c3b9c] ovs_packet_cmd_execute+0x20c/0x240 [openvswitch]
4 [81476013] genl_rcv_msg+0x203/0x250
4 [81475e10] ? genl_rcv_msg+0x0/0x250
4 [81474ca9] netlink_rcv_skb+0xa9/0xd0
4 [81475df5] genl_rcv+0x25/0x40
4 [814748db] netlink_unicast+0x2db/0x320
4 [81475350] netlink_sendmsg+0x2c0/0x3d0
4 [81436b33] sock_sendmsg+0x123/0x150
4 [814387e3] ? sock_recvmsg+0x133/0x160
4 [81096da0] ? autoremove_wake_function+0x0/0x40
4 [81197af0] ? pollwake+0x0/0x60
4 [81197af0] ? pollwake+0x0/0x60
4 [8118c687] ? pipe_read+0x2a7/0x4e0
4 [81197af0] ? pollwake+0x0/0x60
4 [81438326] __sys_sendmsg+0x406/0x420
4 [811c7c54] ? ep_scan_ready_list+0x194/0x1a0
4 [811c7dae] ? ep_poll+0x12e/0x330
4 [8121cb26] ? security_file_permission+0x16/0x20
4 [81438549] sys_sendmsg+0x49/0x90
4 [8100b072] system_call_fastpath+0x16/0x1b
4Code:  Bad RIP value.
1RIP  [(null)] (null)
4 RSP 880053743c70
4CR2: 
N�r��yb�X��ǧv�^�)޺{.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a���
0��h���i