Re: [PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-28 Thread James Cameron
On Tue, Mar 29, 2016 at 12:47:20PM +0800, Wei-Ning Huang wrote:
> "single skb allocation failure" happens when system is under heavy
> memory pressure.  Add __GFP_REPEAT to skb allocation call so kernel
> attempts to reclaim pages and retry the allocation.

Oh, that's interesting, we're back to this symptom again.

Nice to see this fix.

Heavy memory pressure on 3.5 caused dev_alloc_skb failure in this
driver.  Tracked at OLPC as #12694.

-- 
James Cameron
http://quozl.netrek.org/


Re: am335x: no multicast reception over VLAN

2016-03-28 Thread Yegor Yefremov
Hi Mugunthan,

On Tue, Mar 29, 2016 at 6:00 AM, Mugunthan V N  wrote:
> Hi Yegor
>
> On Wednesday 16 March 2016 08:35 PM, Yegor Yefremov wrote:
>> I have an am335x based board using CPSW in Dual EMAC mode. Without
>> VLAN IDs I can receive and send multicast packets [1]. When I create
>> VLAN ID:
>>
>> ip link add link eth1 name eth1.100 type vlan id 100
>> ip addr add 192.168.100.2/24 brd 192.168.100.255 dev eth1.100
>> route add -net 224.0.0.0 netmask 224.0.0.0 eth1.100
>>
>> I can successfully send multicast packets, but not receive them. On
>> the other side of the Ethernet cable I've used Pandaboard. Pandaboard
>> could both receive and send multicast packets via VLAN.
>
> Are you trying multicast tx/rx on eth1 or eth1.100?

I'm trying multicast tx/rx on eth1.100.

eth1 has no problems.

Yegor

>> This setup was tested with both 3.18.21 and 4.5 kernels.
>>
>> Any idea?
>>
>> [1] https://pymotw.com/2/socket/multicast.html
>>
>
> --
> Regards
> Mugunthan V N
>


Re: [PATCH net] team: team should sync the port's uc/mc addrs when add a port

2016-03-28 Thread Cong Wang
On Mon, Mar 28, 2016 at 9:42 AM, Xin Long  wrote:
> There is an issue when we use macvtap over team:
> When we replug nic links from team0, the real nic's mc list will not
> include the maddr for macvtap any more. Then we can't receive pkts on the
> macvtap device, as they are filtered by the nic's mc list.
>
> In Bonding Driver, it syncs the uc/mc addrs in bond_enslave().
>
> We will fix this issue on team by adding the port's uc/mc addrs sync in
> team_port_add.
>
> Signed-off-by: Xin Long 
> ---
>  drivers/net/team/team.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index 26c64d2..17ff367 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -1198,6 +1198,9 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
> goto err_dev_open;
> }
>
> +   dev_uc_sync_multiple(port_dev, dev);
> +   dev_mc_sync_multiple(port_dev, dev);
> +
> err = vlan_vids_add_by_dev(port_dev, dev);

You need to call dev_{uc,mc}_unsync() on the error path, don't you?
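For illustration, the cleanup Cong is asking about might look roughly like the
sketch below. The label placement and the _sketch function name are made up
here; this is not an actual patch, just the shape of the unwind being suggested.

#include <linux/netdevice.h>
#include <linux/if_team.h>

static int team_port_add_sketch(struct team *team, struct net_device *port_dev)
{
        struct net_device *dev = team->dev;
        int err;

        dev_uc_sync_multiple(port_dev, dev);
        dev_mc_sync_multiple(port_dev, dev);

        err = vlan_vids_add_by_dev(port_dev, dev);
        if (err)
                goto err_addr_unsync;

        /* ... rest of the existing port setup ... */

        return 0;

err_addr_unsync:
        /* undo the sync so the port is not left with stale filters */
        dev_uc_unsync(port_dev, dev);
        dev_mc_unsync(port_dev, dev);
        return err;
}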


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 9:15 PM, Alex Duyck  wrote:
> On Mon, Mar 28, 2016 at 9:01 PM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 8:27 PM, Alexander Duyck
>>  wrote:
>>> On Mon, Mar 28, 2016 at 8:17 PM, Tom Herbert  wrote:
 On Mon, Mar 28, 2016 at 6:54 PM, Jesse Gross  wrote:
> On Mon, Mar 28, 2016 at 6:24 PM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  
>> wrote:
>>> This patch should fix the issues seen with a recent fix to prevent
>>> tunnel-in-tunnel frames from being generated with GRO.  The fix itself 
>>> is
>>> correct for now as long as we do not add any devices that support
>>> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
>>> potential to mess things up due to the fact that the outer transport 
>>> header
>>> points to the outer UDP header and not the GRE header as would be 
>>> expected.
>>>
>>> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
>>> encapsulation.")
>>
>> This could only fix FOU/GUE. It is very possible someone else could
>> happily be doing some other layered encapsulation and never had a
>> problem before, so the decision to start enforcing only a single layer
>> of encapsulation for GRO would still break them. I still think we
>> should revert the patch, and for next version fixes things to that any
>> combination/nesting of encapsulation is supported, and if there are
>> exceptions to that support they need be clearly documented.
>
> It was pointed out to me that prior to my patch, it was also possible
> to remotely cause a stack overflow by filling up a packet with tunnel
> headers and letting GRO descend through them over and over again.
>
 Then the fix would be to set a reasonable limit on the number of
 encapsulation levels.

> Tom, I'm sorry that you don't like how I fixed this issue but there
> really, truly is a bug here. I gave you a specific example to be clear
> but that doesn't mean that is the only case. I am aware that the bug
> is not encountered in all situations and that the fix removes an
> optimization in some of those but I think that ensuring correct
> behavior must come first.

 The example you gave results in packet loss, this is not
 incorrectness. Actually reproduce a real issue that leads to
 incorrectness and then we can talk about a solution.
>>>
>>> Tom,
>>>
>>> Just take a look in the __skb_udp_tunnel_segment or gre_gso_segment
>>> code.  Then tell me how we are supposed to deal with the fact that the
>>> GSO code expects skb_inner_network_offset() to be valid.  If you have
>>> more than an inner and an outer network header we cannot.  So we
>>> cannot put GRE in UDP, or UDP in GRE if there is a network header
>>> between them.  The FOU/GUE code gets around this because in the IPIP
>>> and SIT cases you are adding an L4 header between two L3 headers.  The
>>> GRE case works because you essentially convert the GRE header into a
>>> tunnel header like VXLAN or GENEVE and we just overwrite the outer
>>> transport header offset.
>>>
>>> What it comes down to is that we can only support 2 network headers
>>> per frame.  One for the inner and one for the outer.  That is why we
>>> can have an exception for GUE as it only has 2 network headers.  If we
>>> had multiple levels of UDP, or GRE, or 2 levels of network headers
>>> either before or after either UDP or GRE we cannot support
>>> segmentation because the code will blow up and generate a malformed
>>> frame.
>>>
>> If you apply Edward's jumbo L2 header concept then
>> Eth|IP|UDP|VXLAN|Eth|IP|UDP|GUE|GRE|IPIP|IPv6|TCP|payload becomes
>> Eth|IP|UDP|encapsulation-hdrs|IPv6|TCP|Payload. One set of outer
>> headers, one set of inner headers. The rules that encapsulation_hdrs
>> don't contain fields that need to be modified for every segment need
>> to be supported in GRO and the stack when it generates such a
>> configuration.
>
> That's all well and good but nothing like that exists now.  So you
> cannot expect us to fix the kernel to support code that isn't there.

No, but I do expect that you support code that is already there. There
was apparently zero testing done on the original patch and it caused
one very obvious regression. So how can we have any confidence
whatsoever that this patch doesn't break other things? Furthermore,
with all these claims of bugs I still don't see that _anyone_ has
taken the time to reproduce any issue and show that this patch
materially fixes anything. I seriously don't understand how basic
testing could be such a challenge.

Anyway, what I expect is moot. It's up to davem to decide what to do
with this...

Tom

> In addition there were a number of issues with the jumbo L2 

[PATCH] mwifiex: add __GFP_REPEAT to skb allocation call

2016-03-28 Thread Wei-Ning Huang
"single skb allocation failure" happens when system is under heavy
memory pressure.  Add __GFP_REPEAT to skb allocation call so kernel
attempts to reclaim pages and retry the allocation.

Signed-off-by: Wei-Ning Huang 
---
 drivers/net/wireless/marvell/mwifiex/sdio.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c
index b2c839a..c64989c 100644
--- a/drivers/net/wireless/marvell/mwifiex/sdio.c
+++ b/drivers/net/wireless/marvell/mwifiex/sdio.c
@@ -1124,7 +1124,8 @@ static void mwifiex_deaggr_sdio_pkt(struct mwifiex_adapter *adapter,
break;
}
skb_deaggr = mwifiex_alloc_dma_align_buf(pkt_len,
-GFP_KERNEL | GFP_DMA);
+GFP_KERNEL | GFP_DMA |
+__GFP_REPEAT);
if (!skb_deaggr)
break;
skb_put(skb_deaggr, pkt_len);
@@ -1374,7 +1375,8 @@ static int mwifiex_sdio_card_to_host_mp_aggr(struct mwifiex_adapter *adapter,
/* copy pkt to deaggr buf */
skb_deaggr = mwifiex_alloc_dma_align_buf(len_arr[pind],
 GFP_KERNEL |
-GFP_DMA);
+GFP_DMA |
+__GFP_REPEAT);
if (!skb_deaggr) {
mwifiex_dbg(adapter, ERROR, "skb allocation 
failure\t"
"drop pkt len=%d type=%d\n",
@@ -1416,7 +1418,8 @@ rx_curr_single:
mwifiex_dbg(adapter, INFO, "info: RX: port: %d, rx_len: %d\n",
port, rx_len);
 
-   skb = mwifiex_alloc_dma_align_buf(rx_len, GFP_KERNEL | GFP_DMA);
+   skb = mwifiex_alloc_dma_align_buf(rx_len, GFP_KERNEL | GFP_DMA |
+ __GFP_REPEAT);
if (!skb) {
mwifiex_dbg(adapter, ERROR,
"single skb allocated fail,\t"
@@ -1521,7 +1524,8 @@ static int mwifiex_process_int_status(struct mwifiex_adapter *adapter)
rx_len = (u16) (rx_blocks * MWIFIEX_SDIO_BLOCK_SIZE);
mwifiex_dbg(adapter, INFO, "info: rx_len = %d\n", rx_len);
 
-   skb = mwifiex_alloc_dma_align_buf(rx_len, GFP_KERNEL | GFP_DMA);
+   skb = mwifiex_alloc_dma_align_buf(rx_len, GFP_KERNEL | GFP_DMA |
+ __GFP_REPEAT);
if (!skb)
return -1;
 
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH 3/4] wcn36xx: Transition driver to SMD client

2016-03-28 Thread Bjorn Andersson
On Mon, Jan 11, 2016 at 1:02 AM, Eugene Krasnikov  wrote:
> Better late than never! Looks good to me.
>

Unfortunately I ran into an issue with ordering of operations between
the WiFi driver and the wcnss_ctrl driver. So an updated series is on
the way, but this depends on changes to the wcnss_ctrl driver, which
are being reviewed right now.

Regards,
Bjorn

> 2015-12-28 1:34 GMT+00:00 Bjorn Andersson :
>> The wcn36xx wifi driver follows the life cycle of the WLAN_CTRL SMD
>> channel, as such it should be a SMD client. This patch makes this
>> transition, now that we have the necessary frameworks available.
>>
>> Signed-off-by: Bjorn Andersson 
>> ---
>>  drivers/net/wireless/ath/wcn36xx/Kconfig   |   2 +-
>>  drivers/net/wireless/ath/wcn36xx/dxe.c |  16 +++--
>>  drivers/net/wireless/ath/wcn36xx/main.c| 111 
>> +
>>  drivers/net/wireless/ath/wcn36xx/smd.c |  26 ---
>>  drivers/net/wireless/ath/wcn36xx/smd.h |   4 ++
>>  drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  21 ++
>>  6 files changed, 99 insertions(+), 81 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/wcn36xx/Kconfig b/drivers/net/wireless/ath/wcn36xx/Kconfig
>> index 591ebaea8265..394fe5b77c90 100644
>> --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
>> +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
>> @@ -1,6 +1,6 @@
>>  config WCN36XX
>> tristate "Qualcomm Atheros WCN3660/3680 support"
>> -   depends on MAC80211 && HAS_DMA
>> +   depends on MAC80211 && HAS_DMA && QCOM_SMD
>> ---help---
>>   This module adds support for wireless adapters based on
>>   Qualcomm Atheros WCN3660 and WCN3680 mobile chipsets.
>> diff --git a/drivers/net/wireless/ath/wcn36xx/dxe.c b/drivers/net/wireless/ath/wcn36xx/dxe.c
>> index f8dfa05b290a..47f3937a7ab9 100644
>> --- a/drivers/net/wireless/ath/wcn36xx/dxe.c
>> +++ b/drivers/net/wireless/ath/wcn36xx/dxe.c
>> @@ -23,6 +23,7 @@
>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>
>>  #include 
>> +#include 
>>  #include "wcn36xx.h"
>>  #include "txrx.h"
>>
>> @@ -150,9 +151,12 @@ int wcn36xx_dxe_alloc_ctl_blks(struct wcn36xx *wcn)
>> goto out_err;
>>
>> /* Initialize SMSM state  Clear TX Enable RING EMPTY STATE */
>> -   ret = wcn->ctrl_ops->smsm_change_state(
>> -   WCN36XX_SMSM_WLAN_TX_ENABLE,
>> -   WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY);
>> +   ret = qcom_smem_state_update_bits(wcn->tx_enable_state,
>> + WCN36XX_SMSM_WLAN_TX_ENABLE |
>> + WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY,
>> + WCN36XX_SMSM_WLAN_TX_RINGS_EMPTY);
>> +   if (ret)
>> +   goto out_err;
>>
>> return 0;
>>
>> @@ -676,9 +680,9 @@ int wcn36xx_dxe_tx_frame(struct wcn36xx *wcn,
>>  * notify chip about new frame through SMSM bus.
>>  */
>> if (is_low &&  vif_priv->pw_state == WCN36XX_BMPS) {
>> -   wcn->ctrl_ops->smsm_change_state(
>> - 0,
>> - WCN36XX_SMSM_WLAN_TX_ENABLE);
>> +   qcom_smem_state_update_bits(wcn->tx_rings_empty_state,
>> +   WCN36XX_SMSM_WLAN_TX_ENABLE,
>> +   WCN36XX_SMSM_WLAN_TX_ENABLE);
>> } else {
>> /* indicate End Of Packet and generate interrupt on 
>> descriptor
>>  * done.
>> diff --git a/drivers/net/wireless/ath/wcn36xx/main.c b/drivers/net/wireless/ath/wcn36xx/main.c
>> index 7c169abdbafe..8659e3f997d2 100644
>> --- a/drivers/net/wireless/ath/wcn36xx/main.c
>> +++ b/drivers/net/wireless/ath/wcn36xx/main.c
>> @@ -19,6 +19,9 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>> +#include 
>> +#include 
>>  #include "wcn36xx.h"
>>
>>  unsigned int wcn36xx_dbg_mask;
>> @@ -981,48 +984,63 @@ static int wcn36xx_init_ieee80211(struct wcn36xx *wcn)
>>  }
>>
>>  static int wcn36xx_platform_get_resources(struct wcn36xx *wcn,
>> - struct platform_device *pdev)
>> + struct device *dev)
>>  {
>> -   struct resource *res;
>> +   u32 mmio[2];
>> +   int ret;
>> +
>> /* Set TX IRQ */
>> -   res = platform_get_resource_byname(pdev, IORESOURCE_IRQ,
>> -  "wcnss_wlantx_irq");
>> -   if (!res) {
>> +   wcn->tx_irq = irq_of_parse_and_map(dev->of_node, 0);
>> +   if (!wcn->tx_irq) {
>> wcn36xx_err("failed to get tx_irq\n");
>> return -ENOENT;
>> }
>> -   wcn->tx_irq = res->start;
>>
>> /* Set RX IRQ */
>> -   res = platform_get_resource_byname(pdev, IORESOURCE_IRQ,
>> -  "wcnss_wlanrx_irq");
>> -   

Re: [RFC PATCH] tcp: Add SOF_TIMESTAMPING_TX_EOR and allow MSG_EOR in tcp_sendmsg

2016-03-28 Thread Yuchung Cheng
On Sun, Mar 27, 2016 at 10:42 PM, Martin KaFai Lau  wrote:
>
> On Fri, Mar 25, 2016 at 04:05:51PM -0700, Yuchung Cheng wrote:
> > Looks like an interesting and useful patch. Since HTTP2 allows
> > multiplexing data stream frames from multiple logical streams on a
> > single socket,
> > how would you instrument to measure the latency of each stream? e.g.,
> >
> > sendmsg of data_frame_1_of_stream_a
> > sendmsg of data_frame_1_of_stream_b
> > sendmsg of data_frame_2_of_stream_a
> > sendmsg of data_frame_1_of_stream_c
> > sendmsg of data_frame_2_of_stream_b
>
> A quick recall from the end of the commit message:
> "One of our use case is at the webserver.  The webserver tracks
> the HTTP2 response latency by measuring when the webserver
> sends the first byte to the socket till the TCP ACK of the last byte
> is received."
>
> It is the server-side perception of how long it takes to deliver
> the whole response/stream to the client.  Hence, the number of
> interleaved streams does not matter.
>
> Some sample use cases are,
> comparing TCP sysctl/code changes,
> observing encoding/compression impact (e.g. HPACK in HTTP2).
>
> Assuming frame_2 is the end stream for 'a' and 'b':
> sendmsg of data_frame_1_of_stream_a
> sendmsg of data_frame_1_of_stream_b
> sendmsg of data_frame_2_of_stream_a MSG_EOR
> sendmsg of data_frame_1_of_stream_c
> sendmsg of data_frame_2_of_stream_b MSG_EOR
>
> Are you suggesting other useful ways/metrics should be measured in
> this case?
No this is what I have in mind but wanted to confirm. Thanks.
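For concreteness, the instrumentation Martin describes might look roughly like
the sketch below. SOF_TIMESTAMPING_TX_EOR is the flag proposed by this RFC
patch and does not exist in mainline headers; SO_TIMESTAMPING,
SOF_TIMESTAMPING_OPT_ID, MSG_EOR and MSG_ERRQUEUE are existing interfaces.

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

static int enable_eor_tx_timestamps(int fd)
{
        /* SOF_TIMESTAMPING_TX_EOR is the flag this RFC adds (hypothetical
         * here): request a timestamp when the MSG_EOR-marked data is ACKed.
         */
        unsigned int val = SOF_TIMESTAMPING_SOFTWARE |
                           SOF_TIMESTAMPING_OPT_ID |
                           SOF_TIMESTAMPING_TX_EOR;

        return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

static ssize_t send_stream_end(int fd, const void *buf, size_t len)
{
        /* Mark the final data frame of a logical stream; the webserver
         * measures from the first sendmsg of the response to the ACK of
         * this last byte.
         */
        return send(fd, buf, len, MSG_EOR);
}

/* The ACK timestamp is later collected with recvmsg() on MSG_ERRQUEUE and a
 * SCM_TIMESTAMPING control message, as with other TX timestamps.
 */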


Re: [PATCH net] team: team should sync the port's uc/mc addrs when add a port

2016-03-28 Thread David Miller
From: Xin Long 
Date: Tue, 29 Mar 2016 00:42:31 +0800

> There is an issue when we use macvtap over team:
> When we replug nic links from team0, the real nic's mc list will not
> include the maddr for macvtap any more. Then we can't receive pkts on the
> macvtap device, as they are filtered by the nic's mc list.
> 
> In Bonding Driver, it syncs the uc/mc addrs in bond_enslave().
> 
> We will fix this issue on team by adding the port's uc/mc addrs sync in
> team_port_add.
> 
> Signed-off-by: Xin Long 

Applied, thank you.


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Alex Duyck
On Mon, Mar 28, 2016 at 9:01 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 8:27 PM, Alexander Duyck
>  wrote:
>> On Mon, Mar 28, 2016 at 8:17 PM, Tom Herbert  wrote:
>>> On Mon, Mar 28, 2016 at 6:54 PM, Jesse Gross  wrote:
 On Mon, Mar 28, 2016 at 6:24 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  
> wrote:
>> This patch should fix the issues seen with a recent fix to prevent
>> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
>> correct for now as long as we do not add any devices that support
>> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
>> potential to mess things up due to the fact that the outer transport 
>> header
>> points to the outer UDP header and not the GRE header as would be 
>> expected.
>>
>> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
>> encapsulation.")
>
> This could only fix FOU/GUE. It is very possible someone else could
> happily be doing some other layered encapsulation and never had a
> problem before, so the decision to start enforcing only a single layer
> of encapsulation for GRO would still break them. I still think we
> should revert the patch, and for next version fixes things to that any
> combination/nesting of encapsulation is supported, and if there are
> exceptions to that support they need be clearly documented.

 It was pointed out to me that prior to my patch, it was also possible
 to remotely cause a stack overflow by filling up a packet with tunnel
 headers and letting GRO descend through them over and over again.

>>> Then the fix would be to set a reasonable limit on the number of
>>> encapsulation levels.
>>>
 Tom, I'm sorry that you don't like how I fixed this issue but there
 really, truly is a bug here. I gave you a specific example to be clear
 but that doesn't mean that is the only case. I am aware that the bug
 is not encountered in all situations and that the fix removes an
 optimization in some of those but I think that ensuring correct
 behavior must come first.
>>>
>>> The example you gave results in packet loss, this is not
>>> incorrectness. Actually reproduce a real issue that leads to
>>> incorrectness and then we can talk about a solution.
>>
>> Tom,
>>
>> Just take a look in the __skb_udp_tunnel_segment or gre_gso_segment
>> code.  Then tell me how we are supposed to deal with the fact that the
>> GSO code expects skb_inner_network_offset() to be valid.  If you have
>> more than an inner and an outer network header we cannot.  So we
>> cannot put GRE in UDP, or UDP in GRE if there is a network header
>> between them.  The FOU/GUE code gets around this because in the IPIP
>> and SIT cases you are adding an L4 header between two L3 headers.  The
>> GRE case works because you essentially convert the GRE header into a
>> tunnel header like VXLAN or GENEVE and we just overwrite the outer
>> transport header offset.
>>
>> What it comes down to is that we can only support 2 network headers
>> per frame.  One for the inner and one for the outer.  That is why we
>> can have an exception for GUE as it only has 2 network headers.  If we
>> had multiple levels of UDP, or GRE, or 2 levels of network headers
>> either before or after either UDP or GRE we cannot support
>> segmentation because the code will blow up and generate a malformed
>> frame.
>>
> If you apply Edward's jumbo L2 header concept then
> Eth|IP|UDP|VXLAN|Eth|IP|UDP|GUE|GRE|IPIP|IPv6|TCP|payload becomes
> Eth|IP|UDP|encapsulation-hdrs|IPv6|TCP|Payload. One set of outer
> headers, one set of inner headers. The rules that encapsulation_hdrs
> don't contain fields that need to be modified for every segment need
> to be supported in GRO and the stack when it generates such a
> configuration.

That's all well and good but nothing like that exists now.  So you
cannot expect us to fix the kernel to support code that isn't there.
In addition there were a number of issues with the jumbo L2 header
approach.  That was one of the reasons why I went with the jumbo L3
header approach as it is much easier to be compliant with all the
RFCs.

We might be able to get some of that supported for net-next but things
are going to be limited.  We need to have the UDP tunnels actually
setting the DF bit which as far as I know none of them do now.  In
addition we will have to add rules for all the encapsulated types so
that we can enforce the outer IP header incrementing in the event that
DF is not set.  Then we will also have to go through and make certain
that we have the DF bit set in all headers between the transport and
the outer network header in order to allow support for GSO partial.

What you are describing is no small task.  There are bugs 

Re: [PATCH net 0/4] misc. small fixes.

2016-03-28 Thread David Miller
From: Michael Chan 
Date: Mon, 28 Mar 2016 19:46:03 -0400

> Misc. small fixes for net.

Series applied, thanks Michael.


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 8:27 PM, Alexander Duyck
 wrote:
> On Mon, Mar 28, 2016 at 8:17 PM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 6:54 PM, Jesse Gross  wrote:
>>> On Mon, Mar 28, 2016 at 6:24 PM, Tom Herbert  wrote:
 On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  
 wrote:
> This patch should fix the issues seen with a recent fix to prevent
> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
> correct for now as long as we do not add any devices that support
> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
> potential to mess things up due to the fact that the outer transport 
> header
> points to the outer UDP header and not the GRE header as would be 
> expected.
>
> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
> encapsulation.")

 This could only fix FOU/GUE. It is very possible someone else could
 happily be doing some other layered encapsulation and never had a
 problem before, so the decision to start enforcing only a single layer
 of encapsulation for GRO would still break them. I still think we
 should revert the patch, and for next version fixes things to that any
 combination/nesting of encapsulation is supported, and if there are
 exceptions to that support they need be clearly documented.
>>>
>>> It was pointed out to me that prior to my patch, it was also possible
>>> to remotely cause a stack overflow by filling up a packet with tunnel
>>> headers and letting GRO descend through them over and over again.
>>>
>> Then the fix would be to set a reasonable limit on the number of
>> encapsulation levels.
>>
>>> Tom, I'm sorry that you don't like how I fixed this issue but there
>>> really, truly is a bug here. I gave you a specific example to be clear
>>> but that doesn't mean that is the only case. I am aware that the bug
>>> is not encountered in all situations and that the fix removes an
>>> optimization in some of those but I think that ensuring correct
>>> behavior must come first.
>>
>> The example you gave results in packet loss, this is not
>> incorrectness. Actually reproduce a real issue that leads to
>> incorrectness and then we can talk about a solution.
>
> Tom,
>
> Just take a look in the __skb_udp_tunnel_segment or gre_gso_segment
> code.  Then tell me how we are supposed to deal with the fact that the
> GSO code expects skb_inner_network_offset() to be valid.  If you have
> more than an inner and an outer network header we cannot.  So we
> cannot put GRE in UDP, or UDP in GRE if there is a network header
> between them.  The FOU/GUE code gets around this because in the IPIP
> and SIT cases you are adding an L4 header between two L3 headers.  The
> GRE case works because you essentially convert the GRE header into a
> tunnel header like VXLAN or GENEVE and we just overwrite the outer
> transport header offset.
>
> What it comes down to is that we can only support 2 network headers
> per frame.  One for the inner and one for the outer.  That is why we
> can have an exception for GUE as it only has 2 network headers.  If we
> had multiple levels of UDP, or GRE, or 2 levels of network headers
> either before or after either UDP or GRE we cannot support
> segmentation because the code will blow up and generate a malformed
> frame.
>
If you apply Edward's jumbo L2 header concept then
Eth|IP|UDP|VXLAN|Eth|IP|UDP|GUE|GRE|IPIP|IPv6|TCP|payload becomes
Eth|IP|UDP|encapsulation-hdrs|IPv6|TCP|Payload. One set of outer
headers, one set of inner headers. The rules that encapsulation_hdrs
don't contain fields that need to be modified for every segment need
to be supported in GRO and the stack when it generates such a
configuration.

> - Alex


Re: am335x: no multicast reception over VLAN

2016-03-28 Thread Mugunthan V N
Hi Yegor

On Wednesday 16 March 2016 08:35 PM, Yegor Yefremov wrote:
> I have an am335x based board using CPSW in Dual EMAC mode. Without
> VLAN IDs I can receive and send multicast packets [1]. When I create
> VLAN ID:
> 
> ip link add link eth1 name eth1.100 type vlan id 100
> ip addr add 192.168.100.2/24 brd 192.168.100.255 dev eth1.100
> route add -net 224.0.0.0 netmask 224.0.0.0 eth1.100
> 
> I can successfully send multicast packets, but not receive them. On
> the other side of the Ethernet cable I've used Pandaboard. Pandaboard
> could both receive and send multicast packets via VLAN.

Are you trying multicast tx/rx on eth1 or eth1.100?

> 
> This setup was tested with both 3.18.21 and 4.5 kernels.
> 
> Any idea?
> 
> [1] https://pymotw.com/2/socket/multicast.html
> 

--
Regards
Mugunthan V N



Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Alexander Duyck
On Mon, Mar 28, 2016 at 8:17 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 6:54 PM, Jesse Gross  wrote:
>> On Mon, Mar 28, 2016 at 6:24 PM, Tom Herbert  wrote:
>>> On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  
>>> wrote:
 This patch should fix the issues seen with a recent fix to prevent
 tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
 correct for now as long as we do not add any devices that support
 NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
 potential to mess things up due to the fact that the outer transport header
 points to the outer UDP header and not the GRE header as would be expected.

 Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
 encapsulation.")
>>>
>>> This could only fix FOU/GUE. It is very possible someone else could
>>> happily be doing some other layered encapsulation and never had a
>>> problem before, so the decision to start enforcing only a single layer
>>> of encapsulation for GRO would still break them. I still think we
>>> should revert the patch, and for next version fixes things to that any
>>> combination/nesting of encapsulation is supported, and if there are
>>> exceptions to that support they need be clearly documented.
>>
>> It was pointed out to me that prior to my patch, it was also possible
>> to remotely cause a stack overflow by filling up a packet with tunnel
>> headers and letting GRO descend through them over and over again.
>>
> Then the fix would be to set a reasonable limit on the number of
> encapsulation levels.
>
>> Tom, I'm sorry that you don't like how I fixed this issue but there
>> really, truly is a bug here. I gave you a specific example to be clear
>> but that doesn't mean that is the only case. I am aware that the bug
>> is not encountered in all situations and that the fix removes an
>> optimization in some of those but I think that ensuring correct
>> behavior must come first.
>
> The example you gave results in packet loss, this is not
> incorrectness. Actually reproduce a real issue that leads to
> incorrectness and then we can talk about a solution.

Tom,

Just take a look in the __skb_udp_tunnel_segment or gre_gso_segment
code.  Then tell me how we are supposed to deal with the fact that the
GSO code expects skb_inner_network_offset() to be valid.  If you have
more than an inner and an outer network header we cannot.  So we
cannot put GRE in UDP, or UDP in GRE if there is a network header
between them.  The FOU/GUE code gets around this because in the IPIP
and SIT cases you are adding an L4 header between two L3 headers.  The
GRE case works because you essentially convert the GRE header into a
tunnel header like VXLAN or GENEVE and we just overwrite the outer
transport header offset.

What it comes down to is that we can only support 2 network headers
per frame.  One for the inner and one for the outer.  That is why we
can have an exception for GUE as it only has 2 network headers.  If we
had multiple levels of UDP, or GRE, or 2 levels of network headers
either before or after either UDP or GRE we cannot support
segmentation because the code will blow up and generate a malformed
frame.

- Alex
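For reference, the "2 network headers per frame" limit Alex describes
corresponds to the single outer/inner pair of header offsets carried in
struct sk_buff. The sketch below is abridged and simplified from
include/linux/skbuff.h, not the full definition:

struct sk_buff_offsets_sketch {          /* abridged from struct sk_buff */
        __u16 inner_transport_header;    /* innermost L4 */
        __u16 inner_network_header;      /* innermost L3 */
        __u16 inner_mac_header;          /* innermost L2 */
        __u16 transport_header;          /* outer L4 */
        __u16 network_header;            /* outer L3 */
        __u16 mac_header;                /* outer L2 */
};

/* Helpers such as skb_inner_network_offset() resolve against that single
 * "inner" slot, so a third L3 header in the middle of a stacked tunnel has
 * nowhere to be recorded, which is why segmenting such frames falls apart.
 */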


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 6:54 PM, Jesse Gross  wrote:
> On Mon, Mar 28, 2016 at 6:24 PM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
>>> This patch should fix the issues seen with a recent fix to prevent
>>> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
>>> correct for now as long as we do not add any devices that support
>>> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
>>> potential to mess things up due to the fact that the outer transport header
>>> points to the outer UDP header and not the GRE header as would be expected.
>>>
>>> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
>>> encapsulation.")
>>
>> This could only fix FOU/GUE. It is very possible someone else could
>> happily be doing some other layered encapsulation and never had a
>> problem before, so the decision to start enforcing only a single layer
>> of encapsulation for GRO would still break them. I still think we
>> should revert the patch, and for next version fixes things to that any
>> combination/nesting of encapsulation is supported, and if there are
>> exceptions to that support they need be clearly documented.
>
> It was pointed out to me that prior to my patch, it was also possible
> to remotely cause a stack overflow by filling up a packet with tunnel
> headers and letting GRO descend through them over and over again.
>
Then the fix would be to set a reasonable limit on the number of
encapsulation levels.
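A minimal sketch of such a limit, purely for illustration: the encap_level
field and the bound below are hypothetical, not existing code.

#include <linux/netdevice.h>

#define GRO_ENCAP_LIMIT 15                       /* hypothetical bound */

static struct sk_buff **tunnel_gro_receive_sketch(struct sk_buff **head,
                                                  struct sk_buff *skb)
{
        /* encap_level is a hypothetical per-packet counter in the GRO
         * control block; bail out instead of recursing without bound.
         */
        if (NAPI_GRO_CB(skb)->encap_level++ >= GRO_ENCAP_LIMIT) {
                NAPI_GRO_CB(skb)->flush = 1;     /* pass the packet up as-is */
                return NULL;
        }

        /* ... normal tunnel GRO processing, which may recurse into the
         * inner protocol's gro_receive handler ...
         */
        return NULL;
}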

> Tom, I'm sorry that you don't like how I fixed this issue but there
> really, truly is a bug here. I gave you a specific example to be clear
> but that doesn't mean that is the only case. I am aware that the bug
> is not encountered in all situations and that the fix removes an
> optimization in some of those but I think that ensuring correct
> behavior must come first.

The example you gave results in packet loss, this is not
incorrectness. Actually reproduce a real issue that leads to
incorrectness and then we can talk about a solution.

Tom


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Jesse Gross
On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
> This patch should fix the issues seen with a recent fix to prevent
> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
> correct for now as long as we do not add any devices that support
> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
> potential to mess things up due to the fact that the outer transport header
> points to the outer UDP header and not the GRE header as would be expected.
>
> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
> encapsulation.")
> Signed-off-by: Alexander Duyck 
> ---
>
> This should allow us to keep the fix that Jesse added without breaking the
> 3 cases that Tom called out in terms of FOU/GUE.
>
> Additional work will be needed in net-next as we probably need to make it
> so that offloads work correctly when we get around to supporting
> NETIF_F_GSO_GRE_CSUM.

Thanks, this looks like a reasonable fix to me. I agree that there is
more that can be done in the future to improve things but this should
restore GRO while still avoiding possible issues.


Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Alexander Duyck
On Mon, Mar 28, 2016 at 5:50 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 4:34 PM, Jesse Gross  wrote:
>> * A packet is received that is encapsulated with two layers of GRE. It
>> looks like this: Eth|IP|GRE|IP|GRE|IP|TCP
>> * The packet is processed through GRO successfully. skb->encapsulation
>> bit is set and skb_shinfo(skb)->gso_type is set to SKB_GSO_GRE |
>> SKB_GSO_TCPV4.
>> * The packet is bridged to an output device and is prepared for
>> transmission. skb_gso_segment() is called to segment the packet.
>> * As we descend down the GSO code, we get to gre_gso_segment() for the
>> first GRE header. skb->encapsulation is cleared and we keep going to
>> the next header.
>> * We return to gre_gso_segment() again for the second GRE header.
>> There is a check for skb->encapsulation being set but it is now clear.
>> GSO processing is aborted.
>> * The packet is dropped.
>
> If multiple level of GRE GSO is the problem then you are welcome to
> fix it after FOU is fixed and when net-next opens (GSO partial should
> fix this this anyway). But this is not at all the same problem as
> saying all multiple levels of encapsulation are invalid so we need to
> disallow that in GRO for everyone regardless of whether the packet is
> is being forwarded or some driver can't understand a certain valid
> combination.

GSO partial is not magic.  It is not going to fix a great many of
these kinds of issues.  All GSO partial gives us is a way for hardware
to segment frames if the headers can be replicated.  If GSO cannot
handle the frame in software then GSO partial still cannot handle it.

It looks like what we probably need to do is rewrite GSO in order to
support this as multiple tunnel levels are currently not supported.  I
figure it is something we can add once we have GSO partial in.  Maybe
call it GSO_STACKED.  I still have to figure out where we are going to
want the header pointers since there are obvious bits that wouldn't
work such as the part in UDP or GRE segmentation code where we assume
we can just reset the network header offset based off of the inner
network header offset.  If we have multiple levels of tunnels then
this is blatantly broken.  Maybe to get around it we might have to add
the tunnel header length as a value passed with UDP offloads.

> Testing GUE is really not hard. There is no reason why GUE, Geneve,
> and VXLAN should not be tested each time a change is made to any of
> the common UDP tunneling code.
>
> Configuring ipip-GUE can be done by:
>
> ./ip fou add port 6080 gue
> ./ip link add name tun1 type ipip remote 10.1.1.2 local 10.1.1.1 ttl 225 encap gue encap-sport auto encap-dport 6080 encap-csum encap-remcsum
> ifconfig tun1 192.168.1.1
> ip route add 192.168.1.0/24 dev tun1
>
> Configuring gre-GUE is just s/ipip/gre/ of above.
>
> ./netperf -H 192.168.1.2 -t TCP_STREAM -l 100

This is essentially the test case I ran to verify that the patch I
submitted fixed the issue.  I'm not sure we need to do too much
testing for the patch I submitted since the whole thing is pretty
straightforward.

> Is good for showing regressions in a single flow. You should
> see GSO/GRO being effective on both sides and perf should show only a
> minute amount of time in csum_partial assuming that your device has
> checksum offload for outer UDP (which should be about all).
>
> ./super_netperf 200 -H 192.168.1.2 -j -l 1 -t TCP_RR -- -r 1,1 -o TRANSACTION_RATE,P50_LATENCY,P90_LATENCY,P99_LATENCY
>
> Is run to test pps. We're looking for no regression in tput, latency,
> or CPU utilization.
>
> Next time you make changes to anything that affects common paths in
> tunnels please run these tests and report the results in the commit
> log so we can avoid regressions like this. You should also be running
> an equivalent battery of VXLAN tests. The configuration I use is:
>
> ./ip link add vxlan0 type vxlan id 10 group 224.10.10.10 ttl 10 dev eth0 udpcsum remcsumtx remcsumrx
> ifconfig vxlan0 192.168.111.1
> ip route add 192.168.111.0/24 dev vxlan0
>
> Again we expect the same sort of results, GRO/GSO should be effective,
> csum_partial should not be getting significant time, etc...

With my patch I can verify that GRO is working and coalescing frames
at least for the IPIP case as that was the only one I tested.

If there is something I missed I am willing to respin it and resubmit
it.  From what I can tell though it should restore the original
functionality for the FOU/GUE cases, since all it really does is
push the setting of the encap_mark back from the UDP header over to
the next level header which is the correct behavior (at least in my
opinion).

- Alex


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Jesse Gross
On Mon, Mar 28, 2016 at 6:24 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
>> This patch should fix the issues seen with a recent fix to prevent
>> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
>> correct for now as long as we do not add any devices that support
>> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
>> potential to mess things up due to the fact that the outer transport header
>> points to the outer UDP header and not the GRE header as would be expected.
>>
>> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
>> encapsulation.")
>
> This could only fix FOU/GUE. It is very possible someone else could
> happily be doing some other layered encapsulation and never had a
> problem before, so the decision to start enforcing only a single layer
> of encapsulation for GRO would still break them. I still think we
> should revert the patch, and for next version fixes things to that any
> combination/nesting of encapsulation is supported, and if there are
> exceptions to that support they need be clearly documented.

It was pointed out to me that prior to my patch, it was also possible
to remotely cause a stack overflow by filling up a packet with tunnel
headers and letting GRO descend through them over and over again.

Tom, I'm sorry that you don't like how I fixed this issue but there
really, truly is a bug here. I gave you a specific example to be clear
but that doesn't mean that is the only case. I am aware that the bug
is not encountered in all situations and that the fix removes an
optimization in some of those but I think that ensuring correct
behavior must come first.


Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
> This patch should fix the issues seen with a recent fix to prevent
> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
> correct for now as long as we do not add any devices that support
> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
> potential to mess things up due to the fact that the outer transport header
> points to the outer UDP header and not the GRE header as would be expected.
>
> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
> encapsulation.")

This could only fix FOU/GUE. It is very possible someone else could
happily be doing some other layered encapsulation and never had a
problem before, so the decision to start enforcing only a single layer
of encapsulation for GRO would still break them. I still think we
should revert the patch, and for the next version fix things so that any
combination/nesting of encapsulation is supported, and if there are
exceptions to that support they need to be clearly documented.

> Signed-off-by: Alexander Duyck 
> ---
>
> This should allow us to keep the fix that Jesse added without breaking the
> 3 cases that Tom called out in terms of FOU/GUE.
>
> Additional work will be needed in net-next as we probably need to make it
> so that offloads work correctly when we get around to supporting
> NETIF_F_GSO_GRE_CSUM.
>
>  net/ipv4/fou.c |   32 
>  1 file changed, 32 insertions(+)
>
> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
> index 4136da9275b2..2c30256ee959 100644
> --- a/net/ipv4/fou.c
> +++ b/net/ipv4/fou.c
> @@ -195,6 +195,22 @@ static struct sk_buff **fou_gro_receive(struct sk_buff 
> **head,
> u8 proto = NAPI_GRO_CB(skb)->proto;
> const struct net_offload **offloads;
>
> +   switch (proto) {
> +   case IPPROTO_IPIP:
> +   case IPPROTO_IPV6:
> +   case IPPROTO_GRE:
> +   /* We can clear the encap_mark for these 3 protocols as
> +* we are either adding an L4 tunnel header to the outer
> +* L3 tunnel header, or we are are simply treating the
> +* GRE tunnel header as though it is a UDP protocol
> +* specific header such as VXLAN or GENEVE.
> +*/
> +   NAPI_GRO_CB(skb)->encap_mark = 0;
> +   /* fall-through */
> +   default:
> +   break;
> +   }
> +
> rcu_read_lock();
> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
> ops = rcu_dereference(offloads[proto]);
> @@ -359,6 +375,22 @@ static struct sk_buff **gue_gro_receive(struct sk_buff 
> **head,
> NAPI_GRO_CB(p)->flush |= NAPI_GRO_CB(p)->flush_id;
> }
>
> +   switch (guehdr->proto_ctype) {
> +   case IPPROTO_IPIP:
> +   case IPPROTO_IPV6:
> +   case IPPROTO_GRE:
> +   /* We can clear the encap_mark for these 3 protocols as
> +* we are either adding an L4 tunnel header to the outer
> +* L3 tunnel header, or we are are simply treating the
> +* GRE tunnel header as though it is a UDP protocol
> +* specific header such as VXLAN or GENEVE.
> +*/
> +   NAPI_GRO_CB(skb)->encap_mark = 0;
> +   /* fall-through */
> +   default:
> +   break;
> +   }
> +
> rcu_read_lock();
> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
> ops = rcu_dereference(offloads[guehdr->proto_ctype]);
>


Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 19:54 -0400, David Miller wrote:
> From: Eric Dumazet 
> Date: Mon, 28 Mar 2016 13:51:46 -0700
> 
> > On Mon, 2016-03-28 at 13:46 -0700, Eric Dumazet wrote:
> > 
> >> We have at least 384 bytes of padding in skb->head (this is struct
> >> skb_shared_info).
> >> 
> >> Whatever garbage we might read, current code is fine.
> >> 
> >> We have to deal with a false positive here.
> > 
> > Very similar to the one fixed in 
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=10ec9472f05b45c94db3c854d22581a20b97db41
> 
> I don't see them as similar.
> 
> The current options code we are talking about here never references
> past legitimate parts of the packet data.  We always check 'length',
> and we never access past the boundary it describes.
> 
> This was the entire point of my posting.
> 
> Talking about padding, rather than the logical correctness of the
> code, is therefore a distraction I think :-)

Not really, we do read one out-of-bounds byte, David.

length = 1;
...
while (length > 0) {
        int opcode = *ptr++;            // Note that length is still 1

        switch (opcode) {
        ...
        default:
                opsize = *ptr++;        // Note that length is still 1
                ...
                length -= opsize;
        }

So we do read 2 bytes, while length was 1.

opsize definitely can read garbage.
Call it padding or redzone or whatever ;)
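A simplified sketch of the option walk being discussed (not the actual
net/netfilter code): checking that at least two bytes remain before reading
opsize is what avoids the one-byte overrun when length == 1.

static void walk_tcp_options_sketch(const unsigned char *ptr, int length)
{
        while (length > 0) {
                int opcode = *ptr++;
                int opsize;

                switch (opcode) {
                case 0:                 /* TCPOPT_EOL */
                        return;
                case 1:                 /* TCPOPT_NOP */
                        length--;
                        continue;
                default:
                        if (length < 2) /* only the opcode byte is left */
                                return;
                        opsize = *ptr++;
                        if (opsize < 2 || opsize > length)
                                return; /* malformed or truncated option */
                        /* ... handle the option body here ... */
                        ptr += opsize - 2;
                        length -= opsize;
                }
        }
}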




Re: [net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 4:58 PM, Alexander Duyck  wrote:
> This patch should fix the issues seen with a recent fix to prevent
> tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
> correct for now as long as we do not add any devices that support
> NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
> potential to mess things up due to the fact that the outer transport header
> points to the outer UDP header and not the GRE header as would be expected.
>
Did you test this?

> Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of 
> encapsulation.")
> Signed-off-by: Alexander Duyck 
> ---
>
> This should allow us to keep the fix that Jesse added without breaking the
> 3 cases that Tom called out in terms of FOU/GUE.
>
> Additional work will be needed in net-next as we probably need to make it
> so that offloads work correctly when we get around to supporting
> NETIF_F_GSO_GRE_CSUM.
>
>  net/ipv4/fou.c |   32 
>  1 file changed, 32 insertions(+)
>
> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
> index 4136da9275b2..2c30256ee959 100644
> --- a/net/ipv4/fou.c
> +++ b/net/ipv4/fou.c
> @@ -195,6 +195,22 @@ static struct sk_buff **fou_gro_receive(struct sk_buff 
> **head,
> u8 proto = NAPI_GRO_CB(skb)->proto;
> const struct net_offload **offloads;
>
> +   switch (proto) {
> +   case IPPROTO_IPIP:
> +   case IPPROTO_IPV6:
> +   case IPPROTO_GRE:
> +   /* We can clear the encap_mark for these 3 protocols as
> +* we are either adding an L4 tunnel header to the outer
> +* L3 tunnel header, or we are are simply treating the
> +* GRE tunnel header as though it is a UDP protocol
> +* specific header such as VXLAN or GENEVE.
> +*/
> +   NAPI_GRO_CB(skb)->encap_mark = 0;
> +   /* fall-through */
> +   default:
> +   break;
> +   }
> +
> rcu_read_lock();
> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
> ops = rcu_dereference(offloads[proto]);
> @@ -359,6 +375,22 @@ static struct sk_buff **gue_gro_receive(struct sk_buff 
> **head,
> NAPI_GRO_CB(p)->flush |= NAPI_GRO_CB(p)->flush_id;
> }
>
> +   switch (guehdr->proto_ctype) {
> +   case IPPROTO_IPIP:
> +   case IPPROTO_IPV6:
> +   case IPPROTO_GRE:
> +   /* We can clear the encap_mark for these 3 protocols as
> +* we are either adding an L4 tunnel header to the outer
> +* L3 tunnel header, or we are are simply treating the
> +* GRE tunnel header as though it is a UDP protocol
> +* specific header such as VXLAN or GENEVE.
> +*/
> +   NAPI_GRO_CB(skb)->encap_mark = 0;
> +   /* fall-through */
> +   default:
> +   break;
> +   }
> +
> rcu_read_lock();
> offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
> ops = rcu_dereference(offloads[guehdr->proto_ctype]);
>


Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 4:34 PM, Jesse Gross  wrote:
> On Mon, Mar 28, 2016 at 3:10 PM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
>>  wrote:
>>> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert  wrote:
 On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
  wrote:
> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert  
> wrote:
>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>  wrote:
>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  
>>> wrote:
 On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  
> wrote:
>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  
>> wrote:
>>> When drivers express support for TSO of encapsulated packets, they
>>> only mean that they can do it for one layer of encapsulation.
>>> Supporting additional levels would mean updating, at a minimum,
>>> more IP length fields and they are unaware of this.
>>>
>> This patch completely breaks GRO for FOU and GUE.
>>
>>> No encapsulation device expresses support for handling offloaded
>>> encapsulated packets, so we won't generate these types of frames
>>> in the transmit path. However, GRO doesn't have a check for
>>> multiple levels of encapsulation and will attempt to build them.
>>>
>>> UDP tunnel GRO actually does prevent this situation but it only
>>> handles multiple UDP tunnels stacked on top of each other. This
>>> generalizes that solution to prevent any kind of tunnel stacking
>>> that would cause problems.
>>>
>> GUE and FOU regularly create packets that will be both GSO UDP 
>> tunnels
>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>> ambiguity in the drivers as to what this means. For instance, if
>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>> driver can use ndo_features_check to validate.
>>
>> So multiple levels of encapsulation with GRO is perfectly valid and I
>> would suggest to simply revert this patch. The one potential issue we
>> could have is the potential for GRO to construct a packet which is a
>> UDP-encapsulation inside another encapsulation, like UDP-encap in 
>> GRE.
>> In this case the GSO flags don't provide enough information to
>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>> but *not* check for it.
>
> Generally speaking, multiple levels of encapsulation offload are not
> valid. I think this is pretty clear from the fact that none of the
> encapsulation drivers expose support for encapsulation offloads on
> transmit and the drivers supporting NETIF_F_GSO_GRE and
> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>

 Kernel software offload does support this combination just fine.
 Seriously, I've tested that more than a thousand times. This is a few
 HW implementations you're referring to. The limitations of these
 drivers should not dictate how we build the software-- it needs to
 work the other way around.
>>>
>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>> thing currently as the transmit side doesn't generate these kind of
>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>> really need to restructure the GSO code in order to handle it.  As far
>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>> wonder if we need to add yet another GSO bit to indicate that we
>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>> way we could possibly handle it would be in software since what you
>>> are indicating is a indeterminate number of headers that all require
>>> updates.
>>>
> Asking drivers to assume that this combination of flags means FOU
> doesn't seem right to me. To the best of my knowledge, no driver uses
> ndo_feature_check to do validation of multiple tunnel offload flags
> since the assumption is that the stack will never try to do this.
> Since FOU is being treated as only a single level of encapsulation, I
> think it would be better to just advertise it that way on transmit
> (i.e. only set SKB_GSO_UDP_TUNNEL).
>
 If it's not FOU it will be something else. Arbitrarily declaring

Re: [PATCH net,stable] qmi_wwan: add "D-Link DWM-221 B1" device id

2016-03-28 Thread David Miller
From: Bjørn Mork 
Date: Mon, 28 Mar 2016 22:38:16 +0200

> Thomas reports:
> "Windows:
 ...
> Linux:
 ...
> Reported-by: Thomas Schäfer 
> Signed-off-by: Bjørn Mork 

Applied and queued up for -stable, thanks!


[net PATCH] gro: Allow tunnel stacking in the case of FOU/GUE

2016-03-28 Thread Alexander Duyck
This patch should fix the issues seen with a recent fix to prevent
tunnel-in-tunnel frames from being generated with GRO.  The fix itself is
correct for now as long as we do not add any devices that support
NETIF_F_GSO_GRE_CSUM.  When such a device is added it could have the
potential to mess things up due to the fact that the outer transport header
points to the outer UDP header and not the GRE header as would be expected.

Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of encapsulation.")
Signed-off-by: Alexander Duyck 
---

This should allow us to keep the fix that Jesse added without breaking the
3 cases that Tom called out in terms of FOU/GUE.

Additional work will be needed in net-next as we probably need to make it
so that offloads work correctly when we get around to supporting
NETIF_F_GSO_GRE_CSUM.

 net/ipv4/fou.c |   32 
 1 file changed, 32 insertions(+)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 4136da9275b2..2c30256ee959 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -195,6 +195,22 @@ static struct sk_buff **fou_gro_receive(struct sk_buff **head,
u8 proto = NAPI_GRO_CB(skb)->proto;
const struct net_offload **offloads;
 
+   switch (proto) {
+   case IPPROTO_IPIP:
+   case IPPROTO_IPV6:
+   case IPPROTO_GRE:
+   /* We can clear the encap_mark for these 3 protocols as
+* we are either adding an L4 tunnel header to the outer
+* L3 tunnel header, or we are are simply treating the
+* GRE tunnel header as though it is a UDP protocol
+* specific header such as VXLAN or GENEVE.
+*/
+   NAPI_GRO_CB(skb)->encap_mark = 0;
+   /* fall-through */
+   default:
+   break;
+   }
+
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[proto]);
@@ -359,6 +375,22 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
NAPI_GRO_CB(p)->flush |= NAPI_GRO_CB(p)->flush_id;
}
 
+   switch (guehdr->proto_ctype) {
+   case IPPROTO_IPIP:
+   case IPPROTO_IPV6:
+   case IPPROTO_GRE:
+   /* We can clear the encap_mark for these 3 protocols as
+* we are either adding an L4 tunnel header to the outer
+* L3 tunnel header, or we are are simply treating the
+* GRE tunnel header as though it is a UDP protocol
+* specific header such as VXLAN or GENEVE.
+*/
+   NAPI_GRO_CB(skb)->encap_mark = 0;
+   /* fall-through */
+   default:
+   break;
+   }
+
rcu_read_lock();
offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
ops = rcu_dereference(offloads[guehdr->proto_ctype]);



[PATCH net 3/4] bnxt_en: Fix typo in bnxt_hwrm_set_pause_common().

2016-03-28 Thread Michael Chan
The typo caused the wrong flow control bit to be set.

Reported by: Ajit Khaparde 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4600a05..12a009d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4559,7 +4559,7 @@ bnxt_hwrm_set_pause_common(struct bnxt *bp, struct hwrm_port_phy_cfg_input *req)
if (bp->link_info.req_flow_ctrl & BNXT_LINK_PAUSE_RX)
req->auto_pause |= PORT_PHY_CFG_REQ_AUTO_PAUSE_RX;
if (bp->link_info.req_flow_ctrl & BNXT_LINK_PAUSE_TX)
-   req->auto_pause |= PORT_PHY_CFG_REQ_AUTO_PAUSE_RX;
+   req->auto_pause |= PORT_PHY_CFG_REQ_AUTO_PAUSE_TX;
req->enables |=
cpu_to_le32(PORT_PHY_CFG_REQ_ENABLES_AUTO_PAUSE);
} else {
-- 
1.8.3.1



[PATCH net 2/4] bnxt_en: Implement proper firmware message padding.

2016-03-28 Thread Michael Chan
The size of every padded firmware message is specified in the first
HWRM_VER_GET response message.  Use this value to pad every message
after that.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c92053c..4600a05 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2653,7 +2653,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
/* Write request msg to hwrm channel */
__iowrite32_copy(bp->bar0, data, msg_len / 4);
 
-   for (i = msg_len; i < HWRM_MAX_REQ_LEN; i += 4)
+   for (i = msg_len; i < BNXT_HWRM_MAX_REQ_LEN; i += 4)
writel(0, bp->bar0 + i);
 
/* currently supports only one outstanding message */
@@ -3830,6 +3830,7 @@ static int bnxt_hwrm_ver_get(struct bnxt *bp)
struct hwrm_ver_get_input req = {0};
struct hwrm_ver_get_output *resp = bp->hwrm_cmd_resp_addr;
 
+   bp->hwrm_max_req_len = HWRM_MAX_REQ_LEN;
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_VER_GET, -1, -1);
req.hwrm_intf_maj = HWRM_VERSION_MAJOR;
req.hwrm_intf_min = HWRM_VERSION_MINOR;
@@ -3855,6 +3856,9 @@ static int bnxt_hwrm_ver_get(struct bnxt *bp)
if (!bp->hwrm_cmd_timeout)
bp->hwrm_cmd_timeout = DFLT_HWRM_CMD_TIMEOUT;
 
+   if (resp->hwrm_intf_maj >= 1)
+   bp->hwrm_max_req_len = le16_to_cpu(resp->max_req_win_len);
+
 hwrm_ver_get_exit:
mutex_unlock(&bp->hwrm_cmd_lock);
return rc;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index ec04c47..709b95b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -477,6 +477,7 @@ struct rx_tpa_end_cmp_ext {
 #define RING_CMP(idx)  ((idx) & bp->cp_ring_mask)
 #define NEXT_CMP(idx)  RING_CMP(ADV_RAW_CMP(idx, 1))
 
+#define BNXT_HWRM_MAX_REQ_LEN  (bp->hwrm_max_req_len)
 #define DFLT_HWRM_CMD_TIMEOUT  500
 #define HWRM_CMD_TIMEOUT   (bp->hwrm_cmd_timeout)
 #define HWRM_RESET_TIMEOUT ((HWRM_CMD_TIMEOUT) * 4)
@@ -953,6 +954,7 @@ struct bnxt {
dma_addr_t  hw_tx_port_stats_map;
int hw_port_stats_size;
 
+   u16 hwrm_max_req_len;
int hwrm_cmd_timeout;
struct mutexhwrm_cmd_lock;  /* serialize hwrm messages */
struct hwrm_ver_get_output  ver_resp;
-- 
1.8.3.1



[PATCH net 0/4] misc. small fixes.

2016-03-28 Thread Michael Chan
Misc. small fixes for net.

Michael Chan (4):
  bnxt_en: Initialize CP doorbell value before ring allocation
  bnxt_en: Implement proper firmware message padding.
  bnxt_en: Fix typo in bnxt_hwrm_set_pause_common().
  bnxt_en: Fix ethtool -a reporting.

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 10 +++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  2 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  6 ++
 3 files changed, 11 insertions(+), 7 deletions(-)

-- 
1.8.3.1



[PATCH net 4/4] bnxt_en: Fix ethtool -a reporting.

2016-03-28 Thread Michael Chan
To report flow control tx/rx settings accurately regardless of autoneg
setting, we should use link_info->req_flow_ctrl.  Before this patch,
the reported settings were only correct when autoneg was on.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 9ada166..2e472f6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -855,10 +855,8 @@ static void bnxt_get_pauseparam(struct net_device *dev,
if (BNXT_VF(bp))
return;
epause->autoneg = !!(link_info->autoneg & BNXT_AUTONEG_FLOW_CTRL);
-   epause->rx_pause =
-   ((link_info->auto_pause_setting & BNXT_LINK_PAUSE_RX) != 0);
-   epause->tx_pause =
-   ((link_info->auto_pause_setting & BNXT_LINK_PAUSE_TX) != 0);
+   epause->rx_pause = !!(link_info->req_flow_ctrl & BNXT_LINK_PAUSE_RX);
+   epause->tx_pause = !!(link_info->req_flow_ctrl & BNXT_LINK_PAUSE_TX);
 }
 
 static int bnxt_set_pauseparam(struct net_device *dev,
-- 
1.8.3.1



[PATCH net 1/4] bnxt_en: Initialize CP doorbell value before ring allocation

2016-03-28 Thread Michael Chan
From: Prashant Sreedharan 

The existing code does the following:
allocate completion ring
initialize completion ring doorbell
disable interrupts on this completion ring by writing to the doorbell

We can have a race where firmware sends an asynchronous event to the host
after completion ring allocation and before doorbell is initialized.
When this happens driver can crash while ringing the doorbell using
uninitialized value as part of handling the IRQ/napi request.

Signed-off-by: Prashant Sreedharan 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index aabbd51..c92053c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3391,11 +3391,11 @@ static int bnxt_hwrm_ring_alloc(struct bnxt *bp)
struct bnxt_cp_ring_info *cpr = &bnapi->cp_ring;
struct bnxt_ring_struct *ring = &cpr->cp_ring_struct;
 
+   cpr->cp_doorbell = bp->bar1 + i * 0x80;
rc = hwrm_ring_alloc_send_msg(bp, ring, HWRM_RING_ALLOC_CMPL, i,
  INVALID_STATS_CTX_ID);
if (rc)
goto err_out;
-   cpr->cp_doorbell = bp->bar1 + i * 0x80;
BNXT_CP_DB(cpr->cp_doorbell, cpr->cp_raw_cons);
bp->grp_info[i].cp_fw_ring_id = ring->fw_ring_id;
}
-- 
1.8.3.1



Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread David Miller
From: Eric Dumazet 
Date: Mon, 28 Mar 2016 13:51:46 -0700

> On Mon, 2016-03-28 at 13:46 -0700, Eric Dumazet wrote:
> 
>> We have at least 384 bytes of padding in skb->head (this is struct
>> skb_shared_info).
>> 
>> Whatever garbage we might read, current code is fine.
>> 
>> We have to deal with a false positive here.
> 
> Very similar to the one fixed in 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=10ec9472f05b45c94db3c854d22581a20b97db41

I don't see them as similar.

The current options code we are talking about here never references
past legitimate parts of the packet data.  We always check 'length',
and we never access past the boundary it describes.

This was the entire point of my posting.

Talking about padding, rather than the logical correctness of the
code, is therefore a distraction I think :-)


Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread David Miller
From: Jan Engelhardt 
Date: Mon, 28 Mar 2016 22:20:39 +0200 (CEST)

> 
> On Monday 2016-03-28 21:29, David Miller wrote:
 > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
 > >   length--;
 > >   continue;
 > >   default:
 > > +if (length < 2)
 > > +return;
 > >   opsize = *ptr++;
 > >   if (opsize < 2) /* "silly options" */
 > >   return;
>>
>>I'm trying to figure out how this can even matter.
>>If we are in the loop, length is at least one.
>>That means it is legal to read the opsize byte.
> 
> Is that because the skbuff is always padded to a multiple of (at
> least) two?

No, it's because length is at least one, so we can read one byte.

And then before we read more, we make sure length is at least opsize
which is at least two.

This has nothing to do with padding, and everything to do logically
with the code.


Re: [PATCH] ipv6: Fix the pmtu path for connected UDP socket

2016-03-28 Thread Martin KaFai Lau
On Mon, Mar 28, 2016 at 03:39:42PM -0700, Cong Wang wrote:
> On Fri, Mar 25, 2016 at 5:16 PM, Martin KaFai Lau  wrote:
> > On Fri, Mar 25, 2016 at 04:55:27PM -0700, Martin KaFai Lau wrote:
> >>  void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu)
> >>  {
> >> + struct dst_entry *odst;
> >> +
> >> + odst = sk_dst_get(sk);
> >> +
> >>   ip6_update_pmtu(skb, sock_net(sk), mtu,
> >>   sk->sk_bound_dev_if, sk->sk_mark);
> >> +
> >> + if (odst && !odst->error &&
> >> + !ip6_dst_check(odst, inet6_sk(sk)->dst_cookie)) {
> >> + struct dst_entry *ndst;
> >> + struct flowi6 fl6;
> >> +
> >> + build_skb_flow_key(&fl6, skb, sock_net(sk),
> >> +sk->sk_bound_dev_if, sk->sk_mark);
> >> + ndst = ip6_route_output(sock_net(sk), NULL, &fl6);
> >> + if (!ndst->error)
> >> + ip6_dst_store(sk, ndst, NULL, NULL);
> > oops...missed:
> > else
> > dst_release(ndst);
>
>
> Can you send an updated patch for review? Since you have a test case
> while I didn't.
>
> Also,
>
> 2) Not sure if we need to update dst or dst->path here, I guess the later
> counts in xfrm case therefore more correct.
Thanks for pointing it out.  Calling ip6_dst_check is also not appropriate here.
odst->ops->check() should be used.  I need to take a closer look and may try
another approach.

Thanks
-- Martin


Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Jesse Gross
On Mon, Mar 28, 2016 at 3:10 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
>  wrote:
>> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert  wrote:
>>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>>>  wrote:
 On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>  wrote:
>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  
>> wrote:
>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
 On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  
 wrote:
> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
>> When drivers express support for TSO of encapsulated packets, they
>> only mean that they can do it for one layer of encapsulation.
>> Supporting additional levels would mean updating, at a minimum,
>> more IP length fields and they are unaware of this.
>>
> This patch completely breaks GRO for FOU and GUE.
>
>> No encapsulation device expresses support for handling offloaded
>> encapsulated packets, so we won't generate these types of frames
>> in the transmit path. However, GRO doesn't have a check for
>> multiple levels of encapsulation and will attempt to build them.
>>
>> UDP tunnel GRO actually does prevent this situation but it only
>> handles multiple UDP tunnels stacked on top of each other. This
>> generalizes that solution to prevent any kind of tunnel stacking
>> that would cause problems.
>>
> GUE and FOU regularly create packets that will be both GSO UDP tunnels
> and IPIP, SIT, GRE, etc. This is by design. There should be no
> ambiguity in the drivers as to what this means. For instance, if
> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
> driver can use ndo_features_check to validate.
>
> So multiple levels of encapsulation with GRO is perfectly valid and I
> would suggest to simply revert this patch. The one potential issue we
> could have is the potential for GRO to construct a packet which is a
> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
> In this case the GSO flags don't provide enough information to
> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
> case). To make this clear we can set udp_mark in GRE, ipip, and sit
> but *not* check for it.

 Generally speaking, multiple levels of encapsulation offload are not
 valid. I think this is pretty clear from the fact that none of the
 encapsulation drivers expose support for encapsulation offloads on
 transmit and the drivers supporting NETIF_F_GSO_GRE and
 NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.

>>>
>>> Kernel software offload does support this combination just fine.
>>> Seriously, I've tested that more than a thousand times. This is a few
>>> HW implementations you're referring to. The limitations of these
>>> drivers should not dictate how we build the software-- it needs to
>>> work the other way around.
>>
>> Jesse does have a point.  Drivers aren't checking for this kind of
>> thing currently as the transmit side doesn't generate these kind of
>> frames.  The fact that GRO is makes things a bit more messy as we will
>> really need to restructure the GSO code in order to handle it.  As far
>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>> wonder if we need to add yet another GSO bit to indicate that we
>> support multiple layers of encapsulation.  I'm pretty sure the only
>> way we could possibly handle it would be in software since what you
>> are indicating is a indeterminate number of headers that all require
>> updates.
>>
 Asking drivers to assume that this combination of flags means FOU
 doesn't seem right to me. To the best of my knowledge, no driver uses
 ndo_feature_check to do validation of multiple tunnel offload flags
 since the assumption is that the stack will never try to do this.
 Since FOU is being treated as only a single level of encapsulation, I
 think it would be better to just advertise it that way on transmit
 (i.e. only set SKB_GSO_UDP_TUNNEL).

>>> If it's not FOU it will be something else. Arbitrarily declaring
>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>> we should be increasing generality and capabilities of the kernel not
>>> holding it down with artificial 

Re: [PATCH] ipv6: Fix the pmtu path for connected UDP socket

2016-03-28 Thread Cong Wang
On Fri, Mar 25, 2016 at 5:16 PM, Martin KaFai Lau  wrote:
> On Fri, Mar 25, 2016 at 04:55:27PM -0700, Martin KaFai Lau wrote:
>>  void ip6_sk_update_pmtu(struct sk_buff *skb, struct sock *sk, __be32 mtu)
>>  {
>> + struct dst_entry *odst;
>> +
>> + odst = sk_dst_get(sk);
>> +
>>   ip6_update_pmtu(skb, sock_net(sk), mtu,
>>   sk->sk_bound_dev_if, sk->sk_mark);
>> +
>> + if (odst && !odst->error &&
>> + !ip6_dst_check(odst, inet6_sk(sk)->dst_cookie)) {
>> + struct dst_entry *ndst;
>> + struct flowi6 fl6;
>> +
>> + build_skb_flow_key(&fl6, skb, sock_net(sk),
>> +sk->sk_bound_dev_if, sk->sk_mark);
>> + ndst = ip6_route_output(sock_net(sk), NULL, &fl6);
>> + if (!ndst->error)
>> + ip6_dst_store(sk, ndst, NULL, NULL);
> oops...missed:
> else
> dst_release(ndst);


Can you send an updated patch for review? Since you have a test case
while I didn't.

Also,

1) I think you need to check obsolete before update? There is no
reason to update an obsoleted dst?

2) Not sure if we need to update dst or dst->path here, I guess the later
counts in xfrm case therefore more correct.

Thanks.


Re: [RFT Patch net 1/2] ipv6: invalidate the socket cached route on pmtu events if possible

2016-03-28 Thread Cong Wang
On Fri, Mar 25, 2016 at 11:11 AM, Eric Dumazet  wrote:
> On Fri, 2016-03-25 at 10:17 -0700, Cong Wang wrote:
>
>> 1) sock lock protects the whole update: the whole check, update, recheck,
>> set logic, to make sure another CPU will not do the same to the same socket
>> at the same time.
>>
>> 2) the dst itself is safe, because it is always refcounted, and we use xchg()
>> to change the pointer in sk_dst_cache.
>>
>> Or am I still missing anything here?
>
> As TCP always lock the socket before doing its heavy stuff,
> it can use a variant of sk_dst_cache manipulations that do not use extra
> atomic operations.
>
> But UDP gets xchg() to safely exchange sk_dst_cache, because we do not
> feel locking the socket is needed for UDP for typical uses (! cork)
>
> If you hold the socket lock in ICMP handler, then it would be
> inconsistent with udp sendmsg() where we do not hold the socket lock.
>
> Since I believe udp sendmsg() is fine, I do believe you do not need to
> lock the socket, and then care about socket being owned by the user.

I see, seems the whole update logic is safe to become lock-free, then
commit 8141ed9fcedb278f4a3a78680591bef1e55f75fb can be just
reverted.

OTOH, other bh_lock_sock() callers need it to update queues etc.,
here we only need to check and update one single pointer in sk.

Steffen?

Thanks.
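
For reference, the lock-free pointer update discussed here boils down to an
xchg() swap plus refcounting.  A simplified sketch, modelled on the existing
sk_dst_set() helper (not the exact upstream code):

static inline void udp_sk_dst_swap(struct sock *sk, struct dst_entry *ndst)
{
	struct dst_entry *odst;

	/* Publish the new dst atomically; no socket lock is taken. */
	odst = xchg((__force struct dst_entry **)&sk->sk_dst_cache, ndst);

	/* The old entry is refcounted, so readers that already took a
	 * reference via sk_dst_get() keep a valid dst; dropping our
	 * reference here is safe.
	 */
	dst_release(odst);
}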


Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
 wrote:
> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>>  wrote:
>>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert  wrote:
 On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
  wrote:
> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  wrote:
>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  
>>> wrote:
 On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
> When drivers express support for TSO of encapsulated packets, they
> only mean that they can do it for one layer of encapsulation.
> Supporting additional levels would mean updating, at a minimum,
> more IP length fields and they are unaware of this.
>
 This patch completely breaks GRO for FOU and GUE.

> No encapsulation device expresses support for handling offloaded
> encapsulated packets, so we won't generate these types of frames
> in the transmit path. However, GRO doesn't have a check for
> multiple levels of encapsulation and will attempt to build them.
>
> UDP tunnel GRO actually does prevent this situation but it only
> handles multiple UDP tunnels stacked on top of each other. This
> generalizes that solution to prevent any kind of tunnel stacking
> that would cause problems.
>
 GUE and FOU regularly create packets that will be both GSO UDP tunnels
 and IPIP, SIT, GRE, etc. This is by design. There should be no
 ambiguity in the drivers as to what this means. For instance, if
 SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
 driver can use ndo_features_check to validate.

 So multiple levels of encapsulation with GRO is perfectly valid and I
 would suggest to simply revert this patch. The one potential issue we
 could have is the potential for GRO to construct a packet which is a
 UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
 In this case the GSO flags don't provide enough information to
 distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
 case). To make this clear we can set udp_mark in GRE, ipip, and sit
 but *not* check for it.
>>>
>>> Generally speaking, multiple levels of encapsulation offload are not
>>> valid. I think this is pretty clear from the fact that none of the
>>> encapsulation drivers expose support for encapsulation offloads on
>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>
>>
>> Kernel software offload does support this combination just fine.
>> Seriously, I've tested that more than a thousand times. This is a few
>> HW implementations you're referring to. The limitations of these
>> drivers should not dictate how we build the software-- it needs to
>> work the other way around.
>
> Jesse does have a point.  Drivers aren't checking for this kind of
> thing currently as the transmit side doesn't generate these kind of
> frames.  The fact that GRO is makes things a bit more messy as we will
> really need to restructure the GSO code in order to handle it.  As far
> as drivers testing for it I am pretty certain the i40e isn't.  I would
> wonder if we need to add yet another GSO bit to indicate that we
> support multiple layers of encapsulation.  I'm pretty sure the only
> way we could possibly handle it would be in software since what you
> are indicating is a indeterminate number of headers that all require
> updates.
>
>>> Asking drivers to assume that this combination of flags means FOU
>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>> since the assumption is that the stack will never try to do this.
>>> Since FOU is being treated as only a single level of encapsulation, I
>>> think it would be better to just advertise it that way on transmit
>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>
>> If it's not FOU it will be something else. Arbitrarily declaring
>> multi-levels of encapsulation invalid is simply the wrong direction,
>> we should be increasing generality and capabilities of the kernel not
>> holding it down with artificial limits. This is why the generic TSO
>> work that Alexander and Edward are looking at is so important-- if
>> this flies then we can offload any combination of 

Re: RESEND: Easily reproducible kernel panic due to netdev all_adj_list refcnt handling

2016-03-28 Thread Matthias Schiffer
On 03/25/2016 11:10 PM, Andrew Collins wrote:
> On 03/25/2016 02:43 PM, Matthias Schiffer wrote:
>> We've tried your patch, and it changes the symptoms a bit, but doesn't fix
>> the panic. I've attached kernel logs of the crash both before and after
>> applying the patch.
>>
>> One note: I did not reproduce this issue myself, it was first reported in
>> [1], and then forwarded to the batman-adv issue tracker [2] by me.
>>
>> Regards,
>> Matthias
>>
>>
>> [1] https://github.com/freifunk-gluon/gluon/issues/680
>> [2] https://www.open-mesh.org/issues/247
> 
> On the off chance it helps, the version of the patch I integrated locally
> takes a somewhat different approach
> than the one I sent to the mailing list (propagates adj_list refcnts). 
> I've attached it in case it's useful.
> 
> I haven't submitted this upstream yet as it's still rather ugly.  I'm of
> the opinion that the whole "every device
> knows every upperdev and lowerdev in its tree" model is rather broken, and
> the patch is just working around
> a design that needs some rework.
> 
> Thanks,
> Andrew Collins

It's ugly, but it seems to help. No crashes so far with the new version of
your patch.

Thanks,
Matthias



signature.asc
Description: OpenPGP digital signature


Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 23:11 +0200, Jozsef Kadlecsik wrote:

> In net/netfilter/nf_conntrack_proto_tcp.c we copy the options into a 
> buffer with skb_header_pointer(), so it's not a false positive there and 
> the KASAN report referred to that part.
> 

Although the out of bound could be one extra byte,
if skb_header_pointer() had to copy something (since it also might
return a pointer inside skb->head)

No arch would possibly fault here.

So reading one byte on the stack is fooling KASAN, but no ill effect
would actually happen.

If the read byte is < 2, the function would return because of

 if (opsize < 2)
return;

If the read byte is >= 2, the function would return because of
if (opsize > length)
return; /* don't parse partial options */


(Since we care here of the case where length == 1)

No big deal, it is probably better to 'fix' the code so that it pleases
dynamic checkers.
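
For reference, the control flow being argued about, condensed (a sketch of
the tcp_parse_options() loop, not the exact kernel code, but with the same
checks in the same order):

while (length > 0) {
	int opcode = *ptr++;		/* length >= 1, so this byte is in range */
	int opsize;

	switch (opcode) {
	case TCPOPT_EOL:
		return;
	case TCPOPT_NOP:		/* one byte long, no opsize field */
		length--;
		continue;
	default:
		opsize = *ptr++;	/* with length == 1 this can touch one
					 * byte past the option area, which is
					 * what KASAN sees on a stack copy */
		if (opsize < 2)		/* "silly options" */
			return;
		if (opsize > length)	/* always true when length == 1 and
					 * opsize >= 2, so the garbage value is
					 * never used to walk any further */
			return;
		ptr += opsize - 2;
		length -= opsize;
	}
}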








Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Jozsef Kadlecsik
On Mon, 28 Mar 2016, Eric Dumazet wrote:

> On Mon, 2016-03-28 at 22:20 +0200, Jan Engelhardt wrote:
> > On Monday 2016-03-28 21:29, David Miller wrote:
> > >>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff 
> > >>> > > *skb,
> > >>> > >   length--;
> > >>> > >   continue;
> > >>> > >   default:
> > >>> > > +if (length < 2)
> > >>> > > +return;
> > >>> > >   opsize = *ptr++;
> > >>> > >   if (opsize < 2) /* "silly options" */
> > >>> > >   return;
> > >
> > >I'm trying to figure out how this can even matter.
> > >If we are in the loop, length is at least one.
> > >That means it is legal to read the opsize byte.
> > 
> > Is that because the skbuff is always padded to a multiple of (at
> > least) two? Maybe such padding is explicitly foregone when ASAN is in
> > place. After all, glibc, in userspace, is likely to do padding as
> > well for malloc, and yet, ASAN catches these cases.

There might be a TCP option combination, which is "properly" padded but 
broken, like (wscale, wscale-value, mss) where the mss-value is missing.

> We have at least 384 bytes of padding in skb->head (this is struct
> skb_shared_info).
> 
> Whatever garbage we might read, current code is fine.
> 
> We have to deal with a false positive here.

In net/netfilter/nf_conntrack_proto_tcp.c we copy the options into a 
buffer with skb_header_pointer(), so it's not a false positive there and 
the KASAN report referred to that part.

I thought it's valid for tcp_parse_options() too, but then I'm wrong so 
at least the part from the patch for tcp_input.c can be dropped.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
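
For context, the conntrack path mentioned above copies the option bytes into
an on-stack buffer before parsing, roughly as below (condensed from
tcp_options() in nf_conntrack_proto_tcp.c, not verbatim).  That is why a
one-byte overread there lands in uninitialized stack memory rather than in
skb->head padding:

static void tcp_options_sketch(const struct sk_buff *skb, unsigned int dataoff,
			       const struct tcphdr *tcph)
{
	/* At most 60 - 20 = 40 bytes of TCP options. */
	unsigned char buff[(15 * 4) - sizeof(struct tcphdr)];
	const unsigned char *ptr;
	int length = (tcph->doff * 4) - sizeof(struct tcphdr);

	ptr = skb_header_pointer(skb, dataoff + sizeof(struct tcphdr),
				 length, buff);
	if (!ptr)
		return;

	/* Only 'length' bytes behind ptr are guaranteed valid: when the
	 * data had to be copied into buff[], reading ptr[length] hits
	 * uninitialized stack bytes, which is what KASAN reports.
	 */
}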


Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Eric Dumazet
On Fri, 2016-03-25 at 17:08 -0700, Tom Herbert wrote:
> On Fri, Mar 25, 2016 at 3:29 PM, Eric Dumazet  wrote:

> > +/* Must be called under rcu_read_lock().
> 
> 
> It might be just as easy to do the rcu_read_lock() within the
> function. That way we don't need to require callers to do it now.
> 
> > + * Does increment socket refcount.
> > + */
> > +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
> > +IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)
> >  struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
> >  __be32 daddr, __be16 dport, int dif)
> >  {
> > -   return __udp4_lib_lookup(net, saddr, sport, daddr, dport, dif,
> > -_table, NULL);
> > +   struct sock *sk;
> > +
> > +   sk = __udp4_lib_lookup(net, saddr, sport, daddr, dport,
> > +  dif, &udp_table, NULL);
> > +   if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
> > +   sk = NULL;
> > +   return sk;
> >  }
> >  EXPORT_SYMBOL_GPL(udp4_lib_lookup);
> > +#endif


Well, these callers already run with rcu_read_lock(), I simply added a
comment to remind this assumption.

As I said, we might later avoid the refcounting if callers are modified
to not call sock_put(). This is why I prefer to maintain the requirement
of rcu_read_lock() being held by callers anyway.
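
For illustration, a caller written against those rules would look roughly
like this (hypothetical helper, sketch only; the real users are the
xt_socket and TPROXY lookups):

static struct sock *example_udp_lookup(struct net *net, __be32 saddr,
				       __be16 sport, __be32 daddr,
				       __be16 dport, int dif)
{
	struct sock *sk;

	rcu_read_lock();
	sk = udp4_lib_lookup(net, saddr, sport, daddr, dport, dif);
	rcu_read_unlock();

	/* A non-NULL result means atomic_inc_not_zero() took a reference,
	 * so the caller still balances it with sock_put() when done.
	 */
	return sk;
}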






Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 13:46 -0700, Eric Dumazet wrote:

> We have at least 384 bytes of padding in skb->head (this is struct
> skb_shared_info).
> 
> Whatever garbage we might read, current code is fine.
> 
> We have to deal with a false positive here.

Very similar to the one fixed in 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=10ec9472f05b45c94db3c854d22581a20b97db41





Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 22:20 +0200, Jan Engelhardt wrote:
> On Monday 2016-03-28 21:29, David Miller wrote:
> >>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff 
> >>> > > *skb,
> >>> > >   length--;
> >>> > >   continue;
> >>> > >   default:
> >>> > > +if (length < 2)
> >>> > > +return;
> >>> > >   opsize = *ptr++;
> >>> > >   if (opsize < 2) /* "silly options" */
> >>> > >   return;
> >
> >I'm trying to figure out how this can even matter.
> >If we are in the loop, length is at least one.
> >That means it is legal to read the opsize byte.
> 
> Is that because the skbuff is always padded to a multiple of (at
> least) two? Maybe such padding is explicitly foregone when ASAN is in
> place. After all, glibc, in userspace, is likely to do padding as
> well for malloc, and yet, ASAN catches these cases.

We have at least 384 bytes of padding in skb->head (this is struct
skb_shared_info).

Whatever garbage we might read, current code is fine.

We have to deal with a false positive here.






[PATCH] ethernet: mvneta: Support netpoll

2016-03-28 Thread Ezequiel Garcia
This commit adds the support for netpoll, which is used
to implement netconsole.

Signed-off-by: Ezequiel Garcia 
---
Tested on Armada 370 Mirabox and Armada XP Openblocks AX3-4
with netconsole.
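
(For anyone wanting to reproduce the test setup: netconsole can be pointed at
a log host with something along the lines of the following, where the ports,
addresses, interface name and MAC address are placeholders:

modprobe netconsole netconsole=6665@192.168.0.10/eth0,6666@192.168.0.1/00:11:22:33:44:55

and the receiving side listens with "nc -u -l 6666" or similar.)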

 drivers/net/ethernet/marvell/mvneta.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 577f7ca7deba..dd114303c98f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3813,6 +3813,24 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
return 0;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void mvneta_percpu_poll_controller(void *arg)
+{
+   struct mvneta_port *pp = arg;
+   struct mvneta_pcpu_port *pcpu_port =
+   this_cpu_ptr(pp->ports);
+   mvneta_isr(pp->dev->irq, pcpu_port);
+}
+
+/* Polled functionality used by netconsole and others in non interrupt mode */
+static void mvneta_poll_controller(struct net_device *dev)
+{
+   struct mvneta_port *pp = netdev_priv(dev);
+
+   on_each_cpu(mvneta_percpu_poll_controller, pp, false);
+}
+#endif
+
 static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open= mvneta_open,
.ndo_stop= mvneta_stop,
@@ -3823,6 +3841,9 @@ static const struct net_device_ops mvneta_netdev_ops = {
.ndo_fix_features= mvneta_fix_features,
.ndo_get_stats64 = mvneta_get_stats64,
.ndo_do_ioctl= mvneta_ioctl,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   .ndo_poll_controller = mvneta_poll_controller,
+#endif
 };
 
 const struct ethtool_ops mvneta_eth_tool_ops = {
-- 
2.7.0



[PATCH net,stable] qmi_wwan: add "D-Link DWM-221 B1" device id

2016-03-28 Thread Bjørn Mork
Thomas reports:
"Windows:

00 diagnostics
01 modem
02 at-port
03 nmea
04 nic

Linux:

T:  Bus=02 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=2001 ProdID=7e19 Rev=02.32
S:  Manufacturer=Mobile Connect
S:  Product=Mobile Connect
S:  SerialNumber=0123456789ABCDEF
C:  #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I:  If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage"

Reported-by: Thomas Schäfer 
Signed-off-by: Bjørn Mork 
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 7d717c66bcb0..9d1fce8a6e84 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -844,6 +844,7 @@ static const struct usb_device_id products[] = {
{QMI_FIXED_INTF(0x19d2, 0x1426, 2)},/* ZTE MF91 */
{QMI_FIXED_INTF(0x19d2, 0x1428, 2)},/* Telewell TW-LTE 4G v2 */
{QMI_FIXED_INTF(0x19d2, 0x2002, 4)},/* ZTE (Vodafone) K3765-Z */
+   {QMI_FIXED_INTF(0x2001, 0x7e19, 4)},/* D-Link DWM-221 B1 */
{QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)},/* Sierra Wireless MC7700 */
{QMI_FIXED_INTF(0x114f, 0x68a2, 8)},/* Sierra Wireless MC7750 */
{QMI_FIXED_INTF(0x1199, 0x68a2, 8)},/* Sierra Wireless MC7710 in 
QMI mode */
-- 
2.1.4



Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Alexander Duyck
On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>  wrote:
>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert  wrote:
>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>>  wrote:
 On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  wrote:
> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  
>> wrote:
>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
 When drivers express support for TSO of encapsulated packets, they
 only mean that they can do it for one layer of encapsulation.
 Supporting additional levels would mean updating, at a minimum,
 more IP length fields and they are unaware of this.

>>> This patch completely breaks GRO for FOU and GUE.
>>>
 No encapsulation device expresses support for handling offloaded
 encapsulated packets, so we won't generate these types of frames
 in the transmit path. However, GRO doesn't have a check for
 multiple levels of encapsulation and will attempt to build them.

 UDP tunnel GRO actually does prevent this situation but it only
 handles multiple UDP tunnels stacked on top of each other. This
 generalizes that solution to prevent any kind of tunnel stacking
 that would cause problems.

>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>> ambiguity in the drivers as to what this means. For instance, if
>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>> driver can use ndo_features_check to validate.
>>>
>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>> would suggest to simply revert this patch. The one potential issue we
>>> could have is the potential for GRO to construct a packet which is a
>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>> In this case the GSO flags don't provide enough information to
>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>> but *not* check for it.
>>
>> Generally speaking, multiple levels of encapsulation offload are not
>> valid. I think this is pretty clear from the fact that none of the
>> encapsulation drivers expose support for encapsulation offloads on
>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>
>
> Kernel software offload does support this combination just fine.
> Seriously, I've tested that more than a thousand times. This is a few
> HW implementations you're referring to. The limitations of these
> drivers should not dictate how we build the software-- it needs to
> work the other way around.

 Jesse does have a point.  Drivers aren't checking for this kind of
 thing currently as the transmit side doesn't generate these kind of
 frames.  The fact that GRO is makes things a bit more messy as we will
 really need to restructure the GSO code in order to handle it.  As far
 as drivers testing for it I am pretty certain the i40e isn't.  I would
 wonder if we need to add yet another GSO bit to indicate that we
 support multiple layers of encapsulation.  I'm pretty sure the only
 way we could possibly handle it would be in software since what you
 are indicating is a indeterminate number of headers that all require
 updates.

>> Asking drivers to assume that this combination of flags means FOU
>> doesn't seem right to me. To the best of my knowledge, no driver uses
>> ndo_feature_check to do validation of multiple tunnel offload flags
>> since the assumption is that the stack will never try to do this.
>> Since FOU is being treated as only a single level of encapsulation, I
>> think it would be better to just advertise it that way on transmit
>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>
> If it's not FOU it will be something else. Arbitrarily declaring
> multi-levels of encapsulation invalid is simply the wrong direction,
> we should be increasing generality and capabilities of the kernel not
> holding it down with artificial limits. This is why the generic TSO
> work that Alexander and Edward are looking at is so important-- if
> this flies then we can offload any combination of encapsulations with
> out protocol specific information.

 The segmentation code isn't designed to handle more than 2 layers of
 headers.  Currently we 

Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Jan Engelhardt

On Monday 2016-03-28 21:29, David Miller wrote:
>>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
>>> > >   length--;
>>> > >   continue;
>>> > >   default:
>>> > > +if (length < 2)
>>> > > +return;
>>> > >   opsize = *ptr++;
>>> > >   if (opsize < 2) /* "silly options" */
>>> > >   return;
>
>I'm trying to figure out how this can even matter.
>If we are in the loop, length is at least one.
>That means it is legal to read the opsize byte.

Is that because the skbuff is always padded to a multiple of (at
least) two? Maybe such padding is explicitly foregone when ASAN is in
place. After all, glibc, in userspace, is likely to do padding as
well for malloc, and yet, ASAN catches these cases.


Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Rick Jones

On 03/28/2016 01:01 PM, Eric Dumazet wrote:

Note : file structures got RCU freeing back in 2.6.14, and I do not
think named users ever complained about added cost ;)


Couldn't see the tree for the forest I guess :)

rick



Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 15:29 -0400, David Miller wrote:
> From: Jozsef Kadlecsik 
> Date: Mon, 28 Mar 2016 18:48:51 +0200 (CEST)
> 
> >> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
> >> > >   length--;
> >> > >   continue;
> >> > >   default:
> >> > > +if (length < 2)
> >> > > +return;
> >> > >   opsize = *ptr++;
> >> > >   if (opsize < 2) /* "silly options" */
> >> > >   return;
> 
> I'm trying to figure out how this can even matter.
> 
> If we are in the loop, length is at least one.
> 
> That means it is legal to read the opsize byte.
> 
> By the next check, opsize is at least 2.
> 
> And then the very next line in this code makes sure length >= opsize:
> 
>   if (opsize > length)
>   return; /* don't parse partial options */
> 
> Therefore no out-of-range access is possible as far as I can see.

Maybe use kasan_disable_current() and kasan_enable_current() to silence
kasan ?

Oh wait, these are not BH safe.






Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
 wrote:
> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert  wrote:
>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>  wrote:
>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  wrote:
 On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  
> wrote:
>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
>>> When drivers express support for TSO of encapsulated packets, they
>>> only mean that they can do it for one layer of encapsulation.
>>> Supporting additional levels would mean updating, at a minimum,
>>> more IP length fields and they are unaware of this.
>>>
>> This patch completely breaks GRO for FOU and GUE.
>>
>>> No encapsulation device expresses support for handling offloaded
>>> encapsulated packets, so we won't generate these types of frames
>>> in the transmit path. However, GRO doesn't have a check for
>>> multiple levels of encapsulation and will attempt to build them.
>>>
>>> UDP tunnel GRO actually does prevent this situation but it only
>>> handles multiple UDP tunnels stacked on top of each other. This
>>> generalizes that solution to prevent any kind of tunnel stacking
>>> that would cause problems.
>>>
>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>> ambiguity in the drivers as to what this means. For instance, if
>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>> driver can use ndo_features_check to validate.
>>
>> So multiple levels of encapsulation with GRO is perfectly valid and I
>> would suggest to simply revert this patch. The one potential issue we
>> could have is the potential for GRO to construct a packet which is a
>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>> In this case the GSO flags don't provide enough information to
>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>> but *not* check for it.
>
> Generally speaking, multiple levels of encapsulation offload are not
> valid. I think this is pretty clear from the fact that none of the
> encapsulation drivers expose support for encapsulation offloads on
> transmit and the drivers supporting NETIF_F_GSO_GRE and
> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>

 Kernel software offload does support this combination just fine.
 Seriously, I've tested that more than a thousand times. This is a few
 HW implementations you're referring to. The limitations of these
 drivers should not dictate how we build the software-- it needs to
 work the other way around.
>>>
>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>> thing currently as the transmit side doesn't generate these kind of
>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>> really need to restructure the GSO code in order to handle it.  As far
>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>> wonder if we need to add yet another GSO bit to indicate that we
>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>> way we could possibly handle it would be in software since what you
>>> are indicating is a indeterminate number of headers that all require
>>> updates.
>>>
> Asking drivers to assume that this combination of flags means FOU
> doesn't seem right to me. To the best of my knowledge, no driver uses
> ndo_feature_check to do validation of multiple tunnel offload flags
> since the assumption is that the stack will never try to do this.
> Since FOU is being treated as only a single level of encapsulation, I
> think it would be better to just advertise it that way on transmit
> (i.e. only set SKB_GSO_UDP_TUNNEL).
>
 If it's not FOU it will be something else. Arbitrarily declaring
 multi-levels of encapsulation invalid is simply the wrong direction,
 we should be increasing generality and capabilities of the kernel not
 holding it down with artificial limits. This is why the generic TSO
 work that Alexander and Edward are looking at is so important-- if
 this flies then we can offload any combination of encapsulations with
 out protocol specific information.
>>>
>>> The segmentation code isn't designed to handle more than 2 layers of
>>> headers.  Currently we have the pointers for the inner headers and the
>>> outer headers.  If you are talking about adding yet another level we
>>> would need additional pointers in the 

Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 12:11 -0700, Rick Jones wrote:

> I was under the impression that individual DNS queries were supposed to 
> have not only random DNS query IDs but also originate from random UDP 
> source ports.  https://tools.ietf.org/html/rfc5452 4.5 at least touches 
> on the topic but I don't see it making it hard and fast.  By section 10 
> though it is more explicit:
> 
> This document recommends the use of UDP source port number
> randomization to extend the effective DNS transaction ID beyond the
> available 16 bits.
> 
> That being the case, if indeed there were to be 3-odd concurrent 
> requests outstanding "upstream" from that location there'd have to be 
> 3 ephemeral ports in play.

Sure, so these heavy duty programs have to consider the randomness
implications already, when picking a port in their pool.

Using the kernel and force socket()/bind()/close() in the hope to get
random values is safe, but never had been optimal for performance.

Note : file structures got RCU freeing back in 2.6.14, and I do not
think named users ever complained about added cost ;)





Re: [PATCH 0/9] Netfilter fixes for net

2016-03-28 Thread David Miller
From: Pablo Neira Ayuso 
Date: Mon, 28 Mar 2016 19:57:53 +0200

> The following patchset contains Netfilter fixes for you net tree,
> they are:
 ...
> This batch comes with four patches to validate x_tables blobs coming
> from userspace. CONFIG_USERNS exposes the x_tables interface to
> unpriviledged users and to be honest this interface never received the
> attention for this move away from the CAP_NET_ADMIN domain. Florian is
> working on another round with more patches with more sanity checks, so
> expect a bit more Netfilter fixes in this development cycle than usual.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Looks good, pulled, thanks Pablo!


Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread David Miller
From: Jesse Gross 
Date: Sun, 27 Mar 2016 21:38:34 -0700

> Generally speaking, multiple levels of encapsulation offload are not
> valid.

That's very much not true Jesse.

Please fix the regression you introduced or I will have no choice but to
revert your entire set of changes.


Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Alexander Duyck
On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert  wrote:
> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>  wrote:
>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  wrote:
>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
 On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  wrote:
> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
>> When drivers express support for TSO of encapsulated packets, they
>> only mean that they can do it for one layer of encapsulation.
>> Supporting additional levels would mean updating, at a minimum,
>> more IP length fields and they are unaware of this.
>>
> This patch completely breaks GRO for FOU and GUE.
>
>> No encapsulation device expresses support for handling offloaded
>> encapsulated packets, so we won't generate these types of frames
>> in the transmit path. However, GRO doesn't have a check for
>> multiple levels of encapsulation and will attempt to build them.
>>
>> UDP tunnel GRO actually does prevent this situation but it only
>> handles multiple UDP tunnels stacked on top of each other. This
>> generalizes that solution to prevent any kind of tunnel stacking
>> that would cause problems.
>>
> GUE and FOU regularly create packets that will be both GSO UDP tunnels
> and IPIP, SIT, GRE, etc. This is by design. There should be no
> ambiguity in the drivers as to what this means. For instance, if
> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
> driver can use ndo_features_check to validate.
>
> So multiple levels of encapsulation with GRO is perfectly valid and I
> would suggest to simply revert this patch. The one potential issue we
> could have is the potential for GRO to construct a packet which is a
> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
> In this case the GSO flags don't provide enough information to
> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
> case). To make this clear we can set udp_mark in GRE, ipip, and sit
> but *not* check for it.

 Generally speaking, multiple levels of encapsulation offload are not
 valid. I think this is pretty clear from the fact that none of the
 encapsulation drivers expose support for encapsulation offloads on
 transmit and the drivers supporting NETIF_F_GSO_GRE and
 NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.

>>>
>>> Kernel software offload does support this combination just fine.
>>> Seriously, I've tested that more than a thousand times. This is a few
>>> HW implementations you're referring to. The limitations of these
>>> drivers should not dictate how we build the software-- it needs to
>>> work the other way around.
>>
>> Jesse does have a point.  Drivers aren't checking for this kind of
>> thing currently as the transmit side doesn't generate these kind of
>> frames.  The fact that GRO is makes things a bit more messy as we will
>> really need to restructure the GSO code in order to handle it.  As far
>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>> wonder if we need to add yet another GSO bit to indicate that we
>> support multiple layers of encapsulation.  I'm pretty sure the only
>> way we could possibly handle it would be in software since what you
>> are indicating is a indeterminate number of headers that all require
>> updates.
>>
 Asking drivers to assume that this combination of flags means FOU
 doesn't seem right to me. To the best of my knowledge, no driver uses
 ndo_feature_check to do validation of multiple tunnel offload flags
 since the assumption is that the stack will never try to do this.
 Since FOU is being treated as only a single level of encapsulation, I
 think it would be better to just advertise it that way on transmit
 (i.e. only set SKB_GSO_UDP_TUNNEL).

>>> If it's not FOU it will be something else. Arbitrarily declaring
>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>> we should be increasing generality and capabilities of the kernel not
>>> holding it down with artificial limits. This is why the generic TSO
>>> work that Alexander and Edward are looking at is so important-- if
>>> this flies then we can offload any combination of encapsulations with
>>> out protocol specific information.
>>
>> The segmentation code isn't designed to handle more than 2 layers of
>> headers.  Currently we have the pointers for the inner headers and the
>> outer headers.  If you are talking about adding yet another level we
>> would need additional pointers in the skbuff to handle them and we
>> currently don't have them at present.
>>
 The change that you are suggesting would result in packets generated
 by GRO that cannot be 

Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread David Miller
From: Jozsef Kadlecsik 
Date: Mon, 28 Mar 2016 18:48:51 +0200 (CEST)

>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff *skb,
>> > >   length--;
>> > >   continue;
>> > >   default:
>> > > +if (length < 2)
>> > > +return;
>> > >   opsize = *ptr++;
>> > >   if (opsize < 2) /* "silly options" */
>> > >   return;

I'm trying to figure out how this can even matter.

If we are in the loop, length is at least one.

That means it is legal to read the opsize byte.

By the next check, opsize is at least 2.

And then the very next line in this code makes sure length >= opsize:

if (opsize > length)
return; /* don't parse partial options */

Therefore no out-of-range access is possible as far as I can see.


Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Rick Jones

On 03/28/2016 11:55 AM, Eric Dumazet wrote:

On Mon, 2016-03-28 at 11:44 -0700, Rick Jones wrote:

On 03/28/2016 10:00 AM, Eric Dumazet wrote:

If you mean that a busy DNS resolver spends _most_ of its time doing :

fd = socket()
bind(fd  port=0)
< send and receive one frame >
close(fd)


Yes.  Although it has been a long time, I thought that say the likes of
a caching named in the middle between hosts and the rest of the DNS
would behave that way as it was looking-up names on behalf of those who
asked it.


I really doubt a modern program would dynamically allocate one UDP port
for every in-flight request, as it would limit them to number of
ephemeral ports concurrent requests (~3 assuming the process can get
them all on the host)


I was under the impression that individual DNS queries were supposed to 
have not only random DNS query IDs but also originate from random UDP 
source ports.  https://tools.ietf.org/html/rfc5452 4.5 at least touches 
on the topic but I don't see it making it hard and fast.  By section 10 
though it is more explicit:


   This document recommends the use of UDP source port number
   randomization to extend the effective DNS transaction ID beyond the
   available 16 bits.

That being the case, if indeed there were to be 3-odd concurrent 
requests outstanding "upstream" from that location there'd have to be 
3 ephemeral ports in play.


rick



Managing a pool would be more efficient (The 1.3 usec penalty becomes
more like 4 usec in multi threaded programs)

Sure, you always can find badly written programs, but they already hit
scalability issues anyway.

UDP refcounting cost about 2 cache line misses per packet in stress
situations, this really has to go, so that well written programs can get
full speed.






Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 11:44 -0700, Rick Jones wrote:
> On 03/28/2016 10:00 AM, Eric Dumazet wrote:
> > On Mon, 2016-03-28 at 09:15 -0700, Rick Jones wrote:
> >> On 03/25/2016 03:29 PM, Eric Dumazet wrote:
> >>> UDP sockets are not short lived in the high usage case, so the added
> >>> cost of call_rcu() should not be a concern.
> >>
> >> Even a busy DNS resolver?
> >
> > If you mean that a busy DNS resolver spends _most_ of its time doing :
> >
> > fd = socket()
> > bind(fd  port=0)
> >< send and receive one frame >
> > close(fd)
> 
> Yes.  Although it has been a long time, I thought that say the likes of 
> a caching named in the middle between hosts and the rest of the DNS 
> would behave that way as it was looking-up names on behalf of those who 
> asked it.

I really doubt a modern program would dynamically allocate one UDP port
for every in-flight request, as it would limit them to number of
ephemeral ports concurrent requests (~3 assuming the process can get
them all on the host)

Managing a pool would be more efficient (The 1.3 usec penalty becomes
more like 4 usec in multi threaded programs)

Sure, you always can find badly written programs, but they already hit
scalability issues anyway.

UDP refcounting cost about 2 cache line misses per packet in stress
situations, this really has to go, so that well written programs can get
full speed.
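
For illustration, the "pool" approach amounts to binding the ephemeral ports
once, up front, instead of per query.  A rough userspace sketch (illustrative
only; pool sizing and the port rotation policy are up to the resolver):

#include <stdlib.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define POOL_SIZE 256

static int pool[POOL_SIZE];

static int pool_init(void)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		pool[i] = socket(AF_INET, SOCK_DGRAM, 0);
		if (pool[i] < 0)
			return -1;
		/* port 0: the kernel picks a random ephemeral port, once */
		if (bind(pool[i], (struct sockaddr *)&addr, sizeof(addr)) < 0)
			return -1;
	}
	return 0;
}

/* Pick a pooled socket per query instead of socket()/bind()/close(). */
static int pool_pick(void)
{
	return pool[rand() % POOL_SIZE];
}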





Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
 wrote:
> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  wrote:
>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  wrote:
 On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
> When drivers express support for TSO of encapsulated packets, they
> only mean that they can do it for one layer of encapsulation.
> Supporting additional levels would mean updating, at a minimum,
> more IP length fields and they are unaware of this.
>
 This patch completely breaks GRO for FOU and GUE.

> No encapsulation device expresses support for handling offloaded
> encapsulated packets, so we won't generate these types of frames
> in the transmit path. However, GRO doesn't have a check for
> multiple levels of encapsulation and will attempt to build them.
>
> UDP tunnel GRO actually does prevent this situation but it only
> handles multiple UDP tunnels stacked on top of each other. This
> generalizes that solution to prevent any kind of tunnel stacking
> that would cause problems.
>
 GUE and FOU regularly create packets that will be both GSO UDP tunnels
 and IPIP, SIT, GRE, etc. This is by design. There should be no
 ambiguity in the drivers as to what this means. For instance, if
 SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
 driver can use ndo_features_check to validate.

 So multiple levels of encapsulation with GRO is perfectly valid and I
 would suggest to simply revert this patch. The one potential issue we
 could have is the potential for GRO to construct a packet which is a
 UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
 In this case the GSO flags don't provide enough information to
 distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
 case). To make this clear we can set udp_mark in GRE, ipip, and sit
 but *not* check for it.
>>>
>>> Generally speaking, multiple levels of encapsulation offload are not
>>> valid. I think this is pretty clear from the fact that none of the
>>> encapsulation drivers expose support for encapsulation offloads on
>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>
>>
>> Kernel software offload does support this combination just fine.
>> Seriously, I've tested that more than a thousand times. This is a few
>> HW implementations you're referring to. The limitations of these
>> drivers should not dictate how we build the software-- it needs to
>> work the other way around.
>
> Jesse does have a point.  Drivers aren't checking for this kind of
> thing currently as the transmit side doesn't generate these kind of
> frames.  The fact that GRO is makes things a bit more messy as we will
> really need to restructure the GSO code in order to handle it.  As far
> as drivers testing for it I am pretty certain the i40e isn't.  I would
> wonder if we need to add yet another GSO bit to indicate that we
> support multiple layers of encapsulation.  I'm pretty sure the only
> way we could possibly handle it would be in software since what you
> are indicating is a indeterminate number of headers that all require
> updates.
>
>>> Asking drivers to assume that this combination of flags means FOU
>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>> since the assumption is that the stack will never try to do this.
>>> Since FOU is being treated as only a single level of encapsulation, I
>>> think it would be better to just advertise it that way on transmit
>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>
>> If it's not FOU it will be something else. Arbitrarily declaring
>> multi-levels of encapsulation invalid is simply the wrong direction,
>> we should be increasing generality and capabilities of the kernel not
>> holding it down with artificial limits. This is why the generic TSO
>> work that Alexander and Edward are looking at is so important-- if
>> this flies then we can offload any combination of encapsulations with
>> out protocol specific information.
>
> The segmentation code isn't designed to handle more than 2 layers of
> headers.  Currently we have the pointers for the inner headers and the
> outer headers.  If you are talking about adding yet another level we
> would need additional pointers in the skbuff to handle them and we
> currently don't have them at present.
>
>>> The change that you are suggesting would result in packets generated
>>> by GRO that cannot be handled properly on transmit in some cases and
>>> would likely end up being dropped or malformed. GRO is just an
>>> optimization and correctness must come first so we cannot use it if it
>>> might corrupt traffic.
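
A minimal sketch of the GRO-level guard being debated above; the field name
follows the patch under discussion and the helper itself is illustrative:

#include <linux/netdevice.h>

/* Mark the first tunnel layer seen during GRO and refuse to aggregate a
 * second one on top of it.
 */
static int tunnel_gro_guard(struct sk_buff *skb)
{
	if (NAPI_GRO_CB(skb)->encap_mark)
		return 1;	/* already inside one tunnel: flush, don't stack */

	NAPI_GRO_CB(skb)->encap_mark = 1;
	return 0;
}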

Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Rick Jones

On 03/28/2016 10:00 AM, Eric Dumazet wrote:

On Mon, 2016-03-28 at 09:15 -0700, Rick Jones wrote:

On 03/25/2016 03:29 PM, Eric Dumazet wrote:

UDP sockets are not short lived in the high usage case, so the added
cost of call_rcu() should not be a concern.


Even a busy DNS resolver?


If you mean that a busy DNS resolver spends _most_ of its time doing :

fd = socket()
bind(fd  port=0)
   < send and receive one frame >
close(fd)


Yes.  Although it has been a long time, I thought that, say, the likes of
a caching named sitting in the middle between hosts and the rest of the DNS
would behave that way, as it was looking up names on behalf of those who
asked it.


rick



(If this is the case, may I suggest doing something different, and use
some kind of caches ? It will be way faster.)

Then the results for 10,000,000 loops of the socket()/bind()/close() sequence are

Before patch :

real    0m13.665s
user    0m0.548s
sys     0m12.372s

After patch :

real    0m20.599s
user    0m0.465s
sys     0m17.965s

So the worst overhead is 700 ns

This is roughly the cost for bringing 960 bytes from memory, or 15 cache
lines (on x86_64)

# grep UDP /proc/slabinfo
UDPLITEv6      0      0   1088    7    2 : tunables   24   12    8 : slabdata      0      0      0
UDPv6         24     49   1088    7    2 : tunables   24   12    8 : slabdata      7      7      0
UDP-Lite       0      0    960    4    1 : tunables   54   27    8 : slabdata      0      0      0
UDP           30     36    960    4    1 : tunables   54   27    8 : slabdata      9      9      2

In reality, the chances that UDP sockets are re-opened right after being
freed and that their 15 cache lines are still very hot in cpu caches are
quite small, so I would not worry at all about this rather stupid benchmark.

int main(int argc, char *argv[]) {
	struct sockaddr_in addr;
	int i, fd, loops = 10000000;

	for (i = 0; i < loops; i++) {
		fd = socket(AF_INET, SOCK_DGRAM, 0);
		if (fd == -1) {
			perror("socket");
			break;
		}
		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) == -1) {
			perror("bind");
			break;
		}
		close(fd);
	}
	return 0;
}





Re: [PATCH net] team: team should sync the port's uc/mc addrs when add a port

2016-03-28 Thread Jiri Pirko
Mon, Mar 28, 2016 at 06:42:31PM CEST, lucien@gmail.com wrote:
>There is an issue when we use macvtap over team:
>When we replug nic links from team0, the real NICs' mc list will not
>include the maddr for macvtap any more. Then we can't receive pkts to
>the macvtap device, as they are filtered by the mc list of the nic.
>
>In Bonding Driver, it syncs the uc/mc addrs in bond_enslave().
>
>We will fix this issue on team by adding the port's uc/mc addrs sync in
>team_port_add.
>
>Signed-off-by: Xin Long 

Acked-by: Jiri Pirko 


[PATCH 3/9] openvswitch: call only into reachable nf-nat code

2016-03-28 Thread Pablo Neira Ayuso
From: Arnd Bergmann 

The openvswitch code has gained support for calling into the
nf-nat-ipv4/ipv6 modules, however those can be loadable modules
in a configuration in which openvswitch is built-in, leading
to link errors:

net/built-in.o: In function `__ovs_ct_lookup':
:(.text+0x2cc2c8): undefined reference to `nf_nat_icmp_reply_translation'
:(.text+0x2cc66c): undefined reference to `nf_nat_icmpv6_reply_translation'

The dependency on (!NF_NAT || NF_NAT) prevents similar issues,
but NF_NAT is set to 'y' if any of the symbols selecting
it are built-in, while the link error happens when any of them
are modular.

A second issue is that even if CONFIG_NF_NAT_IPV6 is built-in,
CONFIG_NF_NAT_IPV4 might be completely disabled. This is unlikely
to be useful in practice, but the driver currently only handles
IPv6 being optional.

This patch improves the Kconfig dependency so that openvswitch
cannot be built-in if either of the two other symbols is set
to 'm', and it replaces the incorrect #ifdef in ovs_ct_nat_execute()
with two "if (IS_ENABLED())" checks that should catch all corner
cases and also make the code more readable.

The same #ifdef exists in ovs_ct_nat_to_attr(), where it does not
cause a link error, but for consistency I'm changing it the same
way.
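
To spell out why both halves are needed, a hedged illustration (the helper
below is made up; only the pattern matters): IS_ENABLED() expands to 1 for
both =y and =m, so the call survives in a built-in openvswitch when
nf-nat-ipv4 is a module and only the Kconfig dependency rules that
combination out, while for =n the condition folds to 0 and the compiler
drops the call, which is what makes removing the #ifdef safe.

#include <linux/in.h>
#include <linux/ip.h>
#include <linux/netfilter.h>
#include <net/netfilter/nf_conntrack.h>
#include <net/netfilter/nf_nat_l3proto.h>

static int nat_icmp_fixup(struct sk_buff *skb, struct nf_conn *ct,
			  enum ip_conntrack_info ctinfo, unsigned int hooknum)
{
	/* compile-time constant: 1 for NF_NAT_IPV4=y/m, 0 for =n */
	if (IS_ENABLED(CONFIG_NF_NAT_IPV4) &&
	    ip_hdr(skb)->protocol == IPPROTO_ICMP &&
	    !nf_nat_icmp_reply_translation(skb, ct, ctinfo, hooknum))
		return NF_DROP;

	return NF_ACCEPT;
}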

Signed-off-by: Arnd Bergmann 
Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Acked-by: Joe Stringer 
Signed-off-by: Pablo Neira Ayuso 
---
 net/openvswitch/Kconfig |  4 +++-
 net/openvswitch/conntrack.c | 16 
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 234a733..ce94729 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -7,7 +7,9 @@ config OPENVSWITCH
depends on INET
depends on !NF_CONNTRACK || \
   (NF_CONNTRACK && ((!NF_DEFRAG_IPV6 || NF_DEFRAG_IPV6) && \
-(!NF_NAT || NF_NAT)))
+(!NF_NAT || NF_NAT) && \
+(!NF_NAT_IPV4 || NF_NAT_IPV4) && \
+(!NF_NAT_IPV6 || NF_NAT_IPV6)))
select LIBCRC32C
select MPLS
select NET_MPLS_GSO
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 47f7c62..3797879 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -535,14 +535,15 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
switch (ctinfo) {
case IP_CT_RELATED:
case IP_CT_RELATED_REPLY:
-   if (skb->protocol == htons(ETH_P_IP) &&
+   if (IS_ENABLED(CONFIG_NF_NAT_IPV4) &&
+   skb->protocol == htons(ETH_P_IP) &&
ip_hdr(skb)->protocol == IPPROTO_ICMP) {
if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
   hooknum))
err = NF_DROP;
goto push;
-#if IS_ENABLED(CONFIG_NF_NAT_IPV6)
-   } else if (skb->protocol == htons(ETH_P_IPV6)) {
+   } else if (IS_ENABLED(CONFIG_NF_NAT_IPV6) &&
+  skb->protocol == htons(ETH_P_IPV6)) {
__be16 frag_off;
u8 nexthdr = ipv6_hdr(skb)->nexthdr;
int hdrlen = ipv6_skip_exthdr(skb,
@@ -557,7 +558,6 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct 
nf_conn *ct,
err = NF_DROP;
goto push;
}
-#endif
}
/* Non-ICMP, fall thru to initialize if needed. */
case IP_CT_NEW:
@@ -1239,7 +1239,8 @@ static bool ovs_ct_nat_to_attr(const struct 
ovs_conntrack_info *info,
}
 
if (info->range.flags & NF_NAT_RANGE_MAP_IPS) {
-   if (info->family == NFPROTO_IPV4) {
+   if (IS_ENABLED(CONFIG_NF_NAT_IPV4) &&
+   info->family == NFPROTO_IPV4) {
if (nla_put_in_addr(skb, OVS_NAT_ATTR_IP_MIN,
info->range.min_addr.ip) ||
(info->range.max_addr.ip
@@ -1247,8 +1248,8 @@ static bool ovs_ct_nat_to_attr(const struct 
ovs_conntrack_info *info,
 (nla_put_in_addr(skb, OVS_NAT_ATTR_IP_MAX,
  info->range.max_addr.ip
return false;
-#if IS_ENABLED(CONFIG_NF_NAT_IPV6)
-   } else if (info->family == NFPROTO_IPV6) {
+   } else if (IS_ENABLED(CONFIG_NF_NAT_IPV6) &&
+  info->family == NFPROTO_IPV6) {
if (nla_put_in6_addr(skb, OVS_NAT_ATTR_IP_MIN,
 &info->range.min_addr.in6) ||

[PATCH 4/9] netfilter: x_tables: validate e->target_offset early

2016-03-28 Thread Pablo Neira Ayuso
From: Florian Westphal 

We should check that e->target_offset is sane before
mark_source_chains gets called since it will fetch the target entry
for loop detection.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/arp_tables.c | 17 -
 net/ipv4/netfilter/ip_tables.c  | 17 -
 net/ipv6/netfilter/ip6_tables.c | 17 -
 3 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index bf08192..830bbe8 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -474,14 +474,12 @@ next:
return 1;
 }
 
-static inline int check_entry(const struct arpt_entry *e, const char *name)
+static inline int check_entry(const struct arpt_entry *e)
 {
const struct xt_entry_target *t;
 
-   if (!arp_checkentry(&e->arp)) {
-   duprintf("arp_tables: arp check failed %p %s.\n", e, name);
+   if (!arp_checkentry(&e->arp))
return -EINVAL;
-   }
 
if (e->target_offset + sizeof(struct xt_entry_target) > e->next_offset)
return -EINVAL;
@@ -522,10 +520,6 @@ find_check_entry(struct arpt_entry *e, const char *name, 
unsigned int size)
struct xt_target *target;
int ret;
 
-   ret = check_entry(e, name);
-   if (ret)
-   return ret;
-
e->counters.pcnt = xt_percpu_counter_alloc();
if (IS_ERR_VALUE(e->counters.pcnt))
return -ENOMEM;
@@ -576,6 +570,7 @@ static inline int check_entry_size_and_hooks(struct 
arpt_entry *e,
 unsigned int valid_hooks)
 {
unsigned int h;
+   int err;
 
if ((unsigned long)e % __alignof__(struct arpt_entry) != 0 ||
(unsigned char *)e + sizeof(struct arpt_entry) >= limit) {
@@ -590,6 +585,10 @@ static inline int check_entry_size_and_hooks(struct 
arpt_entry *e,
return -EINVAL;
}
 
+   err = check_entry(e);
+   if (err)
+   return err;
+
/* Check hooks & underflows */
for (h = 0; h < NF_ARP_NUMHOOKS; h++) {
if (!(valid_hooks & (1 << h)))
@@ -1246,7 +1245,7 @@ check_compat_entry_size_and_hooks(struct 
compat_arpt_entry *e,
}
 
/* For purposes of check_entry casting the compat entry is fine */
-   ret = check_entry((struct arpt_entry *)e, name);
+   ret = check_entry((struct arpt_entry *)e);
if (ret)
return ret;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index e53f8d6..1d72a3c 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -569,14 +569,12 @@ static void cleanup_match(struct xt_entry_match *m, 
struct net *net)
 }
 
 static int
-check_entry(const struct ipt_entry *e, const char *name)
+check_entry(const struct ipt_entry *e)
 {
const struct xt_entry_target *t;
 
-   if (!ip_checkentry(&e->ip)) {
-   duprintf("ip check failed %p %s.\n", e, name);
+   if (!ip_checkentry(&e->ip))
return -EINVAL;
-   }
 
if (e->target_offset + sizeof(struct xt_entry_target) >
e->next_offset)
@@ -666,10 +664,6 @@ find_check_entry(struct ipt_entry *e, struct net *net, 
const char *name,
struct xt_mtchk_param mtpar;
struct xt_entry_match *ematch;
 
-   ret = check_entry(e, name);
-   if (ret)
-   return ret;
-
e->counters.pcnt = xt_percpu_counter_alloc();
if (IS_ERR_VALUE(e->counters.pcnt))
return -ENOMEM;
@@ -741,6 +735,7 @@ check_entry_size_and_hooks(struct ipt_entry *e,
   unsigned int valid_hooks)
 {
unsigned int h;
+   int err;
 
if ((unsigned long)e % __alignof__(struct ipt_entry) != 0 ||
(unsigned char *)e + sizeof(struct ipt_entry) >= limit) {
@@ -755,6 +750,10 @@ check_entry_size_and_hooks(struct ipt_entry *e,
return -EINVAL;
}
 
+   err = check_entry(e);
+   if (err)
+   return err;
+
/* Check hooks & underflows */
for (h = 0; h < NF_INET_NUMHOOKS; h++) {
if (!(valid_hooks & (1 << h)))
@@ -1506,7 +1505,7 @@ check_compat_entry_size_and_hooks(struct compat_ipt_entry 
*e,
}
 
/* For purposes of check_entry casting the compat entry is fine */
-   ret = check_entry((struct ipt_entry *)e, name);
+   ret = check_entry((struct ipt_entry *)e);
if (ret)
return ret;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 84f9baf..26a5ad1 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -581,14 +581,12 @@ static void cleanup_match(struct xt_entry_match *m, 
struct net *net)
 }
 
 static int
-check_entry(const struct ip6t_entry *e, 

[PATCH 1/9] netfilter: ipset: fix race condition in ipset save, swap and delete

2016-03-28 Thread Pablo Neira Ayuso
From: Vishwanath Pai 

This fix adds a new reference counter (ref_netlink) for the struct ip_set.
The other reference counter (ref) can be swapped out by ip_set_swap and we
need a separate counter to keep track of references for netlink events
like dump. Using the same ref counter for dump causes a race condition
which can be demonstrated by the following script:

ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 50 \
counters
ipset create hash_ip2 hash:ip family inet hashsize 30 maxelem 50 \
counters
ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 50 \
counters

ipset save &

ipset swap hash_ip3 hash_ip2
ipset destroy hash_ip3 /* will crash the machine */

Swap will exchange the values of ref so destroy will see ref = 0 instead of
ref = 1. With this fix in place swap will not succeed because ipset save
still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).

Both delete and swap will error out if ref_netlink != 0 on the set.

Note: The changes to the *_head functions are because previously we would
increment ref whenever we called these functions; we don't do that
anymore.
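
Roughly how the dump path is expected to pin the set with the new counter (a
sketch, not the actual hunks): the netlink reference is taken when a dump
starts and dropped when it finishes, so a concurrent swap or destroy sees the
set as busy without touching the swappable 'ref'.

static int example_dump_start(struct ip_set *set)
{
	__ip_set_get_netlink(set);	/* ref_netlink++, not exchanged by swap */
	return 0;
}

static int example_dump_done(struct ip_set *set)
{
	__ip_set_put_netlink(set);	/* set may be swapped or destroyed again */
	return 0;
}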

Reviewed-by: Joshua Hunt 
Signed-off-by: Vishwanath Pai 
Signed-off-by: Jozsef Kadlecsik 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter/ipset/ip_set.h  |  4 
 net/netfilter/ipset/ip_set_bitmap_gen.h |  2 +-
 net/netfilter/ipset/ip_set_core.c   | 33 -
 net/netfilter/ipset/ip_set_hash_gen.h   |  2 +-
 net/netfilter/ipset/ip_set_list_set.c   |  2 +-
 5 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 0e1f433..f48b8a6 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -234,6 +234,10 @@ struct ip_set {
spinlock_t lock;
/* References to the set */
u32 ref;
+   /* References to the set for netlink events like dump,
+* ref can be swapped out by ip_set_swap
+*/
+   u32 ref_netlink;
/* The core set type */
struct ip_set_type *type;
/* The type variant doing the real job */
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
b/net/netfilter/ipset/ip_set_bitmap_gen.h
index b0bc475..2e8e7e5 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -95,7 +95,7 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
if (!nested)
goto nla_put_failure;
if (mtype_do_head(skb, map) ||
-   nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1)) ||
+   nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
goto nla_put_failure;
if (unlikely(ip_set_put_flags(skb, set)))
diff --git a/net/netfilter/ipset/ip_set_core.c 
b/net/netfilter/ipset/ip_set_core.c
index 7e6568c..a748b0c 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -497,6 +497,26 @@ __ip_set_put(struct ip_set *set)
write_unlock_bh(&ip_set_ref_lock);
 }
 
+/* set->ref can be swapped out by ip_set_swap, netlink events (like dump) need
+ * a separate reference counter
+ */
+static inline void
+__ip_set_get_netlink(struct ip_set *set)
+{
+   write_lock_bh(&ip_set_ref_lock);
+   set->ref_netlink++;
+   write_unlock_bh(&ip_set_ref_lock);
+}
+
+static inline void
+__ip_set_put_netlink(struct ip_set *set)
+{
+   write_lock_bh(&ip_set_ref_lock);
+   BUG_ON(set->ref_netlink == 0);
+   set->ref_netlink--;
+   write_unlock_bh(&ip_set_ref_lock);
+}
+
 /* Add, del and test set entries from kernel.
  *
  * The set behind the index must exist and must be referenced
@@ -1002,7 +1022,7 @@ static int ip_set_destroy(struct net *net, struct sock 
*ctnl,
if (!attr[IPSET_ATTR_SETNAME]) {
for (i = 0; i < inst->ip_set_max; i++) {
s = ip_set(inst, i);
-   if (s && s->ref) {
+   if (s && (s->ref || s->ref_netlink)) {
ret = -IPSET_ERR_BUSY;
goto out;
}
@@ -1024,7 +1044,7 @@ static int ip_set_destroy(struct net *net, struct sock 
*ctnl,
if (!s) {
ret = -ENOENT;
goto out;
-   } else if (s->ref) {
+   } else if (s->ref || s->ref_netlink) {
ret = -IPSET_ERR_BUSY;
goto out;
}
@@ -1171,6 +1191,9 @@ static int ip_set_swap(struct net *net, struct sock 
*ctnl, struct sk_buff *skb,
  from->family == to->family))
return -IPSET_ERR_TYPE_MISMATCH;
 
+   if (from->ref_netlink || to->ref_netlink)
+   

[PATCH 0/9] Netfilter fixes for net

2016-03-28 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter fixes for you net tree,
they are:

1) There was a race condition between parallel save/swap and delete,
   which resulted in a kernel crash due to mismatched ref increase and
   decrease operations across save, swap and delete. Reported and fixed
   by Vishwanath Pai.

2) OVS should call into CT NAT for packets of new expected connections only
   when the conntrack state is persisted with the 'commit' option to the
   OVS CT action. From Jarno Rajahalme.

3) Resolve kconfig dependencies with new OVS NAT support. From Arnd Bergmann.

4) Early validation of entry->target_offset to make sure it doesn't take us
   out of the blob, from Florian Westphal.

5) Again, early validation of entry->next_offset to make sure it doesn't
   take us out of the blob, also from Florian.

6) Check that entry->target_offset is always sizeof(struct xt_entry)
   for unconditional entries, both when checking from check_underflow()
   and when checking for loops in mark_source_chains(), again from
   Florian.

7) Fix inconsistent behaviour in nfnetlink_queue when
   NFQA_CFG_F_FAIL_OPEN is set and netlink_unicast() fails due to buffer
   overrun: we have to reinject the packet, as the user expects.

8) Enforce nul-terminated table names from getsockopt GET_ENTRIES
   requests.

9) Don't assume skb->sk is set from nft_bridge_reject and synproxy,
   this fixes a recent update of the code to namespaceify
   ip_default_ttl, patch from Liping Zhang.

This batch comes with four patches to validate x_tables blobs coming
from userspace. CONFIG_USERNS exposes the x_tables interface to
unprivileged users and, to be honest, this interface never received the
attention needed for this move away from the CAP_NET_ADMIN domain. Florian is
working on another round of patches with more sanity checks, so
expect a few more Netfilter fixes in this development cycle than usual.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit d7be81a5916bdb1d904803958e5991a16f7ae4b2:

  ravb: fix software timestamping (2016-03-27 22:41:37 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 29421198c3a860092e27c2ad8499dfe603398817:

  netfilter: ipv4: fix NULL dereference (2016-03-28 17:59:29 +0200)


Arnd Bergmann (1):
  openvswitch: call only into reachable nf-nat code

Florian Westphal (3):
  netfilter: x_tables: validate e->target_offset early
  netfilter: x_tables: make sure e->next_offset covers remaining blob size
  netfilter: x_tables: fix unconditional helper

Jarno Rajahalme (1):
  openvswitch: Fix checking for new expected connections.

Liping Zhang (1):
  netfilter: ipv4: fix NULL dereference

Pablo Neira Ayuso (2):
  netfilter: nfnetlink_queue: honor NFQA_CFG_F_FAIL_OPEN when netlink 
unicast fails
  netfilter: x_tables: enforce nul-terminated table name from getsockopt 
GET_ENTRIES

Vishwanath Pai (1):
  netfilter: ipset: fix race condition in ipset save, swap and delete

 include/linux/netfilter/ipset/ip_set.h   |  4 +++
 net/bridge/netfilter/ebtables.c  |  4 +++
 net/bridge/netfilter/nft_reject_bridge.c | 20 ++--
 net/ipv4/netfilter/arp_tables.c  | 43 +
 net/ipv4/netfilter/ip_tables.c   | 48 ++--
 net/ipv4/netfilter/ipt_SYNPROXY.c| 54 +---
 net/ipv6/netfilter/ip6_tables.c  | 48 ++--
 net/netfilter/ipset/ip_set_bitmap_gen.h  |  2 +-
 net/netfilter/ipset/ip_set_core.c| 33 ---
 net/netfilter/ipset/ip_set_hash_gen.h|  2 +-
 net/netfilter/ipset/ip_set_list_set.c|  2 +-
 net/netfilter/nfnetlink_queue.c  |  7 -
 net/openvswitch/Kconfig  |  4 ++-
 net/openvswitch/conntrack.c  | 21 +++--
 14 files changed, 170 insertions(+), 122 deletions(-)


[PATCH 7/9] netfilter: nfnetlink_queue: honor NFQA_CFG_F_FAIL_OPEN when netlink unicast fails

2016-03-28 Thread Pablo Neira Ayuso
When netlink unicast fails to deliver the message to userspace, we
should also check if the NFQA_CFG_F_FAIL_OPEN flag is set so we reinject
the packet back to the stack.

I think the user expects no packet drops due to queueing-to-userspace
errors when this flag is set, no matter whether they relate to the
internal queue or to sending the netlink message to userspace.

The userspace application will still get the ENOBUFS error via recvmsg()
so the user still knows that, with the current configuration that is in
place, the userspace application is not consuming the messages at the
pace that the kernel needs.

Reported-by: "Yigal Reiss (yreiss)" 
Signed-off-by: Pablo Neira Ayuso 
Tested-by: "Yigal Reiss (yreiss)" 
---
 net/netfilter/nfnetlink_queue.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 7542999..cb5b630 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -582,7 +582,12 @@ __nfqnl_enqueue_packet(struct net *net, struct 
nfqnl_instance *queue,
/* nfnetlink_unicast will either free the nskb or add it to a socket */
err = nfnetlink_unicast(nskb, net, queue->peer_portid, MSG_DONTWAIT);
if (err < 0) {
-   queue->queue_user_dropped++;
+   if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
+   failopen = 1;
+   err = 0;
+   } else {
+   queue->queue_user_dropped++;
+   }
goto err_out_unlock;
}
 
-- 
2.1.4



[PATCH 9/9] netfilter: ipv4: fix NULL dereference

2016-03-28 Thread Pablo Neira Ayuso
From: Liping Zhang 

Commit fa50d974d104 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
uses sock_net(skb->sk) to get the net namespace, but we can't assume
that sk_buff->sk always exists, so when it is NULL an oops will happen.
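
The pattern the patch moves to, as a minimal sketch (the hook and its body
are illustrative): in hook context the namespace comes from the state the
core already provides, never from skb->sk, which is NULL unless the packet
was locally generated.

#include <linux/netfilter.h>
#include <net/net_namespace.h>

static unsigned int example_hook(void *priv, struct sk_buff *skb,
				 const struct nf_hook_state *state)
{
	struct net *net = state->net;	/* valid even when skb->sk is NULL */

	/* contrived use of 'net': read the namespaced default TTL safely */
	return net->ipv4.sysctl_ip_default_ttl ? NF_ACCEPT : NF_DROP;
}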

Signed-off-by: Liping Zhang 
Reviewed-by: Nikolay Borisov 
Signed-off-by: Pablo Neira Ayuso 
---
 net/bridge/netfilter/nft_reject_bridge.c | 20 ++--
 net/ipv4/netfilter/ipt_SYNPROXY.c| 54 +---
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/net/bridge/netfilter/nft_reject_bridge.c 
b/net/bridge/netfilter/nft_reject_bridge.c
index adc8d72..77f7e7a 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -40,7 +40,8 @@ static void nft_reject_br_push_etherhdr(struct sk_buff 
*oldskb,
 /* We cannot use oldskb->dev, it can be either bridge device (NF_BRIDGE INPUT)
  * or the bridge port (NF_BRIDGE PREROUTING).
  */
-static void nft_reject_br_send_v4_tcp_reset(struct sk_buff *oldskb,
+static void nft_reject_br_send_v4_tcp_reset(struct net *net,
+   struct sk_buff *oldskb,
const struct net_device *dev,
int hook)
 {
@@ -48,7 +49,6 @@ static void nft_reject_br_send_v4_tcp_reset(struct sk_buff 
*oldskb,
struct iphdr *niph;
const struct tcphdr *oth;
struct tcphdr _oth;
-   struct net *net = sock_net(oldskb->sk);
 
if (!nft_bridge_iphdr_validate(oldskb))
return;
@@ -75,7 +75,8 @@ static void nft_reject_br_send_v4_tcp_reset(struct sk_buff 
*oldskb,
br_deliver(br_port_get_rcu(dev), nskb);
 }
 
-static void nft_reject_br_send_v4_unreach(struct sk_buff *oldskb,
+static void nft_reject_br_send_v4_unreach(struct net *net,
+ struct sk_buff *oldskb,
  const struct net_device *dev,
  int hook, u8 code)
 {
@@ -86,7 +87,6 @@ static void nft_reject_br_send_v4_unreach(struct sk_buff 
*oldskb,
void *payload;
__wsum csum;
u8 proto;
-   struct net *net = sock_net(oldskb->sk);
 
if (oldskb->csum_bad || !nft_bridge_iphdr_validate(oldskb))
return;
@@ -273,17 +273,17 @@ static void nft_reject_bridge_eval(const struct nft_expr 
*expr,
case htons(ETH_P_IP):
switch (priv->type) {
case NFT_REJECT_ICMP_UNREACH:
-   nft_reject_br_send_v4_unreach(pkt->skb, pkt->in,
- pkt->hook,
+   nft_reject_br_send_v4_unreach(pkt->net, pkt->skb,
+ pkt->in, pkt->hook,
  priv->icmp_code);
break;
case NFT_REJECT_TCP_RST:
-   nft_reject_br_send_v4_tcp_reset(pkt->skb, pkt->in,
-   pkt->hook);
+   nft_reject_br_send_v4_tcp_reset(pkt->net, pkt->skb,
+   pkt->in, pkt->hook);
break;
case NFT_REJECT_ICMPX_UNREACH:
-   nft_reject_br_send_v4_unreach(pkt->skb, pkt->in,
- pkt->hook,
+   nft_reject_br_send_v4_unreach(pkt->net, pkt->skb,
+ pkt->in, pkt->hook,
  
nft_reject_icmp_code(priv->icmp_code));
break;
}
diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c 
b/net/ipv4/netfilter/ipt_SYNPROXY.c
index 7b8fbb3..db5b875 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -18,10 +18,10 @@
 #include 
 
 static struct iphdr *
-synproxy_build_ip(struct sk_buff *skb, __be32 saddr, __be32 daddr)
+synproxy_build_ip(struct net *net, struct sk_buff *skb, __be32 saddr,
+ __be32 daddr)
 {
struct iphdr *iph;
-   struct net *net = sock_net(skb->sk);
 
skb_reset_network_header(skb);
iph = (struct iphdr *)skb_put(skb, sizeof(*iph));
@@ -40,14 +40,12 @@ synproxy_build_ip(struct sk_buff *skb, __be32 saddr, __be32 
daddr)
 }
 
 static void
-synproxy_send_tcp(const struct synproxy_net *snet,
+synproxy_send_tcp(struct net *net,
  const struct sk_buff *skb, struct sk_buff *nskb,
  struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
  struct iphdr *niph, struct tcphdr *nth,
  unsigned int tcp_hdr_size)
 {
-   struct net *net = nf_ct_net(snet->tmpl);
-
nth->check = 

[PATCH 8/9] netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES

2016-03-28 Thread Pablo Neira Ayuso
Make sure the table names passed in via getsockopt GET_ENTRIES are
nul-terminated in ebtables and all the x_tables variants and their
respective compat code. Uncovered by KASAN.
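
The general idiom applied below, sketched on a stand-in struct (the real
ones are the per-family get_entries/ebt request structs):

#include <linux/errno.h>
#include <linux/uaccess.h>

struct example_get_req {		/* stand-in for the real request structs */
	char name[32];
	unsigned int size;
};

static int example_copy_req(struct example_get_req *get,
			    const void __user *user)
{
	if (copy_from_user(get, user, sizeof(*get)))
		return -EFAULT;
	/* never trust userspace to terminate a fixed-size name */
	get->name[sizeof(get->name) - 1] = '\0';
	return 0;
}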

Reported-by: Baozeng Ding 
Signed-off-by: Pablo Neira Ayuso 
---
 net/bridge/netfilter/ebtables.c | 4 
 net/ipv4/netfilter/arp_tables.c | 2 ++
 net/ipv4/netfilter/ip_tables.c  | 2 ++
 net/ipv6/netfilter/ip6_tables.c | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 67b2e27..8570bc7 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1521,6 +1521,8 @@ static int do_ebt_get_ctl(struct sock *sk, int cmd, void 
__user *user, int *len)
if (copy_from_user(&tmp, user, sizeof(tmp)))
return -EFAULT;
 
+   tmp.name[sizeof(tmp.name) - 1] = '\0';
+
t = find_table_lock(net, tmp.name, &ret, &ebt_mutex);
if (!t)
return ret;
@@ -2332,6 +2334,8 @@ static int compat_do_ebt_get_ctl(struct sock *sk, int cmd,
if (copy_from_user(&tmp, user, sizeof(tmp)))
return -EFAULT;
 
+   tmp.name[sizeof(tmp.name) - 1] = '\0';
+
t = find_table_lock(net, tmp.name, &ret, &ebt_mutex);
if (!t)
return ret;
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index a1bb5e7..4133b0f 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -969,6 +969,7 @@ static int get_entries(struct net *net, struct 
arpt_get_entries __user *uptr,
 sizeof(struct arpt_get_entries) + get.size);
return -EINVAL;
}
+   get.name[sizeof(get.name) - 1] = '\0';
 
t = xt_find_table_lock(net, NFPROTO_ARP, get.name);
if (!IS_ERR_OR_NULL(t)) {
@@ -1663,6 +1664,7 @@ static int compat_get_entries(struct net *net,
 *len, sizeof(get) + get.size);
return -EINVAL;
}
+   get.name[sizeof(get.name) - 1] = '\0';
 
xt_compat_lock(NFPROTO_ARP);
t = xt_find_table_lock(net, NFPROTO_ARP, get.name);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 89b5d95..631c100 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1156,6 +1156,7 @@ get_entries(struct net *net, struct ipt_get_entries 
__user *uptr,
 *len, sizeof(get) + get.size);
return -EINVAL;
}
+   get.name[sizeof(get.name) - 1] = '\0';
 
t = xt_find_table_lock(net, AF_INET, get.name);
if (!IS_ERR_OR_NULL(t)) {
@@ -1935,6 +1936,7 @@ compat_get_entries(struct net *net, struct 
compat_ipt_get_entries __user *uptr,
 *len, sizeof(get) + get.size);
return -EINVAL;
}
+   get.name[sizeof(get.name) - 1] = '\0';
 
xt_compat_lock(AF_INET);
t = xt_find_table_lock(net, AF_INET, get.name);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 541b59f..86b67b7 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1168,6 +1168,7 @@ get_entries(struct net *net, struct ip6t_get_entries 
__user *uptr,
 *len, sizeof(get) + get.size);
return -EINVAL;
}
+   get.name[sizeof(get.name) - 1] = '\0';
 
t = xt_find_table_lock(net, AF_INET6, get.name);
if (!IS_ERR_OR_NULL(t)) {
@@ -1944,6 +1945,7 @@ compat_get_entries(struct net *net, struct 
compat_ip6t_get_entries __user *uptr,
 *len, sizeof(get) + get.size);
return -EINVAL;
}
+   get.name[sizeof(get.name) - 1] = '\0';
 
xt_compat_lock(AF_INET6);
t = xt_find_table_lock(net, AF_INET6, get.name);
-- 
2.1.4



[PATCH 2/9] openvswitch: Fix checking for new expected connections.

2016-03-28 Thread Pablo Neira Ayuso
From: Jarno Rajahalme 

OVS should call into CT NAT for packets of new expected connections only
when the conntrack state is persisted with the 'commit' option to the
OVS CT action.  The test for this condition is doubly wrong, as the CT
status field is ANDed with the bit number (IPS_EXPECTED_BIT) rather
than the mask (IPS_EXPECTED), and due to the wrong assumption that the
expected bit would apply only for the first (i.e., 'new') packet of a
connection, while in fact the expected bit remains on for the lifetime of
an expected connection.  The 'ctinfo' value IP_CT_RELATED derived from
the ct status can be used instead, as it is only ever applicable to
the 'new' packets of the expected connection.
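
Spelling out the "doubly wrong" part with the enum values from
nf_conntrack_common.h (illustration only, not part of the patch):

#include <linux/types.h>
#include <linux/netfilter/nf_conntrack_common.h>

/* IPS_EXPECTED_BIT == 0 and IPS_EXPECTED == (1 << IPS_EXPECTED_BIT) == 1 */
static bool old_check(unsigned long status)
{
	return status & IPS_EXPECTED_BIT;	/* & 0: always false */
}

static bool mask_check(unsigned long status)
{
	/* the intended bit test, but the bit stays set for the whole
	 * lifetime of an expected connection, hence the move to ctinfo
	 */
	return status & IPS_EXPECTED;
}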

Fixes: 05752523e565 ('openvswitch: Interface with NAT.')
Reported-by: Dan Carpenter 
Signed-off-by: Jarno Rajahalme 
Signed-off-by: Pablo Neira Ayuso 
---
 net/openvswitch/conntrack.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index dc5eb29..47f7c62 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -664,11 +664,12 @@ static int ovs_ct_nat(struct net *net, struct sw_flow_key 
*key,
 
/* Determine NAT type.
 * Check if the NAT type can be deduced from the tracked connection.
-* Make sure expected traffic is NATted only when committing.
+* Make sure new expected connections (IP_CT_RELATED) are NATted only
+* when committing.
 */
if (info->nat & OVS_CT_NAT && ctinfo != IP_CT_NEW &&
ct->status & IPS_NAT_MASK &&
-   (!(ct->status & IPS_EXPECTED_BIT) || info->commit)) {
+   (ctinfo != IP_CT_RELATED || info->commit)) {
/* NAT an established or related connection like before. */
if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
/* This is the REPLY direction for a connection
-- 
2.1.4



[PATCH 6/9] netfilter: x_tables: fix unconditional helper

2016-03-28 Thread Pablo Neira Ayuso
From: Florian Westphal 

Ben Hawkes says:

 In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
 is possible for a user-supplied ipt_entry structure to have a large
 next_offset field. This field is not bounds checked prior to writing a
 counter value at the supplied offset.

Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so it's supposed to return
an absolute verdict of either ACCEPT or DROP.

However, the function conditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.

However, an unconditional rule must also not be using any matches
(no -m args).

The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for presence of matches, and thus
proceeded to the next (non-existent) rule.

Unify this so that all the callers have the same idea of an 'unconditional rule'.
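
Putting the two requirements together, the unified notion amounts to the
following (ip_tables flavour, mirroring the hunks below): wildcard addresses
and a target that starts right after the entry header, i.e. no matches at all.

#include <linux/string.h>
#include <linux/types.h>
#include <linux/netfilter_ipv4/ip_tables.h>

static bool entry_is_unconditional(const struct ipt_entry *e)
{
	static const struct ipt_ip uncond;

	/* no matches: the target immediately follows the entry header */
	return e->target_offset == sizeof(struct ipt_entry) &&
	       memcmp(&e->ip, &uncond, sizeof(uncond)) == 0;
}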

Reported-by: Ben Hawkes 
Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/arp_tables.c | 18 +-
 net/ipv4/netfilter/ip_tables.c  | 23 +++
 net/ipv6/netfilter/ip6_tables.c | 23 +++
 3 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 51d4fe5..a1bb5e7 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -359,11 +359,12 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 }
 
 /* All zeroes == unconditional rule. */
-static inline bool unconditional(const struct arpt_arp *arp)
+static inline bool unconditional(const struct arpt_entry *e)
 {
static const struct arpt_arp uncond;
 
-   return memcmp(arp, &uncond, sizeof(uncond)) == 0;
+   return e->target_offset == sizeof(struct arpt_entry) &&
+  memcmp(&e->arp, &uncond, sizeof(uncond)) == 0;
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -402,11 +403,10 @@ static int mark_source_chains(const struct xt_table_info 
*newinfo,
|= ((1 << hook) | (1 << NF_ARP_NUMHOOKS));
 
/* Unconditional return/END. */
-   if ((e->target_offset == sizeof(struct arpt_entry) &&
+   if ((unconditional(e) &&
 (strcmp(t->target.u.user.name,
 XT_STANDARD_TARGET) == 0) &&
-t->verdict < 0 && unconditional(&e->arp)) ||
-   visited) {
+t->verdict < 0) || visited) {
unsigned int oldpos, size;
 
if ((strcmp(t->target.u.user.name,
@@ -551,7 +551,7 @@ static bool check_underflow(const struct arpt_entry *e)
const struct xt_entry_target *t;
unsigned int verdict;
 
-   if (!unconditional(&e->arp))
+   if (!unconditional(e))
return false;
t = arpt_get_target_c(e);
if (strcmp(t->u.user.name, XT_STANDARD_TARGET) != 0)
@@ -598,9 +598,9 @@ static inline int check_entry_size_and_hooks(struct 
arpt_entry *e,
newinfo->hook_entry[h] = hook_entries[h];
if ((unsigned char *)e - base == underflows[h]) {
if (!check_underflow(e)) {
-   pr_err("Underflows must be unconditional and "
-  "use the STANDARD target with "
-  "ACCEPT/DROP\n");
+   pr_debug("Underflows must be unconditional and "
+"use the STANDARD target with "
+"ACCEPT/DROP\n");
return -EINVAL;
}
newinfo->underflow[h] = underflows[h];
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index fb7694e..89b5d95 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -168,11 +168,12 @@ get_entry(const void *base, unsigned int offset)
 
 /* All zeroes == unconditional rule. */
 /* Mildly perf critical (only if packet tracing is on) */
-static inline bool unconditional(const struct ipt_ip *ip)
+static inline bool unconditional(const struct ipt_entry *e)
 {
static const struct ipt_ip uncond;
 
-   return memcmp(ip, &uncond, sizeof(uncond)) == 0;
+   return e->target_offset == sizeof(struct ipt_entry) &&
+  memcmp(&e->ip, &uncond, sizeof(uncond)) == 0;
 #undef FWINV
 }
 
@@ -229,11 +230,10 @@ get_chainname_rulenum(const struct ipt_entry *s, const 
struct ipt_entry *e,
} else if (s == e) {
(*rulenum)++;
 
-   if (s->target_offset == sizeof(struct ipt_entry) &&
+  

Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Alexander Duyck
On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert  wrote:
> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  wrote:
>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
 When drivers express support for TSO of encapsulated packets, they
 only mean that they can do it for one layer of encapsulation.
 Supporting additional levels would mean updating, at a minimum,
 more IP length fields and they are unaware of this.

>>> This patch completely breaks GRO for FOU and GUE.
>>>
 No encapsulation device expresses support for handling offloaded
 encapsulated packets, so we won't generate these types of frames
 in the transmit path. However, GRO doesn't have a check for
 multiple levels of encapsulation and will attempt to build them.

 UDP tunnel GRO actually does prevent this situation but it only
 handles multiple UDP tunnels stacked on top of each other. This
 generalizes that solution to prevent any kind of tunnel stacking
 that would cause problems.

>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>> ambiguity in the drivers as to what this means. For instance, if
>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set, that is a GRE/UDP packet, and a
>>> driver can use ndo_features_check to validate.
>>>
>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>> would suggest to simply revert this patch. The one potential issue we
>>> could have is the potential for GRO to construct a packet which is a
>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>> In this case the GSO flags don't provide enough information to
>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>> but *not* check for it.
>>
>> Generally speaking, multiple levels of encapsulation offload are not
>> valid. I think this is pretty clear from the fact that none of the
>> encapsulation drivers expose support for encapsulation offloads on
>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>
>
> Kernel software offload does support this combination just fine.
> Seriously, I've tested that more than a thousand times. This is a few
> HW implementations you're referring to. The limitations of these
> drivers should not dictate how we build the software-- it needs to
> work the other way around.

Jesse does have a point.  Drivers aren't checking for this kind of
thing currently, as the transmit side doesn't generate these kinds of
frames.  The fact that GRO does generate them makes things a bit more
messy, as we will
really need to restructure the GSO code in order to handle it.  As far
as drivers testing for it I am pretty certain the i40e isn't.  I would
wonder if we need to add yet another GSO bit to indicate that we
support multiple layers of encapsulation.  I'm pretty sure the only
way we could possibly handle it would be in software since what you
are indicating is an indeterminate number of headers that all require
updates.

>> Asking drivers to assume that this combination of flags means FOU
>> doesn't seem right to me. To the best of my knowledge, no driver uses
>> ndo_feature_check to do validation of multiple tunnel offload flags
>> since the assumption is that the stack will never try to do this.
>> Since FOU is being treated as only a single level of encapsulation, I
>> think it would be better to just advertise it that way on transmit
>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>
> If it's not FOU it will be something else. Arbitrarily declaring
> multi-levels of encapsulation invalid is simply the wrong direction,
> we should be increasing generality and capabilities of the kernel not
> holding it down with artificial limits. This is why the generic TSO
> work that Alexander and Edward are looking at is so important-- if
> this flies then we can offload any combination of encapsulations with
> out protocol specific information.

The segmentation code isn't designed to handle more than 2 layers of
headers.  Currently we have the pointers for the inner headers and the
outer headers.  If you are talking about adding yet another level we
would need additional pointers in the skbuff to handle them and we
currently don't have them at present.

>> The change that you are suggesting would result in packets generated
>> by GRO that cannot be handled properly on transmit in some cases and
>> would likely end up being dropped or malformed. GRO is just an
>> optimization and correctness must come first so we cannot use it if it
>> might corrupt traffic.
>
> That's (a few) drivers' problem. It's no different than if they had
> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support 

Re: [PATCH] net: macb: Only call GPIO functions if there is a valid GPIO

2016-03-28 Thread Sergei Shtylyov

Hello.

On 03/28/2016 03:47 PM, Charles Keepax wrote:


GPIOlib will print warning messages if we call GPIO functions without a
valid GPIO. Change the code to avoid doing so.

Signed-off-by: Charles Keepax 
---
  drivers/net/ethernet/cadence/macb.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c 
b/drivers/net/ethernet/cadence/macb.c
index 6619178..71bb42e 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2957,9 +2957,10 @@ static int macb_probe(struct platform_device *pdev)
phy_node =  of_get_next_available_child(np, NULL);
if (phy_node) {
int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
-   if (gpio_is_valid(gpio))
+   if (gpio_is_valid(gpio)) {
bp->reset_gpio = gpio_to_desc(gpio);
-   gpiod_direction_output(bp->reset_gpio, 1);
+   gpiod_direction_output(bp->reset_gpio, 1);
+   }


   Oops, sorry for missing this while fixing this code.


}
of_node_put(phy_node);

@@ -3029,7 +3030,8 @@ static int macb_remove(struct platform_device *pdev)
mdiobus_free(bp->mii_bus);

/* Shutdown the PHY if there is a GPIO reset */
-   gpiod_set_value(bp->reset_gpio, 0);
+   if (bp->reset_gpio)
+   gpiod_set_value(bp->reset_gpio, 0);


   Hm, this function was previously OK to call with NULL (it didn't curse)...



unregister_netdev(dev);
clk_disable_unprepare(bp->tx_clk);


MBR, Sergei



Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Pablo Neira Ayuso
On Mon, Mar 28, 2016 at 06:48:51PM +0200, Jozsef Kadlecsik wrote:
> Hi David, Pablo,
> 
> David, do you agree with the patch for net/ipv4/tcp_input.c? If yes, how 
> should I proceed? Should I send the whole patch to you or is it OK to send 
> to Pablo?

Submit a formal patch and Cc: netdev@vger.kernel.org and
netfilter-de...@vger.kernel.org, then we can request David to ack this
if he considers it fine, or the other way around (I ack it and he takes
it).

Thanks!


Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 09:15 -0700, Rick Jones wrote:
> On 03/25/2016 03:29 PM, Eric Dumazet wrote:
> > UDP sockets are not short lived in the high usage case, so the added
> > cost of call_rcu() should not be a concern.
> 
> Even a busy DNS resolver?

If you mean that a busy DNS resolver spends _most_ of its time doing :

fd = socket()
bind(fd  port=0)
  < send and receive one frame >
close(fd)

(If this is the case, may I suggest doing something different, and use
some kind of caches ? It will be way faster.)

Then the results for 10,000,000 loops of the socket()/bind()/close() sequence are

Before patch : 

real    0m13.665s
user    0m0.548s
sys     0m12.372s

After patch :

real    0m20.599s
user    0m0.465s
sys     0m17.965s

So the worst overhead is 700 ns

This is roughly the cost for bringing 960 bytes from memory, or 15 cache
lines (on x86_64)

# grep UDP /proc/slabinfo 
UDPLITEv6      0      0   1088    7    2 : tunables   24   12    8 : slabdata      0      0      0
UDPv6         24     49   1088    7    2 : tunables   24   12    8 : slabdata      7      7      0
UDP-Lite       0      0    960    4    1 : tunables   54   27    8 : slabdata      0      0      0
UDP           30     36    960    4    1 : tunables   54   27    8 : slabdata      9      9      2

In reality, the chances that UDP sockets are re-opened right after being
freed and that their 15 cache lines are still very hot in cpu caches are
quite small, so I would not worry at all about this rather stupid benchmark.

int main(int argc, char *argv[]) {
	struct sockaddr_in addr;
	int i, fd, loops = 10000000;

	for (i = 0; i < loops; i++) {
		fd = socket(AF_INET, SOCK_DGRAM, 0);
		if (fd == -1) {
			perror("socket");
			break;
		}
		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr)) == -1) {
			perror("bind");
			break;
		}
		close(fd);
	}
	return 0;
}




Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Tom Herbert
On Mon, Mar 28, 2016 at 9:15 AM, Rick Jones  wrote:
> On 03/25/2016 03:29 PM, Eric Dumazet wrote:
>>
>> UDP sockets are not short lived in the high usage case, so the added
>> cost of call_rcu() should not be a concern.
>
>
> Even a busy DNS resolver?
>
Should use connectionless UDP sockets.

> rick jones
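
For what it's worth, the "keep the socket around" suggestion for a resolver
boils down to something like this userspace sketch (names are illustrative):
one long-lived, unconnected UDP socket reused for every upstream query
instead of a socket()/bind()/close() cycle per lookup.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int resolver_fd = -1;

static int resolver_socket(void)
{
	if (resolver_fd == -1)
		resolver_fd = socket(AF_INET, SOCK_DGRAM, 0);
	return resolver_fd;
}

static ssize_t send_query(const struct sockaddr_in *server,
			  const void *query, size_t len)
{
	int fd = resolver_socket();

	if (fd == -1)
		return -1;
	/* replies come back via recvfrom() on the same fd; no per-query
	 * socket is created, bound or destroyed
	 */
	return sendto(fd, query, len, 0,
		      (const struct sockaddr *)server, sizeof(*server));
}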


Re: [PATCH net] team: team should sync the port's uc/mc addrs when add a port

2016-03-28 Thread Marcelo Ricardo Leitner
On Tue, Mar 29, 2016 at 12:42:31AM +0800, Xin Long wrote:
> There is an issue when we use macvtap over team:
> When we replug nic links from team0, the real NICs' mc list will not
> include the maddr for macvtap any more. Then we can't receive pkts to
> the macvtap device, as they are filtered by the mc list of the nic.
> 
> In Bonding Driver, it syncs the uc/mc addrs in bond_enslave().
> 
> We will fix this issue on team by adding the port's uc/mc addrs sync in
> team_port_add.
> 
> Signed-off-by: Xin Long 

Reviewed-by: Marcelo Ricardo Leitner 

> ---
>  drivers/net/team/team.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index 26c64d2..17ff367 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -1198,6 +1198,9 @@ static int team_port_add(struct team *team, struct 
> net_device *port_dev)
>   goto err_dev_open;
>   }
>  
> + dev_uc_sync_multiple(port_dev, dev);
> + dev_mc_sync_multiple(port_dev, dev);
> +
>   err = vlan_vids_add_by_dev(port_dev, dev);
>   if (err) {
>   netdev_err(dev, "Failed to add vlan ids to device %s\n",
> -- 
> 2.1.0
> 


Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Jozsef Kadlecsik
Hi David, Pablo,

David, do you agree with the patch for net/ipv4/tcp_input.c? If yes, how 
should I proceed? Should I send the whole patch to you or is it OK to send 
to Pablo?

Best regards,
Jozsef

On Mon, 28 Mar 2016, Baozeng Ding wrote:

> 
> 
> On 2016/3/28 10:35, Baozeng Ding wrote:
> > 
> > 
> > On 2016/3/28 6:25, Jozsef Kadlecsik wrote:
> > > On Mon, 28 Mar 2016, Jozsef Kadlecsik wrote:
> > > 
> > > > On Sun, 27 Mar 2016, Baozeng Ding wrote:
> > > > 
> > > > > The following program triggers stack-out-of-bounds in tcp_packet. The
> > > > > kernel version is 4.5 (on Mar 16 commit
> > > > > 09fd671ccb2475436bd5f597f751ca4a7d177aea).
> > > > > Uncovered with syzkaller. Thanks.
> > > > > 
> > > > > ==
> > > > > BUG: KASAN: stack-out-of-bounds in tcp_packet+0x4b77/0x51c0 at addr
> > > > > 8800a45df3c8
> > > > > Read of size 1 by task 0327/11132
> > > > > page:ea00029177c0 count:0 mapcount:0 mapping: (null) index:0x0
> > > > > flags: 0x1fffc00()
> > > > > page dumped because: kasan: bad access detected
> > > > > CPU: 1 PID: 11132 Comm: 0327 Tainted: GB 4.5.0+ #12
> > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > > > rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> > > > >   0001 8800a45df148 82945051 8800a45df1d8
> > > > >   8800a45df3c8 0027 0001 8800a45df1c8
> > > > >   81709f88 8800b4f7e3d0 0028 0286
> > > > > Call Trace:
> > > > > [< inline >] __dump_stack /kernel/lib/dump_stack.c:15
> > > > > [] dump_stack+0xb3/0x112 /kernel/lib/dump_stack.c:51
> > > > > [< inline >] print_address_description
> > > > > /kernel/mm/kasan/report.c:150
> > > > > [] kasan_report_error+0x4f8/0x530
> > > > > /kernel/mm/kasan/report.c:236
> > > > > [] ? skb_copy_bits+0x49d/0x6d0
> > > > > /kernel/net/core/skbuff.c:1675
> > > > > [< inline >] ? spin_lock_bh
> > > > > /kernel/include/linux/spinlock.h:307
> > > > > [] ? tcp_packet+0x1c9/0x51c0
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:833
> > > > > [< inline >] kasan_report /kernel/mm/kasan/report.c:259
> > > > > [] __asan_report_load1_noabort+0x3e/0x40
> > > > > /kernel/mm/kasan/report.c:277
> > > > > [< inline >] ? tcp_sack
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> > > > > [< inline >] ? tcp_in_window
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> > > > > [] ? tcp_packet+0x4b77/0x51c0
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> > > > > [< inline >] tcp_sack
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> > > > > [< inline >] tcp_in_window
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> > > > > [] tcp_packet+0x4b77/0x51c0
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> > > > > [] ? memset+0x28/0x30 /kernel/mm/kasan/kasan.c:302
> > > > > [] ? tcp_new+0x1a4/0xc20
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1122
> > > > > [< inline >] ? build_report /kernel/include/net/netlink.h:499
> > > > > [] ? xfrm_send_report+0x426/0x450
> > > > > /kernel/net/xfrm/xfrm_user.c:3039
> > > > > [] ? tcp_new+0xc20/0xc20
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1169
> > > > > [] ? init_conntrack+0xca/0x9e0
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:972
> > > > > [] ? nf_conntrack_alloc+0x40/0x40
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:903
> > > > > [] ? tcp_init_net+0x6e0/0x6e0
> > > > > /kernel/include/net/netfilter/nf_conntrack_l4proto.h:137
> > > > > [] ? ipv4_get_l4proto+0x262/0x390
> > > > > /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:89
> > > > > [] ? nf_ct_get_tuple+0xaf/0x190
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:197
> > > > > [] nf_conntrack_in+0x8ee/0x1170
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:1177
> > > > > [] ? init_conntrack+0x9e0/0x9e0
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:287
> > > > > [] ? ipt_do_table+0xa16/0x1260
> > > > > /kernel/net/ipv4/netfilter/ip_tables.c:423
> > > > > [] ? trace_hardirqs_on+0xd/0x10
> > > > > /kernel/kernel/locking/lockdep.c:2635
> > > > > [] ? __local_bh_enable_ip+0x6b/0xc0
> > > > > /kernel/kernel/softirq.c:175
> > > > > [] ? check_entry.isra.4+0x190/0x190
> > > > > /kernel/net/ipv6/netfilter/ip6_tables.c:594
> > > > > [] ? ip_reply_glue_bits+0xc0/0xc0
> > > > > /kernel/net/ipv4/ip_output.c:1530
> > > > > [] ipv4_conntrack_local+0x14e/0x1a0
> > > > > /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:161
> > > > > [] ? iptable_raw_hook+0x9d/0x1e0
> > > > > /kernel/net/ipv4/netfilter/iptable_raw.c:32
> > > > > [] nf_iterate+0x15d/0x230
> > > > > /kernel/net/netfilter/core.c:274
> > > > > [] ? nf_iterate+0x230/0x230
> > > > > /kernel/net/netfilter/core.c:268
> > > > > [] nf_hook_slow+0x1ad/0x310
> > > > > /kernel/net/netfilter/core.c:306
> > > > > [] ? 

[PATCH net] team: team should sync the port's uc/mc addrs when add a port

2016-03-28 Thread Xin Long
There is an issue when we use macvtap over team:
When we replug nic links from team0, the real NICs' mc list will not
include the maddr for macvtap any more. Then we can't receive pkts to
the macvtap device, as they are filtered by the mc list of the nic.

In Bonding Driver, it syncs the uc/mc addrs in bond_enslave().

We will fix this issue on team by adding the port's uc/mc addrs sync in
team_port_add.

Signed-off-by: Xin Long 
---
 drivers/net/team/team.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 26c64d2..17ff367 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1198,6 +1198,9 @@ static int team_port_add(struct team *team, struct 
net_device *port_dev)
goto err_dev_open;
}
 
+   dev_uc_sync_multiple(port_dev, dev);
+   dev_mc_sync_multiple(port_dev, dev);
+
err = vlan_vids_add_by_dev(port_dev, dev);
if (err) {
netdev_err(dev, "Failed to add vlan ids to device %s\n",
-- 
2.1.0



Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.

2016-03-28 Thread Tom Herbert
On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross  wrote:
> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert  wrote:
>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross  wrote:
>>> When drivers express support for TSO of encapsulated packets, they
>>> only mean that they can do it for one layer of encapsulation.
>>> Supporting additional levels would mean updating, at a minimum,
>>> more IP length fields and they are unaware of this.
>>>
>> This patch completely breaks GRO for FOU and GUE.
>>
>>> No encapsulation device expresses support for handling offloaded
>>> encapsulated packets, so we won't generate these types of frames
>>> in the transmit path. However, GRO doesn't have a check for
>>> multiple levels of encapsulation and will attempt to build them.
>>>
>>> UDP tunnel GRO actually does prevent this situation but it only
>>> handles multiple UDP tunnels stacked on top of each other. This
>>> generalizes that solution to prevent any kind of tunnel stacking
>>> that would cause problems.
>>>
>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>> ambiguity in the drivers as to what this means. For instance, if
>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set, that is a GRE/UDP packet, and a
>> driver can use ndo_features_check to validate.
>>
>> So multiple levels of encapsulation with GRO is perfectly valid and I
>> would suggest to simply revert this patch. The one potential issue we
>> could have is the potential for GRO to construct a packet which is a
>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>> In this case the GSO flags don't provide enough information to
>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>> but *not* check for it.
>
> Generally speaking, multiple levels of encapsulation offload are not
> valid. I think this is pretty clear from the fact that none of the
> encapsulation drivers expose support for encapsulation offloads on
> transmit and the drivers supporting NETIF_F_GSO_GRE and
> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>

Kernel software offload does support this combination just fine.
Seriously, I've tested that more than a thousand times. This is a few
HW implementations you're referring to. The limitations of these
drivers should not dictate how we build the software-- it needs to
work the other way around.

> Asking drivers to assume that this combination of flags means FOU
> doesn't seem right to me. To the best of my knowledge, no driver uses
> ndo_feature_check to do validation of multiple tunnel offload flags
> since the assumption is that the stack will never try to do this.
> Since FOU is being treated as only a single level of encapsulation, I
> think it would be better to just advertise it that way on transmit
> (i.e. only set SKB_GSO_UDP_TUNNEL).
>
If it's not FOU it will be something else. Arbitrarily declaring
multi-levels of encapsulation invalid is simply the wrong direction,
we should be increasing generality and capabilities of the kernel not
holding it down with artificial limits. This is why the generic TSO
work that Alexander and Edward are looking at is so important-- if
this flies then we can offload any combination of encapsulations with
out protocol specific information.

> The change that you are suggesting would result in packets generated
> by GRO that cannot be handled properly on transmit in some cases and
> would likely end up being dropped or malformed. GRO is just an
> optimization and correctness must come first so we cannot use it if it
> might corrupt traffic.

That's (a few) drivers' problem. It's no different than if they had
decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
in IPv4 offload but not GRE in IPv6, or they can only handle headers
of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
the long term solution is to eliminate the GSO_ flags and use a
generic segmentation offload interface. But in the interim, it is
*incumbent* on drivers to determine if they can handle a GSO packet
and the interfaces to do that exist. Restricting the capabilities of
GRO just to appease those drivers is not right. Breaking existing GRO
for their benefit is definitely not right.

Tom
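
For reference, the kind of per-skb validation being referred to, as a rough
sketch; the policy below (refusing GRE stacked with a UDP tunnel) is invented
for illustration and is not taken from any in-tree driver:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_features_t
example_features_check(struct sk_buff *skb, struct net_device *dev,
		       netdev_features_t features)
{
	unsigned int gso_type = skb_shinfo(skb)->gso_type;

	/* this device can only offload one tunnel layer: drop the offload
	 * bits for combinations it does not handle and let software GSO
	 * and checksumming take over
	 */
	if ((gso_type & SKB_GSO_UDP_TUNNEL) && (gso_type & SKB_GSO_GRE))
		features &= ~(NETIF_F_GSO_MASK | NETIF_F_CSUM_MASK);

	return features;
}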


Re: [RFC PATCH 7/9] GSO: Support partial segmentation offload

2016-03-28 Thread Alexander Duyck
On Sun, Mar 27, 2016 at 10:36 PM, Jesse Gross  wrote:
> On Fri, Mar 18, 2016 at 4:25 PM, Alexander Duyck  wrote:
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index edb7179bc051..666cf427898b 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2711,6 +2711,19 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
> [...]
>> +   /* Only report GSO partial support if it will enable us to
>> +* support segmentation on this frame without needing additional
>> +* work.
>> +*/
>> +   if (features & NETIF_F_GSO_PARTIAL) {
>> +   netdev_features_t partial_features;
>> +   struct net_device *dev = skb->dev;
>> +
>> +   partial_features = dev->features & dev->gso_partial_features;
>> +   if (!skb_gso_ok(skb, features | partial_features))
>> +   features &= ~NETIF_F_GSO_PARTIAL;
>
> I think we need to add NETIF_F_GSO_ROBUST into the skb_gso_ok() check
> - otherwise packets coming from VMs fail this test and we lose GSO
> partial. It's totally safe to expose this feature, since we'll compute
> gso_segs anyways.

Good point.  I will update that before submitting for net-next.

>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index f044f970f1a6..bdcba77e164c 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -3281,6 +3291,15 @@ perform_csum_check:
>>  */
>> segs->prev = tail;
>>
>> +   /* Update GSO info on first skb in partial sequence. */
>> +   if (partial_segs) {
>> +   skb_shinfo(segs)->gso_size = mss / partial_segs;
>
> One small thing: this gso_size is the same as the original MSS, right?
> It seems like we could trivially stick it in a local variable and
> avoid the extra division.

Nice catch.  I was a bit too in the mindset of having to use the same
variable throughout.

>> +   skb_shinfo(segs)->gso_segs = partial_segs;
>> +   skb_shinfo(segs)->gso_type = skb_shinfo(head_skb)->gso_type |
>> +SKB_GSO_PARTIAL;
>
> Since we're computing the gso_segs ourselves, it might be nice to
> strip out SKB_GSO_DODGY when we set the type.

I will have that fixed for the version I submit for net-next.
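Folding both points in, the block at the tail of skb_segment() might end up
looking roughly like the sketch below; the variable names come from the
posted patch, and the exact form may differ in the respin:

	/* Update GSO info on first skb in partial sequence. */
	if (partial_segs) {
		/* The per-segment size is just the original MSS, so reuse
		 * gso_size from head_skb instead of dividing mss back down,
		 * and drop SKB_GSO_DODGY since gso_segs was computed here.
		 */
		skb_shinfo(segs)->gso_size = skb_shinfo(head_skb)->gso_size;
		skb_shinfo(segs)->gso_segs = partial_segs;
		skb_shinfo(segs)->gso_type =
			(skb_shinfo(head_skb)->gso_type | SKB_GSO_PARTIAL) &
			~SKB_GSO_DODGY;
	}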

> I just wanted to say that this is really nice work - I was expecting
> it to turn out to be really messy and unmaintainable but this is very
> clean. Thanks!

Thanks.

- Alex


Re: [RFC net-next 2/2] udp: No longer use SLAB_DESTROY_BY_RCU

2016-03-28 Thread Rick Jones

On 03/25/2016 03:29 PM, Eric Dumazet wrote:

UDP sockets are not short lived in the high usage case, so the added
cost of call_rcu() should not be a concern.


Even a busy DNS resolver?

rick jones


Re: [PATCH 0/3] Control ethernet PHY LEDs via LED subsystem

2016-03-28 Thread Andrew Lunn
Hi Stephen
 
> There already is LED control via ethtool.
> It is more important that the existing API (ethtool) still work.

I'm having trouble finding this. The man page for ethtool only
mentions -p --identify with respect to LEDs.

I also don't see anything in include/uapi/linux/ethtool.h

Please could you give more specific pointers.

Thanks
Andrew


[PATCH] mwifiex: ie_list is an array, so no need to check if NULL

2016-03-28 Thread Colin King
From: Colin Ian King 

ap_ie->ie_list is an array of struct mwifiex_ie and can never
be null, so the null check on this array is redundant and can
be removed.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/marvell/mwifiex/uap_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
index 16d95b2..92ce32f 100644
--- a/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
+++ b/drivers/net/wireless/marvell/mwifiex/uap_cmd.c
@@ -694,7 +694,7 @@ static int mwifiex_uap_custom_ie_prepare(u8 *tlv, void *cmd_buf, u16 *ie_size)
struct mwifiex_ie_list *ap_ie = cmd_buf;
struct mwifiex_ie_types_header *tlv_ie = (void *)tlv;
 
-   if (!ap_ie || !ap_ie->len || !ap_ie->ie_list)
+   if (!ap_ie || !ap_ie->len)
return -1;
 
*ie_size += le16_to_cpu(ap_ie->len) +
-- 
2.7.4
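As a standalone illustration of why the check is dead code: the address of an
array member embedded in a struct can never be NULL once the struct pointer
itself is valid, and modern compilers warn that such a comparison is always
true. The struct below only mimics the shape of struct mwifiex_ie_list; the
field names and types are placeholders:

	#include <stdio.h>

	struct example_ie_list {
		unsigned short len;
		int ie_list[4];		/* stand-in for the embedded array */
	};

	int main(void)
	{
		struct example_ie_list instance = { .len = 1 };
		struct example_ie_list *ap_ie = &instance;

		/* ap_ie->ie_list decays to &ap_ie->ie_list[0], i.e. ap_ie
		 * plus a constant offset, so for any valid ap_ie it is never
		 * NULL and the comparison always evaluates to true.
		 */
		printf("%d\n", ap_ie->ie_list != NULL);	/* prints 1 */
		return 0;
	}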



Re: [PATCH 0/3] Control ethernet PHY LEDs via LED subsystem

2016-03-28 Thread Stephen Hemminger
On Wed, 23 Mar 2016 18:51:37 +0100
Vishal Thanki  wrote:

> Hi all,
> 
> Based on suggestions from Andrew and Florian, I have made some changes
> to expose the ethernet PHY LEDs using kernel LED subsystem. The following
> ALPHA patchset introduces two new LED triggers:
> 
> 1) -eth-phy-link:
> To monitor the PHY Link status
> 
> 2) -eth-phy-activity:
> To monitor the PHY activity status. This trigger may still need some more
> work because as of now it just decides to set the trigger based on the
> PHY state machine and triggers blink_set with the delay_on and delay_off
> parameters set to 0.
> 
> Please provide the review comments so that I can work on this patchset to
> make it complete.
> 
> Thanks,
> Vishal
> 
> Vishal Thanki (3):
>   net: phy: Add ethernet PHY LED triggers
>   net: phy: at8030: Expose the Link and Activity LEDs
>   led: at8030: Add LED driver for AT8030 ethernet PHY
> 
>  drivers/leds/Kconfig |   7 ++
>  drivers/leds/Makefile|   1 +
>  drivers/leds/leds-at803x.c   | 158 +++
>  drivers/net/phy/Kconfig  |   7 ++
>  drivers/net/phy/Makefile |   1 +
>  drivers/net/phy/at803x.c |  55 ++-
>  drivers/net/phy/phy.c|  20 +-
>  drivers/net/phy/phy_device.c |   4 ++
>  drivers/net/phy/phy_led.c|  70 +++
>  drivers/net/phy/phy_led.h|  37 ++
>  include/linux/leds.h |   1 +
>  include/linux/phy.h  |   6 ++
>  include/linux/phy/at803x.h   |  45 
>  13 files changed, 409 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/leds/leds-at803x.c
>  create mode 100644 drivers/net/phy/phy_led.c
>  create mode 100644 drivers/net/phy/phy_led.h
>  create mode 100644 include/linux/phy/at803x.h
> 

There already is LED control via ethtool.
It is more important that the existing API (ethtool) still work.


Re: [PATCH] netlabel: fix a problem with netlbl_secattr_catmap_setrng()

2016-03-28 Thread David Miller
From: Paul Moore 
Date: Mon, 28 Mar 2016 11:17:28 -0400

> On Mon, Mar 28, 2016 at 11:10 AM, Paul Moore  wrote:
>> From: Janak Desai 
>>
>> We try to be clever and set large chunks of the bitmap at once, when
>> possible; unfortunately we weren't very clever when we wrote the code
>> and messed up the if-conditional.  Fix this bug and restore proper
>> operation.
>>
>> Signed-off-by: Janak Desai 
>> Signed-off-by: Paul Moore 
>> ---
>>  net/netlabel/netlabel_kapi.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> DaveM, I'm planning on carrying this in selinux#next unless you really
> want to this to go to Linus via the netdev tree.

I'm fine with it going via your tree, thanks.


Re: [PATCH net-next 1/2] net: hns: fixed the setting and getting overtime bug

2016-03-28 Thread David Miller
From: Yisen Zhuang 
Date: Mon, 28 Mar 2016 18:40:56 +0800

> From: Lisheng 
> 
> The overtime setting and getting REGs in HNS V2 are different from HNS V1.
> They need to be distinguished when getting or setting the REGs.
> 
> Signed-off-by: Lisheng 
> Signed-off-by: Yisen Zhuang 

Applied.


Re: [PATCH net-next 2/2] net: hns: set-coalesce-usecs returns errno by dsaf.ko

2016-03-28 Thread David Miller
From: Yisen Zhuang 
Date: Mon, 28 Mar 2016 18:40:57 +0800

> From: Lisheng 
> 
> Setting the coalesce usecs in HW may fail, and ethtool needs to know
> whether configuring the parameter succeeded or not, so dsaf.ko needs to
> return the errno.
> 
> Signed-off-by: Lisheng 
> Signed-off-by: Yisen Zhuang 

Applied.


Re: [PATCH] net: macb: Only call GPIO functions if there is a valid GPIO

2016-03-28 Thread David Miller
From: Charles Keepax 
Date: Mon, 28 Mar 2016 13:47:42 +0100

> GPIOlib will print warning messages if we call GPIO functions without a
> valid GPIO. Change the code to avoid doing so.
> 
> Signed-off-by: Charles Keepax 

Applied.


Re: [PATCH] openvswitch: Use proper buffer size in nla_memcpy

2016-03-28 Thread David Miller
From: Haishuang Yan 
Date: Mon, 28 Mar 2016 18:08:59 +0800

> For the input parameter count, it's better to use the size of the
> destination buffer, as nla_memcpy takes into account the length of the
> source netlink attribute when data is copied from an attribute.
> 
> Signed-off-by: Haishuang Yan 

Applied, thanks.


Re: [B.A.T.M.A.N.] [PATCH 07/14] batman-adv: use list_for_each_entry_safe

2016-03-28 Thread Marek Lindner
On Friday, December 18, 2015 23:33:31 Geliang Tang wrote:
> Use list_for_each_entry_safe() instead of list_for_each_safe() to
> simplify the code.
> 
> Signed-off-by: Geliang Tang 
> ---
>  net/batman-adv/icmp_socket.c | 22 +-
>  1 file changed, 9 insertions(+), 13 deletions(-)

Applied in the batman tree (revision 1557404).

Thanks,
Marek




Re: [B.A.T.M.A.N.] [PATCH 1/3] batman-adv: use to_delayed_work

2016-03-28 Thread Marek Lindner
On Monday, December 28, 2015 23:43:37 Geliang Tang wrote:
> Use to_delayed_work() instead of open-coding it.
> 
> Signed-off-by: Geliang Tang 
> ---
>  net/batman-adv/bridge_loop_avoidance.c | 2 +-
>  net/batman-adv/distributed-arp-table.c | 2 +-
>  net/batman-adv/network-coding.c| 2 +-
>  net/batman-adv/originator.c| 2 +-
>  net/batman-adv/send.c  | 4 ++--
>  net/batman-adv/translation-table.c | 2 +-
>  6 files changed, 7 insertions(+), 7 deletions(-)

Applied in the batman tree (revision ecc6f9a).

Thanks,
Marek




Re: [PATCH net] inet: add proper locking in __inet{6}_lookup()

2016-03-28 Thread David Miller
From: Eric Dumazet 
Date: Mon, 28 Mar 2016 07:23:55 -0700

> On Mon, 2016-03-28 at 07:10 -0700, Eric Dumazet wrote:
>> On Mon, 2016-03-28 at 06:29 -0700, Eric Dumazet wrote:
>> 
>> > Sure, but the caller changed quite a lot in all stable versions.
>> > 
>> > I took the easiest path for stable maintainers, and was planning to
>> > implement a better way in net-next.
>> 
>> I misread your suggestion David, I send a V2 shortly.
>> 
> 
> Actually, I'll wait for net-next opening.
> 
> My brain farted while working on the 'non refcounted TCP listener'
> because __inet_lookup_listener() will really need to be called from an
> enclosing rcu_read_lock(), and for that reason I thought the net tree had a bug.
> 
> The local_bh_disable()/local_bh_enable() removal can certainly wait
> net-next.
> 
> Sorry for the confusion.

Ok, I'll mark this patch as deferred then.

Thanks Eric.


Re: [PATCH] netlabel: fix a problem with netlbl_secattr_catmap_setrng()

2016-03-28 Thread Paul Moore
On Mon, Mar 28, 2016 at 11:10 AM, Paul Moore  wrote:
> From: Janak Desai 
>
> We try to be clever and set large chunks of the bitmap at once, when
> possible; unfortunately we weren't very clever when we wrote the code
> and messed up the if-conditional.  Fix this bug and restore proper
> operation.
>
> Signed-off-by: Janak Desai 
> Signed-off-by: Paul Moore 
> ---
>  net/netlabel/netlabel_kapi.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

DaveM, I'm planning on carrying this in selinux#next unless you really
want to this to go to Linus via the netdev tree.

> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 28cddc8..1325776 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -677,7 +677,7 @@ int netlbl_catmap_setrng(struct netlbl_lsm_catmap 
> **catmap,
> u32 spot = start;
>
> while (rc == 0 && spot <= end) {
> -   if (((spot & (BITS_PER_LONG - 1)) != 0) &&
> +   if (((spot & (BITS_PER_LONG - 1)) == 0) &&
> ((end - spot) > BITS_PER_LONG)) {
> rc = netlbl_catmap_setlong(catmap,
>spot,
>

-- 
paul moore
www.paul-moore.com


[PATCH] netlabel: fix a problem with netlbl_secattr_catmap_setrng()

2016-03-28 Thread Paul Moore
From: Janak Desai 

We try to be clever and set large chunks of the bitmap at once, when
possible; unfortunately we weren't very clever when we wrote the code
and messed up the if-conditional.  Fix this bug and restore proper
operation.

Signed-off-by: Janak Desai 
Signed-off-by: Paul Moore 
---
 net/netlabel/netlabel_kapi.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
index 28cddc8..1325776 100644
--- a/net/netlabel/netlabel_kapi.c
+++ b/net/netlabel/netlabel_kapi.c
@@ -677,7 +677,7 @@ int netlbl_catmap_setrng(struct netlbl_lsm_catmap **catmap,
u32 spot = start;
 
while (rc == 0 && spot <= end) {
-   if (((spot & (BITS_PER_LONG - 1)) != 0) &&
+   if (((spot & (BITS_PER_LONG - 1)) == 0) &&
((end - spot) > BITS_PER_LONG)) {
rc = netlbl_catmap_setlong(catmap,
   spot,
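
To make the corrected condition easier to follow, here is a small flat-bitmap
model of the loop: the whole-word fast path is only safe when 'spot' sits on a
word boundary, which is exactly what the '== 0' test checks, whereas the old
'!= 0' test took the fast path precisely when it was unsafe. This is a
simplified stand-in, not the kernel's chained catmap code:

	#include <limits.h>
	#include <stdio.h>

	#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

	static void setrng(unsigned long *map, unsigned int start,
			   unsigned int end)
	{
		unsigned int spot = start;

		while (spot <= end) {
			if ((spot % BITS_PER_LONG) == 0 &&
			    (end - spot) > BITS_PER_LONG) {
				/* aligned and more than a full word left:
				 * set the whole word in one go, mirroring
				 * netlbl_catmap_setlong()
				 */
				map[spot / BITS_PER_LONG] |= ~0UL;
				spot += BITS_PER_LONG;
			} else {
				map[spot / BITS_PER_LONG] |=
					1UL << (spot % BITS_PER_LONG);
				spot++;
			}
		}
	}

	int main(void)
	{
		unsigned long map[3] = { 0, 0, 0 };

		setrng(map, 3, 2 * BITS_PER_LONG + 2);
		printf("%lx %lx %lx\n", map[0], map[1], map[2]);
		return 0;
	}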



Re: [PATCH 25/31] ethernet: use parity8 in sun/niu.c

2016-03-28 Thread Michal Nazarewicz
On Sun, Mar 27 2016, zhaoxiu zeng wrote:
> From: Zeng Zhaoxiu 
>
> Signed-off-by: Zeng Zhaoxiu 

No idea why I’ve been CC’d, but code looks good to me so:

Acked-by: Michal Nazarewicz 

> ---
>  drivers/net/ethernet/sun/niu.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c
> index 9cc4564..8c344ef 100644
> --- a/drivers/net/ethernet/sun/niu.c
> +++ b/drivers/net/ethernet/sun/niu.c
> @@ -2742,18 +2742,12 @@ static int niu_set_alt_mac_rdc_table(struct niu *np, int idx,
>  
>  static u64 vlan_entry_set_parity(u64 reg_val)
>  {
> - u64 port01_mask;
> - u64 port23_mask;
> -
> - port01_mask = 0x00ff;
> - port23_mask = 0xff00;
> -
> - if (hweight64(reg_val & port01_mask) & 1)
> + if (parity8(reg_val))
>   reg_val |= ENET_VLAN_TBL_PARITY0;
>   else
>   reg_val &= ~ENET_VLAN_TBL_PARITY0;
>  
> - if (hweight64(reg_val & port23_mask) & 1)
> + if (parity8((unsigned int)reg_val >> 8))
>   reg_val |= ENET_VLAN_TBL_PARITY1;
>   else
>   reg_val &= ~ENET_VLAN_TBL_PARITY1;
> -- 
> 2.5.5
>
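For what it's worth, the conversion looks behaviour-preserving: the parity of
an 8-bit field is the same as hweight-of-the-masked-value AND 1. A quick
userspace check, with parity8() written locally as a stand-in for the helper
added earlier in the series:

	#include <stdint.h>
	#include <stdio.h>

	/* local stand-in for the parity8() helper from the series */
	static unsigned int parity8(uint8_t v)
	{
		v ^= v >> 4;
		v ^= v >> 2;
		v ^= v >> 1;
		return v & 1;
	}

	int main(void)
	{
		uint64_t reg_val = 0x4d2;	/* arbitrary test value */

		printf("port01: %u vs %u\n",
		       parity8(reg_val & 0xff),
		       (unsigned int)__builtin_popcountll(reg_val & 0xff) & 1);
		printf("port23: %u vs %u\n",
		       parity8((reg_val >> 8) & 0xff),
		       (unsigned int)__builtin_popcountll(reg_val & 0xff00) & 1);
		return 0;
	}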

-- 
Best regards
ミハウ “퓶퓲퓷퓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»


Re: [PATCH v8 net-next] ravb: Add dma queue interrupt support

2016-03-28 Thread Yoshihiro Kaneko
Hello.

2016-03-27 22:02 GMT+09:00 Sergei Shtylyov :
> Hello.
>
>
> On 03/22/2016 06:22 PM, Yoshihiro Kaneko wrote:
>
>> From: Kazuya Mizuguchi 
>>
>> This patch supports the following interrupts.
>>
>> - One interrupt for multiple (timestamp, error, gPTP)
>> - One interrupt for emac
>> - Four interrupts for dma queue (best effort rx/tx, network control rx/tx)
>>
>> This patch improves the efficiency of interrupt handling by adding an
>> interrupt handler for each interrupt source described above.
>> Additionally, it reduces the number of accesses to the EthernetAVB IF.
>> This patch also prevents the driver from depending on the whim of a
>> boot loader.
>>
>> [ykaneko0...@gmail.com: define bit names of registers]
>> [ykaneko0...@gmail.com: add comment for gen3 only registers]
>> [ykaneko0...@gmail.com: fix coding style]
>> [ykaneko0...@gmail.com: update changelog]
>> [ykaneko0...@gmail.com: gen3: fix initialization of interrupts]
>> [ykaneko0...@gmail.com: gen3: fix clearing interrupts]
>> [ykaneko0...@gmail.com: gen3: add helper function for request_irq()]
>> [ykaneko0...@gmail.com: gen3: remove IRQF_SHARED flag for request_irq()]
>> [ykaneko0...@gmail.com: revert ravb_close() and ravb_ptp_stop()]
>> [ykaneko0...@gmail.com: avoid calling free_irq() to non-hooked interrupts]
>> [ykaneko0...@gmail.com: make NC/BE interrupt handler a function]
>> [ykaneko0...@gmail.com: make timestamp interrupt handler a function]
>> [ykaneko0...@gmail.com: timestamp interrupt is handled in multiple
>>   interrupt handler instead of dma queue interrupt handler]
>> Signed-off-by: Kazuya Mizuguchi 
>> Signed-off-by: Yoshihiro Kaneko 
>
> [...]
>
> Acked-by: Sergei Shtylyov 

Many thanks for your review and much help!

>
> MBR, Sergei
>

Thanks,
kaneko


Re: [PATCH net] inet: add proper locking in __inet{6}_lookup()

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 07:10 -0700, Eric Dumazet wrote:
> On Mon, 2016-03-28 at 06:29 -0700, Eric Dumazet wrote:
> 
> > Sure, but the caller changed quite a lot in all stable versions.
> > 
> > I took the easiest path for stable maintainers, and was planning to
> > implement a better way in net-next.
> 
> I misread your suggestion David, I send a V2 shortly.
> 

Actually, I'll wait for net-next opening.

My brain farted while working on the 'non refcounted TCP listener'
because __inet_lookup_listener() will really need to be called from an
enclosing rcu_read_lock(), and for that reason I thought the net tree had a bug.

The local_bh_disable()/local_bh_enable() removal can certainly wait
net-next.

Sorry for the confusion.




Re: [PATCH net] inet: add proper locking in __inet{6}_lookup()

2016-03-28 Thread Eric Dumazet
On Mon, 2016-03-28 at 06:29 -0700, Eric Dumazet wrote:

> Sure, but the caller changed quite a lot in all stable versions.
> 
> I took the easiest path for stable maintainers, and was planning to
> implement a better way in net-next.

I misread your suggestion David, I send a V2 shortly.




RE: [PATCH net-next 1/1] qlge: Update version to 1.00.00.35

2016-03-28 Thread Manish Chopra
> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Friday, March 25, 2016 9:12 PM
> To: Manish Chopra 
> Cc: netdev ; Dept-GE Linux NIC Dev  gelinuxnic...@qlogic.com>; Harish Patil 
> Subject: Re: [PATCH net-next 1/1] qlge: Update version to 1.00.00.35
> 
> From: Manish Chopra 
> Date: Fri, 25 Mar 2016 07:14:09 -0400
> 
> > Just updating version as many fixes got accumulated over 1.00.00.34
> >
> > Signed-off-by: Manish Chopra 
> 
> Applied, but it is the 'net' tree that fixes et al. like this should be 
> targetting as
> 'net-next' is closed.

Thanks Dave. By the way, I used to see an announcement that "net-next is
CLOSED" on netdev, which I don't see these days.
I know there are other ways to find this out, but I think it's much easier
and more straightforward to learn it from an announcement email, so that
everybody is more aware of not sending patches for net-next during this
period in the first place.


  



Re: [PATCH net] inet: add proper locking in __inet{6}_lookup()

2016-03-28 Thread Eric Dumazet
On Sun, 2016-03-27 at 22:32 -0400, David Miller wrote:
> From: Eric Dumazet 
> Date: Fri, 25 Mar 2016 15:15:15 -0700
> 
> > From: Eric Dumazet 
> > 
> > Blocking BH in __inet{6}_lookup() is not correct, as the lookups
> > are done using RCU protection.
> > 
> > It matters only starting from Lorenzo Colitti patches to destroy
> > a TCP socket, since rcu_read_lock() is already held by other users
> > of these functions.
> > 
> > This can be backported to all known stable versions, since TCP got RCU
> > lookups back in 2.6.29 : Even if iproute2 contained no code to trigger
> > the bug, some user programs could have used the API.
> > 
> > Signed-off-by: Eric Dumazet 
> > Cc: Lorenzo Colitti 
> 
> This is quite the maze of RCU locking here.  With this change,
> inet_lookup is now:
> 
>   rcu_read_lock();
>   sk = x(); {
>   rcu_read_lock();
>   ...
>   rcu_read_unlock();
>   }
>   if (!sk) {
>   sk = y(); {
>   rcu_read_lock();
>   ...
>   rcu_read_unlock();
>   }
>   }
>   rcu_read_unlock();
> 
> It would seem to me that we should bubble up the rcu locking calls.
> 
> If, as you say, the other users do RCU locking already, then it should
> be safe to do what your patch is doing and also remove the RCU locking
> done by __inet_lookup_established() and __inet_lookup_listener().
> 

Sure, but the caller changed quite a lot in all stable versions.

I took the easiest path for stable maintainers, and was planning to
implement a better way in net-next.

I certainly can take the final form and let you do the backports ?

Thanks.





Re: [PATCH v2] netpoll: Fix extra refcount release in netpoll_cleanup()

2016-03-28 Thread Neil Horman
On Fri, Mar 25, 2016 at 03:16:36PM -0400, David Miller wrote:
> From: Bjorn Helgaas 
> Date: Fri, 25 Mar 2016 11:46:39 -0500
> 
> > You're right, there is an issue here.  I reproduced a problem with a
> > bond device.  bond_netpoll_setup() calls __netpoll_setup() directly
> > (not netpoll_setup()).  I'll debug it more; just wanted to let you
> > know there *is* a problem with this patch.
> 
> I bet that's why the assignment to np->dev and the reference counting
> were separated in the first place :-/
> 
> Indeed, commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb:
> 
> commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb
> Author: Jiri Pirko 
> Date:   Tue Jul 17 05:22:35 2012 +
> 
> netpoll: move np->dev and np->dev_name init into __netpoll_setup()
> 
> Signed-off-by: Jiri Pirko 
> Signed-off-by: David S. Miller 

We probably just want to balance the setting/clearing of np->dev in
__netpoll_setup, so that any error return (that would result in a drop of the
refcount in netpoll_setup) correlates to a setting of np->dev to NULL in
__netpoll_setup. That leaves us with the problem of having to watch for future
imbalances as you mentioned previously Dave, but it seems a potential problem
tomorrow is preferable to an actual problem today.

Another option would be to move the dev_hold/put into __netpoll_setup, but doing
so would I think require some additional refactoring in netpoll_setup.  Namely
that we would still need a dev_hold/put in netpoll_setup to prevent the device
from being removed during the period where we release the rtnl lock in the if
(!netif_running(ndev)) clause. We would have to hold the device, unlock rtnl,
then put the device after re-acquiring rtnl at the end of that if block.
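
A rough sketch of the first option, assuming the real body of
__netpoll_setup() is factored behind a stand-in helper; only the
refcount-relevant lines are shown, not the function's actual structure:

	int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
	{
		int err;

		np->dev = ndev;
		strlcpy(np->dev_name, ndev->name, IFNAMSIZ);

		/* __netpoll_setup_rest() is a stand-in for the real body */
		err = __netpoll_setup_rest(np, ndev);
		if (err) {
			/* Clear np->dev on every failure path so the single
			 * dev_put() in netpoll_setup()'s error path stays
			 * balanced.
			 */
			np->dev = NULL;
			return err;
		}

		return 0;
	}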

Neil



Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Baozeng Ding



On 2016/3/28 10:35, Baozeng Ding wrote:



On 2016/3/28 6:25, Jozsef Kadlecsik wrote:

On Mon, 28 Mar 2016, Jozsef Kadlecsik wrote:


On Sun, 27 Mar 2016, Baozeng Ding wrote:


The following program triggers stack-out-of-bounds in tcp_packet. The
kernel version is 4.5 (on Mar 16 commit
09fd671ccb2475436bd5f597f751ca4a7d177aea).
Uncovered with syzkaller. Thanks.

==
BUG: KASAN: stack-out-of-bounds in tcp_packet+0x4b77/0x51c0 at addr
8800a45df3c8
Read of size 1 by task 0327/11132
page:ea00029177c0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x1fffc00()
page dumped because: kasan: bad access detected
CPU: 1 PID: 11132 Comm: 0327 Tainted: GB 4.5.0+ #12
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
  0001 8800a45df148 82945051 8800a45df1d8
  8800a45df3c8 0027 0001 8800a45df1c8
  81709f88 8800b4f7e3d0 0028 0286
Call Trace:
[< inline >] __dump_stack /kernel/lib/dump_stack.c:15
[] dump_stack+0xb3/0x112 /kernel/lib/dump_stack.c:51
[< inline >] print_address_description 
/kernel/mm/kasan/report.c:150

[] kasan_report_error+0x4f8/0x530
/kernel/mm/kasan/report.c:236
[] ? skb_copy_bits+0x49d/0x6d0
/kernel/net/core/skbuff.c:1675
[< inline >] ? spin_lock_bh 
/kernel/include/linux/spinlock.h:307

[] ? tcp_packet+0x1c9/0x51c0
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:833
[< inline >] kasan_report /kernel/mm/kasan/report.c:259
[] __asan_report_load1_noabort+0x3e/0x40
/kernel/mm/kasan/report.c:277
[< inline >] ? tcp_sack
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
[< inline >] ? tcp_in_window
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
[] ? tcp_packet+0x4b77/0x51c0
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
[< inline >] tcp_sack
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
[< inline >] tcp_in_window
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
[] tcp_packet+0x4b77/0x51c0
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
[] ? memset+0x28/0x30 /kernel/mm/kasan/kasan.c:302
[] ? tcp_new+0x1a4/0xc20
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:1122
[< inline >] ? build_report /kernel/include/net/netlink.h:499
[] ? xfrm_send_report+0x426/0x450
/kernel/net/xfrm/xfrm_user.c:3039
[] ? tcp_new+0xc20/0xc20
/kernel/net/netfilter/nf_conntrack_proto_tcp.c:1169
[] ? init_conntrack+0xca/0x9e0
/kernel/net/netfilter/nf_conntrack_core.c:972
[] ? nf_conntrack_alloc+0x40/0x40
/kernel/net/netfilter/nf_conntrack_core.c:903
[] ? tcp_init_net+0x6e0/0x6e0
/kernel/include/net/netfilter/nf_conntrack_l4proto.h:137
[] ? ipv4_get_l4proto+0x262/0x390
/kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:89
[] ? nf_ct_get_tuple+0xaf/0x190
/kernel/net/netfilter/nf_conntrack_core.c:197
[] nf_conntrack_in+0x8ee/0x1170
/kernel/net/netfilter/nf_conntrack_core.c:1177
[] ? init_conntrack+0x9e0/0x9e0
/kernel/net/netfilter/nf_conntrack_core.c:287
[] ? ipt_do_table+0xa16/0x1260
/kernel/net/ipv4/netfilter/ip_tables.c:423
[] ? trace_hardirqs_on+0xd/0x10
/kernel/kernel/locking/lockdep.c:2635
[] ? __local_bh_enable_ip+0x6b/0xc0
/kernel/kernel/softirq.c:175
[] ? check_entry.isra.4+0x190/0x190
/kernel/net/ipv6/netfilter/ip6_tables.c:594
[] ? ip_reply_glue_bits+0xc0/0xc0
/kernel/net/ipv4/ip_output.c:1530
[] ipv4_conntrack_local+0x14e/0x1a0
/kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:161
[] ? iptable_raw_hook+0x9d/0x1e0
/kernel/net/ipv4/netfilter/iptable_raw.c:32
[] nf_iterate+0x15d/0x230 
/kernel/net/netfilter/core.c:274
[] ? nf_iterate+0x230/0x230 
/kernel/net/netfilter/core.c:268
[] nf_hook_slow+0x1ad/0x310 
/kernel/net/netfilter/core.c:306
[] ? nf_iterate+0x230/0x230 
/kernel/net/netfilter/core.c:268
[] ? nf_iterate+0x230/0x230 
/kernel/net/netfilter/core.c:268

[] ? prandom_u32+0x24/0x30 /kernel/lib/random32.c:83
[] ? ip_idents_reserve+0x9f/0xf0
/kernel/net/ipv4/route.c:484
[< inline >] nf_hook_thresh 
/kernel/include/linux/netfilter.h:187

[< inline >] nf_hook /kernel/include/linux/netfilter.h:197
[] __ip_local_out+0x263/0x3c0
/kernel/net/ipv4/ip_output.c:104
[] ? ip_finish_output+0xd00/0xd00
/kernel/include/net/ip.h:322
[] ? __ip_flush_pending_frames.isra.45+0x2e0/0x2e0
/kernel/net/ipv4/ip_output.c:1337
[] ? __ip_make_skb+0xfe6/0x1610
/kernel/net/ipv4/ip_output.c:1436
[] ip_local_out+0x2d/0x1c0 
/kernel/net/ipv4/ip_output.c:113
[] ip_send_skb+0x3c/0xc0 
/kernel/net/ipv4/ip_output.c:1443

[] ip_push_pending_frames+0x64/0x80
/kernel/net/ipv4/ip_output.c:1463
[< inline >] rcu_read_unlock 
/kernel/include/linux/rcupdate.h:922

[] raw_sendmsg+0x17bb/0x25c0
/kernel/net/ieee802154/socket.c:53
[] ? dst_output+0x190/0x190 
/kernel/include/net/dst.h:492

[< inline >] ? trace_mm_page_alloc
/kernel/include/trace/events/kmem.h:217
[] ? __alloc_pages_nodemask+0x559/0x16b0

[PATCH] net: macb: Only call GPIO functions if there is a valid GPIO

2016-03-28 Thread Charles Keepax
GPIOlib will print warning messages if we call GPIO functions without a
valid GPIO. Change the code to avoid doing so.

Signed-off-by: Charles Keepax 
---
 drivers/net/ethernet/cadence/macb.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 6619178..71bb42e 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -2957,9 +2957,10 @@ static int macb_probe(struct platform_device *pdev)
phy_node =  of_get_next_available_child(np, NULL);
if (phy_node) {
int gpio = of_get_named_gpio(phy_node, "reset-gpios", 0);
-   if (gpio_is_valid(gpio))
+   if (gpio_is_valid(gpio)) {
bp->reset_gpio = gpio_to_desc(gpio);
-   gpiod_direction_output(bp->reset_gpio, 1);
+   gpiod_direction_output(bp->reset_gpio, 1);
+   }
}
of_node_put(phy_node);
 
@@ -3029,7 +3030,8 @@ static int macb_remove(struct platform_device *pdev)
mdiobus_free(bp->mii_bus);
 
/* Shutdown the PHY if there is a GPIO reset */
-   gpiod_set_value(bp->reset_gpio, 0);
+   if (bp->reset_gpio)
+   gpiod_set_value(bp->reset_gpio, 0);
 
unregister_netdev(dev);
clk_disable_unprepare(bp->tx_clk);
-- 
2.1.4


