Re: [net-next] openvswitch: add macro MODULE_ALIAS_VPORT_TYPE for vport type alias

2017-06-03 Thread Pravin Shelar
On Sat, Jun 3, 2017 at 6:47 AM, Zhang Shengju
 wrote:
> Add a new macro MODULE_ALIAS_VPORT_TYPE to unify and simplify the
> declaration of vport type alias, and replace magic numbers with
> symbolic constants.
>
> Signed-off-by: Zhang Shengju 
> ---
>  net/openvswitch/vport-geneve.c | 2 +-
>  net/openvswitch/vport-gre.c| 2 +-
>  net/openvswitch/vport-vxlan.c  | 2 +-
>  net/openvswitch/vport.h| 3 +++
>  4 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
> index 5aaf3ba..1c068d6 100644
> --- a/net/openvswitch/vport-geneve.c
> +++ b/net/openvswitch/vport-geneve.c
> @@ -141,4 +141,4 @@ static void __exit ovs_geneve_tnl_exit(void)
>
>  MODULE_DESCRIPTION("OVS: Geneve switching port");
>  MODULE_LICENSE("GPL");
> -MODULE_ALIAS("vport-type-5");
> +MODULE_ALIAS_VPORT_TYPE(OVS_VPORT_TYPE_GENEVE);
> diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
> index 0e72d95..48a5852 100644
> --- a/net/openvswitch/vport-gre.c
> +++ b/net/openvswitch/vport-gre.c
> @@ -113,4 +113,4 @@ static void __exit ovs_gre_tnl_exit(void)
>
>  MODULE_DESCRIPTION("OVS: GRE switching port");
>  MODULE_LICENSE("GPL");
> -MODULE_ALIAS("vport-type-3");
> +MODULE_ALIAS_VPORT_TYPE(OVS_VPORT_TYPE_GRE);

This is user visible change. For example this is changing the gre
module alias from "vport-type-3" to "vport-type-OVS_VPORT_TYPE_GRE".
This could break userspace application.


Re: [PATCH net] geneve: fix needed_headroom and max_mtu for collect_metadata

2017-06-03 Thread Pravin Shelar
On Fri, Jun 2, 2017 at 11:54 AM, Eric Garver  wrote:
> Since commit 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
> when using COLLECT_METADATA geneve devices are created with too small of
> a needed_headroom and too large of a max_mtu. This is because
> ip_tunnel_info_af() is not valid with the device level info when using
> COLLECT_METADATA and we mistakenly fall into the IPv4 case.
>
> For COLLECT_METADATA, always use the worst case of ipv6 since both
> sockets are created.
>
> Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
> Signed-off-by: Eric Garver 

Acked-by: Pravin B Shelar 

Thanks.


Re: [PATCH v2 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-06-03 Thread Ding Tianhong


On 2017/6/4 2:19, Alexander Duyck wrote:
> On Fri, Jun 2, 2017 at 9:04 PM, Ding Tianhong  wrote:
>> The PCIe Device Control Register use the bit 4 to indicate that
>> whether the device is permitted to enable relaxed ordering or not.
>> But relaxed ordering is not safe for some platform which could only
>> use strong write ordering, so devices are allowed (but not required)
>> to enable relaxed ordering bit by default.
>>
>> If a platform support relaxed ordering but does not enable it by
>> default, enable it in the PCIe configuration. This allows some device
>> to send TLPs with the relaxed ordering attributes set, which may
>> improve the performance.
>>
>> Signed-off-by: Ding Tianhong 
>> ---
>>  drivers/pci/pci.c   | 42 ++
>>  drivers/pci/probe.c | 11 +++
>>  include/linux/pci.h |  3 +++
>>  3 files changed, 56 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index b01bd5b..f57a374 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4878,6 +4878,48 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>>  EXPORT_SYMBOL(pcie_set_mps);
>>
>>  /**
>> + * pcie_set_relaxed_ordering - set PCI Express relexed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible sets relaxed ordering
>> + */
>> +int pcie_set_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
>> PCI_EXP_DEVCTL_RELAX_EN);
>> +}
>> +EXPORT_SYMBOL(pcie_set_relaxed_ordering);
>> +
>> +/**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relexed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, 
>> PCI_EXP_DEVCTL_RELAX_EN);
>> +}
>> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
>> +
>> +/**
>> + * pcie_get_relaxed_ordering - check PCI Express relexed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * Returns true if relaxed ordering is been set
>> + */
>> +int pcie_get_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   u16 v;
>> +
>> +   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
>> +
>> +   return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
>> +}
>> +EXPORT_SYMBOL(pcie_get_relaxed_ordering);
>> +
>> +/**
>> + * pcie_set_mps - set PCI Express maximum payload size
>> +/**
>>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>>   * @dev: PCI device to query
>>   * @speed: storage for minimum speed
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 19c8950..aeb22b5 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1701,6 +1701,16 @@ static void pci_configure_extended_tags(struct 
>> pci_dev *dev)
>>  PCI_EXP_DEVCTL_EXT_TAG);
>>  }
>>
>> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> +   int ret;
>> +
>> +   if (dev && (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))
> 
> So there is a minor issue here. The problem is this is only trying to
> modify relaxed ordering for the device itself. That isn't what we
> want. What we want is to modify it on all of the upstream port
> interfaces where there is something the path to the root complex that
> has an issue. So if the root complex has to set the
> NO_RELAXED_ORDERING flag on a root port, all of the interfaces below
> it that would be pushing traffic toward it should not have the relaxed
> ordering bit set.
> 
> Also I am pretty sure this is a PCIe capability, not a PCI capability.
> You probably need to make sure you code is making this distinction
> which I don't know if it currently is. If you need an example of the
> kind of checks I am suggesting just take a look at
> pcie_configure_mps(). It is verifying the function is PCIe before
> attempting to make any updates. In your case you will probably also
> need to make sure there is a bus for you to walk up the chain of.
> Otherwise this shouldn't apply.
> 

Yes, I miss the upstream ports and the pcie/pci capability, will check
the pcie_configure_mps() again and fix it, thanks.

> 
>> +   pcie_set_relaxed_ordering(dev);
>> +   else
>> +   pcie_clear_relaxed_ordering(dev);
>> +}
> 
> Also I am not a fan of the way this is handled currently. If you don't
> have relaxed ordering set then you don't need to do anything else, if
> you do have it set but there is no bus to walk up you shouldn't change
> it, and if there is a bus to walk up and you find that the root
> complex on that bus has the NO_RELAXED_ORDERING set you should clear
> it. Right now this code seems to be enabling relaxed ordering if the
> NO_RELAXED_ORDERING flag is set.
> 

I'm not very clear about this, I check the PCIe Standard, it said that:

Enable Relaxed Ordering – If this bit is set, the device is

[PATCH net-next v10 0/5] Avoiding stack overflow in skb_to_sgvec

2017-06-03 Thread Jason A. Donenfeld
Changes v9->v10:
   - Spaces to tabs on one line.
   - Added some acked-by, reviewed-by lines.

Since we're down to only cleaning up things like spaces-to-tabs, I
believe we can merge this patch series. David - would you put this in
net-next, please?


The recent bug with macsec and historical one with virtio have
indicated that letting skb_to_sgvec trounce all over an sglist
without checking the length is probably a bad idea. And it's not
necessary either: an sglist already explicitly marks its last
item, and the initialization functions are diligent in doing so.
Thus there's a clear way of avoiding future overflows.

So, this patchset, from a high level, makes skb_to_sgvec return
a potential error code, and then adjusts all callers to check
for the error code. There are two situations in which skb_to_sgvec
might return such an error:

   1) When the passed in sglist is too small; and
   2) When the passed in skbuff is too deeply nested.

So, the first patch in this series handles the issues with
skb_to_sgvec directly, and the remaining ones then handle the call
sites.

Jason A. Donenfeld (5):
  skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
  ipsec: check return value of skb_to_sgvec always
  rxrpc: check return value of skb_to_sgvec always
  macsec: check return value of skb_to_sgvec always
  virtio_net: check return value of skb_to_sgvec always

 drivers/net/macsec.c | 13 --
 drivers/net/virtio_net.c |  9 +--
 include/linux/skbuff.h   |  8 +++---
 net/core/skbuff.c| 65 +++-
 net/ipv4/ah4.c   |  8 --
 net/ipv4/esp4.c  | 20 +--
 net/ipv6/ah6.c   |  8 --
 net/ipv6/esp6.c  | 20 +--
 net/rxrpc/rxkad.c| 19 ++
 9 files changed, 116 insertions(+), 54 deletions(-)

-- 
2.13.0



[PATCH net-next v10 1/5] skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow

2017-06-03 Thread Jason A. Donenfeld
This is a defense-in-depth measure in response to bugs like
4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). There's
not only a potential overflow of sglist items, but also a stack overflow
potential, so we fix this by limiting the amount of recursion this function
is allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

As a small matter of house keeping, we take this opportunity to move the
documentation comment over the actual function the documentation is for.

While this could be implemented by using an explicit stack of skbuffs,
when implementing this, the function complexity increased considerably,
and I don't think such complexity and bloat is actually worth it. So,
instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
and measured the stack usage there. I also reverted the recent MIPS
changes that give it a separate IRQ stack, so that I could experience
some worst-case situations. I found that limiting it to 24 layers deep
yielded a good stack usage with room for safety, as well as being much
deeper than any driver actually ever creates.

Signed-off-by: Jason A. Donenfeld 
Cc: Steffen Klassert 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: David Howells 
Cc: Sabrina Dubroca 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
---
 include/linux/skbuff.h |  8 +++
 net/core/skbuff.c  | 65 --
 2 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 45a59c1e0cc7..d460a4cbda1c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -953,10 +953,10 @@ struct sk_buff *skb_realloc_headroom(struct sk_buff *skb,
 unsigned int headroom);
 struct sk_buff *skb_copy_expand(const struct sk_buff *skb, int newheadroom,
int newtailroom, gfp_t priority);
-int skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist *sg,
-   int offset, int len);
-int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset,
-int len);
+int __must_check skb_to_sgvec_nomark(struct sk_buff *skb, struct scatterlist 
*sg,
+int offset, int len);
+int __must_check skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg,
+ int offset, int len);
 int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer);
 int skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)   consume_skb(a)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 780b7c1563d0..bba33cf4f7cd 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3508,24 +3508,18 @@ void __init skb_init(void)
NULL);
 }
 
-/**
- * skb_to_sgvec - Fill a scatter-gather list from a socket buffer
- * @skb: Socket buffer containing the buffers to be mapped
- * @sg: The scatter-gather list to map into
- * @offset: The offset into the buffer's contents to start mapping
- * @len: Length of buffer space to be mapped
- *
- * Fill the specified scatter-gather list with mappings/pointers into a
- * region of the buffer space attached to a socket buffer.
- */
 static int
-__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int 
len)
+__skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int 
len,
+  unsigned int recursion_level)
 {
int start = skb_headlen(skb);
int i, copy = start - offset;
struct sk_buff *frag_iter;
int elt = 0;
 
+   if (unlikely(recursion_level >= 24))
+   return -EMSGSIZE;
+
if (copy > 0) {
if (copy > len)
copy = len;
@@ -3544,6 +3538,8 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist 
*sg, int offset, int len)
end = start + skb_frag_size(_shinfo(skb)->frags[i]);
if ((copy = end - offset) > 0) {
skb_frag_t *frag = _shinfo(skb)->frags[i];
+   if (unlikely(elt && sg_is_last([elt - 1])))
+   return -EMSGSIZE;
 
if (copy > len)
copy = len;
@@ -3558,16 +3554,22 @@ __skb_to_sgvec(struct sk_buff *skb, struct scatterlist 
*sg, int offset, int len)
}
 
skb_walk_frags(skb, frag_iter) {
-   int end;
+   int end, ret;
 
WARN_ON(start > offset + len);
 
end = start + frag_iter->len;
if ((copy = end - offset) > 0) {
+   if (unlikely(elt && sg_is_last([elt - 1])))
+   return 

[PATCH net-next v10 3/5] rxrpc: check return value of skb_to_sgvec always

2017-06-03 Thread Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld 
Acked-by: David Howells 
---
 net/rxrpc/rxkad.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 1bb9b2ccc267..29fe20ad04aa 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -227,7 +227,9 @@ static int rxkad_secure_packet_encrypt(const struct 
rxrpc_call *call,
len &= ~(call->conn->size_align - 1);
 
sg_init_table(sg, nsg);
-   skb_to_sgvec(skb, sg, 0, len);
+   err = skb_to_sgvec(skb, sg, 0, len);
+   if (unlikely(err < 0))
+   goto out;
skcipher_request_set_crypt(req, sg, sg, len, iv.x);
crypto_skcipher_encrypt(req);
 
@@ -324,7 +326,7 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, 
struct sk_buff *skb,
bool aborted;
u32 data_size, buf;
u16 check;
-   int nsg;
+   int nsg, ret;
 
_enter("");
 
@@ -342,7 +344,9 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, 
struct sk_buff *skb,
goto nomem;
 
sg_init_table(sg, nsg);
-   skb_to_sgvec(skb, sg, offset, 8);
+   ret = skb_to_sgvec(skb, sg, offset, 8);
+   if (unlikely(ret < 0))
+   return ret;
 
/* start the decryption afresh */
memset(, 0, sizeof(iv));
@@ -409,7 +413,7 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, 
struct sk_buff *skb,
bool aborted;
u32 data_size, buf;
u16 check;
-   int nsg;
+   int nsg, ret;
 
_enter(",{%d}", skb->len);
 
@@ -434,7 +438,12 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, 
struct sk_buff *skb,
}
 
sg_init_table(sg, nsg);
-   skb_to_sgvec(skb, sg, offset, len);
+   ret = skb_to_sgvec(skb, sg, offset, len);
+   if (unlikely(ret < 0)) {
+   if (sg != _sg)
+   kfree(sg);
+   return ret;
+   }
 
/* decrypt from the session key */
token = call->conn->params.key->payload.data[0];
-- 
2.13.0



[PATCH net-next v10 2/5] ipsec: check return value of skb_to_sgvec always

2017-06-03 Thread Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld 
Cc: Steffen Klassert 
Cc: Herbert Xu 
Cc: "David S. Miller" 
---
 net/ipv4/ah4.c  |  8 ++--
 net/ipv4/esp4.c | 20 +---
 net/ipv6/ah6.c  |  8 ++--
 net/ipv6/esp6.c | 20 +---
 4 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 22377c8ff14b..e8f862358518 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -220,7 +220,9 @@ static int ah_output(struct xfrm_state *x, struct sk_buff 
*skb)
ah->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
 
sg_init_table(sg, nfrags + sglists);
-   skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   err = skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   if (unlikely(err < 0))
+   goto out_free;
 
if (x->props.flags & XFRM_STATE_ESN) {
/* Attach seqhi sg right after packet payload */
@@ -393,7 +395,9 @@ static int ah_input(struct xfrm_state *x, struct sk_buff 
*skb)
skb_push(skb, ihl);
 
sg_init_table(sg, nfrags + sglists);
-   skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   err = skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   if (unlikely(err < 0))
+   goto out_free;
 
if (x->props.flags & XFRM_STATE_ESN) {
/* Attach seqhi sg right after packet payload */
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 93322f895eab..d815d1755473 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -377,9 +377,11 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff 
*skb, struct esp_info *
esp->esph = esph;
 
sg_init_table(sg, esp->nfrags);
-   skb_to_sgvec(skb, sg,
-(unsigned char *)esph - skb->data,
-assoclen + ivlen + esp->clen + alen);
+   err = skb_to_sgvec(skb, sg,
+  (unsigned char *)esph - skb->data,
+  assoclen + ivlen + esp->clen + alen);
+   if (unlikely(err < 0))
+   goto error;
 
if (!esp->inplace) {
int allocsize;
@@ -403,9 +405,11 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff 
*skb, struct esp_info *
spin_unlock_bh(>lock);
 
sg_init_table(dsg, skb_shinfo(skb)->nr_frags + 1);
-   skb_to_sgvec(skb, dsg,
-(unsigned char *)esph - skb->data,
-assoclen + ivlen + esp->clen + alen);
+   err = skb_to_sgvec(skb, dsg,
+  (unsigned char *)esph - skb->data,
+  assoclen + ivlen + esp->clen + alen);
+   if (unlikely(err < 0))
+   goto error;
}
 
if ((x->props.flags & XFRM_STATE_ESN))
@@ -690,7 +694,9 @@ static int esp_input(struct xfrm_state *x, struct sk_buff 
*skb)
esp_input_set_header(skb, seqhi);
 
sg_init_table(sg, nfrags);
-   skb_to_sgvec(skb, sg, 0, skb->len);
+   err = skb_to_sgvec(skb, sg, 0, skb->len);
+   if (unlikely(err < 0))
+   goto out;
 
skb->ip_summed = CHECKSUM_NONE;
 
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index dda6035e3b84..755f38271dd5 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -423,7 +423,9 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff 
*skb)
ah->seq_no = htonl(XFRM_SKB_CB(skb)->seq.output.low);
 
sg_init_table(sg, nfrags + sglists);
-   skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   err = skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   if (unlikely(err < 0))
+   goto out_free;
 
if (x->props.flags & XFRM_STATE_ESN) {
/* Attach seqhi sg right after packet payload */
@@ -606,7 +608,9 @@ static int ah6_input(struct xfrm_state *x, struct sk_buff 
*skb)
ip6h->hop_limit   = 0;
 
sg_init_table(sg, nfrags + sglists);
-   skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   err = skb_to_sgvec_nomark(skb, sg, 0, skb->len);
+   if (unlikely(err < 0))
+   goto out_free;
 
if (x->props.flags & XFRM_STATE_ESN) {
/* Attach seqhi sg right after packet payload */
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 1fe99ba8066c..2ede4e459c4e 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -346,9 +346,11 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff 
*skb, struct esp_info
esph = esp_output_set_esn(skb, x, ip_esp_hdr(skb), seqhi);
 
sg_init_table(sg, esp->nfrags);
-   skb_to_sgvec(skb, sg,
-(unsigned char *)esph - skb->data,
-assoclen + ivlen + esp->clen + alen);
+   err = skb_to_sgvec(skb, sg,
+  (unsigned char *)esph - skb->data,
+  assoclen + ivlen + esp->clen + alen);
+   if (unlikely(err < 0))
+   goto 

[PATCH net-next v10 4/5] macsec: check return value of skb_to_sgvec always

2017-06-03 Thread Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld 
Cc: Sabrina Dubroca 
---
 drivers/net/macsec.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 91642fd87cd1..b79513b8322f 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -740,7 +740,12 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb,
macsec_fill_iv(iv, secy->sci, pn);
 
sg_init_table(sg, ret);
-   skb_to_sgvec(skb, sg, 0, skb->len);
+   ret = skb_to_sgvec(skb, sg, 0, skb->len);
+   if (unlikely(ret < 0)) {
+   macsec_txsa_put(tx_sa);
+   kfree_skb(skb);
+   return ERR_PTR(ret);
+   }
 
if (tx_sc->encrypt) {
int len = skb->len - macsec_hdr_len(sci_present) -
@@ -947,7 +952,11 @@ static struct sk_buff *macsec_decrypt(struct sk_buff *skb,
macsec_fill_iv(iv, sci, ntohl(hdr->packet_number));
 
sg_init_table(sg, ret);
-   skb_to_sgvec(skb, sg, 0, skb->len);
+   ret = skb_to_sgvec(skb, sg, 0, skb->len);
+   if (unlikely(ret < 0)) {
+   kfree_skb(skb);
+   return ERR_PTR(ret);
+   }
 
if (hdr->tci_an & MACSEC_TCI_E) {
/* confidentiality: ethernet + macsec header
-- 
2.13.0



[PATCH net-next v10 5/5] virtio_net: check return value of skb_to_sgvec always

2017-06-03 Thread Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld 
Reviewed-by: Sergei Shtylyov 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
---
 drivers/net/virtio_net.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3e9246cc49c3..57763d30cabb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1150,7 +1150,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff 
*skb)
struct virtio_net_hdr_mrg_rxbuf *hdr;
const unsigned char *dest = ((struct ethhdr *)skb->data)->h_dest;
struct virtnet_info *vi = sq->vq->vdev->priv;
-   unsigned num_sg;
+   int num_sg;
unsigned hdr_len = vi->hdr_len;
bool can_push;
 
@@ -1177,11 +1177,16 @@ static int xmit_skb(struct send_queue *sq, struct 
sk_buff *skb)
if (can_push) {
__skb_push(skb, hdr_len);
num_sg = skb_to_sgvec(skb, sq->sg, 0, skb->len);
+   if (unlikely(num_sg < 0))
+   return num_sg;
/* Pull header back to avoid skew in tx bytes calculations. */
__skb_pull(skb, hdr_len);
} else {
sg_set_buf(sq->sg, hdr, hdr_len);
-   num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
+   num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
+   if (unlikely(num_sg < 0))
+   return num_sg;
+   num_sg++;
}
return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
 }
-- 
2.13.0



Re: [PATCH net] net: ping: do not abuse udp_poll()

2017-06-03 Thread Lorenzo Colitti
On Sun, Jun 4, 2017 at 1:29 AM, Eric Dumazet  wrote:
> The problem is that ping sockets should not use udp_poll() in the first
> place, and recent changes in UDP stack finally exposed this old bug.

Acked-By: Lorenzo Colitti 
Tested-By: Lorenzo Colitti 


Re: [pull request][for-next 0/6] Mellanox mlx5 updates 2017-05-23

2017-06-03 Thread Saeed Mahameed
On Fri, Jun 2, 2017 at 7:57 PM, Alexei Starovoitov
 wrote:
> On Fri, Jun 02, 2017 at 12:08:39PM -0400, David Miller wrote:
>> From: Alexei Starovoitov 
>> Date: Fri, 2 Jun 2017 09:06:43 -0700
>>
>> > On Fri, Jun 02, 2017 at 06:39:40PM +0300, Leon Romanovsky wrote:
>> >> On Thu, Jun 01, 2017 at 06:57:59PM -0400, Doug Ledford wrote:
>> >> > On Thu, 2017-05-25 at 12:02 -0400, David Miller wrote:
>> >> > > From: Saeed Mahameed 
>> >> > > Date: Tue, 23 May 2017 14:43:58 +0300
>> >> > >
>> >> > > > Hi Dave and Doug,
>> >> > > >
>> >> > > > This series introduces some small updates and FPGA support to the
>> >> > > mlx5
>> >> > > > core/ethernet and IB drivers.
>> >> > > >
>> >> > > > For more details please see below.
>> >> > > >
>> >> > > > Please pull and let me know if there's any problem.
>> >> > >
>> >> > > Ok, I've pulled this into net-next.
>> >> > >
>> >> > > Doug let me know if there are any merge hassles we need to coordinate
>> >> > > on.
>> >> > >
>> >> > > Thanks.
>> >> >
>> >> > Will do.  But is the question of a possible revert settled?  Because
>> >> > once I pull, a revert means I have to pull all the way up to the revert
>> >> > as well...
>> >>
>> >> Revert is unlikely to happen, Saeed is working with Alexey to resolve
>> >> his raised concerns.
>> >
>> > I still think the direction is absolutely wrong.
>> > Reinventing fpga interfaces via nic driver is no-go.
>> > mlx needs to send a patch to revert it to show that they are willing
>> > to listen to the community.
>>
>> "Two people from Facebook" is not the commnity Alexei. :-)
>
> I've read the thread that fpga folks are not excited about
> fpga behind the nic either... but yeah i'm mainly expressing
> my opinions here.
>
>> And ironically, you guys will want to be one of the main users of
>> these facilities for the KTLS offload bits, so the desire to separate
>> it out and make "backports easier" really confuses me.
>
> We will not be using ktls offload via hidden fpga which
> is not properly exposed to the kernel.
> We've learned our lesson with eswitch.
>

KTLS/IPSec or any mlx5 FPGA/Innova applications is yet to be submitted,
and as Ilan said the kernel API is WIP, and we will not push anything
until the kernel API is submitted and accepted.

The patch in question that you are asking to revert, is adding no new
fancy functionality or FPGA capabilities to the mlx5 driver yet.
it is only basic mlx5 core stuff to add support of new device
"Mellanox Innova™ Flex 4 Lx EN Adapter Card" [1], so such device which
is already in production
can work with the mlx5_core.ko as is (ConnectX4-Lx) with no special
offloads or special Innova applications.
so this patch is mandatory for those who have this device to have
basic and standard ConnectX4-Lx networking interfaces.

It also adds the needed code separation between mlx5 basic core and
Innova stuff for future Innova applications and API development
(IPsec,KTLS, etc ..), so it will be easy for you to remove Innova
support at compilation time. And when the time comes, each new part
will be introduced with the suitable kernel API in the right place.

> ktls patches are independent of hw offload. It's great that
> mlx fpga can accelerate ktls tx which is extra confirmation
> that ktls api is on the right track. There is still 'ktls rx'
> to finish and 'ktls rx offload' is even trickier, since it's
> more intrusive into the core networking. Once crypto scatter gather
> is improved, we expect to see 8% perf gain out of sw ktls alone.
> So ktls hw offload is complementary and not mandatory.
>
> Back to eswitch analogy. All I hear from mlx is "at this stage
> it's not approriate ...". Well, I've been asking to do
> 'the right thing' for eswitch for months now and eswitch was
> in the driver for years. What are the chances that this fpga
> will ever be exposed? Once the code is merged there is no
> incentive for the vendor to do it differently.
>

"mlx5 eswitch == ethernet SRIOV" and today we already implement most
-if not all- of SRIOV VF ndos, and all are accessible from
"(ip link show) and (ip set vf xyz) " this is for the Legacy mode.
for the new switchdev mode, eswitch is managed by TC tool and
switchdev APIs only.

So we already have max kernel/user visibility in both modes.

can you clarify what bothers you and what is "the right thing"?
because i think we already agreed on the MLX5_ESWITCH/SRIOV kconfig
directive to eliminate eswitch for those who don't need it.
What exactly is missing ?

[1] 
http://www.mellanox.com/page/products_dyn?product_family=228=programmable_adapter_cards_


Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-06-03 Thread Saeed Mahameed
On Sat, Jun 3, 2017 at 10:37 PM, Or Gerlitz  wrote:
> On Fri, Jun 2, 2017 at 11:22 PM, Jes Sorensen  wrote:
>> On 05/28/2017 02:03 AM, Or Gerlitz wrote:
>>>
>>> On Sun, May 28, 2017 at 5:23 AM, Jes Sorensen 
>>> wrote:

 On 05/27/2017 05:02 PM, Or Gerlitz wrote:
>
>
> On Sat, May 27, 2017 at 12:16 AM, Jes Sorensen 
> wrote:
>>
>>
>> This gets rid of the temporary #ifdef spaghetti and allows the code to
>> compile without offload support enabled.
>>>
>>>
> I am pretty sure we can do that exercise you're up to without any
> spaghetti cooking and even put more code under that CONFIG directive
> (en_rep.c), I'll take that with Saeed.
>>>
>>>
 I want to avoid adding #ifdef CONFIG_foo to the main code in order to
 keep
 it readable. I did it gradually to make sure I didn't break anything and
 to
 allow for it to be bisected in case something did break. If we can move
 out
 more code from places like en_rep.c into eswitch_offload.c and get it
 disabled that way that would be great, but I like to limit the number of
 #ifdefs we add to the actual code.
>>>
>>>
>>> FWIW (see below), squashing your seven patches to one resulted in a
>>> fairly simple/clear
>>> patch, so if we go that way, no need to have seven commits just for this
>>> piece.
>>
>>
>> Squashing patches into jumbo patches is inherently broken and bad coding
>> practice! It makes it way more complicated to debug and bisect in case a
>> minor detail broke in the process.
>
> Not that pure LOC ##-s is the only/deep measurement, but your overall
> changes in the the seven patch series account to:
>
>  5 files changed, 94 insertions(+), 3 deletions(-)
>
> and by no mean this is jumbo or inherently broken and bad coded, so
> please slow down please, I looked with care on the resulted patch and
> said it's basically ok.
>
>
>>> Re SRIOV, I don't think it would be correct to break the support info few
>>> CONFIG directives. If we want to allow someone to build the driver w.o
>>> SRIOV that's fine, but I don't think we should further go down and disable
>>> some of the SRIOV sub-modes.
>
>>> Re TC offload support, that's make sense.
>
>> OK, so disabling SRIOV and disabling TC makes sense - I'll look at that.
>
> I think Saeed wants us to conduct that exercise, let me check with him
> and get back to you

disabling SRIOV in all the kernel can do the trick, but we want
something more flexible, yet simple.
eswitch is required __ONLY__ for SRIOV, in the light of that fact we
can have CONFIG_MLX5_SRIOV which depends on kernel SRIOV
kconfig directive, that will eliminate  MLX5 sriov support (basically
compile out sriov.c, eswitch.c, eswitch_offloads.c and en_rep.c).

and stub out some sriov NDOs and some little API calls from the main
flows to the above mlx5 sub-modules.
Also we will need to compile out some en_tc.c offloads which meant to
program the eswitch they also call the eswitch_offloads API.


Re: [PATCH] virtio_net: lower limit on buffer size

2017-06-03 Thread Sergei Shtylyov

On 06/02/2017 11:25 PM, J. Bruce Fields wrote:


commit d85b758f72b0 "virtio_net: fix support for small rings"


   Commit d85b758f72b0 ("virtio_net: fix support for small rings")


was supposed to increase the buffer size for small rings
but had an unintentional side effect of decreasing
it for large rings. This seems to break some setups -
it's not yet clear why, but increasing buffer size
back to what it was before helps.

Fixes: d85b758f72b0 "virtio_net: fix support for small rings"


Fixes: d85b758f72b0 ("virtio_net: fix support for small rings")


I may be bikeshedding, but, personally I never do the parens--they're
redundant given the quotes, and space is often tight.


   Just see Documetation/process/submitting-patches.rst.

MBR, Sergei



Re: [pull request][for-next 0/6] Mellanox mlx5 updates 2017-05-23

2017-06-03 Thread Or Gerlitz
On Fri, Jun 2, 2017 at 7:57 PM, Alexei Starovoitov
 wrote:
> Back to eswitch analogy. All I hear from mlx is "at this stage
> it's not approriate ...". Well, I've been asking to do
> 'the right thing' for eswitch for months now and eswitch was
> in the driver for years. What are the chances that this fpga
> will ever be exposed? Once the code is merged there is no
> incentive for the vendor to do it differently.

Alexei, can you please clarify the claims re eswitch? are you
referring to the interfaces to program SRIOV eswitch? if not, what?

Or.


Re: [PATCH 7/7] mlx5: Do not build eswitch_offloads if CONFIG_MLX5_EN_ESWITCH_OFFLOADS is set

2017-06-03 Thread Or Gerlitz
On Fri, Jun 2, 2017 at 11:22 PM, Jes Sorensen  wrote:
> On 05/28/2017 02:03 AM, Or Gerlitz wrote:
>>
>> On Sun, May 28, 2017 at 5:23 AM, Jes Sorensen 
>> wrote:
>>>
>>> On 05/27/2017 05:02 PM, Or Gerlitz wrote:


 On Sat, May 27, 2017 at 12:16 AM, Jes Sorensen 
 wrote:
>
>
> This gets rid of the temporary #ifdef spaghetti and allows the code to
> compile without offload support enabled.
>>
>>
 I am pretty sure we can do that exercise you're up to without any
 spaghetti cooking and even put more code under that CONFIG directive
 (en_rep.c), I'll take that with Saeed.
>>
>>
>>> I want to avoid adding #ifdef CONFIG_foo to the main code in order to
>>> keep
>>> it readable. I did it gradually to make sure I didn't break anything and
>>> to
>>> allow for it to be bisected in case something did break. If we can move
>>> out
>>> more code from places like en_rep.c into eswitch_offload.c and get it
>>> disabled that way that would be great, but I like to limit the number of
>>> #ifdefs we add to the actual code.
>>
>>
>> FWIW (see below), squashing your seven patches to one resulted in a
>> fairly simple/clear
>> patch, so if we go that way, no need to have seven commits just for this
>> piece.
>
>
> Squashing patches into jumbo patches is inherently broken and bad coding
> practice! It makes it way more complicated to debug and bisect in case a
> minor detail broke in the process.

Not that pure LOC ##-s is the only/deep measurement, but your overall
changes in the the seven patch series account to:

 5 files changed, 94 insertions(+), 3 deletions(-)

and by no mean this is jumbo or inherently broken and bad coded, so
please slow down please, I looked with care on the resulted patch and
said it's basically ok.


>> Re SRIOV, I don't think it would be correct to break the support info few
>> CONFIG directives. If we want to allow someone to build the driver w.o
>> SRIOV that's fine, but I don't think we should further go down and disable
>> some of the SRIOV sub-modes.

>> Re TC offload support, that's make sense.

> OK, so disabling SRIOV and disabling TC makes sense - I'll look at that.

I think Saeed wants us to conduct that exercise, let me check with him
and get back to you


Re: [PATCH v2 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-06-03 Thread Alexander Duyck
On Fri, Jun 2, 2017 at 9:04 PM, Ding Tianhong  wrote:
> The PCIe Device Control Register use the bit 4 to indicate that
> whether the device is permitted to enable relaxed ordering or not.
> But relaxed ordering is not safe for some platform which could only
> use strong write ordering, so devices are allowed (but not required)
> to enable relaxed ordering bit by default.
>
> If a platform support relaxed ordering but does not enable it by
> default, enable it in the PCIe configuration. This allows some device
> to send TLPs with the relaxed ordering attributes set, which may
> improve the performance.
>
> Signed-off-by: Ding Tianhong 
> ---
>  drivers/pci/pci.c   | 42 ++
>  drivers/pci/probe.c | 11 +++
>  include/linux/pci.h |  3 +++
>  3 files changed, 56 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b01bd5b..f57a374 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4878,6 +4878,48 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>
>  /**
> + * pcie_set_relaxed_ordering - set PCI Express relexed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible sets relaxed ordering
> + */
> +int pcie_set_relaxed_ordering(struct pci_dev *dev)
> +{
> +   return pcie_capability_set_word(dev, PCI_EXP_DEVCTL, 
> PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_set_relaxed_ordering);
> +
> +/**
> + * pcie_clear_relaxed_ordering - clear PCI Express relexed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering
> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> +   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, 
> PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
> +
> +/**
> + * pcie_get_relaxed_ordering - check PCI Express relexed ordering bit
> + * @dev: PCI device to query
> + *
> + * Returns true if relaxed ordering is been set
> + */
> +int pcie_get_relaxed_ordering(struct pci_dev *dev)
> +{
> +   u16 v;
> +
> +   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
> +
> +   return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
> +}
> +EXPORT_SYMBOL(pcie_get_relaxed_ordering);
> +
> +/**
> + * pcie_set_mps - set PCI Express maximum payload size
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 19c8950..aeb22b5 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1701,6 +1701,16 @@ static void pci_configure_extended_tags(struct pci_dev 
> *dev)
>  PCI_EXP_DEVCTL_EXT_TAG);
>  }
>
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> +   int ret;
> +
> +   if (dev && (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))

So there is a minor issue here. The problem is this is only trying to
modify relaxed ordering for the device itself. That isn't what we
want. What we want is to modify it on all of the upstream port
interfaces where there is something the path to the root complex that
has an issue. So if the root complex has to set the
NO_RELAXED_ORDERING flag on a root port, all of the interfaces below
it that would be pushing traffic toward it should not have the relaxed
ordering bit set.

Also I am pretty sure this is a PCIe capability, not a PCI capability.
You probably need to make sure you code is making this distinction
which I don't know if it currently is. If you need an example of the
kind of checks I am suggesting just take a look at
pcie_configure_mps(). It is verifying the function is PCIe before
attempting to make any updates. In your case you will probably also
need to make sure there is a bus for you to walk up the chain of.
Otherwise this shouldn't apply.


> +   pcie_set_relaxed_ordering(dev);
> +   else
> +   pcie_clear_relaxed_ordering(dev);
> +}

Also I am not a fan of the way this is handled currently. If you don't
have relaxed ordering set then you don't need to do anything else, if
you do have it set but there is no bus to walk up you shouldn't change
it, and if there is a bus to walk up and you find that the root
complex on that bus has the NO_RELAXED_ORDERING set you should clear
it. Right now this code seems to be enabling relaxed ordering if the
NO_RELAXED_ORDERING flag is set.

> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
> struct hotplug_params hpp;
> @@ -1708,6 +1718,7 @@ static void pci_configure_device(struct pci_dev *dev)
>
> pci_configure_mps(dev);
> pci_configure_extended_tags(dev);
> +   pci_configure_relaxed_ordering(dev);
>
> memset(, 0, sizeof(hpp));
> ret = pci_get_hp_params(dev, );
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 

[PATCHv2 net-next] net: phy: smsc: Implement PHY statistics

2017-06-03 Thread Andrew Lunn
Most of the PHYs supported by the SMSC driver have a counter of symbol
errors. This is 16 bit wide and wraps around when it reaches its
maximum value.

Signed-off-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
Reviewed-By: Woojung Huh 
---
v2:
Align members
Add Reviewed-by's
---
 drivers/net/phy/smsc.c | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index 67c9f2b26c8e..1b8204be064c 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -25,6 +25,16 @@
 #include 
 #include 
 
+struct smsc_hw_stat {
+   const char *string;
+   u8 reg;
+   u8 bits;
+};
+
+static struct smsc_hw_stat smsc_hw_stats[] = {
+   { "phy_symbol_errors", 26, 16},
+};
+
 struct smsc_phy_priv {
bool energy_enable;
 };
@@ -143,6 +153,48 @@ static int lan87xx_read_status(struct phy_device *phydev)
return err;
 }
 
+static int smsc_get_sset_count(struct phy_device *phydev)
+{
+   return ARRAY_SIZE(smsc_hw_stats);
+}
+
+static void smsc_get_strings(struct phy_device *phydev, u8 *data)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(smsc_hw_stats); i++) {
+   memcpy(data + i * ETH_GSTRING_LEN,
+  smsc_hw_stats[i].string, ETH_GSTRING_LEN);
+   }
+}
+
+#ifndef UINT64_MAX
+#define UINT64_MAX  (u64)(~((u64)0))
+#endif
+static u64 smsc_get_stat(struct phy_device *phydev, int i)
+{
+   struct smsc_hw_stat stat = smsc_hw_stats[i];
+   int val;
+   u64 ret;
+
+   val = phy_read(phydev, stat.reg);
+   if (val < 0)
+   ret = UINT64_MAX;
+   else
+   ret = val;
+
+   return ret;
+}
+
+static void smsc_get_stats(struct phy_device *phydev,
+  struct ethtool_stats *stats, u64 *data)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(smsc_hw_stats); i++)
+   data[i] = smsc_get_stat(phydev, i);
+}
+
 static int smsc_phy_probe(struct phy_device *phydev)
 {
struct device *dev = >mdio.dev;
@@ -206,6 +258,11 @@ static struct phy_driver smsc_phy_driver[] = {
.ack_interrupt  = smsc_phy_ack_interrupt,
.config_intr= smsc_phy_config_intr,
 
+   /* Statistics */
+   .get_sset_count = smsc_get_sset_count,
+   .get_strings= smsc_get_strings,
+   .get_stats  = smsc_get_stats,
+
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -228,6 +285,11 @@ static struct phy_driver smsc_phy_driver[] = {
.ack_interrupt  = smsc_phy_ack_interrupt,
.config_intr= smsc_phy_config_intr,
 
+   /* Statistics */
+   .get_sset_count = smsc_get_sset_count,
+   .get_strings= smsc_get_strings,
+   .get_stats  = smsc_get_stats,
+
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -271,6 +333,11 @@ static struct phy_driver smsc_phy_driver[] = {
.ack_interrupt  = smsc_phy_ack_interrupt,
.config_intr= smsc_phy_config_intr,
 
+   /* Statistics */
+   .get_sset_count = smsc_get_sset_count,
+   .get_strings= smsc_get_strings,
+   .get_stats  = smsc_get_stats,
+
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -293,6 +360,11 @@ static struct phy_driver smsc_phy_driver[] = {
.ack_interrupt  = smsc_phy_ack_interrupt,
.config_intr= smsc_phy_config_intr,
 
+   /* Statistics */
+   .get_sset_count = smsc_get_sset_count,
+   .get_strings= smsc_get_strings,
+   .get_stats  = smsc_get_stats,
+
.suspend= genphy_suspend,
.resume = genphy_resume,
 } };
-- 
2.11.0



Re: [PATCH v3 2/7] net: pch_gbe: Pull PHY GPIO handling out of Minnow code

2017-06-03 Thread Andrew Lunn
On Fri, Jun 02, 2017 at 04:40:37PM -0700, Paul Burton wrote:
> The MIPS Boston development board uses the Intel EG20T Platform
> Controller Hub, including its gigabit ethernet controller, and requires
> that its RTL8211E PHY be reset much like the Minnow platform. Pull the
> PHY reset GPIO handling out of Minnow-specific code such that it can be
> shared by later patches.
> 
> Signed-off-by: Paul Burton 
> Cc: David S. Miller 
> Cc: Eric Dumazet 
> Cc: Jarod Wilson 
> Cc: Tobias Klauser 
> Cc: linux-m...@linux-mips.org
> Cc: netdev@vger.kernel.org
> ---
> 
> Changes in v3:
> - Use adapter->pdata as arg to platform_init, to fix bisectability.
> 
> Changes in v2: None
> 
>  drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h|  4 ++-
>  .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   | 33 
> +++---
>  2 files changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h 
> b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
> index 8d710a3b4db0..de1dd08050f4 100644
> --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
> +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
> @@ -580,15 +580,17 @@ struct pch_gbe_hw_stats {
>  
>  /**
>   * struct pch_gbe_privdata - PCI Device ID driver data
> + * @phy_reset_gpio:  PHY reset GPIO descriptor.
>   * @phy_tx_clk_delay:Bool, configure the PHY TX delay in 
> software
>   * @phy_disable_hibernate:   Bool, disable PHY hibernation
>   * @platform_init:   Platform initialization callback, called from
>   *   probe, prior to PHY initialization.
>   */
>  struct pch_gbe_privdata {
> + struct gpio_desc *phy_reset_gpio;
>   bool phy_tx_clk_delay;
>   bool phy_disable_hibernate;
> - int (*platform_init)(struct pci_dev *pdev);
> + int (*platform_init)(struct pci_dev *, struct pch_gbe_privdata *);
>  };
>  
>  /**
> diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
> b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> index d38198718005..cb9b904786e4 100644
> --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> @@ -360,6 +360,16 @@ static void pch_gbe_mac_mar_set(struct pch_gbe_hw *hw, 
> u8 * addr, u32 index)
>   pch_gbe_wait_clr_bit(>reg->ADDR_MASK, PCH_GBE_BUSY);
>  }
>  
> +static void pch_gbe_phy_set_reset(struct pch_gbe_hw *hw, int value)
> +{
> + struct pch_gbe_adapter *adapter = pch_gbe_hw_to_adapter(hw);
> +
> + if (!adapter->pdata || !adapter->pdata->phy_reset_gpio)
> + return;
> +
> + gpiod_set_value(adapter->pdata->phy_reset_gpio, value);

Hi Paul

Since you are using the gpiod_ API, the core will take notice of the
active low/active high flag when performing this set.

> +}
> +
>  /**
>   * pch_gbe_mac_reset_hw - Reset hardware
>   * @hw:  Pointer to the HW structure
>  
>   ret = devm_gpio_request_one(>dev, gpio, flags,
>   "minnow_phy_reset");
> - if (ret) {
> + if (!ret)
> + pdata->phy_reset_gpio = gpio_to_desc(gpio);

Here however, you are using the gpio_ API, which ignores the active
high/low flag in device tree. And in your binding patch, you give the
example:

+   phy-reset-gpios = <_gpio 6
+  GPIO_ACTIVE_LOW>;

This active low is totally ignored.

I personally would say this is all messed up, and going to result in
problems for somebody with a board which actually needs an
GPIO_ACTIVE_HIGH.

Please use the gpiod_ API through out and respect the flags in the
device tree binding.

   Andrew


Re: [PATCH net] net: dsa: Fix stale cpu_switch reference after unbind then bind

2017-06-03 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> Commit 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
> replaced the use of dst->ds[0] with dst->cpu_switch since that is
> functionally equivalent, however, we can now run into an use after free
> scenario after unbinding then rebinding the switch driver.
>
> The use after free happens because we do correctly initialize
> dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
> unbind the driver: dsa_dst_unapply() is called, and we rebind again.
> dst->cpu_switch now points to a freed "ds" structure, and so when we
> finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.
>
> To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
> which guarantees that we always correctly re-assign dst->cpu_switch in
> dsa_cpu_parse().
>
> Fixes: 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 

Thanks!

Vivien


Re: [PATCH] net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value

2017-06-03 Thread Andy Shevchenko
On Sat, Jun 3, 2017 at 7:35 PM, Colin Ian King  wrote:
> On 03/06/17 16:55, Andy Shevchenko wrote:
>> On Fri, Jun 2, 2017 at 5:58 PM, Colin King  wrote:
>>> The current comparison of entry < 0 will never be true since entry is an
>>> unsigned integer. Cast entry to an int to ensure -ve error return values
>>> from the call to jumbo_frm are correctly being caught.
>>
>>> if (unlikely(is_jumbo) && likely(priv->synopsys_id <
>>>  DWMAC_CORE_4_00)) {
>>> entry = priv->hw->mode->jumbo_frm(tx_q, skb, 
>>> csum_insertion);
>>> -   if (unlikely(entry < 0))
>>> +   if (unlikely((int)entry < 0))
>>
>> It feels like a hiding some other issue.
>>
>
> The alternative is:
>
> int rc = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
> if (unlikely(rc < 0))
> goto dma_map_err;
>
> entry = rc;

Looks cleaner, though I was thinking about changing type of entry as
Julia suggested. Needs to be checked carefully anyway.

> however, that is effectively the same. The cast I'm using is a well used
> idiom in the kernel, it used in almost a hundred similar cases.

> git grep "< 0" | grep "(int)" | wc -l
> 95

With refined grep I'e got only 56 so far. And I see it's spread only
in 45 modules, most cases are single use cases.
So, I hardly can tell this is an idiom in kernel.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value

2017-06-03 Thread Julia Lawall


On Sat, 3 Jun 2017, Colin Ian King wrote:

> On 03/06/17 16:55, Andy Shevchenko wrote:
> > On Fri, Jun 2, 2017 at 5:58 PM, Colin King  wrote:
> >> The current comparison of entry < 0 will never be true since entry is an
> >> unsigned integer. Cast entry to an int to ensure -ve error return values
> >> from the call to jumbo_frm are correctly being caught.
> >
> >> if (unlikely(is_jumbo) && likely(priv->synopsys_id <
> >>  DWMAC_CORE_4_00)) {
> >> entry = priv->hw->mode->jumbo_frm(tx_q, skb, 
> >> csum_insertion);
> >> -   if (unlikely(entry < 0))
> >> +   if (unlikely((int)entry < 0))
> >
> > It feels like a hiding some other issue.
> >
>
> The alternative is:
>
>   int rc = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
>   if (unlikely(rc < 0))
>   goto dma_map_err;
>
>   entry = rc;
>
> however, that is effectively the same. The cast I'm using is a well used
> idiom in the kernel, it used in almost a hundred similar cases.
>
> git grep "< 0" | grep "(int)" | wc -l
> 95

Does entry really have to be unsigned?  The jumbo_frm function returns an
int, not an unsigned int, so it seems unpleasant to make it unsigned
prematurely just to put a cast afterwards.  The remaining computation
seems to involve only small numbers.

julia


Re: [PATCH] net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value

2017-06-03 Thread Colin Ian King
On 03/06/17 16:55, Andy Shevchenko wrote:
> On Fri, Jun 2, 2017 at 5:58 PM, Colin King  wrote:
>> The current comparison of entry < 0 will never be true since entry is an
>> unsigned integer. Cast entry to an int to ensure -ve error return values
>> from the call to jumbo_frm are correctly being caught.
> 
>> if (unlikely(is_jumbo) && likely(priv->synopsys_id <
>>  DWMAC_CORE_4_00)) {
>> entry = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
>> -   if (unlikely(entry < 0))
>> +   if (unlikely((int)entry < 0))
> 
> It feels like a hiding some other issue.
> 

The alternative is:

int rc = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
if (unlikely(rc < 0))
goto dma_map_err;

entry = rc;

however, that is effectively the same. The cast I'm using is a well used
idiom in the kernel, it used in almost a hundred similar cases.

git grep "< 0" | grep "(int)" | wc -l
95

Colin


[PATCH net] net: ping: do not abuse udp_poll()

2017-06-03 Thread Eric Dumazet
From: Eric Dumazet 

Alexander reported various KASAN messages triggered in recent kernels 

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet 
Reported-by: Sasha Levin 
Cc: Solar Designer 
Cc: Vasiliy Kulikov 
Cc: Lorenzo Colitti 
---
 include/net/ipv6.h |1 +
 net/ipv4/af_inet.c |2 +-
 net/ipv6/ping.c|2 +-
 net/ipv6/raw.c |2 +-
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 
dbf0abba33b8da21be05abf6e719f69542da80fc..3e505bbff8ca4a41f8d39fefcd59aa01b85424f4
 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1007,6 +1007,7 @@ int inet6_hash_connect(struct inet_timewait_death_row 
*death_row,
  */
 extern const struct proto_ops inet6_stream_ops;
 extern const struct proto_ops inet6_dgram_ops;
+extern const struct proto_ops inet6_sockraw_ops;
 
 struct group_source_req;
 struct group_filter;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 
f3dad16613437c0c7ac3e9c7518a0929cddb3ca7..58925b6597de83e7d643fb9b1c7e992c9748ae1c
 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1043,7 +1043,7 @@ static struct inet_protosw inetsw_array[] =
.type =   SOCK_DGRAM,
.protocol =   IPPROTO_ICMP,
.prot =   _prot,
-   .ops =_dgram_ops,
+   .ops =_sockraw_ops,
.flags =  INET_PROTOSW_REUSE,
},
 
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 
9b522fa90e6d8f4a87ebed7cf574a36ceea89c61..ac826dd338ff0825eaf0d2d74cee92d008e018bb
 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -192,7 +192,7 @@ static struct inet_protosw pingv6_protosw = {
.type =  SOCK_DGRAM,
.protocol =  IPPROTO_ICMPV6,
.prot =  _prot,
-   .ops =   _dgram_ops,
+   .ops =   _sockraw_ops,
.flags = INET_PROTOSW_REUSE,
 };
 
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 
1f992d9e261d8b75226659a4cead95f8dc04dc4f..60be012fe7085cc7a199e84333cef5ee95ed1f04
 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1338,7 +1338,7 @@ void raw6_proc_exit(void)
 #endif /* CONFIG_PROC_FS */
 
 /* Same as inet6_dgram_ops, sans udp_poll.  */
-static const struct proto_ops inet6_sockraw_ops = {
+const struct proto_ops inet6_sockraw_ops = {
.family= PF_INET6,
.owner = THIS_MODULE,
.release   = inet6_release,




Re: [PATCH] net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value

2017-06-03 Thread Andy Shevchenko
On Fri, Jun 2, 2017 at 5:58 PM, Colin King  wrote:
> The current comparison of entry < 0 will never be true since entry is an
> unsigned integer. Cast entry to an int to ensure -ve error return values
> from the call to jumbo_frm are correctly being caught.

> if (unlikely(is_jumbo) && likely(priv->synopsys_id <
>  DWMAC_CORE_4_00)) {
> entry = priv->hw->mode->jumbo_frm(tx_q, skb, csum_insertion);
> -   if (unlikely(entry < 0))
> +   if (unlikely((int)entry < 0))

It feels like a hiding some other issue.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] brcmfmac: Fix glob_skb leak in brcmf_sdiod_recv_chain

2017-06-03 Thread Andy Shevchenko
On Fri, Jun 2, 2017 at 9:52 PM, Franky Lin  wrote:
> On Fri, Jun 2, 2017 at 10:22 AM, Peter S. Housel  wrote:

>> err = brcmf_sdiod_buffrw(sdiodev, SDIO_FUNC_2, false, addr,
>>  glom_skb);
>> -   if (err) {
>> -   brcmu_pkt_buf_free_skb(glom_skb);
>> -   goto done;
>> -   }

> What about
> if (!err) {
> skb_queue_walk(pktq, skb) {
> memcpy(skb->data, glom_skb->data, skb->len);
> skb_pull(glom_skb, skb->len);
> }
> }
> brcmu_pkt_buf_free_skb(glom_skb);
>
> Then no goto is needed.

For my point of view it has two subtle inconveniences:
1. Not so usual pattern in use if (!ret)
2. Less error prone in case someone decides to expand the code and
missed ! or something else there.

Since both makes an approach less error prone I wouldn't suggest doing
that as I commented in new version.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain

2017-06-03 Thread Andy Shevchenko
On Sat, Jun 3, 2017 at 1:29 AM, Peter S. Housel  wrote:
> An earlier change to this function (3bdae810721b) fixed a leak in the
> case of an unsuccessful call to brcmf_sdiod_buffrw(). However, the
> glom_skb buffer, used for emulating a scattering read, is never used
> or referenced after its contents are copied into the destination
> buffers, and therefore always needs to be freed by the end of the
> function.

> err = brcmf_sdiod_buffrw(sdiodev, SDIO_FUNC_2, false, addr,
>  glom_skb);
> -   if (err) {
> -   brcmu_pkt_buf_free_skb(glom_skb);
> -   goto done;
> -   }
> -
> -   skb_queue_walk(pktq, skb) {
> -   memcpy(skb->data, glom_skb->data, skb->len);
> -   skb_pull(glom_skb, skb->len);

> +   if (!err) {

This is not so often in use type of pattern.

> +   skb_queue_walk(pktq, skb) {
> +   memcpy(skb->data, glom_skb->data, skb->len);
> +   skb_pull(glom_skb, skb->len);
> +   }
> }

> +   brcmu_pkt_buf_free_skb(glom_skb);

Can we just add this one line instead or I'm missing something?

> } else
> err = brcmf_sdiod_sglist_rw(sdiodev, SDIO_FUNC_2, false, addr,
> pktq);



-- 
With Best Regards,
Andy Shevchenko


Re: net: icmp vs udp_poll race?

2017-06-03 Thread Eric Dumazet
On Fri, Jun 2, 2017 at 10:17 PM, Levin, Alexander (Sasha Levin)
 wrote:
> Hi all,
>
> On the latest linux-next I'm seeing issues that look like an icmp
> socket destruction racing with poll(). It manifests in two ways, first:
>
> BUG: KASAN: slab-out-of-bounds in skb_queue_empty include/linux/skbuff.h:1197 
> [inline]
> BUG: KASAN: slab-out-of-bounds in udp_poll+0x5fb/0x6f0 net/ipv4/udp.c:2443
> Read of size 8 at addr 88006941a200 by task syz-executor5/9052
>
> CPU: 2 PID: 9052 Comm: syz-executor5 Not tainted 4.12.0-rc3-next-20170601+ #47
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 
> 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x115/0x1d1 lib/dump_stack.c:52
>  print_address_description+0xe7/0x370 mm/kasan/report.c:252
>  kasan_report_error mm/kasan/report.c:351 [inline]
>  kasan_report+0x1b0/0x450 mm/kasan/report.c:408
>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:429
>  skb_queue_empty include/linux/skbuff.h:1197 [inline]
>  udp_poll+0x5fb/0x6f0 net/ipv4/udp.c:2443
>  sock_poll+0x169/0x410 net/socket.c:1101
>  do_pollfd fs/select.c:825 [inline]
>  do_poll fs/select.c:875 [inline]
>  do_sys_poll+0x7a7/0x13b0 fs/select.c:969
>  SYSC_poll fs/select.c:1027 [inline]
>  SyS_poll+0x106/0x460 fs/select.c:1015
>  do_syscall_64+0x275/0x810 arch/x86/entry/common.c:284
>  entry_SYSCALL64_slow_path+0x25/0x25
> RIP: 0033:0x451429
> RSP: 002b:7fee2df0dc08 EFLAGS: 0216 ORIG_RAX: 0007
> RAX: ffda RBX: 2fb0 RCX: 00451429
> RDX: 001f RSI: 000a RDI: 2fb0
> RBP: 00718000 R08:  R09: 
> R10:  R11: 0216 R12: 
> R13: 000a R14: 03c4 R15: 7fee2df0e700
>
> Allocated by task 9052:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_kmalloc+0xae/0xe0 mm/kasan/kasan.c:617
>  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:555
>  slab_post_alloc_hook mm/slab.h:456 [inline]
>  slab_alloc_node mm/slub.c:2712 [inline]
>  slab_alloc mm/slub.c:2720 [inline]
>  kmem_cache_alloc+0x12f/0x610 mm/slub.c:2725
>  sk_prot_alloc+0x6e/0x300 net/core/sock.c:1422
>  sk_alloc+0x82/0x880 net/core/sock.c:1484
>  inet_create+0x519/0x11b0 net/ipv4/af_inet.c:318
>  __sock_create+0x52e/0xa50 net/socket.c:1249
>  sock_create net/socket.c:1289 [inline]
>  SYSC_socket net/socket.c:1319 [inline]
>  SyS_socket+0x105/0x260 net/socket.c:1299
>  do_syscall_64+0x275/0x810 arch/x86/entry/common.c:284
>  return_from_SYSCALL_64+0x0/0x7a
>
> Freed by task 8076:
>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>  set_track mm/kasan/kasan.c:525 [inline]
>  kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:590
>  slab_free_hook mm/slub.c:1357 [inline]
>  slab_free_freelist_hook mm/slub.c:1379 [inline]
>  slab_free mm/slub.c:2955 [inline]
>  kmem_cache_free+0xec/0x630 mm/slub.c:2977
>  sk_prot_free net/core/sock.c:1465 [inline]
>  __sk_destruct+0x6a1/0xb40 net/core/sock.c:1546
>  sk_destruct+0x57/0xb0 net/core/sock.c:1554
>  __sk_free+0x62/0x260 net/core/sock.c:1562
>  sk_free+0x28/0x40 net/core/sock.c:1573
>  sock_put include/net/sock.h:1655 [inline]
>  sk_common_release+0x241/0x3c0 net/core/sock.c:2902
>  ping_close+0x15/0x20 net/ipv4/ping.c:295
>  inet_release+0x108/0x240 net/ipv4/af_inet.c:425
>  sock_release+0x96/0x260 net/socket.c:597
>  SYSC_socketpair net/socket.c:1436 [inline]
>  SyS_socketpair+0x522/0x710 net/socket.c:1340
>  do_syscall_64+0x275/0x810 arch/x86/entry/common.c:284
>  return_from_SYSCALL_64+0x0/0x7a
>
> The buggy address belongs to the object at 880069419c40
>  which belongs to the cache PING of size 1392
> The buggy address is located 80 bytes to the right of
>  1392-byte region [880069419c40, 88006941a1b0)
> The buggy address belongs to the page:
> page:ea0001a50600 count:1 mapcount:0 mapping:  (null) 
> index:0x88006941d440 compound_mapcount: 0
> flags: 0x5fffc008100(slab|head)
> raw: 05fffc008100  88006941d440 000100120005
> raw: 88006c5ba490 88006c5ba490 88006b197c40 
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>  88006941a100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  88006941a180: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
>>88006941a200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>^
>  88006941a280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  88006941a300: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>
> And second:
>
> INFO: trying to register non-static key.
> the code is fine but needs lockdep annotation.
> turning off the locking correctness validator.
> CPU: 3 PID: 12664 

[net-next] openvswitch: add macro MODULE_ALIAS_VPORT_TYPE for vport type alias

2017-06-03 Thread Zhang Shengju
Add a new macro MODULE_ALIAS_VPORT_TYPE to unify and simplify the
declaration of vport type alias, and replace magic numbers with
symbolic constants.

Signed-off-by: Zhang Shengju 
---
 net/openvswitch/vport-geneve.c | 2 +-
 net/openvswitch/vport-gre.c| 2 +-
 net/openvswitch/vport-vxlan.c  | 2 +-
 net/openvswitch/vport.h| 3 +++
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 5aaf3ba..1c068d6 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -141,4 +141,4 @@ static void __exit ovs_geneve_tnl_exit(void)
 
 MODULE_DESCRIPTION("OVS: Geneve switching port");
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("vport-type-5");
+MODULE_ALIAS_VPORT_TYPE(OVS_VPORT_TYPE_GENEVE);
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 0e72d95..48a5852 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -113,4 +113,4 @@ static void __exit ovs_gre_tnl_exit(void)
 
 MODULE_DESCRIPTION("OVS: GRE switching port");
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("vport-type-3");
+MODULE_ALIAS_VPORT_TYPE(OVS_VPORT_TYPE_GRE);
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 869acb3..b6257bb 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -179,4 +179,4 @@ static void __exit ovs_vxlan_tnl_exit(void)
 
 MODULE_DESCRIPTION("OVS: VXLAN switching port");
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("vport-type-4");
+MODULE_ALIAS_VPORT_TYPE(OVS_VPORT_TYPE_VXLAN);
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index cda66c2..1d1584f 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -199,4 +199,7 @@ static inline const char *ovs_vport_name(struct vport 
*vport)
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto);
 
+#define MODULE_ALIAS_VPORT_TYPE(type) \
+   MODULE_ALIAS("vport-type-" __stringify(type))
+
 #endif /* vport.h */
-- 
1.8.3.1





[PATCH] net: Update TCP congestion control documentation

2017-06-03 Thread Anmol Sarma
Update tcp.txt to fix mandatory congestion control ops and default
CCA selection. Also, fix comment in tcp.h for undo_cwnd.

Signed-off-by: Anmol Sarma 
---
 Documentation/networking/tcp.txt | 31 +--
 include/net/tcp.h|  2 +-
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/Documentation/networking/tcp.txt b/Documentation/networking/tcp.txt
index bdc4c0d..9c7139d 100644
--- a/Documentation/networking/tcp.txt
+++ b/Documentation/networking/tcp.txt
@@ -1,7 +1,7 @@
 TCP protocol
 
 
-Last updated: 9 February 2008
+Last updated: 3 June 2017
 
 Contents
 
@@ -29,18 +29,19 @@ As of 2.6.13, Linux supports pluggable congestion control 
algorithms.
 A congestion control mechanism can be registered through functions in
 tcp_cong.c. The functions used by the congestion control mechanism are
 registered via passing a tcp_congestion_ops struct to
-tcp_register_congestion_control. As a minimum name, ssthresh,
-cong_avoid must be valid.
+tcp_register_congestion_control. As a minimum, the congestion control
+mechanism must provide a valid name and must implement either ssthresh,
+cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook.
 
 Private data for a congestion control mechanism is stored in tp->ca_priv.
 tcp_ca(tp) returns a pointer to this space.  This is preallocated space - it
 is important to check the size of your private data will fit this space, or
-alternatively space could be allocated elsewhere and a pointer to it could
+alternatively, space could be allocated elsewhere and a pointer to it could
 be stored here.
 
 There are three kinds of congestion control algorithms currently: The
 simplest ones are derived from TCP reno (highspeed, scalable) and just
-provide an alternative the congestion window calculation. More complex
+provide an alternative congestion window calculation. More complex
 ones like BIC try to look at other events to provide better
 heuristics.  There are also round trip time based algorithms like
 Vegas and Westwood+.
@@ -49,21 +50,15 @@ Good TCP congestion control is a complex problem because 
the algorithm
 needs to maintain fairness and performance. Please review current
 research and RFC's before developing new modules.
 
-The method that is used to determine which congestion control mechanism is
-determined by the setting of the sysctl net.ipv4.tcp_congestion_control.
-The default congestion control will be the last one registered (LIFO);
-so if you built everything as modules, the default will be reno. If you
-build with the defaults from Kconfig, then CUBIC will be builtin (not a
-module) and it will end up the default.
+The default congestion control mechanism is chosen based on the
+DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default
+value then you can set it using sysctl net.ipv4.tcp_congestion_control. The
+module will be autoloaded if needed and you will get the expected protocol. If
+you ask for an unknown congestion method, then the sysctl attempt will fail.
 
-If you really want a particular default value then you will need
-to set it with the sysctl.  If you use a sysctl, the module will be autoloaded
-if needed and you will get the expected protocol. If you ask for an
-unknown congestion method, then the sysctl attempt will fail.
-
-If you remove a tcp congestion control module, then you will get the next
+If you remove a TCP congestion control module, then you will get the next
 available one. Since reno cannot be built as a module, and cannot be
-deleted, it will always be available.
+removed, it will always be available.
 
 How the new TCP output machine [nyi] works.
 ===
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 82462db..28b577a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -969,7 +969,7 @@ struct tcp_congestion_ops {
void (*cwnd_event)(struct sock *sk, enum tcp_ca_event ev);
/* call when ack arrives (optional) */
void (*in_ack_event)(struct sock *sk, u32 flags);
-   /* new value of cwnd after loss (optional) */
+   /* new value of cwnd after loss (required) */
u32  (*undo_cwnd)(struct sock *sk);
/* hook for packet ack accounting (optional) */
void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample);
-- 
2.7.4



Re: loosing netdevices with namespaces and unshare?

2017-06-03 Thread Eric W. Biederman
Cong Wang  writes:

> On Wed, May 31, 2017 at 11:32 PM, Eric W. Biederman
>  wrote:
>> Cong Wang  writes:
>>> Network namespace does not special-case the physical devices,
>>> it treats them all equally as abstract net devices.
>>
>> Absolutely not true.
>>
>> The relevant code is in net/core/dev.c:default_device_exit
>>
>> If a network device does not implement rntl_link_ops it is returned to
>> the initial network namespace.   Anything else will loose physical
>> devices.
>
> Hmm, I never noticed that if check...
>
>>
>> Only for pure software based devices do we delete them.  Perhaps your
>> sub interface implements rtnl_link_ops?  Either that or something is
>> still holding a reference to your network namespace, which would prevent
>> the network device from being returned.
>>
>
> But this simply sucks:
>
> snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
> err = dev_change_net_namespace(dev, _net, fb_name);
> if (err) {
> pr_emerg("%s: failed to move %s to init_net: %d\n",
>  __func__, dev->name, err);
> BUG();
> }
>
> It is essentially hard to handle the error here, but it is quite easy to
> trigger such BUG() by naming other device devX, it is no better
> than just losing it.

The rename only happens if there is a conflicting device name.

Beyond that there is the entire hotplug functionality so it should be
possible to automatically detect a new device in your network namespace
and do something with it.

Eric


Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-03 Thread Nathan Royce
On Sat, Jun 3, 2017 at 2:57 AM, Oleksij Rempel  wrote:
> Hm... this function and file:
> linux/drivers/net/wireless/ath/ath9k/common-beacon.c
> didn't changed since 2015. So, it should be some thing different.
> Can you run
> git bisect to find exact patch caused this regression?
>
That was the first time I experienced the x/0 issue and don't know how
I'd reproduce it.

> Yes, this is "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we filed to read PCI bus first time, we
> have choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
> and provide an kernel "firmware panic!" message.
> Every one who can or will to fix this, is welcome.
>
Thanks for that explanation. I'm not sure it's something I could
tackle though. My best bet in the meantime is to coax systemd to
restart the services when the device pops up. However, every attempt
has failed so far.

> It is possible. If adapter is used in AP mode, then lots of WiFi noise
> is dumped over this interface. I assume the reproducibility depends on
> external environment, not internal.
>
I find this quite believable. I have 2.4ghz happening with the
TP-Link, ZTE Mobley, bluetooth, logitech unifying, usb 3.0. Though all
4 devices are going through the USB 2.0 port, and the tp-link and
mobley have long usb cables in the attic and the hub connects through
a 2m usb extension. So yeah, I've got a lot of variables in play.


Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-03 Thread Oleksij Rempel
Hi,

Am 03.06.2017 um 00:02 schrieb Nathan Royce:
> ODroid XU4
> 
> $ uname -a
> Linux computer 4.12.0-rc3-dirty #1 SMP Wed May 31 15:02:05 CDT 2017
> armv7l GNU/Linux
> 
> $ lsusb
> ...
> Bus 001 Device 002: ID 2109:2813 VIA Labs, Inc.
> Bus 001 Device 010: ID 0cf3:7015 Qualcomm Atheros Communications
> TP-Link TL-WN821N v3 / TL-WN822N v2 802.11n [Atheros AR7010+AR9287]
> ...
> 
> *
> Jun 02 16:20:11 computer hostapd[14954]: vwlan0: interface state
> COUNTRY_UPDATE->HT_SCAN
> Jun 02 16:20:17 computer hostapd[14954]: 20/40 MHz operation not
> permitted on channel pri=7 sec=3 based on overlapping BSSes
> Jun 02 16:20:18 computer kernel: Division by zero in kernel.
> Jun 02 16:20:18 computer kernel: CPU: 1 PID: 14507 Comm: kworker/u16:2
> Tainted: GW   4.12.0-rc3-dirty #1
> Jun 02 16:20:18 computer kernel: Hardware name: SAMSUNG EXYNOS
> (Flattened Device Tree)
> Jun 02 16:20:18 computer kernel: Workqueue: phy5 ieee80211_scan_work 
> [mac80211]
> Jun 02 16:20:18 computer kernel: [] (unwind_backtrace) from
> [] (show_stack+0x10/0x14)
> Jun 02 16:20:18 computer kernel: [] (show_stack) from
> [] (dump_stack+0x88/0x9c)
> Jun 02 16:20:18 computer kernel: [] (dump_stack) from
> [] (Ldiv0_64+0x8/0x18)
> Jun 02 16:20:18 computer kernel: [] (Ldiv0_64) from
> [] (ath9k_get_next_tbtt+0x58/0x5c [ath9k_common])

Hm... this function and file:
linux/drivers/net/wireless/ath/ath9k/common-beacon.c
didn't changed since 2015. So, it should be some thing different.
Can you run
git bisect to find exact patch caused this regression?

> Jun 02 16:20:18 computer kernel: [] (ath9k_get_next_tbtt
> [ath9k_common]) from [] (ath9k_cmn_beacon_config
> Jun 02 16:20:18 computer kernel: []
> (ath9k_cmn_beacon_config_ap [ath9k_common]) from []
> (ath9k_htc_beacon
> Jun 02 16:20:18 computer kernel: []
> (ath9k_htc_beacon_config_ap [ath9k_htc]) from []
> (ath9k_htc_vif_recon
> Jun 02 16:20:18 computer kernel: [] (ath9k_htc_vif_reconfig
> [ath9k_htc]) from [] (ath9k_htc_sw_scan_compl
> Jun 02 16:20:18 computer kernel: []
> (ath9k_htc_sw_scan_complete [ath9k_htc]) from []
> (__ieee80211_scan_co
> Jun 02 16:20:18 computer kernel: []
> (__ieee80211_scan_completed [mac80211]) from []
> (ieee80211_scan_work+
> Jun 02 16:20:18 computer kernel: [] (ieee80211_scan_work
> [mac80211]) from [] (process_one_work+0x1d8/0x40
> Jun 02 16:20:18 computer kernel: [] (process_one_work) from
> [] (worker_thread+0x4c/0x564)
> Jun 02 16:20:18 computer kernel: [] (worker_thread) from
> [] (kthread+0x14c/0x154)
> Jun 02 16:20:18 computer kernel: [] (kthread) from
> [] (ret_from_fork+0x14/0x3c)
> Jun 02 16:20:18 computer hostapd[14954]: Using interface wlan0 with
> hwaddr  and ssid ""
> Jun 02 16:20:18 computer kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
> vwlan0: link becomes ready
> *
> This is a new one on me.
> 
> The "normal" problem (search shows to be a very old issue) I
> consistently (daily or multiple times/day) encounter is:

Yes, this is "normal" problem. The firmware has no error handler for PCI
bus related exceptions. So if we filed to read PCI bus first time, we
have choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
and provide an kernel "firmware panic!" message.
Every one who can or will to fix this, is welcome.

> *
> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
> Jun 02 14:55:30 computer kernel: usb 1-1.1: USB disconnect, device number 9
> Jun 02 14:55:30 computer systemd-networkd[11959]: vwlan0: Lost carrier
> Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state
> Jun 02 14:55:30 computer kernel: wlan0: deauthenticating from
>  by local choice (Reason: 3=DEAUTH_LEAVING)
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer systemd-networkd[11959]: wlan0: Lost carrier
> Jun 02 14:55:30 computer systemd[1]: Stopping A simple WPA encrypted
> wireless connection using a static IP...
> -- Subject: Unit netctl@wlan0.service has begun shutting down
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit netctl@wlan0.service has begun shutting down.
> Jun 02 14:55:30 computer kernel: device vwlan0 left promiscuous mode
> Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state
> Jun 02 14:55:30 computer audit: ANOM_PROMISCUOUS dev=vwlan0 prom=0
> old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
> Jun 02 14:55:30 computer hostapd[13218]: vwlan0: AP-STA-DISCONNECTED 
> 
> Jun 02 14:55:30 computer hostapd[13218]: Failed to set beacon parameters
> Jun 02 14:55:30 computer hostapd[13218]: vwlan0: INTERFACE-DISABLED
> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath9k_htc: USB layer 

[PATCH] net: pch_gbe: fix err_cast.cocci warnings

2017-06-03 Thread kbuild test robot
drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:2587:9-16: WARNING: 
ERR_CAST can be used with gpio


 Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))

Generated by: scripts/coccinelle/api/err_cast.cocci

CC: Paul Burton 
Signed-off-by: Fengguang Wu 
---

 pch_gbe_main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -2584,7 +2584,7 @@ pch_gbe_get_priv(struct pci_dev *pdev, c
if (!IS_ERR(gpio))
pdata->phy_reset_gpio = gpio;
else if (PTR_ERR(gpio) != -ENOENT)
-   return ERR_PTR(PTR_ERR(gpio));
+   return ERR_CAST(gpio);
 
return pdata;
 }


Re: [PATCH v3 4/7] net: pch_gbe: Add device tree support

2017-06-03 Thread kbuild test robot
Hi Paul,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.12-rc3 next-20170602]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Paul-Burton/net-pch_gbe-Fixes-MIPS-support/20170603-112042


coccinelle warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c:2587:9-16: WARNING: 
>> ERR_CAST can be used with gpio

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH v2 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-06-03 Thread kbuild test robot
Hi Ding,

[auto build test WARNING on pci/next]
[cannot apply to v4.12-rc3 next-20170602]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Ding-Tianhong/Add-new-PCI_DEV_FLAGS_NO_RELAXED_ORDERING-flag/20170603-132448
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-x019-201722 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

>> drivers//pci/pci.c:4922:1: warning: "/*" within comment [-Wcomment]
/**
 

vim +4922 drivers//pci/pci.c

  4906   * @dev: PCI device to query
  4907   *
  4908   * Returns true if relaxed ordering is been set
  4909   */
  4910  int pcie_get_relaxed_ordering(struct pci_dev *dev)
  4911  {
  4912  u16 v;
  4913  
  4914  pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
  4915  
  4916  return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
  4917  }
  4918  EXPORT_SYMBOL(pcie_get_relaxed_ordering);
  4919  
  4920  /**
  4921   * pcie_set_mps - set PCI Express maximum payload size
> 4922  /**
  4923   * pcie_get_minimum_link - determine minimum link settings of a PCI 
device
  4924   * @dev: PCI device to query
  4925   * @speed: storage for minimum speed
  4926   * @width: storage for minimum width
  4927   *
  4928   * This function will walk up the PCI device chain and determine the 
minimum
  4929   * link width and speed of the device.
  4930   */

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip