Re: [PATCH] unix: properly account for FDs passed over unix sockets

2015-12-30 Thread Tetsuo Handa
Willy Tarreau wrote:
> On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> > The MSG_PEEK code should not be harmful and the patch is good as is. I 
> > first understood from the published private thread, that it is possible 
> > for a program to exceed the rlimit of fds. But the DoS is only by 
> > keeping the fds in flight and not attaching them to any program.
> 
> Exactly. The real issue is when these FDs become very expensive such as
> pipes full of data.
> 

As you wrote how to abuse this vulnerability which exists in Linux 2.0
and later kernel, I quote a short description from private thread.

  "an unprivileged user consumes all file descriptors so that other
  unprivileged user cannot work" and "an unprivileged user consumes all
  kernel memory so that the OOM killer kills almost all processes before
  the culprit process is killed (CVE-2013-4312)".

Reported-by: Tetsuo Handa 
Mitigates: CVE-2013-4312 (Linux 2.0+)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] ixgbe: synchronize link_up and link_speed of a slave interface

2015-12-30 Thread Jeff Kirsher
On Thu, 2015-12-31 at 13:04 +0800, zyjzyj2...@gmail.com wrote:
> From: Zhu Yanjun 
> 
> According to the suggestions from Rustad, Mark D, to all the slave 
> interfaces, the link_speed and link_up should be synchronized since
> the time span between link_up and link_speed will make some virtual
> NICs not work well, such as a bonding driver in 802.3ad mode.
> 
> Signed-off-by: Zhu Yanjun 
> ---

Since this is version 5 of your original patch, you should be putting
the change log of this patch here, so that we can follow how you got to
this point.  This helpful for the reviewers (and me, the maintainer).
 For example:
v2: dropped the "else" case of "if" statement, based on feedback from
    Jeff Kirsher
v3: changed the if statement to simply return when the link speed is
    unknown on X540 parts based on feedback from Emil Tantilov
v4: updated code comment and if statement to test for IFF_SLAVE flag
V5: simplified code comment and if statement to only test for IFF_SLAVE
    flag and unknown link speed.

Of course, you can expand on what I have above, I just did a quick
summary of the changes as an example of a change log. 

>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +
>  1 file changed, 5 insertions(+)


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] unix: properly account for FDs passed over unix sockets

2015-12-30 Thread Willy Tarreau
On Thu, Dec 31, 2015 at 03:08:53PM +0900, Tetsuo Handa wrote:
> Willy Tarreau wrote:
> > On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> > > The MSG_PEEK code should not be harmful and the patch is good as is. I 
> > > first understood from the published private thread, that it is possible 
> > > for a program to exceed the rlimit of fds. But the DoS is only by 
> > > keeping the fds in flight and not attaching them to any program.
> > 
> > Exactly. The real issue is when these FDs become very expensive such as
> > pipes full of data.
> > 
> 
> As you wrote how to abuse this vulnerability which exists in Linux 2.0
> and later kernel, I quote a short description from private thread.
> 
>   "an unprivileged user consumes all file descriptors so that other
>   unprivileged user cannot work" and "an unprivileged user consumes all
>   kernel memory so that the OOM killer kills almost all processes before
>   the culprit process is killed (CVE-2013-4312)".
> 
> Reported-by: Tetsuo Handa 
> Mitigates: CVE-2013-4312 (Linux 2.0+)

Well I didn't reveal any secret as it was publicly reported first
in 2010, it's only that Mark sent us the proof of concept exploit
on the security list recently :-)

https://bugzilla.kernel.org/show_bug.cgi?id=20402

Anyway I'll resend the patch with your reported-by, the CVE and
Hannes' ACK.

Thanks!
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] vhost: basic device IOTLB support

2015-12-30 Thread Jason Wang
This patch tries to implement an device IOTLB for vhost. This could be
used with for co-operation with userspace(qemu) implementation of
iommu for a secure DMA environment in guest.

The idea is simple. When vhost meets an IOTLB miss, it will request
the assistance of userspace to do the translation, this is done
through:

- Fill the translation request in a preset userspace address (This
  address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY).
- Notify userspace through eventfd (This eventfd was set through ioctl
  VHOST_SET_IOTLB_FD).

When userspace finishes the translation, it will update the vhost
IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of
snooping the IOTLB invalidation of IOMMU IOTLB and use
VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost.

For simplicity, IOTLB was implemented with a simple hash array. The
index were calculated from IOVA page frame number which can only works
at PAGE_SIZE level.

An qemu implementation (for reference) is available at:
g...@github.com:jasowang/qemu.git iommu

TODO & Known issues:

- read/write permission validation was not implemented.
- no feature negotiation.
- VHOST_SET_MEM_TABLE is not reused (maybe there's a chance).
- working at PAGE_SIZE level, don't support large mappings.
- better data structure for IOTLB instead of simple hash array.
- better API, e.g using mmap() instead of preset userspace address.

Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c|   2 +-
 drivers/vhost/vhost.c  | 190 -
 drivers/vhost/vhost.h  |  13 
 include/uapi/linux/vhost.h |  26 +++
 4 files changed, 229 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9eda69e..a172be9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1083,7 +1083,7 @@ static long vhost_net_ioctl(struct file *f, unsigned int 
ioctl,
r = vhost_dev_ioctl(>dev, ioctl, argp);
if (r == -ENOIOCTLCMD)
r = vhost_vring_ioctl(>dev, ioctl, argp);
-   else
+   else if (ioctl != VHOST_UPDATE_IOTLB)
vhost_net_flush(n);
mutex_unlock(>dev.mutex);
return r;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index eec2f11..729fe05 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -113,6 +113,11 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq)
 }
 #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */
 
+static inline int vhost_iotlb_hash(u64 iova)
+{
+   return (iova >> PAGE_SHIFT) & (VHOST_IOTLB_SIZE - 1);
+}
+
 static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
 {
@@ -384,8 +389,14 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->memory = NULL;
dev->mm = NULL;
spin_lock_init(>work_lock);
+   spin_lock_init(>iotlb_lock);
+   mutex_init(>iotlb_req_mutex);
INIT_LIST_HEAD(>work_list);
dev->worker = NULL;
+   dev->iotlb_request = NULL;
+   dev->iotlb_ctx = NULL;
+   dev->iotlb_file = NULL;
+   dev->pending_request.flags.type = VHOST_IOTLB_INVALIDATE;
 
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
@@ -393,12 +404,17 @@ void vhost_dev_init(struct vhost_dev *dev,
vq->indirect = NULL;
vq->heads = NULL;
vq->dev = dev;
+   vq->iotlb_request = NULL;
mutex_init(>mutex);
vhost_vq_reset(dev, vq);
if (vq->handle_kick)
vhost_poll_init(>poll, vq->handle_kick,
POLLIN, dev);
}
+
+   init_completion(>iotlb_completion);
+   for (i = 0; i < VHOST_IOTLB_SIZE; i++)
+   dev->iotlb[i].flags.valid = VHOST_IOTLB_INVALID;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_init);
 
@@ -940,9 +956,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *argp)
 {
struct file *eventfp, *filep = NULL;
struct eventfd_ctx *ctx = NULL;
+   struct vhost_iotlb_entry entry;
u64 p;
long r;
-   int i, fd;
+   int index, i, fd;
 
/* If you are not the owner, you can become one */
if (ioctl == VHOST_SET_OWNER) {
@@ -1008,6 +1025,80 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *argp)
if (filep)
fput(filep);
break;
+   case VHOST_SET_IOTLB_FD:
+   r = get_user(fd, (int __user *)argp);
+   if (r < 0)
+   break;
+   eventfp = fd == -1 ? NULL : eventfd_fget(fd);
+   if (IS_ERR(eventfp)) {
+   r = PTR_ERR(eventfp);
+   break;
+   }
+   if (eventfp != d->iotlb_file) {
+

[net-next] ppp: rtnetlink device handling

2015-12-30 Thread Sedat Dilek
Hi Guillaume,

can you explain why you moved ppp to rtnetlink device handling?
Benefits, etc.?

Does anything change when using NetworkManager/ModemManager/pppd for
my network setup/handling (here: Ubuntu/precise AMD64)?

Thanks in advance.

Regards,
- Sedat -

P.S.: Coming soon... Not (only) in the cinemas... 201*6* ;-).

[1] https://patchwork.ozlabs.org/patch/560702/
[2] https://patchwork.ozlabs.org/patch/560703/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5] ixgbe: synchronize link_up and link_speed of a slave

2015-12-30 Thread Jeff Kirsher
On Thu, 2015-12-31 at 13:04 +0800, zyjzyj2...@gmail.com wrote:
> Thanks for the suggestions from Rustad, Mark D.
> According to his suggestions, the logs and source code are
> simplified.

I find it funny that this email (no patch) is got the correct subject,
yet the updated patch you sent does not.  The subject line for this
email should have been used for the updated patch you sent out.

signature.asc
Description: This is a digitally signed message part


Re: [PATCHv4 net-next 0/3] Ethtool support for phy stats

2015-12-30 Thread David Miller
From: Andrew Lunn 
Date: Wed, 30 Dec 2015 16:28:24 +0100

> This patchset add ethtool support for reading statistics from the PHY.
> The Marvell and Micrel Phys are then extended to report receiver
> packet errors and idle errors.
> 
> v2:
>   Fix linking when phylib is not enabled.
> v3:
>   Inline helpers into ethtool.c, so fixing when phylib is a module.
> v4:
>   Add missing static

Series applied, thanks Andrew.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] igb: When GbE link up, wait for Remote receiver status condition.

2015-12-30 Thread ueba takuma
2015-12-31 13:42 GMT+09:00 Jeff Kirsher :
> Please send this to intel-wired-...@lists.osuosl.org mailing list, or
> at least CC the list since all Wired Ethernet Intel driver patches are
> handled through that list.

Thanks for the info.
I just sent it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCHv3 net-next 2/3] phy: marvell: Add ethtool statistics counters

2015-12-30 Thread Shaohui Xie
> -Original Message-
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Wednesday, December 30, 2015 11:13 PM
> To: Shaohui Xie
> Cc: David Miller; Florian Fainelli; netdev
> Subject: Re: [PATCHv3 net-next 2/3] phy: marvell: Add ethtool statistics
> counters
> 
> On Wed, Dec 30, 2015 at 04:24:40AM +, Shaohui Xie wrote:
> > > Subject: [PATCHv3 net-next 2/3] phy: marvell: Add ethtool statistics
> > > counters
> > > +static int marvell_probe(struct phy_device *phydev) {
> > > + struct marvell_priv *priv;
> > > +
> > > + priv = devm_kzalloc(>dev, sizeof(*priv), GFP_KERNEL);
> > > + if (!priv)
> > > + return -ENOMEM;
> > > +
> > > + phydev->priv = priv;
> > > +
> > > + return 0;
> > > +}
> > > +
> > [S.H] Is a remove() function needed to free the memory?
> 
> Hi Shaohui
> 
> No, since i use devm_kzalloc(). The memory will automatically be freed when
> phydev->dev is destroyed. Take a look at all the devm_ API calls which have 
> this
> property. They are useful for avoiding memory leaks, especially on error 
> paths.
Got it.
Thank you!

Shaohui
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5] ixgbe: synchronize link_up and link_speed of a slave

2015-12-30 Thread zhuyj

On 12/31/2015 01:17 PM, Jeff Kirsher wrote:

On Thu, 2015-12-31 at 13:04 +0800, zyjzyj2...@gmail.com wrote:

Thanks for the suggestions from Rustad, Mark D.
According to his suggestions, the logs and source code are
simplified.

I find it funny that this email (no patch) is got the correct subject,
yet the updated patch you sent does not.  The subject line for this
email should have been used for the updated patch you sent out.

Thanks for your reply.
I will resend the patch soon.

Best Regards!
Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 4/5] sctp: drop the old assoc hashtable of sctp

2015-12-30 Thread Xin Long
transport hashtable will replace the association hashtable,
so association hashtable is not used in sctp any more, so
drop the codes about that.

Signed-off-by: Xin Long 
Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/sctp.h| 21 
 include/net/sctp/structs.h |  5 
 net/sctp/input.c   | 61 --
 net/sctp/protocol.c| 30 ++-
 net/sctp/sm_sideeffect.c   |  2 --
 net/sctp/socket.c  |  6 +
 6 files changed, 3 insertions(+), 122 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 7bbdfba..835aa2e 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -126,8 +126,6 @@ int sctp_primitive_ASCONF(struct net *, struct 
sctp_association *, void *arg);
  */
 int sctp_rcv(struct sk_buff *skb);
 void sctp_v4_err(struct sk_buff *skb, u32 info);
-void sctp_hash_established(struct sctp_association *);
-void sctp_unhash_established(struct sctp_association *);
 void sctp_hash_endpoint(struct sctp_endpoint *);
 void sctp_unhash_endpoint(struct sctp_endpoint *);
 struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
@@ -530,25 +528,6 @@ static inline int sctp_ep_hashfn(struct net *net, __u16 
lport)
return (net_hash_mix(net) + lport) & (sctp_ep_hashsize - 1);
 }
 
-/* This is the hash function for the association hash table. */
-static inline int sctp_assoc_hashfn(struct net *net, __u16 lport, __u16 rport)
-{
-   int h = (lport << 16) + rport + net_hash_mix(net);
-   h ^= h>>8;
-   return h & (sctp_assoc_hashsize - 1);
-}
-
-/* This is the hash function for the association hash table.  This is
- * not used yet, but could be used as a better hash function when
- * we have a vtag.
- */
-static inline int sctp_vtag_hashfn(__u16 lport, __u16 rport, __u32 vtag)
-{
-   int h = (lport << 16) + rport;
-   h ^= vtag;
-   return h & (sctp_assoc_hashsize - 1);
-}
-
 #define sctp_for_each_hentry(epb, head) \
hlist_for_each_entry(epb, head, node)
 
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 4ab87d0..20e7212 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -120,8 +120,6 @@ extern struct sctp_globals {
 
/* This is the hash of all endpoints. */
struct sctp_hashbucket *ep_hashtable;
-   /* This is the hash of all associations. */
-   struct sctp_hashbucket *assoc_hashtable;
/* This is the sctp port control hash.  */
struct sctp_bind_hashbucket *port_hashtable;
/* This is the hash of all transports. */
@@ -129,7 +127,6 @@ extern struct sctp_globals {
 
/* Sizes of above hashtables. */
int ep_hashsize;
-   int assoc_hashsize;
int port_hashsize;
 
/* Default initialization values to be applied to new associations. */
@@ -146,8 +143,6 @@ extern struct sctp_globals {
 #define sctp_address_families  (sctp_globals.address_families)
 #define sctp_ep_hashsize   (sctp_globals.ep_hashsize)
 #define sctp_ep_hashtable  (sctp_globals.ep_hashtable)
-#define sctp_assoc_hashsize(sctp_globals.assoc_hashsize)
-#define sctp_assoc_hashtable   (sctp_globals.assoc_hashtable)
 #define sctp_port_hashsize (sctp_globals.port_hashsize)
 #define sctp_port_hashtable(sctp_globals.port_hashtable)
 #define sctp_transport_hashtable   (sctp_globals.transport_hashtable)
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 6f075d8..d9a6e66 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -913,67 +913,6 @@ struct sctp_transport *sctp_epaddr_lookup_transport(
return sctp_addrs_lookup_transport(net, >a, paddr);
 }
 
-/* Insert association into the hash table.  */
-static void __sctp_hash_established(struct sctp_association *asoc)
-{
-   struct net *net = sock_net(asoc->base.sk);
-   struct sctp_ep_common *epb;
-   struct sctp_hashbucket *head;
-
-   epb = >base;
-
-   /* Calculate which chain this entry will belong to. */
-   epb->hashent = sctp_assoc_hashfn(net, epb->bind_addr.port,
-asoc->peer.port);
-
-   head = _assoc_hashtable[epb->hashent];
-
-   write_lock(>lock);
-   hlist_add_head(>node, >chain);
-   write_unlock(>lock);
-}
-
-/* Add an association to the hash. Local BH-safe. */
-void sctp_hash_established(struct sctp_association *asoc)
-{
-   if (asoc->temp)
-   return;
-
-   local_bh_disable();
-   __sctp_hash_established(asoc);
-   local_bh_enable();
-}
-
-/* Remove association from the hash table.  */
-static void __sctp_unhash_established(struct sctp_association *asoc)
-{
-   struct net *net = sock_net(asoc->base.sk);
-   struct sctp_hashbucket *head;
-   struct sctp_ep_common *epb;
-
-   epb = >base;
-
-   epb->hashent = 

[PATCH net-next 5/5] sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc

2015-12-30 Thread Xin Long
sctp_endpoint_lookup_assoc is called in the protection of sock lock
there is no need to call local_bh_disable in this function. so remove
them.

Signed-off-by: Xin Long 
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/endpointola.c | 17 +
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 8838bf4..52838ea 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -317,7 +317,7 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct 
sctp_endpoint *ep,
  * We lookup the transport from hashtable at first, then get association
  * through t->assoc.
  */
-static struct sctp_association *__sctp_endpoint_lookup_assoc(
+struct sctp_association *sctp_endpoint_lookup_assoc(
const struct sctp_endpoint *ep,
const union sctp_addr *paddr,
struct sctp_transport **transport)
@@ -342,21 +342,6 @@ out:
return asoc;
 }
 
-/* Lookup association on an endpoint based on a peer address.  BH-safe.  */
-struct sctp_association *sctp_endpoint_lookup_assoc(
-   const struct sctp_endpoint *ep,
-   const union sctp_addr *paddr,
-   struct sctp_transport **transport)
-{
-   struct sctp_association *asoc;
-
-   local_bh_disable();
-   asoc = __sctp_endpoint_lookup_assoc(ep, paddr, transport);
-   local_bh_enable();
-
-   return asoc;
-}
-
 /* Look for any peeled off association from the endpoint that matches the
  * given peer address.
  */
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization of link_up and speed

2015-12-30 Thread Tantilov, Emil S
>-Original Message-
>From: zhuyj [mailto:zyjzyj2...@gmail.com]
>Sent: Wednesday, December 30, 2015 12:20 AM
>To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
>Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
>John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
>netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
>Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River); Bourg,
>Vincent (Wind River)
>Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization
>of link_up and speed
>
>On 12/30/2015 02:55 PM, Tantilov, Emil S wrote:
>>> -Original Message-
>>> From: zhuyj [mailto:zyjzyj2...@gmail.com]
>>> Sent: Tuesday, December 29, 2015 6:49 PM
>>> To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
>>> Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
>>> John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
>>> netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
>>> Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);
>Bourg,
>>> Vincent (Wind River)
>>> Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict
>synchronization
>>> of link_up and speed
>>>
>>> On 12/30/2015 12:18 AM, Tantilov, Emil S wrote:
> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-
>boun...@lists.osuosl.org]
>>> On
> Behalf Of zyjzyj2...@gmail.com
> Sent: Monday, December 28, 2015 6:32 PM
> To: Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson, Shannon; Wyborny,
> Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak, John; Williams,
>>> Mitch
> A; intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; e1000-
> de...@lists.sourceforge.net
> Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);
>>> Bourg,
> Vincent (Wind River)
> Subject: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization
>>> of
> link_up and speed
>
> From: Zhu Yanjun 
>
> When the X540 NIC acts as a slave of some virtual NICs, it is very
> important to synchronize link_up and link_speed, such as a bonding
> driver in 802.3ad mode. When X540 NIC acts as an independent
>interface,
> it is not necessary to synchronize link_up and link_speed. That is,
> the time span between link_up and link_speed is acceptable.
 What exactly do you mean by "time span between link_up and link_speed"?
>>> In the previous mail, I show you some ethtool logs. In these logs, there
>>> is some
>>> time with NIC up while speed is unknown. I think this "some time" is
>>> time span between
>>> link_up and link_speed. Please see the previous mail for details.
>> Was this when reporting the link state from check_link() (reading the
>LINKS
>> register) or reporting the adapter->link_speed?
>>
 Where is it you think the de-synchronization occurs?
>>> When a NIC interface acts as a slave, a flag "IFF_SLAVE" is set in
>>> netdevice struct.
>>> Before we enter this function, we check IFF_SLAVE flag. If this flag is
>>> set, we continue to check
>>> link_speed. If not, this function is executed whether this link_speed is
>>> unknown or not.
>> I can already see this in your patch. I was asking about the reason why
>>your change is needed.
>
>an extreme example, let us assume this scenario:

Is this the scenario you are trying to fix?

>An ixgbe NIC directly connects to another NIC (let us call it NIC-a).
>And auto-negotiate is off while no static speed is set in the 2 NICs.

The ixgbe driver does not support disabling auto-negotiation directly.
The only time this is true is when the advertised speed is restricted,
so the above scenario is not possible (you either have autoneg or 
advertised speed set) with the current driver.

Is this example in theory or do you have your interface configured this
way somehow? 

Thanks,
Emil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mac80211: Make addr const in SET_IEEE80211_PERM_ADDR()

2015-12-30 Thread Bjorn Andersson
On Wed, Dec 30, 2015 at 8:47 AM, Souptick Joarder  wrote:
>
> HI Bjorn,
>
> On Thu, Dec 24, 2015 at 2:03 PM, Bjorn Andersson  wrote:
> > Make the addr parameter const in SET_IEEE80211_PERM_ADDR() to save
> > clients from having to cast away a const qualifier.
> >
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  include/net/mac80211.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> > index 7c30faff245f..a6f3c9c4b7c2 100644
> > --- a/include/net/mac80211.h
> > +++ b/include/net/mac80211.h
> > @@ -2167,7 +2167,7 @@ static inline void SET_IEEE80211_DEV(struct 
> > ieee80211_hw *hw, struct device *dev
> >   * @hw: the  ieee80211_hw to set the MAC address for
> >   * @addr: the address to set
> >   */
> > -static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, u8 
> > *addr)
> > +static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, const 
> > u8 *addr)
>
> I guess without const or with const doesn't make much difference here.
> Correct me if I am wrong.

For most cases it doesn't make any difference, but in my driver I
acquire the mac address as a const u8 *. Therefor I need to cast away
the const part when calling this API.

There's an existing example of this in
drivers/net/wireless/st/cw1200/main.c line 601.


I think it's safe to assume that this API won't ever modify the passed
addr buffer, so there would be no future issues of marking the
parameter const either.

>
> >  {
> > memcpy(hw->wiphy->perm_addr, addr, ETH_ALEN);
> >  }
>

Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 06/12] net: sched: support qdisc_reset on NOLOCK qdisc

2015-12-30 Thread John Fastabend
The qdisc_reset operation depends on the qdisc lock at the moment
to halt any additions to gso_skb and statistics while the list is
free'd and the stats zeroed.

Without the qdisc lock we can not guarantee another cpu is not in
the process of adding a skb to one of the "cells". Here are the
two cases we have to handle.

 case 1: qdisc_graft operation. In this case a "new" qdisc is attached
 and the 'qdisc_destroy' operation is called on the old qdisc.
 The destroy operation will wait a rcu grace period and call
 qdisc_rcu_free(). At which point gso_cpu_skb is free'd along
 with all stats so no need to zero stats and gso_cpu_skb from
 the reset operation itself.

 Because we can not continue to call qdisc_reset before waiting
 an rcu grace period so that the qdisc is detached from all
 cpus simply do not call qdisc_reset() at all and let the
 qdisc_destroy operation clean up the qdisc. Note, a refcnt
 greater than 1 would cause the destroy operation to be
 aborted however if this ever happened the reference to the
 qdisc would be lost and we would have a memory leak.

 case 2: dev_deactivate sequence. This can come from a user bringing
 the interface down which causes the gso_skb list to be flushed
 and the qlen zero'd. At the moment this is protected by the
 qdisc lock so while we clear the qlen/gso_skb fields we are
 guaranteed no new skbs are added. For the lockless case
 though this is not true. To resolve this move the qdisc_reset
 call after the new qdisc is assigned and a grace period is
 exercised to ensure no new skbs can be enqueued. Further
 the RTNL lock is held so we can not get another call to
 activate the qdisc while the skb lists are being free'd.

 Finally, fix qdisc_reset to handle the per cpu stats and
 skb lists.

Signed-off-by: John Fastabend 
---
 net/sched/sch_generic.c |   42 +++---
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 9aeb51f..134fb95 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -666,6 +666,18 @@ void qdisc_reset(struct Qdisc *qdisc)
if (ops->reset)
ops->reset(qdisc);
 
+   if (qdisc->gso_cpu_skb) {
+   int i;
+
+   for_each_possible_cpu(i) {
+   struct gso_cell *cell;
+
+   cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
+   if (cell)
+   kfree_skb_list(cell->skb);
+   }
+   }
+
if (qdisc->gso_skb) {
kfree_skb_list(qdisc->gso_skb);
qdisc->gso_skb = NULL;
@@ -740,10 +752,6 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue 
*dev_queue,
root_lock = qdisc_lock(oqdisc);
spin_lock_bh(root_lock);
 
-   /* Prune old scheduler */
-   if (oqdisc && atomic_read(>refcnt) <= 1)
-   qdisc_reset(oqdisc);
-
/* ... and graft new one */
if (qdisc == NULL)
qdisc = _qdisc;
@@ -857,7 +865,6 @@ static void dev_deactivate_queue(struct net_device *dev,
set_bit(__QDISC_STATE_DEACTIVATED, >state);
 
rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
-   qdisc_reset(qdisc);
 
spin_unlock_bh(qdisc_lock(qdisc));
}
@@ -890,6 +897,18 @@ static bool some_qdisc_is_busy(struct net_device *dev)
return false;
 }
 
+static void dev_qdisc_reset(struct net_device *dev,
+   struct netdev_queue *dev_queue,
+   void *none)
+{
+   struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+
+   WARN_ON(!qdisc);
+
+   if (qdisc)
+   qdisc_reset(qdisc);
+}
+
 /**
  * dev_deactivate_many - deactivate transmissions on several devices
  * @head: list of devices to deactivate
@@ -910,7 +929,7 @@ void dev_deactivate_many(struct list_head *head)
 _qdisc);
 
dev_watchdog_down(dev);
-   sync_needed |= !dev->dismantle;
+   sync_needed = true;
}
 
/* Wait for outstanding qdisc-less dev_queue_xmit calls.
@@ -921,9 +940,18 @@ void dev_deactivate_many(struct list_head *head)
synchronize_net();
 
/* Wait for outstanding qdisc_run calls. */
-   list_for_each_entry(dev, head, close_list)
+   list_for_each_entry(dev, head, close_list) {
while (some_qdisc_is_busy(dev))
yield();
+
+   /* The new qdisc is assigned at this point so we can safely
+* unwind stale skb lists and qdisc statistics
+*/
+   netdev_for_each_tx_queue(dev, dev_qdisc_reset, NULL);
+ 

[RFC PATCH 07/12] net: sched: qdisc_qlen for per cpu logic

2015-12-30 Thread John Fastabend
This is a bit interesting because it means sch_direct_xmit will
return a positive value which causes the dequeue/xmit cycle to
continue only when a specific cpu has a qlen > 0.

However checking each cpu for qlen will break performance so
its important to note that qdiscs that set the no lock bit need
to have some sort of per cpu enqueue/dequeue data structure that
maps to the per cpu qlen value.

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |8 
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index aa39dd4..30f4c60 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -273,8 +273,16 @@ static inline void qdisc_cb_private_validate(const struct 
sk_buff *skb, int sz)
BUILD_BUG_ON(sizeof(qcb->data) < sz);
 }
 
+static inline int qdisc_qlen_cpu(const struct Qdisc *q)
+{
+   return this_cpu_ptr(q->cpu_qstats)->qlen;
+}
+
 static inline int qdisc_qlen(const struct Qdisc *q)
 {
+   if (q->flags & TCQ_F_NOLOCK)
+   return qdisc_qlen_cpu(q);
+
return q->q.qlen;
 }
 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 05/12] net: sched: per cpu gso handlers

2015-12-30 Thread John Fastabend
The net sched infrastructure has a gso ptr that points to skb structs
that have failed to be enqueued by the device driver.

This can happen when multiple cores try to push a skb onto the same
underlying hardware queue resulting in lock contention. This case is
handled by a cpu collision handler handle_dev_cpu_collision(). Another
case occurs when the stack overruns the drivers low level tx queues
capacity. Ideally these should be a rare occurrence in a well-tuned
system but they do happen.

To handle this in the lockless case use a per cpu gso field to park
the skb until the conflict can be resolved. Note at this point the
skb has already been popped off the qdisc so it has to be handled
by the infrastructure.

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |   36 
 net/sched/sch_generic.c   |   34 --
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9966c17..aa39dd4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -44,6 +44,10 @@ struct qdisc_size_table {
u16 data[];
 };
 
+struct gso_cell {
+   struct sk_buff *skb;
+};
+
 struct Qdisc {
int (*enqueue)(struct sk_buff *skb, struct Qdisc 
*dev);
struct sk_buff *(*dequeue)(struct Qdisc *dev);
@@ -88,6 +92,7 @@ struct Qdisc {
 
struct Qdisc*next_sched;
struct sk_buff  *gso_skb;
+   struct gso_cell __percpu *gso_cpu_skb;
/*
 * For performance sake on SMP, we put highly modified fields at the end
 */
@@ -699,6 +704,22 @@ static inline struct sk_buff *qdisc_peek_dequeued(struct 
Qdisc *sch)
return sch->gso_skb;
 }
 
+static inline struct sk_buff *qdisc_peek_dequeued_cpu(struct Qdisc *sch)
+{
+   struct gso_cell *gso = this_cpu_ptr(sch->gso_cpu_skb);
+
+   if (!gso->skb) {
+   struct sk_buff *skb = sch->dequeue(sch);
+
+   if (skb) {
+   gso->skb = skb;
+   qdisc_qstats_cpu_qlen_inc(sch);
+   }
+   }
+
+   return gso->skb;
+}
+
 /* use instead of qdisc->dequeue() for all qdiscs queried with ->peek() */
 static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
 {
@@ -714,6 +735,21 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct 
Qdisc *sch)
return skb;
 }
 
+static inline struct sk_buff *qdisc_dequeue_peeked_skb(struct Qdisc *sch)
+{
+   struct gso_cell *gso = this_cpu_ptr(sch->gso_cpu_skb);
+   struct sk_buff *skb = gso->skb;
+
+   if (skb) {
+   gso->skb = NULL;
+   qdisc_qstats_cpu_qlen_dec(sch);
+   } else {
+   skb = sch->dequeue(sch);
+   }
+
+   return skb;
+}
+
 static inline void __qdisc_reset_queue(struct Qdisc *sch,
   struct sk_buff_head *list)
 {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 37dfa4a..9aeb51f 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -44,8 +44,7 @@ EXPORT_SYMBOL(default_qdisc_ops);
  * - ingress filtering is also serialized via qdisc root lock
  * - updates to tree and tree walking are only done under the rtnl mutex.
  */
-
-static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
 {
q->gso_skb = skb;
q->qstats.requeues++;
@@ -55,6 +54,24 @@ static inline int dev_requeue_skb(struct sk_buff *skb, 
struct Qdisc *q)
return 0;
 }
 
+static inline int dev_requeue_cpu_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+   this_cpu_ptr(q->gso_cpu_skb)->skb = skb;
+   qdisc_qstats_cpu_requeues_inc(q);
+   qdisc_qstats_cpu_qlen_inc(q);
+   __netif_schedule(q);
+
+   return 0;
+}
+
+static inline int dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
+{
+   if (q->flags & TCQ_F_NOLOCK)
+   return __dev_requeue_skb(skb, q);
+   else
+   return dev_requeue_cpu_skb(skb, q);
+}
+
 static void try_bulk_dequeue_skb(struct Qdisc *q,
 struct sk_buff *skb,
 const struct netdev_queue *txq,
@@ -666,6 +683,19 @@ static void qdisc_rcu_free(struct rcu_head *head)
free_percpu(qdisc->cpu_qstats);
}
 
+   if (qdisc->gso_cpu_skb) {
+   int i;
+
+   for_each_possible_cpu(i) {
+   struct gso_cell *cell;
+
+   cell = per_cpu_ptr(qdisc->gso_cpu_skb, i);
+   kfree_skb_list(cell->skb);
+   }
+
+   free_percpu(qdisc->gso_cpu_skb);
+   }
+
kfree((char *) qdisc - qdisc->padded);
 }
 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to 

[RFC PATCH 04/12] net: sched: provide per cpu qstat helpers

2015-12-30 Thread John Fastabend
The per cpu qstats support was added with per cpu bstat support
which is currently used by the ingress qdisc. This patch adds
a set of helpers needed to make other qdiscs that use qstats
per cpu as well.

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |   39 +++
 1 file changed, 39 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c8d42c3..9966c17 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -545,12 +545,43 @@ static inline void qdisc_qstats_backlog_dec(struct Qdisc 
*sch,
sch->qstats.backlog -= qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
+   const struct sk_buff *skb)
+{
+   struct gnet_stats_queue *q = this_cpu_ptr(sch->cpu_qstats);
+
+   q->backlog -= qdisc_pkt_len(skb);
+}
+
 static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
const struct sk_buff *skb)
 {
sch->qstats.backlog += qdisc_pkt_len(skb);
 }
 
+static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
+   const struct sk_buff *skb)
+{
+   struct gnet_stats_queue *q = this_cpu_ptr(sch->cpu_qstats);
+
+   q->backlog += qdisc_pkt_len(skb);
+}
+
+static inline void qdisc_qstats_cpu_qlen_inc(struct Qdisc *sch)
+{
+   this_cpu_ptr(sch->cpu_qstats)->qlen++;
+}
+
+static inline void qdisc_qstats_cpu_qlen_dec(struct Qdisc *sch)
+{
+   this_cpu_ptr(sch->cpu_qstats)->qlen--;
+}
+
+static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
+{
+   this_cpu_ptr(sch->cpu_qstats)->requeues++;
+}
+
 static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
 {
sch->qstats.drops += count;
@@ -726,6 +757,14 @@ static inline int qdisc_drop(struct sk_buff *skb, struct 
Qdisc *sch)
return NET_XMIT_DROP;
 }
 
+static inline int qdisc_drop_cpu(struct sk_buff *skb, struct Qdisc *sch)
+{
+   kfree_skb(skb);
+   qdisc_qstats_cpu_drop(sch);
+
+   return NET_XMIT_DROP;
+}
+
 static inline int qdisc_reshape_fail(struct sk_buff *skb, struct Qdisc *sch)
 {
qdisc_qstats_drop(sch);

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 08/12] net: sched: a dflt qdisc may be used with per cpu stats

2015-12-30 Thread John Fastabend
Enable dflt qdisc support for per cpu stats before this patch a
dflt qdisc was required to use the global statistics qstats and
bstats.

Signed-off-by: John Fastabend 
---
 net/sched/sch_generic.c |   24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 134fb95..be5d63a 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -641,18 +641,34 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue 
*dev_queue,
struct Qdisc *sch;
 
if (!try_module_get(ops->owner))
-   goto errout;
+   return NULL;
 
sch = qdisc_alloc(dev_queue, ops);
if (IS_ERR(sch))
-   goto errout;
+   return NULL;
sch->parent = parentid;
 
-   if (!ops->init || ops->init(sch, NULL) == 0)
+   if (!ops->init)
return sch;
 
-   qdisc_destroy(sch);
+   if (ops->init(sch, NULL))
+   goto errout;
+
+   /* init() may have set percpu flags so init data structures */
+   if (qdisc_is_percpu_stats(sch)) {
+   sch->cpu_bstats =
+   netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
+   if (!sch->cpu_bstats)
+   goto errout;
+
+   sch->cpu_qstats = alloc_percpu(struct gnet_stats_queue);
+   if (!sch->cpu_qstats)
+   goto errout;
+   }
+
+   return sch;
 errout:
+   qdisc_destroy(sch);
return NULL;
 }
 EXPORT_SYMBOL(qdisc_create_dflt);

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: nf_unregister_net_hook: hook not found!

2015-12-30 Thread Sander Eikelenboom

On 2015-12-30 03:39, ebied...@xmission.com wrote:

Pablo Neira Ayuso  writes:


On Mon, Dec 28, 2015 at 09:05:03PM +0100, Sander Eikelenboom wrote:

Hi,

Running a 4.4.0-rc6 kernel i encountered the warning below.


Cc'ing Eric Biederman.

@Sander, could you provide a way to reproduce this?


I am on vacation until the new year, but if this is reproducible we
should be able to print out reg, reg->pf, reg->hooknum, reg->hook
to figure out which hook is having something very weird happen to it.

This is happening in some network namespace exit.

Eric



Unfortunately i have found no way to reproduce,
13 seconds implies it was at boot, but i only have seen this once.

--
Sander


Thanks.


[   13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team
[   13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.328141] systemd-logind[2485]: Failed to start user service: 
Unknown

unit: user@117.service
[   14.356634] systemd-logind[2485]: New session c1 of user lightdm.
[   14.357320] [ cut here ]
[   14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143
netfilter_net_exit+0x25/0x50()
[   14.357328] nf_unregister_net_hook: hook not found!
[   14.357371] Modules linked in: iptable_security(+) iptable_raw
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
uvcvideo

videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal 
intel_powerclamp
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl 
kvm_intel
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel 
media kvm
snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel
pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core 
rfkill
i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore 
shpchp
tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul 
crc32c_intel

aesni_intel
[   14.357380]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e
lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common

pps_core
[   14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U
4.4.0-rc6-x220-20151224+ #1
[   14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW 
(1.39 )

07/18/2013
[   14.357390] Workqueue: netns cleanup_net
[   14.357393]  81a27dfd 81359c69 88030e7cbd40
81060297
[   14.357395]  88030e820d80 88030e7cbd90 81c962d8
81c962e0
[   14.357397]  88030e7cbdf8 81060317 81a2c010
88030018
[   14.357398] Call Trace:
[   14.357405]  [] ? dump_stack+0x40/0x57
[   14.357408]  [] ? warn_slowpath_common+0x77/0xb0
[   14.357410]  [] ? warn_slowpath_fmt+0x47/0x50
[   14.357416]  [] ? mutex_lock+0x9/0x30
[   14.357418]  [] ? netfilter_net_exit+0x25/0x50
[   14.357421]  [] ? ops_exit_list.isra.6+0x2e/0x60
[   14.357424]  [] ? cleanup_net+0x1ab/0x280
[   14.357427]  [] ? process_one_work+0x133/0x330
[   14.357429]  [] ? worker_thread+0x60/0x470
[   14.357430]  [] ? process_one_work+0x330/0x330
[   14.357434]  [] ? kthread+0xca/0xe0
[   14.357436]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357439]  [] ? ret_from_fork+0x3f/0x70
[   14.357441]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357443] ---[ end trace 9984cc4b0e89f818 ]---
[   14.357443] [ cut here ]
[   14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143
netfilter_net_exit+0x25/0x50()
[   14.357446] nf_unregister_net_hook: hook not found!
[   14.357472] Modules linked in: iptable_security(+) iptable_raw
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
uvcvideo

videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal 
intel_powerclamp
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl 
kvm_intel
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel 
media kvm
snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel
pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core 
rfkill
i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore 
shpchp
tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul 
crc32c_intel

aesni_intel
[   14.357478]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e
lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common

pps_core
[   14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U  
W


[PATCH v2] mwifiex: correctly handling kzalloc

2015-12-30 Thread Insu Yun
Since kzalloc can be failed in memory pressure,
it needs to be handled, otherwise NULL dereference could be happened

Signed-off-by: Insu Yun 
---
 drivers/net/wireless/mwifiex/sdio.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/mwifiex/sdio.c 
b/drivers/net/wireless/mwifiex/sdio.c
index 78a8474..a8af72d 100644
--- a/drivers/net/wireless/mwifiex/sdio.c
+++ b/drivers/net/wireless/mwifiex/sdio.c
@@ -2053,8 +2053,19 @@ static int mwifiex_init_sdio(struct mwifiex_adapter 
*adapter)
/* Allocate skb pointer buffers */
card->mpa_rx.skb_arr = kzalloc((sizeof(void *)) *
   card->mp_agg_pkt_limit, GFP_KERNEL);
+   if (!card->mpa_rx.skb_arr) {
+   kfree(card->mp_regs);
+   return -ENOMEM;
+   }
+
card->mpa_rx.len_arr = kzalloc(sizeof(*card->mpa_rx.len_arr) *
   card->mp_agg_pkt_limit, GFP_KERNEL);
+   if (!card->mpa_rx.len_arr) {
+   kfree(card->mp_regs);
+   kfree(card->mpa_rx.skb_arr);
+   return -ENOMEM;
+   }
+
ret = mwifiex_alloc_sdio_mpa_buffers(adapter,
 card->mp_tx_agg_buf_size,
 card->mp_rx_agg_buf_size);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Marcelo Ricardo Leitner
On Wed, Dec 30, 2015 at 12:19:39PM -0500, Eric Dumazet wrote:
> On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:
> 
> > besides, this patchset will use transport hashtable to replace
> > association hashtable to lookup with rhashtable api. get transport
> > first then get association by t->asoc. and also it will make tcp
> > style work better.
> 
> SCTP already has a hash table, why not simply changing the way items are
> hashed into it ?

Because Vlad asked to split the patch so it gets easier to review. The
direct change was quite big.

> Sure, storing thousands of sockets in a single hash bucket is not wise.
> 
> Switching SCTP to rhashtable at this moment is premature, it is still
> moving fast.

Dave and Vlad had asked in the first review for considering using
rhashtable (ok, Dave didn't mention it by name).  We did, and it seemed
nice beside 1 issue Xin found, regarding multiple rehashing, which I'll
highlight in a reply right away. 
Said all this, I know this was your second email already against this
usage, but I have to ask, sorry: still really against it?

Initial post was with subject:
[PATCH net] sctp: support global vtag assochash and per endpoint
s(d)port assochash table

Thanks,
Marcelo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2 00/10] be2net: patch set

2015-12-30 Thread David Miller
From: Sathya Perla 
Date: Wed, 30 Dec 2015 11:57:18 +0530

> On Wed, Dec 30, 2015 at 2:20 AM, David Miller  wrote:
>>
>> Please fix the problems reported by the kbuild test robot, they happened
>> when I tried to build this too.
> 
> David, the test robot is complaining that all values of the enum are
> not being handled by the switch statement.

That is a legitimate warning, and we fix those wherever they occur.

Don't even try to argue that leaving the gcc warning in new code
is legitimate.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 03/12] net: sched: allow qdiscs to handle locking

2015-12-30 Thread John Fastabend
This patch adds a flag for queueing disciplines to indicate
the stack does not need to use the qdisc lock to protect
operations. This can be used to build lockless scheduling
algorithms and improving performance.

The flag is checked in the tx path and the qdisc lock is
only taken if it is not set. For now use a conditional
if statement. Later we could be more aggressive if it
proves worthwhile and use a static key or wrap this in
a likely().

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |1 +
 net/core/dev.c|   20 
 net/sched/sch_generic.c   |7 +--
 3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b2a8e63..c8d42c3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -64,6 +64,7 @@ struct Qdisc {
 #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy :
  * qdisc_tree_decrease_qlen() should stop.
  */
+#define TCQ_F_NOLOCK   0x80 /* qdisc does not require locking */
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/net/core/dev.c b/net/core/dev.c
index 914b4a2..7a51609 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3004,7 +3004,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, 
struct Qdisc *q,
 struct netdev_queue *txq)
 {
spinlock_t *root_lock = qdisc_lock(q);
-   bool contended;
+   bool contended = false;
int rc;
 
qdisc_pkt_len_init(skb);
@@ -3015,11 +3015,13 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, 
struct Qdisc *q,
 * This permits __QDISC___STATE_RUNNING owner to get the lock more
 * often and dequeue packets faster.
 */
-   contended = qdisc_is_running(q);
-   if (unlikely(contended))
-   spin_lock(>busylock);
+   if (!(q->flags & TCQ_F_NOLOCK)) {
+   contended = qdisc_is_running(q);
+   if (unlikely(contended))
+   spin_lock(>busylock);
+   spin_lock(root_lock);
+   }
 
-   spin_lock(root_lock);
if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, >state))) {
kfree_skb(skb);
rc = NET_XMIT_DROP;
@@ -3053,9 +3055,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, 
struct Qdisc *q,
__qdisc_run(q);
}
}
-   spin_unlock(root_lock);
-   if (unlikely(contended))
-   spin_unlock(>busylock);
+   if (!(q->flags & TCQ_F_NOLOCK)) {
+   spin_unlock(root_lock);
+   if (unlikely(contended))
+   spin_unlock(>busylock);
+   }
return rc;
 }
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 16bc83b..37dfa4a 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -153,7 +153,8 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
int ret = NETDEV_TX_BUSY;
 
/* And release qdisc */
-   spin_unlock(root_lock);
+   if (!(q->flags & TCQ_F_NOLOCK))
+   spin_unlock(root_lock);
 
/* Note that we validate skb (GSO, checksum, ...) outside of locks */
if (validate)
@@ -166,7 +167,9 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
HARD_TX_UNLOCK(dev, txq);
}
-   spin_lock(root_lock);
+
+   if (!(q->flags & TCQ_F_NOLOCK))
+   spin_lock(root_lock);
 
if (dev_xmit_complete(ret)) {
/* Driver sent out skb successfully or skb was consumed */

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 02/12] net: sched: free per cpu bstats

2015-12-30 Thread John Fastabend
When a qdisc is using per cpu stats only the bstats are being
freed. This also free's the qstats.

Signed-off-by: John Fastabend 
---
 net/sched/sch_generic.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index e82a1ad..16bc83b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -658,8 +658,10 @@ static void qdisc_rcu_free(struct rcu_head *head)
 {
struct Qdisc *qdisc = container_of(head, struct Qdisc, rcu_head);
 
-   if (qdisc_is_percpu_stats(qdisc))
+   if (qdisc_is_percpu_stats(qdisc)) {
free_percpu(qdisc->cpu_bstats);
+   free_percpu(qdisc->cpu_qstats);
+   }
 
kfree((char *) qdisc - qdisc->padded);
 }

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 12/12] net: sched: pfifo_fast new option to deque multiple pkts

2015-12-30 Thread John Fastabend
On 15-12-30 09:55 AM, John Fastabend wrote:
> Now that pfifo_fast is using the alf_queue data structures we can
> dequeue multiple skbs and save some overhead.
> 
> This works because the bulk dequeue logic accepts skb lists already.
> 
> Signed-off-by: John Fastabend 
> ---

oops I didn't mean to send this it obviously doesn't work because
until you have 8 skbs nothing gets dequeued. This was just a test
patch I was looking at for perf numbers. Maybe it provides some
insight into how we could build a pfifo_bulk or add an option to
pfifo_fast to dequeue multiple pkts at a time. The trick is to
sort out how long to wait for packets to build up or possibly
just remove this line,

+   if (this_cpu_ptr(qdisc->cpu_qstats)->qlen < 8)
+   return NULL;

And opportunistically pull packets out at the risk of over-running
the driver if those are large skbs.

.John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mac80211: Make addr const in SET_IEEE80211_PERM_ADDR()

2015-12-30 Thread Souptick Joarder
On Wed, Dec 30, 2015 at 10:35 PM, Bjorn Andersson  wrote:
> On Wed, Dec 30, 2015 at 8:47 AM, Souptick Joarder  
> wrote:
>>
>> HI Bjorn,
>>
>> On Thu, Dec 24, 2015 at 2:03 PM, Bjorn Andersson  wrote:
>> > Make the addr parameter const in SET_IEEE80211_PERM_ADDR() to save
>> > clients from having to cast away a const qualifier.
>> >
>> > Signed-off-by: Bjorn Andersson 
>> > ---
>> >  include/net/mac80211.h | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>> > index 7c30faff245f..a6f3c9c4b7c2 100644
>> > --- a/include/net/mac80211.h
>> > +++ b/include/net/mac80211.h
>> > @@ -2167,7 +2167,7 @@ static inline void SET_IEEE80211_DEV(struct 
>> > ieee80211_hw *hw, struct device *dev
>> >   * @hw: the  ieee80211_hw to set the MAC address for
>> >   * @addr: the address to set
>> >   */
>> > -static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, u8 
>> > *addr)
>> > +static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, const 
>> > u8 *addr)
>>
>> I guess without const or with const doesn't make much difference here.
>> Correct me if I am wrong.
>
> For most cases it doesn't make any difference, but in my driver I
> acquire the mac address as a const u8 *. Therefor I need to cast away
> the const part when calling this API.
>
> There's an existing example of this in
> drivers/net/wireless/st/cw1200/main.c line 601.

Is the path correct ? I think path is
drivers/net/wireless/cw1200/main.c line 334

> I think it's safe to assume that this API won't ever modify the passed
> addr buffer, so there would be no future issues of marking the
> parameter const either.

I agree with you.

>
>>
>> >  {
>> > memcpy(hw->wiphy->perm_addr, addr, ETH_ALEN);
>> >  }
>>
>
> Regards,
> Bjorn

-Souptick
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 3/5] sctp: apply rhashtable api to sctp procfs

2015-12-30 Thread Xin Long
Traversal the transport rhashtable, get the association only once through
the condition assoc->peer.primary_path != transport.

Signed-off-by: Xin Long 
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/proc.c | 316 +++-
 1 file changed, 173 insertions(+), 143 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 0697eda..dfa7eec 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -281,88 +281,136 @@ void sctp_eps_proc_exit(struct net *net)
remove_proc_entry("eps", net->sctp.proc_net_sctp);
 }
 
+struct sctp_ht_iter {
+   struct seq_net_private p;
+   struct rhashtable_iter hti;
+};
 
-static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
+static struct sctp_transport *sctp_transport_get_next(struct seq_file *seq)
 {
-   if (*pos >= sctp_assoc_hashsize)
-   return NULL;
+   struct sctp_ht_iter *iter = seq->private;
+   struct sctp_transport *t;
 
-   if (*pos < 0)
-   *pos = 0;
+   t = rhashtable_walk_next(>hti);
+   for (; t; t = rhashtable_walk_next(>hti)) {
+   if (IS_ERR(t)) {
+   if (PTR_ERR(t) == -EAGAIN)
+   continue;
+   break;
+   }
 
-   if (*pos == 0)
-   seq_printf(seq, " ASSOC SOCK   STY SST ST HBKT "
-   "ASSOC-ID TX_QUEUE RX_QUEUE UID INODE LPORT "
-   "RPORT LADDRS <-> RADDRS "
-   "HBINT INS OUTS MAXRT T1X T2X RTXC "
-   "wmema wmemq sndbuf rcvbuf\n");
+   if (net_eq(sock_net(t->asoc->base.sk), seq_file_net(seq)) &&
+   t->asoc->peer.primary_path == t)
+   break;
+   }
 
-   return (void *)pos;
+   return t;
 }
 
-static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
+static struct sctp_transport *sctp_transport_get_idx(struct seq_file *seq,
+loff_t pos)
+{
+   void *obj;
+
+   while (pos && (obj = sctp_transport_get_next(seq)) && !IS_ERR(obj))
+   pos--;
+
+   return obj;
+}
+
+static int sctp_transport_walk_start(struct seq_file *seq)
 {
+   struct sctp_ht_iter *iter = seq->private;
+   int err;
+
+   err = rhashtable_walk_init(_transport_hashtable, >hti);
+   if (err)
+   return err;
+
+   err = rhashtable_walk_start(>hti);
+
+   return err == -EAGAIN ? 0 : err;
 }
 
+static void sctp_transport_walk_stop(struct seq_file *seq)
+{
+   struct sctp_ht_iter *iter = seq->private;
+
+   rhashtable_walk_stop(>hti);
+   rhashtable_walk_exit(>hti);
+}
+
+static void *sctp_assocs_seq_start(struct seq_file *seq, loff_t *pos)
+{
+   int err = sctp_transport_walk_start(seq);
+
+   if (err)
+   return ERR_PTR(err);
+
+   return *pos ? sctp_transport_get_idx(seq, *pos) : SEQ_START_TOKEN;
+}
+
+static void sctp_assocs_seq_stop(struct seq_file *seq, void *v)
+{
+   sctp_transport_walk_stop(seq);
+}
 
 static void *sctp_assocs_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-   if (++*pos >= sctp_assoc_hashsize)
-   return NULL;
+   ++*pos;
 
-   return pos;
+   return sctp_transport_get_next(seq);
 }
 
 /* Display sctp associations (/proc/net/sctp/assocs). */
 static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 {
-   struct sctp_hashbucket *head;
-   struct sctp_ep_common *epb;
+   struct sctp_transport *transport;
struct sctp_association *assoc;
+   struct sctp_ep_common *epb;
struct sock *sk;
-   inthash = *(loff_t *)v;
-
-   if (hash >= sctp_assoc_hashsize)
-   return -ENOMEM;
 
-   head = _assoc_hashtable[hash];
-   local_bh_disable();
-   read_lock(>lock);
-   sctp_for_each_hentry(epb, >chain) {
-   assoc = sctp_assoc(epb);
-   sk = epb->sk;
-   if (!net_eq(sock_net(sk), seq_file_net(seq)))
-   continue;
-   seq_printf(seq,
-  "%8pK %8pK %-3d %-3d %-2d %-4d "
-  "%4d %8d %8d %7u %5lu %-5d %5d ",
-  assoc, sk, sctp_sk(sk)->type, sk->sk_state,
-  assoc->state, hash,
-  assoc->assoc_id,
-  assoc->sndbuf_used,
-  atomic_read(>rmem_alloc),
-  from_kuid_munged(seq_user_ns(seq), sock_i_uid(sk)),
-  sock_i_ino(sk),
-  epb->bind_addr.port,
-  assoc->peer.port);
-   seq_printf(seq, " ");
-   sctp_seq_dump_local_addrs(seq, epb);
-   seq_printf(seq, "<-> ");
-   

[PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable

2015-12-30 Thread Xin Long
tranport hashtbale will replace the association hashtable to do the
lookup for transport, and then get association by t->assoc, rhashtable
apis will be used because of it's resizable, scalable and using rcu.

lport + rport + paddr will be the base hashkey to locate the chain,
with net to protect one netns from another, then plus the laddr to
compare to get the target.

this patch will provider the lookup functions:
- sctp_epaddr_lookup_transport
- sctp_addrs_lookup_transport

hash/unhash functions:
- sctp_hash_transport
- sctp_unhash_transport

init/destroy functions:
- sctp_transport_hashtable_init
- sctp_transport_hashtable_destroy

Signed-off-by: Xin Long 
Signed-off-by: Marcelo Ricardo Leitner 
---
 include/net/sctp/sctp.h|  11 
 include/net/sctp/structs.h |   5 ++
 net/sctp/input.c   | 131 +
 3 files changed, 147 insertions(+)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index ce13cf2..7bbdfba 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -143,6 +143,17 @@ void sctp_icmp_proto_unreachable(struct sock *sk,
 struct sctp_transport *t);
 void sctp_backlog_migrate(struct sctp_association *assoc,
  struct sock *oldsk, struct sock *newsk);
+int sctp_transport_hashtable_init(void);
+void sctp_transport_hashtable_destroy(void);
+void sctp_hash_transport(struct sctp_transport *t);
+void sctp_unhash_transport(struct sctp_transport *t);
+struct sctp_transport *sctp_addrs_lookup_transport(
+   struct net *net,
+   const union sctp_addr *laddr,
+   const union sctp_addr *paddr);
+struct sctp_transport *sctp_epaddr_lookup_transport(
+   const struct sctp_endpoint *ep,
+   const union sctp_addr *paddr);
 
 /*
  * sctp/proc.c
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index eea9bde..4ab87d0 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -48,6 +48,7 @@
 #define __sctp_structs_h__
 
 #include 
+#include 
 #include   /* linux/in.h needs this!!*/
 #include   /* We get struct sockaddr_in. */
 #include  /* We get struct in6_addr */
@@ -123,6 +124,8 @@ extern struct sctp_globals {
struct sctp_hashbucket *assoc_hashtable;
/* This is the sctp port control hash.  */
struct sctp_bind_hashbucket *port_hashtable;
+   /* This is the hash of all transports. */
+   struct rhashtable transport_hashtable;
 
/* Sizes of above hashtables. */
int ep_hashsize;
@@ -147,6 +150,7 @@ extern struct sctp_globals {
 #define sctp_assoc_hashtable   (sctp_globals.assoc_hashtable)
 #define sctp_port_hashsize (sctp_globals.port_hashsize)
 #define sctp_port_hashtable(sctp_globals.port_hashtable)
+#define sctp_transport_hashtable   (sctp_globals.transport_hashtable)
 #define sctp_checksum_disable  (sctp_globals.checksum_disable)
 
 /* SCTP Socket type: UDP or TCP style. */
@@ -753,6 +757,7 @@ static inline int sctp_packet_empty(struct sctp_packet 
*packet)
 struct sctp_transport {
/* A list of transports. */
struct list_head transports;
+   struct rhash_head node;
 
/* Reference counting. */
atomic_t refcnt;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index b6493b3..bac8278 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -782,6 +782,137 @@ hit:
return ep;
 }
 
+/* rhashtable for transport */
+struct sctp_hash_cmp_arg {
+   const union sctp_addr   *laddr;
+   const union sctp_addr   *paddr;
+   const struct net*net;
+};
+
+static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
+   const void *ptr)
+{
+   const struct sctp_hash_cmp_arg *x = arg->key;
+   const struct sctp_transport *t = ptr;
+   struct sctp_association *asoc = t->asoc;
+   const struct net *net = x->net;
+
+   if (x->laddr->v4.sin_port != htons(asoc->base.bind_addr.port))
+   return 1;
+   if (!sctp_cmp_addr_exact(>ipaddr, x->paddr))
+   return 1;
+   if (!net_eq(sock_net(asoc->base.sk), net))
+   return 1;
+   if (!sctp_bind_addr_match(>base.bind_addr,
+ x->laddr, sctp_sk(asoc->base.sk)))
+   return 1;
+
+   return 0;
+}
+
+static inline u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
+{
+   const struct sctp_transport *t = data;
+   const union sctp_addr *paddr = >ipaddr;
+   const struct net *net = sock_net(t->asoc->base.sk);
+   u16 lport = htons(t->asoc->base.bind_addr.port);
+   u32 addr;
+
+   if (paddr->sa.sa_family == AF_INET6)
+   addr = jhash(>v6.sin6_addr, 16, seed);

[PATCH net-next 2/5] sctp: apply rhashtable api to send/recv path

2015-12-30 Thread Xin Long
apply lookup apis to two functions, for __sctp_endpoint_lookup_assoc
and __sctp_lookup_association, it's invoked in the protection of sock
lock, it will be safe, but sctp_lookup_association need to call
rcu_read_lock() and to detect the t->dead to protect it.

Signed-off-by: Xin Long 
Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/associola.c   |  5 +
 net/sctp/endpointola.c | 35 ---
 net/sctp/input.c   | 39 ++-
 net/sctp/protocol.c|  6 ++
 4 files changed, 29 insertions(+), 56 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 559afd0..2bf8ec9 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -383,6 +383,7 @@ void sctp_association_free(struct sctp_association *asoc)
list_for_each_safe(pos, temp, >peer.transport_addr_list) {
transport = list_entry(pos, struct sctp_transport, transports);
list_del_rcu(pos);
+   sctp_unhash_transport(transport);
sctp_transport_free(transport);
}
 
@@ -500,6 +501,8 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
 
/* Remove this peer from the list. */
list_del_rcu(>transports);
+   /* Remove this peer from the transport hashtable */
+   sctp_unhash_transport(peer);
 
/* Get the first transport of asoc. */
pos = asoc->peer.transport_addr_list.next;
@@ -699,6 +702,8 @@ struct sctp_transport *sctp_assoc_add_peer(struct 
sctp_association *asoc,
/* Attach the remote transport to our asoc.  */
list_add_tail_rcu(>transports, >peer.transport_addr_list);
asoc->peer.transport_count++;
+   /* Add this peer into the transport hashtable */
+   sctp_hash_transport(peer);
 
/* If we do not yet have a primary path, set one.  */
if (!asoc->peer.primary_path) {
diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index 9da76ba..8838bf4 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -314,8 +314,8 @@ struct sctp_endpoint *sctp_endpoint_is_match(struct 
sctp_endpoint *ep,
 }
 
 /* Find the association that goes with this chunk.
- * We do a linear search of the associations for this endpoint.
- * We return the matching transport address too.
+ * We lookup the transport from hashtable at first, then get association
+ * through t->assoc.
  */
 static struct sctp_association *__sctp_endpoint_lookup_assoc(
const struct sctp_endpoint *ep,
@@ -323,12 +323,7 @@ static struct sctp_association 
*__sctp_endpoint_lookup_assoc(
struct sctp_transport **transport)
 {
struct sctp_association *asoc = NULL;
-   struct sctp_association *tmp;
-   struct sctp_transport *t = NULL;
-   struct sctp_hashbucket *head;
-   struct sctp_ep_common *epb;
-   int hash;
-   int rport;
+   struct sctp_transport *t;
 
*transport = NULL;
 
@@ -337,26 +332,12 @@ static struct sctp_association 
*__sctp_endpoint_lookup_assoc(
 */
if (!ep->base.bind_addr.port)
goto out;
+   t = sctp_epaddr_lookup_transport(ep, paddr);
+   if (!t || t->asoc->temp)
+   goto out;
 
-   rport = ntohs(paddr->v4.sin_port);
-
-   hash = sctp_assoc_hashfn(sock_net(ep->base.sk), ep->base.bind_addr.port,
-rport);
-   head = _assoc_hashtable[hash];
-   read_lock(>lock);
-   sctp_for_each_hentry(epb, >chain) {
-   tmp = sctp_assoc(epb);
-   if (tmp->ep != ep || rport != tmp->peer.port)
-   continue;
-
-   t = sctp_assoc_lookup_paddr(tmp, paddr);
-   if (t) {
-   asoc = tmp;
-   *transport = t;
-   break;
-   }
-   }
-   read_unlock(>lock);
+   *transport = t;
+   asoc = t->asoc;
 out:
return asoc;
 }
diff --git a/net/sctp/input.c b/net/sctp/input.c
index bac8278..6f075d8 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -981,38 +981,19 @@ static struct sctp_association *__sctp_lookup_association(
const union sctp_addr *peer,
struct sctp_transport **pt)
 {
-   struct sctp_hashbucket *head;
-   struct sctp_ep_common *epb;
-   struct sctp_association *asoc;
-   struct sctp_transport *transport;
-   int hash;
+   struct sctp_transport *t;
 
-   /* Optimize here for direct hit, only listening connections can
-* have wildcards anyways.
-*/
-   hash = sctp_assoc_hashfn(net, ntohs(local->v4.sin_port),
-ntohs(peer->v4.sin_port));
-   head = _assoc_hashtable[hash];
-   read_lock(>lock);
-   sctp_for_each_hentry(epb, >chain) {
-   asoc = sctp_assoc(epb);
-   transport = 

[PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Xin Long
for telecom center, the usual case is that a server is connected by thousands
of clients. but if the server with only one enpoint(udp style) use the same
sport and dport to communicate with every clients, and every assoc in server
will be hashed in the same chain of global assoc hashtable due to currently we
choose dport and sport as the hash key.

when a packet is received, sctp_rcv try to find the assoc with sport and dport,
since that chain is too long to find it fast, it make the performance turn to
very low, some test data is as follow:

in server:
$./ss [start a udp style server there]
in client:
$./cc [start 2500 sockets to connect server with same port and different ip,
   and use one of them to send data to server]

= test on net-next
-- perf top
server:
  55.73%  [kernel] [k] sctp_assoc_is_match
   6.80%  [kernel] [k] sctp_assoc_lookup_paddr
   4.81%  [kernel] [k] sctp_v4_cmp_addr
   3.12%  [kernel] [k] _raw_spin_unlock_irqrestore
   1.94%  [kernel] [k] sctp_cmp_addr_exact

client:
  46.01%  [kernel][k] sctp_endpoint_lookup_assoc
   5.55%  libc-2.17.so[.] __libc_calloc
   5.39%  libc-2.17.so[.] _int_free
   3.92%  libc-2.17.so[.] _int_malloc
   3.23%  [kernel][k] __memset

-- spent time
time is 487s, send pkt is 1000

we need to change the way to calculate the hash key, to use lport +
rport + paddr as the hash key can avoid this issue.

besides, this patchset will use transport hashtable to replace
association hashtable to lookup with rhashtable api. get transport
first then get association by t->asoc. and also it will make tcp
style work better.

= test with this patchset:
-- perf top
server:
  15.98%  [kernel] [k] _raw_spin_unlock_irqrestore
   9.92%  [kernel] [k] __pv_queued_spin_lock_slowpath
   7.22%  [kernel] [k] copy_user_generic_string
   2.38%  libpthread-2.17.so   [.] __recvmsg_nocancel
   1.88%  [kernel] [k] sctp_recvmsg

client:
  11.90%  [kernel]   [k] sctp_hash_cmp
   8.52%  [kernel]   [k] rht_deferred_worker
   4.94%  [kernel]   [k] __pv_queued_spin_lock_slowpath
   3.95%  [kernel]   [k] sctp_bind_addr_match
   2.49%  [kernel]   [k] __memset

-- spent time
time is 22s, send pkt is 1000

Xin Long (5):
  sctp: add the rhashtable apis for sctp global transport hashtable
  sctp: apply rhashtable api to send/recv path
  sctp: apply rhashtable api to sctp procfs
  sctp: drop the old assoc hashtable of sctp
  sctp: remove the local_bh_disable/enable in sctp_endpoint_lookup_assoc

 include/net/sctp/sctp.h|  32 ++---
 include/net/sctp/structs.h |  10 +-
 net/sctp/associola.c   |   5 +
 net/sctp/endpointola.c |  52 ++--
 net/sctp/input.c   | 187 +--
 net/sctp/proc.c| 316 +
 net/sctp/protocol.c|  36 ++
 net/sctp/sm_sideeffect.c   |   2 -
 net/sctp/socket.c  |   6 +-
 9 files changed, 331 insertions(+), 315 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Q: bad routing table cache entries

2015-12-30 Thread David Miller
From: Eric Dumazet 
Date: Wed, 30 Dec 2015 09:17:42 -0500

> On Wed, 2015-12-30 at 15:42 +0300, Stas Sergeev wrote:
>> 29.12.2015 18:22, Sowmini Varadhan пишет:
>> > Do you have admin control over the ubuntu router?
>> > If yes, you might want to check the shared_media [#] setting 
>> > on that router for the interfaces with overlapping subnets.
>> > (it is on by default, I would try turning it off).
>> That didn't help, problem re-appears.
>> Thanks anyway, looks like I am going to disable accept_redirects then.
>> It seems buggy and obviously no one cares.
> 
> Obviously some people take vacations at this period of the year, and do
> stay away from netdev traffic.

+1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net v2] skbuff: Fix skb checksum flag on skb pull

2015-12-30 Thread Eric Dumazet
On Thu, 2015-09-24 at 14:09 -0700, David Miller wrote:
> From: Pravin B Shelar 
> Date: Tue, 22 Sep 2015 12:57:53 -0700
> 
> > VXLAN device can receive skb with checksum partial. But the checksum
> > offset could be in outer header which is pulled on receive. This results
> > in negative checksum offset for the skb. Such skb can cause the assert
> > failure in skb_checksum_help(). Following patch fixes the bug by setting
> > checksum-none while pulling outer header.
> > 
> > Following is the kernel panic msg from old kernel hitting the bug.
>  ...
> > Reported-by: Anupam Chanda 
> > Signed-off-by: Pravin B Shelar 
> 
> Applied, thanks.


It looks like we also should clear skb->csum ?

__skb_checksum_complete() definitely would be confused by garbage in
skb->csum

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b6bd42d6134..43e6f6163e07 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2799,8 +2799,10 @@ static inline void skb_postpull_rcsum(struct sk_buff 
*skb,
if (skb->ip_summed == CHECKSUM_COMPLETE)
skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
else if (skb->ip_summed == CHECKSUM_PARTIAL &&
-skb_checksum_start_offset(skb) < 0)
+skb_checksum_start_offset(skb) < 0) {
skb->ip_summed = CHECKSUM_NONE;
+   skb->csum = 0;
+   }
 }
 
 unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mac80211: Make addr const in SET_IEEE80211_PERM_ADDR()

2015-12-30 Thread Souptick Joarder
HI Bjorn,

On Thu, Dec 24, 2015 at 2:03 PM, Bjorn Andersson  wrote:
> Make the addr parameter const in SET_IEEE80211_PERM_ADDR() to save
> clients from having to cast away a const qualifier.
>
> Signed-off-by: Bjorn Andersson 
> ---
>  include/net/mac80211.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index 7c30faff245f..a6f3c9c4b7c2 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -2167,7 +2167,7 @@ static inline void SET_IEEE80211_DEV(struct 
> ieee80211_hw *hw, struct device *dev
>   * @hw: the  ieee80211_hw to set the MAC address for
>   * @addr: the address to set
>   */
> -static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, u8 *addr)
> +static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, const u8 
> *addr)

I guess without const or with const doesn't make much difference here.
Correct me if I am wrong.
>  {
> memcpy(hw->wiphy->perm_addr, addr, ETH_ALEN);
>  }
> --
> 2.5.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-Souptick
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:
> tranport hashtbale will replace the association hashtable to do the
> lookup for transport, and then get association by t->assoc, rhashtable
> apis will be used because of it's resizable, scalable and using rcu.
> 
> lport + rport + paddr will be the base hashkey to locate the chain,
> with net to protect one netns from another, then plus the laddr to
> compare to get the target.
> 
> this patch will provider the lookup functions:
> - sctp_epaddr_lookup_transport
> - sctp_addrs_lookup_transport
> 
> hash/unhash functions:
> - sctp_hash_transport
> - sctp_unhash_transport
> 
> init/destroy functions:
> - sctp_transport_hashtable_init
> - sctp_transport_hashtable_destroy
> 
> Signed-off-by: Xin Long 
> Signed-off-by: Marcelo Ricardo Leitner 
> ---


I am against using rhashtable in SCTP (or TCP) at this stage, given the
number of bugs we have with it.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable

2015-12-30 Thread Marcelo Ricardo Leitner
On Wed, Dec 30, 2015 at 11:50:46PM +0800, Xin Long wrote:
...
> +void sctp_hash_transport(struct sctp_transport *t)
> +{
> + struct sctp_sockaddr_entry *addr;
> + struct sctp_hash_cmp_arg arg;
> +
> + addr = list_entry(t->asoc->base.bind_addr.address_list.next,
> +   struct sctp_sockaddr_entry, list);
> + arg.laddr = >a;
> + arg.paddr = >ipaddr;
> + arg.net   = sock_net(t->asoc->base.sk);
> +
> +reinsert:
> + if (rhashtable_lookup_insert_key(_transport_hashtable, ,
> +  >node, sctp_hash_params) == -EBUSY)
> + goto reinsert;
> +}

This is the nasty situation I mentioned in previous email. It seems that
a stress test can trigger a double rehash and cause an entry to not be
added.

This is in fact very near some bugs you caught on rhashtable in the past
few days/couple of weeks tops.

I'm actually against this loop as is. I may have not been clear with Xin
about not adding my signature to the patchset due to this.

Please take a look at Xin's emails on thread 'rhashtable: Prevent
spurious EBUSY errors on insertion' about this particular situation.
Cc'ing Herbert as he wanted to see the patches for that issue.

  Marcelo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 00/12] drop the qdisc lock for pfifo_fast/mq

2015-12-30 Thread John Fastabend
Hi,

This is a first take at removing the qdisc lock on the xmit path
where qdiscs actually have queues of skbs. The ingress qdisc
which is already lockless was "easy" at least in the sense that
we did not need any lock-free data structures to hold skbs.

The series here is experimental at the moment I decided to dump
it to netdev list when the list of folks I wanted to send it to
privately grew to three or four. Hopefully more people will take
a look at it and give feedback/criticism/whatever. For now I've
only done very basic performance tests and it showed a slight
performance improvement with pfifo_fast but this is somewhat to
be expected as the dequeue operation in the qdisc is only removing
a single skb at a time a bulk dequeue would be better presumably
so I'm tinkering with a pfifo_bulk or option to pfifo_fast to make
that work. All that said I ran some traffic over night and my
kernel didn't crash, did a few interface resets and up/downs and
functionally everything is still up and running. On the TODO list
though is to review all the code paths into/out of sch_generic and
sch_api at the moment no promises I didn't miss a path.

The plan of attack here was

 - use the alf_queue (patch 1 from Jesper) and then convert
   pfifo_fast linked list of skbs over to the alf_queue.

 - fixup all the cases where pfifo fast uses qstats to be per-cpu

 - fixup qlen to support per cpu operations

 - make the gso_skb logic per cpu so any given cpu can park an
   skb when the driver throws an error or we get a cpu collision

 - wrap all the qdisc_lock calls in the xmit path with a wrapper
   that checks for a NOLOCK flag first

 - set the per cpu stats bit and nolock bit in pfifo fast and
   see if it works.

On the TODO list,

 - get some performance numbers for various cases all I've done
   so far is run some basic pktgen tests with a debug kernel and
   a few 'perf records'. Both seem to look positive but I'll do
   some more tests over the next few days.

 - review the code paths some more

 - have some cleanup/improvements/review to do in alf_queue

 - add helpers to remove nasty **void casts in alf_queue ops

 - support bulk dequeue from qdisc either pfifo_fast or new qdisc

 - support mqprio and multiq. multiq lets me run classifiers/actions
   and with the lockless bit lets multiple cpus run in parrallel
   for performance close to mq and mqprio.

Another note in my original take on this I tried to rework some of
the error handling out of the drivers and cpu_collision paths to drop
the gso_skb logic altogether. By using dql we could/should(?) know
if a pkt can be consumed at least in the ONETX case. I haven't given
up on this but it got a bit tricky so I dropped it for now.

---

John Fastabend (12):
  lib: array based lock free queue
  net: sched: free per cpu bstats
  net: sched: allow qdiscs to handle locking
  net: sched: provide per cpu qstat helpers
  net: sched: per cpu gso handlers
  net: sched: support qdisc_reset on NOLOCK qdisc
  net: sched: qdisc_qlen for per cpu logic
  net: sched: a dflt qdisc may be used with per cpu stats
  net: sched: pfifo_fast use alf_queue
  net: sched: helper to sum qlen
  net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq
  net: sched: pfifo_fast new option to deque multiple pkts


 include/linux/alf_queue.h |  368 +
 include/net/gen_stats.h   |3 
 include/net/sch_generic.h |  101 
 lib/Makefile  |2 
 lib/alf_queue.c   |   42 +
 net/core/dev.c|   20 +-
 net/core/gen_stats.c  |9 +
 net/sched/sch_generic.c   |  237 +
 net/sched/sch_mq.c|   25 ++-
 9 files changed, 717 insertions(+), 90 deletions(-)
 create mode 100644 include/linux/alf_queue.h
 create mode 100644 lib/alf_queue.c

--
Signature
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 01/12] lib: array based lock free queue

2015-12-30 Thread John Fastabend
Initial implementation of an array based lock free queue. This works
is originally done by Jesper Dangaard Brouer and I've grabbed it made
only minor tweaks at the moment and plan to use it with the 'tc'
subsystem although it is general enough to be used elsewhere.

Certainly this implementation can be furthered optimized and improved
but it is a good base implementation.

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: John Fastabend 
---
 include/linux/alf_queue.h |  368 +
 lib/Makefile  |2 
 lib/alf_queue.c   |   42 +
 3 files changed, 411 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/alf_queue.h
 create mode 100644 lib/alf_queue.c

diff --git a/include/linux/alf_queue.h b/include/linux/alf_queue.h
new file mode 100644
index 000..dac304e
--- /dev/null
+++ b/include/linux/alf_queue.h
@@ -0,0 +1,368 @@
+#ifndef _LINUX_ALF_QUEUE_H
+#define _LINUX_ALF_QUEUE_H
+/* linux/alf_queue.h
+ *
+ * ALF: Array-based Lock-Free queue
+ *
+ * Queue properties
+ *  - Array based for cache-line optimization
+ *  - Bounded by the array size
+ *  - FIFO Producer/Consumer queue, no queue traversal supported
+ *  - Very fast
+ *  - Designed as a queue for pointers to objects
+ *  - Bulk enqueue and dequeue support
+ *  - Supports combinations of Multi and Single Producer/Consumer
+ *
+ * Copyright (C) 2014, Red Hat, Inc.,
+ *  by Jesper Dangaard Brouer and Hannes Frederic Sowa
+ *  for licensing details see kernel-base/COPYING
+ */
+#include 
+#include 
+
+struct alf_actor {
+   u32 head;
+   u32 tail;
+};
+
+struct alf_queue {
+   u32 size;
+   u32 mask;
+   u32 flags;
+   struct alf_actor producer cacheline_aligned_in_smp;
+   struct alf_actor consumer cacheline_aligned_in_smp;
+   void *ring[0] cacheline_aligned_in_smp;
+};
+
+struct alf_queue *alf_queue_alloc(u32 size, gfp_t gfp);
+void alf_queue_free(struct alf_queue *q);
+
+/* Helpers for LOAD and STORE of elements, have been split-out because:
+ *  1. They can be reused for both "Single" and "Multi" variants
+ *  2. Allow us to experiment with (pipeline) optimizations in this area.
+ */
+/* Only a single of these helpers will survive upstream submission */
+#define __helper_alf_enqueue_store __helper_alf_enqueue_store_unroll
+#define __helper_alf_dequeue_load  __helper_alf_dequeue_load_unroll
+
+static inline void
+__helper_alf_enqueue_store_unroll(u32 p_head, struct alf_queue *q,
+ void **ptr, const u32 n)
+{
+   int i, iterations = n & ~3UL;
+   u32 index = p_head & q->mask;
+
+   if (likely((index + n) <= q->mask)) {
+   /* Can save masked-AND knowing we cannot wrap */
+   /* Loop unroll */
+   for (i = 0; i < iterations; i += 4, index += 4) {
+   q->ring[index]   = ptr[i];
+   q->ring[index+1] = ptr[i+1];
+   q->ring[index+2] = ptr[i+2];
+   q->ring[index+3] = ptr[i+3];
+   }
+   /* Remainder handling */
+   switch (n & 0x3) {
+   case 3:
+   q->ring[index]   = ptr[i];
+   q->ring[index+1] = ptr[i+1];
+   q->ring[index+2] = ptr[i+2];
+   break;
+   case 2:
+   q->ring[index]   = ptr[i];
+   q->ring[index+1] = ptr[i+1];
+   break;
+   case 1:
+   q->ring[index] = ptr[i];
+   }
+   } else {
+   /* Fall-back to "mask" version */
+   for (i = 0; i < n; i++, index++)
+   q->ring[index & q->mask] = ptr[i];
+   }
+}
+
+static inline void
+__helper_alf_dequeue_load_unroll(u32 c_head, struct alf_queue *q,
+void **ptr, const u32 elems)
+{
+   int i, iterations = elems & ~3UL;
+   u32 index = c_head & q->mask;
+
+   if (likely((index + elems) <= q->mask)) {
+   /* Can save masked-AND knowing we cannot wrap */
+   /* Loop unroll */
+   for (i = 0; i < iterations; i += 4, index += 4) {
+   ptr[i]   = q->ring[index];
+   ptr[i+1] = q->ring[index+1];
+   ptr[i+2] = q->ring[index+2];
+   ptr[i+3] = q->ring[index+3];
+   }
+   /* Remainder handling */
+   switch (elems & 0x3) {
+   case 3:
+   ptr[i]   = q->ring[index];
+   ptr[i+1] = q->ring[index+1];
+   ptr[i+2] = q->ring[index+2];
+   break;
+   case 2:
+   ptr[i]   = q->ring[index];
+   ptr[i+1] = q->ring[index+1];
+   break;
+  

Re: 4.4-rc7 failure report

2015-12-30 Thread David Miller
From: Eric Dumazet 
Date: Wed, 30 Dec 2015 11:55:25 -0500

> On Wed, 2015-12-30 at 10:11 -0500, Dave Jones wrote:
>> On Wed, Dec 30, 2015 at 10:38:56AM +0100, Daniel Borkmann wrote:
>> 
>>  > Given that this drop doesn't strictly need to be caused by filter code,
>>  > it would be nice if you could pin the location down where the packet gets
>>  > dropped exactly. Perhaps dropwatch or perf with '-e skb:kfree_skb -a -g
>>  > dhclient ', etc could help to get a first overview to dig into
>>  > details then.
>> 
>> Wild stab in the dark, but..
>> Could this bug be another symptom fixed by 
>> http://article.gmane.org/gmane.linux.network/392885 ?
> 
> dhclient does not use async io

But the bug causes requests to "LOOK" like async I/O, right?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/5] sctp: add the rhashtable apis for sctp global transport hashtable

2015-12-30 Thread David Miller
From: Eric Dumazet 
Date: Wed, 30 Dec 2015 11:57:31 -0500

> I am against using rhashtable in SCTP (or TCP) at this stage, given the
> number of bugs we have with it.

Come on Eric, we've largely dealt with all of these problems.  I haven't
seen a serious report in a while.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4-rc7 failure report

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 10:11 -0500, Dave Jones wrote:
> On Wed, Dec 30, 2015 at 10:38:56AM +0100, Daniel Borkmann wrote:
> 
>  > Given that this drop doesn't strictly need to be caused by filter code,
>  > it would be nice if you could pin the location down where the packet gets
>  > dropped exactly. Perhaps dropwatch or perf with '-e skb:kfree_skb -a -g
>  > dhclient ', etc could help to get a first overview to dig into
>  > details then.
> 
> Wild stab in the dark, but..
> Could this bug be another symptom fixed by 
> http://article.gmane.org/gmane.linux.network/392885 ?

dhclient does not use async io



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND] iwlwifi:Fix error handling in the function iwl_pcie_enqueue_hcmd

2015-12-30 Thread Grumbach, Emmanuel
Hi,


On 12/30/2015 07:15 PM, Nicholas Krause wrote:
> This fixes error handling in the function iwl_pcie_enqueue_hcmd
> by checking if all calls to the function wl_pcie_txq_build_tfd
> have failed by returning a error code and if so jump to the goto
> label out from the cleaning up of acquired resources before


For sure you haven't ran your code otherwise you would have noticed it
break pretty much everything.
Moreover this patch is not based on my -next tree.
Simple rebasing won't fix the obvious issues in your patch though.

> Signed-off-by: Nicholas Krause 
> ---
>  drivers/net/wireless/iwlwifi/pcie/tx.c | 27 +--
>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c 
> b/drivers/net/wireless/iwlwifi/pcie/tx.c
> index 2b86c21..49c8c77 100644
> --- a/drivers/net/wireless/iwlwifi/pcie/tx.c
> +++ b/drivers/net/wireless/iwlwifi/pcie/tx.c
> @@ -1472,9 +1472,11 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans 
> *trans,
>   /* start the TFD with the scratchbuf */
>   scratch_size = min_t(int, copy_size, IWL_HCMD_SCRATCHBUF_SIZE);
>   memcpy(>scratchbufs[q->write_ptr], _cmd->hdr, scratch_size);
> - iwl_pcie_txq_build_tfd(trans, txq,
> -iwl_pcie_get_scratchbuf_dma(txq, q->write_ptr),
> -scratch_size, true);
> + idx = iwl_pcie_txq_build_tfd(trans, txq,
> +  iwl_pcie_get_scratchbuf_dma(txq, 
> q->write_ptr),
> +  scratch_size, true);
> + if (idx)
> + goto out;
>  
>   /* map first command fragment, if any remains */
>   if (copy_size > scratch_size) {
> @@ -1489,8 +1491,9 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans 
> *trans,
>   goto out;
>   }
>  
> - iwl_pcie_txq_build_tfd(trans, txq, phys_addr,
> -copy_size - scratch_size, false);
> + idx = iwl_pcie_txq_build_tfd(trans, txq, phys_addr, copy_size - 
> scratch_size, false);
> + if (idx)
> + goto out;
>   }
>  
>   /* map the remaining (adjusted) nocopy/dup fragments */
> @@ -1513,7 +1516,9 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans 
> *trans,
>   goto out;
>   }
>  
> - iwl_pcie_txq_build_tfd(trans, txq, phys_addr, cmdlen[i], false);
> + idx = iwl_pcie_txq_build_tfd(trans, txq, phys_addr, cmdlen[i], 
> false);
> + if (idx)
> + goto out;
>   }
>  
>   out_meta->flags = cmd->flags;
> @@ -1830,8 +1835,8 @@ int iwl_trans_pcie_tx(struct iwl_trans *trans, struct 
> sk_buff *skb,
>   /* The first TB points to the scratchbuf data - min_copy bytes */
>   memcpy(>scratchbufs[q->write_ptr], _cmd->hdr,
>  IWL_HCMD_SCRATCHBUF_SIZE);
> - iwl_pcie_txq_build_tfd(trans, txq, tb0_phys,
> -IWL_HCMD_SCRATCHBUF_SIZE, true);
> + if (iwl_pcie_txq_build_tfd(trans, txq, tb0_phys, 
> IWL_HCMD_SCRATCHBUF_SIZE, true))
> + goto out_err;
>  
>   /* there must be data left over for TB1 or this code must be changed */
>   BUILD_BUG_ON(sizeof(struct iwl_tx_cmd) < IWL_HCMD_SCRATCHBUF_SIZE);
> @@ -1841,7 +1846,8 @@ int iwl_trans_pcie_tx(struct iwl_trans *trans, struct 
> sk_buff *skb,
>   tb1_phys = dma_map_single(trans->dev, tb1_addr, tb1_len, DMA_TO_DEVICE);
>   if (unlikely(dma_mapping_error(trans->dev, tb1_phys)))
>   goto out_err;
> - iwl_pcie_txq_build_tfd(trans, txq, tb1_phys, tb1_len, false);
> + if (iwl_pcie_txq_build_tfd(trans, txq, tb1_phys, tb1_len, false))
> + goto out_err;
>  
>   /*
>* Set up TFD's third entry to point directly to remainder
> @@ -1857,7 +1863,8 @@ int iwl_trans_pcie_tx(struct iwl_trans *trans, struct 
> sk_buff *skb,
>  >tfds[q->write_ptr]);
>   goto out_err;
>   }
> - iwl_pcie_txq_build_tfd(trans, txq, tb2_phys, tb2_len, false);
> + if (iwl_pcie_txq_build_tfd(trans, txq, tb2_phys, tb2_len, 
> false))
> + goto out_err;
>   }
>  
>   /* Set up entry for this TFD in Tx byte-count array */

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 10/12] net: sched: helper to sum qlen

2015-12-30 Thread John Fastabend
Reporting qlen when qlen is per cpu requires aggregating the per
cpu counters. This adds a helper routine for this.

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 30f4c60..2c57278 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -286,6 +286,21 @@ static inline int qdisc_qlen(const struct Qdisc *q)
return q->q.qlen;
 }
 
+static inline int qdisc_qlen_sum(const struct Qdisc *q)
+{
+   __u32 qlen = 0;
+   int i;
+
+   if (q->flags & TCQ_F_NOLOCK) {
+   for_each_possible_cpu(i)
+   qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+   } else {
+   qlen = q->q.qlen;
+   }
+
+   return qlen;
+}
+
 static inline struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
 {
return (struct qdisc_skb_cb *)skb->cb;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 09/12] net: sched: pfifo_fast use alf_queue

2015-12-30 Thread John Fastabend
This converts the pfifo_fast qdisc to use the alf_queue enqueue
and dequeue routines then sets the NOLOCK bit.

This also removes the logic used to pick the next band to dequeue
from and instead just checks each alf_queue for packets from
top priority to lowest. This might need to be a bit more clever
but seems to work for now.

Signed-off-by: John Fastabend 
---
 net/sched/sch_generic.c |  120 +--
 1 file changed, 65 insertions(+), 55 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index be5d63a..480cf63 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -467,87 +468,80 @@ static const u8 prio2band[TC_PRIO_MAX + 1] = {
 
 /*
  * Private data for a pfifo_fast scheduler containing:
- * - queues for the three band
- * - bitmap indicating which of the bands contain skbs
+ * - rings for the priority bands
  */
 struct pfifo_fast_priv {
-   u32 bitmap;
-   struct sk_buff_head q[PFIFO_FAST_BANDS];
+   struct alf_queue *q[PFIFO_FAST_BANDS];
 };
 
-/*
- * Convert a bitmap to the first band number where an skb is queued, where:
- * bitmap=0 means there are no skbs on any band.
- * bitmap=1 means there is an skb on band 0.
- * bitmap=7 means there are skbs on all 3 bands, etc.
- */
-static const int bitmap2band[] = {-1, 0, 1, 0, 2, 0, 1, 0};
-
-static inline struct sk_buff_head *band2list(struct pfifo_fast_priv *priv,
-int band)
+static inline struct alf_queue *band2list(struct pfifo_fast_priv *priv,
+ int band)
 {
-   return priv->q + band;
+   return priv->q[band];
 }
 
 static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc)
 {
-   if (skb_queue_len(>q) < qdisc_dev(qdisc)->tx_queue_len) {
-   int band = prio2band[skb->priority & TC_PRIO_MAX];
-   struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-   struct sk_buff_head *list = band2list(priv, band);
-
-   priv->bitmap |= (1 << band);
-   qdisc->q.qlen++;
-   return __qdisc_enqueue_tail(skb, qdisc, list);
-   }
-
-   return qdisc_drop(skb, qdisc);
-}
-
-static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
-{
+   int band = prio2band[skb->priority & TC_PRIO_MAX];
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-   int band = bitmap2band[priv->bitmap];
+   struct alf_queue *q = band2list(priv, band);
+   int n;
 
-   if (likely(band >= 0)) {
-   struct sk_buff_head *list = band2list(priv, band);
-   struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
+   if (!q) {
+   WARN_ON(1);
+   return qdisc_drop(skb, qdisc);
+   }
 
-   qdisc->q.qlen--;
-   if (skb_queue_empty(list))
-   priv->bitmap &= ~(1 << band);
+   n = alf_mp_enqueue(q, , 1);
 
-   return skb;
+   /* If queue is overrun fall through to drop */
+   if (n) {
+   qdisc_qstats_cpu_qlen_inc(qdisc);
+   qdisc_qstats_cpu_backlog_inc(qdisc, skb);
+   return NET_XMIT_SUCCESS;
}
 
-   return NULL;
+   return qdisc_drop_cpu(skb, qdisc);
 }
 
-static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
+static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 {
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-   int band = bitmap2band[priv->bitmap];
+   struct sk_buff *skb = NULL;
+   int band;
+
+   for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
+   struct alf_queue *q = band2list(priv, band);
 
-   if (band >= 0) {
-   struct sk_buff_head *list = band2list(priv, band);
+   if (alf_queue_empty(q))
+   continue;
 
-   return skb_peek(list);
+   alf_mc_dequeue(q, , 1);
}
 
-   return NULL;
+   if (likely(skb)) {
+   qdisc_qstats_cpu_backlog_dec(qdisc, skb);
+   qdisc_bstats_cpu_update(qdisc, skb);
+   qdisc_qstats_cpu_qlen_dec(qdisc);
+   }
+
+   return skb;
 }
 
 static void pfifo_fast_reset(struct Qdisc *qdisc)
 {
-   int prio;
+   int i, band;
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
 
-   for (prio = 0; prio < PFIFO_FAST_BANDS; prio++)
-   __qdisc_reset_queue(qdisc, band2list(priv, prio));
+   for (band = 0; band < PFIFO_FAST_BANDS; band++)
+   alf_queue_flush(band2list(priv, band));
 
-   priv->bitmap = 0;
-   qdisc->qstats.backlog = 0;
-   qdisc->q.qlen = 0;
+   for_each_possible_cpu(i) {
+   struct gnet_stats_queue *q = per_cpu_ptr(qdisc->cpu_qstats, i);
+
+   q->backlog = 0;
+  

[RFC PATCH 11/12] net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq

2015-12-30 Thread John Fastabend
The sch_mq qdisc creates a sub-qdisc per tx queue which are then
called for independently for enqueue and dequeue operations. However
statistics are aggregated and pushed up to the "master" qdisc.

This patch adds support for any of the sub-qdiscs to be per cpu
statistic qdiscs. To handle this case add a check when calculating
stats and aggregate the per cpu stats if needed.

Also exports __gnet_stats_copy_queue() to use as a helper function.

Signed-off-by: John Fastabend 
---
 include/net/gen_stats.h |3 +++
 net/core/gen_stats.c|9 +
 net/sched/sch_mq.c  |   25 ++---
 3 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index cbafa37..a902061 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -43,6 +43,9 @@ int gnet_stats_copy_rate_est(struct gnet_dump *d,
 int gnet_stats_copy_queue(struct gnet_dump *d,
  struct gnet_stats_queue __percpu *cpu_q,
  struct gnet_stats_queue *q, __u32 qlen);
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+const struct gnet_stats_queue __percpu *cpu_q,
+const struct gnet_stats_queue *q, __u32 qlen);
 int gnet_stats_copy_app(struct gnet_dump *d, void *st, int len);
 
 int gnet_stats_finish_copy(struct gnet_dump *d);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 1e2f46a..b653a56 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -235,10 +235,10 @@ __gnet_stats_copy_queue_cpu(struct gnet_stats_queue 
*qstats,
}
 }
 
-static void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
-   const struct gnet_stats_queue __percpu *cpu,
-   const struct gnet_stats_queue *q,
-   __u32 qlen)
+void __gnet_stats_copy_queue(struct gnet_stats_queue *qstats,
+const struct gnet_stats_queue __percpu *cpu,
+const struct gnet_stats_queue *q,
+__u32 qlen)
 {
if (cpu) {
__gnet_stats_copy_queue_cpu(qstats, cpu);
@@ -252,6 +252,7 @@ static void __gnet_stats_copy_queue(struct gnet_stats_queue 
*qstats,
 
qstats->qlen = qlen;
 }
+EXPORT_SYMBOL(__gnet_stats_copy_queue);
 
 /**
  * gnet_stats_copy_queue - copy queue statistics into statistics TLV
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 3e82f04..3468317 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct mq_sched {
struct Qdisc**qdiscs;
@@ -107,15 +108,25 @@ static int mq_dump(struct Qdisc *sch, struct sk_buff *skb)
memset(>qstats, 0, sizeof(sch->qstats));
 
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
+   struct gnet_stats_basic_cpu __percpu *cpu_bstats = NULL;
+   struct gnet_stats_queue __percpu *cpu_qstats = NULL;
+   __u32 qlen = 0;
+
qdisc = netdev_get_tx_queue(dev, ntx)->qdisc_sleeping;
spin_lock_bh(qdisc_lock(qdisc));
-   sch->q.qlen += qdisc->q.qlen;
-   sch->bstats.bytes   += qdisc->bstats.bytes;
-   sch->bstats.packets += qdisc->bstats.packets;
-   sch->qstats.backlog += qdisc->qstats.backlog;
-   sch->qstats.drops   += qdisc->qstats.drops;
-   sch->qstats.requeues+= qdisc->qstats.requeues;
-   sch->qstats.overlimits  += qdisc->qstats.overlimits;
+
+   if (qdisc_is_percpu_stats(qdisc)) {
+   cpu_bstats = qdisc->cpu_bstats;
+   cpu_qstats = qdisc->cpu_qstats;
+   }
+
+   qlen = qdisc_qlen_sum(qdisc);
+
+   __gnet_stats_copy_basic(>bstats,
+   cpu_bstats, >bstats);
+   __gnet_stats_copy_queue(>qstats,
+   cpu_qstats, >qstats, qlen);
+
spin_unlock_bh(qdisc_lock(qdisc));
}
return 0;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 12/12] net: sched: pfifo_fast new option to deque multiple pkts

2015-12-30 Thread John Fastabend
Now that pfifo_fast is using the alf_queue data structures we can
dequeue multiple skbs and save some overhead.

This works because the bulk dequeue logic accepts skb lists already.

Signed-off-by: John Fastabend 
---
 include/net/sch_generic.h |2 +-
 net/sched/sch_generic.c   |   30 --
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 2c57278..95c11ed 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -128,7 +128,7 @@ static inline void qdisc_run_end(struct Qdisc *qdisc)
 
 static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
 {
-   return qdisc->flags & TCQ_F_ONETXQUEUE;
+   return (qdisc->flags & TCQ_F_ONETXQUEUE) & !(qdisc->flags & 
TCQ_F_NOLOCK);
 }
 
 static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 480cf63..ec5e78e 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -507,25 +507,35 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct 
Qdisc *qdisc)
 static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 {
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-   struct sk_buff *skb = NULL;
-   int band;
+   struct sk_buff *skb[8+1] = {NULL};
+   int band, i, elems = 0;
 
-   for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
+   if (this_cpu_ptr(qdisc->cpu_qstats)->qlen < 8)
+   return NULL;
+
+   for (band = 0; band < PFIFO_FAST_BANDS && !skb[0]; band++) {
struct alf_queue *q = band2list(priv, band);
 
if (alf_queue_empty(q))
continue;
 
-   alf_mc_dequeue(q, , 1);
+   elems = alf_mc_dequeue(q, skb, 8);
+
+   /* link array of skbs for driver to process */
+   for (i = 0; i < elems; i++)
+   skb[i]->next = skb[i+1];
}
 
-   if (likely(skb)) {
-   qdisc_qstats_cpu_backlog_dec(qdisc, skb);
-   qdisc_bstats_cpu_update(qdisc, skb);
-   qdisc_qstats_cpu_qlen_dec(qdisc);
+   if (likely(skb[0])) {
+   for (i = 0; i < elems; i++) {
+   qdisc_qstats_cpu_backlog_dec(qdisc, skb[i]);
+   qdisc_bstats_cpu_update(qdisc, skb[i]);
+   }
+
+   this_cpu_ptr(qdisc->cpu_qstats)->qlen -= elems;
}
 
-   return skb;
+   return skb[0];
 }
 
 static void pfifo_fast_reset(struct Qdisc *qdisc)
@@ -579,7 +589,7 @@ static int pfifo_fast_init(struct Qdisc *qdisc, struct 
nlattr *opt)
}
 
/* Can by-pass the queue discipline */
-   qdisc->flags |= TCQ_F_CAN_BYPASS;
+   //qdisc->flags |= TCQ_F_CAN_BYPASS;
qdisc->flags |= TCQ_F_NOLOCK;
qdisc->flags |= TCQ_F_CPUSTATS;
 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mac80211: Make addr const in SET_IEEE80211_PERM_ADDR()

2015-12-30 Thread Bjorn Andersson
On Wed, Dec 30, 2015 at 10:30 AM, Souptick Joarder  wrote:
> On Wed, Dec 30, 2015 at 10:35 PM, Bjorn Andersson  wrote:
>> On Wed, Dec 30, 2015 at 8:47 AM, Souptick Joarder  
>> wrote:
>>>
>>> HI Bjorn,
>>>
>>> On Thu, Dec 24, 2015 at 2:03 PM, Bjorn Andersson  wrote:
>>> > Make the addr parameter const in SET_IEEE80211_PERM_ADDR() to save
>>> > clients from having to cast away a const qualifier.
>>> >
>>> > Signed-off-by: Bjorn Andersson 
>>> > ---
>>> >  include/net/mac80211.h | 2 +-
>>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >
>>> > diff --git a/include/net/mac80211.h b/include/net/mac80211.h
>>> > index 7c30faff245f..a6f3c9c4b7c2 100644
>>> > --- a/include/net/mac80211.h
>>> > +++ b/include/net/mac80211.h
>>> > @@ -2167,7 +2167,7 @@ static inline void SET_IEEE80211_DEV(struct 
>>> > ieee80211_hw *hw, struct device *dev
>>> >   * @hw: the  ieee80211_hw to set the MAC address for
>>> >   * @addr: the address to set
>>> >   */
>>> > -static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, u8 
>>> > *addr)
>>> > +static inline void SET_IEEE80211_PERM_ADDR(struct ieee80211_hw *hw, 
>>> > const u8 *addr)
>>>
>>> I guess without const or with const doesn't make much difference here.
>>> Correct me if I am wrong.
>>
>> For most cases it doesn't make any difference, but in my driver I
>> acquire the mac address as a const u8 *. Therefor I need to cast away
>> the const part when calling this API.
>>
>> There's an existing example of this in
>> drivers/net/wireless/st/cw1200/main.c line 601.
>
> Is the path correct ? I think path is
> drivers/net/wireless/cw1200/main.c line 334
>

It's apparently being relocated in linux-next, and I'm not sure where
I got that line number from. But that's the example I tried to refer
to ;)

Sorry about that.

Regards,
Bjorn

>> I think it's safe to assume that this API won't ever modify the passed
>> addr buffer, so there would be no future issues of marking the
>> parameter const either.
>
> I agree with you.
>
>>
>>>
>>> >  {
>>> > memcpy(hw->wiphy->perm_addr, addr, ETH_ALEN);
>>> >  }
>>>
>>
>> Regards,
>> Bjorn
>
> -Souptick
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 05/12] net: sched: per cpu gso handlers

2015-12-30 Thread Jesper Dangaard Brouer
On Wed, 30 Dec 2015 09:52:49 -0800
John Fastabend  wrote:

> The net sched infrastructure has a gso ptr that points to skb structs
> that have failed to be enqueued by the device driver.

What about fixing up the naming "gso" to something else like "requeue",
in the process (or by an pre-patch) ?


> This can happen when multiple cores try to push a skb onto the same
> underlying hardware queue resulting in lock contention. This case is
> handled by a cpu collision handler handle_dev_cpu_collision(). Another
> case occurs when the stack overruns the drivers low level tx queues
> capacity. Ideally these should be a rare occurrence in a well-tuned
> system but they do happen.
> 
> To handle this in the lockless case use a per cpu gso field to park
> the skb until the conflict can be resolved. Note at this point the
> skb has already been popped off the qdisc so it has to be handled
> by the infrastructure.

I generally like this idea of resolving this per cpu.  (I stalled here,
on the requeue issue, last time I implemented a lockless qdisc
approach).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


net/sctp: sock memory leak

2015-12-30 Thread Dmitry Vyukov
Hello,

The following program leads to a leak of two sock objects:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 

int fd;

void *thr(void *arg)
{
memcpy((void*)0x2000bbbe,
"\x0a\x00\x33\xdc\x14\x4d\x5b\xd1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xdd\x01\xf8\xfd\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
128);
syscall(SYS_sendto, fd, 0x2000b000ul, 0x70ul, 0x8000ul,
0x2000bbbeul, 0x80ul);
return 0;
}

int main()
{
long i;
pthread_t th[6];

syscall(SYS_mmap, 0x2000ul, 0x2ul, 0x3ul, 0x32ul,
0xul, 0x0ul);
fd = syscall(SYS_socket, 0xaul, 0x1ul, 0x84ul, 0, 0, 0);
memcpy((void*)0x20003000,
"\x02\x00\x33\xdf\x7f\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
128);
syscall(SYS_bind, fd, 0x20003000ul, 0x80ul, 0, 0, 0);
pthread_create([0], 0, thr, (void*)0);
usleep(10);
syscall(SYS_listen, fd, 0x3ul, 0, 0, 0, 0);
syscall(SYS_accept, fd, 0x20005f80ul, 0x20003000ul, 0, 0, 0);
return 0;
}


unreferenced object 0x8800342540c0 (size 1864):
  comm "a.out", pid 24109, jiffies 4299060398 (age 27.984s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
0a 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@
  backtrace:
[] kmemleak_alloc+0x72/0xc0 mm/kmemleak.c:915
[< inline >] kmemleak_alloc_recursive include/linux/kmemleak.h:47
[< inline >] slab_post_alloc_hook mm/slub.c:1335
[< inline >] slab_alloc_node mm/slub.c:2594
[< inline >] slab_alloc mm/slub.c:2602
[] kmem_cache_alloc+0x12d/0x2c0 mm/slub.c:2607
[] sk_prot_alloc+0x69/0x340 net/core/sock.c:1344
[] sk_alloc+0x3a/0x6b0 net/core/sock.c:1419
[] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:173
[] __sock_create+0x37c/0x640 net/socket.c:1162
[< inline >] sock_create net/socket.c:1202
[< inline >] SYSC_socket net/socket.c:1232
[] SyS_socket+0xef/0x1b0 net/socket.c:1212
[] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
[] 0x
unreferenced object 0x880034253780 (size 1864):
  comm "a.out", pid 24109, jiffies 4299060500 (age 27.882s)
  hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 33 dc 00 00  3...
0a 00 07 40 00 00 00 00 d8 40 25 34 00 88 ff ff  ...@.@%4
  backtrace:
[] kmemleak_alloc+0x72/0xc0 mm/kmemleak.c:915
[< inline >] kmemleak_alloc_recursive include/linux/kmemleak.h:47
[< inline >] slab_post_alloc_hook mm/slub.c:1335
[< inline >] slab_alloc_node mm/slub.c:2594
[< inline >] slab_alloc mm/slub.c:2602
[] kmem_cache_alloc+0x12d/0x2c0 mm/slub.c:2607
[] sk_prot_alloc+0x69/0x340 net/core/sock.c:1344
[] sk_alloc+0x3a/0x6b0 net/core/sock.c:1419
[] sctp_v6_create_accept_sk+0xf0/0x790 net/sctp/ipv6.c:646
[] sctp_accept+0x409/0x6d0 net/sctp/socket.c:3925
[] inet_accept+0xe3/0x660 net/ipv4/af_inet.c:671
[] SYSC_accept4+0x32c/0x630 net/socket.c:1474
[< inline >] SyS_accept4 net/socket.c:1424
[< inline >] SYSC_accept net/socket.c:1508
[] SyS_accept+0x26/0x30 net/socket.c:1505
[] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
[] 0x

On commit 8513342170278468bac126640a5d2d12ffbff106 (Dec 28).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 05/12] net: sched: per cpu gso handlers

2015-12-30 Thread John Fastabend
On 15-12-30 12:26 PM, Jesper Dangaard Brouer wrote:
> On Wed, 30 Dec 2015 09:52:49 -0800
> John Fastabend  wrote:
> 
>> The net sched infrastructure has a gso ptr that points to skb structs
>> that have failed to be enqueued by the device driver.
> 
> What about fixing up the naming "gso" to something else like "requeue",
> in the process (or by an pre-patch) ?

Sure I'll throw a patch in front of this to rename it.

> 
> 
>> This can happen when multiple cores try to push a skb onto the same
>> underlying hardware queue resulting in lock contention. This case is
>> handled by a cpu collision handler handle_dev_cpu_collision(). Another
>> case occurs when the stack overruns the drivers low level tx queues
>> capacity. Ideally these should be a rare occurrence in a well-tuned
>> system but they do happen.
>>
>> To handle this in the lockless case use a per cpu gso field to park
>> the skb until the conflict can be resolved. Note at this point the
>> skb has already been popped off the qdisc so it has to be handled
>> by the infrastructure.
> 
> I generally like this idea of resolving this per cpu.  (I stalled here,
> on the requeue issue, last time I implemented a lockless qdisc
> approach).
> 

Great, this approach seems to work OK.

On another note even if we only get a single skb dequeued at a time in
the initial implementation this is still a win as soon as we start
running classifiers/actions. Even if doing simple pfifo_fast sans
classifiers raw throughput net gain is minimal.

.John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4-rc7 failure report

2015-12-30 Thread Doug Ledford
On 12/30/2015 12:50 PM, David Miller wrote:
> From: Eric Dumazet 
> Date: Wed, 30 Dec 2015 11:55:25 -0500
> 
>> On Wed, 2015-12-30 at 10:11 -0500, Dave Jones wrote:
>>> On Wed, Dec 30, 2015 at 10:38:56AM +0100, Daniel Borkmann wrote:
>>>
>>>  > Given that this drop doesn't strictly need to be caused by filter code,
>>>  > it would be nice if you could pin the location down where the packet gets
>>>  > dropped exactly. Perhaps dropwatch or perf with '-e skb:kfree_skb -a -g
>>>  > dhclient ', etc could help to get a first overview to dig into
>>>  > details then.
>>>
>>> Wild stab in the dark, but..
>>> Could this bug be another symptom fixed by 
>>> http://article.gmane.org/gmane.linux.network/392885 ?
>>
>> dhclient does not use async io
> 
> But the bug causes requests to "LOOK" like async I/O, right?
> 

I got my hands on a machine that's reliable, so the bisecting is finally
progressing again.  The machines with ocrdma devices can have link
issues, the machines with mlx5 devices don't support vlans in all kernel
versions, and some of my machines with mlx4 devices simply didn't have
their Ethernet port plugged in.  I managed to check out a machine with
mlx4 in IB/Eth mode that is otherwise reliable on all of the kernel
versions the bisection covers and modified its setup to show me at a
glance if the dhcp on vlan is working and now I'm probably over half
down with the bisection.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread David Miller
From: Eric Dumazet 
Date: Wed, 30 Dec 2015 14:11:20 -0500

> Let see how funny it will be then.

It is more fun than waiting longer for the more limited uses of it to
trigger problems.

I cannot be convinced that using it in more places in order to find
and fix more bugs is a bad thing.

I'm sorry if a lot of bug fixes in a short period of time concerns
you, but for me that's an even clearer sign that it needs help, and
exposing it to more use cases is one of the best forms of help it can
get.

It also tells me that the people actually working on those fixes, such
as Herbert Xu, are motivated and reliable when they are shown properly
formed bug reports.

I cannot think of a report Herbert and others did not resolve in a
timely manner.  They usually add test cases too.

And that matters more to me than anything else.  A subsystem can be
buggy as shit, but if someone is responsible about fixing the reported
bugs properly, then I have absolutely nothing to worry about.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 12:52 -0500, David Miller wrote:
> From: Eric Dumazet 
> Date: Wed, 30 Dec 2015 12:19:39 -0500
> 
> > Switching SCTP to rhashtable at this moment is premature, it is
> > still moving fast.
> 
> I completely, and totally, disagree.
> 
> rhashtable actually _needs_ a strong active user like one of the
> protocol socket hashes.
> 
> It's a step backwards to keep rhashtable in the shadows by only
> allowing certain subsystems to convert to it.  That's really
> incredibly stupid if you ask me.

You sure can disagree with me, but calling my opinion 'incredily stupid'
is not wise.

Let me check how stable is rhashtable :

# git log --oneline v4.2.. lib/rhashtable.c
179ccc0a7364 rhashtable: Kill harmless RCU warning in rhashtable_walk_init
c6ff5268293e rhashtable: Fix walker list corruption
3a324606bbab rhashtable: Enforce minimum size on initial hash table
a90099d9fabd Revert "rhashtable: Use __vmalloc with GFP_ATOMIC for table 
allocation"
d3716f18a7d8 rhashtable: Use __vmalloc with GFP_ATOMIC for table allocation
3cf9a39c rhashtable: Prevent spurious EBUSY errors on insertion
7def0f952ecc lib: fix data race in rhashtable_rehash_one


Seriously, I think we can wait one release before 'en masse'
conversions.

I understand we would love to do that, but what is the hurry for SCTP,
that needed rhashtable so desperately that it could not be done before
2016 ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] ixgbe: synchronize the link_speed and link_up of a slave interface

2015-12-30 Thread Rustad, Mark D
zyjzyj2...@gmail.com wrote:

> From: Zhu Yanjun 
> 
> According to the suggestion from Rustad, Mark D, this behavior perhaps
> is more related to the copper phy. But to make fiber phy more robust,
> to all the interfaces as a slave interface, the link_speed and link_up
> is synchronized.
> 
> Signed-off-by: Zhu Yanjun 
> ---
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |8 +---
> 1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 1bb6056..ce47639 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -6441,10 +6441,12 @@ static void ixgbe_watchdog_link_is_up(struct 
> ixgbe_adapter *adapter)
>* a bonding driver in 802.3ad mode. When X540 NIC acts as an
>* independent interface, it is not necessary to synchronize link_up
>* and link_speed.
> -  * In the end, not continue if (X540 NIC && SLAVE && link_speed UNKNOWN)
> +  * According to the suggestion from Rustad, Mark D, this behavior
> +  * perhaps is related to the copper phy. To make fiber phy more robust,
> +  * To all the interfaces as a slave, the link_speed is checked.
> +  * In the end, not continue if (SLAVE && link_speed UNKNOWN)

There is no need to make reference to my suggestion in the comment, especially 
since that is in the commit message. Please simplify your comment above to be 
something like:
 * For all slave interfaces, wait for the link_speed to be known.

>*/
> - if ((hw->mac.type == ixgbe_mac_X540) &&
> - (netdev->flags & IFF_SLAVE))
> + if (netdev->flags & IFF_SLAVE)
>   if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
>   return;

The above would be better as:
if ((netdev->flags & IFF_SLAVE) &&
link_speed == IXGBE_LINK_SPEED_UNKNOWN)
return;

Please do not send a series of patches - it just adds needless confusion and is 
a bisect hazard. Just send a single patch with the desired change as a V5.

--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: 4.4-rc7 failure report

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 12:50 -0500, David Miller wrote:
> From: Eric Dumazet 
> Date: Wed, 30 Dec 2015 11:55:25 -0500
> 
> > On Wed, 2015-12-30 at 10:11 -0500, Dave Jones wrote:
> >> On Wed, Dec 30, 2015 at 10:38:56AM +0100, Daniel Borkmann wrote:
> >> 
> >>  > Given that this drop doesn't strictly need to be caused by filter code,
> >>  > it would be nice if you could pin the location down where the packet 
> >> gets
> >>  > dropped exactly. Perhaps dropwatch or perf with '-e skb:kfree_skb -a -g
> >>  > dhclient ', etc could help to get a first overview to dig into
> >>  > details then.
> >> 
> >> Wild stab in the dark, but..
> >> Could this bug be another symptom fixed by 
> >> http://article.gmane.org/gmane.linux.network/392885 ?
> > 
> > dhclient does not use async io
> 
> But the bug causes requests to "LOOK" like async I/O, right?


This is not how I understood the bug.

By having a bit set (because we lacked a clear of wq->flags), we have :

sock_wake_async()
...
case SOCK_WAKE_WAITD:
if (test_bit(SOCKWQ_ASYNC_WAITDATA, >flags))
break; 

So we never call kill_fasync()




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iproute2 net-next 1/2] iproute2: ip-route.8.in: Add missing '[' before 'pref'

2015-12-30 Thread Stephen Hemminger
On Fri, 25 Dec 2015 11:12:15 +0800
Hangbin Liu  wrote:

> Signed-off-by: Hangbin Liu 

Both applied, thank  you
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [iproute PATCH v3 2/2] ss: support closing inet sockets via SOCK_DESTROY.

2015-12-30 Thread Stephen Hemminger
On Tue, 22 Dec 2015 17:31:34 +0900
Lorenzo Colitti  wrote:

>  
> +static int kill_inet_sock(const struct sockaddr_nl *addr,
> + struct nlmsghdr *h, void *arg)
> +{
> + struct inet_diag_msg *d = NLMSG_DATA(h);
> + struct inet_diag_arg *diag_arg = arg;
> + struct rtnl_handle *rth = diag_arg->rth;
> + DIAG_REQUEST(req, struct inet_diag_req_v2 r);
> +
> + req.nlh.nlmsg_type = SOCK_DESTROY;
> + req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
> + req.nlh.nlmsg_seq = ++rth->seq;
> + req.r.sdiag_family = d->idiag_family;
> + req.r.sdiag_protocol = diag_arg->protocol;
> + req.r.id = d->id;
> +
> + return rtnl_send_check_ack(rth, , req.nlh.nlmsg_len, 1);

Just use rtnl_talk() instead, it does request/reply.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread David Miller
From: Eric Dumazet 
Date: Wed, 30 Dec 2015 14:03:43 -0500

> You sure can disagree with me, but calling my opinion 'incredily
> stupid' is not wise.

I think fundamentally giving facilities less rather than more coverage
is not a smart approach at all.

If the code is that bad that people are discouraged from even using
it, then it should be moved to drivers/staging or removed.  Otherwise
it must work and we must be able to make use of it.

> I understand we would love to do that, but what is the hurry for
> SCTP, that needed rhashtable so desperately that it could not be
> done before 2016 ?

There is no rush, but quite frankly finding people to do serious
work on SCTP is no easy task so I'm hesitant to "defer" someone's
work in this area... :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net/sctp: sock memory leak

2015-12-30 Thread Marcelo Ricardo Leitner
On Wed, Dec 30, 2015 at 09:42:27PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> The following program leads to a leak of two sock objects:

Damn, Dmitry ;-)
If no one takes care of it by then, I'll look into it next week, thanks.

  Marcelo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


memory leak in lapb_create_cb

2015-12-30 Thread Dmitry Vyukov
Hello,

The following program leads to a leak of struct lapb_cb:

// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 

int main()
{
syscall(SYS_mmap, 0x2000ul, 0x1ul, 0x3ul, 0x32ul,
0xul, 0x0ul);
int fd = syscall(SYS_open, "/dev/ptmx", 0x103002ul, 0x0ul, 0, 0, 0);
*(uint32_t*)0x20001000 = (uint32_t)0x6;
syscall(SYS_ioctl, fd, 0x5423ul, 0x20001000ul, 0, 0, 0);
*(uint32_t*)0x20006ff8 = (uint32_t)0x1;
*(uint8_t*)0x20006ffc = (uint8_t)0x29c;
*(uint8_t*)0x20006ffd = (uint8_t)0x3;
*(uint8_t*)0x20006ffe = (uint8_t)0x4;
*(uint8_t*)0x20006fff = (uint8_t)0x8;
*(uint8_t*)0x20007000 = (uint8_t)0x3;
*(uint8_t*)0x20007001 = (uint8_t)0x1;
syscall(SYS_ioctl, fd, 0x400442c9ul, 0x20006ff8ul, 0, 0, 0);
*(uint8_t*)0x20006f64 = (uint8_t)0xe7;
*(uint8_t*)0x20006f65 = (uint8_t)0xfb49;
*(uint16_t*)0x20006f66 = (uint16_t)0x3;
syscall(SYS_ioctl, fd, 0x4b46ul, 0x20006f64ul, 0, 0, 0);
*(uint16_t*)0x20003000 = (uint16_t)0x8;
*(uint16_t*)0x20003002 = (uint16_t)0x5;
*(uint16_t*)0x20003004 = (uint16_t)0x8;
*(uint16_t*)0x20003006 = (uint16_t)0x3;
*(uint8_t*)0x20003008 = (uint8_t)0x6;
*(uint8_t*)0x20003009 = (uint8_t)0x4b8;
*(uint8_t*)0x2000300a = (uint8_t)0x2;
*(uint8_t*)0x2000300b = (uint8_t)0x1;
*(uint32_t*)0x2000300c = (uint32_t)0x5;
*(uint8_t*)0x20003010 = (uint8_t)0x802;
syscall(SYS_ioctl, fd, 0x5404ul, 0x20003000ul, 0, 0, 0);
return 0;
}


unreferenced object 0x8800633575d0 (size 512):
  comm "softirq", pid 0, jiffies 4299764624 (age 16.395s)
  hex dump (first 32 bytes):
00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  
c0 a3 d4 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4
  backtrace:
[] kmemleak_alloc+0x72/0xc0 mm/kmemleak.c:915
[< inline >] kmemleak_alloc_recursive include/linux/kmemleak.h:47
[< inline >] slab_post_alloc_hook mm/slub.c:1335
[< inline >] slab_alloc_node mm/slub.c:2594
[< inline >] slab_alloc mm/slub.c:2602
[] kmem_cache_alloc_trace+0x138/0x2f0 mm/slub.c:2619
[< inline >] kzalloc include/linux/slab.h:458
[< inline >] lapb_create_cb net/lapb/lapb_iface.c:121
[] lapb_register+0xfa/0x590 net/lapb/lapb_iface.c:158
[< inline >] x25_asy_open drivers/net/wan/x25_asy.c:485
[] x25_asy_open_tty+0x431/0x740
drivers/net/wan/x25_asy.c:573
[] tty_ldisc_open.isra.2+0x78/0xd0
drivers/tty/tty_ldisc.c:447
[] tty_set_ldisc+0x1ca/0xa30 drivers/tty/tty_ldisc.c:567
[< inline >] tiocsetd drivers/tty/tty_io.c:2650
[] tty_ioctl+0xb2a/0x2160 drivers/tty/tty_io.c:2883
[< inline >] vfs_ioctl fs/ioctl.c:43
[] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
[< inline >] SYSC_ioctl fs/ioctl.c:622
[] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
[] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
[] 0x


On commit 8513342170278468bac126640a5d2d12ffbff106 (Dec 28).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 15:32 -0200, Marcelo Ricardo Leitner wrote:
> On Wed, Dec 30, 2015 at 12:19:39PM -0500, Eric Dumazet wrote:
> > On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:
> > 
> > > besides, this patchset will use transport hashtable to replace
> > > association hashtable to lookup with rhashtable api. get transport
> > > first then get association by t->asoc. and also it will make tcp
> > > style work better.
> > 
> > SCTP already has a hash table, why not simply changing the way items are
> > hashed into it ?
> 
> Because Vlad asked to split the patch so it gets easier to review. The
> direct change was quite big.
> 
> > Sure, storing thousands of sockets in a single hash bucket is not wise.
> > 
> > Switching SCTP to rhashtable at this moment is premature, it is still
> > moving fast.
> 
> Dave and Vlad had asked in the first review for considering using
> rhashtable (ok, Dave didn't mention it by name).  We did, and it seemed
> nice beside 1 issue Xin found, regarding multiple rehashing, which I'll
> highlight in a reply right away. 
> Said all this, I know this was your second email already against this
> usage, but I have to ask, sorry: still really against it?

Well, it seems that Dave is OK to fix all remaining bugs in rhashtable.

I was not aware that 'we' decided to force rhashtable all over the
places, because it looks so sexy and fun.

Let see how funny it will be then.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: New "ip wait" subcommand for iproute2

2015-12-30 Thread Stephen Hemminger
On Mon, 28 Dec 2015 18:47:51 -0500
Nathaniel W Filardo  wrote:

> Hallo netdev@,
> 
> I had occasion to want to programmatically wait for an interface to become
> available from within a shell script, but found there to be no off-the-shelf
> tool for such a thing.  Could this patch be considered for inclusion as part
> of iproute2?  It adds an "ip wait link" subcommand ("link" required in case
> someone wants to add things like "ip wait addr" or somesuch) based quite
> heavily on the ipmonitor.c file.
> 
> For example, one might "ip wait link dev eth0 up" to wait for an interface
> of that name to appear (specifically, for a RTM_NEWLINK message).  "ip wait
> link dev eth0 down" will wait for it to go away (RTM_DELLINK).
> 
> This should be checkpatch clean, but please let me know if I missed
> something.
> 
> Cheers,
> --nwf;
>


Thank you for your contribution, it looks useful.
Could you also update the man page?

> +static int accept_msg(const struct sockaddr_nl *who,
> +   struct rtnl_ctrl_data *ctrl,
> +   struct nlmsghdr *n, void *arg)
> +{
> + int done = 0;
> +
> + if (n->nlmsg_type == RTM_NEWLINK || n->nlmsg_type == RTM_DELLINK) {
> + if (wait_for == n->nlmsg_type
> + && wait_for == RTM_DELLINK
> + && ll_name_to_index(wait_dev) != 0)
> + done = 1;
> +
> + ll_remember_index(who, n, NULL);
> + if (verbose)
> + print_linkinfo(who, n, stdout);
> +
> + if (wait_for == n->nlmsg_type
> + && wait_for == RTM_NEWLINK
> + && ll_name_to_index(wait_dev) != 0)
> + done = 1;
> + }
> + if (done) {
> + fflush(stdout);
> + exit(0);

I don't think you need explicit fflush here. Stdio does it automatically.
Which means all the conditional(done) can be removed.

Have you considered how wait could be used with --batch option
to write a script?




pgpPz1yuXOjv7.pgp
Description: OpenPGP digital signature


Re: [PATCH] wlcore/wl12xx: spi: fix NULL pointer dereference (Oops)

2015-12-30 Thread Uri Mashiach

Hello Kalle Valo,

On 12/30/2015 05:15 PM, Kalle Valo wrote:

Uri Mashiach  writes:


Fix the below Oops when trying to modprobe wlcore_spi.
The oops occurs because the wl1271_power_{off,on}()
function doesn't check the power() function pointer.

[   23.401447] Unable to handle kernel NULL pointer dereference at
virtual address 
[   23.409954] pgd = c0004000
[   23.412922] [] *pgd=
[   23.416693] Internal error: Oops: 8007 [#1] SMP ARM
[   23.422168] Modules linked in: wl12xx wlcore mac80211 cfg80211
musb_dsps musb_hdrc usbcore usb_common snd_soc_simple_card evdev joydev
omap_rng wlcore_spi snd_soc_tlv320aic23_i2c rng_core snd_soc_tlv320aic23
c_can_platform c_can can_dev snd_soc_davinci_mcasp snd_soc_edma
snd_soc_omap omap_wdt musb_am335x cpufreq_dt thermal_sys hwmon
[   23.453253] CPU: 0 PID: 36 Comm: kworker/0:2 Not tainted
4.2.0-2-g951efee-dirty #233
[   23.461720] Hardware name: Generic AM33XX (Flattened Device Tree)
[   23.468123] Workqueue: events request_firmware_work_func
[   23.473690] task: de32efc0 ti: de4ee000 task.ti: de4ee000
[   23.479341] PC is at 0x0
[   23.482112] LR is at wl12xx_set_power_on+0x28/0x124 [wlcore]
[   23.488074] pc : [<>]lr : []psr: 6013
[   23.488074] sp : de4efe50  ip : 0002  fp : 
[   23.500162] r10: de7cdd00  r9 : dc848800  r8 : bf27af00
[   23.505663] r7 : bf27a1a8  r6 : dcbd8a80  r5 : dce0e2e0  r4 :
dce0d2e0
[   23.512536] r3 :   r2 :   r1 : 0001  r0 :
dc848810
[   23.519412] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment kernel
[   23.527109] Control: 10c5387d  Table: 9cb78019  DAC: 0015
[   23.533160] Process kworker/0:2 (pid: 36, stack limit = 0xde4ee218)
[   23.539760] Stack: (0xde4efe50 to 0xde4f)

[...]

[   23.665030] [] (wl12xx_set_power_on [wlcore]) from
[] (wlcore_nvs_cb+0x118/0xa4c [wlcore])
[   23.675604] [] (wlcore_nvs_cb [wlcore]) from []
(request_firmware_work_func+0x30/0x58)
[   23.685784] [] (request_firmware_work_func) from
[] (process_one_work+0x1b4/0x4b4)
[   23.695591] [] (process_one_work) from []
(worker_thread+0x3c/0x4a4)
[   23.704124] [] (worker_thread) from []
(kthread+0xd4/0xf0)
[   23.711747] [] (kthread) from []
(ret_from_fork+0x14/0x3c)
[   23.719357] Code: bad PC value
[   23.722760] ---[ end trace 981be8510db9b3a9 ]---

Prevent oops by validationg power() pointer value before
calling the function.

Signed-off-by: Uri Mashiach 
Cc: sta...@vger.kernel.org
Acked-by: Igor Grinberg 


Please always provide a changelog when you resend patches, I lost track
what I'm supposed to do with this. Should I apply or drop?


Sorry for not providing a changelog.
The patch should be applied.

--
Thanks,
Uri
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 23:50 +0800, Xin Long wrote:

> besides, this patchset will use transport hashtable to replace
> association hashtable to lookup with rhashtable api. get transport
> first then get association by t->asoc. and also it will make tcp
> style work better.

SCTP already has a hash table, why not simply changing the way items are
hashed into it ?

Sure, storing thousands of sockets in a single hash bucket is not wise.

Switching SCTP to rhashtable at this moment is premature, it is still
moving fast.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


net/sctp: sctp_datamsg memory leak

2015-12-30 Thread Dmitry Vyukov
Hello,

The following program leads to leak of multiple objects allocated in
sctp_datamsg_from_user:


// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 
#include 

long r[50];

int main()
{
memset(r, -1, sizeof(r));
r[0] = syscall(SYS_mmap, 0x2000ul, 0x10ul, 0x3ul,
0x32ul, 0xul, 0x0ul);
r[1] = syscall(SYS_socket, 0x2ul, 0x80801ul, 0x84ul, 0, 0, 0);
memcpy((void*)0x20002f80,
"\x02\x00\x33\xd9\x7f\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
128);
r[3] = syscall(SYS_bind, r[1], 0x20002f80ul, 0x80ul, 0, 0, 0);
memcpy((void*)0x20003f80,
"\x02\x00\x33\xd9\x7f\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
128);
r[5] = syscall(SYS_connect, r[1], 0x20003f80ul, 0x80ul, 0, 0, 0);
r[6] = syscall(SYS_pread64, r[1], 0x2feeul, 0xe5ul, 0x0ul, 0, 0);
memcpy((void*)0x20003000,
"\xdb\x4c\xcc\xa8\x07\xbd\xaa\x58\x7c\x57\x37\x63\xa1\x4d\xdb\x5b\x85\x4e\x37\x3b\x20\xb3\x12\xef\x9b\x75\xf0\x88\x28\xa5\x43\x8e\x56\x59\x3c\x16\xfd\xa0\x01\x4f\x90\x83\x4c\x1b\x22\x3e\xd4\xea\x36\x6f\xb5\x43\x96\x02\x8e\x82\xa1\xc6\x47\xd7\xeb\x08\x56\x6f\x40\xb6\x00\x3f\x52\x38\x99\x2f\x57\x63\x9b\xe4\x0e\xb2\x59\xb2\x59\xbc\x9d\x46\xd0\x52\xd4\x91\xe8\xee\x7f\xcf\x81\xa0\xd5\x10\xc4\x77\xf6\xa1\xa1\x35\xb3\xeb\xb5\x46\xfe\xbc\x83\x74\x9f\x78\xa4\xf1\x0b\xf2\x3a\x41\xc3\x2d\x78\x32\x3b\x88\xe9\xb7\x9f\x56",
128);
r[8] = syscall(SYS_write, r[1], 0x20003000ul, 0x80ul, 0, 0, 0);
memcpy((void*)0x2000332a,
"\xdf\x9a\x13\x9f\x3d\xc5\xd9\xbb\xba\x6d\x46\xb4\xd9\x55\xc0\x39\x0d\xf7\xd0\x9d\x1b\x2b\x8c\xb7\xb2\x52\x8b\xe9\xb8\x73\x6d\x47\x24\x4e\xa3\x1d\xb9\x31\xf1\xae\xa3\x4f\x0f\xd7\xbb\xad\xa7\x4f\xa9\xa3\x2b\x04\xf7\xa8\x5e\x81\x93\x75\x03\x9d\xec\x9a\x03\xbf\xc5\x6c\xb2\xf3\x8b",
69);
r[10] = syscall(SYS_write, r[1], 0x2000332aul, 0x45ul, 0, 0, 0);
r[11] = syscall(SYS_shutdown, r[1], 0x1ul, 0, 0, 0, 0);
memcpy((void*)0x20001919, "\x2e\x2f\x66\x69\x6c\x65\x30\x00", 8);
memcpy((void*)0x20001000, "\x2e\x2f\x66\x69\x6c\x65\x30\x00", 8);
r[14] = syscall(SYS_rename, 0x20001919ul, 0x20001000ul, 0, 0, 0, 0);
*(uint32_t*)0x200013b2 = r[1];
*(uint16_t*)0x200013b6 = (uint16_t)0x9;
*(uint16_t*)0x200013b8 = (uint16_t)0x8;
*(uint32_t*)0x200013ba = r[1];
*(uint16_t*)0x200013be = (uint16_t)0xe77;
*(uint16_t*)0x200013c0 = (uint16_t)0xa036af6cbe637e9d;
*(uint32_t*)0x200013c2 = r[1];
*(uint16_t*)0x200013c6 = (uint16_t)0x8;
*(uint16_t*)0x200013c8 = (uint16_t)0xf1de;
*(uint64_t*)0x2ff9 = (uint64_t)0x0;
*(uint64_t*)0x20001001 = (uint64_t)0x989680;
*(uint64_t*)0x20001000 = (uint64_t)0x2;
r[27] = syscall(SYS_ppoll, 0x200013b2ul, 0x3ul, 0x2ff9ul,
0x20001000ul, 0x8ul, 0);
*(uint64_t*)0x20001000 = (uint64_t)0x20001d27;
*(uint64_t*)0x20001008 = (uint64_t)0x39;
*(uint64_t*)0x20001010 = (uint64_t)0x20001ffe;
*(uint64_t*)0x20001018 = (uint64_t)0xd9;
*(uint64_t*)0x20001020 = (uint64_t)0x20001323;
*(uint64_t*)0x20001028 = (uint64_t)0xfb;
*(uint64_t*)0x20001030 = (uint64_t)0x2fe3;
*(uint64_t*)0x20001038 = (uint64_t)0x1c;
*(uint64_t*)0x20001040 = (uint64_t)0x20001fc6;
*(uint64_t*)0x20001048 = (uint64_t)0xea;
memcpy((void*)0x20001d27,
"\x5d\x27\xd4\x12\xc2\x99\xce\x3f\x64\x88\x1f\x2f\xb1\xe9\xcb\x5c\x1e\x23\x13\xa1\xbb\x1c\xf0\xb3\x76\xa5\xfd\xf6\x0e\x87\xaf\x9f\x68\x47\xb2\x7a\x2e\xb2\xea\x18\xd6\x2a\x9b\xf5\xce\xaa\x33\x6c\x0a\x2d\xdb\x2b\xf7\x6c\xb5\x38\x31",
57);
memcpy((void*)0x20001ffe,

Re: [PATCH v2] net, socket, socket_wq: fix missing initialization of flags

2015-12-30 Thread David Miller
From: Nicolai Stange 
Date: Tue, 29 Dec 2015 13:29:55 +0100

> Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
> 
> Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from
> the current 4.4 release cycle introduced a new flags member in
> struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
> from struct socket's flags member into that new place.
> 
> Unfortunately, the new flags field is never initialized properly, at least
> not for the struct socket_wq instance created in sock_alloc_inode().
> 
> One particular issue I encountered because of this is that my GNU Emacs
> failed to draw anything on my desktop -- i.e. what I got is a transparent
> window, including the title bar. Bisection lead to the commit mentioned
> above and further investigation by means of strace told me that Emacs
> is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
> reproducible 100% of times and the fact that properly initializing the
> struct socket_wq ->flags fixes the issue leads me to the conclusion that
> somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags,
> preventing my Emacs from receiving any SIGIO's due to data becoming
> available and it got stuck.
> 
> Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
> member to zero.
> 
> Signed-off-by: Nicolai Stange 

Applied, but please in the future please put the Fixes: tag right
above the first signoff/ack, like this:

Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
Signed-off-by: Nicolai Stange 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 net] sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close

2015-12-30 Thread David Miller
From: Xin Long 
Date: Tue, 29 Dec 2015 17:49:25 +0800

> In sctp_close, sctp_make_abort_user may return NULL because of memory
> allocation failure. If this happens, it will bypass any state change
> and never free the assoc. The assoc has no chance to be freed and it
> will be kept in memory with the state it had even after the socket is
> closed by sctp_close().
> 
> So if sctp_make_abort_user fails to allocate memory, we should abort
> the asoc via sctp_primitive_ABORT as well. Just like the annotation in
> sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
> "Even if we can't send the ABORT due to low memory delete the TCB.
> This is a departure from our typical NOMEM handling".
> 
> But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
> dereference the chunk pointer, and system crash. So we should add
> SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
> places where it adds SCTP_CMD_REPLY cmd.
> 
> Signed-off-by: Xin Long 
> Acked-by: Marcelo Ricardo Leitner 

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Marcelo Ricardo Leitner

Em 30-12-2015 19:57, Eric Dumazet escreveu:

On Wed, 2015-12-30 at 15:44 -0500, David Miller wrote:


It is more fun than waiting longer for the more limited uses of it to
trigger problems.

I cannot be convinced that using it in more places in order to find
and fix more bugs is a bad thing.

I'm sorry if a lot of bug fixes in a short period of time concerns
you, but for me that's an even clearer sign that it needs help, and
exposing it to more use cases is one of the best forms of help it can
get.

It also tells me that the people actually working on those fixes, such
as Herbert Xu, are motivated and reliable when they are shown properly
formed bug reports.

I cannot think of a report Herbert and others did not resolve in a
timely manner.  They usually add test cases too.



I have no doubts we can fix bugs in upstream kernels in a few days (at
most).

The problem is when a customer is stuck using a distro, with a release
cycle of extra months after upstream fixes.


If one takes extra months to have a fix delivered to a customer, they 
probably are also months late on security fixes as well, right? That 
would be pretty scary by itself already.



I had to deal with customers having issues with resolvers hitting the
netlink/rhashtable bugs, and I can tell you it was not pretty nor funny.

Seeing all these SCTP bugs being currently tracked/fixed (reports from
Dmitry Vyukov), I am concerned about having to backport fixes into old
kernels without proper rhashtable if now SCTP relies heavily on
rhashtable.


This happens with every major change in the kernel. Try backporting 
vxlan fixes to an older kernel, for example, to one without ip_tunnel.


Can't say about the future, but so far none of those bugs were related 
to the hash that we want to replace and they were all small/contained 
patches.


And at least for now, we are not adding new stuff which relies on this 
new hash. It's on a central part of sctp, yes, but somewhat contained. 
Like what happened with vxlan/ip_tunnel, which ended up growing together.



Hopefully nothing bad will happen.


+1 :)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram

2015-12-30 Thread Cong Wang
On Wed, Dec 30, 2015 at 6:30 AM, Jacob Siverskog
 wrote:
> On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet  wrote:
>> How often can you trigger this bug ?
>
> Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
> few times when bringing up/down network interfaces. Does the trace
> give any clue?
>

A little bit. You need to help people to narrow down the problem
because there are too many places using skb->next and skb->prev.

Since you mentioned it seems related to network interface flip,
what network interfaces are you using? What's is your TC setup?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] udp: properly support MSG_PEEK with truncated buffers

2015-12-30 Thread Herbert Xu
On Wed, Dec 30, 2015 at 08:51:12AM -0500, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> Backport of this upstream commit into stable kernels :
> 89c22d8c3b27 ("net: Fix skb csum races when peeking")
> exposed a bug in udp stack vs MSG_PEEK support, when user provides
> a buffer smaller than skb payload.
> 
> In this case,
> skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
>  msg->msg_iov);
> returns -EFAULT.
> 
> This bug does not happen in upstream kernels since Al Viro did a great
> job to replace this into :
> skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
> This variant is safe vs short buffers.
> 
> For the time being, instead reverting Herbert Xu patch and add back
> skb->ip_summed invalid changes, simply store the result of
> udp_lib_checksum_complete() so that we avoid computing the checksum a
> second time, and avoid the problematic
> skb_copy_and_csum_datagram_iovec() call.
> 
> This patch can be applied on recent kernels as it avoids a double
> checksumming, then backported to stable kernels as a bug fix.
> 
> Signed-off-by: Eric Dumazet 

Acked-by: Herbert Xu 
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v3 00/10] be2net: patch set

2015-12-30 Thread David Miller
From: Sathya Perla 
Date: Wed, 30 Dec 2015 01:28:55 -0500

> The following patch set contains some feature additions, code
> re-organization and cleanup and a few non-critical fixes. Pls
> consider applying this to the net-next tree. Thanks.

Series applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: pull-request: wireless-drivers 2015-12-28

2015-12-30 Thread David Miller
From: Kalle Valo 
Date: Mon, 28 Dec 2015 13:47:43 +0200

> here's one more pull request, a bit late due to holidays but I hope this
> still makes it to 4.4. Just two small fixes to iwlwifi, nothing else.

Pulled, thanks Kalle.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 0/5] sctp: use transport hashtable to replace association's with rhashtable

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 15:44 -0500, David Miller wrote:

> It is more fun than waiting longer for the more limited uses of it to
> trigger problems.
> 
> I cannot be convinced that using it in more places in order to find
> and fix more bugs is a bad thing.
> 
> I'm sorry if a lot of bug fixes in a short period of time concerns
> you, but for me that's an even clearer sign that it needs help, and
> exposing it to more use cases is one of the best forms of help it can
> get.
> 
> It also tells me that the people actually working on those fixes, such
> as Herbert Xu, are motivated and reliable when they are shown properly
> formed bug reports.
> 
> I cannot think of a report Herbert and others did not resolve in a
> timely manner.  They usually add test cases too.


I have no doubts we can fix bugs in upstream kernels in a few days (at
most).

The problem is when a customer is stuck using a distro, with a release
cycle of extra months after upstream fixes.

I had to deal with customers having issues with resolvers hitting the
netlink/rhashtable bugs, and I can tell you it was not pretty nor funny.

Seeing all these SCTP bugs being currently tracked/fixed (reports from
Dmitry Vyukov), I am concerned about having to backport fixes into old
kernels without proper rhashtable if now SCTP relies heavily on
rhashtable.

Hopefully nothing bad will happen.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4-rc7 failure report

2015-12-30 Thread Doug Ledford
On 12/29/2015 11:16 PM, Alexei Starovoitov wrote:
> On Tue, Dec 29, 2015 at 10:44:31PM -0500, Doug Ledford wrote:
>> On 12/29/2015 10:43 PM, Alexei Starovoitov wrote:
>>> On Mon, Dec 28, 2015 at 08:26:44PM -0500, Doug Ledford wrote:
 On 12/28/2015 05:20 PM, Daniel Borkmann wrote:
> On 12/28/2015 10:53 PM, Doug Ledford wrote:
>> The 4.4-rc7 kernel is failing for me.  In my case, all of my vlan
>> interfaces are failing to obtain a dhcp address using dhclient.  I've
>> tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both
>> failed.  I've tried NetworkManager and the old SysV network service,
>> both fail.  I tried a working dhclient from rhel7 on the Fedora rawhide
>> install and it failed too.  Running tcpdump on the interface shows the
>> dhcp request going out, and a dhcp response coming back in.  Running
>> strace on dhclient shows that it writes the dhcp request, but it never
>> recvs a dhcp response.  If I manually bring the interface up with a
>> static IP address then I'm able to run typical IP traffic across the
>> link (aka, ping).  It would seem that when dhclient registers a packet
>> filter on the socket, that filter is preventing it from ever getting the
>> dhcp response.  The same dhclient works on any non-vlan interfaces in
>> the system, so the filter must work for non-vlan interfaces.  Aside from
>> the fact that the interface is a vlan, we also use a priority egress map
>> on the interface, and we use PFC flow control.  Let me know if you need
>> anymore to debug the issue, or email me off list and I can get you
>> logins to my reproducer machines.
>
> When you say 4.4-rc7 kernel is failing for you, what latest kernel version
> was working, where the socket filter was properly receiving the response 
> on
> your vlan iface?

 v4.3 final works.  I haven't bisected where in the 4.4 series it quits
 working.  I can do that tomorrow.
>>>
>>> I've tried to reproduce, but cannot seem to make dnsmasq work properly
>>> over vlan, so bisect would be great.
>>>
>>
>> Yeah, I've been working on it.  Issues with available machines that
>> reproduce combined with what hardware they have and whether or not that
>> hardware works at various steps in the bisection :-/
> 
> I've looked through all bpf related commits between v4.3..HEAD and don't see
> anything suspicious. Could it be that your setup exploited a bug that was 
> fixed by 
> 28f9ee22bcdd ("vlan: Do not put vlan headers back on bridge and macvlan 
> ports")
> 
> Could you also provide more details on vlan+dhcp setup to help narrow it
> down if bisect is taking too long.
> 

My bisection got down to the last few steps and just didn't make sense.
 So, I ended up starting it over.  I'm not sure how/why I saw that v4.3
worked the first time around, but the second time around it failed.  So
I also tried a pre-made 4.2.8-300 kernel from Fedora 23 and it failed as
well.  The problem at least spans 4.2 through 4.4, so it's been a while.
 I'll continue searching more kernels tomorrow, but I've been doing this
while I still have company in town for the holidays so I'm gonna go be
with them when I'm done writing this.

I've recently made some changes to my network setup here, so that might
be related to why I'm seeing it now.  I'll provide details on my test
setup in case any of it helps people on this:

Ethernet network is used for RDMA testing.  Switches are Mellanox 56GigE
switches.  The ports with multiple vlans are all set in hybrid mode,
untagged frames to vlan 40, tagged frames for vlans 43 and 45 allowed.
Switch has DCB enabled, priority 5 is no-drop, ports are set to use PFC
and MTU 9216 and LLDP is enabled on the ports as well.

The head node of the cluster runs dhcpd on the vlans (as well as the
InfiniBand ports).  The test machine has a static IP address configured
for each port/vlan in the server's config.

On the client, I've set the base interface to dhcp, vlan 43 to static IP
assignment, and vlan 45 to dhcp.  This allows me to see at a glance if
things are working since I know if the base device gets an IP and vlan
45 doesn't and instead times out and goes away, then the dhcp on the
vlan failed.  (I needed to set one vlan to static so the vlan creation
didn't depend on dhcp success because with some kernel versions and some
hardware types, namely mlx5, vlans weren't working at all and you could
mistake no vlans made for a problem with dhcp when it was really a
problem with vlans on mlx5 hardware).

This is the failing device's config:

[root@rdma-perf-00 ~]$ more
/etc/sysconfig/network-scripts/ifcfg-mlx4_roce.45
DEVICE=mlx4_roce.45
VLAN=yes
VLAN_ID=45
REORDER_HDR=0
VLAN_EGRESS_PRIORITY_MAP=0:5,1:5,2:5,3:5,4:5,5:5,6:5,7:5
TYPE=Vlan
ONBOOT=yes
BOOTPROTO=dhcp
DEFROUTE=no
PEERDNS=no
PEERROUTES=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=no
IPV6_PEERDNS=no
IPV6_PEERROUTES=yes

Re: [PATCH] igb: When GbE link up, wait for Remote receiver status condition.

2015-12-30 Thread Jeff Kirsher
On Wed, Dec 30, 2015 at 4:09 AM, Takuma Ueba  wrote:
> I210 device IPv6 autoconf test sometimes fails,
> because DAD NS for link-local is not transmitted.
> This packet is silently dropped.
> This problem is seen only GbE environment.
>
> igb_watchdog_task link up detection continues to the following process.
> The following cases are observed:
> 1.PHY 1000BASE-T Status Register Remote receiver status bit is NG.
> (NG status becomes OK after about 200 - 700ms)
> 2.In this case, the transfer packet is silently dropped.
>
> 1000BASE-T Status register
> [Expected]: 0x3800 or 0x7800
> [problem occurred]: 0x2800 or 0x6800
> Frequency of occurrence: approx 1/10 - 1/40 observed
>
> In order to avoid this problem,
> wait until 1000BASE-T Status register "Remote receiver status OK"
>
> After applying this patch, at least 400 runs succeed with no problems.
>
> Signed-off-by: Takuma Ueba 
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 20 
>  1 file changed, 20 insertions(+)

Please send this to intel-wired-...@lists.osuosl.org mailing list, or
at least CC the list since all Wired Ethernet Intel driver patches are
handled through that list.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V5] ixgbe: synchronize link_up and link_speed of a slave

2015-12-30 Thread zyjzyj2000

Hi, all

Thanks for the suggestions from Rustad, Mark D.
According to his suggestions, the logs and source code are simplified.

Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] ixgbe: synchronize link_up and link_speed of a slave interface

2015-12-30 Thread zyjzyj2000
From: Zhu Yanjun 

According to the suggestions from Rustad, Mark D, to all the slave 
interfaces, the link_speed and link_up should be synchronized since
the time span between link_up and link_speed will make some virtual
NICs not work well, such as a bonding driver in 802.3ad mode.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index aed8d02..fc461b9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6426,6 +6426,11 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
if (netif_carrier_ok(netdev))
return;
 
+   /* For all slave interfaces, wait for the link_speed to be known. */
+   if ((netdev->flags & IFF_SLAVE) &&
+   (link_speed == IXGBE_LINK_SPEED_UNKNOWN))
+   return;
+
adapter->flags2 &= ~IXGBE_FLAG2_SEARCH_FOR_SFP;
 
switch (hw->mac.type) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6 1/1] ixgbe: synchronize link_up and link_speed of a slave interface

2015-12-30 Thread zyjzyj2000
From: Zhu Yanjun 

According to the suggestions from Rustad, Mark D, to all the slave 
interfaces, the link_speed and link_up should be synchronized since
the time span between link_up and link_speed will make some virtual
NICs not work well, such as a bonding driver in 802.3ad mode.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index aed8d02..fc461b9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6426,6 +6426,11 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
if (netif_carrier_ok(netdev))
return;
 
+   /* For all slave interfaces, wait for the link_speed to be known. */
+   if ((netdev->flags & IFF_SLAVE) &&
+   (link_speed == IXGBE_LINK_SPEED_UNKNOWN))
+   return;
+
adapter->flags2 &= ~IXGBE_FLAG2_SEARCH_FOR_SFP;
 
switch (hw->mac.type) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6] ixgbe: synchronize link_up and link_speed of a slave

2015-12-30 Thread zyjzyj2000

Hi, all

Thanks for the reply from Jeff.

V2: Based on feedback from Jeff Kirsher, it is not appropriate to continue
in the function ixgbe_watchdog_link_is_up without link_speed since this will
make some virtual NICs not work well.

V3: Based on feedback from Emil Tantilov, the time span between link_up 
and link_speed is not important when the X540 NIC acts as an independent 
interface. 

V4: According to Rustad, Mark D, maybe it is related with copper phy. To make
fiber phy more robust, synchronize both the link_up and link_speed of a 
slave
interface in ixgbe driver.

V5: Based on feedback from Rustad, Mark D, simplify code comment and if 
statement 
to only test for IFF_SLAVE flag and unknown link speed.

V6: Based on feedback from Jeff Kirsher, the patch format is adjusted 
and change log of this patch are added.

Best Regards!
Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


net/nfc: user-controllable kmalloc size in nfc_llcp_send_ui_frame

2015-12-30 Thread Dmitry Vyukov
Hello,

The following program triggers WARNING In kmalloc:


[ cut here ]
WARNING: CPU: 2 PID: 6754 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x771/0x15f0()
Modules linked in:
CPU: 2 PID: 6754 Comm: a.out Not tainted 4.4.0-rc7+ #181
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  88006275f5e0 8289d9dd 
 8800621c8000 85dbab40 88006275f620 812ebbb9
 815fc6b1 85dbab40 0bad 88006275f8a8
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
 [] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989
 [] __alloc_pages_nodemask+0x771/0x15f0 mm/page_alloc.c:3235
 [] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [< inline >] alloc_pages include/linux/gfp.h:451
 [] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [] kmalloc_order+0x1f/0x80 mm/slab_common.c:1007
 [] kmalloc_order_trace+0x1f/0x140 mm/slab_common.c:1018
 [< inline >] kmalloc_large include/linux/slab.h:390
 [] __kmalloc+0x2de/0x330 mm/slub.c:3555
 [< inline >] kmalloc include/linux/slab.h:463
 [< inline >] kzalloc include/linux/slab.h:602
 [] nfc_llcp_send_ui_frame+0xdc/0x3d0
net/nfc/llcp_commands.c:732
 [] llcp_sock_sendmsg+0x250/0x310 net/nfc/llcp_sock.c:782
 [< inline >] sock_sendmsg_nosec net/socket.c:610
 [] sock_sendmsg+0xca/0x110 net/socket.c:620
 [] ___sys_sendmsg+0x72a/0x840 net/socket.c:1946
 [] __sys_sendmsg+0xce/0x170 net/socket.c:1980
 [< inline >] SYSC_sendmsg net/socket.c:1991
 [] SyS_sendmsg+0x2d/0x50 net/socket.c:1987
 [] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
---[ end trace 62962d1ed2b9f41a ]---


// autogenerated by syzkaller (http://github.com/google/syzkaller)
#include 
#include 
#include 
#include 

long r[68];

int main()
{
memset(r, -1, sizeof(r));
r[0] = syscall(SYS_mmap, 0x2000ul, 0x2ul, 0x3ul,
0x32ul, 0xul, 0x0ul);
r[1] = syscall(SYS_socket, 0x27ul, 0x2ul, 0x1ul, 0, 0, 0);
*(uint16_t*)0x2000cfa0 = (uint16_t)0x27;
*(uint32_t*)0x2000cfa4 = (uint32_t)0x1;
*(uint32_t*)0x2000cfa8 = (uint32_t)0x8;
*(uint32_t*)0x2000cfac = (uint32_t)0x7;
*(uint8_t*)0x2000cfb0 = (uint8_t)0x0;
*(uint8_t*)0x2000cfb1 = (uint8_t)0x38;
*(uint8_t*)0x2000cfb2 = (uint8_t)0x6;
*(uint8_t*)0x2000cfb3 = (uint8_t)0x0;
*(uint32_t*)0x2000cfb4 = (uint32_t)0x9;
*(uint32_t*)0x2000cfb8 = (uint32_t)0x7;
*(uint32_t*)0x2000cfbc = (uint32_t)0x9;
*(uint32_t*)0x2000cfc0 = (uint32_t)0xfff7;
*(uint32_t*)0x2000cfc4 = (uint32_t)0x8;
*(uint32_t*)0x2000cfc8 = (uint32_t)0xcf77;
*(uint32_t*)0x2000cfcc = (uint32_t)0x39;
*(uint32_t*)0x2000cfd0 = (uint32_t)0x6;
*(uint32_t*)0x2000cfd4 = (uint32_t)0x8;
*(uint32_t*)0x2000cfd8 = (uint32_t)0x4;
*(uint32_t*)0x2000cfdc = (uint32_t)0x4b;
*(uint32_t*)0x2000cfe0 = (uint32_t)0x9;
*(uint32_t*)0x2000cfe4 = (uint32_t)0x5;
*(uint32_t*)0x2000cfe8 = (uint32_t)0x4;
*(uint32_t*)0x2000cfec = (uint32_t)0x7;
*(uint8_t*)0x2000cff0 = (uint8_t)0xfffd;
*(uint64_t*)0x2000cff8 = (uint64_t)0x8;
r[27] = syscall(SYS_bind, r[1], 0x2000cfa0ul, 0x60ul, 0, 0, 0);
*(uint64_t*)0x20014fc8 = (uint64_t)0x20014000;
*(uint32_t*)0x20014fd0 = (uint32_t)0x60;
*(uint64_t*)0x20014fd8 = (uint64_t)0x20014000;
*(uint64_t*)0x20014fe0 = (uint64_t)0x1;
*(uint64_t*)0x20014fe8 = (uint64_t)0x20014000;
*(uint64_t*)0x20014ff0 = (uint64_t)0x11;
*(uint32_t*)0x20014ff8 = (uint32_t)0x0;
*(uint16_t*)0x20014000 = (uint16_t)0x27;
*(uint32_t*)0x20014004 = (uint32_t)0x3;
*(uint32_t*)0x20014008 = (uint32_t)0x0;
*(uint32_t*)0x2001400c = (uint32_t)0x0;
*(uint8_t*)0x20014010 = (uint8_t)0x2;
*(uint8_t*)0x20014011 = (uint8_t)0x52;
*(uint8_t*)0x20014012 = (uint8_t)0x7;
*(uint8_t*)0x20014013 = (uint8_t)0x2;
*(uint32_t*)0x20014014 = (uint32_t)0x3;
*(uint32_t*)0x20014018 = (uint32_t)0x8;
*(uint32_t*)0x2001401c = (uint32_t)0x9;
*(uint32_t*)0x20014020 = (uint32_t)0xde4;
*(uint32_t*)0x20014024 = (uint32_t)0x8;
*(uint32_t*)0x20014028 = (uint32_t)0x6;
*(uint32_t*)0x2001402c = (uint32_t)0x6850;
*(uint32_t*)0x20014030 = (uint32_t)0x24;
*(uint32_t*)0x20014034 = (uint32_t)0x0;
*(uint32_t*)0x20014038 = (uint32_t)0xffe4;
*(uint32_t*)0x2001403c = (uint32_t)0x6;
*(uint32_t*)0x20014040 = (uint32_t)0x4e;
*(uint32_t*)0x20014044 = (uint32_t)0x6;
*(uint32_t*)0x20014048 = (uint32_t)0xf14c;

Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram

2015-12-30 Thread Jacob Siverskog
On Tue, Dec 29, 2015 at 9:08 PM, David Miller  wrote:
> From: Rainer Weikusat 
> Date: Tue, 29 Dec 2015 19:42:36 +
>
>> Jacob Siverskog  writes:
>>> This should fix a NULL pointer dereference I encountered (dump
>>> below). Since __skb_unlink is called while walking,
>>> skb_queue_walk_safe should be used.
>>
>> The code in question is:
>  ...
>> __skb_unlink is only called prior to returning from the function.
>> Consequently, it won't affect the skb_queue_walk code.
>
> Agreed, this patch doesn't fix anything.

Ok. Thanks for your feedback. How do you believe the issue could be
solved? Investigating it gives:

static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
{
struct sk_buff *next, *prev;

list->qlen--;
 51c: e2433001 sub r3, r3, #1
 520: e58b3074 str r3, [fp, #116] ; 0x74
next   = skb->next;
prev   = skb->prev;
 524: e894000c ldm r4, {r2, r3}
skb->next  = skb->prev = NULL;
 528: e5841000 str r1, [r4]
 52c: e5841004 str r1, [r4, #4]
next->prev = prev;
 530: e5823004 str r3, [r2, #4]  <--
trapping instruction (r2 NULL)

Register contents:
r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
r3 :   r2 :   r1 :   r0 : 2013

If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.

Should there be a check for this in __skb_try_recv_datagram?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] include/uapi/linux/sockios.h: mark SIOCRTMSG unused

2015-12-30 Thread Heinrich Schuchardt
On 12/30/2015 11:56 AM, Michael Kerrisk (man-pages) wrote:
> Hi Heinrich,
> 
> On 12/29/2015 11:22 PM, Heinrich Schuchardt wrote:
>> IOCTL SIOCRTMSG does nothing but return EINVAL.
>>
>> So comment it as unused.
> 
> Can you say something about how you confirmed this?
> It's not immediately obvious from the code.
> 
> Cheers,
> 
> Michael

grep -GHrn SIOCRTMSG

SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h

inet_ioctl calls ip_rt_ioctl.

ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.

cf.
http://lkml.iu.edu/hypermail/linux/kernel/0911.0/02636.html

Best regards

Heinrich

> 
> 
>> Signed-off-by: Heinrich Schuchardt 
>> ---
>>  include/uapi/linux/sockios.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/sockios.h b/include/uapi/linux/sockios.h
>> index e888b1a..8e7890b 100644
>> --- a/include/uapi/linux/sockios.h
>> +++ b/include/uapi/linux/sockios.h
>> @@ -27,7 +27,7 @@
>>  /* Routing table calls. */
>>  #define SIOCADDRT   0x890B  /* add routing table entry  */
>>  #define SIOCDELRT   0x890C  /* delete routing table entry   */
>> -#define SIOCRTMSG   0x890D  /* call to routing system   */
>> +#define SIOCRTMSG   0x890D  /* unused   */
>>  
>>  /* Socket configuration controls. */
>>  #define SIOCGIFNAME 0x8910  /* get iface name   */
>>
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Linux 4.4-rc4 regression, bisected to "net: fix sock_wake_async() rcu protection"

2015-12-30 Thread Andy Lutomirski
On recent v4.4-rc releases, I can't run emacs.  No, really, running
"emacs" in a GNOME 3 session makes gnome-shell think that emacs is
running, but no window is drawn, and the overall system UI is a bit
weird when the invisible emacs window is focused.

This is 100% reproducible.

There might be other symptoms involving gdb malfunctioning, but those
are, at best, sporadic.  The emacs failure is entirely reliable.  I
have no idea what the underlying failure mode is, but failure to wake
a socket waiter seems plausible,  I also have no idea why oocalc,
gimp, vim, gedit, firefox, etc aren't affected.

A somewhat unorthodox "git bisect" run blames:

commit ceb5d58b217098a657f3850b7a2640f995032e62
Author: Eric Dumazet 
Date:   Sun Nov 29 20:03:11 2015 -0800

net: fix sock_wake_async() rcu protection

I've confirmed that v4.4-rc7 with that patch reverted works fine.

Since the offending commit was apparently a security fix, simply
reverting it might not be the best idea.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] unix: properly account for FDs passed over unix sockets

2015-12-30 Thread Willy Tarreau
On Wed, Dec 30, 2015 at 09:58:42AM +0100, Hannes Frederic Sowa wrote:
> The MSG_PEEK code should not be harmful and the patch is good as is. I 
> first understood from the published private thread, that it is possible 
> for a program to exceed the rlimit of fds. But the DoS is only by 
> keeping the fds in flight and not attaching them to any program.

Exactly. The real issue is when these FDs become very expensive such as
pipes full of data.

> __alloc_fd, called on the receiver side, does check for the rlimit 
> maximum anyway, so I don't see a loophole anymore:
> 
> Acked-by: Hannes Frederic Sowa 

Thanks!

> Another idea would be to add the amount of memory used to manage the fds 
> to sock_rmem/wmem but I don't see any advantages or disadvantages.

Compared to the impact of the pending data in pipes themselves in flight,
this would remain fairly minimal.

Thanks,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] iwlegacy: 4965-mac: constify il_sensitivity_ranges structure

2015-12-30 Thread Julia Lawall
The il_sensitivity_ranges is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall 

---
 drivers/net/wireless/intel/iwlegacy/4965-mac.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlegacy/4965-mac.c 
b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
index 6656215..fd38aa0 100644
--- a/drivers/net/wireless/intel/iwlegacy/4965-mac.c
+++ b/drivers/net/wireless/intel/iwlegacy/4965-mac.c
@@ -6416,7 +6416,7 @@ il4965_hw_detect(struct il_priv *il)
D_INFO("HW Revision ID = 0x%X\n", il->rev_id);
 }
 
-static struct il_sensitivity_ranges il4965_sensitivity = {
+static const struct il_sensitivity_ranges il4965_sensitivity = {
.min_nrg_cck = 97,
.max_nrg_cck = 0,   /* not used, set to 0 */
 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] include/uapi/linux/sockios.h: mark SIOCRTMSG unused

2015-12-30 Thread Michael Kerrisk (man-pages)
Hi Heinrich,

On 12/29/2015 11:22 PM, Heinrich Schuchardt wrote:
> IOCTL SIOCRTMSG does nothing but return EINVAL.
> 
> So comment it as unused.

Can you say something about how you confirmed this?
It's not immediately obvious from the code.

Cheers,

Michael


> Signed-off-by: Heinrich Schuchardt 
> ---
>  include/uapi/linux/sockios.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/sockios.h b/include/uapi/linux/sockios.h
> index e888b1a..8e7890b 100644
> --- a/include/uapi/linux/sockios.h
> +++ b/include/uapi/linux/sockios.h
> @@ -27,7 +27,7 @@
>  /* Routing table calls. */
>  #define SIOCADDRT0x890B  /* add routing table entry  */
>  #define SIOCDELRT0x890C  /* delete routing table entry   */
> -#define SIOCRTMSG0x890D  /* call to routing system   */
> +#define SIOCRTMSG0x890D  /* unused   */
>  
>  /* Socket configuration controls. */
>  #define SIOCGIFNAME  0x8910  /* get iface name   */
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 4.4-rc4 regression, bisected to "net: fix sock_wake_async() rcu protection"

2015-12-30 Thread Nicolai Stange
Andy Lutomirski  writes:

> On recent v4.4-rc releases, I can't run emacs.  No, really, running
> "emacs" in a GNOME 3 session makes gnome-shell think that emacs is
> running, but no window is drawn, and the overall system UI is a bit
> weird when the invisible emacs window is focused.
>
> This is 100% reproducible.
>
> There might be other symptoms involving gdb malfunctioning, but those
> are, at best, sporadic.  The emacs failure is entirely reliable.  I
> have no idea what the underlying failure mode is, but failure to wake
> a socket waiter seems plausible,  I also have no idea why oocalc,
> gimp, vim, gedit, firefox, etc aren't affected.
>
> A somewhat unorthodox "git bisect" run blames:
>
> commit ceb5d58b217098a657f3850b7a2640f995032e62
> Author: Eric Dumazet 
> Date:   Sun Nov 29 20:03:11 2015 -0800
>
> net: fix sock_wake_async() rcu protection
>
> I've confirmed that v4.4-rc7 with that patch reverted works fine.
>
> Since the offending commit was apparently a security fix, simply
> reverting it might not be the best idea.

Please have a look at https://lkml.kernel.org/g/87ege73bma@gmail.com

I ran into the same issue and this one fixes it for me.

Best,

Nicolai
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] unix: properly account for FDs passed over unix sockets

2015-12-30 Thread Hannes Frederic Sowa

On 29.12.2015 21:35, Willy Tarreau wrote:

On Tue, Dec 29, 2015 at 03:48:45PM +0100, Hannes Frederic Sowa wrote:

On 28.12.2015 15:14, Willy Tarreau wrote:

It is possible for a process to allocate and accumulate far more FDs than
the process' limit by sending them over a unix socket then closing them
to keep the process' fd count low.

This change addresses this problem by keeping track of the number of FDs
in flight per user and preventing non-privileged processes from having
more FDs in flight than their configured FD limit.

Reported-by: socketp...@gmail.com
Suggested-by: Linus Torvalds 
Signed-off-by: Willy Tarreau 


Thanks for the patch!

I think this does not close the DoS attack completely as we duplicate
fds if the reader uses MSG_PEEK on the unix domain socket and thus
clones the fd. Have I overlooked something?


I didn't know this behaviour. However, then the fd remains in flight, right ?
So as long as it's not removed from the queue, the sender cannot add more
than its FD limit. I may be missing something obvious though :-/


Yes, it remains in flight.

The MSG_PEEK code should not be harmful and the patch is good as is. I 
first understood from the published private thread, that it is possible 
for a program to exceed the rlimit of fds. But the DoS is only by 
keeping the fds in flight and not attaching them to any program.


__alloc_fd, called on the receiver side, does check for the rlimit 
maximum anyway, so I don't see a loophole anymore:


Acked-by: Hannes Frederic Sowa 

Another idea would be to add the amount of memory used to manage the fds 
to sock_rmem/wmem but I don't see any advantages or disadvantages.


Thanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.4-rc7 failure report

2015-12-30 Thread Daniel Borkmann

On 12/30/2015 05:16 AM, Alexei Starovoitov wrote:

On Tue, Dec 29, 2015 at 10:44:31PM -0500, Doug Ledford wrote:

On 12/29/2015 10:43 PM, Alexei Starovoitov wrote:

On Mon, Dec 28, 2015 at 08:26:44PM -0500, Doug Ledford wrote:

On 12/28/2015 05:20 PM, Daniel Borkmann wrote:

On 12/28/2015 10:53 PM, Doug Ledford wrote:

The 4.4-rc7 kernel is failing for me.  In my case, all of my vlan
interfaces are failing to obtain a dhcp address using dhclient.  I've
tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both
failed.  I've tried NetworkManager and the old SysV network service,
both fail.  I tried a working dhclient from rhel7 on the Fedora rawhide
install and it failed too.  Running tcpdump on the interface shows the
dhcp request going out, and a dhcp response coming back in.  Running
strace on dhclient shows that it writes the dhcp request, but it never
recvs a dhcp response.  If I manually bring the interface up with a
static IP address then I'm able to run typical IP traffic across the
link (aka, ping).  It would seem that when dhclient registers a packet
filter on the socket, that filter is preventing it from ever getting the
dhcp response.  The same dhclient works on any non-vlan interfaces in
the system, so the filter must work for non-vlan interfaces.  Aside from
the fact that the interface is a vlan, we also use a priority egress map
on the interface, and we use PFC flow control.  Let me know if you need
anymore to debug the issue, or email me off list and I can get you
logins to my reproducer machines.


When you say 4.4-rc7 kernel is failing for you, what latest kernel version
was working, where the socket filter was properly receiving the response on
your vlan iface?


v4.3 final works.  I haven't bisected where in the 4.4 series it quits
working.  I can do that tomorrow.


I've tried to reproduce, but cannot seem to make dnsmasq work properly
over vlan, so bisect would be great.


Yeah, I've been working on it.  Issues with available machines that
reproduce combined with what hardware they have and whether or not that
hardware works at various steps in the bisection :-/


I've looked through all bpf related commits between v4.3..HEAD and don't see
anything suspicious. Could it be that your setup exploited a bug that was fixed 
by


Agreed, also went over the bpf history yesterday and didn't find anything
that could be related to this issue between the two tags.

The filter that dhclient seems to be using is (common/bpf.c):

struct bpf_insn dhcp_bpf_filter [] = {
/* Make sure this is an IP packet... */
BPF_STMT (BPF_LD + BPF_H + BPF_ABS, 12),
BPF_JUMP (BPF_JMP + BPF_JEQ + BPF_K, ETHERTYPE_IP, 0, 8),

/* Make sure it's a UDP packet... */
BPF_STMT (BPF_LD + BPF_B + BPF_ABS, 23),
BPF_JUMP (BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 0, 6),

/* Make sure this isn't a fragment... */
BPF_STMT(BPF_LD + BPF_H + BPF_ABS, 20),
BPF_JUMP(BPF_JMP + BPF_JSET + BPF_K, 0x1fff, 4, 0),

/* Get the IP header length... */
BPF_STMT (BPF_LDX + BPF_B + BPF_MSH, 14),

/* Make sure it's to the right port... */
BPF_STMT (BPF_LD + BPF_H + BPF_IND, 16),
BPF_JUMP (BPF_JMP + BPF_JEQ + BPF_K, 67, 0, 1), /* patch */

/* If we passed all the tests, ask for the whole packet. */
BPF_STMT(BPF_RET+BPF_K, (u_int)-1),

/* Otherwise, drop it. */
BPF_STMT(BPF_RET+BPF_K, 0),
};

Given that this drop doesn't strictly need to be caused by filter code,
it would be nice if you could pin the location down where the packet gets
dropped exactly. Perhaps dropwatch or perf with '-e skb:kfree_skb -a -g
dhclient ', etc could help to get a first overview to dig into
details then.


28f9ee22bcdd ("vlan: Do not put vlan headers back on bridge and macvlan ports")

Could you also provide more details on vlan+dhcp setup to help narrow it
down if bisect is taking too long.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization of link_up and speed

2015-12-30 Thread zhuyj

On 12/30/2015 02:55 PM, Tantilov, Emil S wrote:

-Original Message-
From: zhuyj [mailto:zyjzyj2...@gmail.com]
Sent: Tuesday, December 29, 2015 6:49 PM
To: Tantilov, Emil S; Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson,
Shannon; Wyborny, Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak,
John; Williams, Mitch A; intel-wired-...@lists.osuosl.org;
netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River); Bourg,
Vincent (Wind River)
Subject: Re: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization
of link_up and speed

On 12/30/2015 12:18 AM, Tantilov, Emil S wrote:

-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org]

On

Behalf Of zyjzyj2...@gmail.com
Sent: Monday, December 28, 2015 6:32 PM
To: Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson, Shannon; Wyborny,
Carolyn; Skidmore, Donald C; Allan, Bruce W; Ronciak, John; Williams,

Mitch

A; intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; e1000-
de...@lists.sourceforge.net
Cc: Viswanathan, Ven (Wind River); Shteinbock, Boris (Wind River);

Bourg,

Vincent (Wind River)
Subject: [Intel-wired-lan] [PATCH 2/2] ixgbe: restrict synchronization

of

link_up and speed

From: Zhu Yanjun 

When the X540 NIC acts as a slave of some virtual NICs, it is very
important to synchronize link_up and link_speed, such as a bonding
driver in 802.3ad mode. When X540 NIC acts as an independent interface,
it is not necessary to synchronize link_up and link_speed. That is,
the time span between link_up and link_speed is acceptable.

What exactly do you mean by "time span between link_up and link_speed"?

In the previous mail, I show you some ethtool logs. In these logs, there
is some
time with NIC up while speed is unknown. I think this "some time" is
time span between
link_up and link_speed. Please see the previous mail for details.

Was this when reporting the link state from check_link() (reading the LINKS
register) or reporting the adapter->link_speed?


Where is it you think the de-synchronization occurs?

When a NIC interface acts as a slave, a flag "IFF_SLAVE" is set in
netdevice struct.
Before we enter this function, we check IFF_SLAVE flag. If this flag is
set, we continue to check
link_speed. If not, this function is executed whether this link_speed is
unknown or not.

I can already see this in your patch. I was asking about the reason why your
change is needed.


an extreme example, let us assume this scenario:

An ixgbe NIC directly connects to another NIC (let us call it NIC-a). 
And auto-negotiate is off while no static speed is set in the 2 NICs. 
These 2 NICs acts as 2 independent interfaces. As such, at this time, 
there is no speed in the both 2 NICs. That is, link_speed is unknown.


When the user run "ifconfig or ethtool", NIC-a will show "Link detected: 
yes" while ixgbe NIC will show "Link detected: no" if the flag IFF_SLAVE 
is not set.


NIC-a stands for most NIC, such as e1000, e1000e and so on.

Best Regards!
Zhu Yanjun



Signed-off-by: Zhu Yanjun 
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 -
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ace21b9..1bb6056 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6436,8 +6436,15 @@ static void ixgbe_watchdog_link_is_up(struct
ixgbe_adapter *adapter)
 * time. To X540 NIC, there is a time span between link_up and
 * link_speed. As such, only continue if link_up and link_speed are
 * ready to X540 NIC.
+* The time span between link_up and link_speed is very important
+* when the X540 NIC acts as a slave in some virtual NICs, such as
+* a bonding driver in 802.3ad mode. When X540 NIC acts as an
+* independent interface, it is not necessary to synchronize link_up
+* and link_speed.
+* In the end, not continue if (X540 NIC && SLAVE && link_speed
UNKNOWN)

This is a patch on top of your previous patch which I don't think was

applied,

so this is not going to apply cleanly.


 */
-   if (hw->mac.type == ixgbe_mac_X540)
+   if ((hw->mac.type == ixgbe_mac_X540) &&
+   (netdev->flags & IFF_SLAVE))
if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
return;

If you were to enter ixgbe_watchdog_link_is_up() with unknown speed, then

I would

assume that you also have a dmesg that shows:
"NIC Link is Up unknown speed"

by the interface you use in the bond?

Sure. There is a dmesg log from the customer.
"
...
2015-10-05T06:14:34.350 controller-0 kernel: info bonding: bond0: link
status definitely up for interface eth0, 0 Mbps full duplex.

This message is from the bonding driver not from ixgbe.

In your patch you 

[PATCH 3/3] ixgbe: synchronize the link_speed and link_up of a slave interface

2015-12-30 Thread zyjzyj2000
From: Zhu Yanjun 

According to the suggestion from Rustad, Mark D, this behavior perhaps
is more related to the copper phy. But to make fiber phy more robust,
to all the interfaces as a slave interface, the link_speed and link_up
is synchronized.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1bb6056..ce47639 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6441,10 +6441,12 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
 * a bonding driver in 802.3ad mode. When X540 NIC acts as an
 * independent interface, it is not necessary to synchronize link_up
 * and link_speed.
-* In the end, not continue if (X540 NIC && SLAVE && link_speed UNKNOWN)
+* According to the suggestion from Rustad, Mark D, this behavior
+* perhaps is related to the copper phy. To make fiber phy more robust,
+* To all the interfaces as a slave, the link_speed is checked.
+* In the end, not continue if (SLAVE && link_speed UNKNOWN)
 */
-   if ((hw->mac.type == ixgbe_mac_X540) &&
-   (netdev->flags & IFF_SLAVE))
+   if (netdev->flags & IFF_SLAVE)
if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
return;
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] ixgbe: force to synchronize reporting "link on" and getting speed

2015-12-30 Thread zyjzyj2000
From: Zhu Yanjun 

In X540 NIC, there is a time span between reporting "link on" and
getting the speed and duplex. To a bonding driver in 802.3ad mode,
this time span will make it not work well if the time span is big
enough. The big time span will make bonding driver change the state of
the slave device to up while the speed and duplex of the slave device
can not be gotten. Later the bonding driver will not have change to
get the speed and duplex of the slave device. The speed and duplex of
the slave device are important to a bonding driver in 802.3ad mode.

To 82599_SFP NIC and other kinds of NICs, this problem does
not exist. As such, it is necessary for X540 to report"link on" when
the link speed is not IXGBE_LINK_SPEED_UNKNOWN.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index aed8d02..ace21b9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6426,6 +6426,21 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
if (netif_carrier_ok(netdev))
return;
 
+   /* In X540 NIC, there is a time span between reporting "link on"
+* and getting the speed and duplex. To a bonding driver in 802.3ad
+* mode, this time span will make it not work well if the time span
+* is big enough. To 82599_SFP NIC and other kinds of NICs, this
+* problem does not exist. As such, it is better for X540 to report
+* "link on" when the link speed is not IXGBE_LINK_SPEED_UNKNOWN.
+* To other NICs, the link_up and link_speed are gotten at the same
+* time. To X540 NIC, there is a time span between link_up and
+* link_speed. As such, only continue if link_up and link_speed are
+* ready to X540 NIC.
+*/
+   if (hw->mac.type == ixgbe_mac_X540)
+   if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
+   return;
+
adapter->flags2 &= ~IXGBE_FLAG2_SEARCH_FOR_SFP;
 
switch (hw->mac.type) {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V4] ixgbe: synchronize the link_speed and link_up of a slave interface

2015-12-30 Thread zyjzyj2000

Hi, all

According to Rustad, Mark D, maybe it is related with copper phy. To make fiber 
phy more
robust, synchronize both the link_up and link_speed of a slave interface in 
ixgbe driver.

Best Regards!
Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] ixgbe: restrict synchronization of link_up and speed

2015-12-30 Thread zyjzyj2000
From: Zhu Yanjun 

When the X540 NIC acts as a slave of some virtual NICs, it is very
important to synchronize link_up and link_speed, such as a bonding
driver in 802.3ad mode. When X540 NIC acts as an independent interface,
it is not necessary to synchronize link_up and link_speed. That is,
the time span between link_up and link_speed is acceptable.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ace21b9..1bb6056 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6436,8 +6436,15 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
 * time. To X540 NIC, there is a time span between link_up and
 * link_speed. As such, only continue if link_up and link_speed are
 * ready to X540 NIC.
+* The time span between link_up and link_speed is very important
+* when the X540 NIC acts as a slave in some virtual NICs, such as
+* a bonding driver in 802.3ad mode. When X540 NIC acts as an
+* independent interface, it is not necessary to synchronize link_up
+* and link_speed.
+* In the end, not continue if (X540 NIC && SLAVE && link_speed UNKNOWN)
 */
-   if (hw->mac.type == ixgbe_mac_X540)
+   if ((hw->mac.type == ixgbe_mac_X540) &&
+   (netdev->flags & IFF_SLAVE))
if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
return;
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RESEND PATCH v1 0/4] Add support emac for the RK3036 SoC platform

2015-12-30 Thread Geert Uytterhoeven
Hi David,

On Wed, Dec 30, 2015 at 2:48 AM, David Miller  wrote:
> From: Heiko Stübner 
> Date: Tue, 29 Dec 2015 23:27:55 +0100
>> Am Dienstag, 29. Dezember 2015, 15:53:14 schrieb David Miller:
>>> You have to submit this series properly, the same problem happend twice
>>> now.
>>>
>>> When you submit a series you should:
>>>
>>> 1) Make it clear which tree you expect these changes to be applied
>>>to.  Here it is completely ambiguous, do you want it to go into
>>>my networking tree or some other subsystem tree?
>>>
>>> 2) You MUST keep all parties informed about all patches for a series
>>>like this.  That means you cannot drop netdev from patch #4 as
>>>you did both times.  Doing this aggravates the situation for
>>>#1 even more, because if a patch is not CC:'d to netdev it does
>>>not enter patchwork.  And if it doesn't go into patchwork, I'm
>>>not looking at it.
>>
>> I guess that is some unfortunate result of git send-email combined with
>> get_maintainer.pl . In general I also prefer to see the whole series, but 
>> have
>> gotten such partial series from other maintainers as well in the past, so it
>> seems to be depending on preferences somewhat.
>>
>> For the series at hand, the 4th patch is the devicetree addition, which the
>> expected way is me picking it up, after you are comfortable with the code-
>> related changes.
>
> Why would it not be appropriate for a DT file change to go into my tree
> if it corresponds to functionality created by the rest of the patches
> in the series?

Because the DT change is very likely to conflict with other DT changes.
That's why typically all DT changes go in through the platform/architecture
maintainer.

> It looks better to put it all together as a unit, via one series, with
> a merge commit containing your "[PATCH 0/N]" description in the commit
> message.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] udp: properly support MSG_PEEK with truncated buffers

2015-12-30 Thread Eric Dumazet
From: Eric Dumazet 

Backport of this upstream commit into stable kernels :
89c22d8c3b27 ("net: Fix skb csum races when peeking")
exposed a bug in udp stack vs MSG_PEEK support, when user provides
a buffer smaller than skb payload.

In this case,
skb_copy_and_csum_datagram_iovec(skb, sizeof(struct udphdr),
 msg->msg_iov);
returns -EFAULT.

This bug does not happen in upstream kernels since Al Viro did a great
job to replace this into :
skb_copy_and_csum_datagram_msg(skb, sizeof(struct udphdr), msg);
This variant is safe vs short buffers.

For the time being, instead reverting Herbert Xu patch and add back
skb->ip_summed invalid changes, simply store the result of
udp_lib_checksum_complete() so that we avoid computing the checksum a
second time, and avoid the problematic
skb_copy_and_csum_datagram_iovec() call.

This patch can be applied on recent kernels as it avoids a double
checksumming, then backported to stable kernels as a bug fix.

Signed-off-by: Eric Dumazet 
---
 net/ipv4/udp.c |6 --
 net/ipv6/udp.c |6 --
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8841e984f8bf..ac14ae44390d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1271,6 +1271,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int noblock,
int peeked, off = 0;
int err;
int is_udplite = IS_UDPLITE(sk);
+   bool checksum_valid = false;
bool slow;
 
if (flags & MSG_ERRQUEUE)
@@ -1296,11 +1297,12 @@ try_again:
 */
 
if (copied < ulen || UDP_SKB_CB(skb)->partial_cov) {
-   if (udp_lib_checksum_complete(skb))
+   checksum_valid = !udp_lib_checksum_complete(skb);
+   if (!checksum_valid)
goto csum_copy_err;
}
 
-   if (skb_csum_unnecessary(skb))
+   if (checksum_valid || skb_csum_unnecessary(skb))
err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
msg, copied);
else {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9da3287a3923..00775ee27d86 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -402,6 +402,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len,
int peeked, off = 0;
int err;
int is_udplite = IS_UDPLITE(sk);
+   bool checksum_valid = false;
int is_udp4;
bool slow;
 
@@ -433,11 +434,12 @@ try_again:
 */
 
if (copied < ulen || UDP_SKB_CB(skb)->partial_cov) {
-   if (udp_lib_checksum_complete(skb))
+   checksum_valid = !udp_lib_checksum_complete(skb);
+   if (!checksum_valid)
goto csum_copy_err;
}
 
-   if (skb_csum_unnecessary(skb))
+   if (checksum_valid || skb_csum_unnecessary(skb))
err = skb_copy_datagram_msg(skb, sizeof(struct udphdr),
msg, copied);
else {


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 4.4-rc4 regression, bisected to "net: fix sock_wake_async() rcu protection"

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 12:32 +0100, Nicolai Stange wrote:
> Andy Lutomirski  writes:
> 
> > On recent v4.4-rc releases, I can't run emacs.  No, really, running
> > "emacs" in a GNOME 3 session makes gnome-shell think that emacs is
> > running, but no window is drawn, and the overall system UI is a bit
> > weird when the invisible emacs window is focused.
> >
> > This is 100% reproducible.
> >
> > There might be other symptoms involving gdb malfunctioning, but those
> > are, at best, sporadic.  The emacs failure is entirely reliable.  I
> > have no idea what the underlying failure mode is, but failure to wake
> > a socket waiter seems plausible,  I also have no idea why oocalc,
> > gimp, vim, gedit, firefox, etc aren't affected.
> >
> > A somewhat unorthodox "git bisect" run blames:
> >
> > commit ceb5d58b217098a657f3850b7a2640f995032e62
> > Author: Eric Dumazet 
> > Date:   Sun Nov 29 20:03:11 2015 -0800
> >
> > net: fix sock_wake_async() rcu protection
> >
> > I've confirmed that v4.4-rc7 with that patch reverted works fine.
> >
> > Since the offending commit was apparently a security fix, simply
> > reverting it might not be the best idea.
> 
> Please have a look at https://lkml.kernel.org/g/87ege73bma@gmail.com
> 
> I ran into the same issue and this one fixes it for me.

Right, and the ozlabs pointers for this were :

v1:
https://patchwork.ozlabs.org/patch/561194/

v2:
https://patchwork.ozlabs.org/patch/561553/

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Q: bad routing table cache entries

2015-12-30 Thread Eric Dumazet
On Wed, 2015-12-30 at 15:42 +0300, Stas Sergeev wrote:
> 29.12.2015 18:22, Sowmini Varadhan пишет:
> > Do you have admin control over the ubuntu router?
> > If yes, you might want to check the shared_media [#] setting 
> > on that router for the interfaces with overlapping subnets.
> > (it is on by default, I would try turning it off).
> That didn't help, problem re-appears.
> Thanks anyway, looks like I am going to disable accept_redirects then.
> It seems buggy and obviously no one cares.

Obviously some people take vacations at this period of the year, and do
stay away from netdev traffic.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram

2015-12-30 Thread Jacob Siverskog
On Wed, Dec 30, 2015 at 2:26 PM, Eric Dumazet  wrote:
> On Wed, Dec 30, 2015 at 6:14 AM, Jacob Siverskog
>  wrote:
>
>> Ok. Thanks for your feedback. How do you believe the issue could be
>> solved? Investigating it gives:
>>
>> static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head 
>> *list)
>> {
>> struct sk_buff *next, *prev;
>>
>> list->qlen--;
>>  51c: e2433001 sub r3, r3, #1
>>  520: e58b3074 str r3, [fp, #116] ; 0x74
>> next   = skb->next;
>> prev   = skb->prev;
>>  524: e894000c ldm r4, {r2, r3}
>> skb->next  = skb->prev = NULL;
>>  528: e5841000 str r1, [r4]
>>  52c: e5841004 str r1, [r4, #4]
>> next->prev = prev;
>>  530: e5823004 str r3, [r2, #4]  <--
>> trapping instruction (r2 NULL)
>>
>> Register contents:
>> r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
>> r3 :   r2 :   r1 :   r0 : 2013
>>
>> If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.
>>
>> Should there be a check for this in __skb_try_recv_datagram?
>
> At this point corruption already happened.
> We can not possibly detect every possible corruption caused by bugs
> elsewhere in the kernel and just 'recover' at this point.
> We must indeed find the root cause and fix it, instead of trying to hide it.
>
> How often can you trigger this bug ?

Ok. I don't have a good repro to trigger it unfortunately, I've seen it just a
few times when bringing up/down network interfaces. Does the trace
give any clue?

[] (__skb_recv_datagram) from [] (udpv6_recvmsg+0x1d0/0x6d0)
[] (udpv6_recvmsg) from [] (inet_recvmsg+0x38/0x4c)
[] (inet_recvmsg) from [] (___sys_recvmsg+0x94/0x170)
[] (___sys_recvmsg) from [] (__sys_recvmsg+0x3c/0x6c)
[] (__sys_recvmsg) from [] (ret_fast_syscall+0x0/0x3c)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: Fix potential NULL pointer dereference in __skb_try_recv_datagram

2015-12-30 Thread Eric Dumazet
On Wed, Dec 30, 2015 at 6:14 AM, Jacob Siverskog
 wrote:

> Ok. Thanks for your feedback. How do you believe the issue could be
> solved? Investigating it gives:
>
> static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head 
> *list)
> {
> struct sk_buff *next, *prev;
>
> list->qlen--;
>  51c: e2433001 sub r3, r3, #1
>  520: e58b3074 str r3, [fp, #116] ; 0x74
> next   = skb->next;
> prev   = skb->prev;
>  524: e894000c ldm r4, {r2, r3}
> skb->next  = skb->prev = NULL;
>  528: e5841000 str r1, [r4]
>  52c: e5841004 str r1, [r4, #4]
> next->prev = prev;
>  530: e5823004 str r3, [r2, #4]  <--
> trapping instruction (r2 NULL)
>
> Register contents:
> r7 : c58cfe1c  r6 : c06351d0  r5 : c77810ac  r4 : c583eac0
> r3 :   r2 :   r1 :   r0 : 2013
>
> If I understand this correctly, then r4 = skb, r2 = next, r3 = prev.
>
> Should there be a check for this in __skb_try_recv_datagram?

At this point corruption already happened.
We can not possibly detect every possible corruption caused by bugs
elsewhere in the kernel and just 'recover' at this point.
We must indeed find the root cause and fix it, instead of trying to hide it.

How often can you trigger this bug ?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >