Re: [PATCH net-next] qmi_wwan: Add support for Dell Wireless 5809e 4G Modem

2015-07-21 Thread David Miller
From: Pieter Hollants pie...@hollants.com
Date: Mon, 20 Jul 2015 10:14:13 +0200

 Added the USB IDs 0x413c:0x81b1 for the Dell Wireless 5809e Gobi(TM) 4G
 LTE Mobile Broadband Card, a Dell-branded Sierra Wireless EM7305 LTE
 card in M.2 form factor, used e.g. in Dell's Latitude E7540 Notebook
 series.
 
 Signed-off-by: Pieter Hollants pie...@hollants.com

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ss -p segfaults

2015-07-21 Thread j...@openmailbox.org

Patch for 4.1.1.

Essentially all that is needed to get rid of this issue is the
addition of:

memset(u, 0, sizeof(*u));

after:

if (!(u = malloc(sizeof(*u))))
break;

Also patched some other situations (strcpy and sprintf uses) that
potentially produce the same results.
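A user-space sketch of the failure mode and the fix, assuming a simplified record type: `struct unix_node`, `node_new()` and `list_free()` are hypothetical stand-ins, not ss.c's actual definitions. If the record comes from plain malloc() and the parse step never sets `name`, the cleanup frees an indeterminate pointer. Zeroing the record (calloc() here, equivalent to malloc() plus memset()) leaves `name` NULL, and free(NULL) is a no-op.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for ss's per-socket record. */
struct unix_node {
	struct unix_node *next;
	char *name;		/* only set when a path was parsed */
	unsigned int ino;
};

/* Parse "<ino> [path]"; name stays NULL when no path is present. */
static struct unix_node *node_new(const char *line)
{
	char path[128];
	struct unix_node *u;

	/* calloc() zeroes the record, like malloc() + memset(u, 0, ...). */
	u = calloc(1, sizeof(*u));
	if (!u)
		return NULL;

	if (sscanf(line, "%u %127s", &u->ino, path) == 2)
		u->name = strdup(path);
	return u;
}

/* Free a list; no NULL check is needed because free(NULL) does nothing. */
static void list_free(struct unix_node *head)
{
	while (head) {
		struct unix_node *next = head->next;

		free(head->name);
		free(head);
		head = next;
	}
}
```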

Signed-off-by: Jose P Santos j...@openmailbox.org

--- iproute2-4.1.1/misc/ss.c.orig   2015-07-06 22:57:34.0 +0100
+++ iproute2-4.1.1/misc/ss.c   2015-07-21 10:26:45.0 +0100
@@ -456,7 +456,10 @@ static void user_ent_hash_build(void)

user_ent_hash_build_init = 1;

-   strcpy(name, root);
+   /* Avoid buffer overrun if input size from PROC_ROOT > name */
+   memset(name, 0, sizeof(name));
+   strncpy(name, root, sizeof(name)-2);
+
if (strlen(name) == 0 || name[strlen(name)-1] != '/')
strcat(name, "/");

@@ -480,7 +483,7 @@ static void user_ent_hash_build(void)
if (getpidcon(pid, &pid_context) != 0)
pid_context = strdup(no_ctx);

-   sprintf(name + nameoff, "%d/fd/", pid);
+   snprintf(name + nameoff, sizeof(name) - nameoff, "%d/fd/", pid);
pos = strlen(name);
if ((dir1 = opendir(name)) == NULL)
continue;
@@ -499,7 +502,7 @@ static void user_ent_hash_build(void)
if (sscanf(d1->d_name, "%d%c", &fd, &crap) != 1)
continue;

-   sprintf(name+pos, "%d", fd);
+   snprintf(name+pos, sizeof(name) - pos, "%d", fd);

link_len = readlink(name, lnk, sizeof(lnk)-1);
if (link_len == -1)
@@ -2722,6 +2725,11 @@ static int unix_show(struct filter *f)
if (!(u = malloc(sizeof(*u))))
break;

+   /* Zero initialization of 'u' struct avoids a segfault
+    * when freeing memory 'free(name)' at 'unix_list_free()'.
+    */
+   memset(u, 0, sizeof(*u));
+
if (sscanf(buf, "%x: %x %x %x %x %x %d %s",
   &u->rport, &u->rq, &u->wq, &flags, &u->type,
   &u->state, &u->ino, name) < 8)
@@ -3064,11 +3072,13 @@ static int netlink_show_one(struct filte
strncpy(procname, "kernel", 6);
} else if (pid > 0) {
FILE *fp;
-   sprintf(procname, "%s/%d/stat",
+   snprintf(procname, sizeof(procname), "%s/%d/stat",
getenv("PROC_ROOT") ? : "/proc", pid);
if ((fp = fopen(procname, "r")) != NULL) {
if (fscanf(fp, "%*d (%[^)])", procname) == 1) {
-   sprintf(procname+strlen(procname), "/%d", pid);
+   snprintf(procname+strlen(procname),
+            sizeof(procname)-strlen(procname),
+            "/%d", pid);
done = 1;
}
fclose(fp);
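The sprintf() to snprintf() conversions bound each write by the space remaining after the current offset. A minimal sketch of that pattern, using a hypothetical helper `build_fd_path()` modelled on the `<root>/<pid>/fd/<fd>` path ss builds. Unlike the patch, it also checks snprintf()'s return value (the length the untruncated string would have had), so a truncated path is rejected instead of being handed to opendir() or readlink().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<root>/<pid>/fd/<fd>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int build_fd_path(char *buf, size_t size, const char *root,
			 int pid, int fd)
{
	size_t off;
	int n;

	n = snprintf(buf, size, "%s/%d/fd/", root, pid);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* truncated: do not append to a partial path */
	off = (size_t)n;

	/* Bound the second write by the space left after 'off'. */
	n = snprintf(buf + off, size - off, "%d", fd);
	if (n < 0 || (size_t)n >= size - off)
		return -1;
	return 0;
}
```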



On 2015-07-20 20:09, Stephen Hemminger wrote:
 Patches are always appreciated and this looks like a real bug.
 But before I can accept it there are a couple of small
 changes needed.
 
 1. There is no need to check for NULL when calling free().
Glibc free is documented to accept NULL as a valid request
and do nothing.
 
 2. Please add a Signed-off-by: line with a real name.
Signed-off-by has legal meaning for the Developer's Certificate of Origin;
see the kernel documentation if you need more explanation.
 
 3. Although what you found is important, giving a full paragraph
of personal comment about it is not required. The point is software
should read like one source independent of who the authors are.
Your comment is basically just justifying using strncpy.
 
 4. Rather than strncpy(), which has issues with maximal-sized strings,
consider using strlcpy() instead.
 
 5. Iproute2 uses kernel indentation and style; consider running checkpatch
on your changes.
 
 Please fixup and resubmit to netdev.
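Point 4 is about return semantics: strlcpy() always NUL-terminates (for a non-zero size) and returns strlen(src), so callers can detect truncation, while strncpy() leaves the destination unterminated when the source fills it. glibc did not ship strlcpy() at the time, so a project would carry its own helper. A minimal sketch, named `bounded_strcpy()` (hypothetical) to avoid clashing with any libc declaration:

```c
#include <assert.h>
#include <string.h>

/*
 * strlcpy()-style copy: copies at most size - 1 bytes, always
 * NUL-terminates when size > 0, and returns strlen(src), so a
 * return value >= size signals truncation.
 */
static size_t bounded_strcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

For the `name` buffer in user_ent_hash_build(), passing `sizeof(name) - 1` as the size would keep one byte free for the `/` that strcat() appends afterwards.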
 
 




[PATCH 1/1] drivers: net: cpsw: remove tx event processing in rx napi poll

2015-07-21 Thread Mugunthan V N
Since commit c03abd84634d ("net: ethernet: cpsw: don't requests IRQs
we don't use"), the common ISR and NAPI handler have been split into a
separate TX ISR and an RX ISR/NAPI, but TX events are still handled in
the RX NAPI poll. Remove the TX event handling from the RX NAPI poll.

Signed-off-by: Mugunthan V N mugunthan...@ti.com
---
 drivers/net/ethernet/ti/cpsw.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index f335bf1..d155bf2 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -793,9 +793,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id)
 static int cpsw_poll(struct napi_struct *napi, int budget)
 {
struct cpsw_priv *priv = napi_to_priv(napi);
-   int num_tx, num_rx;
-
-   num_tx = cpdma_chan_process(priv->txch, 128);
+   int num_rx;
 
num_rx = cpdma_chan_process(priv->rxch, budget);
if (num_rx < budget) {
@@ -810,9 +808,8 @@ static int cpsw_poll(struct napi_struct *napi, int budget)
}
}
 
-   if (num_rx || num_tx)
-   cpsw_dbg(priv, intr, "poll %d rx, %d tx pkts\n",
-            num_rx, num_tx);
+   if (num_rx)
+   cpsw_dbg(priv, intr, "poll %d rx pkts\n", num_rx);
 
return num_rx;
 }
-- 
2.5.0.rc2.13.g961abca
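With TX completions left to the separate TX interrupt path, the poll routine only has to honour the NAPI budget for RX. A user-space model of that contract, with hypothetical types (`struct rx_ring`, `rx_process()`, `rx_only_poll()`) standing in for the cpdma channel and the NAPI calls: process at most `budget` packets, and only when fewer than `budget` were processed mark polling complete and re-enable the RX interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an RX ring with 'pending' packets queued. */
struct rx_ring {
	int pending;
	bool rx_irq_enabled;
	bool napi_done;
};

/* Consume up to 'quota' packets; returns how many were processed. */
static int rx_process(struct rx_ring *r, int quota)
{
	int n = r->pending < quota ? r->pending : quota;

	r->pending -= n;
	return n;
}

/* RX-only poll, shaped like cpsw_poll() after the patch. */
static int rx_only_poll(struct rx_ring *r, int budget)
{
	int num_rx = rx_process(r, budget);

	if (num_rx < budget) {
		/* Ring drained within budget: stop polling, unmask RX IRQ. */
		r->napi_done = true;
		r->rx_irq_enabled = true;
	}
	/* No TX completion work here; the TX interrupt path owns it. */
	return num_rx;
}
```

Returning the full budget while work remains is what tells the NAPI core to poll again instead of waiting for the next interrupt.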



[PATCH net 1/1] tipc: fix compatibility bug

2015-07-21 Thread Jon Maloy
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6
(tipc: reduce locking scope during packet reception) we introduced
a new function tipc_link_proto_rcv(). This function contains a bug
which sometimes causes it to erroneously send out a non-zero link
priority value in the protocol messages it creates.

The bug may lead to an extra link reset at initial link establishment
with older nodes. This will never happen more than once, whereafter
the link will work as intended.

We fix this bug in this commit.

Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 55b675d..b63d573 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1639,7 +1639,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
rcvgap = peers_snd_nxt - l->rcv_nxt;
if (rcvgap || (msg_probe(hdr)))
tipc_link_build_proto_msg(l, STATE_MSG, 0, rcvgap,
- 0, l->mtu, xmitq);
+ 0, 0, xmitq);
tipc_link_release_pkts(l, msg_ack(hdr));
 
/* If NACK, retransmit will now start at right position */
-- 
1.9.1



[PATCH net-next 1/1] tipc: fix compatibility bug

2015-07-21 Thread Jon Maloy
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6
(tipc: reduce locking scope during packet reception) we introduced
a new function tipc_link_proto_rcv(). This function contains a bug
which sometimes causes it to erroneously send out a non-zero link
priority value in the protocol messages it creates.

The bug may lead to an extra link reset at initial link establishment
with older nodes. This will never happen more than once, whereafter
the link will work as intended.

We fix this bug in this commit.

Signed-off-by: Jon Maloy jon.ma...@ericsson.com
---
 net/tipc/link.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 55b675d..b63d573 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1639,7 +1639,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
rcvgap = peers_snd_nxt - l->rcv_nxt;
if (rcvgap || (msg_probe(hdr)))
tipc_link_build_proto_msg(l, STATE_MSG, 0, rcvgap,
- 0, l->mtu, xmitq);
+ 0, 0, xmitq);
tipc_link_release_pkts(l, msg_ack(hdr));
 
/* If NACK, retransmit will now start at right position */
-- 
1.9.1



[PATCH V3 1/7] Drivers: hv: vmbus: define the new offer type for Hyper-V socket (hvsock)

2015-07-21 Thread Dexuan Cui
A helper function is also added.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 include/linux/hyperv.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 30d3a1f..2ca3ac1 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -236,6 +236,7 @@ struct vmbus_channel_offer {
 #define VMBUS_CHANNEL_LOOPBACK_OFFER   0x100
 #define VMBUS_CHANNEL_PARENT_OFFER 0x200
 #define VMBUS_CHANNEL_REQUEST_MONITORED_NOTIFICATION   0x400
+#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER 0x2000
 
 struct vmpacket_descriptor {
u16 type;
@@ -758,6 +759,12 @@ struct vmbus_channel {
struct list_head percpu_list;
 };
 
+static inline bool is_hvsock_channel(const struct vmbus_channel *c)
+{
+   return !!(c->offermsg.offer.chn_flags &
+ VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
+}
+
 static inline void set_channel_read_state(struct vmbus_channel *c, bool state)
 {
c->batched_reading = state;
-- 
2.1.0
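is_hvsock_channel() is a plain bit test on the offer flags; the `!!` collapses the masked value (0x2000 when set) to 1. A user-space sketch of the same check, with a cut-down hypothetical struct in place of `struct vmbus_channel` and `chn_flags` assumed to be 16 bits wide:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMBUS_CHANNEL_PARENT_OFFER		0x200
#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER	0x2000

/* Hypothetical stand-in: only the offer flags matter here. */
struct fake_offer {
	uint16_t chn_flags;
};

static bool offer_is_hvsock(const struct fake_offer *o)
{
	/* Keep only the hvsock bit; !! turns 0x2000 into 1. */
	return !!(o->chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
}
```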



[PATCH V3 3/7] Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the r/w-ability

2015-07-21 Thread Dexuan Cui
This will be used by the coming net/hvsock driver.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/channel.c  | 133 ++
 drivers/hv/hyperv_vmbus.h |   4 ++
 drivers/hv/ring_buffer.c  |  14 +
 include/linux/hyperv.h|  32 +++
 4 files changed, 183 insertions(+)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index b09d1b7..ffdef03 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -758,6 +758,53 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);
 
 /*
+ * vmbus_sendpacket_hvsock - Send the hvsock payload 'buf' into the vmbus
+ * ringbuffer
+ */
+int vmbus_sendpacket_hvsock(struct vmbus_channel *channel, void *buf, u32 len)
+{
+   struct vmpipe_proto_header pipe_hdr;
+   struct vmpacket_descriptor desc;
+   struct kvec bufferlist[4];
+   u32 packetlen_aligned;
+   u32 packetlen;
+   u64 aligned_data = 0;
+   bool signal = false;
+   int ret;
+
+   packetlen = HVSOCK_HEADER_LEN + len;
+   packetlen_aligned = ALIGN(packetlen, sizeof(u64));
+
+   /* Setup the descriptor */
+   desc.type = VM_PKT_DATA_INBAND;
+   /* in 8-bytes granularity */
+   desc.offset8 = sizeof(struct vmpacket_descriptor) >> 3;
+   desc.len8 = (u16)(packetlen_aligned >> 3);
+   desc.flags = 0;
+   desc.trans_id = 0;
+
+   pipe_hdr.pkt_type = 1;
+   pipe_hdr.data_size = len;
+
+   bufferlist[0].iov_base = &desc;
+   bufferlist[0].iov_len  = sizeof(struct vmpacket_descriptor);
+   bufferlist[1].iov_base = &pipe_hdr;
+   bufferlist[1].iov_len  = sizeof(struct vmpipe_proto_header);
+   bufferlist[2].iov_base = buf;
+   bufferlist[2].iov_len  = len;
+   bufferlist[3].iov_base = &aligned_data;
+   bufferlist[3].iov_len  = packetlen_aligned - packetlen;
+
+   ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 4, &signal);
+
+   if (ret == 0 && signal)
+   vmbus_setevent(channel);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vmbus_sendpacket_hvsock);
+
+/*
  * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
  * packets using a GPADL Direct packet type.
  */
@@ -978,3 +1025,89 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, void *buffer,
return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
+
+/*
+ * vmbus_recvpacket_hvsock - Receive the hvsock payload from the vmbus
+ * ringbuffer into the 'buffer'.
+ */
+int vmbus_recvpacket_hvsock(struct vmbus_channel *channel, void *buffer,
+   u32 bufferlen, u32 *buffer_actual_len)
+{
+   struct vmpipe_proto_header *pipe_hdr;
+   struct vmpacket_descriptor *desc;
+   u32 packet_len, payload_len;
+   bool signal = false;
+   int ret;
+
+   *buffer_actual_len = 0;
+
+   if (bufferlen < HVSOCK_HEADER_LEN)
+   return -ENOBUFS;
+
+   ret = hv_ringbuffer_peek(&channel->inbound, buffer,
+HVSOCK_HEADER_LEN);
+   if (ret != 0)
+   return 0;
+
+   desc = (struct vmpacket_descriptor *)buffer;
+   packet_len = desc->len8 << 3;
+   if (desc->type != VM_PKT_DATA_INBAND ||
+   desc->offset8 != (sizeof(*desc) / 8) ||
+   packet_len < HVSOCK_HEADER_LEN)
+   return -EIO;
+
+   pipe_hdr = (struct vmpipe_proto_header *)(desc + 1);
+   payload_len = pipe_hdr->data_size;
+
+   if (pipe_hdr->pkt_type != 1 || payload_len == 0)
+   return -EIO;
+
+   if (HVSOCK_PKT_LEN(payload_len) != packet_len + PREV_INDICES_LEN)
+   return -EIO;
+
+   if (bufferlen < packet_len - HVSOCK_HEADER_LEN)
+   return -ENOBUFS;
+
+   /* Copy over the hvsock payload to the user buffer */
+   ret = hv_ringbuffer_read(&channel->inbound, buffer,
+packet_len - HVSOCK_HEADER_LEN,
+HVSOCK_HEADER_LEN, &signal);
+   if (ret != 0)
+   return ret;
+
+   *buffer_actual_len = payload_len;
+
+   if (signal)
+   vmbus_setevent(channel);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vmbus_recvpacket_hvsock);
+
+/*
+ * vmbus_get_hvsock_rw_status - can the ringbuffer be read/written?
+ */
+void vmbus_get_hvsock_rw_status(struct vmbus_channel *channel,
+   bool *can_read, bool *can_write)
+{
+   u32 avl_read_bytes, avl_write_bytes, dummy;
+
+   if (can_read != NULL) {
+   hv_get_ringbuffer_available_space(&channel->inbound,
+ &avl_read_bytes,
+ &dummy);
+   *can_read = avl_read_bytes >= HVSOCK_MIN_PKT_LEN;
+   }
+
+   /* We write into the ringbuffer only when we're able to write a
+* a payload of 4096 bytes (the actual written payload's length may be
+* less than 4096).
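vmbus_sendpacket_hvsock() frames each payload as a packet descriptor, a pipe header, the payload and zero padding up to the next 8-byte boundary, with offset8/len8 counted in 8-byte units; vmbus_recvpacket_hvsock() checks those fields before reading. The arithmetic can be checked in isolation. This sketch assumes HVSOCK_HEADER_LEN is the descriptor plus the pipe header (16 + 8 bytes), which the excerpt above does not show, and uses simplified copies of the two headers rather than the kernel definitions; `hvsock_frame()` and `hvsock_desc_ok()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified copies of the on-ring headers (assumed layouts). */
struct pkt_desc {
	uint16_t type;
	uint16_t offset8;	/* descriptor size in 8-byte units */
	uint16_t len8;		/* total packet size in 8-byte units */
	uint16_t flags;
	uint64_t trans_id;
};

struct pipe_hdr {
	uint32_t pkt_type;
	uint32_t data_size;	/* unpadded payload length */
};

#define VM_PKT_DATA_INBAND 6	/* assumed value, for illustration */
#define HDR_LEN (sizeof(struct pkt_desc) + sizeof(struct pipe_hdr))
#define ALIGN8(x) (((x) + 7u) & ~(uint32_t)7u)

/* Fill both headers for a 'len'-byte payload; returns the pad bytes. */
static uint32_t hvsock_frame(struct pkt_desc *d, struct pipe_hdr *p,
			     uint32_t len)
{
	uint32_t packetlen = (uint32_t)HDR_LEN + len;
	uint32_t aligned = ALIGN8(packetlen);

	d->type = VM_PKT_DATA_INBAND;
	d->offset8 = sizeof(struct pkt_desc) >> 3;
	d->len8 = (uint16_t)(aligned >> 3);
	d->flags = 0;
	d->trans_id = 0;
	p->pkt_type = 1;
	p->data_size = len;
	return aligned - packetlen;
}

/* Receive-side sanity checks on a peeked descriptor. */
static int hvsock_desc_ok(const struct pkt_desc *d)
{
	uint32_t packet_len = (uint32_t)d->len8 << 3;

	return d->type == VM_PKT_DATA_INBAND &&
	       d->offset8 == sizeof(*d) / 8 &&
	       packet_len >= HDR_LEN;
}
```

For a 1-byte payload the packet is 25 bytes, padded to 32 (len8 = 4, 7 pad bytes); an 8-byte payload fills exactly 32 bytes with no padding.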
+

Re: Segmentation fault in iproute2 ss -p (versions 4.0.0, 4.1.0 and 4.1.1)

2015-07-21 Thread j...@openmailbox.org
Patch for 4.1.1.

Essentially all that is needed to get rid of this issue is the
addition of:

memset(u, 0, sizeof(*u));

after:

if (!(u = malloc(sizeof(*u))))
break;

Also patched some other situations (strcpy and sprintf uses) that
potentially produce the same results.

Note: As far as I know strlcpy isn't (yet) available in glibc.

Signed-off-by: Jose P Santos j...@openmailbox.org

--- iproute2-4.1.1/misc/ss.c.orig   2015-07-06 22:57:34.0 +0100
+++ iproute2-4.1.1/misc/ss.c   2015-07-21 10:26:45.0 +0100
@@ -456,7 +456,10 @@ static void user_ent_hash_build(void)

user_ent_hash_build_init = 1;

-   strcpy(name, root);
+   /* Avoid buffer overrun if input size from PROC_ROOT > name */
+   memset(name, 0, sizeof(name));
+   strncpy(name, root, sizeof(name)-2);
+
if (strlen(name) == 0 || name[strlen(name)-1] != '/')
strcat(name, "/");

@@ -480,7 +483,7 @@ static void user_ent_hash_build(void)
if (getpidcon(pid, &pid_context) != 0)
pid_context = strdup(no_ctx);

-   sprintf(name + nameoff, "%d/fd/", pid);
+   snprintf(name + nameoff, sizeof(name) - nameoff, "%d/fd/", pid);
pos = strlen(name);
if ((dir1 = opendir(name)) == NULL)
continue;
@@ -499,7 +502,7 @@ static void user_ent_hash_build(void)
if (sscanf(d1->d_name, "%d%c", &fd, &crap) != 1)
continue;

-   sprintf(name+pos, "%d", fd);
+   snprintf(name+pos, sizeof(name) - pos, "%d", fd);

link_len = readlink(name, lnk, sizeof(lnk)-1);
if (link_len == -1)
@@ -2722,6 +2725,11 @@ static int unix_show(struct filter *f)
if (!(u = malloc(sizeof(*u))))
break;

+   /* Zero initialization of 'u' struct avoids a segfault
+    * when freeing memory 'free(name)' at 'unix_list_free()'.
+    */
+   memset(u, 0, sizeof(*u));
+
if (sscanf(buf, "%x: %x %x %x %x %x %d %s",
   &u->rport, &u->rq, &u->wq, &flags, &u->type,
   &u->state, &u->ino, name) < 8)
@@ -3064,11 +3072,13 @@ static int netlink_show_one(struct filte
strncpy(procname, "kernel", 6);
} else if (pid > 0) {
FILE *fp;
-   sprintf(procname, "%s/%d/stat",
+   snprintf(procname, sizeof(procname), "%s/%d/stat",
getenv("PROC_ROOT") ? : "/proc", pid);
if ((fp = fopen(procname, "r")) != NULL) {
if (fscanf(fp, "%*d (%[^)])", procname) == 1) {
-   sprintf(procname+strlen(procname), "/%d", pid);
+   snprintf(procname+strlen(procname),
+            sizeof(procname)-strlen(procname),
+            "/%d", pid);
done = 1;
}
fclose(fp);



On 2015-07-20 20:09, Stephen Hemminger wrote:
 Patches are always appreciated and this looks like a real bug.
 But before I can accept it there are a couple of small
 changes needed.
 
 1. There is no need to check for NULL when calling free().
Glibc free is documented to accept NULL as a valid request
and do nothing.
 
 2. Please add a Signed-off-by: line with a real name.
Signed-off-by has legal meaning for the Developer's Certificate of Origin;
see the kernel documentation if you need more explanation.
 
 3. Although what you found is important, giving a full paragraph
of personal comment about it is not required. The point is software
should read like one source independent of who the authors are.
Your comment is basically just justifying using strncpy.
 
 4. Rather than strncpy(), which has issues with maximal-sized strings,
consider using strlcpy() instead.
 
 5. Iproute2 uses kernel indentation and style; consider running checkpatch
on your changes.
 
 Please fixup and resubmit to netdev.
 
 


[PATCH V3 2/7] Drivers: hv: vmbus: define a new VMBus message type for hvsock

2015-07-21 Thread Dexuan Cui
A function to send this type of message is also added.

The coming net/hvsock driver will use this function to proactively request
the host to offer a VMBus channel for a new hvsock connection.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/channel.c      | 15 +++++++++++++++
 drivers/hv/channel_mgmt.c |  4 ++++
 include/linux/hyperv.h    | 13 +++++++++++++
 3 files changed, 32 insertions(+)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 603ce97..b09d1b7 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -218,6 +218,21 @@ error0:
 }
 EXPORT_SYMBOL_GPL(vmbus_open);
 
+/* Used for Hyper-V Socket: a guest client's connect() to the host */
+int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
+ const uuid_le *shv_host_servie_id)
+{
+   struct vmbus_channel_tl_connect_request conn_msg;
+
+   memset(&conn_msg, 0, sizeof(conn_msg));
+   conn_msg.header.msgtype = CHANNELMSG_TL_CONNECT_REQUEST;
+   conn_msg.guest_endpoint_id = *shv_guest_servie_id;
+   conn_msg.host_service_id = *shv_host_servie_id;
+
+   return vmbus_post_msg(&conn_msg, sizeof(conn_msg));
+}
+EXPORT_SYMBOL_GPL(vmbus_send_tl_connect_request);
+
 /*
  * create_gpadl_header - Creates a gpadl for the specified buffer
  */
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 4506a66..7018c53 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -772,6 +772,10 @@ struct vmbus_channel_message_table_entry
{CHANNELMSG_VERSION_RESPONSE,   1, vmbus_onversion_response},
{CHANNELMSG_UNLOAD, 0, NULL},
{CHANNELMSG_UNLOAD_RESPONSE,1, vmbus_unload_response},
+   {CHANNELMSG_18, 0, NULL},
+   {CHANNELMSG_19, 0, NULL},
+   {CHANNELMSG_20, 0, NULL},
+   {CHANNELMSG_TL_CONNECT_REQUEST, 0, NULL},
 };
 
 /*
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2ca3ac1..264093a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -393,6 +393,10 @@ enum vmbus_channel_message_type {
CHANNELMSG_VERSION_RESPONSE = 15,
CHANNELMSG_UNLOAD   = 16,
CHANNELMSG_UNLOAD_RESPONSE  = 17,
+   CHANNELMSG_18   = 18,
+   CHANNELMSG_19   = 19,
+   CHANNELMSG_20   = 20,
+   CHANNELMSG_TL_CONNECT_REQUEST   = 21,
CHANNELMSG_COUNT
 };
 
@@ -563,6 +567,13 @@ struct vmbus_channel_initiate_contact {
u64 monitor_page2;
 } __packed;
 
+/* Hyper-V socket: guest's connect()-ing to host */
+struct vmbus_channel_tl_connect_request {
+   struct vmbus_channel_message_header header;
+   uuid_le guest_endpoint_id;
+   uuid_le host_service_id;
+} __packed;
+
 struct vmbus_channel_version_response {
struct vmbus_channel_message_header header;
u8 version_supported;
@@ -1248,4 +1259,6 @@ extern struct resource hyperv_mmio;
 
 extern __u32 vmbus_proto_version;
 
+int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
+ const uuid_le *shv_host_servie_id);
 #endif /* _HYPERV_H */
-- 
2.1.0
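The connect request is a fixed-size, packed message: the common channel-message header followed by two 128-bit GUIDs. A user-space sketch of building it, assuming the header is a 32-bit message type plus 32-bit padding (as `struct vmbus_channel_message_header` is in hyperv.h of this era) and using a plain 16-byte array in place of `uuid_le`. `build_tl_connect()` is hypothetical, and `__attribute__((packed))` is the GCC/Clang spelling behind the kernel's `__packed`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHANNELMSG_TL_CONNECT_REQUEST 21

struct guid { uint8_t b[16]; };		/* stand-in for uuid_le */

struct msg_header {
	uint32_t msgtype;
	uint32_t padding;
} __attribute__((packed));

struct tl_connect_request {
	struct msg_header header;
	struct guid guest_endpoint_id;
	struct guid host_service_id;
} __attribute__((packed));

static void build_tl_connect(struct tl_connect_request *msg,
			     const struct guid *guest_ep,
			     const struct guid *host_svc)
{
	/* Zero first so padding and unused bytes never carry stale data. */
	memset(msg, 0, sizeof(*msg));
	msg->header.msgtype = CHANNELMSG_TL_CONNECT_REQUEST;
	msg->guest_endpoint_id = *guest_ep;
	msg->host_service_id = *host_svc;
}
```

The CHANNELMSG_18 to CHANNELMSG_20 placeholders in the patch appear to keep channel_message_table dense, so that it can still be indexed directly by message type once type 21 is added.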



[PATCH V3 0/7] introduce Hyper-V VM Sockets(hvsock)

2015-07-21 Thread Dexuan Cui

Changes since v1:
- updated [PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: default m -> default m if HYPERV
- MODULE_LICENSE: Dual MIT/GPL -> Dual BSD/GPL

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Hyper-V VM Sockets (hvsock) is a byte-stream based communication mechanism
between a Windows 10 (or later) host and a guest. It's a kind of TCP over
VMBus, but the transport layer (VMBus) is much simpler than IP.
With Hyper-V VM Sockets, applications on the host and in a guest can
talk to each other directly using the traditional BSD-style socket APIs.

The patchset implements the support on the guest side by adding the
necessary new APIs to the vmbus driver and introducing a new driver,
hv_sock.ko, which implements a new socket address family, AF_HYPERV.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware's VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://thread.gmane.org/gmane.linux.network/365205.

However, though Hyper-V VM Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transport layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is <GUID VM_ID, GUID ServiceID>. Here a GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transport ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.
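Point 1 is easiest to see as data layout. A sketch of the two endpoint shapes: the AF_VSOCK pair mirrors the `svm_cid`/`svm_port` fields of `struct sockaddr_vm` (both 32-bit), while `struct hvsock_endpoint` is a hypothetical struct holding two 128-bit GUIDs, since this cover letter does not show the patchset's actual sockaddr layout. The GUID pair needs 32 bytes and is compared as opaque bytes rather than as integers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* AF_VSOCK-style endpoint: 32-bit context ID + 32-bit port. */
struct vsock_endpoint {
	uint32_t cid;
	uint32_t port;
};

/* Hypothetical AF_HYPERV-style endpoint: two 128-bit GUIDs. */
struct hv_guid { uint8_t b[16]; };

struct hvsock_endpoint {
	struct hv_guid vm_id;
	struct hv_guid service_id;
};

static int hvsock_endpoint_equal(const struct hvsock_endpoint *a,
				 const struct hvsock_endpoint *b)
{
	/* GUIDs are opaque 16-byte values: compare them bytewise. */
	return memcmp(&a->vm_id, &b->vm_id, sizeof(a->vm_id)) == 0 &&
	       memcmp(&a->service_id, &b->service_id,
		      sizeof(a->service_id)) == 0;
}
```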

Please review the patchset.

Looking forward to your comments!

Dexuan Cui (7):
  Drivers: hv: vmbus: define the new offer type for Hyper-V socket
(hvsock)
  Drivers: hv: vmbus: define a new VMBus message type for hvsock
  Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the
r/w-ability
  Drivers: hv: vmbus: add APIs to register callbacks to process hvsock
connection
  Drivers: hv: vmbus: add a helper function to set a channel's pending
send size
  hvsock: introduce Hyper-V VM Sockets feature
  Drivers: hv: vmbus: disable local interrupt when hvsock's callback is
running

 MAINTAINERS   |2 +
 drivers/hv/Makefile   |4 +-
 drivers/hv/channel.c  |  148 +
 drivers/hv/channel_mgmt.c |   13 +
 drivers/hv/connection.c   |   15 +-
 drivers/hv/hvsock_callbacks.c |   71 ++
 drivers/hv/hyperv_vmbus.h |4 +
 drivers/hv/ring_buffer.c  |   14 +
 include/linux/hyperv.h|   68 ++
 include/linux/socket.h|4 +-
 include/net/af_hvsock.h   |   44 ++
 include/uapi/linux/hyperv.h   |   16 +
 net/Kconfig   |1 +
 net/Makefile  |1 +
 net/hv_sock/Kconfig   |   10 +
 net/hv_sock/Makefile  |3 +
 net/hv_sock/af_hvsock.c   | 1430 +
 17 files changed, 1845 insertions(+), 3 deletions(-)
 create mode 100644 drivers/hv/hvsock_callbacks.c
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0



Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Jamal Hadi Salim

On 07/20/15 15:40, Alex Gartrell wrote:

We have an application that invokes tc to delete the root every time the
config changes. As a result we stress the cleanup code and were seeing the
following panic:

   crash> bt
   PID: 630839  TASK: 8823c990d280  CPU: 14  COMMAND: tc
[... snip ...]
#8 [8820ceec17a0] page_fault at 8160a8c2
   [exception RIP: htb_qlen_notify+24]
   RIP: a0841718  RSP: 8820ceec1858  RFLAGS: 00010282
   RAX:   RBX:   RCX: 88241747b400
   RDX: 88241747b408  RSI:   RDI: 8811fb27d000
   RBP: 8820ceec1868   R8: 88120cdeff24   R9: 88120cdeff30
   R10: 0bd4  R11: a0840919  R12: a0843340
   R13:   R14: 0001  R15: 8808dae5c2e8
   ORIG_RAX:   CS: 0010  SS: 0018
#9 [...] qdisc_tree_decrease_qlen at 81565375
   #10 [...] fq_codel_dequeue at a084e0a0 [sch_fq_codel]
   #11 [...] fq_codel_reset at a084e2f8 [sch_fq_codel]
   #12 [...] qdisc_destroy at 81560d2d
   #13 [...] htb_destroy_class at a08408f8 [sch_htb]
   #14 [...] htb_put at a084095c [sch_htb]
   #15 [...] tc_ctl_tclass at 815645a3
   #16 [...] rtnetlink_rcv_msg at 81552cb0
   [... snip ...]

To my understanding, the following situation is taking place.

   tc_ctl_tclass
   - htb_delete
     - class is deleted from clhash
   - htb_put
     - qdisc_destroy
       - fq_codel_reset

=> this part looks suspicious. Why is reset invoking
a dequeue? Shouldn't a destroy just purge the queue?

         - fq_codel_dequeue
           - qdisc_tree_decrease_qlen
             - cl = htb_get # returns NULL, removed in htb_delete
             - htb_qlen_notify(sch, NULL) # BOOM


It is worrisome to fix the core code for this. The root cause seems to
be codel. Don't have time, but in general, reset would be something like:

struct fq_codel_sched_data *q = qdisc_priv(sch);
qdisc_reset(q)

or something along those lines...
But certainly dequeue semantics don't seem right there..

cheers,
jamal





Re: Why return E2BIG from bpf map update?

2015-07-21 Thread Alex Gartrell
On Tue, Jul 21, 2015 at 2:40 AM, Daniel Borkmann dan...@iogearbox.net wrote:
 On 07/21/2015 12:24 AM, Alexei Starovoitov wrote:

 On 7/20/15 3:15 PM, Alex Gartrell wrote:

 The ship has probably sailed on this one, but it seems like ENOSPC
 makes more sense than E2BIG.  Any chance of changing it so that poor
 ebpf library maintainers in the future don't have to wonder how their
 argument list got too big?


 sorry, too late.
 It's in tests and even documented in the bpf manpage:
 E2BIG - indicates that the number of elements in the map reached the
 max_entries limit specified at map creation time.
 I read E2BIG as too big and not as argument list is too long :)


 If some libraries do an strerror(3) on errno then it certainly sounds
 a bit weird, no space left on device perhaps also a bit misleading.
 The bpf(2) manpage was actually submitted/discussed longer time ago,
 but I still didn't see it in Michael's tree yet, will ping him again.

I was just being whiny :)

The options really are
ENOMEM -- really should mean allocation failed
E2BIG -- really should mean argument list too big
ENOSPC -- really should mean that a physical device is full

I suppose there's also
EDQUOT -- Disk Quota Exceeded

And if you're really stretching
EXFULL - Exchange Full

but I've never seen either of the last two in my life.

So clearly this is all very subjective and people who complain about
it on mailing lists (me) are the worst.

The only complaint I'd have though is E2BIG actually does mean that
your bpf_attr argument is too big as well, so that part confused me
for a couple of minutes.  But, the EINVAL errno has similarly been
abused to death so I don't think it's that big of a deal.

/bikeshedding
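The practical issue raised here is that strerror(E2BIG) reads "Argument list too long", while for a map update it means the map already holds max_entries elements (and, at the syscall level, E2BIG can also mean the bpf_attr passed in was too large). A library can translate the errno in context instead. A minimal sketch with a hypothetical `map_update_strerror()` helper; it does not issue the bpf(2) call itself:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: describe an errno returned by a BPF map update
 * without relying on the generic strerror() wording for E2BIG.
 */
static const char *map_update_strerror(int err)
{
	switch (err) {
	case E2BIG:
		/* Both meanings discussed in this thread. */
		return "map is full (max_entries reached) or bpf_attr too large";
	default:
		return strerror(err);
	}
}
```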

-- 
Alex Gartrell agartr...@fb.com


Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 06:04 -0400, Jamal Hadi Salim wrote:

 It is worrisome to fix the core code for this. The root cause seems to
 be codel. Dont have time but in general, reset would be something like:
 
 struct fq_codel_sched_data *q = qdisc_priv(sch);
 qdisc_reset(q)

This only works for very simple qdisc with one queue.

 
 or something along those lines...
 But certainly dequeue semantics dont seem right there..

Well, reset() is trivial to implement like this

while ((skb = local_dequeue(sch)) != NULL) {
kfree_skb(skb);
}

And I guess I copy/pasted sfq code here, because I was lazy.

But yes, qdisc_tree_decrease_qlen() must not be called.

It seems I coded fq_reset() differently.

Alex, please try this instead:

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 21ca33c9f036..3f0320ab6029 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -288,10 +288,21 @@ begin:
 
 static void fq_codel_reset(struct Qdisc *sch)
 {
-   struct sk_buff *skb;
+   struct fq_codel_sched_data *q = qdisc_priv(sch);
+   int i;
 
-   while ((skb = fq_codel_dequeue(sch)) != NULL)
-   kfree_skb(skb);
+   INIT_LIST_HEAD(&q->new_flows);
+   INIT_LIST_HEAD(&q->old_flows);
+   for (i = 0; i < q->flows_cnt; i++) {
+   struct fq_codel_flow *flow = q->flows + i;
+
+   while (flow->head)
+   kfree_skb(dequeue_head(flow));
+
+   INIT_LIST_HEAD(&flow->flowchain);
+   }
+   memset(q->backlogs, 0, q->flows_cnt * sizeof(u32));
+   sch->q.qlen = 0;
 }
 
 static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
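The proposed fq_codel_reset() purges every flow directly instead of looping over fq_codel_dequeue(), which, per the backtrace earlier in this thread, can end up in qdisc_tree_decrease_qlen() and the parent class's qlen_notify. A user-space model of that bookkeeping, with hypothetical types (`struct pkt`, `struct flow`, `struct fq_model`) in place of sk_buff and the fq_codel structures; it omits the new/old flow lists and only shows freeing every queued packet, zeroing the per-flow backlogs and zeroing qlen without notifying any parent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for sk_buff and an fq_codel flow. */
struct pkt {
	struct pkt *next;
	unsigned int len;
};

struct flow {
	struct pkt *head;
	struct pkt *tail;
};

struct fq_model {
	struct flow *flows;
	unsigned int *backlogs;	/* bytes queued per flow */
	unsigned int flows_cnt;
	unsigned int qlen;	/* packets queued in total */
};

static struct pkt *dequeue_head(struct flow *f)
{
	struct pkt *p = f->head;

	f->head = p->next;
	if (!f->head)
		f->tail = NULL;
	return p;
}

/* Purge all flows; no dequeue path, so nothing upstream is notified. */
static void fq_model_reset(struct fq_model *q)
{
	unsigned int i;

	for (i = 0; i < q->flows_cnt; i++) {
		struct flow *f = &q->flows[i];

		while (f->head)
			free(dequeue_head(f));
	}
	memset(q->backlogs, 0, q->flows_cnt * sizeof(*q->backlogs));
	q->qlen = 0;
}

/* Test helper: append a packet of 'len' bytes to flow 'idx'. */
static void enqueue(struct fq_model *q, unsigned int idx, unsigned int len)
{
	struct pkt *p = calloc(1, sizeof(*p));
	struct flow *f = &q->flows[idx];

	assert(p != NULL);
	p->len = len;
	if (f->tail)
		f->tail->next = p;
	else
		f->head = p;
	f->tail = p;
	q->backlogs[idx] += len;
	q->qlen++;
}
```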






[PATCH net v2] macvtap: fix network header pointer for VLAN tagged pkts

2015-07-21 Thread Ivan Vecera
The network header is set at offset ETH_HLEN, but that is wrong for VLAN
(single- or multiple-)tagged packets and results in checksum issues in lower devices.

v2: leave skb->protocol untouched (thx Vlad), comment added

Signed-off-by: Ivan Vecera ivec...@redhat.com
---
 drivers/net/macvtap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 3b933bb..b75776b 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -796,6 +796,13 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
skb_reset_mac_header(skb);
skb->protocol = eth_hdr(skb)->h_proto;
 
+   /* Move network header to the right position for VLAN tagged packets */
+   if (skb_vlan_tagged(skb)) {
+   int depth;
+   __vlan_get_protocol(skb, skb->protocol, &depth);
+   skb_set_network_header(skb, depth);
+   }
+
if (vnet_hdr_len) {
err = macvtap_skb_from_vnet_hdr(q, skb, vnet_hdr);
if (err)
-- 
2.3.6
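__vlan_get_protocol() walks past any stacked VLAN tags and reports through `depth` where the network header starts, which skb_set_network_header() then uses. The same walk over a raw Ethernet frame can be sketched in user space. `net_header_offset()` is a hypothetical helper that treats 0x8100 (802.1Q) and 0x88A8 (802.1ad) as tag ethertypes, each tag adding 4 bytes after the 14-byte Ethernet header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN	14
#define VLAN_HLEN	4
#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/*
 * Hypothetical helper: return the network header offset in an Ethernet
 * frame, skipping any number of VLAN tags, or -1 if the frame is too
 * short. *proto receives the innermost ethertype.
 */
static int net_header_offset(const uint8_t *frame, size_t len,
			     uint16_t *proto)
{
	size_t off = ETH_HLEN;	/* past dst MAC, src MAC and type */
	uint16_t type;

	if (len < ETH_HLEN)
		return -1;
	type = be16(frame + 12);

	while (type == ETH_P_8021Q || type == ETH_P_8021AD) {
		/* Each tag: 2-byte TCI, then the encapsulated ethertype. */
		if (len < off + VLAN_HLEN)
			return -1;
		type = be16(frame + off + 2);
		off += VLAN_HLEN;
	}
	*proto = type;
	return (int)off;
}
```

An untagged frame yields 14 (the old, fixed ETH_HLEN assumption), a single tag 18 and a double (QinQ) tag 22.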



Re: pull-request: mac80211 2015-07-17

2015-07-21 Thread David Miller
From: Johannes Berg johan...@sipsolutions.net
Date: Fri, 17 Jul 2015 15:31:34 +0200

 We've accumulated some wireless fixes, please pull. Arik's fix is a bit
 bigger than I might like, but it fixes a real locking issue and we
 didn't really see a good way to make a smaller version.
 
 Let me know if there's any problem.

Pulled, thanks a lot Johannes.


Re: [PATCH] xfrm: Fix a typo

2015-07-21 Thread David Miller
From: Jakub Wilk jw...@jwilk.net
Date: Sat, 18 Jul 2015 14:41:51 +0200

 Signed-off-by: Jakub Wilk jw...@jwilk.net

Applied, thanks.


Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel

2015-07-21 Thread Kalle Valo
Maninder Singh maninder...@samsung.com writes:

 chandef is initialized with NULL and on the very next line,
 we are using it to get channel, which is not correct.

 channel should be initialized after obtaining chandef.

 Signed-off-by: Maninder Singh maninder...@samsung.com

How did you find this bug?

-- 
Kalle Valo


[PATCH V2 net 0/3] BPF JIT fixes for ARM

2015-07-21 Thread Nicolas Schichan
Hello,

These patches fix bugs in the ARM JIT and should probably find their
way into a stable kernel. All 60 test_bpf tests in the Linux 4.1
release now pass (54 out of 60 passed before).

Regards,

Changes from original submission:
* split fixes and features in separate patch series.

Nicolas Schichan (3):
  ARM: net: fix condition for load_order > 0 when translating load
    instructions.
  ARM: net: handle negative offsets in BPF JIT.
  ARM: net: fix vlan access instructions in ARM JIT.

 arch/arm/net/bpf_jit_32.c | 57 ---
 1 file changed, 44 insertions(+), 13 deletions(-)

-- 
1.9.1



[PATCH V2 net 1/3] ARM: net: fix condition for load_order > 0 when translating load instructions.

2015-07-21 Thread Nicolas Schichan
To check whether the load should take the fast path or not, the code
would check that (r_skb_hl - (1 << load_order)) is greater than or equal
to the offset of the access, using an "Unsigned higher or same"
condition. For halfword accesses and an skb length of 1 at offset 0,
that test passes, as we end up comparing -1 (0xffffffff when treated as
unsigned) with 0, so the fast path is taken and the filter allows the
load to wrongly succeed. A similar issue exists for word loads at
offset 0 and an skb length of less than 4.

Fix that by using the "Signed greater than or equal" condition for the
fast path code when load_order is greater than 0.

Signed-off-by: Nicolas Schichan nschic...@freebox.fr
---
 arch/arm/net/bpf_jit_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 4550d24..21f5ace 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -547,7 +547,7 @@ load_common:
emit(ARM_SUB_I(r_scratch, r_skb_hl,
   1 << load_order), ctx);
emit(ARM_CMP_R(r_scratch, r_off), ctx);
-   condt = ARM_COND_HS;
+   condt = ARM_COND_GE;
} else {
emit(ARM_CMP_R(r_skb_hl, r_off), ctx);
condt = ARM_COND_HI;
-- 
1.9.1

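The failing comparison can be reproduced with plain C integers. In this sketch (helper names hypothetical), `hlen` stands for r_skb_hl, `size` for 1 << load_order and `off` for r_off; the fast path is taken when hlen - size >= off, compared unsigned before the fix (HS) and signed after it (GE).

```c
#include <stdbool.h>
#include <stdint.h>

/* Models CMP + "HS" (unsigned higher or same): the pre-fix behaviour. */
static bool fast_path_unsigned(uint32_t hlen, uint32_t size, uint32_t off)
{
	uint32_t scratch = hlen - size;  /* wraps to 0xffffffff for hlen = 1, size = 2 */
	return scratch >= off;
}

/* Models CMP + "GE" (signed greater than or equal): the fixed behaviour. */
static bool fast_path_signed(int32_t hlen, int32_t size, int32_t off)
{
	int32_t scratch = hlen - size;   /* -1 for hlen = 1, size = 2 */
	return scratch >= off;
}
```

Note that a signed compare also accepts negative offsets, which is why patch 2/3 of this series adds an explicit comparison of r_off against 0 on the fast path.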


Re: reproducable panic eviction work queue

2015-07-21 Thread Frank Schreuder



On 7/20/2015 04:30 PM Florian Westphal wrote:

Frank Schreuder fschreu...@transip.nl wrote:

On 7/18/2015  05:32 PM, Nikolay Aleksandrov wrote:

On 07/18/2015 05:28 PM, Johan Schuijt wrote:

Thanks for looking into this!


Thank you for the report, I will try to reproduce this locally.
Could you please post the full crash log?

Of course, please see attached file.


Also, could you test
with a clean current kernel from Linus' tree or Dave's -net?

Will do.


These are available at:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
respectively.

One last question: how many IRQs do you pin, i.e. how many cores
do you actively use for receive?

This varies a bit across our systems, but we’ve managed to reproduce this with
IRQs pinned on as many as 2, 4, 8 or 20 cores.

I won’t have access to our test setup until Monday, so I’ll be testing 3
scenarios then:
- Your patch
- Linus' tree
- Dave’s -net tree

Just one of these two would be enough. I couldn't reproduce it here, but
I don't have as many machines to test right now and had to improvise with VMs. :-)


I’ll make sure to keep you posted on all the results then. We have a kernel
dump of the panic, so if you need me to extract any data from there, just let me
know! (Some instructions might be needed.)

- Johan


Great, thank you!


I'm able to reproduce this panic on the following kernel builds:
- 3.18.7
- 3.18.18
- 3.18.18 + patch from Nikolay Aleksandrov
- 4.1.0

Would you happen to have any more suggestions we can try?

Yes, although I admit it's clutching at straws.

The problem is that I don't see how we can race with the timer, but OTOH
I don't see why this needs to play refcnt tricks if we can just skip
the entry completely ...

The other issue is parallel completion on another CPU, but I don't
see how we could trip there either.

Do you always get this one crash backtrace from evictor wq?

I'll set up a bigger test machine soon and will also try to reproduce
this.

Thanks for reporting!

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -131,24 +131,14 @@ inet_evict_bucket(struct inet_frags *f, struct inet_frag_bucket *hb)
 	unsigned int evicted = 0;
 	HLIST_HEAD(expired);
 
-evict_again:
 	spin_lock(&hb->chain_lock);
 
 	hlist_for_each_entry_safe(fq, n, &hb->chain, list) {
 		if (!inet_fragq_should_evict(fq))
 			continue;
 
-		if (!del_timer(&fq->timer)) {
-			/* q expiring right now thus increment its refcount so
-			 * it won't be freed under us and wait until the timer
-			 * has finished executing then destroy it
-			 */
-			atomic_inc(&fq->refcnt);
-			spin_unlock(&hb->chain_lock);
-			del_timer_sync(&fq->timer);
-			inet_frag_put(fq, f);
-			goto evict_again;
-		}
+		if (!del_timer(&fq->timer))
+			continue;
 
 		fq->flags |= INET_FRAG_EVICTED;
 		hlist_del(&fq->list);
@@ -240,18 +230,20 @@ void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
 	int i;
 
 	nf->low_thresh = 0;
-	local_bh_disable();
 
 evict_again:
+	local_bh_disable();
 	seq = read_seqbegin(&f->rnd_seqlock);
 
 	for (i = 0; i < INETFRAGS_HASHSZ ; i++)
 		inet_evict_bucket(f, &f->hash[i]);
 
-	if (read_seqretry(&f->rnd_seqlock, seq))
-		goto evict_again;
-
 	local_bh_enable();
+	cond_resched();
+
+	if (read_seqretry(&f->rnd_seqlock, seq) ||
+	    percpu_counter_sum(&nf->mem))
+		goto evict_again;
 
 	percpu_counter_destroy(&nf->mem);
 }
@@ -286,6 +278,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	hb = get_frag_bucket_locked(fq, f);
 	if (!(fq->flags & INET_FRAG_EVICTED))
 		hlist_del(&fq->list);
+
+	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
@@ -297,7 +291,6 @@ void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
 		fq_unlink(fq, f);
 		atomic_dec(&fq->refcnt);
-		fq->flags |= INET_FRAG_COMPLETE;
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
Thanks a lot for your time and the patch. Unfortunately, we are still
able to reproduce the panic on kernel 3.18.18 with this patch included.
As in all previous tests, the same backtrace occurs. If there is any way
we can provide you with more debug information, please let me know.


Thanks a lot,
Frank



Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested

2015-07-21 Thread Stas Sergeev

21.07.2015 03:49, Florian Fainelli writes:

Hi all,

Changes in v5:

- removed an invalid use of the link_update callback in the SF2 driver
   that appeared after merging "net: phy: fixed_phy: handle link-down case"

Thanks for bringing this forward!
For the future, perhaps it will make sense to also
teach phylib to never read the link status (including speed)
when the link is down. That would help narrow down more problems like this.


[PATCHv2 net-next] cxgb4: Add debugfs entry to enable backdoor access

2015-07-21 Thread Hariprasad Shenai
Add a debugfs entry 'use_backdoor' to enable backdoor access for reading
SGE contexts. By default, SGE contexts are read via firmware. In case of
FW issues, backdoor access can be enabled via debugfs to dump SGE
contexts for debugging purposes.

Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
---
V2: Remove unnecessary braces as per comments by Sergei Shtylyov 
sergei.shtyl...@cogentembedded.com

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c |  2 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 19 ---
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 629f75d..58de444 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -767,6 +767,7 @@ struct adapter {
 	bool tid_release_task_busy;
 
 	struct dentry *debugfs_root;
+	u32 use_bd; /* Use SGE Back Door intfc for reading SGE Contexts */
 
 	spinlock_t stats_lock;
 	spinlock_t win0_lock ____cacheline_aligned_in_smp;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index b135d05..f701a6f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2388,6 +2388,8 @@ int t4_setup_debugfs(struct adapter *adap)
 
 	de = debugfs_create_file_size("flash", S_IRUSR, adap->debugfs_root, adap,
 				      &flash_debugfs_fops, adap->params.sf_size);
+	debugfs_create_bool("use_backdoor", S_IWUSR | S_IRUSR,
+			    adap->debugfs_root, &adap->use_bd);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 1e6597d..800bd48 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3689,6 +3689,11 @@ int t4_read_rss(struct adapter *adapter, u16 *map)
 	return 0;
 }
 
+static unsigned int t4_use_ldst(struct adapter *adap)
+{
+	return (adap->flags & FW_OK) || !adap->use_bd;
+}
+
 /**
  * t4_fw_tp_pio_rw - Access TP PIO through LDST
  * @adap: the adapter
@@ -3732,7 +3737,7 @@ static void t4_fw_tp_pio_rw(struct adapter *adap, u32 *vals, unsigned int nregs,
  */
 void t4_read_rss_key(struct adapter *adap, u32 *key)
 {
-	if (adap->flags & FW_OK)
+	if (t4_use_ldst(adap))
 		t4_fw_tp_pio_rw(adap, key, 10, TP_RSS_SECRET_KEY0_A, 1);
 	else
 		t4_read_indirect(adap, TP_PIO_ADDR_A, TP_PIO_DATA_A, key, 10,
@@ -3762,7 +3767,7 @@ void t4_write_rss_key(struct adapter *adap, const u32 *key, int idx)
 	    (vrt & KEYEXTEND_F) && (KEYMODE_G(vrt) == 3))
 		rss_key_addr_cnt = 32;
 
-	if (adap->flags & FW_OK)
+	if (t4_use_ldst(adap))
 		t4_fw_tp_pio_rw(adap, (void *)key, 10, TP_RSS_SECRET_KEY0_A, 0);
 	else
 		t4_write_indirect(adap, TP_PIO_ADDR_A, TP_PIO_DATA_A, key, 10,
@@ -3791,7 +3796,7 @@ void t4_write_rss_key(struct adapter *adap, const u32 *key, int idx)
 void t4_read_rss_pf_config(struct adapter *adapter, unsigned int index,
 			   u32 *valp)
 {
-	if (adapter->flags & FW_OK)
+	if (t4_use_ldst(adapter))
 		t4_fw_tp_pio_rw(adapter, valp, 1,
 				TP_RSS_PF0_CONFIG_A + index, 1);
 	else
@@ -3831,7 +3836,7 @@ void t4_read_rss_vf_config(struct adapter *adapter, unsigned int index,
 
 	/* Grab the VFL/VFH values ...
 	 */
-	if (adapter->flags & FW_OK) {
+	if (t4_use_ldst(adapter)) {
 		t4_fw_tp_pio_rw(adapter, &vfl, 1, TP_RSS_VFL_CONFIG_A, 1);
 		t4_fw_tp_pio_rw(adapter, &vfh, 1, TP_RSS_VFH_CONFIG_A, 1);
 	} else {
@@ -3852,7 +3857,7 @@ u32 t4_read_rss_pf_map(struct adapter *adapter)
 {
 	u32 pfmap;
 
-	if (adapter->flags & FW_OK)
+	if (t4_use_ldst(adapter))
 		t4_fw_tp_pio_rw(adapter, &pfmap, 1, TP_RSS_PF_MAP_A, 1);
 	else
 		t4_read_indirect(adapter, TP_PIO_ADDR_A, TP_PIO_DATA_A,
@@ -3870,7 +3875,7 @@ u32 t4_read_rss_pf_mask(struct adapter *adapter)
 {
 	u32 pfmask;
 
-	if (adapter->flags & FW_OK)
+	if (t4_use_ldst(adapter))
 		t4_fw_tp_pio_rw(adapter, &pfmask, 1, TP_RSS_PF_MSK_A, 1);
 	else
 		t4_read_indirect(adapter, TP_PIO_ADDR_A, TP_PIO_DATA_A,
@@ -6275,7 +6280,7 @@ int t4_init_tp_params(struct adapter *adap)
 	/* Cache the adapter's Compressed Filter Mode and global Incress
 	 * Configuration.
 	 */
-	if (adap->flags & FW_OK) {
+	if (t4_use_ldst(adap)) {
 		t4_fw_tp_pio_rw(adap, &adap->params.tp.vlan_pri_map, 1,
 				TP_VLAN_PRI_MAP_A, 1);
 		t4_fw_tp_pio_rw(adap,

Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel

2015-07-21 Thread Maninder Singh
 chandef is initialized with NULL and on the very next line,
 we are using it to get channel, which is not correct.

 channel should be initialized after obtaining chandef.

 Signed-off-by: Maninder Singh maninder...@samsung.com

How did you find this bug?

Static analysis tools such as Coverity or cppcheck report this bug; cppcheck says:

[drivers/net/wireless/ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Thanks,
Maninder

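A minimal sketch of the ordering bug that cppcheck flags, using hypothetical stand-in structs rather than the real ath10k/mac80211 types: reading `chandef->chan` before `chandef` has been assigned dereferences NULL, so the channel pointer must be derived only after the chandef lookup has produced a valid pointer. `lookup_chandef()` and `get_center_freq()` are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's chandef/channel types. */
struct channel { int center_freq; };
struct chan_def { struct channel *chan; };

/* Hypothetical lookup that may fail and return NULL. */
static struct chan_def *lookup_chandef(struct chan_def *candidate)
{
	return candidate;
}

/* Returns the channel's centre frequency, or -1 if there is none. */
static int get_center_freq(struct chan_def *candidate)
{
	struct chan_def *chandef = NULL;
	struct channel *channel;

	/*
	 * Buggy order (what the tool flags):
	 *	channel = chandef->chan;   <- chandef is still NULL here
	 */
	chandef = lookup_chandef(candidate);
	if (!chandef)
		return -1;
	channel = chandef->chan;        /* safe: chandef is now valid */
	return channel ? channel->center_freq : -1;
}
```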
Re: Several races in usbnet module (kernel 4.1.x)

2015-07-21 Thread Oliver Neukum
On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:
 Hi,
 
 I have recently found several data races in usbnet module, checked on 
 vanilla kernel 4.1.0 on x86_64. The races do actually happen, I have 
 confirmed it by adding delays and using hardware breakpoints to detect 
 the conflicting memory accesses (with RaceHound tool, 
 https://github.com/winnukem/racehound).
 
 I have not analyzed yet how harmful these races are (if they are), but 
 it is better to report them anyway, I think.
 
 Everything was checked using YOTA 4G LTE Modem that works via usbnet 
 and cdc_ether kernel modules.
 --
 
 [Race #1]
 
 Race on skb_queue ('next' pointer) between usbnet_stop() and rx_complete().
 
 Reproduced that by unplugging the device while the system was 
 downloading a large file from the Net.
 
 Here is part of the call stack with the code where the changes to the 
 queue happen:
 
 #0 __skb_unlink (skbuff.h:1517)
   prev->next = next;
 #1 defer_bh (usbnet.c:430)
   spin_lock_irqsave(&list->lock, flags);
   old_state = entry->state;
   entry->state = state;
   __skb_unlink(skb, list);
   spin_unlock(&list->lock);
   spin_lock(&dev->done.lock);
   __skb_queue_tail(&dev->done, skb);
   if (dev->done.qlen == 1)
       tasklet_schedule(&dev->bh);
   spin_unlock_irqrestore(&dev->done.lock, flags);
 #2 rx_complete (usbnet.c:640)
   state = defer_bh(dev, skb, &dev->rxq, state);
 
 At the same time, the following code repeatedly checks if the queue is
 empty and reads the same values concurrently with the above changes:
 
 #0  usbnet_terminate_urbs (usbnet.c:765)
   /* maybe wait for deletions to finish. */
   while (!skb_queue_empty(&dev->rxq)
       && !skb_queue_empty(&dev->txq)
       && !skb_queue_empty(&dev->done)) {
       schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
       set_current_state(TASK_UNINTERRUPTIBLE);
       netif_dbg(dev, ifdown, dev->net,
                 "waited for %d urb completions\n", temp);
   }
 #1  usbnet_stop (usbnet.c:806)
   if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
       usbnet_terminate_urbs(dev);
 
 For example, it is possible that the skb is removed from dev->rxq by
 __skb_unlink() before the check !skb_queue_empty(&dev->rxq) in
 usbnet_terminate_urbs() is made. It is also possible in this case that
 the skb is added to dev->done queue after !skb_queue_empty(&dev->done)
 is checked. So usbnet_terminate_urbs() may stop waiting and return while
 dev->done queue still has an item.

Hi,

Your analysis is correct, and it looks like, in addition to your proposed
fix, the locking needs to be simplified and a common lock taken.
Suggestions?

Regards
Oliver


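The "common lock" idea can be modelled in userspace as below. This is only a sketch under the assumption that a single pthread mutex guards all three queues (in usbnet each sk_buff_head has its own lock, so this models the suggestion, not the driver), and queues are reduced to counters. Because moving an entry from rxq to done and checking emptiness happen under the same lock, the stop path can never observe the entry as being in neither queue. All names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: queue lengths protected by one common lock. */
struct queues {
	pthread_mutex_t lock;
	unsigned int rxq, txq, done;
};

/* Completion path: move one entry from rxq to done atomically. */
static void defer_rx_to_done(struct queues *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->rxq > 0) {
		q->rxq--;
		q->done++;
	}
	pthread_mutex_unlock(&q->lock);
}

/* Bottom-half path: consume one entry from done. */
static void consume_done(struct queues *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->done > 0)
		q->done--;
	pthread_mutex_unlock(&q->lock);
}

/* Stop path: true while any of the three queues still holds work. */
static bool work_pending(struct queues *q)
{
	bool pending;

	pthread_mutex_lock(&q->lock);
	pending = q->rxq || q->txq || q->done;
	pthread_mutex_unlock(&q->lock);
	return pending;
}
```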


[PATCH V2 net 3/3] ARM: net: fix vlan access instructions in ARM JIT.

2015-07-21 Thread Nicolas Schichan
This makes BPF_ANC | SKF_AD_VLAN_TAG and BPF_ANC | SKF_AD_VLAN_TAG_PRESENT
behave the same as in the in-kernel VM, and makes the test_bpf LD_VLAN_TAG
and LD_VLAN_TAG_PRESENT tests pass.

Signed-off-by: Nicolas Schichan nschic...@freebox.fr
---
 arch/arm/net/bpf_jit_32.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index d9b2524..c011e22 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -889,9 +889,11 @@ b_epilogue:
off = offsetof(struct sk_buff, vlan_tci);
emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
if (code == (BPF_ANC | SKF_AD_VLAN_TAG))
-			OP_IMM3(ARM_AND, r_A, r_A, VLAN_VID_MASK, ctx);
-		else
-			OP_IMM3(ARM_AND, r_A, r_A, VLAN_TAG_PRESENT, ctx);
+			OP_IMM3(ARM_AND, r_A, r_A, ~VLAN_TAG_PRESENT, ctx);
+		else {
+			OP_IMM3(ARM_LSR, r_A, r_A, 12, ctx);
+			OP_IMM3(ARM_AND, r_A, r_A, 0x1, ctx);
+		}
 		break;
 	case BPF_ANC | SKF_AD_QUEUE:
 		ctx->seen |= SEEN_SKB;
-- 
1.9.1

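For reference, the two values the JIT now computes can be expressed in C as below. This sketch assumes the 4.1-era encoding in which `VLAN_TAG_PRESENT` is bit 12 (0x1000) of `skb->vlan_tci`, which matches the `LSR #12` / `AND #1` sequence in the patch; the helper names are hypothetical.

```c
#include <stdint.h>

#define VLAN_TAG_PRESENT 0x1000u  /* assumption: bit 12, as in 4.1-era headers */

/* SKF_AD_VLAN_TAG: the TCI (priority + VID) with the "present" bit cleared. */
static uint32_t ld_vlan_tag(uint16_t vlan_tci)
{
	return vlan_tci & ~VLAN_TAG_PRESENT;
}

/* SKF_AD_VLAN_TAG_PRESENT: 1 if a tag is present, else 0 (LSR #12; AND #1). */
static uint32_t ld_vlan_tag_present(uint16_t vlan_tci)
{
	return (vlan_tci >> 12) & 0x1;
}
```

By contrast, the old code masked with VLAN_VID_MASK (dropping the priority bits) and returned 0x1000 rather than 1 for the "present" case.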


[PATCH V2 net 2/3] ARM: net: handle negative offsets in BPF JIT.

2015-07-21 Thread Nicolas Schichan
Previously, the JIT would reject negative offsets known during code
generation and mishandle negative offsets provided at runtime.

Fix that by calling bpf_internal_load_pointer_neg_helper()
appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing
the execution flow to the slow path helpers when the offset is
negative.

Signed-off-by: Nicolas Schichan nschic...@freebox.fr
---
 arch/arm/net/bpf_jit_32.c | 47 ++-
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 21f5ace..d9b2524 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -74,32 +74,52 @@ struct jit_ctx {
 
 int bpf_jit_enable __read_mostly;
 
-static u64 jit_get_skb_b(struct sk_buff *skb, unsigned offset)
+static inline int call_neg_helper(struct sk_buff *skb, int offset, void *ret,
+				  unsigned int size)
+{
+	void *ptr = bpf_internal_load_pointer_neg_helper(skb, offset, size);
+
+	if (!ptr)
+		return -EFAULT;
+	memcpy(ret, ptr, size);
+	return 0;
+}
+
+static u64 jit_get_skb_b(struct sk_buff *skb, int offset)
 {
 	u8 ret;
 	int err;
 
-	err = skb_copy_bits(skb, offset, &ret, 1);
+	if (offset < 0)
+		err = call_neg_helper(skb, offset, &ret, 1);
+	else
+		err = skb_copy_bits(skb, offset, &ret, 1);
 
 	return (u64)err << 32 | ret;
 }
 
-static u64 jit_get_skb_h(struct sk_buff *skb, unsigned offset)
+static u64 jit_get_skb_h(struct sk_buff *skb, int offset)
 {
 	u16 ret;
 	int err;
 
-	err = skb_copy_bits(skb, offset, &ret, 2);
+	if (offset < 0)
+		err = call_neg_helper(skb, offset, &ret, 2);
+	else
+		err = skb_copy_bits(skb, offset, &ret, 2);
 
 	return (u64)err << 32 | ntohs(ret);
 }
 
-static u64 jit_get_skb_w(struct sk_buff *skb, unsigned offset)
+static u64 jit_get_skb_w(struct sk_buff *skb, int offset)
 {
 	u32 ret;
 	int err;
 
-	err = skb_copy_bits(skb, offset, &ret, 4);
+	if (offset < 0)
+		err = call_neg_helper(skb, offset, &ret, 4);
+	else
+		err = skb_copy_bits(skb, offset, &ret, 4);
 
 	return (u64)err << 32 | ntohl(ret);
 }
@@ -536,9 +556,6 @@ static int build_body(struct jit_ctx *ctx)
 	case BPF_LD | BPF_B | BPF_ABS:
 		load_order = 0;
 load:
-		/* the interpreter will deal with the negative K */
-		if ((int)k < 0)
-			return -ENOTSUPP;
 		emit_mov_i(r_off, k, ctx);
 load_common:
 		ctx->seen |= SEEN_DATA | SEEN_CALL;
@@ -553,6 +570,18 @@ load_common:
 			condt = ARM_COND_HI;
 		}
 
+		/*
+		 * test for negative offset, only if we are
+		 * currently scheduled to take the fast
+		 * path. this will update the flags so that
+		 * the slowpath instructions are ignored if the
+		 * offset is negative.
+		 *
+		 * for load_order == 0 the HI condition will
+		 * make loads at offset 0 take the slow path too.
+		 */
+		_emit(condt, ARM_CMP_I(r_off, 0), ctx);
+
 		_emit(condt, ARM_ADD_R(r_scratch, r_off, r_skb_data),
 		      ctx);
 
-- 
1.9.1

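The slow-path helpers return a 64-bit value that packs the error code into the upper 32 bits and the loaded value into the lower 32 bits (`(u64)err << 32 | ret`). The sketch below models that convention in userspace together with the negative-offset dispatch. `struct pkt`, `neg_ptr()` and `get_b()` are hypothetical; `neg_ptr()` is only a rough model of `bpf_internal_load_pointer_neg_helper()`, resolving offsets relative to the network header (`SKF_NET_OFF`) or link-layer header (`SKF_LL_OFF`), whose values are taken from linux/filter.h.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKF_NET_OFF (-0x100000)  /* network-header-relative offsets */
#define SKF_LL_OFF  (-0x200000)  /* link-layer-header-relative offsets */

/* Hypothetical packet model: a linear buffer with known header offsets. */
struct pkt {
	const uint8_t *data;
	int len;
	int mac_off;   /* offset of the link-layer header in data */
	int net_off;   /* offset of the network header in data */
};

/* Rough model of the negative-offset helper: resolve k to a pointer or NULL. */
static const uint8_t *neg_ptr(const struct pkt *p, int k, int size)
{
	int base;

	if (k >= SKF_NET_OFF)
		base = p->net_off + (k - SKF_NET_OFF);
	else if (k >= SKF_LL_OFF)
		base = p->mac_off + (k - SKF_LL_OFF);
	else
		return NULL;
	if (base < 0 || base + size > p->len)
		return NULL;
	return p->data + base;
}

/* Byte load packing err into the high word, like jit_get_skb_b(). */
static uint64_t get_b(const struct pkt *p, int offset)
{
	uint8_t ret = 0;
	int err = 0;

	if (offset < 0) {
		const uint8_t *ptr = neg_ptr(p, offset, 1);

		if (ptr)
			memcpy(&ret, ptr, 1);
		else
			err = -EFAULT;
	} else if (offset + 1 <= p->len) {
		ret = p->data[offset];
	} else {
		err = -EFAULT;
	}
	return (uint64_t)err << 32 | ret;
}
```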


[PATCH V2 net-next 3/3] ARM: net: add support for BPF_ANC | SKF_AD_HATYPE in ARM JIT.

2015-07-21 Thread Nicolas Schichan
Signed-off-by: Nicolas Schichan nschic...@freebox.fr
---
 arch/arm/net/bpf_jit_32.c | 22 --
 arch/arm/net/bpf_jit_32.h |  3 +++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index fe28beb..6dcff2b 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -828,7 +828,9 @@ b_epilogue:
emit(ARM_LDR_I(r_A, r_scratch, off), ctx);
break;
 	case BPF_ANC | SKF_AD_IFINDEX:
+	case BPF_ANC | SKF_AD_HATYPE:
 		/* A = skb->dev->ifindex */
+		/* A = skb->dev->type */
 		ctx->seen |= SEEN_SKB;
 		off = offsetof(struct sk_buff, dev);
 		emit(ARM_LDR_I(r_scratch, r_skb, off), ctx);
@@ -838,8 +840,24 @@ b_epilogue:
 
 		BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
 					  ifindex) != 4);
-		off = offsetof(struct net_device, ifindex);
-		emit(ARM_LDR_I(r_A, r_scratch, off), ctx);
+		BUILD_BUG_ON(FIELD_SIZEOF(struct net_device,
+					  type) != 2);
+
+		if (code == (BPF_ANC | SKF_AD_IFINDEX)) {
+			off = offsetof(struct net_device, ifindex);
+			emit(ARM_LDR_I(r_A, r_scratch, off), ctx);
+		} else {
+			/*
+			 * offset of field type in struct
+			 * net_device is above what can be
+			 * used in the ldrh rd, [rn, #imm]
+			 * instruction, so load the offset in
+			 * a register and use ldrh rd, [rn, rm]
+			 */
+			off = offsetof(struct net_device, type);
+			emit_mov_i(ARM_R3, off, ctx);
+			emit(ARM_LDRH_R(r_A, r_scratch, ARM_R3), ctx);
+		}
 		break;
 	case BPF_ANC | SKF_AD_MARK:
 		ctx->seen |= SEEN_SKB;
diff --git a/arch/arm/net/bpf_jit_32.h b/arch/arm/net/bpf_jit_32.h
index b2d7d92..4b17d5ab 100644
--- a/arch/arm/net/bpf_jit_32.h
+++ b/arch/arm/net/bpf_jit_32.h
@@ -74,6 +74,7 @@
 #define ARM_INST_LDRB_I		0x05d00000
 #define ARM_INST_LDRB_R		0x07d00000
 #define ARM_INST_LDRH_I		0x01d000b0
+#define ARM_INST_LDRH_R		0x019000b0
 #define ARM_INST_LDR_I		0x05900000
 
 #define ARM_INST_LDM		0x08900000
@@ -160,6 +161,8 @@
 			 | (rm))
 #define ARM_LDRH_I(rt, rn, off)	(ARM_INST_LDRH_I | (rt) << 12 | (rn) << 16 \
 				 | (((off) & 0xf0) << 4) | ((off) & 0xf))
+#define ARM_LDRH_R(rt, rn, rm)	(ARM_INST_LDRH_R | (rt) << 12 | (rn) << 16 \
+				 | (rm))
 
 #define ARM_LDM(rn, regs)	(ARM_INST_LDM | (rn) << 16 | (regs))
 
-- 
1.9.1

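Why the immediate form is not enough: as the `ARM_LDRH_I()` macro in this patch shows, the halfword-load immediate is split into two 4-bit fields (bits 11:8 and 3:0), so only offsets 0 to 255 can be encoded. The sketch below, with hypothetical helper names, reproduces both encodings from the patch (condition bits left at zero, as in the `ARM_INST_*` constants) and the range check that motivates falling back to `ldrh rd, [rn, rm]`.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARM_INST_LDRH_I 0x01d000b0u  /* ldrh rt, [rn, #imm8] */
#define ARM_INST_LDRH_R 0x019000b0u  /* ldrh rt, [rn, rm] */

/* Only an 8-bit unsigned immediate fits the split imm4H:imm4L fields. */
static bool ldrh_imm_fits(uint32_t off)
{
	return off <= 0xff;
}

/* Same bit layout as the ARM_LDRH_I() macro. */
static uint32_t enc_ldrh_i(uint32_t rt, uint32_t rn, uint32_t off)
{
	return ARM_INST_LDRH_I | rt << 12 | rn << 16
	       | ((off & 0xf0) << 4) | (off & 0xf);
}

/* Same bit layout as the new ARM_LDRH_R() macro. */
static uint32_t enc_ldrh_r(uint32_t rt, uint32_t rn, uint32_t rm)
{
	return ARM_INST_LDRH_R | rt << 12 | rn << 16 | rm;
}
```

Per the comment in the patch, offsetof(struct net_device, type) lies beyond this 0 to 255 range, hence the move into r3 followed by the register form.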


[PATCH V2 net-next 2/3] ARM: net: add support for BPF_ANC | SKF_AD_PAY_OFFSET in ARM JIT.

2015-07-21 Thread Nicolas Schichan
Signed-off-by: Nicolas Schichan nschic...@freebox.fr
---
 arch/arm/net/bpf_jit_32.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 67a2d44..fe28beb 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -884,6 +884,14 @@ b_epilogue:
off = offsetof(struct sk_buff, queue_mapping);
emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
break;
+   case BPF_ANC | SKF_AD_PAY_OFFSET:
+   ctx->seen |= SEEN_SKB | SEEN_CALL;
+
+   emit(ARM_MOV_R(ARM_R0, r_skb), ctx);
+   emit_mov_i(ARM_R3, (unsigned int)skb_get_poff, ctx);
+   emit_blx_r(ARM_R3, ctx);
+   emit(ARM_MOV_R(r_A, ARM_R0), ctx);
+   break;
case BPF_LDX | BPF_W | BPF_ABS:
/*
 * load a 32bit word from struct seccomp_data.
-- 
1.9.1



[PATCH V2 net-next 1/3] ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.

2015-07-21 Thread Nicolas Schichan
Signed-off-by: Nicolas Schichan nschic...@freebox.fr
---
 arch/arm/net/bpf_jit_32.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 4550d24..67a2d44 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -864,6 +864,17 @@ b_epilogue:
 		else
 			OP_IMM3(ARM_AND, r_A, r_A, VLAN_TAG_PRESENT, ctx);
 		break;
+	case BPF_ANC | SKF_AD_PKTTYPE:
+		ctx->seen |= SEEN_SKB;
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
+					  __pkt_type_offset[0]) != 1);
+		off = PKT_TYPE_OFFSET();
+		emit(ARM_LDRB_I(r_A, r_skb, off), ctx);
+		emit(ARM_AND_I(r_A, r_A, PKT_TYPE_MAX), ctx);
+#ifdef __BIG_ENDIAN_BITFIELD
+		emit(ARM_LSR_I(r_A, r_A, 5), ctx);
+#endif
+		break;
 	case BPF_ANC | SKF_AD_QUEUE:
 		ctx->seen |= SEEN_SKB;
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff,
-- 
1.9.1

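The load-and-mask sequence can be modelled in C as below. The sketch assumes, as in 4.1-era `skbuff.h`, that `pkt_type` is a 3-bit bitfield in the byte at `PKT_TYPE_OFFSET()`, with `PKT_TYPE_MAX` defined as 7 on little-endian bitfield layouts and `7 << 5` on big-endian ones; that is why the extra `LSR #5` appears only under `__BIG_ENDIAN_BITFIELD`. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models: ldrb r_A, [r_skb, #off]; and r_A, r_A, #PKT_TYPE_MAX; [lsr r_A, r_A, #5] */
static uint32_t load_pkt_type(uint8_t byte, bool big_endian_bitfield)
{
	uint32_t a = byte;

	if (big_endian_bitfield) {
		a &= 7u << 5;  /* PKT_TYPE_MAX: pkt_type in the top three bits */
		a >>= 5;
	} else {
		a &= 7u;       /* PKT_TYPE_MAX: pkt_type in the low three bits */
	}
	return a;
}
```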


[PATCH V2 net-next 0/3] ARM BPF JIT features

2015-07-21 Thread Nicolas Schichan
Hello,

This series adds support for more instructions to the ARM BPF JIT,
namely skb netdevice type retrieval, skb payload offset retrieval, and
skb packet type retrieval.

This allows 35 tests to use the JIT instead of 29 before.

This series depends on the "BPF JIT fixes for ARM" series sent earlier.

Regards,

Changes from original submission:
* split fixes and features in separate patch series.

Nicolas Schichan (3):
  ARM: net: add support for BPF_ANC | SKF_AD_PKTTYPE in ARM JIT.
  ARM: net: add support for BPF_ANC | SKF_AD_PAY_OFFSET in ARM JIT.
  ARM: net: add support for BPF_ANC | SKF_AD_HATYPE in ARM JIT.

 arch/arm/net/bpf_jit_32.c | 41 +++--
 arch/arm/net/bpf_jit_32.h |  3 +++
 2 files changed, 42 insertions(+), 2 deletions(-)

-- 
1.9.1



[PATCH net-next 19/22] openvswitch: Make tunnel set action attach a metadata dst

2015-07-21 Thread Thomas Graf
Utilize the new metadata dst to attach encapsulation instructions to
the skb. The existing egress_tun_info via the OVS_CB() is left in
place until all tunnel vports have been converted to the new method.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 net/openvswitch/actions.c  | 10 ++-
 net/openvswitch/datapath.c |  8 +++---
 net/openvswitch/flow.h |  5 
 net/openvswitch/flow_netlink.c | 64 +-
 net/openvswitch/flow_netlink.h |  1 +
 net/openvswitch/flow_table.c   |  4 ++-
 6 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 27c1687..cf04c2f 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -733,7 +733,15 @@ static int execute_set_action(struct sk_buff *skb,
 {
/* Only tunnel set execution is supported without a mask. */
if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) {
-   OVS_CB(skb)->egress_tun_info = nla_data(a);
+   struct ovs_tunnel_info *tun = nla_data(a);
+
+   skb_dst_drop(skb);
+   dst_hold((struct dst_entry *)tun->tun_dst);
+   skb_dst_set(skb, (struct dst_entry *)tun->tun_dst);
+
+   /* FIXME: Remove when all vports have been converted */
+   OVS_CB(skb)->egress_tun_info = &tun->tun_dst->u.tun_info;
+
return 0;
}
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ff8c4a4..0208210 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1018,7 +1018,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
}
ovs_unlock();
 
-   ovs_nla_free_flow_actions(old_acts);
+   ovs_nla_free_flow_actions_rcu(old_acts);
ovs_flow_free(new_flow, false);
}
 
@@ -1030,7 +1030,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
 err_kfree_acts:
-   kfree(acts);
+   ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
ovs_flow_free(new_flow, false);
 error:
@@ -1157,7 +1157,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
if (reply)
ovs_notify(dp_flow_genl_family, reply, info);
if (old_acts)
-   ovs_nla_free_flow_actions(old_acts);
+   ovs_nla_free_flow_actions_rcu(old_acts);
 
return 0;
 
@@ -1165,7 +1165,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
 err_kfree_acts:
-   kfree(acts);
+   ovs_nla_free_flow_actions(acts);
 error:
return error;
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index cadc6c5..b62cdb3 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -33,6 +33,7 @@
 #include <linux/flex_array.h>
 #include <net/inet_ecn.h>
 #include <net/ip_tunnels.h>
+#include <net/dst_metadata.h>
 
 struct sk_buff;
 
@@ -45,6 +46,10 @@ struct sk_buff;
 #define TUN_METADATA_OPTS(flow_key, opt_len) \
((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
+struct ovs_tunnel_info {
+   struct metadata_dst *tun_dst;
+};
+
 #define OVS_SW_FLOW_KEY_METADATA_SIZE  \
(offsetof(struct sw_flow_key, recirc_id) +  \
FIELD_SIZEOF(struct sw_flow_key, recirc_id))
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ecfa530..e7906df 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1548,11 +1548,48 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
return sfa;
 }
 
+static void ovs_nla_free_set_action(const struct nlattr *a)
+{
+   const struct nlattr *ovs_key = nla_data(a);
+   struct ovs_tunnel_info *ovs_tun;
+
+   switch (nla_type(ovs_key)) {
+   case OVS_KEY_ATTR_TUNNEL_INFO:
+   ovs_tun = nla_data(ovs_key);
+   dst_release((struct dst_entry *)ovs_tun->tun_dst);
+   break;
+   }
+}
+
+void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+{
+   const struct nlattr *a;
+   int rem;
+
+   if (!sf_acts)
+   return;
+
+   nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
+   switch (nla_type(a)) {
+   case OVS_ACTION_ATTR_SET:
+   ovs_nla_free_set_action(a);
+   break;
+   }
+   }
+
+   kfree(sf_acts);
+}
+
+static void __ovs_nla_free_flow_actions(struct rcu_head *head)
+{
+   ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu));
+}
+
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible. */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts)
 {
-   

[PATCH net-next 18/22] vxlan: Factor out device configuration

2015-07-21 Thread Thomas Graf
This factors the device configuration out of the RTNL newlink
API, which allows in-kernel creation of VXLAN net_devices.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c | 332 
 include/net/vxlan.h |  59 ++
 2 files changed, 236 insertions(+), 155 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 2587ac8..30e1f21 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -55,10 +55,6 @@
 
 #define PORT_HASH_BITS 8
 #define PORT_HASH_SIZE  (1<<PORT_HASH_BITS)
-#define VNI_HASH_BITS  10
-#define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
-#define FDB_HASH_BITS  8
-#define FDB_HASH_SIZE  (1<<FDB_HASH_BITS)
 #define FDB_AGE_DEFAULT 300 /* 5 min */
 #define FDB_AGE_INTERVAL (10 * HZ) /* rescan interval */
 
@@ -75,6 +71,7 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 static int vxlan_net_id;
+static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
@@ -85,21 +82,6 @@ struct vxlan_net {
spinlock_t        sock_lock;
 };
 
-union vxlan_addr {
-   struct sockaddr_in sin;
-   struct sockaddr_in6 sin6;
-   struct sockaddr sa;
-};
-
-struct vxlan_rdst {
-   union vxlan_addr remote_ip;
-   __be16   remote_port;
-   u32  remote_vni;
-   u32  remote_ifindex;
-   struct list_head list;
-   struct rcu_head  rcu;
-};
-
 /* Forwarding table entry */
 struct vxlan_fdb {
struct hlist_node hlist;    /* linked list of entries */
@@ -112,31 +94,6 @@ struct vxlan_fdb {
u8                flags;    /* see ndm_flags */
 };
 
-/* Pseudo network device */
-struct vxlan_dev {
-   struct hlist_node hlist;    /* vni hash table */
-   struct list_head  next; /* vxlan's per namespace list */
-   struct vxlan_sock *vn_sock; /* listening socket */
-   struct net_device *dev;
-   struct net        *net;     /* netns for packet i/o */
-   struct vxlan_rdst default_dst;  /* default destination */
-   union vxlan_addr  saddr;    /* source address */
-   __be16            dst_port;
-   __u16 port_min; /* source port range */
-   __u16 port_max;
-   __u8  tos;  /* TOS override */
-   __u8  ttl;
-   u32   flags;/* VXLAN_F_* in vxlan.h */
-
-   unsigned long age_interval;
-   struct timer_list age_timer;
-   spinlock_t        hash_lock;
-   unsigned int  addrcnt;
-   unsigned int  addrmax;
-
-   struct hlist_head fdb_head[FDB_HASH_SIZE];
-};
-
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
@@ -352,7 +309,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
goto nla_put_failure;
 
-   if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+   if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
nla_put_be16(skb, NDA_PORT, rdst->remote_port))
goto nla_put_failure;
if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -756,7 +713,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+   if (vxlan->cfg.addrmax &&
+   vxlan->addrcnt >= vxlan->cfg.addrmax)
return -ENOSPC;
 
/* Disallow replace to add a multicast entry */
@@ -842,7 +800,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
return -EINVAL;
*port = nla_get_be16(tb[NDA_PORT]);
} else {
-   *port = vxlan->dst_port;
+   *port = vxlan->cfg.dst_port;
}
 
if (tb[NDA_VNI]) {
@@ -1028,7 +986,7 @@ static bool vxlan_snoop(struct net_device *dev,
vxlan_fdb_create(vxlan, src_mac, src_ip,
 NUD_REACHABLE,
 NLM_F_EXCL|NLM_F_CREATE,
-vxlan->dst_port,
+vxlan->cfg.dst_port,
 vxlan->default_dst.remote_vni,
 0, NTF_SELF);
spin_unlock(&vxlan->hash_lock);
@@ -1957,7 +1915,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
info = skb_tunnel_info(skb, AF_INET);
 
if (rdst) {
-   dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+   dst_port = rdst->remote_port ? 

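The hunks above move per-device settings such as dst_port and addrmax under vxlan->cfg, so that the RTNL newlink path and an in-kernel caller can fill in the same configuration. A minimal userspace sketch of that shape follows; vxlan_cfg_sketch, vxlan_dev_sketch, vxlan_dev_configure_sketch() and vxlan_fdb_room_sketch() are hypothetical names, and only the addrmax test is taken from vxlan_fdb_create().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down analogue of the settings grouped under
 * vxlan->cfg; the kernel keeps dst_port as __be16, host order here. */
struct vxlan_cfg_sketch {
	uint16_t dst_port;
	unsigned int addrmax;	/* 0 means "no limit" */
};

struct vxlan_dev_sketch {
	struct vxlan_cfg_sketch cfg;	/* configuration, copied in at setup */
	unsigned int addrcnt;		/* runtime state: FDB entries in use */
};

/* One setup routine that a netlink parser and an in-kernel caller could
 * both use after filling in a configuration struct. */
static void vxlan_dev_configure_sketch(struct vxlan_dev_sketch *vxlan,
				       const struct vxlan_cfg_sketch *conf)
{
	vxlan->cfg = *conf;
	vxlan->addrcnt = 0;
}

/* Same test as in vxlan_fdb_create(): refuse a new entry once the
 * configured maximum has been reached. */
static int vxlan_fdb_room_sketch(const struct vxlan_dev_sketch *vxlan)
{
	if (vxlan->cfg.addrmax && vxlan->addrcnt >= vxlan->cfg.addrmax)
		return -ENOSPC;
	return 0;
}
```

Keeping configuration and runtime state (addrcnt) apart is what lets the configuration be built without any netlink message at hand.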
[PATCH net-next 09/22] mpls: ip tunnel support

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

This implementation uses the lwtunnel infrastructure to register
hooks for mpls tunnel encaps.

It takes cues from the iptunnel_encaps infrastructure and from previous
mpls iptunnel RFC patches by Eric W. Biederman and Robert Shearman.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/linux/mpls_iptunnel.h  |   6 +
 include/net/mpls_iptunnel.h|  29 +
 include/uapi/linux/mpls_iptunnel.h |  28 +
 net/mpls/Kconfig   |   8 +-
 net/mpls/Makefile  |   1 +
 net/mpls/mpls_iptunnel.c   | 233 +
 6 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/mpls/mpls_iptunnel.c

diff --git a/include/linux/mpls_iptunnel.h b/include/linux/mpls_iptunnel.h
new file mode 100644
index 000..ef29eb2
--- /dev/null
+++ b/include/linux/mpls_iptunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_MPLS_IPTUNNEL_H
+#define _LINUX_MPLS_IPTUNNEL_H
+
+#include <uapi/linux/mpls_iptunnel.h>
+
+#endif  /* _LINUX_MPLS_IPTUNNEL_H */
diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
new file mode 100644
index 000..4757997
--- /dev/null
+++ b/include/net/mpls_iptunnel.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2015 Cumulus Networks, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef _NET_MPLS_IPTUNNEL_H
+#define _NET_MPLS_IPTUNNEL_H 1
+
+#define MAX_NEW_LABELS 2
+
+struct mpls_iptunnel_encap {
+   u32 label[MAX_NEW_LABELS];
+   u32 labels;
+};
+
+static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
+{
+   return (struct mpls_iptunnel_encap *)lwtstate->data;
+}
+
+#endif
diff --git a/include/uapi/linux/mpls_iptunnel.h b/include/uapi/linux/mpls_iptunnel.h
new file mode 100644
index 000..d80a049
--- /dev/null
+++ b/include/uapi/linux/mpls_iptunnel.h
@@ -0,0 +1,28 @@
+/*
+ * mpls tunnel api
+ *
+ * Authors:
+ * Roopa Prabhu ro...@cumulusnetworks.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_MPLS_IPTUNNEL_H
+#define _UAPI_LINUX_MPLS_IPTUNNEL_H
+
+/* MPLS tunnel attributes
+ * [RTA_ENCAP] = {
+ * [MPLS_IPTUNNEL_DST]
+ * }
+ */
+enum {
+   MPLS_IPTUNNEL_UNSPEC,
+   MPLS_IPTUNNEL_DST,
+   __MPLS_IPTUNNEL_MAX,
+};
+#define MPLS_IPTUNNEL_MAX (__MPLS_IPTUNNEL_MAX - 1)
+
+#endif /* _UAPI_LINUX_MPLS_IPTUNNEL_H */
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 17bde79..5c467ef 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -24,7 +24,13 @@ config NET_MPLS_GSO
 
 config MPLS_ROUTING
tristate "MPLS: routing support"
-   help
+   ---help---
 Add support for forwarding of mpls packets.
 
+config MPLS_IPTUNNEL
+   tristate "MPLS: IP over MPLS tunnel support"
+   depends on LWTUNNEL && MPLS_ROUTING
+   ---help---
+mpls ip tunnel support.
+
 endif # MPLS
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 65bbe68..9ca9236 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -3,5 +3,6 @@
 #
 obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
 obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
+obj-$(CONFIG_MPLS_IPTUNNEL) += mpls_iptunnel.o
 
 mpls_router-y := af_mpls.o
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
new file mode 100644
index 000..eea096f
--- /dev/null
+++ b/net/mpls/mpls_iptunnel.c
@@ -0,0 +1,233 @@
+/*
+ * mpls tunnels    An implementation of mpls tunnels using the light weight tunnel
+ * infrastructure
+ *
+ * Authors:    Roopa Prabhu, ro...@cumulusnetworks.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/net.h>
+#include <linux/module.h>
+#include <linux/mpls.h>
+#include <linux/vmalloc.h>
+#include <net/ip.h>
+#include <net/dst.h>
+#include <net/lwtunnel.h>
+#include <net/netevent.h>
+#include <net/netns/generic.h>
+#include <net/ip6_fib.h>
+#include 

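struct mpls_iptunnel_encap holds up to MAX_NEW_LABELS labels, and mpls_rt_header_size() in patch 08/22 sizes such a stack as labels * sizeof(struct mpls_shim_hdr). The excerpt is cut off before the output path of mpls_iptunnel.c, so the following is only a userspace sketch of pushing such a stack as RFC 3032 label stack entries. mpls_encap_sketch mirrors the patch's struct; mpls_lse_sketch() and mpls_push_sketch() are hypothetical, and treating label[0] as the outermost entry is an assumption.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stddef.h>
#include <stdint.h>

#define MAX_NEW_LABELS 2

/* Same shape as struct mpls_iptunnel_encap from the patch. */
struct mpls_encap_sketch {
	uint32_t label[MAX_NEW_LABELS];
	uint32_t labels;
};

/* One MPLS label stack entry (RFC 3032): 20-bit label, 3-bit traffic
 * class, 1-bit bottom-of-stack flag and 8-bit TTL, in network order. */
static uint32_t mpls_lse_sketch(uint32_t label, uint8_t ttl, uint8_t tc, int bos)
{
	return htonl(((label & 0xFFFFFu) << 12) |
		     ((uint32_t)(tc & 0x7u) << 9) |
		     ((bos ? 1u : 0u) << 8) |
		     ttl);
}

/* Write the configured labels into 'out' and return the bytes pushed
 * (4 per label). Assumes label[0] is outermost; only the innermost
 * entry gets the bottom-of-stack bit. */
static size_t mpls_push_sketch(const struct mpls_encap_sketch *e,
			       uint8_t ttl, uint32_t *out)
{
	for (uint32_t i = 0; i < e->labels; i++)
		out[i] = mpls_lse_sketch(e->label[i], ttl, 0, i + 1 == e->labels);
	return e->labels * sizeof(uint32_t);
}
```

For labels {100, 200} and TTL 64, this writes two entries (8 bytes), with the bottom-of-stack bit set only on label 200.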
[PATCH net-next 01/22] rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

This patch introduces two new RTA attributes to attach encap
data to fib routes.

Example iproute2 command to attach mpls encap data to ipv4 routes:

$ ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
Suggested-by: Eric W. Biederman ebied...@xmission.com
---
 include/uapi/linux/rtnetlink.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index fdd8f07..0d3d3cc 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -308,6 +308,8 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_ENCAP_TYPE,
+   RTA_ENCAP,
__RTA_MAX
 };
 
-- 
2.4.3

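The two new attributes are meant to travel together: RTA_ENCAP_TYPE names the encapsulation and RTA_ENCAP nests the encap-specific attributes (for MPLS, MPLS_IPTUNNEL_DST, per the uapi comment in patch 09/22). A minimal userspace sketch of how such a fragment could be laid out as 4-byte-aligned netlink TLVs follows. The numeric values (21/22 assume RTA_PREF is 20; 1 for the MPLS encap type) and the u16 payload of RTA_ENCAP_TYPE are assumptions, and tlv_put() and build_mpls_encap() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink attributes are TLVs: a 4-byte header (length, type) followed by
 * the payload, each attribute padded to a 4-byte boundary. */
struct tlv_hdr_sketch {
	uint16_t len;	/* header + payload, without trailing padding */
	uint16_t type;
};

#define TLV_ALIGN_SKETCH(n)	(((n) + 3u) & ~(size_t)3u)
#define TLV_HDRLEN_SKETCH	sizeof(struct tlv_hdr_sketch)

/* Assumed attribute numbers, see the lead-in. */
enum { RTA_ENCAP_TYPE_SKETCH = 21, RTA_ENCAP_SKETCH = 22 };
enum { ENCAP_MPLS_SKETCH = 1, MPLS_IPTUNNEL_DST_SKETCH = 1 };

/* Append one attribute at buf + off and return the next aligned offset. */
static size_t tlv_put(uint8_t *buf, size_t off, uint16_t type,
		      const void *data, uint16_t dlen)
{
	struct tlv_hdr_sketch h = {
		.len = (uint16_t)(TLV_HDRLEN_SKETCH + dlen),
		.type = type,
	};

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + TLV_HDRLEN_SKETCH, data, dlen);
	memset(buf + off + h.len, 0, TLV_ALIGN_SKETCH(h.len) - h.len);
	return off + TLV_ALIGN_SKETCH(h.len);
}

/* RTA_ENCAP_TYPE is assumed to carry a u16; RTA_ENCAP is a nest whose
 * payload holds the encap-specific attribute(s). */
static size_t build_mpls_encap(uint8_t *buf, uint32_t label_stack_entry)
{
	uint16_t encap_type = ENCAP_MPLS_SKETCH;
	struct tlv_hdr_sketch nest = { .len = 0, .type = RTA_ENCAP_SKETCH };
	size_t off, nest_off;

	off = tlv_put(buf, 0, RTA_ENCAP_TYPE_SKETCH, &encap_type, sizeof(encap_type));
	nest_off = off;
	off += TLV_HDRLEN_SKETCH;	/* reserve room for the nest header */
	off = tlv_put(buf, off, MPLS_IPTUNNEL_DST_SKETCH,
		      &label_stack_entry, sizeof(label_stack_entry));
	nest.len = (uint16_t)(off - nest_off);	/* covers all nested attributes */
	memcpy(buf + nest_off, &nest, sizeof(nest));
	return off;
}
```

The nest header is written last, once the size of everything inside it is known, which is the usual way nested attributes are built.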


[PATCH net-next 10/22] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic

2015-07-21 Thread Thomas Graf
Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.

Both structures are kernel internal and will stay that way. Their
members are exposed to user space through individual Netlink
attributes by OVS. It will therefore be possible to extend/modify
these structures without affecting user ABI.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/ip_tunnels.h | 63 +
 include/uapi/linux/openvswitch.h |  2 +-
 net/openvswitch/actions.c|  2 +-
 net/openvswitch/datapath.h   |  5 +--
 net/openvswitch/flow.c   |  4 +--
 net/openvswitch/flow.h   | 76 ++--
 net/openvswitch/flow_netlink.c   | 16 -
 net/openvswitch/flow_netlink.h   |  2 +-
 net/openvswitch/vport-geneve.c   | 17 +
 net/openvswitch/vport-gre.c  | 16 -
 net/openvswitch/vport-vxlan.c| 18 +-
 net/openvswitch/vport.c  | 30 
 net/openvswitch/vport.h  | 12 +++
 13 files changed, 128 insertions(+), 135 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d8214cb..6b9d559 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -22,6 +22,28 @@
 /* Keep error state on tunnel for 30 sec */
 #define IPTUNNEL_ERR_TIMEO (30*HZ)
 
+/* Used to memset ip_tunnel padding. */
+#define IP_TUNNEL_KEY_SIZE \
+   (offsetof(struct ip_tunnel_key, tp_dst) +   \
+FIELD_SIZEOF(struct ip_tunnel_key, tp_dst))
+
+struct ip_tunnel_key {
+   __be64  tun_id;
+   __be32  ipv4_src;
+   __be32  ipv4_dst;
+   __be16  tun_flags;
+   __u8    ipv4_tos;
+   __u8    ipv4_ttl;
+   __be16  tp_src;
+   __be16  tp_dst;
+} __packed __aligned(4); /* Minimize padding. */
+
+struct ip_tunnel_info {
+   struct ip_tunnel_key    key;
+   const void  *options;
+   u8  options_len;
+};
+
 /* 6rd prefix/relay information */
 #ifdef CONFIG_IPV6_SIT_6RD
 struct ip_tunnel_6rd_parm {
@@ -136,6 +158,47 @@ int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op,
 int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
unsigned int num);
 
+static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+__be32 saddr, __be32 daddr,
+u8 tos, u8 ttl,
+__be16 tp_src, __be16 tp_dst,
+__be64 tun_id, __be16 tun_flags,
+const void *opts, u8 opts_len)
+{
+   tun_info->key.tun_id = tun_id;
+   tun_info->key.ipv4_src = saddr;
+   tun_info->key.ipv4_dst = daddr;
+   tun_info->key.ipv4_tos = tos;
+   tun_info->key.ipv4_ttl = ttl;
+   tun_info->key.tun_flags = tun_flags;
+
+   /* For tunnel types on top of IPsec, the tp_src and tp_dst of
+* the upper tunnel are used.
+* E.g.: for GRE over IPsec, tp_src and tp_dst are zero.
+*/
+   tun_info->key.tp_src = tp_src;
+   tun_info->key.tp_dst = tp_dst;
+
+   /* Clear struct padding. */
+   if (sizeof(tun_info->key) != IP_TUNNEL_KEY_SIZE)
+   memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_SIZE,
+  0, sizeof(tun_info->key) - IP_TUNNEL_KEY_SIZE);
+
+   tun_info->options = opts;
+   tun_info->options_len = opts_len;
+}
+
+static inline void ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+  const struct iphdr *iph,
+  __be16 tp_src, __be16 tp_dst,
+  __be64 tun_id, __be16 tun_flags,
+  const void *opts, u8 opts_len)
+{
+   __ip_tunnel_info_init(tun_info, iph->saddr, iph->daddr,
+ iph->tos, iph->ttl, tp_src, tp_dst,
+ tun_id, tun_flags, opts, opts_len);
+}
+
 #ifdef CONFIG_INET
 
 int ip_tunnel_init(struct net_device *dev);
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 1dab776..d6b8854 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -321,7 +321,7 @@ enum ovs_key_attr {
 * the accepted length of the array. */
 
 #ifdef __KERNEL__
-   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ovs_tunnel_info */
+   OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
 #endif
__OVS_KEY_ATTR_MAX
 };
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 8a8c0b8..27c1687 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -611,7 

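IP_TUNNEL_KEY_SIZE is the offset of the last member (tp_dst) plus its size, so __ip_tunnel_info_init() only clears bytes past the real fields. A userspace mirror of that construction follows, assuming the GCC/Clang attribute spelling of __packed __aligned(4); tun_key_sketch, TUN_KEY_SIZE_SKETCH and tun_key_init_sketch() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct ip_tunnel_key from the patch. */
struct tun_key_sketch {
	uint64_t tun_id;
	uint32_t ipv4_src;
	uint32_t ipv4_dst;
	uint16_t tun_flags;
	uint8_t  ipv4_tos;
	uint8_t  ipv4_ttl;
	uint16_t tp_src;
	uint16_t tp_dst;
} __attribute__((packed, aligned(4)));

/* Same construction as IP_TUNNEL_KEY_SIZE: offset of the last member plus
 * its size, i.e. the struct without any trailing padding. */
#define TUN_KEY_SIZE_SKETCH					\
	(offsetof(struct tun_key_sketch, tp_dst) +		\
	 sizeof(((struct tun_key_sketch *)0)->tp_dst))

static void tun_key_init_sketch(struct tun_key_sketch *key,
				uint32_t saddr, uint32_t daddr,
				uint8_t tos, uint8_t ttl,
				uint16_t tp_src, uint16_t tp_dst,
				uint64_t tun_id, uint16_t tun_flags)
{
	key->tun_id = tun_id;
	key->ipv4_src = saddr;
	key->ipv4_dst = daddr;
	key->ipv4_tos = tos;
	key->ipv4_ttl = ttl;
	key->tun_flags = tun_flags;
	key->tp_src = tp_src;
	key->tp_dst = tp_dst;

	/* Zero any trailing padding so that a byte-wise comparison of two
	 * keys depends only on the member values. */
	if (sizeof(*key) != TUN_KEY_SIZE_SKETCH)
		memset((unsigned char *)key + TUN_KEY_SIZE_SKETCH, 0,
		       sizeof(*key) - TUN_KEY_SIZE_SKETCH);
}
```

With this particular layout the members add up to exactly 24 bytes, so there is no trailing padding and the memset() branch never runs; the check only matters if the layout changes.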
[PATCH net-next 22/22] openvswitch: Use regular VXLAN net_device device

2015-07-21 Thread Thomas Graf
This gets rid of all OVS specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which takes care of handling the
VXLAN specific OVS Netlink configuration.

Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
since they are no longer needed.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 drivers/net/vxlan.c| 242 +++
 include/net/rtnetlink.h|   1 +
 include/net/vxlan.h|  24 +--
 net/core/rtnetlink.c   |  26 ++--
 net/openvswitch/Kconfig|  12 --
 net/openvswitch/Makefile   |   1 -
 net/openvswitch/flow_netlink.c |   6 +-
 net/openvswitch/vport-netdev.c | 201 -
 net/openvswitch/vport-vxlan.c  | 322 -
 net/openvswitch/vport-vxlan.h  |  11 --
 10 files changed, 339 insertions(+), 507 deletions(-)
 delete mode 100644 net/openvswitch/vport-vxlan.c
 delete mode 100644 net/openvswitch/vport-vxlan.h

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 30e1f21..e9feefb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,6 +75,9 @@ static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+bool no_share, u32 flags);
+
 /* per-network namespace private data for this module */
 struct vxlan_net {
struct list_head  vxlan_list;
@@ -1027,7 +1030,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct vxlan_dev *dev)
return false;
 }
 
-void vxlan_sock_release(struct vxlan_sock *vs)
+static void vxlan_sock_release(struct vxlan_sock *vs)
 {
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
@@ -1043,7 +1046,6 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 
queue_work(vxlan_wq, &vs->del_work);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_release);
 
 /* Update multicast group membership when first VNI on
  * multicast address is brought up
@@ -1126,6 +1128,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
return vh;
 }
 
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+ struct vxlan_metadata *md, u32 vni,
+ struct metadata_dst *tun_dst)
+{
+   struct iphdr *oip = NULL;
+   struct ipv6hdr *oip6 = NULL;
+   struct vxlan_dev *vxlan;
+   struct pcpu_sw_netstats *stats;
+   union vxlan_addr saddr;
+   int err = 0;
+   union vxlan_addr *remote_ip;
+
+   /* For flow based devices, map all packets to VNI 0 */
+   if (vs->flags & VXLAN_F_FLOW_BASED)
+   vni = 0;
+
+   /* Is this VNI defined? */
+   vxlan = vxlan_vs_find_vni(vs, vni);
+   if (!vxlan)
+   goto drop;
+
+   remote_ip = &vxlan->default_dst.remote_ip;
+   skb_reset_mac_header(skb);
+   skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+   skb->protocol = eth_type_trans(skb, vxlan->dev);
+   skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+   /* Ignore packet loops (and multicast echo) */
+   if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+   goto drop;
+
+   /* Re-examine inner Ethernet packet */
+   if (remote_ip->sa.sa_family == AF_INET) {
+   oip = ip_hdr(skb);
+   saddr.sin.sin_addr.s_addr = oip->saddr;
+   saddr.sa.sa_family = AF_INET;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else {
+   oip6 = ipv6_hdr(skb);
+   saddr.sin6.sin6_addr = oip6->saddr;
+   saddr.sa.sa_family = AF_INET6;
+#endif
+   }
+
+   if (tun_dst) {
+   skb_dst_set(skb, (struct dst_entry *)tun_dst);
+   tun_dst = NULL;
+   }
+
+   if ((vxlan->flags & VXLAN_F_LEARN) &&
+   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+   goto drop;
+
+   skb_reset_network_header(skb);
+   /* In flow-based mode, GBP is carried in dst_metadata */
+   if (!(vs->flags & VXLAN_F_FLOW_BASED))
+   skb->mark = md->gbp;
+
+   if (oip6)
+   err = IP6_ECN_decapsulate(oip6, skb);
+   if (oip)
+   err = IP_ECN_decapsulate(oip, skb);
+
+   if (unlikely(err)) {
+   if (log_ecn_error) {
+   if (oip6)
+   net_info_ratelimited("non-ECT from %pI6\n",
+&oip6->saddr);
+   if (oip)
+   net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+&oip->saddr, oip->tos);
+   }
+   if (err > 1) {
+   ++vxlan->dev->stats.rx_frame_errors;
+   

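In vxlan_rcv() above, flow-based devices are looked up as VNI 0, with the real VNI presumably carried in the tun_dst metadata instead. The excerpt does not show where the VNI is pulled out of the packet, so the sketch below parses the VXLAN header as laid out in RFC 7348 and then applies the same VNI-0 mapping. vxlan_hdr_sketch, FLOW_BASED_SKETCH and vxlan_lookup_vni_sketch() are hypothetical.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <stdint.h>
#include <string.h>

/* VXLAN header on the wire (RFC 7348): a 32-bit flags word followed by
 * a word whose upper 24 bits carry the VNI. */
struct vxlan_hdr_sketch {
	uint32_t vx_flags;	/* network byte order */
	uint32_t vx_vni;	/* network byte order */
};

#define VXLAN_I_FLAG_SKETCH	0x08000000u	/* "VNI valid" bit, host order */
#define FLOW_BASED_SKETCH	0x1u		/* hypothetical stand-in for VXLAN_F_FLOW_BASED */

/* Returns the VNI used for the device lookup, or -1 if the I flag is unset. */
static int64_t vxlan_lookup_vni_sketch(const uint8_t *pkt, uint32_t sock_flags)
{
	struct vxlan_hdr_sketch h;
	uint32_t vni;

	memcpy(&h, pkt, sizeof(h));	/* avoid unaligned access */
	if (!(ntohl(h.vx_flags) & VXLAN_I_FLAG_SKETCH))
		return -1;

	vni = ntohl(h.vx_vni) >> 8;

	/* Mirrors vxlan_rcv(): flow-based devices are looked up as VNI 0. */
	if (sock_flags & FLOW_BASED_SKETCH)
		vni = 0;
	return vni;
}
```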
[PATCH net-next 20/22] openvswitch: Move dev pointer into vport itself

2015-07-21 Thread Thomas Graf
This is the first step in representing all OVS vports as regular
struct net_devices. Move the net_device pointer into the vport
structure itself to get rid of struct vport_netdev.

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 net/openvswitch/datapath.c   |  7 +--
 net/openvswitch/dp_notify.c  |  5 +--
 net/openvswitch/vport-internal_dev.c | 37 +++-
 net/openvswitch/vport-netdev.c   | 86 
 net/openvswitch/vport-netdev.h   | 12 -
 net/openvswitch/vport.h  |  3 +-
 6 files changed, 59 insertions(+), 91 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0208210..19df28e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
 
local = ovs_vport_rcu(dp, OVSP_LOCAL);
if (local)
-   ifindex = netdev_vport_priv(local)->dev->ifindex;
+   ifindex = local->dev->ifindex;
else
ifindex = 0;
 
@@ -2219,13 +2219,10 @@ static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
struct vport *vport;
 
hlist_for_each_entry(vport, &dp->ports[i], dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (dev_net(netdev_vport->dev) == dnet)
+   if (dev_net(vport->dev) == dnet)
list_add(&vport->detach_list, head);
}
}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
 
hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+   if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
dp_detach_port_notify(vport);
}
}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
struct vport *vport;
-   struct netdev_vport *netdev_vport;
struct internal_dev *internal_dev;
int err;
 
-   vport = ovs_vport_alloc(sizeof(struct netdev_vport),
-   &ovs_internal_vport_ops, parms);
+   vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
goto error;
}
 
-   netdev_vport = netdev_vport_priv(vport);
-
-   netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-parms->name, NET_NAME_UNKNOWN,
-do_setup);
-   if (!netdev_vport->dev) {
+   vport->dev = alloc_netdev(sizeof(struct internal_dev),
+ parms->name, NET_NAME_UNKNOWN, do_setup);
+   if (!vport->dev) {
err = -ENOMEM;
goto error_free_vport;
}
 
-   dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-   internal_dev = internal_dev_priv(netdev_vport->dev);
+   dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+   internal_dev = internal_dev_priv(vport->dev);
internal_dev->vport = vport;
 
/* Restrict bridge port to current netns. */
if (vport->port_no == OVSP_LOCAL)
-   netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+   vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
rtnl_lock();
-   err = register_netdevice(netdev_vport->dev);
+   err = register_netdevice(vport->dev);
if (err)
goto error_free_netdev;
 
-   dev_set_promiscuity(netdev_vport->dev, 1);
+   dev_set_promiscuity(vport->dev, 1);
rtnl_unlock();
-   netif_start_queue(netdev_vport->dev);
+   netif_start_queue(vport->dev);
 
return 

[PATCH V3 7/7] Drivers: hv: vmbus: disable local interrupt when hvsock's callback is running

2015-07-21 Thread Dexuan Cui
In the SMP guest case, when the per-channel callback hvsock_events() is
running on virtual CPU A and the guest tries to close the connection on
virtual CPU B, we invoke vmbus_close() -> vmbus_close_internal(), and then
we can run into trouble: on B, vmbus_close_internal() sends an IPI to A to
run reset_channel_cb(), which sets channel->onchannel_callback to NULL; on A,
if the IPI handler runs between the check
if (channel->onchannel_callback != NULL) and the call through
channel->onchannel_callback, we end up calling a NULL function pointer.

This is why the patch is necessary.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/connection.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 4fc2e88..4766fd8 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -319,6 +319,9 @@ static void process_chn_event(u32 relid)
void *arg;
bool read_state;
u32 bytes_to_read;
+   bool is_hvsock = false;
+
+   local_irq_disable();
 
/*
 * Find the channel based on this relid and invokes the
@@ -327,7 +330,11 @@ static void process_chn_event(u32 relid)
channel = pcpu_relid2channel(relid);
 
if (!channel)
-   return;
+   goto out;
+
+   is_hvsock = is_hvsock_channel(channel);
+   if (!is_hvsock)
+   local_irq_enable();
 
/*
 * A channel once created is persistent even when there
@@ -363,6 +370,12 @@ static void process_chn_event(u32 relid)
bytes_to_read = 0;
} while (read_state && (bytes_to_read != 0));
}
+
+   /* local_irq_enable() is already invoked above */
+   if (!is_hvsock)
+   return;
+out:
+   local_irq_enable();
 }
 
 /*
-- 
2.1.0

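The window described in the commit message is a check followed by a second read of the same pointer. The patch closes it by keeping local interrupts disabled for hvsock channels, so the IPI that runs reset_channel_cb() cannot land in between. The userspace sketch below only models the shape of the bug and, for contrast, a different technique (reading the pointer once into a local); it is not what the patch does, and on its own it may not be enough in the kernel, since it says nothing about whether the channel is still valid. chan_sketch, dispatch_racy(), dispatch_snapshot() and reset_cb_sketch() are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*chan_cb_sketch_t)(void *arg);

/* Hypothetical stand-in for the relevant part of struct vmbus_channel. */
struct chan_sketch {
	_Atomic(chan_cb_sketch_t) onchannel_callback;
	void *arg;
};

static int calls;	/* how often the example callback ran */

static void count_cb(void *arg)
{
	(void)arg;
	calls++;
}

/* Shape described in the commit message: the pointer is read twice, so a
 * reset to NULL between the check and the call means calling NULL. */
static void dispatch_racy(struct chan_sketch *c)
{
	if (atomic_load(&c->onchannel_callback) != NULL)
		(atomic_load(&c->onchannel_callback))(c->arg);
}

/* Alternative shape: read once, then test and call that same value.
 * Returns 1 if a callback ran, 0 otherwise. */
static int dispatch_snapshot(struct chan_sketch *c)
{
	chan_cb_sketch_t cb = atomic_load(&c->onchannel_callback);

	if (!cb)
		return 0;
	cb(c->arg);
	return 1;
}

/* Counterpart of what reset_channel_cb() does on the target CPU. */
static void reset_cb_sketch(struct chan_sketch *c)
{
	atomic_store(&c->onchannel_callback, (chan_cb_sketch_t)0);
}
```

Run single-threaded, both dispatchers behave the same; the difference only shows up when reset_cb_sketch() runs concurrently between the two loads in dispatch_racy().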


[PATCH net-next 06/22] ipv4: redirect dst output to lwtunnel output

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

For input routes with tunnel encap state this patch redirects
dst output functions to lwtunnel_output which later resolves to
the corresponding lwtunnel output function.

This has been tested to work with mpls ip tunnels.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 226570b..cd3157c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1633,6 +1633,8 @@ static int __mkroute_input(struct sk_buff *skb,
rth->dst.output = ip_output;
 
rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag);
+   if (lwtunnel_output_redirect(rth->rt_lwtstate))
+   rth->dst.output = lwtunnel_output;
skb_dst_set(skb, &rth->dst);
 out:
err = 0;
-- 
2.4.3

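The two added lines keep ip_output as the default and swap in a generic lwtunnel_output trampoline only when the route's encap state asks for it. A userspace model of that function-pointer redirect follows; all *_sketch names are hypothetical, and unlike this model the real lwtunnel_output() works on the skb and finds the encap state through its dst, which this excerpt does not show.

```c
#include <stddef.h>

struct dst_sketch;
typedef int (*output_fn_sketch)(struct dst_sketch *dst);

/* Hypothetical, minimal models of lwtunnel_state and dst_entry. */
struct lwt_state_sketch {
	int redirect;			/* encap wants to take over output */
	output_fn_sketch encap_output;	/* e.g. an MPLS encap routine */
};

struct dst_sketch {
	output_fn_sketch output;
	struct lwt_state_sketch *lwtstate;
};

static int ip_output_sketch(struct dst_sketch *dst) { (void)dst; return 1; }
static int mpls_output_sketch(struct dst_sketch *dst) { (void)dst; return 2; }

/* Generic trampoline: forward to whatever encap the route carries. */
static int lwtunnel_output_sketch(struct dst_sketch *dst)
{
	return dst->lwtstate->encap_output(dst);
}

static int lwtunnel_output_redirect_sketch(const struct lwt_state_sketch *s)
{
	return s != NULL && s->redirect;
}

/* Shape of the two lines the patch adds to __mkroute_input(). */
static void mkroute_input_sketch(struct dst_sketch *dst,
				 struct lwt_state_sketch *lwtstate)
{
	dst->output = ip_output_sketch;
	dst->lwtstate = lwtstate;
	if (lwtunnel_output_redirect_sketch(lwtstate))
		dst->output = lwtunnel_output_sketch;
}
```

Routes without encap state pay nothing extra: their output pointer stays ip_output.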


[PATCH net-next 13/22] arp: Inherit metadata dst when creating ARP requests

2015-07-21 Thread Thomas Graf
If the output device wants to see the dst, inherit the dst of the
original skb and pass it on when generating the ARP request.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 net/ipv4/arp.c | 65 +-
 1 file changed, 37 insertions(+), 28 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..1d59e50 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -291,6 +291,40 @@ static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb)
kfree_skb(skb);
 }
 
+/* Create and send an arp packet. */
+static void arp_send_dst(int type, int ptype, __be32 dest_ip,
+struct net_device *dev, __be32 src_ip,
+const unsigned char *dest_hw,
+const unsigned char *src_hw,
+const unsigned char *target_hw, struct sk_buff *oskb)
+{
+   struct sk_buff *skb;
+
+   /* No arp on this interface. */
+   if (dev->flags & IFF_NOARP)
+   return;
+
+   skb = arp_create(type, ptype, dest_ip, dev, src_ip,
+dest_hw, src_hw, target_hw);
+   if (!skb)
+   return;
+
+   if (oskb)
+   skb_dst_copy(skb, oskb);
+
+   arp_xmit(skb);
+}
+
+void arp_send(int type, int ptype, __be32 dest_ip,
+ struct net_device *dev, __be32 src_ip,
+ const unsigned char *dest_hw, const unsigned char *src_hw,
+ const unsigned char *target_hw)
+{
+   arp_send_dst(type, ptype, dest_ip, dev, src_ip, dest_hw, src_hw,
+target_hw, NULL);
+}
+EXPORT_SYMBOL(arp_send);
+
 static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 {
__be32 saddr = 0;
@@ -346,8 +380,9 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
}
}
 
-   arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
-dst_hw, dev->dev_addr, NULL);
+   arp_send_dst(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
+dst_hw, dev->dev_addr, NULL,
+dev->priv_flags & IFF_XMIT_DST_RELEASE ? NULL : skb);
 }
 
 static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
@@ -597,32 +632,6 @@ void arp_xmit(struct sk_buff *skb)
 EXPORT_SYMBOL(arp_xmit);
 
 /*
- * Create and send an arp packet.
- */
-void arp_send(int type, int ptype, __be32 dest_ip,
- struct net_device *dev, __be32 src_ip,
- const unsigned char *dest_hw, const unsigned char *src_hw,
- const unsigned char *target_hw)
-{
-   struct sk_buff *skb;
-
-   /*
-*  No arp on this interface.
-*/
-
-   if (dev->flags&IFF_NOARP)
-   return;
-
-   skb = arp_create(type, ptype, dest_ip, dev, src_ip,
-dest_hw, src_hw, target_hw);
-   if (!skb)
-   return;
-
-   arp_xmit(skb);
-}
-EXPORT_SYMBOL(arp_send);
-
-/*
  * Process an arp request.
  */
 
-- 
2.4.3

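The patch keeps the exported arp_send() signature by turning it into a wrapper that passes no original skb, and arp_solicit() hands its skb on only when the device does not have IFF_XMIT_DST_RELEASE set (that is, when the device wants to keep the dst at transmit time). A userspace model of that control flow follows; the flag values and all *_sketch names are hypothetical, and copying a pointer stands in for skb_dst_copy().

```c
#include <stddef.h>

/* Hypothetical flag bits; the real IFF_* values are not used here. */
#define NOARP_SKETCH		0x1u
#define XMIT_DST_RELEASE_SKETCH	0x2u

struct dev_sketch { unsigned int flags; unsigned int priv_flags; };
struct skb_sketch { const void *dst; };

static const void *last_dst;	/* dst attached to the last "sent" request */
static int sent;		/* number of requests "sent" */

/* Analogue of arp_send_dst(): optionally inherit the original skb's dst. */
static void arp_send_dst_sketch(const struct dev_sketch *dev,
				const struct skb_sketch *oskb)
{
	struct skb_sketch skb = { NULL };

	if (dev->flags & NOARP_SKETCH)
		return;			/* no ARP on this interface */
	if (oskb)
		skb.dst = oskb->dst;	/* cf. skb_dst_copy() */
	last_dst = skb.dst;
	sent++;
}

/* Analogue of the exported arp_send(): existing callers pass no skb. */
static void arp_send_sketch(const struct dev_sketch *dev)
{
	arp_send_dst_sketch(dev, NULL);
}

/* Analogue of the arp_solicit() change. */
static void arp_solicit_sketch(const struct dev_sketch *dev,
			       const struct skb_sketch *skb)
{
	arp_send_dst_sketch(dev, (dev->priv_flags & XMIT_DST_RELEASE_SKETCH) ?
				 NULL : skb);
}
```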


[PATCH net-next 21/22] openvswitch: Abstract vport name through ovs_vport_name()

2015-07-21 Thread Thomas Graf
This makes it possible to get rid of the get_name() vport op later on.

Signed-off-by: Thomas Graf tg...@suug.ch
---
 net/openvswitch/datapath.c   | 4 ++--
 net/openvswitch/vport-internal_dev.c | 1 -
 net/openvswitch/vport-netdev.c   | 6 --
 net/openvswitch/vport-netdev.h   | 1 -
 net/openvswitch/vport.c  | 4 ++--
 net/openvswitch/vport.h  | 5 +
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 19df28e..ffe984f 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 const char *ovs_dp_name(const struct datapath *dp)
 {
struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
-   return vport->ops->get_name(vport);
+   return ovs_vport_name(vport);
 }
 
 static int get_dpifindex(const struct datapath *dp)
@@ -1800,7 +1800,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
nla_put_string(skb, OVS_VPORT_ATTR_NAME,
-  vport->ops->get_name(vport)))
+  ovs_vport_name(vport)))
goto nla_put_failure;
 
ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
.type   = OVS_VPORT_TYPE_INTERNAL,
.create = internal_dev_create,
.destroy= internal_dev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = internal_dev_recv,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 1c96966..e682bdc 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -171,11 +171,6 @@ static void netdev_destroy(struct vport *vport)
call_rcu(&vport->rcu, free_port_rcu);
 }
 
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
-   return vport->dev->name;
-}
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
unsigned int length = skb->len - ETH_HLEN;
@@ -223,7 +218,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
.type   = OVS_VPORT_TYPE_NETDEV,
.create = netdev_create,
.destroy= netdev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = netdev_send,
 };
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
 int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const char *name)
struct vport *vport;
 
hlist_for_each_entry_rcu(vport, bucket, hash_node)
-   if (!strcmp(name, vport->ops->get_name(vport)) &&
+   if (!strcmp(name, ovs_vport_name(vport)) &&
net_eq(ovs_dp_get_net(vport->dp), net))
return vport;
 
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
}
 
bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-vport->ops->get_name(vport));
+ovs_vport_name(vport));
hlist_add_head_rcu(&vport->hash_node, bucket);
return vport;
}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
 }
 
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+   return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
 int ovs_vport_ops_register(struct vport_ops *ops);
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 
-- 
2.4.3

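ovs_vport_name() prefers the name of the backing net_device and falls back to the per-type get_name() op only for vports that have no dev pointer, which is what later allows the op to be dropped. A userspace model of that decision follows; vport_ops_sketch, netdev_sketch, vport_sketch, legacy_get_name() and vport_name_sketch() are hypothetical stand-ins.

```c
#include <stddef.h>

struct vport_sketch;

/* Hypothetical, minimal stand-ins for struct vport_ops / net_device / vport. */
struct vport_ops_sketch {
	const char *(*get_name)(const struct vport_sketch *vport);
};

struct netdev_sketch {
	char name[16];	/* IFNAMSIZ is 16 in Linux */
};

struct vport_sketch {
	struct netdev_sketch *dev;		/* NULL if not backed by a net_device */
	const struct vport_ops_sketch *ops;
	const char *legacy_name;		/* used only by the legacy op */
};

static const char *legacy_get_name(const struct vport_sketch *vport)
{
	return vport->legacy_name;
}

/* Same decision as ovs_vport_name(): prefer the net_device name and fall
 * back to the per-type get_name() op otherwise. */
static const char *vport_name_sketch(const struct vport_sketch *vport)
{
	return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
}
```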


[PATCH net-next 08/22] mpls: export mpls functions for use by mpls iptunnels

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/mpls/af_mpls.c  | 11 ++++++++---
 net/mpls/internal.h |  9 +++++++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..6e66911 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -58,10 +58,11 @@ static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
return rcu_dereference_rtnl(dev->mpls_ptr);
 }
 
-static bool mpls_output_possible(const struct net_device *dev)
+bool mpls_output_possible(const struct net_device *dev)
 {
return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
 }
+EXPORT_SYMBOL_GPL(mpls_output_possible);
 
 static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
 {
@@ -69,13 +70,14 @@ static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
return rt->rt_labels * sizeof(struct mpls_shim_hdr);
 }
 
-static unsigned int mpls_dev_mtu(const struct net_device *dev)
+unsigned int mpls_dev_mtu(const struct net_device *dev)
 {
/* The amount of data the layer 2 frame can hold */
return dev->mtu;
 }
+EXPORT_SYMBOL_GPL(mpls_dev_mtu);
 
-static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 {
if (skb->len <= mtu)
return false;
@@ -85,6 +87,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 
return true;
 }
+EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
@@ -626,6 +629,7 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(nla_put_labels);
 
 int nla_get_labels(const struct nlattr *nla,
   u32 max_labels, u32 *labels, u32 label[])
@@ -671,6 +675,7 @@ int nla_get_labels(const struct nlattr *nla,
*labels = nla_labels;
return 0;
 }
+EXPORT_SYMBOL_GPL(nla_get_labels);
 
 static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
   struct mpls_route_config *cfg)
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 8cabeb5..2681a4b 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -50,7 +50,12 @@ static inline struct mpls_entry_decoded mpls_entry_decode(struct mpls_shim_hdr *
return result;
 }
 
-int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 labels, const u32 label[]);
-int nla_get_labels(const struct nlattr *nla, u32 max_labels, u32 *labels, u32 label[]);
+int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 labels,
+  const u32 label[]);
+int nla_get_labels(const struct nlattr *nla, u32 max_labels, u32 *labels,
+  u32 label[]);
+bool mpls_output_possible(const struct net_device *dev);
+unsigned int mpls_dev_mtu(const struct net_device *dev);
+bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu);
 
 #endif /* MPLS_INTERNAL_H */
-- 
2.4.3



[PATCH net-next 11/22] icmp: Don't leak original dst into ip_route_input()

2015-07-21 Thread Thomas Graf
ip_route_input() unconditionally overwrites the dst. Hide the original
dst attached to the skb by calling skb_dst_set(skb, NULL) prior to
ip_route_input().
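The save/hide/restore sequence this fix relies on can be modelled in plain user-space C. This is a minimal sketch, not kernel code: `struct pkt`, `route_input()` and `route_via_input()` are hypothetical stand-ins for the skb, ip_route_input() and icmp_route_lookup(), and the "dst" is just a pointer that the callee overwrites.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a route entry and a packet carrying one. */
struct route { int id; };
struct pkt { struct route *dst; };

static struct route input_route = { 2 };

/* Models ip_route_input(): it unconditionally overwrites p->dst. */
static int route_input(struct pkt *p)
{
	p->dst = &input_route;
	return 0;
}

/*
 * Save the caller's dst, hide it (NULL) so the callee can neither see
 * nor release it, take whatever the callee attached, then restore.
 */
static struct route *route_via_input(struct pkt *p)
{
	struct route *saved = p->dst;	/* save old dst */
	struct route *result;

	p->dst = NULL;			/* hide it from route_input() */
	if (route_input(p) != 0)
		result = NULL;
	else
		result = p->dst;	/* route attached by the callee */
	p->dst = saved;			/* restore old dst */
	return result;
}
```

After the call the packet still carries its original dst, and the looked-up route is returned separately.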

Reported-by: Julian Anastasov j...@ssi.bg
Signed-off-by: Thomas Graf tg...@suug.ch
---
 net/ipv4/icmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..c0556f1 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -496,6 +496,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
}
/* Ugh! */
orefdst = skb_in->_skb_refdst; /* save old refdst */
+   skb_dst_set(skb_in, NULL);
err = ip_route_input(skb_in, fl4_dec.daddr, fl4_dec.saddr,
 RT_TOS(tos), rt2->dst.dev);
 
-- 
2.4.3



Re: Why return E2BIG from bpf map update?

2015-07-21 Thread Daniel Borkmann

On 07/21/2015 12:24 AM, Alexei Starovoitov wrote:
> On 7/20/15 3:15 PM, Alex Gartrell wrote:
>>
>> The ship has probably sailed on this one, but it seems like ENOSPC
>> makes more sense than E2BIG.  Any chance of changing it so that poor
>> ebpf library maintainers in the future don't have to wonder how their
>> argument list got too big?
>
> sorry, too late.
> It's in tests and even documented in the bpf manpage:
> E2BIG - indicates that the number of elements in the map reached the
> max_entries limit specified at map creation time.
> I read E2BIG as "too big" and not as "argument list is too long" :)


If some libraries do an strerror(3) on errno then it certainly sounds
a bit weird; "no space left on device" is perhaps also a bit misleading.
The bpf(2) manpage was actually submitted/discussed a long time ago,
but I still didn't see it in Michael's tree yet, will ping him again.
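A library can sidestep the confusing strerror(3) text by translating E2BIG itself when it comes back from a map update (the BPF_MAP_UPDATE_ELEM command). A minimal sketch; `bpf_map_update_strerror()` is a hypothetical helper, not part of any existing library, and its wording simply restates the bpf(2) description quoted above.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: describe an errno value from a BPF map update.
 * E2BIG here means the map already holds max_entries elements, so the
 * generic strerror() text ("Argument list too long") is misleading.
 */
static const char *bpf_map_update_strerror(int err)
{
	switch (err) {
	case E2BIG:
		return "map is full (max_entries reached)";
	default:
		return strerror(err);
	}
}
```

A caller would pass `errno` right after a failed update call; every other error falls through to the usual strerror() text.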


[PATCH net-next 15/22] route: Extend flow representation with tunnel key

2015-07-21 Thread Thomas Graf
Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to
allow routes to match on tunnel metadata. For now, the tunnel id is
added to flowi_tunnel which allows for routes to be bound to specific
virtual tunnels.
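How the input route lookup picks up the tunnel id can be sketched in user-space C. This is a simplified model under stated assumptions: `struct tun_info`, `struct flow` and `flow_set_tun_key()` are hypothetical trimmed-down types, and the id is kept in host order rather than as `__be64`. It mirrors the ip_route_input_slow() hunk below, which copies the id only from receive-side tunnel info and otherwise zeroes it so no stale value reaches the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down models of the kernel structures. */
enum tun_mode { TUN_INFO_TX, TUN_INFO_RX };

struct tun_info {
	enum tun_mode mode;
	uint64_t tun_id;	/* host order here for simplicity */
};

struct flow_tunnel { uint64_t tun_id; };
struct flow { struct flow_tunnel tun_key; };

/* Copy the tunnel id only for receive-side metadata; zero it otherwise. */
static void flow_set_tun_key(struct flow *fl, const struct tun_info *info)
{
	if (info && info->mode == TUN_INFO_RX)
		fl->tun_key.tun_id = info->tun_id;
	else
		fl->tun_key.tun_id = 0;
}
```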

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/flow.h      | 8 ++++++++
 net/ipv4/fib_frontend.c | 2 ++
 net/ipv4/route.c        | 8 ++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 8109a15..3098ae3 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -19,6 +19,10 @@
 
 #define LOOPBACK_IFINDEX   1
 
+struct flowi_tunnel {
+   __be64  tun_id;
+};
+
 struct flowi_common {
int flowic_oif;
int flowic_iif;
@@ -30,6 +34,7 @@ struct flowi_common {
 #define FLOWI_FLAG_ANYSRC  0x01
 #define FLOWI_FLAG_KNOWN_NH 0x02
__u32   flowic_secid;
+   struct flowi_tunnel flowic_tun_key;
 };
 
 union flowi_uli {
@@ -66,6 +71,7 @@ struct flowi4 {
 #define flowi4_proto   __fl_common.flowic_proto
 #define flowi4_flags   __fl_common.flowic_flags
 #define flowi4_secid   __fl_common.flowic_secid
+#define flowi4_tun_key __fl_common.flowic_tun_key
 
/* (saddr,daddr) must be grouped, same order as in IP header */
__be32  saddr;
@@ -95,6 +101,7 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
fl4->flowi4_proto = proto;
fl4->flowi4_flags = flags;
fl4->flowi4_secid = 0;
+   fl4->flowi4_tun_key.tun_id = 0;
fl4->daddr = daddr;
fl4->saddr = saddr;
fl4->fl4_dport = dport;
@@ -165,6 +172,7 @@ struct flowi {
 #define flowi_proto    u.__fl_common.flowic_proto
 #define flowi_flags    u.__fl_common.flowic_flags
 #define flowi_secid    u.__fl_common.flowic_secid
+#define flowi_tun_key  u.__fl_common.flowic_tun_key
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 9b2019c..6b98de0 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -280,6 +280,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb)
fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
fl4.flowi4_scope = scope;
fl4.flowi4_mark = IN_DEV_SRC_VMARK(in_dev) ? skb->mark : 0;
+   fl4.flowi4_tun_key.tun_id = 0;
if (!fib_lookup(net, &fl4, &res, 0))
return FIB_RES_PREFSRC(net, res);
} else {
@@ -313,6 +314,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
fl4.saddr = dst;
fl4.flowi4_tos = tos;
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
+   fl4.flowi4_tun_key.tun_id = 0;
 
no_addr = idev->ifa_list == NULL;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 4c8e84e..91da18b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -91,6 +91,7 @@
 #include <linux/slab.h>
 #include <linux/jhash.h>
 #include <net/dst.h>
+#include <net/dst_metadata.h>
 #include <net/net_namespace.h>
 #include <net/protocol.h>
 #include <net/ip.h>
@@ -110,6 +111,7 @@
 #include <linux/kmemleak.h>
 #endif
 #include <net/secure_seq.h>
+#include <net/ip_tunnels.h>
 
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1673,6 +1675,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 {
struct fib_result res;
struct in_device *in_dev = __in_dev_get_rcu(dev);
+   struct ip_tunnel_info *tun_info;
struct flowi4   fl4;
unsigned int    flags = 0;
u32 itag = 0;
@@ -1690,6 +1693,11 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
   by fib_lookup.
 */
 
+   tun_info = skb_tunnel_info(skb);
+   if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+   fl4.flowi4_tun_key.tun_id = tun_info->key.tun_id;
+   else
+   fl4.flowi4_tun_key.tun_id = 0;
skb_dst_drop(skb);
 
if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
-- 
2.4.3



[PATCH net-next 12/22] dst: Metadata destinations

2015-07-21 Thread Thomas Graf
Introduces a new dst_metadata which enables carrying per-packet metadata
between forwarding and processing elements via the skb->dst pointer.

The structure is set up to be a union. Thus, each separate type of
metadata requires its own dst instance. If demand arises to carry
multiple types of metadata concurrently, metadata dst entries can be
made stackable.

The metadata dst entry is refcnt'ed as expected for now but a non
reference counted use is possible if the reference is forced before
queueing the skb.

In order to allow allocating dsts with variable length, the existing
dst_alloc() is split into a dst_alloc() and dst_init() function. The
existing dst_init() function to initialize the subsystem is being
renamed to dst_subsys_init() to make it clear what is what.

The check before ip_route_input() is changed to ignore metadata dsts
and drop the dst inside the routing function thus allowing to interpret
metadata in a later commit.
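The flag-tagged embedding and the alloc/init split described above can be sketched in user-space C. Assumptions: the types are trimmed-down hypothetical models, `DST_METADATA` reuses the 0x0200 value from this patch, and `md_dst_alloc()` is a hypothetical analogue of metadata_dst_alloc() that uses calloc() instead of a slab cache. Because the `struct dst_entry` is the first member, a dst pointer can be cast back to the containing `struct metadata_dst` once the flag has been checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DST_METADATA 0x0200

struct dst_entry {
	unsigned short flags;
	int refcnt;
};

struct metadata_dst {
	struct dst_entry dst;	/* must stay first for the cast below */
	size_t opts_len;
	unsigned char opts[];	/* variable-length options follow */
};

/* Initialisation split out from allocation, as dst_init() now is. */
static void dst_init_model(struct dst_entry *dst, int initial_ref,
			   unsigned short flags)
{
	dst->refcnt = initial_ref;
	dst->flags = flags;
}

/* Variable-length allocation followed by the shared init step. */
static struct metadata_dst *md_dst_alloc(size_t optslen)
{
	struct metadata_dst *md = calloc(1, sizeof(*md) + optslen);

	if (!md)
		return NULL;
	dst_init_model(&md->dst, 1, DST_METADATA);
	md->opts_len = optslen;
	return md;
}

/* Like skb_metadata_dst(): only reinterpret when the flag is set. */
static struct metadata_dst *as_metadata_dst(struct dst_entry *dst)
{
	if (dst && (dst->flags & DST_METADATA))
		return (struct metadata_dst *)dst;
	return NULL;
}

/* Like skb_valid_dst(): a metadata dst is not a usable route. */
static bool dst_is_route(const struct dst_entry *dst)
{
	return dst && !(dst->flags & DST_METADATA);
}
```

Splitting init from alloc is what lets the variable-length allocator reuse the common initialisation.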

Signed-off-by: Thomas Graf tg...@suug.ch
---
 include/net/dst.h  |  6 +++-
 include/net/dst_metadata.h | 32 ++
 net/core/dev.c |  2 +-
 net/core/dst.c | 84 ++
 net/ipv4/ip_input.c|  3 +-
 net/ipv4/route.c   |  2 ++
 6 files changed, 112 insertions(+), 17 deletions(-)
 create mode 100644 include/net/dst_metadata.h

diff --git a/include/net/dst.h b/include/net/dst.h
index 2bc73f8a..2578811 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -57,6 +57,7 @@ struct dst_entry {
 #define DST_FAKE_RTABLE 0x0040
 #define DST_XFRM_TUNNEL 0x0080
 #define DST_XFRM_QUEUE  0x0100
+#define DST_METADATA   0x0200
 
unsigned short  pending_confirm;
 
@@ -356,6 +357,9 @@ static inline int dst_discard(struct sk_buff *skb)
 }
 void *dst_alloc(struct dst_ops *ops, struct net_device *dev, int initial_ref,
int initial_obsolete, unsigned short flags);
+void dst_init(struct dst_entry *dst, struct dst_ops *ops,
+ struct net_device *dev, int initial_ref, int initial_obsolete,
+ unsigned short flags);
 void __dst_free(struct dst_entry *dst);
 struct dst_entry *dst_destroy(struct dst_entry *dst);
 
@@ -457,7 +461,7 @@ static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
return dst;
 }
 
-void dst_init(void);
+void dst_subsys_init(void);
 
 /* Flags for xfrm_lookup flags argument. */
 enum {
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
new file mode 100644
index 000..4f7694f
--- /dev/null
+++ b/include/net/dst_metadata.h
@@ -0,0 +1,32 @@
+#ifndef __NET_DST_METADATA_H
+#define __NET_DST_METADATA_H 1
+
+#include <linux/skbuff.h>
+#include <net/ip_tunnels.h>
+#include <net/dst.h>
+
+struct metadata_dst {
+   struct dst_entry        dst;
+   size_t  opts_len;
+};
+
+static inline struct metadata_dst *skb_metadata_dst(struct sk_buff *skb)
+{
+   struct metadata_dst *md_dst = (struct metadata_dst *) skb_dst(skb);
+
+   if (md_dst && md_dst->dst.flags & DST_METADATA)
+   return md_dst;
+
+   return NULL;
+}
+
+static inline bool skb_valid_dst(const struct sk_buff *skb)
+{
+   struct dst_entry *dst = skb_dst(skb);
+
+   return dst && !(dst->flags & DST_METADATA);
+}
+
+struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
+
+#endif /* __NET_DST_METADATA_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 2ee15af..cb52cba 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7669,7 +7669,7 @@ static int __init net_dev_init(void)
open_softirq(NET_RX_SOFTIRQ, net_rx_action);
 
hotcpu_notifier(dev_cpu_callback, 0);
-   dst_init();
+   dst_subsys_init();
rc = 0;
 out:
return rc;
diff --git a/net/core/dst.c b/net/core/dst.c
index e956ce6..917364f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -22,6 +22,7 @@
 #include <linux/prefetch.h>
 
 #include <net/dst.h>
+#include <net/dst_metadata.h>
 
 /*
  * Theory of operations:
@@ -158,19 +159,10 @@ const u32 dst_default_metrics[RTAX_MAX + 1] = {
[RTAX_MAX] = 0xdeadbeef,
 };
 
-
-void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
-   int initial_ref, int initial_obsolete, unsigned short flags)
+void dst_init(struct dst_entry *dst, struct dst_ops *ops,
+ struct net_device *dev, int initial_ref, int initial_obsolete,
+ unsigned short flags)
 {
-   struct dst_entry *dst;
-
-   if (ops->gc && dst_entries_get_fast(ops) > ops->gc_thresh) {
-   if (ops->gc(ops))
-   return NULL;
-   }
-   dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
-   if (!dst)
-   return NULL;
dst->child = NULL;
dst->dev = dev;
if (dev)
@@ -200,6 +192,25 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
dst->next = NULL;
if (!(flags & DST_NOCOUNT))

[PATCH net-next 17/22] fib: Add fib rule match on tunnel id

2015-07-21 Thread Thomas Graf
This adds the ability to select a routing table based on the tunnel
id, which allows maintaining separate routing tables for each virtual
tunnel network.

ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200

A new static key controls the collection of metadata at tunnel level
upon demand.
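The matching rule and the on-demand switch can be sketched in user-space C. Assumptions: `struct rule`, `struct flow` and the helper names are hypothetical; a plain counter stands in for the `ip_tunnel_metadata_cnt` static key (the kernel uses a jump label so the disabled case costs almost nothing); and, as in the fib_rule_match() hunk below, a rule with tun_id 0 matches any flow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down models; tun_id 0 means "any tunnel". */
struct rule { uint64_t tun_id; };
struct flow { uint64_t tun_id; };

/* Plain counter standing in for the ip_tunnel_metadata_cnt static key. */
static int metadata_users;

static void need_metadata(void)    { metadata_users++; }
static void unneed_metadata(void)  { metadata_users--; }
static bool collect_metadata(void) { return metadata_users > 0; }

/* Mirrors the fib_rule_match() check: a zero tun_id matches any flow. */
static bool rule_matches_tun(const struct rule *r, const struct flow *fl)
{
	return !r->tun_id || r->tun_id == fl->tun_id;
}

/* Only rules that actually match on a tunnel id create demand. */
static void rule_added(const struct rule *r)
{
	if (r->tun_id)
		need_metadata();
}

static void rule_deleted(const struct rule *r)
{
	if (r->tun_id)
		unneed_metadata();
}
```

With this shape, tunnel drivers only pay for metadata collection while at least one tunnel-id rule exists.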

Signed-off-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c|  3 ++-
 include/net/fib_rules.h|  1 +
 include/net/ip_tunnels.h   | 11 +++
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c   | 24 ++--
 net/ipv4/ip_tunnel_core.c  | 16 
 6 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9486d7e..2587ac8 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -143,7 +143,8 @@ static struct workqueue_struct *vxlan_wq;
 
 static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
 {
-   return vs->flags & VXLAN_F_COLLECT_METADATA;
+   return vs->flags & VXLAN_F_COLLECT_METADATA ||
+  ip_tunnel_collect_metadata();
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 903a55e..4e8f804 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
u8  action;
/* 3 bytes hole, try to use */
u32 target;
+   __be64  tun_id;
struct fib_rule __rcu   *ctarget;
struct net  *fr_net;
 
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 0b7e18c..0a5a776 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -303,6 +303,17 @@ static inline struct ip_tunnel_info *lwt_tun_info(struct lwtunnel_state *lwtstat
return (struct ip_tunnel_info *)lwtstate->data;
 }
 
+extern struct static_key ip_tunnel_metadata_cnt;
+
+/* Returns > 0 if metadata should be collected */
+static inline int ip_tunnel_collect_metadata(void)
+{
+   return static_key_false(&ip_tunnel_metadata_cnt);
+}
+
+void ip_tunnel_need_metadata(void);
+void ip_tunnel_unneed_metadata(void);
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
FRA_UNUSED5,
FRA_FWMARK, /* mark */
FRA_FLOW,   /* flow/class id */
-   FRA_UNUSED6,
+   FRA_TUN_ID,
FRA_SUPPRESS_IFGROUP,
FRA_SUPPRESS_PREFIXLEN,
FRA_TABLE,  /* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..ae8306e 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -16,6 +16,7 @@
 #include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/fib_rules.h>
+#include <net/ip_tunnels.h>
 
 int fib_default_rule_add(struct fib_rules_ops *ops,
 u32 pref, u32 table, u32 flags)
@@ -186,6 +187,9 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
goto out;
 
+   if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+   goto out;
+
ret = ops->match(rule, fl, flags);
 out:
return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +334,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
if (tb[FRA_FWMASK])
rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
 
+   if (tb[FRA_TUN_ID])
+   rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -407,6 +414,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
if (unresolved)
ops->unresolved_rules++;
 
+   if (rule->tun_id)
+   ip_tunnel_need_metadata();
+
notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid);
flush_route_cache(ops);
rules_ops_put(ops);
@@ -473,6 +483,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
(rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
continue;
 
+   if (tb[FRA_TUN_ID] &&
+   (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+   continue;
+
if (!ops->compare(rule, frh, tb))
continue;
 
@@ -487,6 +501,9 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
goto errout;
}
 
+   if (rule->tun_id)
+   ip_tunnel_unneed_metadata();
+
list_del_rcu(&rule->list);
 
if (rule->action == FR_ACT_GOTO) {

[PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-21 Thread Thomas Graf
Allows putting a VXLAN device into a new flow-based mode in which
skbs with an ip_tunnel_info dst metadata attached will be encapsulated
according to the instructions stored in there with the VXLAN device
defaults taken into consideration.

Similarly, on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
set, the packet processing will populate an ip_tunnel_info struct for
each packet received and attach it to the skb using the new metadata
dst.  The metadata structure will contain the outer header and tunnel
header fields which have been stripped off. Layers further up in the
stack such as routing, tc or netfilter can later match on these fields
and perform forwarding. It is the responsibility of upper layers to
ensure that the flag is set if the metadata is needed. The flag limits
the additional cost of metadata collecting based on demand.

This prepares the VXLAN device to be steered by the routing and other
subsystems, which allows supporting encapsulation for a large number
of tunnel endpoints and tunnel ids through a single net_device and
improves scalability.

It also allows for OVS to leverage this mode which in turn allows for
the removal of the OVS specific VXLAN code.

Because the skb is currently scrubbed in vxlan_rcv(), the attachment of
the new dst metadata is postponed until after scrubbing, which requires
the temporary addition of a new member to vxlan_metadata. This member
is removed again in a later commit after the indirect VXLAN receive API
has been removed.
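The tunnel id recorded on receive is the 24-bit VNI, which sits in the upper 24 bits of the second 32-bit word of the VXLAN header (RFC 7348); that is why the receive path shifts the host-order word right by 8. A minimal user-space sketch, assuming POSIX `ntohl()` and a raw 8-byte header buffer; `vxlan_hdr_vni()` is a hypothetical helper, not a kernel function.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define VXLAN_FLAG_VNI 0x08	/* "I" flag: VNI field is valid (RFC 7348) */

/*
 * Extract the 24-bit VNI from an 8-byte VXLAN header.
 * Returns -1 if the I flag is not set.
 */
static int64_t vxlan_hdr_vni(const uint8_t hdr[8])
{
	uint32_t word;

	if (!(hdr[0] & VXLAN_FLAG_VNI))
		return -1;
	memcpy(&word, hdr + 4, sizeof(word));	/* avoid unaligned access */
	return ntohl(word) >> 8;		/* drop the reserved low byte */
}
```

For example, a header whose VNI bytes are `00 00 64` yields VNI 100.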

Signed-off-by: Thomas Graf tg...@suug.ch
Signed-off-by: Pravin B Shelar pshe...@nicira.com
---
 drivers/net/vxlan.c  | 149 ---
 include/linux/skbuff.h   |   1 +
 include/net/dst_metadata.h   |  13 
 include/net/ip_tunnels.h |  14 
 include/net/vxlan.h  |  10 ++-
 include/uapi/linux/if_link.h |   1 +
 6 files changed, 165 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ec86a11..06c092b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -49,6 +49,7 @@
 #include <net/ip6_tunnel.h>
 #include <net/ip6_checksum.h>
 #endif
+#include <net/dst_metadata.h>
 
 #define VXLAN_VERSION  "0.1"
 
@@ -140,6 +141,11 @@ struct vxlan_dev {
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
 
+static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
+{
+   return vs->flags & VXLAN_F_COLLECT_METADATA;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static inline
 bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
@@ -1164,10 +1170,13 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct metadata_dst *tun_dst = NULL;
+   struct ip_tunnel_info *info;
struct vxlan_sock *vs;
struct vxlanhdr *vxh;
u32 flags, vni;
-   struct vxlan_metadata md = {0};
+   struct vxlan_metadata _md;
+   struct vxlan_metadata *md = &_md;
 
/* Need Vxlan and inner Ethernet header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1211,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
vni &= VXLAN_VNI_MASK;
}
 
+   if (vxlan_collect_metadata(vs)) {
+   const struct iphdr *iph = ip_hdr(skb);
+
+   tun_dst = metadata_dst_alloc(sizeof(*md), GFP_ATOMIC);
+   if (!tun_dst)
+   goto drop;
+
+   info = &tun_dst->u.tun_info;
+   info->key.ipv4_src = iph->saddr;
+   info->key.ipv4_dst = iph->daddr;
+   info->key.ipv4_tos = iph->tos;
+   info->key.ipv4_ttl = iph->ttl;
+   info->key.tp_src = udp_hdr(skb)->source;
+   info->key.tp_dst = udp_hdr(skb)->dest;
+
+   info->mode = IP_TUNNEL_INFO_RX;
+   info->key.tun_flags = TUNNEL_KEY;
+   info->key.tun_id = cpu_to_be64(vni >> 8);
+   if (udp_hdr(skb)->check != 0)
+   info->key.tun_flags |= TUNNEL_CSUM;
+
+   md = ip_tunnel_info_opts(info, sizeof(*md));
+   md->tun_dst = tun_dst;
+   } else {
+   memset(md, 0, sizeof(*md));
+   }
+
/* For backwards compatibility, only allow reserved fields to be
 * used by VXLAN extensions if explicitly requested.
 */
@@ -1209,13 +1245,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
struct vxlanhdr_gbp *gbp;
 
gbp = (struct vxlanhdr_gbp *)vxh;
-   md.gbp = ntohs(gbp->policy_id);
+   md->gbp = ntohs(gbp->policy_id);
+
+   if (tun_dst)
+   info->key.tun_flags |= TUNNEL_VXLAN_OPT;
 
if (gbp->dont_learn)
-   md.gbp |= VXLAN_GBP_DONT_LEARN;
+

[PATCH net-next 07/22] ipv6: rt6_info output redirect to tunnel output

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

This is similar to ipv4 redirect of dst output to lwtunnel
output function for encapsulation and xmit.
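The redirect amounts to swapping the route's output function pointer at route-add time when the route carries encap state. A user-space sketch under stated assumptions: `struct route6`, `plain_output()`, `tunnel_output()` and `route6_setup_output()` are hypothetical; in the patch the hook is `rt->dst.output`, which is set to lwtunnel_output6.

```c
#include <stddef.h>

struct pkt;	/* opaque in this sketch */

/* Hypothetical model of a route entry with an output hook. */
struct route6 {
	int (*output)(struct pkt *p);
	void *lwtstate;	/* non-NULL when the route carries encap state */
};

static int plain_output(struct pkt *p)  { (void)p; return 1; }
static int tunnel_output(struct pkt *p) { (void)p; return 2; }

/* Routes with encap state get the tunnel output hook instead. */
static void route6_setup_output(struct route6 *rt, void *lwtstate)
{
	rt->output = plain_output;
	rt->lwtstate = lwtstate;
	if (lwtstate)
		rt->output = tunnel_output;
}
```

Callers keep invoking `rt->output()` unchanged; only routes with encap state end up in the tunnel path.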

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/ipv6/route.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b3431b7..7f2214f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1780,6 +1780,7 @@ int ip6_route_add(struct fib6_config *cfg)
goto out;
lwtunnel_state_get(lwtstate);
rt->rt6i_lwtstate = lwtstate;
+   rt->dst.output = lwtunnel_output6;
}
 
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
-- 
2.4.3



[PATCH net-next 03/22] ipv4: support for fib route lwtunnel encap attributes

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

This patch adds support in ipv4 fib functions to parse user
provided encap attributes and attach encap state data to fib_nh
and rtable.
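RTA_ENCAP_TYPE is a u16 attribute (NLA_U16 in the policy below), so its share of the message-size estimate in fib_nlmsg_size() is nla_total_size(2). The netlink sizing arithmetic can be sketched as follows, assuming the standard layout of a 4-byte attribute header (u16 length, u16 type) with payloads padded to 4 bytes; the macros are defined locally and `nla_total_size_model()` is a hypothetical stand-in for the kernel helper.

```c
#include <stddef.h>

/* Standard netlink attribute alignment, defined locally for the sketch. */
#define NLA_ALIGNTO     4
#define NLA_ALIGN(len)  (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN      4	/* struct nlattr: u16 nla_len, u16 nla_type */

/* Header plus payload, rounded up to the 4-byte attribute alignment. */
static size_t nla_total_size_model(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}
```

So a u16 and a u32 attribute both occupy 8 bytes, while a 5-byte payload needs 12.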

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/net/ip_fib.h |  5 ++-
 include/net/route.h  |  1 +
 net/ipv4/fib_frontend.c  |  8 
 net/ipv4/fib_semantics.c | 96 +++-
 net/ipv4/route.c | 16 +++-
 5 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 49c142b..5e01960 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,7 +44,9 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info  fc_nlinfo;
- };
+   struct nlattr   *fc_encap;
+   u16 fc_encap_type;
+};
 
 struct fib_info;
 struct rtable;
@@ -89,6 +91,7 @@ struct fib_nh {
struct rtable __rcu * __percpu *nh_pcpu_rth_output;
struct rtable __rcu *nh_rth_input;
struct fnhe_hash_bucket __rcu *nh_exceptions;
+   struct lwtunnel_state   *nh_lwtstate;
 };
 
 /*
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..2d45f41 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -66,6 +66,7 @@ struct rtable {
 
struct list_head        rt_uncached;
struct uncached_list    *rt_uncached_list;
+   struct lwtunnel_state   *rt_lwtstate;
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6bbc549..9b2019c 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -591,6 +591,8 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_FLOW]  = { .type = NLA_U32 },
+   [RTA_ENCAP_TYPE] = { .type = NLA_U16 },
+   [RTA_ENCAP] = { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
@@ -656,6 +658,12 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
case RTA_TABLE:
cfg->fc_table = nla_get_u32(attr);
break;
+   case RTA_ENCAP:
+   cfg->fc_encap = attr;
+   break;
+   case RTA_ENCAP_TYPE:
+   cfg->fc_encap_type = nla_get_u16(attr);
+   break;
}
}
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c7358ea..6754c64 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -42,6 +42,7 @@
 #include <net/ip_fib.h>
 #include <net/netlink.h>
 #include <net/nexthop.h>
+#include <net/lwtunnel.h>
 
 #include "fib_lookup.h"
 
@@ -208,6 +209,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
+   lwtunnel_state_put(nexthop_nh->nh_lwtstate);
free_nh_exceptions(nexthop_nh);
rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
rt_fibinfo_free(&nexthop_nh->nh_rth_input);
@@ -266,6 +268,7 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 #ifdef CONFIG_IP_ROUTE_CLASSID
nh->nh_tclassid != onh->nh_tclassid ||
 #endif
+   lwtunnel_cmp_encap(nh->nh_lwtstate, onh->nh_lwtstate) ||
((nh->nh_flags ^ onh->nh_flags) & ~RTNH_COMPARE_MASK))
return -1;
onh++;
@@ -366,6 +369,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
 
if (fi->fib_nhs) {
+   size_t nh_encapsize = 0;
/* Also handles the special case fib_nhs == 1 */
 
/* each nexthop is packed in an attribute */
@@ -374,8 +378,21 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
/* may contain flow and gateway attribute */
nhsize += 2 * nla_total_size(4);
 
+   /* grab encap info */
+   for_nexthops(fi) {
+   if (nh->nh_lwtstate) {
+   /* RTA_ENCAP_TYPE */
+   nh_encapsize += lwtunnel_get_encap_size(
+   nh->nh_lwtstate);
+   /* RTA_ENCAP */
+   nh_encapsize +=  nla_total_size(2);
+   }
+   } endfor_nexthops(fi);
+
/* all nexthops are packed in a nested attribute */
-   payload += nla_total_size(fi->fib_nhs * nhsize);
+   payload += 

[PATCH net-next 05/22] lwtunnel: support dst output redirect function

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

This patch introduces the lwtunnel_output function, which calls the
corresponding lwtunnel's output function to transmit the packet.

It adds two variants, lwtunnel_output and lwtunnel_output6, for IPv4
and IPv6 respectively. This may change once lwtstate is made to
reside in dst or dst_metadata (as per upstream discussions).
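The dispatch performed by __lwtunnel_output() can be sketched as a bounds-checked lookup in a table of registered handlers. Assumptions: the encap type names, `struct encap_ops`, `encap_output()` and the stub handler are hypothetical; unlike the patch, the sketch does not free the packet on error or read the table under RCU.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical encap types; ENCAP_NONE means "no encapsulation". */
enum { ENCAP_NONE, ENCAP_MPLS, ENCAP_IP, ENCAP_MAX = ENCAP_IP };

struct pkt;
struct encap_ops { int (*output)(struct pkt *p); };

/* Registered handlers indexed by encap type (NULL = not registered). */
static const struct encap_ops *encap_table[ENCAP_MAX + 1];

/* Example handler a module might register. */
static int stub_output(struct pkt *p) { (void)p; return 7; }
static const struct encap_ops stub_ops = { stub_output };

/*
 * "None" or out-of-range types pass through (0), a missing handler
 * yields -EOPNOTSUPP, otherwise the handler's result is returned.
 */
static int encap_output(struct pkt *p, unsigned int type)
{
	const struct encap_ops *ops;

	if (type == ENCAP_NONE || type > ENCAP_MAX)
		return 0;
	ops = encap_table[type];
	if (ops && ops->output)
		return ops->output(p);
	return -EOPNOTSUPP;
}
```

Registering a handler is then just `encap_table[ENCAP_MPLS] = &stub_ops;`.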

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/net/lwtunnel.h | 12 +++
 net/core/lwtunnel.c| 56 ++
 2 files changed, 68 insertions(+)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index df24b36..918e03c 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -69,6 +69,8 @@ int lwtunnel_fill_encap(struct sk_buff *skb,
 int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
 struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
+int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
 
 #else
 
@@ -127,6 +129,16 @@ static inline int lwtunnel_cmp_encap(struct lwtunnel_state *a,
return 0;
 }
 
+static inline int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
+{
+   return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __NET_LWTUNNEL_H */
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index d7ae3a2..bb58826 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -25,6 +25,7 @@
 
 #include <net/lwtunnel.h>
 #include <net/rtnetlink.h>
+#include <net/ip6_fib.h>
 
 struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
 {
@@ -177,3 +178,58 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b)
return ret;
 }
 EXPORT_SYMBOL(lwtunnel_cmp_encap);
+
+int __lwtunnel_output(struct sock *sk, struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate)
+{
+   const struct lwtunnel_encap_ops *ops;
+   int ret = -EINVAL;
+
+   if (!lwtstate)
+   goto drop;
+
+   if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
+   lwtstate->type > LWTUNNEL_ENCAP_MAX)
+   return 0;
+
+   ret = -EOPNOTSUPP;
+   rcu_read_lock();
+   ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
+   if (likely(ops && ops->output))
+   ret = ops->output(sk, skb);
+   rcu_read_unlock();
+
+   if (ret == -EOPNOTSUPP)
+   goto drop;
+
+   return ret;
+
+drop:
+   kfree(skb);
+
+   return ret;
+}
+
+int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
+{
+   struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+   struct lwtunnel_state *lwtstate = NULL;
+
+   if (rt)
+   lwtstate = rt->rt6i_lwtstate;
+
+   return __lwtunnel_output(sk, skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_output6);
+
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
+{
+   struct rtable *rt = (struct rtable *)skb_dst(skb);
+   struct lwtunnel_state *lwtstate = NULL;
+
+   if (rt)
+   lwtstate = rt->rt_lwtstate;
+
+   return __lwtunnel_output(sk, skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_output);
-- 
2.4.3



[PATCH net-next 16/22] route: Per route IP tunnel metadata via lightweight tunnel

2015-07-21 Thread Thomas Graf
This introduces a new IP tunnel lightweight tunnel type which allows
specifying IP tunnel instructions per route. Only IPv4 is supported
at this point.
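The lookup order used by the reworked skb_tunnel_info() can be sketched in user-space C: per-packet metadata wins, and only for AF_INET does it fall back to the tunnel info attached to the route. Simplification, labelled as such: the hypothetical `struct dst_model` holds two optional pointers, whereas in the kernel a dst is either a metadata dst or a route; `pkt_tunnel_info()` is likewise hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

struct tun_info { int id; };

/* Hypothetical model of the dst attached to a packet. */
struct dst_model {
	struct tun_info *md_info;	/* set for metadata dsts */
	struct tun_info *route_info;	/* per-route (lwtunnel) info */
};

static struct tun_info *pkt_tunnel_info(const struct dst_model *dst,
					int family)
{
	if (!dst)
		return NULL;
	if (dst->md_info)		/* per-packet metadata first */
		return dst->md_info;
	if (family == AF_INET)		/* per-route info: IPv4 only for now */
		return dst->route_info;
	return NULL;
}
```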

Signed-off-by: Thomas Graf tg...@suug.ch
---
 drivers/net/vxlan.c|  10 +++-
 include/net/dst_metadata.h |  12 -
 include/net/ip_tunnels.h   |   7 ++-
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |  15 ++
 net/ipv4/ip_tunnel_core.c  | 114 +
 net/ipv4/route.c   |   2 +-
 net/openvswitch/vport.h|   1 +
 8 files changed, 157 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 06c092b..9486d7e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1935,7 +1935,7 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
   struct vxlan_rdst *rdst, bool did_rsc)
 {
-   struct ip_tunnel_info *info = skb_tunnel_info(skb);
+   struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
struct sock *sk = vxlan->vn_sock->sock->sk;
struct rtable *rt = NULL;
@@ -1952,6 +1952,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
int err;
u32 flags = vxlan-flags;
 
+   /* FIXME: Support IPv6 */
+   info = skb_tunnel_info(skb, AF_INET);
+
if (rdst) {
dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
vni = rdst->remote_vni;
@@ -2141,12 +2144,15 @@ tx_free:
 static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
-   const struct ip_tunnel_info *info = skb_tunnel_info(skb);
+   const struct ip_tunnel_info *info;
struct ethhdr *eth;
bool did_rsc = false;
struct vxlan_rdst *rdst, *fdst = NULL;
struct vxlan_fdb *f;
 
+   /* FIXME: Support IPv6 */
+   info = skb_tunnel_info(skb, AF_INET);
+
skb_reset_mac_header(skb);
eth = eth_hdr(skb);
 
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index e843937..7b03068 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -23,13 +23,23 @@ static inline struct metadata_dst *skb_metadata_dst(struct sk_buff *skb)
return NULL;
 }
 
-static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
+static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb,
+int family)
 {
struct metadata_dst *md_dst = skb_metadata_dst(skb);
+   struct rtable *rt;
 
if (md_dst)
return &md_dst->u.tun_info;
 
+   switch (family) {
+   case AF_INET:
+   rt = (struct rtable *)skb_dst(skb);
+   if (rt && rt->rt_lwtstate)
+   return lwt_tun_info(rt->rt_lwtstate);
+   break;
+   }
+
return NULL;
 }
 
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d11530f..0b7e18c 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,9 +9,9 @@
 #include <net/dsfield.h>
 #include <net/gro_cells.h>
 #include <net/inet_ecn.h>
-#include <net/ip.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/lwtunnel.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -298,6 +298,11 @@ static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info, size_t n)
return info + 1;
 }
 
+static inline struct ip_tunnel_info *lwt_tun_info(struct lwtunnel_state *lwtstate)
+{
+   return (struct ip_tunnel_info *)lwtstate->data;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index aa611d9..31377bb 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -6,6 +6,7 @@
 enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_NONE,
LWTUNNEL_ENCAP_MPLS,
+   LWTUNNEL_ENCAP_IP,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 0d3d3cc..47d24cb 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
 
 /* Routing message attributes */
 
+enum ip_tunnel_t {
+   IP_TUN_UNSPEC,
+   IP_TUN_ID,
+   IP_TUN_DST,
+   IP_TUN_SRC,
+   IP_TUN_TTL,
+   IP_TUN_TOS,
+   IP_TUN_SPORT,
+   IP_TUN_DPORT,
+   IP_TUN_FLAGS,
+   __IP_TUN_MAX,
+};
+
+#define IP_TUN_MAX (__IP_TUN_MAX - 1)
+
 enum rtattr_type_t {
RTA_UNSPEC,
RTA_DST,
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6a51a71..025b76e 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -190,3 +190,117 @@ struct rtnl_link_stats64 

[PATCH net-next 02/22] lwtunnel: infrastructure for handling light weight tunnels like mpls

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

Provides infrastructure to parse/dump/store encap information for
light weight tunnels like mpls. Encap information for such tunnels
is associated with fib routes.

This infrastructure is based on previous suggestions from
Eric Biederman to follow the xfrm infrastructure.
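Following the xfrm model roughly means a table of per-encapsulation operations indexed by encap type: protocol modules register into it, and the FIB code dispatches through it when a route carries encap attributes. A minimal userspace sketch, assuming a plain array in place of the RCU-protected lwtun_encaps[] and using hypothetical names (encap_ops, encap_add_ops, encap_del_ops, encap_build_state); the kernel's exact return codes may differ:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical encap types, ordered like enum lwtunnel_encap_types. */
enum encap_type { ENCAP_NONE, ENCAP_MPLS, ENCAP_IP, ENCAP_COUNT };
#define ENCAP_MAX (ENCAP_COUNT - 1)

/* Hypothetical per-type callbacks (a small subset of lwtunnel_encap_ops). */
struct encap_ops {
	int (*build_state)(const void *encap_attr, void **state);
};

/* One slot per encap type; a plain array instead of RCU-protected pointers. */
static const struct encap_ops *encap_table[ENCAP_MAX + 1];

/* Called by a protocol module (e.g. MPLS) to claim its encap type. */
static int encap_add_ops(const struct encap_ops *ops, unsigned int type)
{
	if (type == ENCAP_NONE || type > ENCAP_MAX)
		return -ERANGE;
	if (encap_table[type])
		return -EEXIST;          /* slot already owned by someone else */
	encap_table[type] = ops;
	return 0;
}

/* Called on module unload; only the current owner may clear the slot. */
static int encap_del_ops(const struct encap_ops *ops, unsigned int type)
{
	if (type == ENCAP_NONE || type > ENCAP_MAX)
		return -ERANGE;
	if (encap_table[type] != ops)
		return -EINVAL;
	encap_table[type] = NULL;
	return 0;
}

/* Called by the FIB code when a route carries an encap attribute. */
static int encap_build_state(unsigned int type, const void *encap_attr,
			     void **state)
{
	const struct encap_ops *ops;

	if (type == ENCAP_NONE || type > ENCAP_MAX)
		return -EINVAL;
	ops = encap_table[type];
	if (!ops || !ops->build_state)
		return -EOPNOTSUPP;      /* nothing registered for this type */
	return ops->build_state(encap_attr, state);
}
```

Keeping the table indexed by type means route parsing never needs to know which modules exist; an unregistered type simply fails with an error.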

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/linux/lwtunnel.h  |   6 ++
 include/net/lwtunnel.h| 132 +++
 include/uapi/linux/lwtunnel.h |  15 
 net/Kconfig   |   7 ++
 net/core/Makefile |   1 +
 net/core/lwtunnel.c   | 179 ++
 6 files changed, 340 insertions(+)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 net/core/lwtunnel.c

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644
index 000..97f32f8
--- /dev/null
+++ b/include/linux/lwtunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_LWTUNNEL_H_
+#define _LINUX_LWTUNNEL_H_
+
+#include <uapi/linux/lwtunnel.h>
+
+#endif /* _LINUX_LWTUNNEL_H_ */
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
new file mode 100644
index 000..df24b36
--- /dev/null
+++ b/include/net/lwtunnel.h
@@ -0,0 +1,132 @@
+#ifndef __NET_LWTUNNEL_H
+#define __NET_LWTUNNEL_H 1
+
+#include <linux/lwtunnel.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/route.h>
+
+#define LWTUNNEL_HASH_BITS   7
+#define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
+
+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+struct lwtunnel_state {
+   __u16           type;
+   __u16           flags;
+   atomic_t        refcnt;
+   int             len;
+   __u8            data[0];
+};
+
+struct lwtunnel_encap_ops {
+   int (*build_state)(struct net_device *dev, struct nlattr *encap,
+  struct lwtunnel_state **ts);
+   int (*output)(struct sock *sk, struct sk_buff *skb);
+   int (*fill_encap)(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate);
+   int (*get_encap_size)(struct lwtunnel_state *lwtstate);
+   int (*cmp_encap)(struct lwtunnel_state *a, struct lwtunnel_state *b);
+};
+
+extern const struct lwtunnel_encap_ops __rcu *
+   lwtun_encaps[LWTUNNEL_ENCAP_MAX+1];
+
+#ifdef CONFIG_LWTUNNEL
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+   atomic_inc(&lws->refcnt);
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+   if (!lws)
+   return;
+
+   if (atomic_dec_and_test(&lws->refcnt))
+   kfree(lws);
+}
+
+static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
+{
+   if (lwtstate && (lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))
+   return true;
+
+   return false;
+}
+
+int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+struct nlattr *encap,
+struct lwtunnel_state **lws);
+int lwtunnel_fill_encap(struct sk_buff *skb,
+   struct lwtunnel_state *lwtstate);
+int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
+struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
+int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
+
+#else
+
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+}
+
+static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
+{
+   return false;
+}
+
+static inline int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+unsigned int num)
+{
+   return -EOPNOTSUPP;
+
+}
+
+static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+unsigned int num)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+  struct nlattr *encap,
+  struct lwtunnel_state **lws)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_fill_encap(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate)
+{
+   return 0;
+}
+
+static inline int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate)
+{
+   return 0;
+}
+
+static inline struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len)
+{
+   return NULL;
+}
+
+static inline int lwtunnel_cmp_encap(struct lwtunnel_state *a,
+  

[PATCH net-next 00/22 v2] Lightweight flow based encapsulation

2015-07-21 Thread Thomas Graf
This series combines the work previously posted by Roopa, Robert and
myself. It follows what we discussed at NFWS. The motivation
of this series is to:

 * Consolidate code between OVS and the rest of the kernel and get
   rid of OVS vports and instead represent them as pure net_devices.
 * Introduce a lightweight tunneling mechanism which enables flow
   based encapsulation to improve scalability on both RX and TX.
 * Do the above in an encapsulation-agnostic way so that the
   encapsulation type is eventually abstracted away from the user.
 * Use the same forwarding decision for both native forwarding and
   encapsulation, thus allowing a switch between native IPv6 and
   UDP encapsulation based on the endpoint without requiring additional
   logic.

The fundamental changes introduced in this series are:
 * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
   instructions. Depending on the specified type, the instructions
   apply to UDP encapsulation, MPLS, and possibly others in the future.
 * Depending on the encapsulation type, the output function of the
   dst is directly overwritten or the dst merely attaches metadata and
   relies on a subsequent net_device to apply it to the packet. The
   latter is typically used if an inner and outer IP header exist which
   require two subsequent routing lookups to be performed.
 * A new metadata_dst structure which can be attached to skbs to
   carry metadata in between subsystems. This new metadata transport
   is used to provide a single interface for VXLAN, routing and OVS
   to communicate through metadata.
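The second point above (either the dst output function is replaced, or the route only carries metadata for a later net_device) can be pictured as a per-route flag checked on transmit. A minimal sketch loosely modelled on the LWTUNNEL_STATE_OUTPUT_REDIRECT flag from patch 02; the types and the two output functions are hypothetical:

```c
#include <stddef.h>

/* Same meaning as LWTUNNEL_STATE_OUTPUT_REDIRECT in the series. */
#define STATE_OUTPUT_REDIRECT 0x1

/* Hypothetical packet with flags recording which path handled it. */
struct pkt {
	int encapsulated;
	int sent_via_default;
};

struct encap_state {
	unsigned int flags;
	int (*output)(struct pkt *p);  /* encap-specific transmit, e.g. MPLS push */
};

struct route {
	struct encap_state *state;     /* NULL for plain routes */
};

/* Hypothetical default path: hand the packet to the next device,
 * which may read metadata attached to it (e.g. a VXLAN device). */
static int default_output(struct pkt *p)
{
	p->sent_via_default = 1;
	return 0;
}

/* Hypothetical encap handler that rewrites the packet itself. */
static int push_label_output(struct pkt *p)
{
	p->encapsulated = 1;
	return 0;
}

static int route_output(const struct route *rt, struct pkt *p)
{
	if (rt->state && (rt->state->flags & STATE_OUTPUT_REDIRECT))
		return rt->state->output(p);  /* dst output overwritten */
	return default_output(p);             /* metadata only: defer to device */
}
```

The redirect form suits single-header encapsulations like MPLS; the metadata form suits cases where the outer IP header needs its own route lookup.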

The OVS interfaces remain as-is but will transparently create a real
VXLAN net_device in the background. iproute2 is extended to support
new use cases:

  VXLAN:
  ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0

  MPLS:
  ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1

Performance implications:
  The additional memory allocation in the receive path is expected to
  have performance implications, although they are not observable in
  standard throughput tests when GRO is done properly. The correct
  net_device model outweighs the additional cost of the allocation.
  Furthermore, this cost can be reduced by reintroducing a direct
  unqueued path from a software device to a consumer like bridge or
  OVS if needed.

$ netperf  -t TCP_STREAM -H 15.1.1.201
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
15.1.1.201 (15.1.1.201) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    9118.17

Changes since v1:
 * Properly initialize tun_id as reported by Julian
 * Drop duplicate netif_keep_dst() as reported by Alexei

Roopa Prabhu (9):
  rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes
  lwtunnel: infrastructure for handling light weight tunnels like mpls
  ipv4: support for fib route lwtunnel encap attributes
  ipv6: support for fib route lwtunnel encap attributes
  lwtunnel: support dst output redirect function
  ipv4: redirect dst output to lwtunnel output
  ipv6: rt6_info output redirect to tunnel output
  mpls: export mpls functions for use by mpls iptunnels
  mpls: ip tunnel support

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  icmp: Don't leak original dst into ip_route_input()
  dst: Metadata destinations
  arp: Inherit metadata dst when creating ARP requests
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route IP tunnel metadata via lightweight tunnel
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Make tunnel set action attach a metadata dst
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device

 drivers/net/vxlan.c  | 672 +--
 include/linux/lwtunnel.h |   6 +
 include/linux/mpls_iptunnel.h|   6 +
 include/linux/skbuff.h   |   1 +
 include/net/dst.h|   6 +-
 include/net/dst_metadata.h   |  55 +++
 include/net/fib_rules.h  |   1 +
 include/net/flow.h   |   8 +
 include/net/ip6_fib.h|   3 +
 include/net/ip_fib.h |   5 +-
 include/net/ip_tunnels.h |  95 -
 include/net/lwtunnel.h   | 144 
 include/net/mpls_iptunnel.h  |  29 ++
 include/net/route.h  |   1 +
 include/net/rtnetlink.h  |   1 +
 include/net/vxlan.h  |  85 -
 include/uapi/linux/fib_rules.h   |   2 +-
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/lwtunnel.h|  16 +
 include/uapi/linux/mpls_iptunnel.h   |  28 ++
 

[PATCH net-next 04/22] ipv6: support for fib route lwtunnel encap attributes

2015-07-21 Thread Thomas Graf
From: Roopa Prabhu ro...@cumulusnetworks.com

This patch adds support in ipv6 fib functions to parse Netlink
RTA encap attributes and attach encap state data to rt6_info.
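Both the rtm_to_fib6_config() and multipath hunks below follow one pattern: walk the TLV-encoded attributes, remember where RTA_ENCAP starts, and decode RTA_ENCAP_TYPE as a u16 only when it is present. rt6_nlmsg_size() then reserves aligned space per attribute. A minimal userspace sketch, assuming the struct nlattr layout and 4-byte alignment of Netlink; the attribute numbers ATTR_ENCAP_TYPE/ATTR_ENCAP and all helper names are hypothetical, not the real RTA_* values or kernel functions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute numbers; the real RTA_* values differ. */
enum { ATTR_UNSPEC, ATTR_ENCAP_TYPE, ATTR_ENCAP, ATTR_MAX = ATTR_ENCAP };

/* Same layout as struct nlattr: total length (header included), then type. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN   ATTR_ALIGN(sizeof(struct attr_hdr))

/* Record where each known attribute starts; unknown types are skipped.
 * Returns -1 if an attribute's length field is inconsistent. */
static int parse_attrs(const uint8_t *buf, size_t len, const uint8_t *tb[])
{
	size_t off = 0;
	int i;

	for (i = 0; i <= ATTR_MAX; i++)
		tb[i] = NULL;

	while (off + sizeof(struct attr_hdr) <= len) {
		struct attr_hdr h;

		memcpy(&h, buf + off, sizeof(h));  /* buffer may be unaligned */
		if (h.len < sizeof(h) || h.len > len - off)
			return -1;
		if (h.type <= ATTR_MAX)
			tb[h.type] = buf + off;
		off += ATTR_ALIGN(h.len);          /* attributes are 4-byte aligned */
	}
	return 0;
}

/* Caller must ensure a payload of at least 2 bytes (the kernel's
 * nla_policy check for NLA_U16 does this before nla_get_u16()). */
static uint16_t attr_get_u16(const uint8_t *attr)
{
	uint16_t v;

	memcpy(&v, attr + ATTR_HDRLEN, sizeof(v));
	return v;
}

/* Space one attribute with the given payload occupies in a message. */
static size_t attr_total_size(size_t payload)
{
	return ATTR_ALIGN(ATTR_HDRLEN + payload);
}

/* Hypothetical subset of struct fib6_config. */
struct encap_cfg {
	const uint8_t *encap;     /* start of the nested encap attribute */
	uint16_t encap_type;
};

/* Copy the optional attributes only when they are present. */
static void fill_encap_cfg(const uint8_t *tb[], struct encap_cfg *cfg)
{
	if (tb[ATTR_ENCAP])
		cfg->encap = tb[ATTR_ENCAP];
	if (tb[ATTR_ENCAP_TYPE])
		cfg->encap_type = attr_get_u16(tb[ATTR_ENCAP_TYPE]);
}
```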

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 include/net/ip6_fib.h |  3 +++
 net/ipv6/ip6_fib.c|  2 ++
 net/ipv6/route.c  | 33 ++---
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 3b76849..276328e 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -51,6 +51,8 @@ struct fib6_config {
struct nlattr   *fc_mp;
 
struct nl_info  fc_nlinfo;
+   struct nlattr   *fc_encap;
+   u16 fc_encap_type;
 };
 
 struct fib6_node {
@@ -131,6 +133,7 @@ struct rt6_info {
/* more non-fragment space at head required */
unsigned short  rt6i_nfheader_len;
u8  rt6i_protocol;
+   struct lwtunnel_state   *rt6i_lwtstate;
 };
 
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 55d1986..d715f2e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -32,6 +32,7 @@
 #include <net/ipv6.h>
 #include <net/ndisc.h>
 #include <net/addrconf.h>
+#include <net/lwtunnel.h>
 
 #include <net/ip6_fib.h>
 #include <net/ip6_route.h>
@@ -177,6 +178,7 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 static void rt6_release(struct rt6_info *rt)
 {
if (atomic_dec_and_test(&rt->rt6i_ref)) {
+   lwtunnel_state_put(rt->rt6i_lwtstate);
rt6_free_pcpu(rt);
dst_free(&rt->dst);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6090969..b3431b7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -58,6 +58,7 @@
 #include <net/netevent.h>
 #include <net/netlink.h>
 #include <net/nexthop.h>
+#include <net/lwtunnel.h>
 
 #include <asm/uaccess.h>
 
@@ -1770,6 +1771,17 @@ int ip6_route_add(struct fib6_config *cfg)
 
rt->dst.output = ip6_output;
 
+   if (cfg->fc_encap) {
+   struct lwtunnel_state *lwtstate;
+
+   err = lwtunnel_build_state(dev, cfg->fc_encap_type,
+  cfg->fc_encap, &lwtstate);
+   if (err)
+   goto out;
+   lwtunnel_state_get(lwtstate);
+   rt->rt6i_lwtstate = lwtstate;
+   }
+
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
rt->rt6i_dst.plen = cfg->fc_dst_len;
if (rt->rt6i_dst.plen == 128)
@@ -2595,6 +2607,8 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_PREF]  = { .type = NLA_U8 },
+   [RTA_ENCAP_TYPE]= { .type = NLA_U16 },
+   [RTA_ENCAP] = { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2689,6 +2703,12 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
cfg->fc_flags |= RTF_PREF(pref);
}
 
+   if (tb[RTA_ENCAP])
+   cfg->fc_encap = tb[RTA_ENCAP];
+
+   if (tb[RTA_ENCAP_TYPE])
+   cfg->fc_encap_type = nla_get_u16(tb[RTA_ENCAP_TYPE]);
+
err = 0;
 errout:
return err;
@@ -2721,6 +2741,10 @@ beginning:
r_cfg.fc_gateway = nla_get_in6_addr(nla);
r_cfg.fc_flags |= RTF_GATEWAY;
}
+   r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
+   nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE);
+   if (nla)
+   r_cfg.fc_encap_type = nla_get_u16(nla);
}
err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
if (err) {
@@ -2783,7 +2807,7 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
return ip6_route_add(&cfg);
 }
 
-static inline size_t rt6_nlmsg_size(void)
+static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
 {
return NLMSG_ALIGN(sizeof(struct rtmsg))
   + nla_total_size(16) /* RTA_SRC */
@@ -2797,7 +2821,8 @@ static inline size_t rt6_nlmsg_size(void)
   + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */
   + nla_total_size(sizeof(struct rta_cacheinfo))
   + nla_total_size(TCP_CA_NAME_MAX) /* RTAX_CC_ALGO */
-  + nla_total_size(1); /* RTA_PREF */
+  + nla_total_size(1) /* RTA_PREF */
+  + lwtunnel_get_encap_size(rt->rt6i_lwtstate);
 }
 
 static int rt6_fill_node(struct net *net,
@@ -2945,6 +2970,8 @@ static int rt6_fill_node(struct net *net,
if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
goto 

[PATCH V3 6/7] hvsock: introduce Hyper-V VM Sockets feature

2015-07-21 Thread Dexuan Cui
Hyper-V VM sockets (hvsock) supply a byte-stream based communication
mechanism between the host and a guest. It's a kind of TCP over VMBus, but
the transport layer (VMBus) is much simpler than IP. With Hyper-V VM
Sockets, applications on the host and in a guest can talk to each
other directly using the traditional BSD-style socket APIs.

Hyper-V VM Sockets is only available on Windows 10 hosts and later. The
patch implements the necessary support on the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui de...@microsoft.com
---

Changes since v1:
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: default m -> default m if HYPERV
- MODULE_LICENSE: Dual MIT/GPL -> Dual BSD/GPL

Changes since v2:
- fixed indentation issues
- removed pr_debug

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware's VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://thread.gmane.org/gmane.linux.network/365205.

However, though Hyper-V VM Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transport layer, and IMO these
make reusing the code directly impractical:

1. In AF_VSOCK, the endpoint type is <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is <GUID VM_ID, GUID ServiceID>. Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transport ops are meaningless to AF_HYPERV/VMBus,
like:
   .notify_recv_init
   .notify_recv_pre_block
   .notify_recv_pre_dequeue
   .notify_recv_post_dequeue
   .notify_send_init
   .notify_send_pre_block
   .notify_send_pre_enqueue
   .notify_send_post_enqueue
   etc.

So I think we'd better introduce a new address family: AF_HYPERV.
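Point 1 is easiest to see by comparing the two endpoint shapes side by side: a pair of 32-bit integers versus a pair of 128-bit GUIDs, which cannot be squeezed into the vsock address without loss. A minimal sketch; the struct names and the raw 16-byte GUID representation are hypothetical, and the real struct sockaddr_hv (declared in include/uapi/linux/hyperv.h, not shown here) may be laid out differently:

```c
#include <stdint.h>
#include <string.h>

/* AF_VSOCK-style endpoint: <u32 ContextID, u32 Port>. */
struct vsock_endpoint {
	uint32_t cid;
	uint32_t port;
};

/* A 128-bit GUID as 16 raw bytes (hypothetical representation). */
struct guid128 {
	uint8_t b[16];
};

/* AF_HYPERV-style endpoint as described: <GUID VM_ID, GUID ServiceID>. */
struct hv_endpoint {
	struct guid128 vm_id;
	struct guid128 service_id;
};

/* Two endpoints match only if both GUIDs are byte-for-byte equal. */
static int hv_endpoint_equal(const struct hv_endpoint *a,
			     const struct hv_endpoint *b)
{
	return memcmp(&a->vm_id, &b->vm_id, sizeof(a->vm_id)) == 0 &&
	       memcmp(&a->service_id, &b->service_id,
		      sizeof(a->service_id)) == 0;
}
```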

 MAINTAINERS |2 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   44 ++
 include/uapi/linux/hyperv.h |   16 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1430 +++
 9 files changed, 1510 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e7bdbac..a4a7e03 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4941,7 +4941,9 @@ F:drivers/input/serio/hyperv-keyboard.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 
 I2C OVER PARALLEL PORT
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 5bf59c8..d5ef612 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -200,7 +200,8 @@ struct ucred {
 #define AF_ALG 38  /* Algorithm sockets*/
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
-#define AF_MAX 41  /* For now.. */
+#define AF_HYPERV  41  /* Hyper-V virtual sockets  */
+#define AF_MAX 42  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -246,6 +247,7 @@ struct ucred {
 #define PF_ALG AF_ALG
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 000..9951658
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,44 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include <linux/kernel.h>
+#include <linux/hyperv.h>
+#include <net/sock.h>
+
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+
+#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
+
+#define sk_to_hvsock(__sk)    ((struct hvsock_sock *)(__sk))
+#define hvsock_to_sk(__hvsk)   ((struct sock *)(__hvsk))
+
+struct hvsock_sock {
+   /* sk must be the first member. */
+   struct sock sk;
+
+   struct sockaddr_hv local_addr;
+   struct sockaddr_hv remote_addr;
+
+   /* protected by the global hvsock_mutex */
+   struct list_head bound_list;
+   struct list_head connected_list;
+
+   struct list_head accept_queue;
+   /* used by enqueue and 

[PATCH V3 5/7] Drivers: hv: vmbus: add a helper function to set a channel's pending send size

2015-07-21 Thread Dexuan Cui
This will be used by the coming net/hvsock driver.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 include/linux/hyperv.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index fda9790..47c5c1a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -791,6 +791,12 @@ static inline void *get_per_channel_state(struct vmbus_channel *c)
return c->per_channel_state;
 }
 
+static inline void set_channel_pending_send_size(struct vmbus_channel *c,
+u32 size)
+{
+   c->outbound.ring_buffer->pending_send_sz = size;
+}
+
 void vmbus_onmessage(void *context);
 
 int vmbus_request_offers(void);
-- 
2.1.0



[PATCH V3 4/7] Drivers: hv: vmbus: add APIs to register callbacks to process hvsock connection

2015-07-21 Thread Dexuan Cui
With the two APIs supplied by the VMBus driver, the coming net/hvsock
driver can register two callbacks to be notified when the host offers a
new hvsock connection and when the host is closing an existing hvsock
connection.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/Makefile   |  4 ++-
 drivers/hv/channel_mgmt.c |  9 ++
 drivers/hv/hvsock_callbacks.c | 71 +++
 include/linux/hyperv.h| 10 ++
 4 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 drivers/hv/hvsock_callbacks.c

diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index 39c9b2c..ef6f8a8 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -4,5 +4,7 @@ obj-$(CONFIG_HYPERV_BALLOON)+= hv_balloon.o
 
 hv_vmbus-y := vmbus_drv.o \
 hv.o connection.o channel.o \
-channel_mgmt.o ring_buffer.o
+channel_mgmt.o ring_buffer.o \
+hvsock_callbacks.o
+
 hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_fcopy.o hv_utils_transport.o
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 7018c53..a8b1e61 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -300,6 +300,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
return;
}
 
+   if (is_hvsock_channel(newchannel)) {
+   if (hvsock_process_offer(newchannel) != 0)
+   goto err_deq_chan;
+   return;
+   }
+
/*
 * Start the process of binding this offer to the driver
 * We need to set the DeviceObject field before calling
@@ -564,7 +570,10 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
vmbus_device_unregister(channel->device_obj);
put_device(dev);
}
+   } else if (is_hvsock_channel(channel)) {
+   hvsock_process_offer_rescind(channel);
} else {
+   /* it is a sub-channel. */
hv_process_channel_removal(channel,
channel->offermsg.child_relid);
}
diff --git a/drivers/hv/hvsock_callbacks.c b/drivers/hv/hvsock_callbacks.c
new file mode 100644
index 000..28f7b75
--- /dev/null
+++ b/drivers/hv/hvsock_callbacks.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2015, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/hyperv.h>
+
+/* We should hold the mutex when getting/setting the function pointers */
+static DEFINE_MUTEX(hvsock_cb_mutex);
+static int (*__process_offer)(struct vmbus_channel *channel);
+static void (*__process_offer_rescind)(struct vmbus_channel *channel);
+
+int hvsock_process_offer(struct vmbus_channel *channel)
+{
+   int ret = -ENODEV;
+
+   mutex_lock(&hvsock_cb_mutex);
+
+   if (__process_offer != NULL)
+   ret = __process_offer(channel);
+
+   mutex_unlock(&hvsock_cb_mutex);
+
+   return ret;
+}
+
+void hvsock_process_offer_rescind(struct vmbus_channel *channel)
+{
+   mutex_lock(&hvsock_cb_mutex);
+
+   if (__process_offer_rescind != NULL)
+   __process_offer_rescind(channel);
+   else
+   hv_process_channel_removal(channel,
+   channel->offermsg.child_relid);
+
+   mutex_unlock(&hvsock_cb_mutex);
+}
+
+void vmbus_register_hvsock_callbacks(
+   int (*process_offer)(struct vmbus_channel *),
+   void (*process_offer_rescind)(struct vmbus_channel *))
+{
+   mutex_lock(&hvsock_cb_mutex);
+
+   __process_offer = process_offer;
+   __process_offer_rescind = process_offer_rescind;
+
+   mutex_unlock(&hvsock_cb_mutex);
+}
+EXPORT_SYMBOL_GPL(vmbus_register_hvsock_callbacks);
+
+void vmbus_unregister_hvsock_callbacks(void)
+{
+   mutex_lock(&hvsock_cb_mutex);
+
+   __process_offer = NULL;
+   __process_offer_rescind = NULL;
+
+   mutex_unlock(&hvsock_cb_mutex);
+}
+EXPORT_SYMBOL_GPL(vmbus_unregister_hvsock_callbacks);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index c8e27da..fda9790 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1269,6 +1269,16 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+
+extern int hvsock_process_offer(struct vmbus_channel *channel);
+extern void hvsock_process_offer_rescind(struct vmbus_channel 

Re: [PATCH v2 0/2] sctp: fix src address selection if using secondary address

2015-07-21 Thread David Miller
From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
Date: Fri, 17 Jul 2015 12:34:16 -0300

 This series improves the way SCTP chooses its src address so that the
 chosen one will always belong to the interface being used for output.
 
 v1->v2:
  - split out the refactoring from the fix itself
  - Doing a full reverse routing as in v1 is not necessary. Only looking
for the interface that has the address and comparing its number is
enough.

Series applied, thanks.
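The v2 approach quoted above skips a full reverse route lookup: it finds the interface that owns the candidate source address and compares its index with the output interface. A minimal sketch of that check, using a hypothetical (address, ifindex) table in place of the kernel's per-device address lists; the names are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (address, ifindex) pair, as a per-device address list would provide. */
struct addr_entry {
	uint32_t addr;   /* IPv4 address, host byte order for simplicity */
	int ifindex;
};

/* Return the ifindex that owns addr, or 0 if no interface has it. */
static int owner_ifindex(const struct addr_entry *tbl, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == addr)
			return tbl[i].ifindex;
	return 0;
}

/* Accept a candidate source address only if it lives on the output interface. */
static int src_matches_oif(const struct addr_entry *tbl, size_t n,
			   uint32_t saddr, int oif)
{
	int owner = owner_ifindex(tbl, n, saddr);

	return owner != 0 && owner == oif;
}
```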


[PATCH iproute2] man ss: Fix explanation when no options specified

2015-07-21 Thread Vadim Kochan
From: Vadim Kochan vadi...@gmail.com

By default, ss actually dumps not only TCP sockets but sockets of any
kind (TCP/UDP/UNIX) that are in the ESTABLISHED state.

Signed-off-by: Vadim Kochan vadi...@gmail.com
Reported-by: Miha Marolt mi...@beyondsemi.com
---
 man/man8/ss.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ss.8 b/man/man8/ss.8
index b7fbaef..6afbabb 100644
--- a/man/man8/ss.8
+++ b/man/man8/ss.8
@@ -13,7 +13,7 @@ It can display more TCP and state informations than other tools.
 
 .SH OPTIONS
 When no option is used ss displays a list of 
-open non-listening TCP sockets that have established connection.
+open non-listening sockets (e.g. TCP/UNIX/UDP) that have established connection.
 .TP
 .B \-h, \-\-help
 Show summary of options.
-- 
2.4.2



Re: mwifiex: usb: Fix double add error when submitting rx urb

2015-07-21 Thread Kalle Valo

 There is an error that can occur where the driver adds the same URB to the
 USB submission list twice.
 This happens because mwifiex_usb_submit_rem_rx can submit packets at the
 same time as an rx urb complete callback.
 This causes list corruption and is fixed by not setting the skb to NULL
 when submitting an rx packet.
 
 [   84.461242] WARNING: CPU: 1 PID: 748 at lib/list_debug.c:36 __list_add+0xcb/0xd0()
 [   84.461245] list_add double add: new=8800c92b0c50, prev=8800c92b0c50, next=8800ced6c430.
 [   84.461247] Modules linked in: rfcomm fuse cmac nf_conntrack_netbios_ns 
 nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack 
 ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables 
 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle 
 ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat 
 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bnep 
 iptable_mangle iptable_security iptable_raw btusb btintel bluetooth 
 mwifiex_usb mwifiex x86_pkg_temp_thermal cfg80211 coretemp r8712u(C) 
 kvm_intel kvm hid_sensor_als hid_sensor_incl_3d hid_sensor_rotation 
 hid_sensor_magn_3d hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_trigger 
 hid_sensor_iio_common industrialio_triggered_buffer kfifo_buf rfkill iTCO_wdt 
 industrialio iTCO_vendor_support
 [   84.461316]  crc32_pclmul crc32c_intel ghash_clmulni_intel microcode 
 snd_hda_codec_realtek vfat snd_hda_codec_generic fat snd_hda_codec_hdmi 
 snd_hda_intel snd_hda_controller uvcvideo snd_hda_codec videobuf2_vmalloc 
 videobuf2_memops snd_hwdep videobuf2_core snd_hda_core joydev v4l2_common 
 videodev hid_sensor_hub snd_seq hid_multitouch media snd_seq_device snd_pcm 
 snd_timer mei_me snd i2c_i801 lpc_ich mei soundcore tpm_infineon tpm_tis tpm 
 i2c_hid i2c_designware_platform i2c_designware_core nfsd auth_rpcgss nfs_acl 
 lockd grace sunrpc sch_fq_codel i915 i2c_algo_bit drm_kms_helper drm xhci_pci 
 xhci_hcd ehci_pci sd_mod ehci_hcd video
 [   84.461383] CPU: 1 PID: 748 Comm: kworker/u9:0 Tainted: G C  4.1.0-rc5+ #163
 [   84.461386] Hardware name: Microsoft Corporation Surface Pro 2/Surface Pro 2, BIOS 2.05.0250 04/10/2015
 [   84.461396] Workqueue: MWIFIEX_RX_WORK_QUEUE mwifiex_rx_work_queue [mwifiex]
 [   84.461399]  81a8150e 8801174cf8e8 817df830
 [   84.461405]  8801174cf938 8801174cf928 810a54ba 8800c86bd750
 [   84.461410]  8800c92b0c50 8800c92b0c50 8800ced6c430 88010c057178
 [   84.461416] Call Trace:
 [   84.461421]  [817df830] dump_stack+0x4f/0x7b
 [   84.461428]  [810a54ba] warn_slowpath_common+0x8a/0xc0
 [   84.461432]  [810a5536] warn_slowpath_fmt+0x46/0x50
 [   84.461436]  [814109fb] __list_add+0xcb/0xd0
 [   84.461442]  [815c551a] ? usb_hcd_link_urb_to_ep+0x2a/0xa0
 [   84.461446]  [815c5570] usb_hcd_link_urb_to_ep+0x80/0xa0
 [   84.461459]  [a004318a] prepare_transfer+0xaa/0x130 [xhci_hcd]
 [   84.461470]  [a0044cf7] xhci_queue_bulk_tx+0xb7/0x7a0 [xhci_hcd]
 [   84.461480]  [a003b67f] ? xhci_urb_enqueue+0x50f/0x660 [xhci_hcd]
 [   84.461489]  [a003b67f] ? xhci_urb_enqueue+0x50f/0x660 [xhci_hcd]
 [   84.461498]  [a003b735] xhci_urb_enqueue+0x5c5/0x660 [xhci_hcd]
 [   84.461503]  [815c7ad3] usb_hcd_submit_urb+0x93/0xa70
 [   84.461507]  [8168dde8] ? __alloc_skb+0x78/0x1f0
 [   84.461511]  [8168d301] ? __kmalloc_reserve.isra.26+0x31/0x90
 [   84.461515]  [8168ddbc] ? __alloc_skb+0x4c/0x1f0
 [   84.461519]  [8168ddfc] ? __alloc_skb+0x8c/0x1f0
 [   84.461523]  [8168badd] ? skb_dequeue+0x5d/0x80
 [   84.461527]  [815c987e] usb_submit_urb+0x42e/0x5f0
 [   84.461531]  [816931d9] ? __alloc_rx_skb+0x39/0x100
 [   84.461536]  [a05aa372] mwifiex_usb_submit_rx_urb+0xb2/0x170 
 [mwifiex_usb]
 [   84.461542]  [a05aa5f5] mwifiex_usb_submit_rem_rx_urbs+0x45/0x50 
 [mwifiex_usb]
 [   84.461550]  [a07094be] mwifiex_rx_work_queue+0x10e/0x140 
 [mwifiex]
 [   84.461556]  [810c4429] process_one_work+0x229/0x890
 [   84.461559]  [810c438c] ? process_one_work+0x18c/0x890
 [   84.461565]  [810c4ae3] worker_thread+0x53/0x470
 [   84.461569]  [810c4a90] ? process_one_work+0x890/0x890
 [   84.461572]  [810cb162] kthread+0xf2/0x110
 [   84.461577]  [811031ad] ? trace_hardirqs_on+0xd/0x10
 [   84.461581]  [810cb070] ? kthread_create_on_node+0x230/0x230
 [   84.461586]  [817e9662] ret_from_fork+0x42/0x70
 [   84.461590]  [810cb070] ? kthread_create_on_node+0x230/0x230
 [   84.461593] ---[ end trace 65103af5e6fb3444 ]---
 
 Signed-off-by: Reyad Attiyat reyad.atti...@gmail.com
 Acked-by: Amitkumar Karwar akar...@marvell.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Re: [PATCH V3 3/7] Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the r/w-ability

2015-07-21 Thread Vitaly Kuznetsov
Dexuan Cui de...@microsoft.com writes:

 This will be used by the coming net/hvsock driver.

 Signed-off-by: Dexuan Cui de...@microsoft.com
 ---
  drivers/hv/channel.c  | 133 
 ++
  drivers/hv/hyperv_vmbus.h |   4 ++
  drivers/hv/ring_buffer.c  |  14 +
  include/linux/hyperv.h|  32 +++
  4 files changed, 183 insertions(+)

 diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
 index b09d1b7..ffdef03 100644
 --- a/drivers/hv/channel.c
 +++ b/drivers/hv/channel.c
 @@ -758,6 +758,53 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
  EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);

  /*
 + * vmbus_sendpacket_hvsock - Send the hvsock payload 'buf' into the vmbus
 + * ringbuffer
 + */
 +int vmbus_sendpacket_hvsock(struct vmbus_channel *channel, void *buf, u32 len)
 +{
 + struct vmpipe_proto_header pipe_hdr;
 + struct vmpacket_descriptor desc;
 + struct kvec bufferlist[4];
 + u32 packetlen_aligned;
 + u32 packetlen;
 + u64 aligned_data = 0;
 + bool signal = false;
 + int ret;
 +
 + packetlen = HVSOCK_HEADER_LEN + len;
 + packetlen_aligned = ALIGN(packetlen, sizeof(u64));
 +
 + /* Setup the descriptor */
 + desc.type = VM_PKT_DATA_INBAND;
 + /* in 8-bytes granularity */
 + desc.offset8 = sizeof(struct vmpacket_descriptor) >> 3;
 + desc.len8 = (u16)(packetlen_aligned >> 3);
 + desc.flags = 0;
 + desc.trans_id = 0;
 +
 + pipe_hdr.pkt_type = 1;
 + pipe_hdr.data_size = len;
 +
 + bufferlist[0].iov_base = &desc;
 + bufferlist[0].iov_len  = sizeof(struct vmpacket_descriptor);
 + bufferlist[1].iov_base = &pipe_hdr;
 + bufferlist[1].iov_len  = sizeof(struct vmpipe_proto_header);
 + bufferlist[2].iov_base = buf;
 + bufferlist[2].iov_len  = len;
 + bufferlist[3].iov_base = &aligned_data;
 + bufferlist[3].iov_len  = packetlen_aligned - packetlen;
 +
 + ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 4, &signal);
 +
 + if (ret == 0 && signal)
 + vmbus_setevent(channel);
 +
 + return ret;
 +}
 +EXPORT_SYMBOL_GPL(vmbus_sendpacket_hvsock);
 +
 +/*
   * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
   * packets using a GPADL Direct packet type.
   */
 @@ -978,3 +1025,89 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, void *buffer,
   return ret;
  }
  EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
 +
 +/*
 + * vmbus_recvpacket_hvsock - Receive the hvsock payload from the vmbus
 + * ringbuffer into the 'buffer'.
 + */
 +int vmbus_recvpacket_hvsock(struct vmbus_channel *channel, void *buffer,
 + u32 bufferlen, u32 *buffer_actual_len)
 +{
 + struct vmpipe_proto_header *pipe_hdr;
 + struct vmpacket_descriptor *desc;
 + u32 packet_len, payload_len;
 + bool signal = false;
 + int ret;
 +
 + *buffer_actual_len = 0;
 +
 + if (bufferlen < HVSOCK_HEADER_LEN)
 + return -ENOBUFS;
 +
 + ret = hv_ringbuffer_peek(&channel->inbound, buffer,
 +  HVSOCK_HEADER_LEN);
 + if (ret != 0)
 + return 0;

I'd suggest you do something like

if (ret == -EAGAIN)
return 0;
else if (ret)
return ret;

to make it future-proof (e.g. when a new error is returned by
hv_ringbuffer_peek). And a comment would also be useful as it is unclear
why we silence errors here.
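A minimal userspace sketch of the pattern suggested above. `peek_header()` is a hypothetical stand-in for `hv_ringbuffer_peek()`, assumed here to return 0 on success, -EAGAIN when no complete header is available yet, and another negative errno on failure; `recv_sketch()` is likewise illustrative and is not the patch's function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for hv_ringbuffer_peek(): the caller scripts
 * the return value so every branch below can be exercised. */
static int fake_peek_result;

static int peek_header(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return fake_peek_result;
}

/* Returns 0 with *got = 0 when nothing is pending, a negative errno on
 * a real failure, and 0 with *got = 1 when a header was peeked. */
static int recv_sketch(void *buf, size_t len, int *got)
{
	int ret;

	*got = 0;
	ret = peek_header(buf, len);
	if (ret == -EAGAIN)
		return 0;	/* empty ring: not an error, nothing to read yet */
	else if (ret)
		return ret;	/* any other error is propagated to the caller */

	*got = 1;
	return 0;
}
```

Only the "nothing to read" case is silenced; any error the peek routine might return in future is still propagated, which is the future-proofing the review asks for.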

 +
 + desc = (struct vmpacket_descriptor *)buffer;
 + packet_len = desc->len8 << 3;
 + if (desc->type != VM_PKT_DATA_INBAND ||
 + desc->offset8 != (sizeof(*desc) / 8) ||
 + packet_len < HVSOCK_HEADER_LEN)
 + return -EIO;
 +
 + pipe_hdr = (struct vmpipe_proto_header *)(desc + 1);
 + payload_len = pipe_hdr->data_size;
 +
 + if (pipe_hdr->pkt_type != 1 || payload_len == 0)
 + return -EIO;
 +
 + if (HVSOCK_PKT_LEN(payload_len) != packet_len + PREV_INDICES_LEN)
 + return -EIO;
 +
 + if (bufferlen < packet_len - HVSOCK_HEADER_LEN)
 + return -ENOBUFS;
 +
 + /* Copy over the hvsock payload to the user buffer */
 + ret = hv_ringbuffer_read(&channel->inbound, buffer,
 +  packet_len - HVSOCK_HEADER_LEN,
 +  HVSOCK_HEADER_LEN, &signal);
 + if (ret != 0)
 + return ret;
 +
 + *buffer_actual_len = payload_len;
 +
 + if (signal)
 + vmbus_setevent(channel);
 +
 + return 0;
 +}
 +EXPORT_SYMBOL_GPL(vmbus_recvpacket_hvsock);
 +
 +/*
 + * vmbus_get_hvsock_rw_status - can the ringbuffer be read/written?
 + */
 +void vmbus_get_hvsock_rw_status(struct vmbus_channel *channel,
 + bool *can_read, bool *can_write)
 +{
 + u32 avl_read_bytes, avl_write_bytes, dummy;
 +
 + if (can_read != NULL) {
 + hv_get_ringbuffer_available_space(&channel->inbound,
 +   &avl_read_bytes,
 +

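The length fields built in vmbus_sendpacket_hvsock() follow a simple rule: the packet is padded to a multiple of 8 bytes and its length is stored in 8-byte units. A small sketch of that arithmetic, with the header size passed in as a parameter because the value of HVSOCK_HEADER_LEN is not shown in the patch (the 24 used below is an assumption for illustration only):

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), in the
 * spirit of the kernel's ALIGN() macro. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1u)) & ~((a) - 1u))

struct hvsock_len_sketch {
	uint32_t packetlen;         /* header + payload, unpadded */
	uint32_t packetlen_aligned; /* rounded up to 8 bytes */
	uint16_t len8;              /* length in 8-byte units, as in desc.len8 */
	uint32_t pad;               /* zero bytes appended after the payload */
};

/* header_len stands in for HVSOCK_HEADER_LEN; its real value is not
 * shown in the patch, so it is a parameter here. */
static struct hvsock_len_sketch hvsock_lengths(uint32_t header_len,
					       uint32_t payload_len)
{
	struct hvsock_len_sketch r;

	r.packetlen = header_len + payload_len;
	r.packetlen_aligned = ALIGN_UP(r.packetlen, 8u);
	r.len8 = (uint16_t)(r.packetlen_aligned >> 3);
	r.pad = r.packetlen_aligned - r.packetlen;
	return r;
}
```

For an assumed 24-byte header and a 5-byte payload this gives packetlen = 29, packetlen_aligned = 32, len8 = 4 and 3 bytes of padding, which is the amount bufferlist[3] takes from the zeroed aligned_data.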
Re: [PATCH net-next 0/4] cxgb4 DCB updates

2015-07-21 Thread David Miller
From: Anish Bhatt an...@chelsio.com
Date: Fri, 17 Jul 2015 13:12:29 -0700

 The following patchset covers changes to work better with the userspace
 tools cgdcbxd and cgrulesengd and improves firmware support for
 host-managed mode.
 
 Also exports traffic class information that was previously not being
 exported via dcbnl_ops and unifies how app selector information is passed
 to firmware.

Series applied, thank you.


Re: [PATCH net 0/3] Couple of classifier fixes

2015-07-21 Thread David Miller
From: Daniel Borkmann dan...@iogearbox.net
Date: Fri, 17 Jul 2015 22:38:42 +0200

 This fixes a couple of panics in the form of (analogous for
 cls_flow{,er}):
 ...
 I've split them into 3 patches, so they can be backported more easily
 when needed.

Series applied and queued up for -stable, thanks.


[PATCH v2 net] inet: frags: fix defragmented packet's IP header for af_packet

2015-07-21 Thread Eric Dumazet
From: Edward Hyunkoo Jee ed...@google.com

When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.

However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers. 

Also, IPv4 checksum is not corrected after reassembly.
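The patch recomputes the header checksum with ip_send_check(iph) after reassembly. As a userspace illustration only, not the kernel's implementation, the RFC 1071 header checksum that such a call recomputes can be sketched as:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over an IPv4 header given as raw bytes
 * in network order: ones' complement sum of 16-bit words, folded and
 * inverted. */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, size_t ihl_bytes)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < ihl_bytes; i += 2)
		sum += (uint32_t)hdr[i] << 8 | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Rough analogue of ip_send_check(): zero the checksum field
 * (bytes 10-11), then store the recomputed value big-endian. */
static void ipv4_fix_csum(uint8_t *hdr, size_t ihl_bytes)
{
	uint16_t c;

	hdr[10] = 0;
	hdr[11] = 0;
	c = ipv4_hdr_csum(hdr, ihl_bytes);
	hdr[10] = (uint8_t)(c >> 8);
	hdr[11] = (uint8_t)(c & 0xff);
}
```

The sample header in the check below is a common worked example whose correct checksum is 0xb861; summing a header that already carries a valid checksum folds to zero.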

Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.")
Signed-off-by: Edward Hyunkoo Jee ed...@google.com
Signed-off-by: Eric Dumazet eduma...@google.com
Cc: Willem de Bruijn will...@google.com
Cc: Jerry Chu hk...@google.com
---
 net/ipv4/ip_fragment.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a50dc6d408d1..31f71b15cfba 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -351,7 +351,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
ihl = ip_hdrlen(skb);
 
/* Determine the position of this fragment. */
-   end = offset + skb->len - ihl;
+   end = offset + skb->len - skb_network_offset(skb) - ihl;
err = -EINVAL;
 
/* Is this the final fragment? */
@@ -381,7 +381,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
goto err;
 
err = -ENOMEM;
-   if (!pskb_pull(skb, ihl))
+   if (!pskb_pull(skb, skb_network_offset(skb) + ihl))
goto err;
 
err = pskb_trim_rcsum(skb, end - offset);
@@ -641,6 +641,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
iph->frag_off = 0;
}
 
+   ip_send_check(iph);
+
IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
qp->q.fragments = NULL;
qp->q.fragments_tail = NULL;

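To see what the first hunk corrects, consider a hypothetical fragment carrying 1480 payload bytes at offset 1480, seen with a 14-byte Ethernet header in front of a 20-byte IP header. The sketch mirrors only the arithmetic of the fixed line; `frag_end()` and its integer parameters are illustrative, not kernel API.

```c
#include <assert.h>

/* Byte position just past this fragment's payload in the reassembled
 * datagram. skb_len covers everything in the buffer, network_offset is
 * the number of L2 bytes in front of the IP header (0 when none), and
 * ihl is the IP header length in bytes. */
static int frag_end(int offset, int skb_len, int network_offset, int ihl)
{
	return offset + skb_len - network_offset - ihl;
}
```

With the L2 header accounted for (network_offset = 14) the fragment ends at 2960; ignoring it, as the old expression effectively did for such packets, yields 2974, 14 bytes past the real end. The same offset is why the second hunk pulls skb_network_offset(skb) + ihl bytes instead of ihl.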



Re: [5/6] wireless: cw1200: Remove redundant spi driver bus initialization

2015-07-21 Thread Kalle Valo

 In ancient times it was necessary to manually initialize the bus
 field of an spi_driver to spi_bus_type. These days this is done in
 spi_register_driver(), so we can drop the manual assignment.
 
 Signed-off-by: Antonio Borneo borneo.anto...@gmail.com
 To: Solomon Peachy pi...@shaftnet.org
 To: Kalle Valo kv...@codeaurora.org
 To: linux-wirel...@vger.kernel.org
 To: netdev@vger.kernel.org
 Cc: linux-ker...@vger.kernel.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


Re: [RFC,v3,12/12] fsl/fman: Add FMan MAC driver

2015-07-21 Thread Joakim Tjernlund
On Mon, 2015-07-20 at 13:33 +, Madalin-Cristian Bucur wrote:
  -----Original Message-----
  From: Joakim Tjernlund [mailto:joakim.tjernl...@transmode.se]
  Sent: Monday, July 20, 2015 3:57 PM
  To: netdev@vger.kernel.org; Liberman Igal-B31950; Bucur Madalin-Cristian-
  B32716
  Cc: linuxppc-...@lists.ozlabs.org; linux-ker...@vger.kernel.org
  Subject: Re: [RFC,v3,12/12] fsl/fman: Add FMan MAC driver
  
  On Mon, 2015-07-20 at 12:28 +, Madalin-Cristian Bucur wrote:
   Hi Joakim,
   
   It seems we just need to align to the API introduced by Thomas Petazzoni
   in 3be2a49e.
   
   Madalin
  
  So it seems, any idea when the next spin will be ready?
  Could you also push it onto http://git.freescale.com/git/cgit.cgi/ppc/upstream/linux.git/ ?
  
   Jocke
 
 We're working on addressing all the feedback received to date (you've just
 added a bit more), then we'll re-submit the FMan driver together with the
 DPAA Ethernet driver. A push to the public git will also take place after
 the patches are sent for review.

Hi again

Now I have also had to use PHY-less (aka fixed PHY) links. I had to hack AN off
for all such links to get a 1000 Mbit/s SGMII link working:
--- a/drivers/net/ethernet/freescale/fman/mac/memac.c
+++ b/drivers/net/ethernet/freescale/fman/mac/memac.c
@@ -80,7 +80,10 @@ static void setup_sgmii_internal_phy(struct memac_t *p_memac, uint8_t phy_addr)
   ENET_SPEED_1000);
 
/* SGMII mode + AN enable */
-   tmp_reg16 = PHY_SGMII_IF_MODE_AN | PHY_SGMII_IF_MODE_SGMII;
+   //tmp_reg16 = PHY_SGMII_IF_MODE_AN | PHY_SGMII_IF_MODE_SGMII;
+   tmp_reg16 = PHY_SGMII_IF_MODE_AN | PHY_SGMII_IF_MODE_SGMII | 0x8;
+   if (p_memac->mac_id != 0)
+   tmp_reg16 &= ~PHY_SGMII_IF_MODE_AN;
memac_mii_write_phy_reg(p_memac, phy_addr, 0x14, tmp_reg16);
 
/* Device ability according to SGMII specification */
@@ -104,6 +107,8 @@ static void setup_sgmii_internal_phy(struct memac_t *p_memac, uint8_t phy_addr)
 
/* Restart AN */
tmp_reg16 = PHY_SGMII_CR_DEF_VAL | PHY_SGMII_CR_RESET_AN;
+   if (p_memac->mac_id != 0)
+   tmp_reg16 &= ~0x1000;
memac_mii_write_phy_reg(p_memac, phy_addr, 0x0, tmp_reg16);
 
/* Restore original enet mode */


Could you please fix this too?

 Jocke
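The hack clears register bits with a read-modify-write of a 16-bit value (`tmp_reg16 &= ~BIT`). A sketch of that pattern, where the bit values are hypothetical placeholders (the real PHY_SGMII_IF_MODE_* constants live in the FMan driver and are not shown here) and where keying the decision on an explicit fixed-link flag, rather than on mac_id != 0 as the hack does, is an assumption about what a cleaner fix might look like:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define SKETCH_IF_MODE_AN     0x0002u
#define SKETCH_IF_MODE_SGMII  0x0001u

/* Build the IF_MODE value, clearing the auto-negotiation bit for links
 * that have no PHY to negotiate with (e.g. fixed links). */
static uint16_t if_mode_value(int fixed_link)
{
	uint16_t v = SKETCH_IF_MODE_AN | SKETCH_IF_MODE_SGMII;

	if (fixed_link)
		v &= (uint16_t)~SKETCH_IF_MODE_AN;
	return v;
}
```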


[PATCH iproute2 v2] ss: Fix crash when dump stats from /proc with '-p'

2015-07-21 Thread Vadim Kochan
From: Vadim Kochan vadi...@gmail.com

It partially reverts:

ec4d0d8a9def35 ("ss: Replace unixstat struct by new sockstat struct")

but adds a few fields (name & peer_name) from the removed unixstat struct to
the sockstat struct, to make it easy to restore the original code.

Fixes: ec4d0d8a9def35 ("ss: Replace unixstat struct by new sockstat struct")
Reported-by: Marc Dietrich marvi...@gmx.de
Signed-off-by: Vadim Kochan vadi...@gmail.com
---
v2:
Get rid of check for NULL before free(...) func.

 misc/ss.c | 71 ---
 1 file changed, 31 insertions(+), 40 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 03f92fa..320 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -676,18 +676,6 @@ static int get_slabstat(struct slabstat *s)
return 0;
 }
 
-static inline void sock_addr_set_str(inet_prefix *prefix, char **ptr)
-{
-memcpy(prefix->data, ptr, sizeof(char *));
-}
-
-static inline char *sock_addr_get_str(const inet_prefix *prefix)
-{
-char *tmp ;
-memcpy(&tmp, prefix->data, sizeof(char *));
-return tmp;
-}
-
 static unsigned long long cookie_sk_get(const uint32_t *cookie)
 {
return (((unsigned long long)cookie[1] << 31) << 1) | cookie[0];
@@ -739,6 +727,8 @@ struct sockstat
int refcnt;
unsigned intiface;
unsigned long long  sk;
+   char *name;
+   char *peer_name;
 };
 
 struct dctcpstat
@@ -1063,9 +1053,9 @@ static int inet2_addr_match(const inet_prefix *a, const inet_prefix *p,
 
 static int unix_match(const inet_prefix *a, const inet_prefix *p)
 {
-   char *addr = sock_addr_get_str(a);
-   char *pattern = sock_addr_get_str(p);
-
+   char *addr, *pattern;
+   memcpy(&addr, a->data, sizeof(addr));
+   memcpy(&pattern, p->data, sizeof(pattern));
if (pattern == NULL)
return 1;
if (addr == NULL)
@@ -1081,7 +1071,8 @@ static int run_ssfilter(struct ssfilter *f, struct sockstat *s)
 static int low, high=65535;
 
if (s->local.family == AF_UNIX) {
-   char *p = sock_addr_get_str(&s->local);
+   char *p;
+   memcpy(&p, s->local.data, sizeof(p));
return p == NULL || (p[0] == '@' && strlen(p) == 6 &&
 strspn(p+1, "0123456789abcdef") == 5);
}
@@ -1401,7 +1392,7 @@ void *parse_hostcond(char *addr, bool is_port)
addr+=5;
p = strdup(addr);
a.addr.bitlen = 8*strlen(p);
-   sock_addr_set_str(&a.addr, &p);
+   memcpy(a.addr.data, &p, sizeof(p));
fam = AF_UNIX;
goto out;
}
@@ -2508,12 +2499,9 @@ static void unix_list_free(struct sockstat *list)
 {
while (list) {
struct sockstat *s = list;
-   char *name = sock_addr_get_str(&s->local);
-
list = list->next;
 
-   if (name)
-   free(name);
+   free(s->name);
free(s);
}
 }
@@ -2556,7 +2544,7 @@ static bool unix_use_proc(void)
 static void unix_stats_print(struct sockstat *list, struct filter *f)
 {
struct sockstat *s;
-   char *local, *peer;
+   char *peer;
char *ctx_buf = NULL;
bool use_proc = unix_use_proc();
char port_name[30] = {};
@@ -2567,8 +2555,9 @@ static void unix_stats_print(struct sockstat *list, struct filter *f)
if (unix_type_skip(s, f))
continue;
 
-   local = sock_addr_get_str(s-local);
-   peer  = *;
+   peer = *;
+   if (s-peer_name)
+   peer = s-peer_name;
 
if (s-rport  use_proc) {
struct sockstat *p;
@@ -2581,24 +2570,26 @@ static void unix_stats_print(struct sockstat *list, struct filter *f)
if (!p) {
peer = "?";
} else {
-   peer = sock_addr_get_str(&p->local);
-   peer = peer ? : "*";
+   peer = p->name ? : "*";
}
}
 
if (use_proc && f->f) {
+   struct sockstat st;
+   st.local.family = AF_UNIX;
+   st.remote.family = AF_UNIX;
+   memcpy(st.local.data, &s->name, sizeof(s->name));
if (strcmp(peer, "*") == 0)
-   memset(s->remote.data, 0, sizeof(char *));
+   memset(st.remote.data, 0, sizeof(peer));
else
-   sock_addr_set_str(&s->remote, &peer);
-
-   if (run_ssfilter(f->f, s) == 0)
+   memcpy(st.remote.data, &peer, sizeof(peer));
+   if 

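Both the removed helpers and their inline replacements rely on one trick: the inet_prefix data array holds the value of a `char *`, so the pointer variable itself is copied with memcpy(&ptr, ...), whereas passing `ptr` would copy through an uninitialised pointer. A sketch with a hypothetical struct standing in for inet_prefix (only the data array matters here); the NULL check after zeroing assumes the usual all-bits-zero null pointer, as the ss code also does:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for iproute2's inet_prefix. */
struct prefix_sketch {
	uint32_t data[8];
};

/* Store the pointer value itself (not the string it points to) in the
 * byte array, as the removed sock_addr_set_str() helper did. */
static void prefix_set_str(struct prefix_sketch *p, const char *s)
{
	memcpy(p->data, &s, sizeof(s));
}

/* Read the pointer value back out of the byte array. */
static const char *prefix_get_str(const struct prefix_sketch *p)
{
	const char *s;

	memcpy(&s, p->data, sizeof(s));
	return s;
}
```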
Re: Several races in usbnet module (kernel 4.1.x)

2015-07-21 Thread Oliver Neukum
On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:
 Races on dev->rx_qlen. Reproduced these by repeatedly changing MTU
 (1500 -> 1400) while downloading large files.

Hi,

I don't see how it matters much. The number of buffers is just
an optimization. As long as it eventually is corrected I don't
see harm.

Regards
Oliver





[PATCH net-next 4/9] sfc: add output flag decoding to efx_mcdi_set_workaround

2015-07-21 Thread Edward Cree
From: Daniel Pieczko dpiec...@solarflare.com

The initial use of this will be to check a flag reporting if an FLR was
performed on other functions when enabling cascaded multicast filters.

Signed-off-by: Edward Cree ec...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c |  7 ++++---
 drivers/net/ethernet/sfc/mcdi.c | 22 +++++++++++++++++++---
 drivers/net/ethernet/sfc/mcdi.h |  3 ++-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 4407117..2b93f63 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -387,7 +387,7 @@ static int efx_ef10_probe(struct efx_nic *efx)
 * First try to enable it, then if we get EPERM, just
 * ask if it's already enabled
 */
-   rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG35388, true);
+   rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG35388, true, NULL);
if (rc == 0) {
nic_data->workaround_35388 = true;
} else if (rc == -EPERM) {
@@ -2291,8 +2291,9 @@ static int efx_ef10_ev_init(struct efx_channel *channel)
 
if (implemented & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807 &&
!nic_data->workaround_26807) {
-   rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG26807,
-true);
+   rc = efx_mcdi_set_workaround(efx,
+MC_CMD_WORKAROUND_BUG26807,
+true, NULL);
if (!rc)
nic_data->workaround_26807 = true;
else if (rc == -EPERM)
diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 58232e7..98d172b 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -1779,15 +1779,31 @@ int efx_mcdi_wol_filter_reset(struct efx_nic *efx)
return rc;
 }
 
-int efx_mcdi_set_workaround(struct efx_nic *efx, u32 type, bool enabled)
+int efx_mcdi_set_workaround(struct efx_nic *efx, u32 type, bool enabled,
+   unsigned int *flags)
 {
MCDI_DECLARE_BUF(inbuf, MC_CMD_WORKAROUND_IN_LEN);
+   MCDI_DECLARE_BUF(outbuf, MC_CMD_WORKAROUND_EXT_OUT_LEN);
+   size_t outlen;
+   int rc;
 
BUILD_BUG_ON(MC_CMD_WORKAROUND_OUT_LEN != 0);
MCDI_SET_DWORD(inbuf, WORKAROUND_IN_TYPE, type);
MCDI_SET_DWORD(inbuf, WORKAROUND_IN_ENABLED, enabled);
-   return efx_mcdi_rpc(efx, MC_CMD_WORKAROUND, inbuf, sizeof(inbuf),
-   NULL, 0, NULL);
+   rc = efx_mcdi_rpc(efx, MC_CMD_WORKAROUND, inbuf, sizeof(inbuf),
+ outbuf, sizeof(outbuf), &outlen);
+   if (rc)
+   return rc;
+
+   if (!flags)
+   return 0;
+
+   if (outlen >= MC_CMD_WORKAROUND_EXT_OUT_LEN)
+   *flags = MCDI_DWORD(outbuf, WORKAROUND_EXT_OUT_FLAGS);
+   else
+   *flags = 0;
+
+   return 0;
 }
 
 int efx_mcdi_get_workarounds(struct efx_nic *efx, unsigned int *impl_out,
diff --git a/drivers/net/ethernet/sfc/mcdi.h b/drivers/net/ethernet/sfc/mcdi.h
index 1838afe..025d504 100644
--- a/drivers/net/ethernet/sfc/mcdi.h
+++ b/drivers/net/ethernet/sfc/mcdi.h
@@ -346,7 +346,8 @@ void efx_mcdi_mac_pull_stats(struct efx_nic *efx);
 bool efx_mcdi_mac_check_fault(struct efx_nic *efx);
 enum reset_type efx_mcdi_map_reset_reason(enum reset_type reason);
 int efx_mcdi_reset(struct efx_nic *efx, enum reset_type method);
-int efx_mcdi_set_workaround(struct efx_nic *efx, u32 type, bool enabled);
+int efx_mcdi_set_workaround(struct efx_nic *efx, u32 type, bool enabled,
+   unsigned int *flags);
 int efx_mcdi_get_workarounds(struct efx_nic *efx, unsigned int *impl_out,
 unsigned int *enabled_out);
 

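The reworked efx_mcdi_set_workaround() treats `flags` as an optional output and reads the flags word only when the firmware returned the extended response. A userspace sketch of that decision logic; the 4-byte response length, the flags word at offset 0 and its little-endian decoding are assumptions standing in for MC_CMD_WORKAROUND_EXT_OUT_LEN and MCDI_DWORD(), and `parse_workaround_reply()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the extended reply carrying a 32-bit flags word. */
#define SKETCH_EXT_OUT_LEN 4

static int parse_workaround_reply(int rc, const uint8_t *outbuf,
				  size_t outlen, unsigned int *flags)
{
	if (rc)
		return rc;	/* RPC failed: report it unchanged */
	if (!flags)
		return 0;	/* caller does not want the flags */
	if (outlen >= SKETCH_EXT_OUT_LEN)
		*flags = (unsigned int)outbuf[0] |
			 (unsigned int)outbuf[1] << 8 |
			 (unsigned int)outbuf[2] << 16 |
			 (unsigned int)outbuf[3] << 24;
	else
		*flags = 0;	/* short reply from older firmware: no flags */
	return 0;
}
```

Passing NULL, as the updated call sites in ef10.c do, skips the parsing entirely, and a short reply reports no flags rather than reading past the received data.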


[PATCH net-next 5/9] sfc: warn if other functions have been reset by MCFW

2015-07-21 Thread Edward Cree
From: Daniel Pieczko dpiec...@solarflare.com

When enabling the workaround for cascaded multicast filters, the MC
 can reset other functions if they have already inserted filters.
 In that case the workaround is still enabled, but we print an info
 message to the log recording that other functions had to be reset.

As other functions were reset, the MC will have incremented its boot
 count, so also increment the warm_boot_count on the function which
 enabled the workaround, as that function won't have received an MC
 reboot event and does not need to reset.

Signed-off-by: Edward Cree ec...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 2b93f63..18d6388 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -2291,13 +2291,24 @@ static int efx_ef10_ev_init(struct efx_channel *channel)
 
if (implemented & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807 &&
!nic_data->workaround_26807) {
+   unsigned int flags;
+
rc = efx_mcdi_set_workaround(efx,
 MC_CMD_WORKAROUND_BUG26807,
-true, NULL);
-   if (!rc)
+true, &flags);
+
+   if (!rc) {
+   if (flags &
+   1 << MC_CMD_WORKAROUND_EXT_OUT_FLR_DONE_LBN) {
+   netif_info(efx, drv, efx->net_dev,
+  "other functions on NIC have been reset\n");
+   /* MC's boot count has incremented */
+   ++nic_data->warm_boot_count;
+   }
nic_data->workaround_26807 = true;
-   else if (rc == -EPERM)
+   } else if (rc == -EPERM) {
rc = 0;
+   }
}
}
 

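The new check `flags & 1 << MC_CMD_WORKAROUND_EXT_OUT_FLR_DONE_LBN` relies on `<<` binding more tightly than `&` in C, so it tests a single bit. A sketch with a hypothetical bit position and a hypothetical helper mirroring the boot-count bookkeeping described in the commit message:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit position; the real
 * MC_CMD_WORKAROUND_EXT_OUT_FLR_DONE_LBN comes from the MCDI headers. */
#define SKETCH_FLR_DONE_LBN 0

/* flags & 1 << LBN already means flags & (1 << LBN); the explicit
 * parentheses are only for readability. */
static bool flr_done(unsigned int flags)
{
	return (flags & (1u << SKETCH_FLR_DONE_LBN)) != 0;
}

/* When other functions were reset, bump the cached warm boot count,
 * since this function sees no MC reboot event of its own. */
static unsigned int update_boot_count(unsigned int warm_boot_count,
				      unsigned int flags)
{
	return flr_done(flags) ? warm_boot_count + 1 : warm_boot_count;
}
```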


[PATCH net-next 3/9] sfc: cope with ENOSYS from efx_mcdi_get_workarounds()

2015-07-21 Thread Edward Cree
GET_WORKAROUNDS was only introduced in May 2014, so not all firmware
 will have it; call sites need to handle ENOSYS.
In this case we're probing the bug26807 workaround, which is not
 implemented in any firmware that doesn't have GET_WORKAROUNDS.
 So interpret ENOSYS as 'false'.

Signed-off-by: Edward Cree ec...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 33 ++++++++++++++++++++-------------
 drivers/net/ethernet/sfc/mcdi.c |  6 +++++-
 2 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 1193017..4407117 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -2277,20 +2277,27 @@ static int efx_ef10_ev_init(struct efx_channel *channel)
 
/* Successfully created event queue on channel 0 */
rc = efx_mcdi_get_workarounds(efx, &implemented, &enabled);
-   if (rc)
+   if (rc == -ENOSYS) {
+   /* GET_WORKAROUNDS was implemented before the bug26807
+* workaround, thus the latter must be unavailable in this fw
+*/
+   nic_data->workaround_26807 = false;
+   rc = 0;
+   } else if (rc) {
goto fail;
-
-   nic_data->workaround_26807 =
-   !!(enabled & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807);
-
-   if (implemented & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807 &&
-   !nic_data->workaround_26807) {
-   rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG26807,
-true);
-   if (!rc)
-   nic_data->workaround_26807 = true;
-   else if (rc == -EPERM)
-   rc = 0;
+   } else {
+   nic_data->workaround_26807 =
+   !!(enabled & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807);
+
+   if (implemented & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807 &&
+   !nic_data->workaround_26807) {
+   rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG26807,
+true);
+   if (!rc)
+   nic_data->workaround_26807 = true;
+   else if (rc == -EPERM)
+   rc = 0;
+   }
}
 
if (!rc)
diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 81640f8..58232e7 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -1816,7 +1816,11 @@ int efx_mcdi_get_workarounds(struct efx_nic *efx, unsigned int *impl_out,
return 0;
 
 fail:
-   netif_err(efx, hw, efx->net_dev, "%s: failed rc=%d\n", __func__, rc);
+   /* Older firmware lacks GET_WORKAROUNDS and this isn't especially
+* terrifying.  The call site will have to deal with it though.
+*/
+   netif_printk(efx, hw, rc == -ENOSYS ? KERN_DEBUG : KERN_ERR,
+efx->net_dev, "%s: failed rc=%d\n", __func__, rc);
return rc;
 }
 

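The call-site logic added to efx_ef10_ev_init() maps -ENOSYS to "workaround not available" instead of failing. A sketch of that decision, where `probe_workaround()` is hypothetical and its `query_rc` and `enabled_mask` parameters stand in for the return value and `enabled` output of efx_mcdi_get_workarounds():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* bit is the workaround's mask bit in the enabled set. */
static int probe_workaround(int query_rc, unsigned int enabled_mask,
			    unsigned int bit, bool *active)
{
	if (query_rc == -ENOSYS) {
		/* Firmware predates the query command, and therefore also
		 * predates the workaround: treat it as unavailable. */
		*active = false;
		return 0;
	}
	if (query_rc)
		return query_rc;	/* genuine failure */
	*active = (enabled_mask & bit) != 0;
	return 0;
}
```

The same reasoning explains the mcdi.c hunk: an -ENOSYS reply is expected on older firmware, so it is logged at debug level rather than as an error.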


[PATCH net-next 2/9] sfc: enable cascaded multicast filters in MCFW

2015-07-21 Thread Edward Cree
From: Daniel Pieczko dpiec...@solarflare.com

After creating event queue 0, check to see if the workaround is enabled,
 and enable it if necessary.  This will be called during PCI probe and
 also when coming back up after a reset.  The nic_data->workaround_26807
 flag will be used in the future to control the filter insertion behaviour
 based on this workaround.

Only the primary PF can enable this workaround, so tolerate an EPERM
 error and continue.  Otherwise, if any step in the checking and enabling
 of the workaround fails, the event queue must be removed.

We check that the workaround is implemented before trying to enable it,
 and store the current workaround setting before trying to change it.

Signed-off-by: Edward Cree ec...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 63 +
 drivers/net/ethernet/sfc/nic.h  |  2 ++
 2 files changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 605cc89..1193017 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -2197,6 +2197,29 @@ static int efx_ef10_ev_probe(struct efx_channel *channel)
GFP_KERNEL);
 }
 
+static void efx_ef10_ev_fini(struct efx_channel *channel)
+{
+   MCDI_DECLARE_BUF(inbuf, MC_CMD_FINI_EVQ_IN_LEN);
+   MCDI_DECLARE_BUF_ERR(outbuf);
+   struct efx_nic *efx = channel->efx;
+   size_t outlen;
+   int rc;
+
+   MCDI_SET_DWORD(inbuf, FINI_EVQ_IN_INSTANCE, channel->channel);
+
+   rc = efx_mcdi_rpc_quiet(efx, MC_CMD_FINI_EVQ, inbuf, sizeof(inbuf),
+ outbuf, sizeof(outbuf), &outlen);
+
+   if (rc && rc != -EALREADY)
+   goto fail;
+
+   return;
+
+fail:
+   efx_mcdi_display_error(efx, MC_CMD_FINI_EVQ, MC_CMD_FINI_EVQ_IN_LEN,
+  outbuf, outlen, rc);
+}
+
 static int efx_ef10_ev_init(struct efx_channel *channel)
 {
MCDI_DECLARE_BUF(inbuf,
@@ -2208,6 +2231,7 @@ static int efx_ef10_ev_init(struct efx_channel *channel)
struct efx_ef10_nic_data *nic_data;
bool supports_rx_merge;
size_t inlen, outlen;
+   unsigned int enabled, implemented;
dma_addr_t dma_addr;
int rc;
int i;
@@ -2248,30 +2272,33 @@ static int efx_ef10_ev_init(struct efx_channel *channel)
rc = efx_mcdi_rpc(efx, MC_CMD_INIT_EVQ, inbuf, inlen,
  outbuf, sizeof(outbuf), &outlen);
/* IRQ return is ignored */
-   return rc;
-}
-
-static void efx_ef10_ev_fini(struct efx_channel *channel)
-{
-   MCDI_DECLARE_BUF(inbuf, MC_CMD_FINI_EVQ_IN_LEN);
-   MCDI_DECLARE_BUF_ERR(outbuf);
-   struct efx_nic *efx = channel->efx;
-   size_t outlen;
-   int rc;
+   if (channel->channel || rc)
+   return rc;
 
-   MCDI_SET_DWORD(inbuf, FINI_EVQ_IN_INSTANCE, channel->channel);
+   /* Successfully created event queue on channel 0 */
+   rc = efx_mcdi_get_workarounds(efx, &implemented, &enabled);
+   if (rc)
+   goto fail;
 
-   rc = efx_mcdi_rpc_quiet(efx, MC_CMD_FINI_EVQ, inbuf, sizeof(inbuf),
- outbuf, sizeof(outbuf), &outlen);
+   nic_data->workaround_26807 =
+   !!(enabled & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807);
 
-   if (rc && rc != -EALREADY)
-   goto fail;
+   if (implemented & MC_CMD_GET_WORKAROUNDS_OUT_BUG26807 &&
+   !nic_data->workaround_26807) {
+   rc = efx_mcdi_set_workaround(efx, MC_CMD_WORKAROUND_BUG26807,
+true);
+   if (!rc)
+   nic_data->workaround_26807 = true;
+   else if (rc == -EPERM)
+   rc = 0;
+   }
 
-   return;
+   if (!rc)
+   return 0;
 
 fail:
-   efx_mcdi_display_error(efx, MC_CMD_FINI_EVQ, MC_CMD_FINI_EVQ_IN_LEN,
-  outbuf, outlen, rc);
+   efx_ef10_ev_fini(channel);
+   return rc;
 }
 
 static void efx_ef10_ev_remove(struct efx_channel *channel)
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 31ff908..0b536e2 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -506,6 +506,7 @@ enum {
  * @rx_rss_context_exclusive: Whether our RSS context is exclusive or shared
  * @stats: Hardware statistics
  * @workaround_35388: Flag: firmware supports workaround for bug 35388
+ * @workaround_26807: Flag: firmware supports workaround for bug 26807
  * @must_check_datapath_caps: Flag: @datapath_caps needs to be revalidated
  * after MC reboot
  * @datapath_caps: Capabilities of datapath firmware (FLAGS1 field of
@@ -535,6 +536,7 @@ struct efx_ef10_nic_data {
bool rx_rss_context_exclusive;
u64 stats[EF10_STAT_COUNT];
bool workaround_35388;
+   bool workaround_26807;
bool must_check_datapath_caps;
u32 

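The control flow this patch gives efx_ef10_ev_init() (create the queue, probe, try to enable, tolerate -EPERM, otherwise tear the queue down) can be modelled in plain C. Everything here is hypothetical: the struct fields script what each firmware call returns instead of issuing MCDI requests, and the channel-0 check is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct evq_sketch {
	int init_rc;      /* scripted result of creating the event queue */
	int set_rc;       /* scripted result of the enable request */
	bool implemented; /* firmware implements the workaround */
	bool enabled;     /* workaround currently enabled */
	bool queue_up;    /* event queue currently exists */
};

static int ev_init_sketch(struct evq_sketch *e)
{
	int rc = e->init_rc;

	if (rc)
		return rc;	/* queue was never created: nothing to undo */
	e->queue_up = true;

	if (e->implemented && !e->enabled) {
		rc = e->set_rc;
		if (!rc)
			e->enabled = true;
		else if (rc == -EPERM)
			rc = 0;	/* only the primary PF may enable it */
	}
	if (!rc)
		return 0;

	/* Any other failure: remove the queue before returning. */
	e->queue_up = false;
	return rc;
}
```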
[PATCH net-next 9/9] sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode

2015-07-21 Thread Edward Cree
Use separate functions for inserting individual and promisc filters, and
 make the fallback logic in efx_ef10_filter_sync_rx_mode() explicit, so that
 the 'promisc' flag is no longer overloaded to also mean "fall back to promisc".

Signed-off-by: Edward Cree ec...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 288 +---
 1 file changed, 208 insertions(+), 80 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 0a7cf43..8505d82 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -49,6 +49,7 @@ enum {
  */
 #define HUNT_FILTER_TBL_ROWS 8192
 
+#define EFX_EF10_FILTER_ID_INVALID 0x
 struct efx_ef10_dev_addr {
u8 addr[ETH_ALEN];
u16 id;
@@ -76,8 +77,12 @@ struct efx_ef10_filter_table {
 #define EFX_EF10_FILTER_DEV_MC_MAX 256
struct efx_ef10_dev_addr dev_uc_list[EFX_EF10_FILTER_DEV_UC_MAX];
struct efx_ef10_dev_addr dev_mc_list[EFX_EF10_FILTER_DEV_MC_MAX];
-   int dev_uc_count;   /* negative for PROMISC */
-   int dev_mc_count;   /* negative for PROMISC/ALLMULTI */
+   int dev_uc_count;
+   int dev_mc_count;
+/* Indices (like efx_ef10_dev_addr.id) for promisc/allmulti filters */
+   u16 ucdef_id;
+   u16 bcast_id;
+   u16 mcdef_id;
 };
 
 /* An arbitrary search limit for the software hash table */
@@ -3273,6 +3278,19 @@ static int efx_ef10_filter_remove_safe(struct efx_nic *efx,
   filter_id, false);
 }
 
+static u32 efx_ef10_filter_get_unsafe_id(struct efx_nic *efx, u32 filter_id)
+{
+   return filter_id % HUNT_FILTER_TBL_ROWS;
+}
+
+static int efx_ef10_filter_remove_unsafe(struct efx_nic *efx,
+enum efx_filter_priority priority,
+u32 filter_id)
+{
+   return efx_ef10_filter_remove_internal(efx, 1U << priority,
+  filter_id, true);
+}
+
 static int efx_ef10_filter_get_safe(struct efx_nic *efx,
enum efx_filter_priority priority,
u32 filter_id, struct efx_filter_spec *spec)
@@ -3646,6 +3664,10 @@ static int efx_ef10_filter_table_probe(struct efx_nic *efx)
goto fail;
}
 
+   table->ucdef_id = EFX_EF10_FILTER_ID_INVALID;
+   table->bcast_id = EFX_EF10_FILTER_ID_INVALID;
+   table->mcdef_id = EFX_EF10_FILTER_ID_INVALID;
+
efx->filter_state = table;
init_waitqueue_head(&table->waitq);
return 0;
@@ -3748,6 +3770,12 @@ static void efx_ef10_filter_table_remove(struct efx_nic *efx)
kfree(table);
 }
 
+#define EFX_EF10_FILTER_DO_MARK_OLD(id) \
+   if (id != EFX_EF10_FILTER_ID_INVALID) { \
+   filter_idx = efx_ef10_filter_get_unsafe_id(efx, id); \
+   WARN_ON(!table->entry[filter_idx].spec); \
+   table->entry[filter_idx].spec |= EFX_EF10_FILTER_FLAG_AUTO_OLD; \
+   }
 static void efx_ef10_filter_mark_old(struct efx_nic *efx)
 {
struct efx_ef10_filter_table *table = efx->filter_state;
@@ -3758,33 +3786,39 @@ static void efx_ef10_filter_mark_old(struct efx_nic *efx)
 
/* Mark old filters that may need to be removed */
spin_lock_bh(&efx->filter_lock);
-   for (i = 0; i < table->dev_uc_count; i++) {
-   filter_idx = table->dev_uc_list[i].id % HUNT_FILTER_TBL_ROWS;
-   table->entry[filter_idx].spec |= EFX_EF10_FILTER_FLAG_AUTO_OLD;
-   }
-   for (i = 0; i < table->dev_mc_count; i++) {
-   filter_idx = table->dev_mc_list[i].id % HUNT_FILTER_TBL_ROWS;
-   table->entry[filter_idx].spec |= EFX_EF10_FILTER_FLAG_AUTO_OLD;
-   }
+   for (i = 0; i < table->dev_uc_count; i++)
+   EFX_EF10_FILTER_DO_MARK_OLD(table->dev_uc_list[i].id);
+   for (i = 0; i < table->dev_mc_count; i++)
+   EFX_EF10_FILTER_DO_MARK_OLD(table->dev_mc_list[i].id);
+   EFX_EF10_FILTER_DO_MARK_OLD(table->ucdef_id);
+   EFX_EF10_FILTER_DO_MARK_OLD(table->bcast_id);
+   EFX_EF10_FILTER_DO_MARK_OLD(table->mcdef_id);
spin_unlock_bh(&efx->filter_lock);
 }
+#undef EFX_EF10_FILTER_DO_MARK_OLD
 
 static void efx_ef10_filter_uc_addr_list(struct efx_nic *efx, bool *promisc)
 {
struct efx_ef10_filter_table *table = efx->filter_state;
struct net_device *net_dev = efx->net_dev;
struct netdev_hw_addr *uc;
+   int addr_count;
unsigned int i;
 
-   if (net_dev->flags & IFF_PROMISC ||
-   netdev_uc_count(net_dev) >= EFX_EF10_FILTER_DEV_UC_MAX) {
+   table->ucdef_id = EFX_EF10_FILTER_ID_INVALID;
+   addr_count = netdev_uc_count(net_dev);
+   if (net_dev->flags & IFF_PROMISC)
*promisc = true;
-   }
-   table->dev_uc_count = 1 + netdev_uc_count(net_dev);
+   table->dev_uc_count = 1 + addr_count;

[PATCH net-next 6/9] sfc: Insert multicast filters as well as mismatch filters in promiscuous mode

2015-07-21 Thread Edward Cree
From: Jon Cooper jcoo...@solarflare.com

If a function is in promiscuous mode and another function has a broadcast or
 multicast filter inserted, the function in promiscuous mode won't see that
 broadcast or multicast traffic.
Most notably this breaks broadcast, which means ARP doesn't work. Less
 show-stoppingly, a function listening on a multicast address that's also in
 promiscuous mode will not see that multicast traffic if another function is
 also listening on that multicast address.
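A simplified model of the problem described above, for illustration only. It assumes that an exact-address filter shadows every mismatch (promiscuous/allmulti) filter for the same frame, and that all functions holding the same exact filter each receive a copy. The struct and function names are hypothetical, not driver or firmware APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 32   /* one bit per function in the result mask */

/* Hypothetical per-function filter state for one destination MAC. */
struct func_filters {
	bool promisc;    /* function has a mismatch (catch-all) filter */
	bool has_exact;  /* function has an exact filter for this frame's MAC */
};

/* Returns a bitmask of the functions that receive the frame, under the
 * assumed "most specific match wins" rule (no filter chaining). */
static unsigned deliver(const struct func_filters *f, size_t n)
{
	unsigned exact = 0, mismatch = 0;

	assert(n <= MAX_FUNCS);
	for (size_t i = 0; i < n; i++) {
		if (f[i].has_exact)
			exact |= 1u << i;
		if (f[i].promisc)
			mismatch |= 1u << i;
	}
	/* Any exact match shadows every mismatch filter. */
	return exact ? exact : mismatch;
}
```

With function 0 promiscuous and function 1 holding an exact broadcast filter, `deliver()` returns 0x2, so function 0 misses the broadcast. Once function 0 also inserts its own broadcast filter, which is what this patch does, the result becomes 0x3.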

Signed-off-by: Edward Cree ec...@solarflare.com
---
 drivers/net/ethernet/sfc/ef10.c | 104 
 1 file changed, 53 insertions(+), 51 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 18d6388..784b46f 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3758,7 +3758,8 @@ static void efx_ef10_filter_sync_rx_mode(struct efx_nic *efx)
struct netdev_hw_addr *uc;
struct netdev_hw_addr *mc;
unsigned int filter_idx;
-   int i, n, rc;
+   int i, rc;
+   bool uc_promisc = false, mc_promisc = false;
 
if (!efx_dev_registered(efx))
return;
@@ -3768,13 +3769,11 @@ static void efx_ef10_filter_sync_rx_mode(struct efx_nic *efx)
 
/* Mark old filters that may need to be removed */
spin_lock_bh(&efx->filter_lock);
-   n = table->dev_uc_count < 0 ? 1 : table->dev_uc_count;
-   for (i = 0; i < n; i++) {
+   for (i = 0; i < table->dev_uc_count; i++) {
filter_idx = table->dev_uc_list[i].id % HUNT_FILTER_TBL_ROWS;
table->entry[filter_idx].spec |= EFX_EF10_FILTER_FLAG_AUTO_OLD;
}
-   n = table->dev_mc_count < 0 ? 1 : table->dev_mc_count;
-   for (i = 0; i < n; i++) {
+   for (i = 0; i < table->dev_mc_count; i++) {
filter_idx = table->dev_mc_list[i].id % HUNT_FILTER_TBL_ROWS;
table->entry[filter_idx].spec |= EFX_EF10_FILTER_FLAG_AUTO_OLD;
}
@@ -3786,7 +3785,8 @@ static void efx_ef10_filter_sync_rx_mode(struct efx_nic *efx)
netif_addr_lock_bh(net_dev);
if (net_dev->flags & IFF_PROMISC ||
netdev_uc_count(net_dev) >= EFX_EF10_FILTER_DEV_UC_MAX) {
-   table->dev_uc_count = -1;
+   table->dev_uc_count = 0;
+   uc_promisc = true;
} else {
table->dev_uc_count = 1 + netdev_uc_count(net_dev);
ether_addr_copy(table->dev_uc_list[0].addr, net_dev->dev_addr);
@@ -3796,9 +3796,11 @@ static void efx_ef10_filter_sync_rx_mode(struct efx_nic *efx)
i++;
}
}
-   if (net_dev->flags & (IFF_PROMISC | IFF_ALLMULTI) ||
-   netdev_mc_count(net_dev) >= EFX_EF10_FILTER_DEV_MC_MAX) {
-   table->dev_mc_count = -1;
+   if (netdev_mc_count(net_dev) + 2 /* room for broadcast and promisc */
+   >= EFX_EF10_FILTER_DEV_MC_MAX) {
+   table->dev_mc_count = 1;
+   eth_broadcast_addr(table->dev_mc_list[0].addr);
+   mc_promisc = true;
} else {
table->dev_mc_count = 1 + netdev_mc_count(net_dev);
eth_broadcast_addr(table->dev_mc_list[0].addr);
@@ -3807,31 +3809,32 @@ static void efx_ef10_filter_sync_rx_mode(struct efx_nic *efx)
ether_addr_copy(table->dev_mc_list[i].addr, mc->addr);
i++;
}
+   if (net_dev->flags & (IFF_PROMISC | IFF_ALLMULTI))
+   mc_promisc = true;
}
netif_addr_unlock_bh(net_dev);
 
/* Insert/renew unicast filters */
-   if (table->dev_uc_count >= 0) {
-   for (i = 0; i < table->dev_uc_count; i++) {
-   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO,
-  EFX_FILTER_FLAG_RX_RSS,
-  0);
-   efx_filter_set_eth_local(&spec, EFX_FILTER_VID_UNSPEC,
-table->dev_uc_list[i].addr);
-   rc = efx_ef10_filter_insert(efx, &spec, true);
-   if (rc < 0) {
-   /* Fall back to unicast-promisc */
-   while (i--)
-   efx_ef10_filter_remove_safe(
-   efx, EFX_FILTER_PRI_AUTO,
-   table->dev_uc_list[i].id);
-   table->dev_uc_count = -1;
-   break;
-   }
-   table->dev_uc_list[i].id = rc;
+   for (i = 0; i < table->dev_uc_count; i++) {
+   efx_filter_init_rx(&spec, EFX_FILTER_PRI_AUTO,
+  EFX_FILTER_FLAG_RX_RSS,
+  0);
+   efx_filter_set_eth_local(&spec, EFX_FILTER_VID_UNSPEC,
+ 

[PATCH net-next 0/9] sfc: support for cascaded multicast filtering

2015-07-21 Thread Edward Cree
Recent versions of firmware for SFC9100 adapters add support for filter
 chaining, in which packets matching multiple filters are delivered to all
 filters' recipients, rather than only the highest match-priority filter as was
 previously the case.
This patch series enables this feature and redesigns the filter handling code
 to make use of it; in particular, subscribing to a multicast address on one
 function no longer prevents traffic to that address reaching another function
 which is in promiscuous or allmulti mode.
If the firmware does not support filter chaining, the driver will fall back to
 the old behaviour.
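The difference between the two delivery policies described above can be modelled as follows. This is a hypothetical sketch, not firmware code: each filter carries a match priority (higher meaning more specific, e.g. exact MAC above a mismatch/catch-all filter), and `chaining` selects between highest-priority-only delivery and delivery to the owner of every matching filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical filter record: owning function, match specificity, and
 * whether it matches the packet under test. */
struct filter {
	unsigned func;      /* owning function index, must be < 32 */
	int match_prio;     /* higher = more specific match */
	bool matches;
};

/* Returns a bitmask of functions that receive the packet. */
static unsigned deliver(const struct filter *f, size_t n, bool chaining)
{
	unsigned mask = 0;
	int best = -1;

	/* Find the most specific matching filter. */
	for (size_t i = 0; i < n; i++)
		if (f[i].matches && f[i].match_prio > best)
			best = f[i].match_prio;

	for (size_t i = 0; i < n; i++) {
		if (!f[i].matches)
			continue;
		assert(f[i].func < 32);
		/* Without chaining only the best-priority matches deliver;
		 * with chaining every matching filter's owner receives it. */
		if (chaining || f[i].match_prio == best)
			mask |= 1u << f[i].func;
	}
	return mask;
}
```

For an allmulti function (priority-0 mismatch filter) and a second function holding an exact multicast filter (priority 1), the old policy yields 0x2 and chaining yields 0x3.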

Daniel Pieczko (5):
  sfc: enable cascaded multicast filters in MCFW
  sfc: add output flag decoding to efx_mcdi_set_workaround
  sfc: warn if other functions have been reset by MCFW
  sfc: re-factor efx_ef10_filter_sync_rx_mode()
  sfc: support cascaded multicast filters

Edward Cree (3):
  sfc: update MCDI protocol definitions
  sfc: cope with ENOSYS from efx_mcdi_get_workarounds()
  sfc: clean fallbacks between promisc/normal in
efx_ef10_filter_sync_rx_mode

Jon Cooper (1):
  sfc: Insert multicast filters as well as mismatch filters in
promiscuous mode

 drivers/net/ethernet/sfc/ef10.c       |  495 +++--
 drivers/net/ethernet/sfc/mcdi.c       |   28 +-
 drivers/net/ethernet/sfc/mcdi.h       |    3 +-
 drivers/net/ethernet/sfc/mcdi_pcol.h  | 3463 +
 drivers/net/ethernet/sfc/net_driver.h |    2 +
 drivers/net/ethernet/sfc/nic.h        |    2 +
 6 files changed, 2997 insertions(+), 996 deletions(-)



Re: Several races in usbnet module (kernel 4.1.x)

2015-07-21 Thread Oliver Neukum
On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:
> And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
> execute concurrently with the above operation:
> #0 clear_bit (bitops.h:113, inlined)
> #1 usbnet_bh (usbnet.c:1475)
> /* restart RX again after disabling due to high error rate */
> clear_bit(EVENT_RX_KILL, &dev->flags);
>
> If clear_bit() is atomic w.r.t. setting dev->flags to 0, this race is
> not a problem, I guess. Otherwise, it may be.

clear_bit is atomic with respect to other atomic operations.
So how about this:

Regards
Oliver

From 1c4e685b3a9c183e04c46b661830e5c7ed35b513 Mon Sep 17 00:00:00 2001
From: Oliver Neukum oneu...@suse.com
Date: Tue, 21 Jul 2015 16:19:40 +0200
Subject: [PATCH] usbnet: fix race between usbnet_stop() and the BH

Does this do the job?

Signed-off-by: Oliver Neukum oneu...@suse.com
---
 drivers/net/usb/usbnet.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..77a9a86 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
 {
struct usbnet   *dev = netdev_priv(net);
struct driver_info  *info = dev->driver_info;
-   int retval, pm;
+   int retval, pm, mpn;
 
clear_bit(EVENT_DEV_OPEN, &dev->flags);
netif_stop_queue (net);
@@ -813,14 +813,17 @@ int usbnet_stop (struct net_device *net)
 * can't flush_scheduled_work() until we drop rtnl (later),
 * else workers could deadlock; so make workers a NOP.
 */
+   mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
dev->flags = 0;
del_timer_sync (&dev->delay);
tasklet_kill (&dev->bh);
+   mpn |= !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
+   /* in case the bh reset a flag */
+   dev->flags = 0;
if (!pm)
usb_autopm_put_interface(dev->intf);
 
-   if (info->manage_power &&
-   !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
+   if (info->manage_power && mpn)
info->manage_power(dev, 0);
else
usb_autopm_put_interface(dev->intf);
-- 
2.1.4
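For readers less familiar with the kernel bitops, the following user-space sketch approximates `test_and_clear_bit()` with C11 `atomic_fetch_and()` and shows the snapshot-before-wipe pattern the patch applies to EVENT_NO_RUNTIME_PM: read and clear the bit in one atomic step, then reset the whole word. The bit numbers and helper names are hypothetical; the real EVENT_* values are defined in usbnet.h, and the kernel does not implement its bitops this way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers standing in for EVENT_RX_KILL and
 * EVENT_NO_RUNTIME_PM. */
enum { EV_RX_KILL = 1, EV_NO_RUNTIME_PM = 2 };

/* User-space analogue of test_and_clear_bit(): one atomic
 * read-modify-write that clears bit nr and reports its previous value. */
static bool test_and_clear(atomic_ulong *flags, int nr)
{
	unsigned long mask;

	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	mask = 1UL << nr;
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Capture one bit of interest before wiping the whole word; a plain
 * "flags = 0" would discard it without telling the caller. */
static bool stop_and_snapshot(atomic_ulong *flags, int nr)
{
	bool was_set = test_and_clear(flags, nr);

	atomic_store(flags, 0);
	return was_set;
}
```

Starting from a word with both bits set, `stop_and_snapshot(&flags, EV_NO_RUNTIME_PM)` returns true and leaves the word at zero.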





[PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring

2015-07-21 Thread Florian Westphal
Kirill A. Shutemov says:

This simple test case triggers a few locking asserts in the kernel:

#include <stdlib.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(int argc, char **argv)
{
unsigned int block_size = 16 * 4096;
struct nl_mmap_req req = {
.nm_block_size  = block_size,
.nm_block_nr= 64,
.nm_frame_size  = 16384,
.nm_frame_nr= 64 * block_size / 16384,
};
unsigned int ring_size;
int fd;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
exit(1);
if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
exit(1);

ring_size = req.nm_block_nr * req.nm_block_size;
mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
 #0:  (reboot_mutex){+.+...}, at: [81080959] SyS_reboot+0xa9/0x220
 #1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [8107f379] __blocking_notifier_call_chain+0x39/0x70
 #2:  (rcu_callback){..}, at: [810d32e0] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[8145365f] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
 88017b3d8000 88027bc03c38 81929ceb 0102
  88027bc03c68 81085a9d 0002
 81ca2a20 0268  88027bc03c98
Call Trace:
 IRQ  [81929ceb] dump_stack+0x4f/0x7b
 [81085a9d] ___might_sleep+0x16d/0x270
 [81085bed] __might_sleep+0x4d/0x90
 [8192e96f] mutex_lock_nested+0x2f/0x430
 [81932fed] ? _raw_spin_unlock_irqrestore+0x5d/0x80
 [81464143] ? __this_cpu_preempt_check+0x13/0x20
 [8182fc3d] netlink_set_ring+0x1ed/0x350
 [8182e000] ? netlink_undo_bind+0x70/0x70
 [8182fe20] netlink_sock_destruct+0x80/0x150
 [817e484d] __sk_free+0x1d/0x160
 [817e49a9] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold a mutex lock in an RCU callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
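The ring geometry requested by the test case works out as below. Only the constants come from the report; the struct and helper names are illustrative. The `2 * ring_size` passed to mmap() is assumed to cover both rings, since the test configures an RX and a TX ring with the same request.

```c
#include <assert.h>

/* Illustrative container for the nl_mmap_req values used in the test. */
struct ring_geom {
	unsigned long block_size, block_nr, frame_size, frame_nr;
};

static struct ring_geom test_case_geometry(void)
{
	struct ring_geom g;

	g.block_size = 16 * 4096;                                 /* 65536 bytes */
	g.block_nr   = 64;
	g.frame_size = 16384;
	g.frame_nr   = g.block_nr * g.block_size / g.frame_size;  /* 256 frames */
	return g;
}

/* Size of one ring: 64 blocks of 64 KiB = 4 MiB. */
static unsigned long ring_bytes(const struct ring_geom *g)
{
	/* In this test case the frames exactly fill the blocks. */
	assert(g->frame_nr * g->frame_size == g->block_nr * g->block_size);
	return g->block_nr * g->block_size;
}

/* Length passed to mmap(): RX ring plus TX ring = 8 MiB. */
static unsigned long mmap_len(const struct ring_geom *g)
{
	return 2 * ring_bytes(g);
}
```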

Reported-by: Kirill A. Shutemov kir...@shutemov.name
Diagnosed-by: Cong Wang cw...@twopensource.com
Suggested-by: Thomas Graf tg...@suug.ch
Signed-off-by: Florian Westphal f...@strlen.de
---
 net/netlink/af_netlink.c | 79 
 1 file changed, 47 insertions(+), 32 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 9a0ae71..d8e2e39 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -357,25 +357,52 @@ err1:
return NULL;
 }
 
+
+static void
+__netlink_set_ring(struct sock *sk, struct nl_mmap_req *req, bool tx_ring, void **pg_vec,
+  unsigned int order)
+{
+   struct netlink_sock *nlk = nlk_sk(sk);
+   struct sk_buff_head *queue;
+   struct netlink_ring *ring;
+
+   queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
+   ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
+
+   spin_lock_bh(&queue->lock);
+
+   ring->frame_max = req->nm_frame_nr - 1;
+   ring->head  = 0;
+   ring->frame_size= req->nm_frame_size;
+   ring->pg_vec_pages  = req->nm_block_size / PAGE_SIZE;
+
+   swap(ring->pg_vec_len, req->nm_block_nr);
+   swap(ring->pg_vec_order, order);
+   swap(ring->pg_vec, pg_vec);
+
+   __skb_queue_purge(queue);
+   spin_unlock_bh(&queue->lock);
+
+   WARN_ON(atomic_read(&nlk->mapped));
+
+   if (pg_vec)
+   free_pg_vec(pg_vec, order, req->nm_block_nr);
+}
+
 static int netlink_set_ring(struct sock *sk, struct nl_mmap_req *req,
-   bool closing, bool tx_ring)
+   bool tx_ring)
 {
struct netlink_sock *nlk = nlk_sk(sk);
struct netlink_ring *ring;
-   struct sk_buff_head *queue;
void **pg_vec = NULL;
unsigned int order = 0;
-   int err;
 
ring  = tx_ring ? &nlk->tx_ring : &nlk->rx_ring;
-   queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
 
-   if (!closing) {
-   if (atomic_read(&nlk->mapped))
-   return -EBUSY;
-   if (atomic_read(&ring->pending))
-   return -EBUSY;
-   }
+   if (atomic_read(&nlk->mapped))
+   return -EBUSY;
+   if (atomic_read(&ring->pending))
+   return -EBUSY;
 
if (req->nm_block_nr) {
if (ring->pg_vec != NULL)
@@ 

ARP response with link local IP, why not broadcast

2015-07-21 Thread Sebastian Fett

Hello!

According to RFC 3927 (section 2.5), every ARP packet, reply as well as
request, should be sent as a link-layer broadcast whenever the sender IP
is a link-local address.
That functionality would help me a lot with a use case I have with our 
application.


But it is not implemented in the kernel that way.
Does anyone know why?

Regards,
Sebastian
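A minimal sketch of the rule being asked about, assuming host-byte-order IPv4 addresses: if the sender IP lies in 169.254.0.0/16, the ARP packet (request or reply) goes to the Ethernet broadcast address; otherwise a reply is unicast to the requester as usual. The function names are hypothetical, and this is not how the kernel's ARP code is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if addr (host byte order) is IPv4 link-local, 169.254.0.0/16. */
static bool ipv4_is_link_local(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/* Pick the destination MAC for an outgoing ARP reply per RFC 3927 2.5:
 * broadcast when the sender IP is link-local, else unicast back. */
static void arp_choose_dest(uint32_t sender_ip, const uint8_t requester_mac[6],
			    uint8_t dest_mac[6])
{
	static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	assert(requester_mac != NULL && dest_mac != NULL);
	if (ipv4_is_link_local(sender_ip))
		memcpy(dest_mac, bcast, 6);
	else
		memcpy(dest_mac, requester_mac, 6);
}
```

A reply from 169.254.1.2 (0xa9fe0102) is thus addressed to ff:ff:ff:ff:ff:ff, while one from 192.168.0.1 goes back to the requester's MAC.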

