date:20160808

Re: [PATCH] bonding: Allow tun-interfaces as slaves

2016-08-08 Thread Ding Tianhong

On 2016/8/9 11:09, Jörn Engel wrote:
> Hello Tianhong!
> 
> On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>>
>> I don't clearly understand your problem. Can you explain in more detail how
>> 00503b6f702e breaks tun-interfaces? Then we will try to fix it.
> 
> Here is a trivial testcase:
> openvpn --mktun --dev tun0
> echo +tun0 > /sys/class/net/bond0/bonding/slaves
> 
> Worked fine before your patch, no longer works after your patch.  Works
> again after my patch.
> 
Hi Jorn:

I checked the code and found the reason: the tun device (or anything similar
to tun) does not have ndo_set_mac_address, so it cannot be added to the bond
as a slave device in rr mode. I think the original logic is fine for the bond
enslave processing; the old kernel merely avoided this problem rather than
fixing it. The bond needs to take into account virtual devices that do not
support changing the MAC address.

>> Also, dev_set_mac_address will change the slave's MAC address. Some NICs
>> don't support changing the MAC address and cannot work as a bond slave,
>> so we need to check the return value. I don't think this patch is an
>> effective improvement.
> 
> Using bonding in balance-rr mode, there doesn't seem to be a need to
> change the mac address.  I suppose you might care in other modes, but I
> don't.
> 
> Jörn
> 
> --
> Time? What's that? Time is only worth what you do with it.
> -- Theo de Raadt
> 
>

Re: [PATCH] bonding: Allow tun-interfaces as slaves

2016-08-08 Thread zhuyj

Can we check slave_ops->ndo_set_mac_address?

	if ((slave_ops->ndo_set_mac_address) &&
	    (!bond->params.fail_over_mac ||
	     BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
		/* Set slave to master's mac address. The application already
		 * set the master's mac address to that of the first slave
		 */
		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
		addr.sa_family = slave_dev->type;
		res = dev_set_mac_address(slave_dev, &addr);
		if (res) {
			netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
			goto err_restore_mtu;
		}
	}
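
The idea of the check can be tried outside the kernel with a small mock. This is only a rough sketch under stated assumptions: struct mock_dev, struct mock_dev_ops and the mock_* functions are hypothetical stand-ins invented for illustration, not kernel APIs, and the fail_over_mac condition is left out. It shows a device without a MAC setter being skipped instead of failing the enslave step.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_dev;

struct mock_dev_ops {
	/* NULL when the device cannot change its MAC (e.g. a tun device). */
	int (*set_mac_address)(struct mock_dev *dev, const unsigned char *mac);
};

struct mock_dev {
	const struct mock_dev_ops *ops;
	unsigned char mac[6];
};

/* Example setter for a device that does support MAC changes. */
int mock_set_mac(struct mock_dev *dev, const unsigned char *mac)
{
	memcpy(dev->mac, mac, sizeof(dev->mac));
	return 0;
}

/*
 * Returns 0 on success, including the case where the MAC change is
 * skipped because the slave has no setter; otherwise returns the
 * setter's error code so the caller can still unwind on real failures.
 */
int mock_enslave_set_mac(struct mock_dev *slave, const unsigned char *bond_mac)
{
	if (!slave->ops || !slave->ops->set_mac_address)
		return 0; /* nothing to call, so skip instead of failing */

	return slave->ops->set_mac_address(slave, bond_mac);
}
```

With this shape, the return value is still checked for devices that have a setter, while a tun-like device passes through with its address untouched.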

On Tue, Aug 9, 2016 at 11:09 AM, Jörn Engel wrote:
> Hello Tianhong!
>
> On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>>
>> I don't clearly understand your problem. Can you explain in more detail how
>> 00503b6f702e breaks tun-interfaces? Then we will try to fix it.
>
> Here is a trivial testcase:
> openvpn --mktun --dev tun0
> echo +tun0 > /sys/class/net/bond0/bonding/slaves
>
> Worked fine before your patch, no longer works after your patch.  Works
> again after my patch.
>
>> Also, dev_set_mac_address will change the slave's MAC address. Some NICs
>> don't support changing the MAC address and cannot work as a bond slave,
>> so we need to check the return value. I don't think this patch is an
>> effective improvement.
>
> Using bonding in balance-rr mode, there doesn't seem to be a need to
> change the mac address.  I suppose you might care in other modes, but I
> don't.
>
> Jörn
>
> --
> Time? What's that? Time is only worth what you do with it.
> -- Theo de Raadt

net-next is OPEN

2016-08-08 Thread David Miller


I've merged 'net' into 'net-next' and applied a bunch of pending stuff.

Fire at will...

Re: [PATCH net 0/4] qed: dcbx fix series.

2016-08-08 Thread David Miller

From: Sudarsana Reddy Kalluru 
Date: Mon, 8 Aug 2016 21:57:39 -0400

> This patch series contains minor bug fixes for the qed dcbx module.
> Please consider applying it to the 'net' branch.

Series applied, thanks.

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread David Miller

From: Lorenzo Colitti 
Date: Tue, 9 Aug 2016 10:00:25 +0900

> Note that pretty much every sendmsg codepath allows other data to take
> precedence over sk_bound_dev_if:
> 
> - udpv6_sendmsg: if sin6_scope_id specified on a scoped address
> - rawv6_sendmsg: if sin6_scope_id specified on a scoped address
> - l2tp_ip6_sendmsg: if sin6_scope_id specified on a scoped address
> - ip_cmsg_send: if IP_PKTINFO or IPV6_PKTINFO specified
> 
> What should I do about those? -EINVAL? Ignore the conflicting data? Leave as is?

That's a good point, I guess this needs some more thought.

[PATCH v6 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-08-08 Thread fgao

From: Gao Feng 

PPTP is encapsulated in a GRE header whose GRE_VERSION bits must contain
one, but the current GRE RPS code requires GRE_VERSION to be zero. So RPS
does not work for PPTP traffic.

In my test environment there are four MIPS cores, and all traffic passes
through PPTP. As a result, only one core is 100% busy while the other
three cores are nearly idle. After this patch, the load is well balanced
across the four cores.
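
The version test described above can be illustrated in user space. This is a minimal sketch, not the flow dissector code from the patch: the sketch_* names are hypothetical, and it assumes only that the GRE version sits in the low three bits of the first 16-bit header word and that PPTP carries protocol type 0x880B (the PPTP_GRE_PROTO value in this patch).

```c
#include <stdint.h>

#define SKETCH_GRE_VERSION_MASK 0x0007u /* low 3 bits of the first word */
#define SKETCH_GRE_PROTO_PPP    0x880Bu /* same value as PPTP_GRE_PROTO */

/* Read a big-endian 16-bit value from a byte buffer, on any host. */
uint16_t sketch_be16(const uint8_t *p)
{
	return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Extract the GRE version from the first header word. */
unsigned sketch_gre_version(const uint8_t *hdr)
{
	return sketch_be16(hdr) & SKETCH_GRE_VERSION_MASK;
}

/* A header is treated as PPTP-in-GRE when version is 1 and the
 * protocol field carries the PPP ethertype. */
int sketch_is_pptp_gre(const uint8_t *hdr)
{
	return sketch_gre_version(hdr) == 1 &&
	       sketch_be16(hdr + 2) == SKETCH_GRE_PROTO_PPP;
}
```

A dissector that accepts only version 0 would stop at such a header, which matches the single-busy-core symptom reported here.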

Signed-off-by: Gao Feng 
---
 v6: 1) Keep the original v4 struct gre_base_hdr and gre_full_hdr style;
     2) Use __cpu_to_be32 instead of htonl;
 v5: 1) Make the fixed header of gre_full_hdr an unnamed struct;
     2) Create macro GRE_PPTP_KEY_MASK;
 v4: 1) Define struct gre_full_hdr, and use sizeof on its members directly;
     2) Move the version and routing checks ahead;
     3) Check the ack flag only for PPTP in GRE;
 v3: 1) Move the struct pptp_gre_header definition into the new file pptp.h;
     2) Use sizeof on the GRE and PPTP types instead of literal values;
     3) Remove the strict flag check for PPTP, for robustness;
     4) Consolidate the code again;
 v2: Update according to Tom's and Philip's advice.
     1) Consolidate the code with the GRE version 0 path;
     2) Use PPP_PROTOCOL to get the PPP protocol;
     3) Set the FLOW_DIS_ENCAPSULATION flag;
 v1: Initial patch

 drivers/net/ppp/pptp.c |  36 +
 include/net/gre.h  |  10 +++-
 include/net/pptp.h |  40 +++
 include/uapi/linux/if_tunnel.h |   7 ++-
 net/core/flow_dissector.c  | 113 -
 5 files changed, 135 insertions(+), 71 deletions(-)
 create mode 100644 include/net/pptp.h

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index ae0905e..3e68dbc 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -53,41 +54,6 @@ static struct proto pptp_sk_proto __read_mostly;
 static const struct ppp_channel_ops pptp_chan_ops;
 static const struct proto_ops pptp_ops;
 
-#define PPP_LCP_ECHOREQ 0x09
-#define PPP_LCP_ECHOREP 0x0A
-#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
-
-#define MISSING_WINDOW 20
-#define WRAPPED(curseq, lastseq)\
-   ((((curseq) & 0xff00) == 0) &&\
-   (((lastseq) & 0xff00) == 0xff00))
-
-#define PPTP_GRE_PROTO  0x880B
-#define PPTP_GRE_VER    0x1
-
-#define PPTP_GRE_FLAG_C 0x80
-#define PPTP_GRE_FLAG_R 0x40
-#define PPTP_GRE_FLAG_K 0x20
-#define PPTP_GRE_FLAG_S 0x10
-#define PPTP_GRE_FLAG_A 0x80
-
-#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
-#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
-#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
-#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
-#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
-
-#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
-struct pptp_gre_header {
-   u8  flags;
-   u8  ver;
-   __be16 protocol;
-   __be16 payload_len;
-   __be16 call_id;
-   __be32 seq;
-   __be32 ack;
-} __packed;
-
 static struct pppox_sock *lookup_chan(u16 call_id, __be32 s_addr)
 {
struct pppox_sock *sock;
diff --git a/include/net/gre.h b/include/net/gre.h
index 7a54a31..8962e1e 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -7,7 +7,15 @@
 struct gre_base_hdr {
__be16 flags;
__be16 protocol;
-};
+} __packed;
+
+struct gre_full_hdr {
+   struct gre_base_hdr fixed_header;
+   __be16 csum;
+   __be16 reserved1;
+   __be32 key;
+   __be32 seq;
+} __packed;
 #define GRE_HEADER_SECTION 4
 
 #define GREPROTO_CISCO 0
diff --git a/include/net/pptp.h b/include/net/pptp.h
new file mode 100644
index 000..301d3e2
--- /dev/null
+++ b/include/net/pptp.h
@@ -0,0 +1,40 @@
+#ifndef _NET_PPTP_H
+#define _NET_PPTP_H
+
+#define PPP_LCP_ECHOREQ 0x09
+#define PPP_LCP_ECHOREP 0x0A
+#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
+
+#define MISSING_WINDOW 20
+#define WRAPPED(curseq, lastseq)\
+   ((((curseq) & 0xff00) == 0) &&\
+   (((lastseq) & 0xff00) == 0xff00))
+
+#define PPTP_GRE_PROTO  0x880B
+#define PPTP_GRE_VER    0x1
+
+#define PPTP_GRE_FLAG_C 0x80
+#define PPTP_GRE_FLAG_R 0x40
+#define PPTP_GRE_FLAG_K 0x20
+#define PPTP_GRE_FLAG_S 0x10
+#define PPTP_GRE_FLAG_A 0x80
+
+#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
+#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
+#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
+#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
+#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
+
+#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
+struct pptp_gre_header {
+   u8  flags;
+   u8  ver;
+   __be16 protocol;
+   __be16 payload_len;
+   __be16 call_id;
+   __be32 seq;
+   __be32 ack;
+} __packed;
+
+
+#endif
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1046f55..60dbb20 100644
--- a/include/uapi/linux/if_tunnel.h
+++

Re: [PATCH RESEND] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-08-08 Thread Benjamin Poirier

On 2016/08/08 09:26, Andreas Werner wrote:
[...]
> > > +
> > > + if (cf->can_dlc > 0)
> > > + data[0] = be32_to_cpup((__be32 *)(cf->data));
> > > + if (cf->can_dlc > 3)
> > > + data[1] = be32_to_cpup((__be32 *)(cf->data + 4));
> > > +
> > > + writel(id, _buf->can_id);
> > > + writel(cf->can_dlc, _buf->length);
> > > +
> > > + if (!(cf->can_id & CAN_RTR_FLAG)) {
> > > + writel(data[0], _buf->data[0]);
> > > + writel(data[1], _buf->data[1]);
> > > +
> > > + stats->tx_bytes += cf->can_dlc;
> > > + }
> > > +
> > > + /* be sure everything is written to the
> > > +  * device before acknowledge the data.
> > > +  */
> > > + mmiowb();
> > > +
> > > + /* trigger the transmission */
> > > + men_z192_ack_tx_pkg(priv, 1);
> > > +
> > > + stats->tx_packets++;
> > > +
> > > + kfree_skb(skb);
> > 
> > What prevents the skb data from being freed/reused before the device
> > has accessed it?

I'm sorry, I hadn't realized that all of the data (all 8 bytes of it!)
is written directly to the device. I was thinking about ethernet devices
that dma packet data.
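
For illustration, the copy that makes the early free safe can be sketched in user space. This is an assumption-laden sketch, not the driver code: the sketch_* names are hypothetical, and it only mimics the quoted be32_to_cpup() conversion of the up-to-8-byte CAN payload into two 32-bit words. Once those two words have been written out, nothing refers to the original buffer any more.

```c
#include <stdint.h>

/* Interpret four bytes as one big-endian 32-bit word, on any host. */
uint32_t sketch_be32_load(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Pack up to 8 payload bytes into two words; unused bytes read as zero. */
void sketch_pack_can_payload(const uint8_t *payload, unsigned len,
			     uint32_t words[2])
{
	uint8_t buf[8] = { 0 };
	unsigned i;

	if (len > 8)
		len = 8; /* a classic CAN frame carries at most 8 data bytes */
	for (i = 0; i < len; i++)
		buf[i] = payload[i];

	words[0] = sketch_be32_load(buf);
	words[1] = sketch_be32_load(buf + 4);
}
```

This differs from the DMA case, where the device reads the packet memory later and the buffer must stay alive until the transfer completes.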

Re: [PATCH] bonding: Allow tun-interfaces as slaves

2016-08-08 Thread Jörn Engel

Hello Tianhong!

On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
> 
> I don't clearly understand your problem. Can you explain in more detail how
> 00503b6f702e breaks tun-interfaces? Then we will try to fix it.

Here is a trivial testcase:
openvpn --mktun --dev tun0
echo +tun0 > /sys/class/net/bond0/bonding/slaves

Worked fine before your patch, no longer works after your patch.  Works
again after my patch.

> Also, dev_set_mac_address will change the slave's MAC address. Some NICs
> don't support changing the MAC address and cannot work as a bond slave,
> so we need to check the return value. I don't think this patch is an
> effective improvement.

Using bonding in balance-rr mode, there doesn't seem to be a need to
change the mac address.  I suppose you might care in other modes, but I
don't.

Jörn

--
Time? What's that? Time is only worth what you do with it.
-- Theo de Raadt

[PATCH net 1/4] qed: Remove the endian-ness conversion for pri_to_tc value.

2016-08-08 Thread Sudarsana Reddy Kalluru

Endianness conversion is not needed for the priority-to-TC field, as the
driver already reads and writes the field in big-endian order.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index d0dc28f..6869330 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -483,7 +483,7 @@ qed_dcbx_get_ets_data(struct qed_hwfn *p_hwfn,
bw_map[1] = be32_to_cpu(p_ets->tc_bw_tbl[1]);
tsa_map[0] = be32_to_cpu(p_ets->tc_tsa_tbl[0]);
tsa_map[1] = be32_to_cpu(p_ets->tc_tsa_tbl[1]);
-   pri_map = be32_to_cpu(p_ets->pri_tc_tbl[0]);
+   pri_map = p_ets->pri_tc_tbl[0];
for (i = 0; i < QED_MAX_PFC_PRIORITIES; i++) {
p_params->ets_tc_bw_tbl[i] = ((u8 *)bw_map)[i];
p_params->ets_tc_tsa_tbl[i] = ((u8 *)tsa_map)[i];
@@ -944,7 +944,6 @@ qed_dcbx_set_ets_data(struct qed_hwfn *p_hwfn,
val = (((u32)p_params->ets_pri_tc_tbl[i]) << ((7 - i) * 4));
p_ets->pri_tc_tbl[0] |= val;
}
-   p_ets->pri_tc_tbl[0] = cpu_to_be32(p_ets->pri_tc_tbl[0]);
for (i = 0; i < 2; i++) {
p_ets->tc_bw_tbl[i] = cpu_to_be32(p_ets->tc_bw_tbl[i]);
p_ets->tc_tsa_tbl[i] = cpu_to_be32(p_ets->tc_tsa_tbl[i]);
-- 
1.8.3.1
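
The nibble packing visible in the second hunk above can be tried in isolation. This is a hedged user-space sketch with hypothetical sketch_* names, assuming one 4-bit traffic class per priority as the (7 - i) * 4 shift suggests. Because the value is built with shifts on a 32-bit integer, priority 0 always lands in the most significant nibble regardless of host byte order.

```c
#include <stdint.h>

#define SKETCH_MAX_PRIORITIES 8

/* Pack eight 4-bit traffic classes, priority 0 in the top nibble. */
uint32_t sketch_pack_pri_tc(const uint8_t tc[SKETCH_MAX_PRIORITIES])
{
	uint32_t map = 0;
	int i;

	for (i = 0; i < SKETCH_MAX_PRIORITIES; i++)
		map |= ((uint32_t)(tc[i] & 0xF)) << ((7 - i) * 4);

	return map;
}

/* Recover the traffic class for one priority from the packed map. */
uint8_t sketch_unpack_pri_tc(uint32_t map, int prio)
{
	return (uint8_t)((map >> ((7 - prio) * 4)) & 0xF);
}
```

A byte swap applied on top of such a map would reorder the nibble pairs on a little-endian host, so reading the table back with the same shifts and no swap keeps the two directions consistent.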

[PATCH net 0/4] qed: dcbx fix series.

2016-08-08 Thread Sudarsana Reddy Kalluru

This patch series contains minor bug fixes for the qed dcbx module.
Please consider applying it to the 'net' branch.

Sudarsana Reddy Kalluru (4):
  qed: Remove the endian-ness conversion for pri_to_tc value.
  qed: Use ieee mfw-mask to get ethtype in ieee-dcbx mode.
  qed: Add dcbx app support for IEEE Selection Field.
  qed: Update app count when adding a new dcbx app entry to the table.

 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 223 ++---
 drivers/net/ethernet/qlogic/qed/qed_hsi.h  |   8 ++
 include/linux/qed/qed_if.h |   8 ++
 3 files changed, 185 insertions(+), 54 deletions(-)

-- 
1.8.3.1

[PATCH net 2/4] qed: Use ieee mfw-mask to get ethtype in ieee-dcbx mode.

2016-08-08 Thread Sudarsana Reddy Kalluru

Ethtype value is being read incorrectly in ieee-dcbx mode. Use the
correct mfw mask value.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 88 --
 drivers/net/ethernet/qlogic/qed/qed_hsi.h  |  8 +++
 2 files changed, 66 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index 6869330..f07f0ac 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -52,16 +52,33 @@ static bool qed_dcbx_app_ethtype(u32 app_info_bitmap)
  DCBX_APP_SF_ETHTYPE);
 }
 
+static bool qed_dcbx_ieee_app_ethtype(u32 app_info_bitmap)
+{
+   u8 mfw_val = QED_MFW_GET_FIELD(app_info_bitmap, DCBX_APP_SF_IEEE);
+
+   /* Old MFW */
+   if (mfw_val == DCBX_APP_SF_IEEE_RESERVED)
+   return qed_dcbx_app_ethtype(app_info_bitmap);
+
+   return !!(mfw_val == DCBX_APP_SF_IEEE_ETHTYPE);
+}
+
 static bool qed_dcbx_app_port(u32 app_info_bitmap)
 {
return !!(QED_MFW_GET_FIELD(app_info_bitmap, DCBX_APP_SF) ==
  DCBX_APP_SF_PORT);
 }
 
-static bool qed_dcbx_default_tlv(u32 app_info_bitmap, u16 proto_id)
+static bool qed_dcbx_default_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
 {
-   return !!(qed_dcbx_app_ethtype(app_info_bitmap) &&
- proto_id == QED_ETH_TYPE_DEFAULT);
+   bool ethtype;
+
+   if (ieee)
+   ethtype = qed_dcbx_ieee_app_ethtype(app_info_bitmap);
+   else
+   ethtype = qed_dcbx_app_ethtype(app_info_bitmap);
+
+   return !!(ethtype && (proto_id == QED_ETH_TYPE_DEFAULT));
 }
 
 static bool qed_dcbx_iscsi_tlv(u32 app_info_bitmap, u16 proto_id)
@@ -70,16 +87,28 @@ static bool qed_dcbx_iscsi_tlv(u32 app_info_bitmap, u16 proto_id)
  proto_id == QED_TCP_PORT_ISCSI);
 }
 
-static bool qed_dcbx_fcoe_tlv(u32 app_info_bitmap, u16 proto_id)
+static bool qed_dcbx_fcoe_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
 {
-   return !!(qed_dcbx_app_ethtype(app_info_bitmap) &&
- proto_id == QED_ETH_TYPE_FCOE);
+   bool ethtype;
+
+   if (ieee)
+   ethtype = qed_dcbx_ieee_app_ethtype(app_info_bitmap);
+   else
+   ethtype = qed_dcbx_app_ethtype(app_info_bitmap);
+
+   return !!(ethtype && (proto_id == QED_ETH_TYPE_FCOE));
 }
 
-static bool qed_dcbx_roce_tlv(u32 app_info_bitmap, u16 proto_id)
+static bool qed_dcbx_roce_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
 {
-   return !!(qed_dcbx_app_ethtype(app_info_bitmap) &&
- proto_id == QED_ETH_TYPE_ROCE);
+   bool ethtype;
+
+   if (ieee)
+   ethtype = qed_dcbx_ieee_app_ethtype(app_info_bitmap);
+   else
+   ethtype = qed_dcbx_app_ethtype(app_info_bitmap);
+
+   return !!(ethtype && (proto_id == QED_ETH_TYPE_ROCE));
 }
 
 static bool qed_dcbx_roce_v2_tlv(u32 app_info_bitmap, u16 proto_id)
@@ -164,15 +193,15 @@ qed_dcbx_update_app_info(struct qed_dcbx_results *p_data,
 static bool
 qed_dcbx_get_app_protocol_type(struct qed_hwfn *p_hwfn,
   u32 app_prio_bitmap,
-  u16 id, enum dcbx_protocol_type *type)
+  u16 id, enum dcbx_protocol_type *type, bool ieee)
 {
-   if (qed_dcbx_fcoe_tlv(app_prio_bitmap, id)) {
+   if (qed_dcbx_fcoe_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_FCOE;
-   } else if (qed_dcbx_roce_tlv(app_prio_bitmap, id)) {
+   } else if (qed_dcbx_roce_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_ROCE;
} else if (qed_dcbx_iscsi_tlv(app_prio_bitmap, id)) {
*type = DCBX_PROTOCOL_ISCSI;
-   } else if (qed_dcbx_default_tlv(app_prio_bitmap, id)) {
+   } else if (qed_dcbx_default_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_ETH;
} else if (qed_dcbx_roce_v2_tlv(app_prio_bitmap, id)) {
*type = DCBX_PROTOCOL_ROCE_V2;
@@ -194,17 +223,18 @@ static int
 qed_dcbx_process_tlv(struct qed_hwfn *p_hwfn,
 struct qed_dcbx_results *p_data,
 struct dcbx_app_priority_entry *p_tbl,
-u32 pri_tc_tbl, int count, bool dcbx_enabled)
+u32 pri_tc_tbl, int count, u8 dcbx_version)
 {
u8 tc, priority_map;
enum dcbx_protocol_type type;
+   bool enable, ieee;
u16 protocol_id;
int priority;
-   bool enable;
int i;
 
DP_VERBOSE(p_hwfn, QED_MSG_DCB, "Num APP entries = %d\n", count);
 
+   ieee = (dcbx_version == DCBX_CONFIG_VERSION_IEEE);
/* Parse APP TLV */
for (i = 0; i < count; i++) {
protocol_id = QED_MFW_GET_FIELD(p_tbl[i].entry,
@@ -219,7 +249,7 @@ qed_dcbx_process_tlv(struct

[PATCH net 4/4] qed: Update app count when adding a new dcbx app entry to the table.

2016-08-08 Thread Sudarsana Reddy Kalluru

The app count is not updated when a new app entry is added to the dcbx app table.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index b157a6a..226cb08 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -1707,8 +1707,10 @@ static int qed_dcbnl_setapp(struct qed_dev *cdev,
if ((entry->ethtype == ethtype) && (entry->proto_id == idval))
break;
/* First empty slot */
-   if (!entry->proto_id)
+   if (!entry->proto_id) {
+   dcbx_set.config.params.num_app_entries++;
break;
+   }
}
 
if (i == QED_DCBX_MAX_APP_PROTOCOL) {
@@ -2228,8 +2230,10 @@ int qed_dcbnl_ieee_setapp(struct qed_dev *cdev, struct dcb_app *app)
(entry->proto_id == app->protocol))
break;
/* First empty slot */
-   if (!entry->proto_id)
+   if (!entry->proto_id) {
+   dcbx_set.config.params.num_app_entries++;
break;
+   }
}
 
if (i == QED_DCBX_MAX_APP_PROTOCOL) {
-- 
1.8.3.1

[PATCH net 3/4] qed: Add dcbx app support for IEEE Selection Field.

2016-08-08 Thread Sudarsana Reddy Kalluru

MFW now supports the Selection field for IEEE mode. Add driver changes to
use the newer MFW masks to read/write the port-id value.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 124 -
 include/linux/qed/qed_if.h |   8 ++
 2 files changed, 112 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index f07f0ac..b157a6a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -69,6 +69,17 @@ static bool qed_dcbx_app_port(u32 app_info_bitmap)
  DCBX_APP_SF_PORT);
 }
 
+static bool qed_dcbx_ieee_app_port(u32 app_info_bitmap, u8 type)
+{
+   u8 mfw_val = QED_MFW_GET_FIELD(app_info_bitmap, DCBX_APP_SF_IEEE);
+
+   /* Old MFW */
+   if (mfw_val == DCBX_APP_SF_IEEE_RESERVED)
+   return qed_dcbx_app_port(app_info_bitmap);
+
+   return !!(mfw_val == type || mfw_val == DCBX_APP_SF_IEEE_TCP_UDP_PORT);
+}
+
 static bool qed_dcbx_default_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
 {
bool ethtype;
@@ -81,10 +92,17 @@ static bool qed_dcbx_default_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
return !!(ethtype && (proto_id == QED_ETH_TYPE_DEFAULT));
 }
 
-static bool qed_dcbx_iscsi_tlv(u32 app_info_bitmap, u16 proto_id)
+static bool qed_dcbx_iscsi_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
 {
-   return !!(qed_dcbx_app_port(app_info_bitmap) &&
- proto_id == QED_TCP_PORT_ISCSI);
+   bool port;
+
+   if (ieee)
+   port = qed_dcbx_ieee_app_port(app_info_bitmap,
+ DCBX_APP_SF_IEEE_TCP_PORT);
+   else
+   port = qed_dcbx_app_port(app_info_bitmap);
+
+   return !!(port && (proto_id == QED_TCP_PORT_ISCSI));
 }
 
 static bool qed_dcbx_fcoe_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
@@ -111,10 +129,17 @@ static bool qed_dcbx_roce_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
return !!(ethtype && (proto_id == QED_ETH_TYPE_ROCE));
 }
 
-static bool qed_dcbx_roce_v2_tlv(u32 app_info_bitmap, u16 proto_id)
+static bool qed_dcbx_roce_v2_tlv(u32 app_info_bitmap, u16 proto_id, bool ieee)
 {
-   return !!(qed_dcbx_app_port(app_info_bitmap) &&
- proto_id == QED_UDP_PORT_TYPE_ROCE_V2);
+   bool port;
+
+   if (ieee)
+   port = qed_dcbx_ieee_app_port(app_info_bitmap,
+ DCBX_APP_SF_IEEE_UDP_PORT);
+   else
+   port = qed_dcbx_app_port(app_info_bitmap);
+
+   return !!(port && (proto_id == QED_UDP_PORT_TYPE_ROCE_V2));
 }
 
 static void
@@ -199,11 +224,11 @@ qed_dcbx_get_app_protocol_type(struct qed_hwfn *p_hwfn,
*type = DCBX_PROTOCOL_FCOE;
} else if (qed_dcbx_roce_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_ROCE;
-   } else if (qed_dcbx_iscsi_tlv(app_prio_bitmap, id)) {
+   } else if (qed_dcbx_iscsi_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_ISCSI;
} else if (qed_dcbx_default_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_ETH;
-   } else if (qed_dcbx_roce_v2_tlv(app_prio_bitmap, id)) {
+   } else if (qed_dcbx_roce_v2_tlv(app_prio_bitmap, id, ieee)) {
*type = DCBX_PROTOCOL_ROCE_V2;
} else {
*type = DCBX_MAX_PROTOCOL_TYPE;
@@ -441,8 +466,39 @@ qed_dcbx_get_app_data(struct qed_hwfn *p_hwfn,
  DCBX_APP_NUM_ENTRIES);
for (i = 0; i < DCBX_MAX_APP_PROTOCOL; i++) {
entry = &p_params->app_entry[i];
-   entry->ethtype = !(QED_MFW_GET_FIELD(p_tbl[i].entry,
-DCBX_APP_SF));
+   if (ieee) {
+   u8 sf_ieee;
+   u32 val;
+
+   sf_ieee = QED_MFW_GET_FIELD(p_tbl[i].entry,
+   DCBX_APP_SF_IEEE);
+   switch (sf_ieee) {
+   case DCBX_APP_SF_IEEE_RESERVED:
+   /* Old MFW */
+   val = QED_MFW_GET_FIELD(p_tbl[i].entry,
+   DCBX_APP_SF);
+   entry->sf_ieee = val ?
+   QED_DCBX_SF_IEEE_TCP_UDP_PORT :
+   QED_DCBX_SF_IEEE_ETHTYPE;
+   break;
+   case DCBX_APP_SF_IEEE_ETHTYPE:
+   entry->sf_ieee = QED_DCBX_SF_IEEE_ETHTYPE;
+   break;
+   case DCBX_APP_SF_IEEE_TCP_PORT:
+

Re: [PATCH] bonding: Allow tun-interfaces as slaves

2016-08-08 Thread Ding Tianhong

On 2016/8/9 5:48, Jörn Engel wrote:
> Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be
> used to enslave tun-interfaces.  00503b6f702e broke that behaviour,
> afaics as an unintended side-effect.
> 

Hi Jorn:

I don't clearly understand your problem. Can you explain in more detail how
00503b6f702e breaks tun-interfaces? Then we will try to fix it.

Also, dev_set_mac_address will change the slave's MAC address. Some NICs
don't support changing the MAC address and cannot work as a bond slave,
so we need to check the return value. I don't think this patch is an
effective improvement.

Thanks.
Ding

> For the purpose of bond-over-tun in balance-rr mode, simply ignoring the
> error from dev_set_mac_address() is good enough.  I am not familiar
> enough with the code to judge what new problems this patch might
> introduce.
> 
> Signed-off-by: Joern Engel 
> ---
>  drivers/net/bonding/bond_main.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 1f276fa30ba6..bc5dba847f50 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1489,11 +1489,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>*/
>   memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
>   addr.sa_family = slave_dev->type;
> - res = dev_set_mac_address(slave_dev, &addr);
> - if (res) {
> - netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res);
> - goto err_restore_mtu;
> - }
> + dev_set_mac_address(slave_dev, &addr);
>   }
>  
>   /* set slave flag before open to prevent IPv6 addrconf */
> @@ -1777,7 +1773,6 @@ err_restore_mac:
>   dev_set_mac_address(slave_dev, &addr);
>   }
>  
> -err_restore_mtu:
>   dev_set_mtu(slave_dev, new_slave->original_mtu);
>  
>  err_free:
>

Re: [PATCH v5 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-08-08 Thread Feng Gao

Oh, I think I really understand you this time.
Please see my inline comments.

If they are right, I can send the v6 patch.

Best Regards
Feng


On Tue, Aug 9, 2016 at 9:06 AM, Philip Prindeville wrote:
> Inline...
>
>
>
> On 08/08/2016 06:37 PM, f...@48lvckh6395k16k5.yundunddos.com wrote:
>>
>> From: Gao Feng 
>>
>> The PPTP is encapsulated by GRE header with that GRE_VERSION bits
>> must contain one. But current GRE RPS needs the GRE_VERSION must be
>> zero. So RPS does not work for PPTP traffic.
>>
>> In my test environment, there are four MIPS cores, and all traffic
>> are passed through by PPTP. As a result, only one core is 100% busy
>> while other three cores are very idle. After this patch, the usage
>> of four cores are balanced well.
>>
>> Signed-off-by: Gao Feng 
>> ---
>>   v5: 1) Make fix header of gre_full_hdr as uname struct;
>>   2) Create macro GRE_PPTP_KEY_MASK;
>>   v4: 1) Define struct gre_full_hdr, and use sizeof its member directly;
>>   2) Move version and routing check ahead;
>>   3) Only PPTP in GRE check the ack flag;
>>   v3: 1) Move struct pptp_gre_header defination into new file pptp.h
>>   2) Use sizeof GRE and PPTP type instead of literal value;
>>   3) Remove strict flag check for PPTP to robust;
>>   4) Consolidate the codes again;
>>   v2: Update according to Tom and Philp's advice.
>>   1) Consolidate the codes with GRE version 0 path;
>>   2) Use PPP_PROTOCOL to get ppp protol;
>>   3) Set the FLOW_DIS_ENCAPSULATION flag;
>>   v1: Intial Patch
>>
>>   drivers/net/ppp/pptp.c |  36 +
>>   include/net/gre.h  |  13 -
>>   include/net/pptp.h |  40 +++
>>   include/uapi/linux/if_tunnel.h |   7 ++-
>>   net/core/flow_dissector.c  | 113
>> -
>>   5 files changed, 138 insertions(+), 71 deletions(-)
>>   create mode 100644 include/net/pptp.h
>>
>> diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
>> index ae0905e..3e68dbc 100644
>> --- a/drivers/net/ppp/pptp.c
>> +++ b/drivers/net/ppp/pptp.c
>> @@ -37,6 +37,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>> #include 
>>   @@ -53,41 +54,6 @@ static struct proto pptp_sk_proto __read_mostly;
>>   static const struct ppp_channel_ops pptp_chan_ops;
>>   static const struct proto_ops pptp_ops;
>>   -#define PPP_LCP_ECHOREQ 0x09
>> -#define PPP_LCP_ECHOREP 0x0A
>> -#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
>> -
>> -#define MISSING_WINDOW 20
>> -#define WRAPPED(curseq, lastseq)\
>> -   ((((curseq) & 0xff00) == 0) &&\
>> -   (((lastseq) & 0xff00) == 0xff00))
>> -
>> -#define PPTP_GRE_PROTO  0x880B
>> -#define PPTP_GRE_VER    0x1
>> -
>> -#define PPTP_GRE_FLAG_C 0x80
>> -#define PPTP_GRE_FLAG_R 0x40
>> -#define PPTP_GRE_FLAG_K 0x20
>> -#define PPTP_GRE_FLAG_S 0x10
>> -#define PPTP_GRE_FLAG_A 0x80
>> -
>> -#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
>> -#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
>> -#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
>> -#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
>> -#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
>> -
>> -#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
>> -struct pptp_gre_header {
>> -   u8  flags;
>> -   u8  ver;
>> -   __be16 protocol;
>> -   __be16 payload_len;
>> -   __be16 call_id;
>> -   __be32 seq;
>> -   __be32 ack;
>> -} __packed;
>> -
>>   static struct pppox_sock *lookup_chan(u16 call_id, __be32 s_addr)
>>   {
>> struct pppox_sock *sock;
>> diff --git a/include/net/gre.h b/include/net/gre.h
>> index 7a54a31..6347f16 100644
>> --- a/include/net/gre.h
>> +++ b/include/net/gre.h
>> @@ -7,9 +7,20 @@
>>   struct gre_base_hdr {
>> __be16 flags;
>> __be16 protocol;
>> -};
>> +} __packed;
>>   #define GRE_HEADER_SECTION 4
>>   +struct gre_full_hdr {
>> +   struct {
>> +   __be16 flags;
>> +   __be16 protocols;
>> +   };
>
>
> No, you're misunderstanding what I was saying.  I was suggesting using
> "struct gre_base_hdr" here.
>
> Anyway, we can skip it.  Was trying to reflect that the gre_full_hdr was
> derived from the gre_base_hdr.

struct gre_full_hdr {
	struct gre_base_hdr {
		__be16 flags;
		__be16 protocols;
	} fixed_header;
	.
};
I think this is what you mean.
Sorry for repeatedly misunderstanding that :((

I see the advantage now, thanks.
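
A user-space sketch of that layout, for reference. The sketch_* types are hypothetical stand-ins that use plain fixed-width integers instead of the kernel's __be16/__be32 and omit __packed. The point is only that embedding the base header as the first member lets code written for the base header operate on the full header unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct gre_base_hdr: the fixed 4-byte part. */
struct sketch_gre_base_hdr {
	uint16_t flags;
	uint16_t protocol;
};

/* Stand-in for struct gre_full_hdr: base header plus optional fields. */
struct sketch_gre_full_hdr {
	struct sketch_gre_base_hdr fixed_header;
	uint16_t csum;
	uint16_t reserved1;
	uint32_t key;
	uint32_t seq;
};

/* Code that only needs the fixed part can take the embedded member. */
const struct sketch_gre_base_hdr *
sketch_base_of(const struct sketch_gre_full_hdr *full)
{
	return &full->fixed_header;
}
```

Since fixed_header is the first member, a pointer to the full header and a pointer to its base header refer to the same address, and the optional fields start at sizeof(struct sketch_gre_base_hdr).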

>
>
>
>> +   __be16 csum;
>> +   __be16 reserved1;
>> +   __be32 key;
>> +   __be32 seq;
>> +} __packed;
>> +
>>   #define GREPROTO_CISCO0
>>   #define GREPROTO_PPTP 1
>>   #define GREPROTO_MAX  2
>> diff --git a/include/net/pptp.h b/include/net/pptp.h
>> new file mode 100644
>> index 000..301d3e2
>> --- /dev/null
>> +++ b/include/net/pptp.h
>>

Re: [PATCH v5 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-08-08 Thread Philip Prindeville


Inline...


On 08/08/2016 06:37 PM, f...@48lvckh6395k16k5.yundunddos.com wrote:

From: Gao Feng 

The PPTP is encapsulated by GRE header with that GRE_VERSION bits
must contain one. But current GRE RPS needs the GRE_VERSION must be
zero. So RPS does not work for PPTP traffic.

In my test environment, there are four MIPS cores, and all traffic
are passed through by PPTP. As a result, only one core is 100% busy
while other three cores are very idle. After this patch, the usage
of four cores are balanced well.

Signed-off-by: Gao Feng 
---
  v5: 1) Make fix header of gre_full_hdr as uname struct;
  2) Create macro GRE_PPTP_KEY_MASK;
  v4: 1) Define struct gre_full_hdr, and use sizeof its member directly;
  2) Move version and routing check ahead;
  3) Only PPTP in GRE check the ack flag;
  v3: 1) Move struct pptp_gre_header defination into new file pptp.h
  2) Use sizeof GRE and PPTP type instead of literal value;
  3) Remove strict flag check for PPTP to robust;
  4) Consolidate the codes again;
  v2: Update according to Tom and Philp's advice.
  1) Consolidate the codes with GRE version 0 path;
  2) Use PPP_PROTOCOL to get ppp protol;
  3) Set the FLOW_DIS_ENCAPSULATION flag;
  v1: Intial Patch

  drivers/net/ppp/pptp.c |  36 +
  include/net/gre.h  |  13 -
  include/net/pptp.h |  40 +++
  include/uapi/linux/if_tunnel.h |   7 ++-
  net/core/flow_dissector.c  | 113 -
  5 files changed, 138 insertions(+), 71 deletions(-)
  create mode 100644 include/net/pptp.h

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index ae0905e..3e68dbc 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -37,6 +37,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 
  
@@ -53,41 +54,6 @@ static struct proto pptp_sk_proto __read_mostly;

  static const struct ppp_channel_ops pptp_chan_ops;
  static const struct proto_ops pptp_ops;
  
-#define PPP_LCP_ECHOREQ 0x09

-#define PPP_LCP_ECHOREP 0x0A
-#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
-
-#define MISSING_WINDOW 20
-#define WRAPPED(curseq, lastseq)\
-   ((((curseq) & 0xff00) == 0) &&\
-   (((lastseq) & 0xff00) == 0xff00))
-
-#define PPTP_GRE_PROTO  0x880B
-#define PPTP_GRE_VER    0x1
-
-#define PPTP_GRE_FLAG_C 0x80
-#define PPTP_GRE_FLAG_R 0x40
-#define PPTP_GRE_FLAG_K 0x20
-#define PPTP_GRE_FLAG_S 0x10
-#define PPTP_GRE_FLAG_A 0x80
-
-#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
-#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
-#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
-#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
-#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
-
-#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
-struct pptp_gre_header {
-   u8  flags;
-   u8  ver;
-   __be16 protocol;
-   __be16 payload_len;
-   __be16 call_id;
-   __be32 seq;
-   __be32 ack;
-} __packed;
-
  static struct pppox_sock *lookup_chan(u16 call_id, __be32 s_addr)
  {
struct pppox_sock *sock;
diff --git a/include/net/gre.h b/include/net/gre.h
index 7a54a31..6347f16 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -7,9 +7,20 @@
  struct gre_base_hdr {
__be16 flags;
__be16 protocol;
-};
+} __packed;
  #define GRE_HEADER_SECTION 4
  
+struct gre_full_hdr {
+   struct {
+   __be16 flags;
+   __be16 protocols;
+   };

No, you're misunderstanding what I was saying.  I was suggesting using
"struct gre_base_hdr" here.

Anyway, we can skip it.  Was trying to reflect that the gre_full_hdr was
derived from the gre_base_hdr.

+   __be16 csum;
+   __be16 reserved1;
+   __be32 key;
+   __be32 seq;
+} __packed;
+
  #define GREPROTO_CISCO 0
  #define GREPROTO_PPTP 1
  #define GREPROTO_MAX  2
diff --git a/include/net/pptp.h b/include/net/pptp.h
new file mode 100644
index 0000000..301d3e2
--- /dev/null
+++ b/include/net/pptp.h
@@ -0,0 +1,40 @@
+#ifndef _NET_PPTP_H
+#define _NET_PPTP_H
+
+#define PPP_LCP_ECHOREQ 0x09
+#define PPP_LCP_ECHOREP 0x0A
+#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
+
+#define MISSING_WINDOW 20
+#define WRAPPED(curseq, lastseq)\
+   ((((curseq) & 0xffffff00) == 0) &&\
+   (((lastseq) & 0xffffff00) == 0xffffff00))
+
+#define PPTP_GRE_PROTO  0x880B
+#define PPTP_GRE_VER    0x1
+
+#define PPTP_GRE_FLAG_C 0x80
+#define PPTP_GRE_FLAG_R 0x40
+#define PPTP_GRE_FLAG_K 0x20
+#define PPTP_GRE_FLAG_S 0x10
+#define PPTP_GRE_FLAG_A 0x80
+
+#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
+#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
+#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
+#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
+#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
+
+#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
+struct

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread Lorenzo Colitti

On Tue, Aug 9, 2016 at 6:35 AM, David Miller  wrote:
> We should always give sk_bound_dev_if the highest priority.
>
> Also, we should amend, not delete, the check against the scope
> ID in the sockaddr.  As explained by YOSHIFUJI Hideaki.

Sure, I can do that.

Note that pretty much every sendmsg codepath allows other data to take
precedence over sk_bound_dev_if:

- udpv6_sendmsg: if sin6_scope_id specified on a scoped address
- rawv6_sendmsg: if sin6_scope_id specified on a scoped address
- l2tp_ip6_sendmsg: if sin6_scope_id specified on a scoped address
- ip_cmsg_send: if IP_PKTINFO or IPV6_PKTINFO specified

What should I do about those? -EINVAL? Ignore the conflicting data? Leave as is?

Re: [PATCH v4 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-08-08 Thread Feng Gao

OK, I have sent the v5 patch with two updates:
1. Make the fixed header of gre_full_hdr an unnamed struct;
2. Use a macro (GRE_PPTP_KEY_MASK) instead of the literal key mask htonl(0x);
The v5 link is 
https://www.mail-archive.com/netdev@vger.kernel.org/msg122261.html.

But I don't see what the advantage of the unnamed struct is here.
There is already a struct gre_base_hdr definition, so why can't we use
that struct directly?

Philip, could you show me more details about it please?
Thank you.


Best Regards
Feng

On Mon, Aug 8, 2016 at 11:20 PM, Philp Prindeville
 wrote:
> No, I was referring to anonymous structures, which is a feature of C11.
>
> Please see the link I sent.
>
>
>
> On 08/08/2016 03:13 AM, Feng Gao wrote:
>>
>> Hi Philip,
>>
>> Do you mean like the following?
>>
>> struct gre_full_hdr {
>>  struct {
>>  __be16 flags;
>>  __be16 protocol;
>>  } fixed_header;
>>  __be16 csum;
>>  __be16 reserved1;
>>  __be32 key;
>>  __be32 seq;
>> } __packed;
>>
>> But we need struct gre_base_hdr to get the fixed header of GRE in
>> function __skb_flow_dissect like the following codes.
>> struct gre_base_hdr *hdr, _hdr;
>> hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr);
>>
>> BTW, the original codes define one local stuct gre_hdr. Now I use the
>> unified struct gre_base_hdr instead of it.
>>
>> Best Regards
>> Feng
>>
>>
>> On Mon, Aug 8, 2016 at 11:27 AM, Philp Prindeville
>>  wrote:
>>>
>>> Feng,
>>>
>>> An anonymous structure is defined here:
>>> https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
>>>
>>> i.e.:
>>>
>>> struct gre_full_hdr {
>>>  struct gre_base_hdr;
>>>  ...
>>>
>>> so yes, I'm talking about making fixed_header be anonymous instead.
>>>
>>> -Philip
>>>
>>>
>>>
>>> On 08/07/2016 08:50 PM, Feng Gao wrote:

>>>> Hi Philp,
>>>>
>>>> Forgive my poor English, I am not clear about the comment #1.
>>>> "Can you make gre_base_hdr be anonymous?".
>>>>
>>>> +struct gre_full_hdr {
>>>> +   struct gre_base_hdr fixed_header;
>>>>
>>>> Do you mean make the member "fixed_header" as anonymous or not?
>>>>
>>>> Best Regards
>>>> Feng
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 5:03 AM, Philp Prindeville
>>>>  wrote:
>
> Inline...
>
>
>
> On 08/04/2016 01:06 AM, f...@48lvckh6395k16k5.yundunddos.com wrote:
>>
>> From: Gao Feng 
>>
>> The PPTP is encapsulated by GRE header with that GRE_VERSION bits
>> must contain one. But current GRE RPS needs the GRE_VERSION must be
>> zero. So RPS does not work for PPTP traffic.
>>
>> In my test environment, there are four MIPS cores, and all traffic
>> are passed through by PPTP. As a result, only one core is 100% busy
>> while other three cores are very idle. After this patch, the usage
>> of four cores are balanced well.
>>
>> Signed-off-by: Gao Feng 
>> ---
>> v4: 1) Define struct gre_full_hdr, and use sizeof its member
>> directly;
>> 2) Move version and routing check ahead;
>> 3) Only PPTP in GRE check the ack flag;
>> v3: 1) Move struct pptp_gre_header defination into new file pptp.h
>> 2) Use sizeof GRE and PPTP type instead of literal value;
>> 3) Remove strict flag check for PPTP to robust;
>> 4) Consolidate the codes again;
>> v2: Update according to Tom and Philp's advice.
>> 1) Consolidate the codes with GRE version 0 path;
>> 2) Use PPP_PROTOCOL to get ppp protol;
>> 3) Set the FLOW_DIS_ENCAPSULATION flag;
>> v1: Intial Patch
>>
>> drivers/net/ppp/pptp.c |  36 +
>> include/net/gre.h  |  10 +++-
>> include/net/pptp.h |  40 +++
>> include/uapi/linux/if_tunnel.h |   7 ++-
>> net/core/flow_dissector.c  | 113
>> -
>> 5 files changed, 135 insertions(+), 71 deletions(-)
>> create mode 100644 include/net/pptp.h
>>
>> diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
>> index ae0905e..3e68dbc 100644
>> --- a/drivers/net/ppp/pptp.c
>> +++ b/drivers/net/ppp/pptp.c
>> @@ -37,6 +37,7 @@
>> #include 
>> #include 
>> #include 
>> +#include 
>>   #include 
>> @@ -53,41 +54,6 @@ static struct proto pptp_sk_proto
>> __read_mostly;
>> static const struct ppp_channel_ops pptp_chan_ops;
>> static const struct proto_ops pptp_ops;
>> -#define PPP_LCP_ECHOREQ 0x09
>> -#define PPP_LCP_ECHOREP 0x0A
>> -#define SC_RCV_BITS
>> (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
>> -
>> -#define MISSING_WINDOW 20
>> -#define WRAPPED(curseq, lastseq)\
>> -

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread Lorenzo Colitti

On Tue, Aug 9, 2016 at 1:27 AM, David Ahern  wrote:
> Your description states:
> "ping_v6_sendmsg never sets flowi6_oif, so it is not possible to
> ping an IPv6 address on a different interface."
>
> That code snippet above contradicts that -- flowi6_oif is set in 
> ping_v6_sendmsg.

Ah, yes, thanks. Will fix the commit message.

[PATCH v5 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-08-08 Thread fgao

From: Gao Feng 

PPTP is encapsulated in a GRE header whose GRE_VERSION bits must be set
to one, but the current GRE RPS code requires GRE_VERSION to be zero.
As a result, RPS does not work for PPTP traffic.

In my test environment there are four MIPS cores, and all traffic
passes through PPTP. As a result, only one core is 100% busy
while the other three cores are nearly idle. After this patch, the load
is well balanced across the four cores.

Signed-off-by: Gao Feng 
---
 v5: 1) Make the fixed header of gre_full_hdr an unnamed struct;
 2) Create macro GRE_PPTP_KEY_MASK;
 v4: 1) Define struct gre_full_hdr, and use sizeof its members directly;
 2) Move version and routing check ahead;
 3) Only PPTP in GRE checks the ack flag;
 v3: 1) Move struct pptp_gre_header definition into new file pptp.h
 2) Use sizeof GRE and PPTP type instead of literal value;
 3) Remove strict flag check for PPTP for robustness;
 4) Consolidate the code again;
 v2: Update according to Tom and Philp's advice.
 1) Consolidate the code with GRE version 0 path;
 2) Use PPP_PROTOCOL to get ppp protocol;
 3) Set the FLOW_DIS_ENCAPSULATION flag;
 v1: Initial Patch

 drivers/net/ppp/pptp.c |  36 +
 include/net/gre.h  |  13 -
 include/net/pptp.h |  40 +++
 include/uapi/linux/if_tunnel.h |   7 ++-
 net/core/flow_dissector.c  | 113 -
 5 files changed, 138 insertions(+), 71 deletions(-)
 create mode 100644 include/net/pptp.h

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index ae0905e..3e68dbc 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -53,41 +54,6 @@ static struct proto pptp_sk_proto __read_mostly;
 static const struct ppp_channel_ops pptp_chan_ops;
 static const struct proto_ops pptp_ops;
 
-#define PPP_LCP_ECHOREQ 0x09
-#define PPP_LCP_ECHOREP 0x0A
-#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
-
-#define MISSING_WINDOW 20
-#define WRAPPED(curseq, lastseq)\
-   ((((curseq) & 0xffffff00) == 0) &&\
-   (((lastseq) & 0xffffff00) == 0xffffff00))
-
-#define PPTP_GRE_PROTO  0x880B
-#define PPTP_GRE_VER    0x1
-
-#define PPTP_GRE_FLAG_C 0x80
-#define PPTP_GRE_FLAG_R 0x40
-#define PPTP_GRE_FLAG_K 0x20
-#define PPTP_GRE_FLAG_S 0x10
-#define PPTP_GRE_FLAG_A 0x80
-
-#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
-#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
-#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
-#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
-#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
-
-#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
-struct pptp_gre_header {
-   u8  flags;
-   u8  ver;
-   __be16 protocol;
-   __be16 payload_len;
-   __be16 call_id;
-   __be32 seq;
-   __be32 ack;
-} __packed;
-
 static struct pppox_sock *lookup_chan(u16 call_id, __be32 s_addr)
 {
struct pppox_sock *sock;
diff --git a/include/net/gre.h b/include/net/gre.h
index 7a54a31..6347f16 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -7,9 +7,20 @@
 struct gre_base_hdr {
__be16 flags;
__be16 protocol;
-};
+} __packed;
 #define GRE_HEADER_SECTION 4
 
+struct gre_full_hdr {
+   struct {
+   __be16 flags;
+   __be16 protocols;
+   };
+   __be16 csum;
+   __be16 reserved1;
+   __be32 key;
+   __be32 seq;
+} __packed;
+
 #define GREPROTO_CISCO 0
 #define GREPROTO_PPTP  1
 #define GREPROTO_MAX   2
diff --git a/include/net/pptp.h b/include/net/pptp.h
new file mode 100644
index 0000000..301d3e2
--- /dev/null
+++ b/include/net/pptp.h
@@ -0,0 +1,40 @@
+#ifndef _NET_PPTP_H
+#define _NET_PPTP_H
+
+#define PPP_LCP_ECHOREQ 0x09
+#define PPP_LCP_ECHOREP 0x0A
+#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
+
+#define MISSING_WINDOW 20
+#define WRAPPED(curseq, lastseq)\
+   ((((curseq) & 0xffffff00) == 0) &&\
+   (((lastseq) & 0xffffff00) == 0xffffff00))
+
+#define PPTP_GRE_PROTO  0x880B
+#define PPTP_GRE_VER    0x1
+
+#define PPTP_GRE_FLAG_C 0x80
+#define PPTP_GRE_FLAG_R 0x40
+#define PPTP_GRE_FLAG_K 0x20
+#define PPTP_GRE_FLAG_S 0x10
+#define PPTP_GRE_FLAG_A 0x80
+
+#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
+#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
+#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
+#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
+#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
+
+#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
+struct pptp_gre_header {
+   u8  flags;
+   u8  ver;
+   __be16 protocol;
+   __be16 payload_len;
+   __be16 call_id;
+   __be32 seq;
+   __be32 ack;
+} __packed;
+
+
+#endif
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1046f55..0c11918 100644
--- a/include/uapi/linux/if_tunnel.h
+++

Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM

2016-08-08 Thread Kees Cook

On Mon, Aug 8, 2016 at 5:00 PM, Sargun Dhillon  wrote:
> On Mon, Aug 08, 2016 at 04:44:02PM -0700, Kees Cook wrote:
>> On Thu, Aug 4, 2016 at 12:11 AM, Sargun Dhillon  wrote:
>> > I distributed this patchset to linux-security-mod...@vger.kernel.org 
>> > earlier,
>> > but based on the fact that the archive is down, and this is a fairly
>> > broad-sweeping proposal, I figured I'd grow the audience a little bit. 
>> > Sorry
>> > if you received this multiple times.
>> >
>> > I've begun building out the skeleton of a Linux Security Module, and I'd 
>> > like to
>> > get feedback on it. It's a skeleton, and I've only populated a few hooks, 
>> > so I'm
>> > mostly looking for input on the general proposal, interest, and design. 
>> > It's a
>> > minor LSM. My particular use case is one in which containers are being
>> > dynamically deployed to machines by internal developers in a different 
>> > group.
>> > The point of Checmate is to act as an extensible bed for _safe_, complex
>> > security policies. It's nice to enable dynamic security policies that can 
>> > be
>> > defined in C, and change as neccessary, without ever having to patch, or 
>> > rebuild
>> > the kernel.
>> >
>> > For many of these containers, the security policies can be fairly nuanced. 
>> > One
>> > particular one to take into account is network security. Often times,
>> > administrators want to prevent ingress, and egress connectivity except 
>> > from a
>> > few select IPs. Egress filtering can be managed using net_cls, but without
>> > modifying running software, it's non-trivial to attach a filter to all 
>> > sockets
>> > being created within a container. The inet_conn_request, socket_recvmsg,
>> > socket_sock_rcv_skb hooks make this trivial to implement.
>> >
>> > Other times, containers need to be throttled in places where there's not 
>> > really
>> > a good place to impose that policy for software which isn't built 
>> > in-house.  If
>> > one wants to limit file creations/sec, or reject I/O under certain
>> > characteristics, there's not a great place to do it now. This gives 
>> > engineers a
>> > mechanism to write those policies.
>> >
>> > This same flexibility can be used to take existing programs and enable 
>> > safe BPF
>> > helpers to modify memory to allow rules to pass. One example that I 
>> > prototyped
>> > was Docker's port mapping, which has an overhead (DNAT), and there's some 
>> > loss
>> > of fidelity in the BSD Socket API to identify what's going on. Instead, we 
>> > can
>> > just rewrite the port in a bind, based upon some data in a BPF map, and a 
>> > cgroup
>> > match.
>> >
>> > I can actually see other minor security modules being implemented in 
>> > Checmate,
>> > for example, Yama, or the recently proposed Hardchroot could be 
>> > reimplemented in
>> > BPF. Potentially, they could even be API compatible.
>> >
>> > Although, at first, much of this sounds like seccomp, it's quite 
>> > different. For
>> > one, what we can do in the security hooks is more complex (access to kernel
>> > pointers). The other side of this is we can have effects on a system-wide,
>> > or cgroup level. This also circumvents the need for CRIU-friendly policies.
>> >
>> > Lastly, the flexibility of this mechanism allows for prevention of security
>> > vulnerabilities which are often complex in nature and require the 
>> > interaction
>> > of multiple hooks (CVE-2014-9717 is a good example), and although ksplice,
>> > and livepatch exist, they're not always easy to use, as compared to loading
>> > a single bpf program across all kernels.
>> >
>> > The user-facing API is exposed via prctl as it's meant to be very simple 
>> > (at
>> > least the kernel components). It only has three operations. For a given 
>> > security
>> > hook, you can attach a BPF program to it, which will add it to the set of
>> > programs that are executed over when the hook is hit. You can reset a hook,
>> > which removes all program associated with a given hook, and you can set a
>> > deny_reset flag on a hook to prevent anyone from resetting it. It's likely 
>> > that
>> > an individual would want to set this in any production use case.
>>
>> One fairly serious problem that seccomp had to overcome was dealing
>> with exec+setuid in the face of an attacker. The main example is "what
>> if we refuse to allow a program to drop privileges via a filter rule?"
>> For seccomp, no-new-privs was introduced for non-root users of
>> seccomp. Programmatic syscall (or LSM) filters need to deal with this,
>> and it's a bit ungainly. :)
>>
> Couldn't someone do the same with SELinux, or Apparmor?

The "big" LSMs aren't defined programmatically by non-root users, so
there is no risk of elevating privileges (they are already root).

>> Also, if you have a prctl API that already has 3 operations, you might
>> want to use a new syscall anyway. :)
>>
> Looking at other LSMs, they appear to expose their API via a virtual 
> filesystem,
>

Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM

2016-08-08 Thread Sargun Dhillon

On Mon, Aug 08, 2016 at 04:44:02PM -0700, Kees Cook wrote:
> On Thu, Aug 4, 2016 at 12:11 AM, Sargun Dhillon  wrote:
> > I distributed this patchset to linux-security-mod...@vger.kernel.org 
> > earlier,
> > but based on the fact that the archive is down, and this is a fairly
> > broad-sweeping proposal, I figured I'd grow the audience a little bit. Sorry
> > if you received this multiple times.
> >
> > I've begun building out the skeleton of a Linux Security Module, and I'd 
> > like to
> > get feedback on it. It's a skeleton, and I've only populated a few hooks, 
> > so I'm
> > mostly looking for input on the general proposal, interest, and design. 
> > It's a
> > minor LSM. My particular use case is one in which containers are being
> > dynamically deployed to machines by internal developers in a different 
> > group.
> > The point of Checmate is to act as an extensible bed for _safe_, complex
> > security policies. It's nice to enable dynamic security policies that can be
> > defined in C, and change as neccessary, without ever having to patch, or 
> > rebuild
> > the kernel.
> >
> > For many of these containers, the security policies can be fairly nuanced. 
> > One
> > particular one to take into account is network security. Often times,
> > administrators want to prevent ingress, and egress connectivity except from 
> > a
> > few select IPs. Egress filtering can be managed using net_cls, but without
> > modifying running software, it's non-trivial to attach a filter to all 
> > sockets
> > being created within a container. The inet_conn_request, socket_recvmsg,
> > socket_sock_rcv_skb hooks make this trivial to implement.
> >
> > Other times, containers need to be throttled in places where there's not 
> > really
> > a good place to impose that policy for software which isn't built in-house. 
> >  If
> > one wants to limit file creations/sec, or reject I/O under certain
> > characteristics, there's not a great place to do it now. This gives 
> > engineers a
> > mechanism to write those policies.
> >
> > This same flexibility can be used to take existing programs and enable safe 
> > BPF
> > helpers to modify memory to allow rules to pass. One example that I 
> > prototyped
> > was Docker's port mapping, which has an overhead (DNAT), and there's some 
> > loss
> > of fidelity in the BSD Socket API to identify what's going on. Instead, we 
> > can
> > just rewrite the port in a bind, based upon some data in a BPF map, and a 
> > cgroup
> > match.
> >
> > I can actually see other minor security modules being implemented in 
> > Checmate,
> > for example, Yama, or the recently proposed Hardchroot could be 
> > reimplemented in
> > BPF. Potentially, they could even be API compatible.
> >
> > Although, at first, much of this sounds like seccomp, it's quite different. 
> > For
> > one, what we can do in the security hooks is more complex (access to kernel
> > pointers). The other side of this is we can have effects on a system-wide,
> > or cgroup level. This also circumvents the need for CRIU-friendly policies.
> >
> > Lastly, the flexibility of this mechanism allows for prevention of security
> > vulnerabilities which are often complex in nature and require the 
> > interaction
> > of multiple hooks (CVE-2014-9717 is a good example), and although ksplice,
> > and livepatch exist, they're not always easy to use, as compared to loading
> > a single bpf program across all kernels.
> >
> > The user-facing API is exposed via prctl as it's meant to be very simple (at
> > least the kernel components). It only has three operations. For a given 
> > security
> > hook, you can attach a BPF program to it, which will add it to the set of
> > programs that are executed over when the hook is hit. You can reset a hook,
> > which removes all program associated with a given hook, and you can set a
> > deny_reset flag on a hook to prevent anyone from resetting it. It's likely 
> > that
> > an individual would want to set this in any production use case.
> 
> One fairly serious problem that seccomp had to overcome was dealing
> with exec+setuid in the face of an attacker. The main example is "what
> if we refuse to allow a program to drop privileges via a filter rule?"
> For seccomp, no-new-privs was introduced for non-root users of
> seccomp. Programmatic syscall (or LSM) filters need to deal with this,
> and it's a bit ungainly. :)
> 
Couldn't someone do the same with SELinux or AppArmor?

> Also, if you have a prctl API that already has 3 operations, you might
> want to use a new syscall anyway. :)
> 
Looking at other LSMs, they appear to expose their API via a virtual
filesystem or prctl. I followed the model of Yama. I think there may be two
more operations (detach program, and mark a hook as append-only / read-only /
disabled). It seems like overkill to implement my own syscall.

> > On the BPF side of it, all that's involved in the work in progress is to
> > move some of the tracing helpers

Re: [RFC 0/4] RFC: Add Checmate, BPF-driven minor LSM

2016-08-08 Thread Kees Cook

On Thu, Aug 4, 2016 at 12:11 AM, Sargun Dhillon  wrote:
> I distributed this patchset to linux-security-mod...@vger.kernel.org earlier,
> but based on the fact that the archive is down, and this is a fairly
> broad-sweeping proposal, I figured I'd grow the audience a little bit. Sorry
> if you received this multiple times.
>
> I've begun building out the skeleton of a Linux Security Module, and I'd like 
> to
> get feedback on it. It's a skeleton, and I've only populated a few hooks, so 
> I'm
> mostly looking for input on the general proposal, interest, and design. It's a
> minor LSM. My particular use case is one in which containers are being
> dynamically deployed to machines by internal developers in a different group.
> The point of Checmate is to act as an extensible bed for _safe_, complex
> security policies. It's nice to enable dynamic security policies that can be
> defined in C, and change as neccessary, without ever having to patch, or 
> rebuild
> the kernel.
>
> For many of these containers, the security policies can be fairly nuanced. One
> particular one to take into account is network security. Often times,
> administrators want to prevent ingress, and egress connectivity except from a
> few select IPs. Egress filtering can be managed using net_cls, but without
> modifying running software, it's non-trivial to attach a filter to all sockets
> being created within a container. The inet_conn_request, socket_recvmsg,
> socket_sock_rcv_skb hooks make this trivial to implement.
>
> Other times, containers need to be throttled in places where there's not 
> really
> a good place to impose that policy for software which isn't built in-house.  
> If
> one wants to limit file creations/sec, or reject I/O under certain
> characteristics, there's not a great place to do it now. This gives engineers 
> a
> mechanism to write those policies.
>
> This same flexibility can be used to take existing programs and enable safe 
> BPF
> helpers to modify memory to allow rules to pass. One example that I prototyped
> was Docker's port mapping, which has an overhead (DNAT), and there's some loss
> of fidelity in the BSD Socket API to identify what's going on. Instead, we can
> just rewrite the port in a bind, based upon some data in a BPF map, and a 
> cgroup
> match.
>
> I can actually see other minor security modules being implemented in Checmate,
> for example, Yama, or the recently proposed Hardchroot could be reimplemented 
> in
> BPF. Potentially, they could even be API compatible.
>
> Although, at first, much of this sounds like seccomp, it's quite different. 
> For
> one, what we can do in the security hooks is more complex (access to kernel
> pointers). The other side of this is we can have effects on a system-wide,
> or cgroup level. This also circumvents the need for CRIU-friendly policies.
>
> Lastly, the flexibility of this mechanism allows for prevention of security
> vulnerabilities which are often complex in nature and require the interaction
> of multiple hooks (CVE-2014-9717 is a good example), and although ksplice,
> and livepatch exist, they're not always easy to use, as compared to loading
> a single bpf program across all kernels.
>
> The user-facing API is exposed via prctl as it's meant to be very simple (at
> least the kernel components). It only has three operations. For a given 
> security
> hook, you can attach a BPF program to it, which will add it to the set of
> programs that are executed over when the hook is hit. You can reset a hook,
> which removes all program associated with a given hook, and you can set a
> deny_reset flag on a hook to prevent anyone from resetting it. It's likely 
> that
> an individual would want to set this in any production use case.

One fairly serious problem that seccomp had to overcome was dealing
with exec+setuid in the face of an attacker. The main example is "what
if we refuse to allow a program to drop privileges via a filter rule?"
For seccomp, no-new-privs was introduced for non-root users of
seccomp. Programmatic syscall (or LSM) filters need to deal with this,
and it's a bit ungainly. :)

Also, if you have a prctl API that already has 3 operations, you might
want to use a new syscall anyway. :)

> On the BPF side of it, all that's involved in the work in progress is to
> move some of the tracing helpers into the shared helpers. For example,
> it's very valuable to have access to current when enforcing a hook.
> BPF programs also have access to maps, which somewhat works around
> the need for security blobs in some cases.

Just from a compatibility perspective, doesn't this end up exposing
kernel structures to userspace? What happens when the structures
change?

And from a security perspective, programmatic examination of kernel
structures means you can trivially leak kernel memory locations and
contents. Resisting these sorts of leaks needs to be addressed too.

This looks like a subset of kprobes but available to non-root

[Patch iproute2] tc: fix a misleading failure

2016-08-08 Thread Cong Wang

Before this patch:

 # ./tc/tc actions add action drop index 11
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".

After this patch:

 # ./tc/tc actions add action drop index 11
 RTNETLINK answers: File exists
 We have an error talking to the kernel

Cc: Stephen Hemminger 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 tc/m_action.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 24f8b5d..bb19df8 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -623,14 +623,12 @@ int do_action(int argc, char **argv)
act_usage();
return -1;
} else {
-
-   ret = -1;
-   }
-
-   if (ret < 0) {
fprintf(stderr, "Command \"%s\" is unknown, try \"tc actions help\".\n", *argv);
return -1;
}
+
+   if (ret < 0)
+   return -1;
}
 
return 0;
-- 
2.1.0

Re: size of data_segs_[in|out] and segs_[in|out]

2016-08-08 Thread David Miller

From: rapier 
Date: Mon, 8 Aug 2016 18:02:29 -0400

> As such, would it be feasible to define these instruments as 64bit
> instead of 32bit? If so, a cursory look at the code seems to indicate
> that this would only require a change in the header files.

It would break every application looking at these datastructures
right now.

qdisc hash table changes...

2016-08-08 Thread David Miller


I think there will still be build failures even with v6 due to symbol
clashes.

For example, kernel/audit_tree.c defines HASH_SIZE as an enumeration
value, and that (indirectly) includes networking headers.

There are others all over the tree.

I would therefore ask that you first fix the namespace conflicts
against the hash symbols in the entire tree as a separate patch series
(one for each driver/subsystem which has this problem.)  Really, get
it down to "git grep hash_add | grep -v _hash_add" and similar
returning no output.

Then we can add the qdisc hash facility.

Thanks.

size of data_segs_[in|out] and segs_[in|out]

2016-08-08 Thread rapier

The instruments for data_segs_in, data_segs_out, segs_in, and segs_out 
(along with the corresponding tcpi_ variables) are currently defined as 
unsigned 32-bit ints. While this is in line with RFC 4898, I'm thinking 
that for some flows this value might be too small.

For example, a sustained 1Gb/s flow with 1500-byte packets would exceed the 
max value in around 14.7 hours. That seems like a long time, but at 
higher rates, even with 9k packets, this could cause problems within 7.5 
hours or so (after ~36TB of data). While this is probably not an issue 
for a large number of people, in the scientific community transferring 
data sets of this size is already happening and is likely to become far 
more common.


As such, would it be feasible to define these instruments as 64bit 
instead of 32bit? If so, a cursory look at the code seems to indicate 
that this would only require a change in the header files.


Chris

Re: [PATCH net] vti: flush x-netns xfrm cache when vti interface is removed

2016-08-08 Thread David Miller

From: Lance Richardson 
Date: Mon,  8 Aug 2016 18:22:45 -0400

> @@ -392,6 +393,17 @@ static int vti_tunnel_init(struct net_device *dev)
>   return ip_tunnel_init(dev);
>  }
>  
> +static void vti_tunnel_uninit(struct net_device *dev)
> +{
> + struct ip_tunnel *tunnel = netdev_priv(dev);
> + struct net *net = tunnel->net;
> +
> + ip_tunnel_uninit(dev);
> +
> + if (!net_eq(net, dev_net(dev)))
> + xfrm_garbage_collect(net);
> +}

Like the normal netns, this netns should be expunged from the
flow cache on interface down, not uninit.

So like the existing facilities do, you should add a NETDEV_DOWN
notifier that flushes tunnel->net if necessary.

[PATCH net] vti: flush x-netns xfrm cache when vti interface is removed

2016-08-08 Thread Lance Richardson

When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):

  kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Prevent this problem by garbage collecting the tunnel namespace flow cache
when a vti interface is deleted from a namespace that is different from
the tunnel namespace.

A more detailed description of the problem scenario (referencing commands
in the problem script) is as follows:

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

  The vti_test interface is created in the init namespace. vti_tunnel_init()
  attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
  setting tunnel->net to &init_net.

(2) ip link set vti_test netns secure

  The vti_test interface is moved to the "secure" netns. Note that
  the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

  The first packet sent using the vti device causes xfrm_lookup() to be
  called as follows:

  dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

  Note that tunnel->net is the init namespace, while skb_dst(skb) references
  the vti_test interface in the "secure" namespace. The returned dst
  references an interface in the init namespace.

  Also note that the first parameter to xfrm_lookup() determines which flow
  cache is used to store the computed xfrm bundle, so after xfrm_lookup()
  returns there will be a cached bundle in the init namespace flow cache
  with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

  Kernel begins to delete the "secure" namespace.  At some point the
  vti_test interface is deleted, at which point dst_ifdown() changes
  the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
  in the "secure" namespace however).

  Since nothing has happened to cause the init namespace's flow cache
  to be garbage collected, this dst remains attached to the flow cache,
  so the kernel loops waiting for the last reference to lo to go away.
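
The reference problem in steps (1) to (4) can be reduced to a toy userspace model in C. This is only a sketch: the toy_* types are hypothetical stand-ins for the net_device, dst entry and per-namespace flow cache, and the single-slot cache is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; none of these are the real structures. */
struct toy_dev { int refcnt; };                    /* e.g. "lo" in the secure netns */
struct toy_dst { struct toy_dev *dev; };           /* cached route entry            */
struct toy_flow_cache { struct toy_dst *entry; };  /* one-slot per-netns cache      */

/* Caching a dst pins the device it points at. */
void toy_cache_store(struct toy_flow_cache *fc, struct toy_dst *dst,
                     struct toy_dev *dev)
{
    dst->dev = dev;
    dev->refcnt++;
    fc->entry = dst;
}

/* Garbage collection drops the cached dst and its device reference. */
void toy_cache_gc(struct toy_flow_cache *fc)
{
    if (fc->entry) {
        fc->entry->dev->refcnt--;
        fc->entry->dev = NULL;
        fc->entry = NULL;
    }
}

/* Namespace teardown can finish only when nobody references the device. */
bool toy_dev_can_unregister(const struct toy_dev *dev)
{
    return dev->refcnt == 0;
}
```

As long as the init namespace's cache holds the dst, toy_dev_can_unregister() stays false for the device in the other namespace; only a garbage collection of the cache that owns the entry releases it.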


ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8

ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
   proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
   proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

ip netns del secure


Reported-by: Hangbin Liu 
Reported-by: Jan Tluka 
Signed-off-by: Lance Richardson 
---
 net/ipv4/ip_vti.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index a917903..a815735 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -48,6 +48,7 @@ static struct rtnl_link_ops vti_link_ops __read_mostly;
 
 static int vti_net_id __read_mostly;
 static int vti_tunnel_init(struct net_device *dev);
+static void vti_tunnel_uninit(struct net_device *dev);
 
 static int vti_input(struct sk_buff *skb, int nexthdr, __be32 spi,
 int encap_type)
@@ -359,7 +360,7 @@ vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, 
int cmd)
 
 static const struct net_device_ops vti_netdev_ops = {
.ndo_init   = vti_tunnel_init,
-   .ndo_uninit = ip_tunnel_uninit,
+   .ndo_uninit = vti_tunnel_uninit,
.ndo_start_xmit = vti_tunnel_xmit,
.ndo_do_ioctl   = vti_tunnel_ioctl,
.ndo_change_mtu = ip_tunnel_change_mtu,
@@ -392,6 +393,17 @@ static int vti_tunnel_init(struct net_device *dev)
return ip_tunnel_init(dev);
 }
 
+static void vti_tunnel_uninit(struct net_device *dev)
+{
+   struct ip_tunnel *tunnel = netdev_priv(dev);
+   struct net *net = tunnel->net;
+
+   ip_tunnel_uninit(dev);
+
+   if (!net_eq(net, dev_net(dev)))
+   xfrm_garbage_collect(net);
+}
+
 static void __net_init vti_fb_tunnel_init(struct net_device *dev)
 {
struct ip_tunnel *tunnel = netdev_priv(dev);
-- 
2.5.5

Re: [PATCH] net: make net namespace sysctls belong to container's owner

2016-08-08 Thread Eric W. Biederman

Dmitry Torokhov  writes:

> On Mon, Aug 8, 2016 at 2:08 PM, Eric W. Biederman  
> wrote:
>> Dmitry Torokhov  writes:
>>
>>> If net namespace is attached to a user namespace let's make container's
>>> root owner of sysctls affecting said network namespace instead of global
>>> root.
>>>
>>> This also allows us to clean up net_ctl_permissions() because we do not
>>> need to fudge permissions anymore for the container's owner since it now
>>> owns the objects in question.
>>
>> Acked-by: "Eric W. Biederman" 
>>
>> Overall this seems reasonable.  However I am not a fan of your error
>> handling.
>>
>>> Signed-off-by: Dmitry Torokhov 
>>> ---
>>>
>>> This helps when running Android CTS in a container, but I think it makes
>>> sense regardless.
>>
>>> +static void net_ctl_set_ownership(struct ctl_table_header *head,
>>> +   struct ctl_table *table,
>>> +   kuid_t *uid, kgid_t *gid)
>>> +{
>>> + struct net *net = container_of(head->set, struct net, sysctls);
>>> +
>>> + *uid = make_kuid(net->user_ns, 0);
>>> + if (!uid_valid(*uid))
>>> + *uid = GLOBAL_ROOT_UID;
>>> +
>>> + *gid = make_kgid(net->user_ns, 0);
>>> + if (!gid_valid(*gid))
>>> + *gid = GLOBAL_ROOT_GID;
>>
>> This code should either be:
>> *uid = make_kuid(net->user_ns, 0);
>> *gid = make_kgid(net->user_ns, 0);
>>
>> Or it should be:
>> tmp_uid = make_kuid(net->user_ns, 0);
>> if (uid_valid(tmp_uid))
>> *uid = tmp_uid;
>>
>> tmp_gid = make_kgid(net->user_ns, 0);
>> if (gid_valid(tmp_gid))
>> *gid = tmp_gid;
>>
>> It is just very fragile to assume to know what uid and gid
>> would be if this code fails.
>>
>> As of v4.8-rc1 INVALID_UID and INVALID_GID can be set in inode->i_uid
>> and inode->i_gid without causing horrible vfs confusion (making the
>> first option viable), but I expect with the mention of Android you want
>> to backport this, so I will ask that you implement the error
>> handling that doesn't assume you know better than the generic code.
>>
>> If you don't have a better value to set something to it really should be
>> left alone.
>
> OK, fair enough. I will adopt the 2nd option and will resubmit. I need
> to also test without net namespaces support (my other change blows up
> because we are getting half-initialized init_net structure when
> namespaces are disabled).

No rush.  I will be out on vacation for the next couple of weeks.

Eric

Re: [PATCH] proc: make proc entries inherit ownership from parent

2016-08-08 Thread Dmitry Torokhov

On Thu, Aug 4, 2016 at 8:22 PM, Dmitry Torokhov
 wrote:
> There are certain parameters that belong to net namespace and that are
> exported in /proc. They should be controllable by the container's owner,
> but are currently owned by global root and thus not available.
>
> Let's change the proc code to inherit ownership from the parent entry, and when
> creating the per-ns "net" proc entry, set it up as owned by the container's owner.
>
> Signed-off-by: Dmitry Torokhov 
> ---

Unfortunately this blows up if !CONFIG_NET_NS because of:

commit ed160e839d2e1118529e58b04d52dba703ca629c
Author: Denis V. Lunev 
Date:   Tue Nov 13 03:23:21 2007 -0800

[NET]: Cleanup pernet operation without CONFIG_NET_NS

If CONFIG_NET_NS is not set, the only namespace is possible.

This patch removes list of pernet_operations and cleanups code a bit.
This list is not needed if there are no namespaces. We should just call
->init method.

Additionally, the ->exit will be called on module unloading only. This
case is safe - the code is not discarded. For the in/kernel code, ->exit
should never be called.

Signed-off-by: Denis V. Lunev 
Signed-off-by: David S. Miller 

This causes proc_net_ns_init() to be called with not-yet-initialized
init_net namespace and oops due to net->user_ns being NULL.

Unfortunately simply reverting did not appear to work. I'll figure out
what to do and resubmit.


>  fs/proc/generic.c  |  2 ++
>  fs/proc/proc_net.c | 13 +
>  2 files changed, 15 insertions(+)
>
> diff --git a/fs/proc/generic.c b/fs/proc/generic.c
> index c633476..bca66d8 100644
> --- a/fs/proc/generic.c
> +++ b/fs/proc/generic.c
> @@ -390,6 +390,8 @@ static struct proc_dir_entry *__proc_create(struct 
> proc_dir_entry **parent,
> atomic_set(&ent->count, 1);
> spin_lock_init(&ent->pde_unload_lock);
> INIT_LIST_HEAD(&ent->pde_openers);
> +   proc_set_user(ent, (*parent)->uid, (*parent)->gid);
> +
>  out:
> return ent;
>  }
> diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
> index c8bbc68..d701738 100644
> --- a/fs/proc/proc_net.c
> +++ b/fs/proc/proc_net.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -185,6 +186,8 @@ const struct file_operations proc_net_operations = {
>  static __net_init int proc_net_ns_init(struct net *net)
>  {
> struct proc_dir_entry *netd, *net_statd;
> +   kuid_t uid;
> +   kgid_t gid;
> int err;
>
> err = -ENOMEM;
> @@ -199,6 +202,16 @@ static __net_init int proc_net_ns_init(struct net *net)
> netd->parent = &proc_root;
> memcpy(netd->name, "net", 4);
>
> +   uid = make_kuid(net->user_ns, 0);
> +   if (!uid_valid(uid))
> +   uid = GLOBAL_ROOT_UID;
> +
> +   gid = make_kgid(net->user_ns, 0);
> +   if (!gid_valid(gid))
> +   gid = GLOBAL_ROOT_GID;
> +
> +   proc_set_user(netd, uid, gid);
> +
> err = -EEXIST;
> net_statd = proc_net_mkdir(net, "stat", netd);
> if (!net_statd)
> --
> 2.8.0.rc3.226.g39d4020
>
>
> --
> Dmitry

Thanks.

-- 
Dmitry

Re: [PATCH] net: make net namespace sysctls belong to container's owner

2016-08-08 Thread Dmitry Torokhov

On Mon, Aug 8, 2016 at 2:08 PM, Eric W. Biederman  wrote:
> Dmitry Torokhov  writes:
>
>> If net namespace is attached to a user namespace let's make container's
>> root owner of sysctls affecting said network namespace instead of global
>> root.
>>
>> This also allows us to clean up net_ctl_permissions() because we do not
>> need to fudge permissions anymore for the container's owner since it now
>> owns the objects in question.
>
> Acked-by: "Eric W. Biederman" 
>
> Overall this seems reasonable.  However I am not a fan of your error
> handling.
>
>> Signed-off-by: Dmitry Torokhov 
>> ---
>>
>> This helps when running Android CTS in a container, but I think it makes
>> sense regardless.
>
>> +static void net_ctl_set_ownership(struct ctl_table_header *head,
>> +   struct ctl_table *table,
>> +   kuid_t *uid, kgid_t *gid)
>> +{
>> + struct net *net = container_of(head->set, struct net, sysctls);
>> +
>> + *uid = make_kuid(net->user_ns, 0);
>> + if (!uid_valid(*uid))
>> + *uid = GLOBAL_ROOT_UID;
>> +
>> + *gid = make_kgid(net->user_ns, 0);
>> + if (!gid_valid(*gid))
>> + *gid = GLOBAL_ROOT_GID;
>
> This code should either be:
> *uid = make_kuid(net->user_ns, 0);
> *gid = make_kgid(net->user_ns, 0);
>
> Or it should be:
> tmp_uid = make_kuid(net->user_ns, 0);
> if (uid_valid(tmp_uid))
> *uid = tmp_uid;
>
> tmp_gid = make_kgid(net->user_ns, 0);
> if (gid_valid(tmp_gid))
> *gid = tmp_gid;
>
> It is just very fragile to assume to know what uid and gid
> would be if this code fails.
>
> As of v4.8-rc1 INVALID_UID and INVALID_GID can be set in inode->i_uid
> and inode->i_gid without causing horrible vfs confusion (making the
> first option viable), but I expect with the mention of Android you want
> to backport this, so I will ask that you implement the error
> handling that doesn't assume you know better than the generic code.
>
> If you don't have a better value to set something to it really should be
> left alone.

OK, fair enough. I will adopt the 2nd option and will resubmit. I need
to also test without net namespaces support (my other change blows up
because we are getting half-initialized init_net structure when
namespaces are disabled).

Thanks.

-- 
Dmitry

[PATCH] bonding: Allow tun-interfaces as slaves

2016-08-08 Thread Jörn Engel

Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be
used to enslave tun-interfaces.  00503b6f702e broke that behaviour,
afaics as an unintended side-effect.

For the purpose of bond-over-tun in balance-rr mode, simply ignoring the
error from dev_set_mac_address() is good enough.  I am not familiar
enough with the code to judge what new problems this patch might
introduce.
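
For readers unfamiliar with the failure mode, here is a toy userspace model in C. It assumes, as discussed in this thread, that a tun device provides no set-MAC operation and that the enslave path treats the resulting error as fatal; toy_dev, toy_dev_ops, toy_set_mac and toy_enslave are invented names, not bonding driver code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev;

struct toy_dev_ops {
    /* NULL when the device cannot change its MAC address. */
    int (*set_mac)(struct toy_dev *dev, const unsigned char *mac);
};

struct toy_dev {
    const struct toy_dev_ops *ops;
    unsigned char mac[6];
};

/* Fails with -EOPNOTSUPP when the device has no set_mac operation. */
int toy_set_mac(struct toy_dev *dev, const unsigned char *mac)
{
    if (!dev->ops->set_mac)
        return -EOPNOTSUPP;
    return dev->ops->set_mac(dev, mac);
}

/* strict  == true : a failed MAC change aborts the enslave.
 * strict  == false: the result is ignored, as this patch proposes. */
int toy_enslave(struct toy_dev *slave, const unsigned char *bond_mac, bool strict)
{
    int res = toy_set_mac(slave, bond_mac);

    if (res && strict)
        return res;
    return 0;
}
```

The lenient variant also hides genuine failures from devices that do implement the operation, which is the kind of side effect the last paragraph above is unsure about.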

Signed-off-by: Joern Engel 
---
 drivers/net/bonding/bond_main.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f276fa30ba6..bc5dba847f50 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1489,11 +1489,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev)
 */
memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
addr.sa_family = slave_dev->type;
-   res = dev_set_mac_address(slave_dev, &addr);
-   if (res) {
-   netdev_dbg(bond_dev, "Error %d calling 
set_mac_address\n", res);
-   goto err_restore_mtu;
-   }
+   dev_set_mac_address(slave_dev, &addr);
}
 
/* set slave flag before open to prevent IPv6 addrconf */
@@ -1777,7 +1773,6 @@ err_restore_mac:
dev_set_mac_address(slave_dev, &addr);
}
 
-err_restore_mtu:
dev_set_mtu(slave_dev, new_slave->original_mtu);
 
 err_free:
-- 
2.1.4

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread David Miller

From: Lorenzo Colitti 
Date: Mon,  8 Aug 2016 16:42:07 +0900

> ping_v6_sendmsg never sets flowi6_oif, so it is not possible to
> ping an IPv6 address on a different interface. Instead, it sets
> flowi6_iif, which is incorrect but harmless. Also, it returns an
> error if a passed-in scope ID doesn't match sk_bound_dev_if.
> 
> Get rid of the error, stop setting flowi6_iif, and support
> various ways of setting oif in the same priority order used by
> udpv6_sendmsg.
> 
> Tested: https://android-review.googlesource.com/#/c/254470/
> Signed-off-by: Lorenzo Colitti 

We should always give sk_bound_dev_if the highest priority.

Also, we should amend, not delete, the check against the scope
ID in the sockaddr.  As explained by YOSHIFUJI Hideaki.

Re: [PATCH net] sctp: use event->chunk when it's valid

2016-08-08 Thread David Miller

From: Xin Long 
Date: Sun,  7 Aug 2016 14:15:13 +0800

> Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
> it's available") used event->chunk->head_skb to get the head_skb in
> sctp_ulpevent_set_owner().
> 
> But at that moment, the event->chunk was NULL, as it cloned the skb
> in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
> work.
> 
> This patch is to move the event->chunk initialization before calling
> sctp_ulpevent_receive_data() so that it uses event->chunk when it's
> valid.
> 
> Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's 
> available")
> Signed-off-by: Xin Long 

Applied, thank you.

Re: [PATCH v2 00/10] userns: sysctl limits for namespaces

2016-08-08 Thread Eric W. Biederman


I won't have any more time for this until I return from vacation at the
end of the month but after a little bit of thought I think I have fixed
all of the bugs (except arguably the return value).

I have further tweaked these and made the limits per user, because it
occurred to me that if the limits are global it is possible for one
misbehaving user to DoS another, which is undesirable in principle.

I have put my updated code at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git 
for-testing

Eric

Re: [PATCH] net: make net namespace sysctls belong to container's owner

2016-08-08 Thread Eric W. Biederman

Dmitry Torokhov  writes:

> If net namespace is attached to a user namespace let's make container's
> root owner of sysctls affecting said network namespace instead of global
> root.
>
> This also allows us to clean up net_ctl_permissions() because we do not
> need to fudge permissions anymore for the container's owner since it now
> owns the objects in question.

Acked-by: "Eric W. Biederman" 

Overall this seems reasonable.  However I am not a fan of your error
handling.

> Signed-off-by: Dmitry Torokhov 
> ---
>
> This helps when running Android CTS in a container, but I think it makes
> sense regardless.

> +static void net_ctl_set_ownership(struct ctl_table_header *head,
> +   struct ctl_table *table,
> +   kuid_t *uid, kgid_t *gid)
> +{
> + struct net *net = container_of(head->set, struct net, sysctls);
> +
> + *uid = make_kuid(net->user_ns, 0);
> + if (!uid_valid(*uid))
> + *uid = GLOBAL_ROOT_UID;
> +
> + *gid = make_kgid(net->user_ns, 0);
> + if (!gid_valid(*gid))
> + *gid = GLOBAL_ROOT_GID;

This code should either be:
*uid = make_kuid(net->user_ns, 0);
*gid = make_kgid(net->user_ns, 0);

Or it should be:
tmp_uid = make_kuid(net->user_ns, 0);
if (uid_valid(tmp_uid))
*uid = tmp_uid;

tmp_gid = make_kgid(net->user_ns, 0);
if (gid_valid(tmp_gid))
*gid = tmp_gid;

It is just very fragile to assume to know what uid and gid
would be if this code fails.

As of v4.8-rc1 INVALID_UID and INVALID_GID can be set in inode->i_uid
and inode->i_gid without causing horrible vfs confusion (making the
first option viable), but I expect with the mention of Android you want
to backport this, so I will ask that you implement the error
handling that doesn't assume you know better than the generic code.

If you don't have a better value to set something to it really should be
left alone.
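
As a userspace illustration of the second option, here is a minimal sketch in C. The toy_* names and the one-range mapping are assumptions made for the example; they only imitate the shape of make_kuid() and uid_valid() and are not the kernel implementations.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t val; } toy_kuid_t;

#define TOY_INVALID_UID ((toy_kuid_t){ UINT32_MAX })

/* Hypothetical one-range mapping: namespace ids [0, count) map to
 * global ids [first, first + count). */
struct toy_user_ns { uint32_t first; uint32_t count; };

bool toy_uid_valid(toy_kuid_t uid)
{
    return uid.val != UINT32_MAX;
}

toy_kuid_t toy_make_kuid(const struct toy_user_ns *ns, uint32_t id)
{
    if (id >= ns->count)
        return TOY_INVALID_UID;
    return (toy_kuid_t){ ns->first + id };
}

/* Overwrite *uid only when the mapping succeeds; otherwise keep
 * whatever the caller passed in. */
void toy_set_owner(const struct toy_user_ns *ns, toy_kuid_t *uid)
{
    toy_kuid_t tmp = toy_make_kuid(ns, 0);

    if (toy_uid_valid(tmp))
        *uid = tmp;
}
```

The point of the temporary is that the callee never has to guess a fallback: if the namespace has no mapping for id 0, the caller's default survives untouched.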

Eric

Re: [Regression] Bonding no longer support tun-interfaces

2016-08-08 Thread Jörn Engel

Redirected by Davem.

Is there a mailing list or a maintainer for regressions?  There used to
be, but I've been out of the loop for a while.

On Mon, Aug 08, 2016 at 02:15:30PM -0700, Jörn Engel wrote:
> This has been reported (and ignored) before:
> http://lkml.iu.edu/hypermail/linux/kernel/1407.2/03790.html
> https://bugzilla.kernel.org/show_bug.cgi?id=89161
> 
> Regression was introduced by:
> 
> commit 00503b6f702e (refs/bisect/bad)
> Author: dingtianhong 
> Date:   Sat Jan 25 13:00:29 2014 +0800
> 
> bonding: fail_over_mac should only affect AB mode at enslave and removal 
> processing
> 
> According to bonding.txt, the fail_over_mac should only affect 
> active-backup mode,
> but I found that the fail_over_mac could be set to active or follow in all
> modes, this will cause new slave could not be set to bond's MAC address at
> enslave processing and restore its own MAC address at removal processing.
> 
> The correct way to fix the problem is that we should not add restrictions 
> when
> setting options, just need to modify the bond enslave and removal 
> processing
> to check the mode in addition to fail_over_mac when setting a slave's MAC 
> during
> enslavement. The change active slave processing already only calls the 
> fail_over_mac
> function when in active-backup mode.
> 
> Thanks for Jay's suggestion.
> 
> The patch also modify the pr_warning() to pr_warn().
> 
> Cc: Jay Vosburgh 
> Cc: Veaceslav Falico 
> Cc: Andy Gospodarek 
> Signed-off-by: Ding Tianhong 
> Signed-off-by: David S. Miller 
> 
> Since I never needed bonding or tun-interfaces before, I come late to
> the party.  Some 6k lines have changed in the bonding driver since the
> regression got in two years ago.  So a simple revert is unlikely to lead
> to happiness.
> 
> But I absolutely need that functionality and would rather run a 3.13
> kernel than live with the regression.  dingtianhong, any suggestions?
> 
> Jörn
> 
> --
> It is a cliché that most clichés are true, but then, like most clichés,
> that cliché is untrue.
> -- Stephen Fry

Jörn

--
I can say that I spend most of my time fixing bugs even if I have lots
of new features to implement in mind, but I give bugs more priority.
-- Andrea Arcangeli, 2000

Re: [PATCH net 2/2] net: vxlan: lwt: Fix vxlan local traffic.

2016-08-08 Thread David Miller

From: Pravin B Shelar 
Date: Fri,  5 Aug 2016 17:45:37 -0700

> vxlan driver has bypass for local vxlan traffic, but that
> depends on information about all VNIs on local system in
> vxlan driver. This is not available in case of LWT.
> Therefore following patch disable encap bypass for LWT
> vxlan traffic.
> 
> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
> Reported-by: Jakub Libosvar 
> Signed-off-by: Pravin B Shelar 

Applied.

Re: [PATCH net 1/2] net: vxlan: lwt: Use source ip address during route lookup.

2016-08-08 Thread David Miller

From: Pravin B Shelar 
Date: Fri,  5 Aug 2016 17:45:36 -0700

> LWT user can specify destination as well as source ip address
> for given tunnel endpoint. But vxlan is ignoring given source
> ip address. Following patch uses both ip address to route the
> tunnel packet. This consistent with other LWT implementations,
> like GENEVE and GRE.
> 
> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
> Signed-off-by: Pravin B Shelar 

Applied.

Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-08-08 Thread Guillaume Nault

On Mon, Aug 08, 2016 at 02:25:00PM +0300, Denys Fedoryshchenko wrote:
> On 2016-08-01 23:59, Guillaume Nault wrote:
> > Do you still have the vmlinux file with debug symbols that generated
> > this panic?
> Sorry for the delay, I didn't have the same image on all servers. I have probably found
> the cause of the panic, but I am still testing on several servers.
> If I remove the SFQ qdisc from the ppp shapers, the servers no longer reboot.
> 
Thanks for the feedback. I wonder which interactions between SFQ and
PPP can lead to this problem. I'll take a look.

> But I still need around 2 days to make sure that's the reason.
> 
Okay, just let me know if you can confirm that removing SFQ really
solves the problem.

Re: [5.3] ucc_geth: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-08 Thread Arnd Bergmann

On Monday, August 8, 2016 3:49:22 PM CEST David Laight wrote:
> From: Arnd Bergmann
> > Sent: 08 August 2016 16:13
> > 
> > On Monday, August 8, 2016 2:49:11 PM CEST David Laight wrote:
> > >
> > > > If qe_muram_alloc returns an error, IS_ERR_VALUE will always
> > > > return 0 on 64bit, so ucc_fast_free is not called on failure and the code
> > > > inside the 'if' is dead code. qe_muram_addr will then also return a wrong
> > > > virtual address, which can cause an error.
> > > >
> > > >  kfree((void *)ugeth->tx_bd_ring_offset[i]);
> > >
> > > Erm, kfree() isn't the right function for things allocated by 
> > > qe_muram_alloc().
> > >
> > > I still think you need to stop this code using IS_ERR_VALUE() at all.
> > 
> > Those are two separate issues:
> > 
> > a) The ucc_geth driver mixing kmalloc() memory with muram, and assigning
> >the result to "u32" and "void __iomem *" variables, both of which
> >are wrong at least half of the time.
> > 
> > b) calling conventions of qe_muram_alloc() being defined in a way that
> >requires the use of IS_ERR_VALUE(), because '0' is a valid address
> >here.
> 
> Yep, it is all a big bag of worms...
> '0' being valid is going to make tidying up after failure 'problematic'.
> 
> > The first one can be solved by updating the network driver, ideally
> > by getting rid of the casts and using proper types and accessors,
> > while the second would require updating all users of that interface.
> 
> It might be worth (at least as a compilation option) embedding the
> 'muram offset' in a structure (passed and returned by value).
> 
> The compiler can then check that the driver code is never looking
> directly at the value.
>
> For 'b' zero can be made invalid by changing the places where the
> offset is added/subtracted.
> It could even be used to offset the saved physical and virtual
> addresses of the area - so not needing any extra code when the values
> are converted to physical/virtual addresses.

Agreed.

For this driver, we don't actually seem to use the value returned from
the allocation function, only the virtual __iomem address we get after
calling qe_muram_addr(), so it would be a big improvement to just
store the virtual address as a pointer, and wrap the calls
to qe_muram_alloc/qe_muram_addr/qe_muram_free with an appropriate
helper that doesn't even show the offset.

However, I'd also separate the normal kmalloc pointer from the
muram_alloc() pointer because only the latter is __iomem, and
we shouldn't really call MMIO accessor functions on RAM in
portable code.
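
The underlying 64-bit problem can be shown with a small userspace model in C. This is a sketch under assumptions: toy_is_err_value64() only imitates an IS_ERR_VALUE-style range check using explicit 64-bit types, and struct toy_muram with its valid flag is one hypothetical way to hide the raw offset, not an existing interface.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095
#define TOY_ENOMEM    12

/* Model of an IS_ERR_VALUE-style check on a 64-bit machine: a value is an
 * error if it lies in the top TOY_MAX_ERRNO values of the 64-bit range. */
bool toy_is_err_value64(uint64_t x)
{
    return x >= (uint64_t)-(int64_t)TOY_MAX_ERRNO;
}

/* Store a negative errno in a 32-bit field, then widen it again. */
uint64_t toy_widen_from_u32(int64_t ret)
{
    uint32_t stored = (uint32_t)ret;   /* -12 becomes 0xfffffff4          */

    return (uint64_t)stored;           /* zero-extended: no longer "high" */
}

/* Wrapper idea: keep validity separate from the offset, so that 0 can
 * remain a valid offset and callers never inspect the raw value. */
struct toy_muram { uint32_t offset; bool valid; };

struct toy_muram toy_muram_from_ret(int64_t ret)
{
    struct toy_muram m = { 0, false };

    if (ret >= 0) {
        m.offset = (uint32_t)ret;
        m.valid = true;
    }
    return m;
}
```

Once the error code has passed through a 32-bit variable, the range check can no longer see it, so the error branch never runs. Carrying an explicit validity flag (or a pointer, as suggested above) avoids depending on the width of the stored value.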

Arnd

[Patch net 5/5] net_sched: convert tcf_exts from list to flex_array

2016-08-08 Thread Cong Wang

As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to a flex array.

The ugly part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the flex array of pointers to a list.
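
A toy userspace model in C of why an embedded list link does not work for shared actions, and of the array-plus-helper approach described above. The toy_* names, the fixed-size array and the singly linked list are simplifications chosen for the sketch; they are not the structures in this patch.

```c
#include <stddef.h>

#define TOY_MAX_ACTIONS 4

/* A shared, refcounted action with a single embedded link. */
struct toy_action {
    int refcnt;
    struct toy_action *next;   /* can be on only one list at a time */
};

/* A filter's extension block holds pointers, not list links. */
struct toy_exts {
    int nr_actions;
    struct toy_action *actions[TOY_MAX_ACTIONS];
};

/* Attach an action to a filter; returns 0 on success, -1 if full. */
int toy_exts_add(struct toy_exts *exts, struct toy_action *a)
{
    if (exts->nr_actions >= TOY_MAX_ACTIONS)
        return -1;
    a->refcnt++;
    exts->actions[exts->nr_actions++] = a;
    return 0;
}

/* Build a temporary list, in array order, for callers that still expect
 * a list. Returns the head, or NULL if the filter has no actions. */
struct toy_action *toy_exts_to_list(const struct toy_exts *exts)
{
    struct toy_action *head = NULL;
    int i;

    for (i = exts->nr_actions - 1; i >= 0; i--) {
        exts->actions[i]->next = head;
        head = exts->actions[i];
    }
    return head;
}
```

Two filters can point at the same action through their arrays without conflict. The list built by the helper is only valid until the next conversion that involves the same action, because the single link is overwritten each time.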

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 --
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c  |  4 +-
 include/net/pkt_cls.h   | 44 +--
 net/sched/cls_api.c | 57 ++---
 5 files changed, 87 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 5418c69a..6ac1254 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8390,12 +8390,14 @@ static int parse_tc_actions(struct ixgbe_adapter 
*adapter,
struct tcf_exts *exts, u64 *action, u8 *queue)
 {
const struct tc_action *a;
+   LIST_HEAD(actions);
int err;
 
if (tc_no_actions(exts))
return -EINVAL;
 
-   tc_for_each_action(a, exts) {
+   tcf_exts_to_list(exts, &actions);
+   list_for_each_entry(a, &actions, list) {
 
/* Drop action */
if (is_tcf_gact_shot(a)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 0f19b01..dc8b1cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -318,6 +318,7 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, 
struct tcf_exts *exts,
u32 *action, u32 *flow_tag)
 {
const struct tc_action *a;
+   LIST_HEAD(actions);
 
if (tc_no_actions(exts))
return -EINVAL;
@@ -325,7 +326,8 @@ static int parse_tc_nic_actions(struct mlx5e_priv *priv, 
struct tcf_exts *exts,
*flow_tag = MLX5_FS_DEFAULT_FLOW_TAG;
*action = 0;
 
-   tc_for_each_action(a, exts) {
+   tcf_exts_to_list(exts, &actions);
+   list_for_each_entry(a, &actions, list) {
/* Only support a single action per rule */
if (*action)
return -EINVAL;
@@ -362,13 +364,15 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, 
struct tcf_exts *exts,
u32 *action, u32 *dest_vport)
 {
const struct tc_action *a;
+   LIST_HEAD(actions);
 
if (tc_no_actions(exts))
return -EINVAL;
 
*action = 0;
 
-   tc_for_each_action(a, exts) {
+   tcf_exts_to_list(exts, &actions);
+   list_for_each_entry(a, &actions, list) {
/* Only support a single action per rule */
if (*action)
return -EINVAL;
@@ -503,6 +507,7 @@ int mlx5e_stats_flower(struct mlx5e_priv *priv,
struct mlx5e_tc_flow *flow;
struct tc_action *a;
struct mlx5_fc *counter;
+   LIST_HEAD(actions);
u64 bytes;
u64 packets;
u64 lastuse;
@@ -518,7 +523,8 @@ int mlx5e_stats_flower(struct mlx5e_priv *priv,
 
mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
 
-   tc_for_each_action(a, f->exts)
+   tcf_exts_to_list(f->exts, &actions);
+   list_for_each_entry(a, &actions, list)
tcf_action_stats_update(a, bytes, packets, lastuse);
 
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index c3e6150..df85acd 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1149,6 +1149,7 @@ static int mlxsw_sp_port_add_cls_matchall(struct 
mlxsw_sp_port *mlxsw_sp_port,
  bool ingress)
 {
const struct tc_action *a;
+   LIST_HEAD(actions);
int err;
 
if (!tc_single_action(cls->exts)) {
@@ -1156,7 +1157,8 @@ static int mlxsw_sp_port_add_cls_matchall(struct 
mlxsw_sp_port *mlxsw_sp_port,
return -ENOTSUPP;
}
 
-   tc_for_each_action(a, cls->exts) {
+   tcf_exts_to_list(cls->exts, &actions);
+   list_for_each_entry(a, &actions, list) {
if (!is_tcf_mirred_mirror(a) || protocol != htons(ETH_P_ALL))
return -ENOTSUPP;
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f15aa1e..2b6f0ec 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -1,6 +1,7 @@
 #ifndef __NET_PKT_CLS_H

[Patch net 3/5] net_sched: fix a typo in tc_for_each_action()

2016-08-08 Thread Cong Wang

It is harmless because all users pass 'a' to this macro.
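
A standalone C illustration of this kind of macro typo, using invented TOY_* macros over a plain integer loop rather than the real list iterator:

```c
/* Buggy form: the body names "a" directly, so the _a parameter is ignored
 * and the macro only compiles when the caller's variable is called "a". */
#define TOY_FOR_EACH_BUGGY(_a, _n) \
    for (a = 0; a < (_n); a++)

/* Fixed form: the body uses the macro parameter. */
#define TOY_FOR_EACH(_a, _n) \
    for ((_a) = 0; (_a) < (_n); (_a)++)

/* Sum of 0..n-1 using the fixed macro with an arbitrary variable name. */
int toy_sum_below(int n)
{
    int i, sum = 0;

    TOY_FOR_EACH(i, n)
        sum += i;
    return sum;
}

/* Same sum with the buggy macro; works only because the variable is "a". */
int toy_sum_below_buggy(int n)
{
    int a, sum = 0;

    TOY_FOR_EACH_BUGGY(a, n)
        sum += a;
    return sum;
}
```

Both functions return the same result, which mirrors why the typo went unnoticed: a caller with a differently named iterator would fail to compile or would silently use an unrelated variable called "a".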

Fixes: 00175aec941e ("net/sched: Macro instead of CONFIG_NET_CLS_ACT ifdef")
Cc: Amir Vadai 
Signed-off-by: Cong Wang 
---
 include/net/act_api.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 41e6a24..f53ee9d 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -193,7 +193,7 @@ int tcf_action_copy_stats(struct sk_buff *, struct 
tc_action *, int);
(list_empty(&(_exts)->actions))
 
 #define tc_for_each_action(_a, _exts) \
-   list_for_each_entry(a, &(_exts)->actions, list)
+   list_for_each_entry(_a, &(_exts)->actions, list)
 
 #define tc_single_action(_exts) \
(list_is_singular(&(_exts)->actions))
-- 
2.1.0

[Patch net 4/5] net_sched: move tc offload macros to pkt_cls.h

2016-08-08 Thread Cong Wang

struct tcf_exts belongs to filters, should not be visible
to plain tc actions.

Cc: Ido Schimmel 
Signed-off-by: Cong Wang 
---
 include/net/act_api.h | 16 
 include/net/pkt_cls.h | 20 
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index f53ee9d..3fc88ad 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -189,15 +189,6 @@ int tcf_action_dump_old(struct sk_buff *skb, struct 
tc_action *a, int, int);
 int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int, int);
 int tcf_action_copy_stats(struct sk_buff *, struct tc_action *, int);
 
-#define tc_no_actions(_exts) \
-   (list_empty(&(_exts)->actions))
-
-#define tc_for_each_action(_a, _exts) \
-   list_for_each_entry(_a, &(_exts)->actions, list)
-
-#define tc_single_action(_exts) \
-   (list_is_singular(&(_exts)->actions))
-
 static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
   u64 packets, u64 lastuse)
 {
@@ -207,12 +198,5 @@ static inline void tcf_action_stats_update(struct 
tc_action *a, u64 bytes,
a->ops->stats_update(a, bytes, packets, lastuse);
 }
 
-#else /* CONFIG_NET_CLS_ACT */
-
-#define tc_no_actions(_exts) true
-#define tc_for_each_action(_a, _exts) while ((void)(_a), 0)
-#define tc_single_action(_exts) false
-#define tcf_action_stats_update(a, bytes, packets, lastuse)
-
 #endif /* CONFIG_NET_CLS_ACT */
 #endif
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 6f8d653..f15aa1e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -130,6 +130,26 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
return 0;
 }
 
+#ifdef CONFIG_NET_CLS_ACT
+
+#define tc_no_actions(_exts) \
+   (list_empty(&(_exts)->actions))
+
+#define tc_for_each_action(_a, _exts) \
+   list_for_each_entry(_a, &(_exts)->actions, list)
+
+#define tc_single_action(_exts) \
+   (list_is_singular(&(_exts)->actions))
+
+#else /* CONFIG_NET_CLS_ACT */
+
+#define tc_no_actions(_exts) true
+#define tc_for_each_action(_a, _exts) while ((void)(_a), 0)
+#define tc_single_action(_exts) false
+#define tcf_action_stats_update(a, bytes, packets, lastuse)
+
+#endif /* CONFIG_NET_CLS_ACT */
+
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
  struct nlattr **tb, struct nlattr *rate_tlv,
  struct tcf_exts *exts, bool ovr);
-- 
2.1.0

[Patch net 2/5] net_sched: remove an unnecessary list_del()

2016-08-08 Thread Cong Wang

This list_del() for tc action is actually not needed,
because we only use this list to chain bulk operations,
therefore it should not be carried over to later operations.

Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/act_api.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index cce6986..b4c7be3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -64,7 +64,6 @@ int __tcf_hash_release(struct tc_action *p, bool bind, bool strict)
if (p->tcfa_bindcnt <= 0 && p->tcfa_refcnt <= 0) {
if (p->ops->cleanup)
p->ops->cleanup(p, bind);
-   list_del(&p->list);
tcf_hash_destroy(p->hinfo, p);
ret = ACT_P_DELETED;
}
-- 
2.1.0

[Patch net 0/5] net_sched: tc action fixes and updates

2016-08-08 Thread Cong Wang

This patchset fixes several regressions caused by the previous
code refactor. Thanks to Jamal for catching them!

Note, patch 3/5 and 4/5 are not strictly necessary, I just
want to carry them together.

Cong Wang (5):
  net_sched: remove the leftover cleanup_a()
  net_sched: remove an unnecessary list_del()
  net_sched: fix a typo in tc_for_each_action()
  net_sched: move tc offload macros to pkt_cls.h
  net_sched: convert tcf_exts from list to flex_array

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |  4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 --
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c  |  4 +-
 include/net/act_api.h   | 16 ---
 include/net/pkt_cls.h   | 46 +---
 net/sched/act_api.c | 23 ++
 net/sched/cls_api.c | 57 ++---
 7 files changed, 101 insertions(+), 61 deletions(-)

-- 
2.1.0

[Patch net 1/5] net_sched: remove the leftover cleanup_a()

2016-08-08 Thread Cong Wang

After refactoring tc_action into tcf_common, we no
longer need to clean up the temporary "actions" in the list;
they are permanently stored in the hashtable.

Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/act_api.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index e4a5f26..cce6986 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -754,16 +754,6 @@ static struct tc_action *tcf_action_get_1(struct net *net, struct nlattr *nla,
return ERR_PTR(err);
 }
 
-static void cleanup_a(struct list_head *actions)
-{
-   struct tc_action *a, *tmp;
-
-   list_for_each_entry_safe(a, tmp, actions, list) {
-   list_del(&a->list);
-   kfree(a);
-   }
-}
-
 static int tca_action_flush(struct net *net, struct nlattr *nla,
struct nlmsghdr *n, u32 portid)
 {
@@ -905,7 +895,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
return ret;
}
 err:
-   cleanup_a(&actions);
+   tcf_action_destroy(&actions, 0);
return ret;
 }
 
@@ -942,15 +932,9 @@ tcf_action_add(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
 
ret = tcf_action_init(net, nla, NULL, NULL, ovr, 0, &actions);
if (ret)
-   goto done;
+   return ret;
 
-   /* dump then free all the actions after update; inserted policy
-* stays intact
-*/
-   ret = tcf_add_notify(net, n, &actions, portid);
-   cleanup_a(&actions);
-done:
-   return ret;
+   return tcf_add_notify(net, n, &actions, portid);
 }
 
 static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n)
-- 
2.1.0

Re: [PATCH 1/3] net: stmmac: dwmac-rk: add rk3366 & rk3399-specific data

2016-08-08 Thread Heiko Stübner

Hi Roger,

On Wednesday, 6 July 2016, 18:51:29, Roger Chen wrote:
> Add constants and callback functions for the dwmac on rk3368 socs.
> As can be seen, the base structure is the same, only registers and
> the bits in them moved slightly.
> 
> Signed-off-by: Roger Chen 
> ---
>  .../devicetree/bindings/net/rockchip-dwmac.txt |   3 +-
>  drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 226 +
>  2 files changed, 228 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> index 93eac7c..8c066e6 100644
> --- a/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> +++ b/Documentation/devicetree/bindings/net/rockchip-dwmac.txt
> @@ -3,7 +3,8 @@ Rockchip SoC RK3288 10/100/1000 Ethernet driver(GMAC)
>  The device node has following properties.
> 
>  Required properties:
> - - compatible: Can be one of "rockchip,rk3288-gmac", "rockchip,rk3368-gmac"
> + - compatible: Can be one of "rockchip,rk3288-gmac", "rockchip,rk3366-gmac",
> +                             "rockchip,rk3368-gmac", "rockchip,rk3399-gmac"
>   - reg: addresses and length of the register sets for the device.
>   - interrupts: Should contain the GMAC interrupts.
>   - interrupt-names: Should contain the interrupt names "macirq".

This doesn't apply against 4.8-rc1 anymore; can you please rebase and resend
the series?


Thanks
Heiko

[PATCH net] bnx2x: don't reset chip on cleanup if PCI function is offline

2016-08-08 Thread Guilherme G. Piccoli

When a PCI error is detected, on some architectures (like PowerPC) a slot
reset is performed - the driver's error handlers are in charge of disabling
the device before the reset and re-enabling it after a successful slot reset.

There are two cases, though, in which another code path is taken: if the
slot reset is not successful or if too many errors have already happened on
the specific adapter (meaning that the device is possibly experiencing a HW
failure that a slot reset is not able to solve), the core PCI error mechanism
(called EEH on PowerPC) will remove the adapter from the system, since it
will consider this a permanent failure of the device. In this case, a path
is taken that leads to bnx2x_chip_cleanup() calling bnx2x_reset_hw(), which
then tries to perform a HW reset on the chip. This reset won't succeed since
the HW is in a fault state, which can be seen from multiple messages in the
kernel log like the ones below:

bnx2x: [bnx2x_issue_dmae_with_comp:552(eth1)]DMAE timeout!
bnx2x: [bnx2x_write_dmae:600(eth1)]DMAE returned failure -1

After some time, the PCI error mechanism gives up waiting for the driver's
correct removal procedure and forcibly removes the adapter from the system.
We can see a soft lockup while the core PCI error mechanism is waiting for
the driver to complete the proper removal process.

This patch adds a check to avoid a chip reset whenever the function
is in PCI error state - since this case is only reached when a
device is being removed because of a permanent failure, the HW chip reset is
neither expected to work nor necessary.

Also, we avoid the MCP information dump in case of non-recoverable PCI
error (when adapter is about to be removed), since it will certainly fail.

Reported-by: Harsha Thyagaraja 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 97e8925..e6329dc 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -772,6 +772,11 @@ void bnx2x_fw_dump_lvl(struct bnx2x *bp, const char *lvl)
(bp->common.bc_ver & 0xff00) >> 8,
(bp->common.bc_ver & 0xff));
 
+   if (unlikely(pci_channel_offline(bp->pdev))) {
+   BNX2X_ERR("Cannot dump MCP info while in PCI error\n");
+   return;
+   }
+
val = REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER);
if (val == REG_RD(bp, MCP_REG_MCPR_CPU_PROGRAM_COUNTER))
BNX2X_ERR("%s" "MCP PC at 0x%x\n", lvl, val);
@@ -9415,10 +9420,16 @@ unload_error:
/* Release IRQs */
bnx2x_free_irq(bp);
 
-   /* Reset the chip */
-   rc = bnx2x_reset_hw(bp, reset_code);
-   if (rc)
-   BNX2X_ERR("HW_RESET failed\n");
+   /* Reset the chip, unless PCI function is offline. If we reach this
+* point following a PCI error handling, it means device is really
+* in a bad state and we're about to remove it, so reset the chip
+* is not a good idea.
+*/
+   if (!pci_channel_offline(bp->pdev)) {
+   rc = bnx2x_reset_hw(bp, reset_code);
+   if (rc)
+   BNX2X_ERR("HW_RESET failed\n");
+   }
 
/* Report UNLOAD_DONE to MCP */
bnx2x_send_unload_done(bp, keep_link);
-- 
2.1.0
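
The control flow the patch introduces can be illustrated outside the driver.
This is only a sketch under stated assumptions: struct demo_adapter and the
demo_* functions are hypothetical stand-ins, with the channel_offline flag
playing the role of what pci_channel_offline() reports for the real device.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the adapter state; not the bnx2x structures. */
struct demo_adapter {
	bool channel_offline;	/* what pci_channel_offline() would report */
	int reset_attempts;	/* counts calls to the simulated HW reset */
};

/* Simulated reset: fails (-1) when the device cannot be reached. */
static int demo_reset_hw(struct demo_adapter *ad)
{
	ad->reset_attempts++;
	return ad->channel_offline ? -1 : 0;
}

/*
 * Cleanup path: skip the reset entirely if the function is offline.
 * Returns 0 when the reset succeeded or was skipped, negative on failure.
 */
static int demo_chip_cleanup(struct demo_adapter *ad)
{
	int rc = 0;

	if (!ad->channel_offline)
		rc = demo_reset_hw(ad);

	/* The remaining teardown steps would run unconditionally here. */
	return rc;
}
```

With the guard in place an offline adapter never reaches the reset, so the
repeated timeouts described in the changelog are not triggered from this path.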

Re: [PATCH net v2 0/3] Few BPF helper related checksum fixes

2016-08-08 Thread David Miller

From: Daniel Borkmann 
Date: Fri,  5 Aug 2016 00:11:10 +0200

> The set contains three fixes with regards to CHECKSUM_COMPLETE
> and BPF helper functions. For details please see individual
> patches.
> 
> Thanks!
> 
> v1 -> v2:
>   - Fixed make htmldocs issue reported by kbuild bot.
>   - Rest as is.

Series applied, thanks!

Re: [PATCH net-next] cdc_ether: Improve ZTE MF823/831/910 handling

2016-08-08 Thread Bjørn Mork

Oliver Neukum  writes:

> But why fix similar issues at two different places? And what about
> PCI or other cards that show the same problem?

I guess some sort of common helper would be nice to avoid open coding
this fix everywhere.  But you would still have to modify every driver
where it is applicable, as there is no existing common API.

Note that this doesn't include *every* ethernet driver, although there
certainly are some examples.  There are also a number of serious
vendors, providing vendor supported drivers for cards with no known
issues of this kind.

Where exactly would you like to see this implemented if it isn't going
into those specific usbnet drivers?

Bjørn

Re: [net-next 0/2] BPF, kprobes: Add current_in_cgroup helper

2016-08-08 Thread Sargun Dhillon

On Mon, Aug 08, 2016 at 11:27:32AM +0200, Daniel Borkmann wrote:
> On 08/08/2016 05:52 AM, Alexei Starovoitov wrote:
> >On Sun, Aug 07, 2016 at 08:08:19PM -0700, Sargun Dhillon wrote:
> >>Thanks for your feedback Alexei,
> >>I really appreciate it.
> >>
> >>On Sun, Aug 07, 2016 at 05:52:36PM -0700, Alexei Starovoitov wrote:
> >>>On Sat, Aug 06, 2016 at 09:56:06PM -0700, Sargun Dhillon wrote:
> >>>>On Sat, Aug 06, 2016 at 09:32:05PM -0700, Alexei Starovoitov wrote:
> >>>>>On Sat, Aug 06, 2016 at 09:06:53PM -0700, Sargun Dhillon wrote:
> >>>>>>This patchset includes a helper and an example to determine whether the
> >>>>>>kprobe is currently executing in the context of a specific cgroup based
> >>>>>>on a cgroup bpf map / array.
> >>>>>
> >>>>>description is too short to understand how this new helper is going to
> >>>>>be used.
> >>>>>depending on kprobe current is not always valid.
> >>>>Anything not in in_interrupt() should have a current, right?
> >>>>
> >>>>>what are you trying to achieve?
> >>>>This is primarily to help troubleshoot containers (Docker, and now
> >>>>systemd). A lot of the time we want to determine what's going on in a
> >>>>given container (opening files, connecting to systems, etc...). There's
> >>>>not really a great way to restrict to containers except by manually
> >>>>walking datastructures to check for the right cgroup. This seems like a
> >>>>better alternative.
> >>>
> >>>so it's about restricting or determining?
> >>>In other words if it's analytics/tracing that's one thing, but
> >>>enforcement/restriction is quite different.
> >>>For analytics one can walk task_css_set(current)->dfl_cgrp and remember
> >>>that pointer in a map or something for stats collections and similar.
> >>>If it's restricting apps in containers then kprobe approach
> >>>is not usable. I don't think you'd want to build an enforcement system
> >>>on an unstable api that can vary kernel-to-kernel.
> >>>
> >>The first real-world use case is to implement something like Sysdig. Often
> >>the team running the containers doesn't always know what's inside of
> >>them, so they want to be able to view network, I/O, and other activity by
> >>container. Right now, the lowest common denominator between all of the
> >>containerization techniques is cgroups. We've seen examples where an
> >>admin is unsure of the workload, and would love to use opensnoop, but
> >>there are too many workloads on the machine.
> >
> >Indeed it would be a useful feature to teach opensnoop to filter by a cgroup
> >and all descendants of it. If you can prepare a patch for it that would be
> >a strong use case for this bpf_current_in_cgroup helper and solid
> >justification to accept it in the kernel.
> >Something like cgroupv2 string path as an argument ?
> 
> How does this integrate with cgroup namespaces? Your current helper would only
> look at the cgroup in your current namespace, no? Or would the program
> populating the map temporarily switch into other namespaces?
> 
The BPF program is namespace oblivious. If you had multiple cgroup namespaces,
you'd have to open an fd for the other namespace's cgroup to populate the map.
I see this as more of a userspace problem.

> What about cases where cgroup could be shared among other (net, ..)
> namespaces, BPF program would still not be namespace aware to sort these
> things out?
> 
I'm not sure what you're getting at. It sounds like being "namespace aware" 
either means that during probe installation you restrict the probe to a given 
namespace, or you have another helper that allows you to check the namespace 
you're in. Would a second helper, and arraymap type address this? If so, I'd 
rather that be separate work.
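
For the "filter by a cgroup and all descendants of it" idea discussed above,
the membership test amounts to walking parent links. The following is a
minimal userspace sketch of that check only; struct demo_cgroup and
demo_is_descendant() are hypothetical names, not the kernel's cgroup
structures or the proposed helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cgroup node: only the parent link matters for this check. */
struct demo_cgroup {
	const char *name;
	struct demo_cgroup *parent;	/* NULL for the root */
};

/*
 * Return true if cg is the given ancestor or sits anywhere below it.
 * Walks towards the root, so the cost is bounded by the nesting depth.
 */
static bool demo_is_descendant(const struct demo_cgroup *cg,
			       const struct demo_cgroup *ancestor)
{
	for (; cg != NULL; cg = cg->parent)
		if (cg == ancestor)
			return true;

	return false;
}
```

A task in a nested sidecar cgroup would match its enclosing container's
cgroup this way, which a single pointer comparison would miss.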

> You'll also have the issue, for example, that bpf_perf_event_read() counters
> are global, combining them with cgroups helper in a program would lead to
> false expectations (in the sense that they might also be assumed for that
> cgroup), or do you have a way to tackle that as well (at least SW events,
> since HW should not be possible)?
> 
> Btw, there's slightly related work from IBM folks (but to run it from within a
> container; there was a v2 recently I recall):
> 
>   https://lkml.org/lkml/2016/6/14/547
> 
I'm not sure how to avoid the aforementioned problem, but I'm not really sure
it's a problem. Perhaps perf namespaces are the right way to go, but do you
have a suggestion for the opensnoop-style problem?

> >>Unfortunately, I don't think that it's possible just to check
> >>task_css_set(current)->dfl_cgrp in a bpf program. The container, especially
> >>containers with sidecars (what Kubernetes calls Pods, I believe?) tend to
> >>have multiple nested cgroups inside of them. If you had a way to convert
> >>cgroup array entries to pointers, I imagine you could write an unrolled
> >>loop to check for ownership within a limited range.
> >>
> >>I'm still looking for

Re: [RFC PATCH 2/3] net: Replace for_each_possible_cpu with for_each_online_cpu

2016-08-08 Thread David Miller

From: Jia He 
Date: Mon,  8 Aug 2016 18:22:21 +0800

> On a PowerPC server with a large number of cpus, the loop count with smt=1
> could be reduced to 1/8 of that with smt=8.
> Thus cache misses can be reduced.

You can't do this, if cpus go down we still want to report the statistics
they collected while they were up.

So we must use the possible cpu list here.
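
The difference is easy to see with a toy model. This is a minimal sketch,
assuming a fixed array with one counter slot per possible CPU and a separate
online flag; the demo_* names are made up and nothing here is the kernel's
per-cpu API.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS 8

/* Hypothetical per-CPU statistics: one counter slot per possible CPU. */
struct demo_stats {
	unsigned long count[DEMO_NR_CPUS];
	bool online[DEMO_NR_CPUS];
};

/* Sum over every possible CPU: keeps what offline CPUs counted earlier. */
static unsigned long demo_sum_possible(const struct demo_stats *s)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += s->count[cpu];

	return sum;
}

/* Sum over online CPUs only: silently drops the offline CPUs' share. */
static unsigned long demo_sum_online(const struct demo_stats *s)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (s->online[cpu])
			sum += s->count[cpu];

	return sum;
}
```

If one CPU accumulated counts and then went down, only the first loop still
reports them, which is the behaviour requested in the review above.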

Re: problem with MPLS and TSO/GSO

2016-08-08 Thread David Ahern

On 7/25/16 10:39 AM, Lennert Buytenhek wrote:
> Hi!
> 
> I am seeing pretty horrible TCP transmit performance (anywhere between
> 1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
> route that involves MPLS labeling, and this seems to be due to an
> interaction between MPLS and TSO/GSO that causes all segmentable TCP
> frames that are MPLS-labeled to be dropped on egress.
> 
> I initially ran into this issue with the ixgbe driver, but it is easily
> reproduced with veth interfaces, and the script attached below this
> email reproduces the issue.  The script configures three network
> namespaces: one that transmits TCP data (netperf) with MPLS labels,
> one that takes the MPLS traffic and pops the labels and forwards the
> traffic on, and one that receives the traffic (netserver).  When not
> using MPLS labeling, I get ~3 Mb/s single-stream TCP performance
> in this setup on my test box, and with MPLS labeling, I get ~2 Mb/s.
> 
> Some investigating shows that egress TCP frames that need to be
> segmented are being dropped in validate_xmit_skb(), which calls
> skb_gso_segment() which calls skb_mac_gso_segment() which returns
> -EPROTONOSUPPORT because we apparently didn't have the right kernel
> module (mpls_gso) loaded.
> 
> (It's somewhat poor design, IMHO, to degrade network performance by
> 15000x if someone didn't load a kernel module they didn't know they
> should have loaded, and in a way that doesn't log any warnings or
> errors and can only be diagnosed by adding printk calls to net/core/
> and recompiling your kernel.)
> 
> (Also, I'm not sure why mpls_gso is needed when ixgbe seems to be
> able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe
> doesn't advertise the necessary features in ->mpls_features?  But
> adding those bits doesn't seem to change much.)
> 
> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
> starts returning -EINVAL instead, which is due to the
> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
> And looking at skb_network_protocol(), I don't see how this is
> supposed to work -- skb->protocol is 0 at this point, and there is no
> way to figure out that what we are encapsulating is IP traffic, because
> unlike what is the case with VLAN tags, MPLS labels aren't followed by
> an inner ethertype that says what kind of traffic is in here, you have
> to have explicit knowledge of the payload type for MPLS.
> 
> Any ideas?

Something is up with the skb manipulations or settings by mpls. With the inner
protocol set in mpls_output:

    skb_set_inner_protocol(skb, skb->protocol);

I get EINVAL failures from inet_gso_segment because the iphdr is not proper
(ihl is 0 and version is 0).


Thanks for the script to repro with namespaces; much simpler to debug.
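
The point made in the original report, that an MPLS label stack carries no
inner ethertype, can be seen from the 32-bit label stack entry layout (20-bit
label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). Below is
a minimal userspace sketch of decoding it; the demo_* names are made up, and
the first-nibble check at the end is only a heuristic some implementations
use, not a protocol field.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded MPLS label stack entry. */
struct demo_mpls_entry {
	uint32_t label;	/* 20 bits */
	uint8_t tc;	/* 3 bits, traffic class */
	uint8_t bos;	/* 1 bit, bottom of stack */
	uint8_t ttl;	/* 8 bits */
};

/* Decode the 4-byte, network-order entry starting at p. */
static struct demo_mpls_entry demo_mpls_decode(const uint8_t *p)
{
	uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	struct demo_mpls_entry e;

	e.label = w >> 12;
	e.tc = (w >> 9) & 0x7;
	e.bos = (w >> 8) & 0x1;
	e.ttl = w & 0xff;
	return e;
}

/* Offset of the payload behind the label stack, or -1 if no entry has S=1. */
static int demo_mpls_payload_offset(const uint8_t *buf, size_t len)
{
	size_t off;

	for (off = 0; off + 4 <= len; off += 4)
		if (demo_mpls_decode(buf + off).bos)
			return (int)(off + 4);

	return -1;
}

/* Heuristic guess from the payload's first nibble: 4, 6, or 0 if unknown. */
static int demo_mpls_guess_ip_version(const uint8_t *buf, size_t len)
{
	int off = demo_mpls_payload_offset(buf, len);
	int version;

	if (off < 0 || (size_t)off >= len)
		return 0;

	version = buf[off] >> 4;
	return (version == 4 || version == 6) ? version : 0;
}
```

Nothing in the entry names the payload protocol, so the stack that pushed the
labels has to record it separately for segmentation to find the inner header.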

Re: [Patch net-next] net_sched: remove an unnecessary list_del()

2016-08-08 Thread Cong Wang

On Tue, Aug 2, 2016 at 10:30 AM, Cong Wang  wrote:
> This list_del() for tc action is not needed actually,
> because we only use this list to chain bulk operations,
> therefore should not be carried for latter operations.
>
> Cc: Jamal Hadi Salim 
> Signed-off-by: Cong Wang 

David, please drop this patch from your backlog; I will
resend it with a Fixes tag.

Re: [PATCH net 3/3] mlxsw: spectrum: Add missing DCB rollback in error path

2016-08-08 Thread Jiri Pirko

Thu, Aug 04, 2016 at 04:36:22PM CEST, ido...@mellanox.com wrote:
>We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but
>I missed its rollback in the error path of port creation, so add it.
>
>Fixes: f00817df2b42 ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)")
>Signed-off-by: Ido Schimmel 
Reviewed-by: Jiri Pirko

Re: [PATCH net 2/3] mlxsw: spectrum: Do not override PAUSE settings

2016-08-08 Thread Jiri Pirko

Thu, Aug 04, 2016 at 04:36:21PM CEST, ido...@mellanox.com wrote:
>The PFCC register is used to configure both PAUSE and PFC frames.
>Therefore, when PFC frames are disabled we must make sure we don't
>mistakenly also disable PAUSE frames (which might be enabled).
>
>Fix this by packing the PFCC register with the current PAUSE settings.
>
>Note that this register is also accessed via ethtool ops, but there we
>are guaranteed to have PFC disabled.
>
>Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
>Signed-off-by: Ido Schimmel 
Reviewed-by: Jiri Pirko
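
The fix described above, packing the shared register with the current PAUSE
state rather than zeroes, can be sketched as follows. The bit layout is
entirely hypothetical and chosen for illustration; it is not the real PFCC
register format, and the demo_* names do not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only:
 * bit 0 = PAUSE rx enable, bit 1 = PAUSE tx enable,
 * bits 8..15 = per-priority PFC enable mask.
 */
#define DEMO_PAUSE_RX	(1u << 0)
#define DEMO_PAUSE_TX	(1u << 1)
#define DEMO_PFC_SHIFT	8
#define DEMO_PFC_MASK	(0xffu << DEMO_PFC_SHIFT)

/* Hypothetical cached per-port PAUSE configuration. */
struct demo_port {
	bool pause_rx;
	bool pause_tx;
};

/* Build the full register value: current PAUSE state plus the new PFC mask. */
static uint32_t demo_pfcc_pack(const struct demo_port *port, uint8_t pfc_en)
{
	uint32_t reg = 0;

	if (port->pause_rx)
		reg |= DEMO_PAUSE_RX;
	if (port->pause_tx)
		reg |= DEMO_PAUSE_TX;

	reg |= ((uint32_t)pfc_en << DEMO_PFC_SHIFT) & DEMO_PFC_MASK;
	return reg;
}
```

Writing the value with PFC cleared (pfc_en == 0) then leaves the PAUSE bits
as they were, instead of disabling PAUSE as a side effect.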

[ANNOUNCE] iproute2 4.7.0

2016-08-08 Thread Stephen Hemminger

Update to the iproute2 utility to support new features in Linux 4.7.
New features are support for JSON output in the bridge command and for
configuring macsec, plus the usual array of documentation, support for
kernel flags and minor fixes.

Source:
  http://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-4.7.0.tar.gz

Repository:
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git

Report problems (or enhancements) to the netdev@vger.kernel.org mailing list.

---
Alexander Aring (1):
  tc: let m_ipt work with new iptables API headers

Amir Vadai (1):
  tc: flower: Add skip_{hw|sw} support

Andrew Vagin (1):
  ip route: timeout for routes has to be set in seconds

Anuradha Karuppiah (3):
  json_writer: Removed automatic json-object type from the constructor
  bridge: add json support for bridge fdb show
  bridge: add json schema for bridge fdb show

Beniamino Galvani (1):
  utils: fix hex digits parsing in hexstring_a2n()

Daniel Borkmann (4):
  ingress, clsact: don't add TCA_OPTIONS to nl msg
  f_bpf: fix filling of handle when no further arg is provided
  ip, token: add del command
  bpf: also check elf for official e_machine value

David Ahern (13):
  Make builds default to quiet mode
  man: ip-link: Add vrf type
  ss: Refactor inet_show_sock
  ss: Allow ssfilter_bytecompile to return 0
  ss: Add support to filter on device
  ip vrf: Add name_is_vrf
  ip link/addr: Add support for vrf keyword
  ip neigh: Add support for keyword
  ip route: Change type mask to bitmask
  ip vrf: Add ipvrf_get_table
  ip route: Add support for vrf keyword
  ss: Fix support for device filter by index
  ss: Add option to suppress header line

Davide Caratti (3):
  man: macsec: fix macsec related typos
  ip {link,address}: add 'macsec' item to TYPE list
  macsec: cipher and icvlen can be set separately

Eli Cohen (1):
  Add support for configuring Infiniband GUIDs

Eric Dumazet (2):
  ss: add SK_MEMINFO_DROPS display
  fq_codel: add per queue memory limit

Fabien Siron (1):
  misc/ss: Add family list to -f option in _usage()

Ido Schimmel (2):
  man: Add devlink man pages to Makefile
  man: Point to 'devlink-sb' from 'devlink' man page

Jakub Sitnicki (1):
  ip/tcp_metrics: Simplify process_msg a bit

Jamal Hadi Salim (6):
  tc fix ife late binding
  tc simple action: bug fix
  tc action policer: Avoid nonsensical input
  tc filter u32: Coding style fixes
  tc action policer: enable timestamp display
  action pedit: stylistic changes

Jiri Benc (3):
  vxlan: 'external' implies 'nolearning'
  ip-link.8: document "external" flag for vxlan
  vxlan: add support for VXLAN-GPE

Jiri Pirko (4):
  devlink: implement shared buffer support
  devlink: implement shared buffer occupancy control
  devlink: write usage help messages to stderr
  devlink: add option to generate JSON output

Kylie McClain (1):
  ipaddress: fix build with musl libc

Lucas Bates (1):
  man: tc-ife.8: man page for ife action

Martin KaFai Lau (1):
  ss: Add tcp_info fields data_segs_in/out

Masatake YAMATO (1):
  man: rtacct: add missing TP marker

Michal Soltys (2):
  iproute2: unmangle netdev/my emails in man pages (hfsc, stab)
  man/man8/tc-flow.8: minor corrections

Peter Heise (2):
  Added support for selection of new HSR version
  man: ip-link: Added HSR part

Phil Sutter (42):
  man: ip, ip-link: Fix ip option location
  man: rtpr: Fix minor typo
  ipaddress: Allow listing addresses by type
  man: ip-link: Document query_rss option
  tc: m_xt: Prevent segfault with standard targets
  tc: m_xt: Fix segfault when adding multiple actions at once
  tc: m_xt: Fix indenting
  tc: m_xt: Get rid of one indentation level in parse_ipt()
  tc: m_xt: Drop unused variable fw in parse_ipt()
  tc: m_xt: Get rid of rargc in parse_ipt()
  tc: m_xt: Get rid of iargc variable in parse_ipt()
  tc: m_xt: Simplify argc adjusting in parse_ipt()
  tc: m_xt: Introduce get_xtables_target_opts()
  tc: m_action: Use C99 style initializers for struct req
  tc: m_action: Drop unused variable nladdr in tc_action_gd()
  iplink: Add missing variable initialization
  iplink: Check address length via netlink
  man: ip-address, ip-link: Document 'type' quirk
  Fix MAC address length check
  Use ARRAY_SIZE macro everywhere
  ip-address: Support filtering by slave type, too
  ip-address: Align type list in help and man page
  ip-address: constify match_link_kind arg
  iplink: List valid 'type' argument in ip link help text
  iplink: bond_slave: Add missing help functions
  ip-link.8: Extend type list in synopsis
  ip-link.8: Place 'ip link set' warning more prominently
  ip-link.8: Add slave type option descriptions
  ip-link.8: Fix font choices

Re: [PATCH net 1/3] mlxsw: spectrum: Do not assume PAUSE frames are disabled

2016-08-08 Thread Jiri Pirko

Thu, Aug 04, 2016 at 04:36:20PM CEST, ido...@mellanox.com wrote:
>When ieee_setpfc() gets called, PAUSE frames are not necessarily
>disabled on the port.
>
>Check if PAUSE frames are disabled or enabled and configure the port's
>headroom buffer accordingly.
>
>Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support")
>Signed-off-by: Ido Schimmel 

Reviewed-by: Jiri Pirko

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread David Ahern

On 8/8/16 10:24 AM, Lorenzo Colitti wrote:
> On Tue, Aug 9, 2016 at 12:27 AM, David Ahern  wrote:
>>> - if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr))
>>> - fl6.flowi6_oif = np->mcast_oif;
>>> - else if (!fl6.flowi6_oif)
>>> - fl6.flowi6_oif = np->ucast_oif;
>>> -
>>
>> That code removal is contrary to your patch description regarding flowi6_oif.
> 
> Which code removal? The one I quote above? That code wasn't removed,
> it was moved to above the initialization of flowi6.
> 

Your description states:
"ping_v6_sendmsg never sets flowi6_oif, so it is not possible to
ping an IPv6 address on a different interface."

That code snippet above contradicts that -- flowi6_oif is set in
ping_v6_sendmsg.

You are making a different change than just setting flowi6_oif.
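
For reference, the fallback in the quoted snippet can be modelled in
userspace. This is a sketch of the selection logic only, assuming a plain
16-byte address array; struct demo_sock_opts and demo_pick_oif() are made-up
names standing in for the socket's mcast_oif/ucast_oif settings.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv6 multicast addresses are ff00::/8, so the first byte is 0xff. */
static bool demo_addr_is_multicast(const uint8_t addr[16])
{
	return addr[0] == 0xff;
}

/* Hypothetical per-socket defaults, mirroring mcast_oif and ucast_oif. */
struct demo_sock_opts {
	int mcast_oif;
	int ucast_oif;
};

/*
 * Pick the output interface: an already chosen oif is kept, otherwise
 * fall back to the multicast or unicast default for the destination.
 */
static int demo_pick_oif(int oif, const uint8_t daddr[16],
			 const struct demo_sock_opts *np)
{
	if (oif)
		return oif;

	return demo_addr_is_multicast(daddr) ? np->mcast_oif : np->ucast_oif;
}
```

Where this fallback runs relative to the rest of the flow initialization is
exactly what the two sides of this exchange are discussing.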

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread Lorenzo Colitti

On Tue, Aug 9, 2016 at 12:27 AM, David Ahern  wrote:
> > - if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr))
> > - fl6.flowi6_oif = np->mcast_oif;
> > - else if (!fl6.flowi6_oif)
> > - fl6.flowi6_oif = np->ucast_oif;
> > -
>
> That code removal is contrary to your patch description regarding flowi6_oif.

Which code removal? The one I quote above? That code wasn't removed,
it was moved to above the initialization of flowi6.

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread 吉藤英明

Hi,

2016-08-08 23:45 GMT+09:00 Lorenzo Colitti :
> On Mon, Aug 8, 2016 at 11:26 PM, Hannes Frederic Sowa
>  wrote:
>>> - if (sk->sk_bound_dev_if &&
>>> - sk->sk_bound_dev_if != u->sin6_scope_id) {
>>> - return -EINVAL;
>>> - }
>>
>> Hmm, sk->sk_bound_dev_if always has highest prio for the selection of
>> the output interface. Thus this code made sense to me.
>
> Removing it is consistent with the other sendmsg functions such as
> udpv6_sendmsg or rawv6_sendmsg.
>
> There is similar code in __ip6_datagram_connect, but that seems a bit
> different because that code also *sets* sk_bound_dev_if.
>
> Personally I think it's better for pingv6_sendmsg be consistent with
> the other *_sendmsg functions than with ip6_datagram_connect, and thus
> the code should be removed. But I don't feel particularly strongly
> about it.

The following must be met, at least, IMHO.
- SO_BINDTODEVICE, which sets sk_bound_dev_if, requires "root".
- sin6_scope_id and sk_bound_dev_if should match (if the address is a
link-local address), or either or both should be equal to 0.

I think it would make more sense if the former setting wins...

--yoshfuji
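
The second rule above can be written down as a small check. This is one
possible reading, sketched under assumptions: the function name is made up,
the bound device is taken as the preferred value when both are set and equal
or when only it is set, and for non-link-local destinations the scope id is
simply ignored.

```c
#include <errno.h>

/*
 * Resolve the interface index for a destination.
 * Returns the chosen ifindex (0 = unspecified) or -EINVAL when a
 * link-local destination carries a scope id that conflicts with the
 * device the socket is bound to.
 */
static int demo_resolve_scope(int bound_dev_if, int scope_id,
			      int is_link_local)
{
	if (!is_link_local)
		return bound_dev_if;	/* scope id is not meaningful here */

	if (bound_dev_if && scope_id && bound_dev_if != scope_id)
		return -EINVAL;

	return bound_dev_if ? bound_dev_if : scope_id;
}
```

Whether a mismatch should fail, as here, or be resolved in favour of one of
the two settings is the open question in this thread.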

Re: [patch iproute2 1/2] devlink: write usage help messages to stderr

2016-08-08 Thread Jiri Pirko

Mon, Aug 08, 2016 at 05:56:54PM CEST, step...@networkplumber.org wrote:
>On Mon, 8 Aug 2016 09:19:21 +0200
>Jiri Pirko  wrote:
>
>> Sat, Jul 23, 2016 at 06:46:59PM CEST, step...@networkplumber.org wrote:
>> >On Fri, 22 Jul 2016 18:34:29 +0200
>> >Jiri Pirko  wrote:
>> >  
>> >> From: Jiri Pirko 
>> >> 
>> >> In order to not confuse reader, write help messages into stderr.
>> >> 
>> >> Signed-off-by: Jiri Pirko   
>> >
>> >This does make devlink consistent with other parts of iproute2.
>> >But the most common coding standards, back to Unix, and GNU are
>> >that help messages should go to stdout so that:
>> >  $ ip -h | more
>> >would work as expected.  
>> 
>> The thing is I wanted to make stdout only for json. Putting non-json
>> help out there does not look correct to me. Is it?
>> 
>
>I applied both of these patches, just wanted to mention that iproute2 is not
>following the GNU convention. At this point, it really doesn't matter, there
>are arguments to be made for both behaviors.


All right. Thanks.

[PATCH 1/3] vsockmon: Add tap functions.

2016-08-08 Thread ggarcia

From: Gerard Garcia 

Add tap functions that can be used by the vsock transports to
deliver packets to vsockmon virtual network devices.

Signed-off-by: Gerard Garcia 
---
 include/net/af_vsock.h   |  13 +
 include/uapi/linux/if_arp.h  |   1 +
 net/vmw_vsock/Makefile   |   2 +-
 net/vmw_vsock/af_vsock_tap.c | 114 +++
 4 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 net/vmw_vsock/af_vsock_tap.c

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f275896..f7c51b1 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -185,4 +185,17 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
 
+/ TAP /
+
+struct vsock_tap {
+   struct net_device *dev;
+   struct module *module;
+   struct list_head list;
+};
+
+int vsock_init_tap(void);
+int vsock_add_tap(struct vsock_tap *vt);
+int vsock_remove_tap(struct vsock_tap *vt);
+void vsock_deliver_tap(struct sk_buff *skb);
+
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index 4d024d7..cf73510 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -95,6 +95,7 @@
 #define ARPHRD_IP6GRE  823 /* GRE over IPv6*/
 #define ARPHRD_NETLINK 824 /* Netlink header   */
 #define ARPHRD_6LOWPAN 825 /* IPv6 over LoWPAN */
+#define ARPHRD_VSOCKMON 826 /* Vsock monitor header */
 
 #define ARPHRD_VOID  0xFFFF /* Void type, nothing is known */
 #define ARPHRD_NONE  0xFFFE/* zero header length */
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index bc27c70..09fc2eb 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
 
-vsock-y += af_vsock.o vsock_addr.o
+vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
 vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
vmci_transport_notify_qstate.o
diff --git a/net/vmw_vsock/af_vsock_tap.c b/net/vmw_vsock/af_vsock_tap.c
new file mode 100644
index 000..427b3b3
--- /dev/null
+++ b/net/vmw_vsock/af_vsock_tap.c
@@ -0,0 +1,114 @@
+/*
+ * Tap functions for AF_VSOCK sockets.
+ *
+ * Code based on net/netlink/af_netlink.c tap functions.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+static DEFINE_SPINLOCK(vsock_tap_lock);
+static struct list_head vsock_tap_all __read_mostly =
+   LIST_HEAD_INIT(vsock_tap_all);
+
+int vsock_add_tap(struct vsock_tap *vt) {
+   if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
+   return -EINVAL;
+
+   __module_get(vt->module);
+
+   spin_lock(&vsock_tap_lock);
+   list_add_rcu(&vt->list, &vsock_tap_all);
+   spin_unlock(&vsock_tap_lock);
+
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_add_tap);
+
+int __vsock_remove_tap(struct vsock_tap *vt) {
+   bool found = false;
+   struct vsock_tap *tmp;
+
+   spin_lock(&vsock_tap_lock);
+
+   list_for_each_entry(tmp, &vsock_tap_all, list) {
+   if (vt == tmp) {
+   list_del_rcu(&vt->list);
+   found = true;
+   goto out;
+   }
+   }
+
+   pr_warn("__vsock_remove_tap: %p not found\n", vt);
+out:
+   spin_unlock(&vsock_tap_lock);
+
+   if (found)
+   module_put(vt->module);
+
+   return found ? 0 : -ENODEV;
+}
+
+int vsock_remove_tap(struct vsock_tap *vt)
+{
+   int ret;
+
+   ret = __vsock_remove_tap(vt);
+   synchronize_net();
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vsock_remove_tap);
+
+static int __vsock_deliver_tap_skb(struct sk_buff *skb,
+struct net_device *dev)
+{
+   int ret = 0;
+
+   if (skb) {
+   dev_hold(dev);
+   /* Take skb ownership so it is not consumed in dev_queue_xmit.
+* dev_queue_xmit will drop a reference so the reference count
+* will reset.
+*/
+   skb_get(skb);
+   skb->dev = dev;
+   ret = dev_queue_xmit(skb);
+   if (unlikely(ret > 0))
+   ret = net_xmit_errno(ret);
+
+   dev_put(dev);
+   }
+
+   return ret;
+}
+
+static void __vsock_deliver_tap(struct sk_buff *skb)
+{
+

[PATCH 0/3] vsockmon: virtual device to monitor AF_VSOCK sockets.

2016-08-08 Thread ggarcia

From: Gerard Garcia 

This patch applies over the mst vhost git repository:
http://git.kernel.org/cgit/linux/kernel/git/mst/vhost.git

This has already been sent as an RFC, in which several issues were fixed.
This is the summary of changes from the first RFC:

v2:
 * Do not clone skb, instead take ownership before transmitting.
 * Split tap functions from af_vsock.c.
 * Simplify vsockmon header to remove unnecessary padding and
set little endian byte order.
 * Various simple fixes from the comments received to the first RFC.

Additionally, this version changes:
 * Add len field to the vsockmon header to ease parsing.
 * Pack vsockmon header.
 * Various simple fixes and styling.

Overview:

Virtual socket transports operate at kernel level; therefore, there is no easy
way to see the traffic exchanged between virtual machines and hypervisors that
communicate using AF_VSOCK sockets. In addition, being able to see the control
messages exchanged by the transports may be useful for debugging and
optimization purposes. This patch adds a virtual device that may be used to see
the traffic exchanged between virtual machines and hypervisors through AF_VSOCK
sockets.

Its structure is based on the nlmon device and this version just targets the
virtio transport, but support for the VMCI transport can be easily implemented.
The vsockmon header contains a generic header and includes the header specific
to the transport. The generic header allows following an AF_VSOCK stream without
having to dig into the details of the transport while the transport header
gives more detail which may be useful for troubleshooting and debugging.
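
As a rough illustration of following a stream from the generic header alone, the sketch below reads the fields that patch 3/3 fills in (src_cid, dst_cid, src_port, dst_port, op, t, len) from a little-endian buffer. The field widths, their order, and every sketch_ name are assumptions made for this example; the actual layout is whatever include/uapi/linux/vsockmon.h defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the generic header; widths and order are assumed. */
struct sketch_vsockmon_generic {
    uint64_t src_cid;
    uint64_t dst_cid;
    uint32_t src_port;
    uint32_t dst_port;
    uint16_t op;  /* generic operation (connect, payload, ...) */
    uint16_t t;   /* transport identifier */
    uint16_t len; /* length of the transport header that follows */
};

/* 8 + 8 + 4 + 4 + 2 + 2 + 2 bytes on the wire under the assumed layout. */
#define SKETCH_GENERIC_LEN 30u

/* Read an n-byte little-endian integer regardless of host byte order. */
static uint64_t sketch_get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;

    while (n--)
        v = (v << 8) | p[n];
    return v;
}

/* Returns the offset of the payload, or 0 if the buffer is too short. */
static size_t sketch_parse(const uint8_t *buf, size_t buflen,
                           struct sketch_vsockmon_generic *out)
{
    assert(buf && out);
    if (buflen < SKETCH_GENERIC_LEN)
        return 0;

    out->src_cid = sketch_get_le(buf, 8);
    out->dst_cid = sketch_get_le(buf + 8, 8);
    out->src_port = (uint32_t)sketch_get_le(buf + 16, 4);
    out->dst_port = (uint32_t)sketch_get_le(buf + 20, 4);
    out->op = (uint16_t)sketch_get_le(buf + 24, 2);
    out->t = (uint16_t)sketch_get_le(buf + 26, 2);
    out->len = (uint16_t)sketch_get_le(buf + 28, 2);

    /* The transport header must fit in what remains of the buffer. */
    if (buflen - SKETCH_GENERIC_LEN < out->len)
        return 0;
    return SKETCH_GENERIC_LEN + out->len;
}
```

A reader that only cares about the stream can skip out->len bytes after the generic part and go straight to the payload, which is the use the len field is meant to ease.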

Testing:

To set up a vsockmon device:

ip link add type vsockmon
ip link set vsockmon0 up

The Wireshark development version (master branch) includes a vsock dissector
that is capable of parsing packets received through vsockmon. The dissector
needs to be manually selected.

Thanks to Stefan Hajnoczi for his help.

Gerard

Gerard Garcia (3):
  vsockmon: Add tap functions.
  vsockmon: Add vsockmon device.
  vsockmon: Add virtio vsock hooks

 drivers/net/Kconfig   |   8 ++
 drivers/net/Makefile  |   1 +
 drivers/net/vsockmon.c| 168 ++
 drivers/vhost/vsock.c |  72 ++
 include/net/af_vsock.h|  13 
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/if_arp.h   |   1 +
 include/uapi/linux/vsockmon.h |  35 +
 net/vmw_vsock/Makefile|   2 +-
 net/vmw_vsock/af_vsock_tap.c  | 114 
 10 files changed, 414 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/vsockmon.c
 create mode 100644 include/uapi/linux/vsockmon.h
 create mode 100644 net/vmw_vsock/af_vsock_tap.c

-- 
2.9.1

[PATCH 3/3] vsockmon: Add virtio vsock hooks

2016-08-08 Thread ggarcia

From: Gerard Garcia 

Add hooks to the virtio transport host driver to deliver a copy of
the received and sent messages to all vsockmon virtual network devices.

Signed-off-by: Gerard Garcia 
---
 drivers/vhost/vsock.c | 72 +++
 1 file changed, 72 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 0ddf3a2..75b5a46 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -15,8 +15,10 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
+#include 
 #include "vhost.h"
 
 #define VHOST_VSOCK_DEFAULT_HOST_CID   2
@@ -45,6 +47,68 @@ struct vhost_vsock {
u32 guest_cid;
 };
 
+static struct sk_buff *
+virtio_vsock_pkt_to_vsockmon_skb(struct virtio_vsock_pkt *pkt)
+{
+   struct sk_buff *skb;
+   struct af_vsockmon_hdr *hdr;
+   void *payload;
+
+   u32 skb_len = sizeof(struct af_vsockmon_hdr) + pkt->len;
+
+   skb = alloc_skb(skb_len, GFP_ATOMIC);
+   if (!skb)
+   return NULL;
+
+   hdr = (struct af_vsockmon_hdr *) skb_put(skb, sizeof(*hdr));
+
+   hdr->src_cid = pkt->hdr.src_cid;
+   hdr->src_port = pkt->hdr.src_port;
+   hdr->dst_cid = pkt->hdr.dst_cid;
+   hdr->dst_port = pkt->hdr.dst_port;
+   hdr->t = cpu_to_le16(AF_VSOCK_T_VIRTIO);
+   hdr->len = cpu_to_le16(sizeof(hdr->t_hdr));
+
+   switch(pkt->hdr.op) {
+   case VIRTIO_VSOCK_OP_REQUEST:
+   case VIRTIO_VSOCK_OP_RESPONSE:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_CONNECT);
+   break;
+   case VIRTIO_VSOCK_OP_RST:
+   case VIRTIO_VSOCK_OP_SHUTDOWN:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_DISCONNECT);
+   break;
+   case VIRTIO_VSOCK_OP_RW:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_PAYLOAD);
+   break;
+   case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
+   case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_CONTROL);
+   break;
+   default:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_UNKNOWN);
+   break;
+   }
+
+   hdr->t_hdr.virtio_hdr = pkt->hdr;
+
+   if (pkt->len) {
+   payload = skb_put(skb, pkt->len);
+   memcpy(payload, pkt->buf, pkt->len);
+   }
+
+   return skb;
+}
+
+static void vsock_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
+{
+   struct sk_buff *skb = virtio_vsock_pkt_to_vsockmon_skb(pkt);
+   if (skb) {
+   vsock_deliver_tap(skb);
+   kfree_skb(skb);
+   }
+}
+
 static u32 vhost_transport_get_local_cid(void)
 {
return VHOST_VSOCK_DEFAULT_HOST_CID;
@@ -168,6 +232,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
restart_tx = true;
}
 
+   /* Deliver to monitoring devices all correctly transmitted
+* packets.
+*/
+   vsock_deliver_tap_pkt(pkt);
+
virtio_transport_free_pkt(pkt);
}
if (added)
@@ -334,6 +403,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
continue;
}
 
+   /* Deliver to monitoring devices all received packets */
+   vsock_deliver_tap_pkt(pkt);
+
/* Only accept correctly addressed packets */
if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
virtio_transport_recv_pkt(pkt);
-- 
2.9.1

[PATCH 2/3] vsockmon: Add vsockmon device.

2016-08-08 Thread ggarcia

From: Gerard Garcia 

Add vsockmon virtual network device that receives packets from the vsock
transports and exposes them to user space.

Based on the nlmon device.

Signed-off-by: Gerard Garcia 
---
 drivers/net/Kconfig   |   8 ++
 drivers/net/Makefile  |   1 +
 drivers/net/vsockmon.c| 168 ++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/vsockmon.h |  35 +
 5 files changed, 213 insertions(+)
 create mode 100644 drivers/net/vsockmon.c
 create mode 100644 include/uapi/linux/vsockmon.h

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 0c5415b..42c43b6 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -330,6 +330,14 @@ config NET_VRF
  This option enables the support for mapping interfaces into VRF's. The
  support enables VRF devices.
 
+config VSOCKMON
+tristate "Virtual vsock monitoring device"
+depends on VHOST_VSOCK
+---help---
+ This option enables a monitoring net device for vsock sockets. It is
+ mostly intended for developers or support to debug vsock issues. If
+ unsure, say N.
+
 endif # NET_CORE
 
 config SUNGEM_PHY
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd..e2188d4 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -28,6 +28,7 @@ obj-$(CONFIG_GENEVE) += geneve.o
 obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
+obj-$(CONFIG_VSOCKMON) += vsockmon.o
 
 #
 # Networking Drivers
diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c
new file mode 100644
index 000..9ad4f0a
--- /dev/null
+++ b/drivers/net/vsockmon.c
@@ -0,0 +1,168 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Virtio transport max packet size plus header */
+#define DEFAULT_MTU (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + sizeof(struct af_vsockmon_hdr))
+
+struct pcpu_lstats {
+   u64 rx_packets;
+   u64 rx_bytes;
+   struct u64_stats_sync syncp;
+};
+
+static int vsockmon_dev_init(struct net_device *dev)
+{
+   dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
+   return dev->lstats == NULL ? -ENOMEM : 0;
+}
+
+static void vsockmon_dev_uninit(struct net_device *dev)
+{
+   free_percpu(dev->lstats);
+}
+
+struct vsockmon {
+   struct vsock_tap vt;
+};
+
+static int vsockmon_open(struct net_device *dev)
+{
+   struct vsockmon *vsockmon = netdev_priv(dev);
+
+   vsockmon->vt.dev = dev;
+   vsockmon->vt.module = THIS_MODULE;
+   return vsock_add_tap(&vsockmon->vt);
+}
+
+static int vsockmon_close(struct net_device *dev) {
+   struct vsockmon *vsockmon = netdev_priv(dev);
+
+   return vsock_remove_tap(&vsockmon->vt);
+}
+
+static netdev_tx_t vsockmon_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+   int len = skb->len;
+   struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
+
+   u64_stats_update_begin(&stats->syncp);
+   stats->rx_bytes += len;
+   stats->rx_packets++;
+   u64_stats_update_end(&stats->syncp);
+
+   dev_kfree_skb(skb);
+
+   return NETDEV_TX_OK;
+}
+
+static struct rtnl_link_stats64 *
+vsockmon_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
+{
+   int i;
+   u64 bytes = 0, packets = 0;
+
+   for_each_possible_cpu(i) {
+   const struct pcpu_lstats *vstats;
+   u64 tbytes, tpackets;
+   unsigned int start;
+
+   vstats = per_cpu_ptr(dev->lstats, i);
+
+   do {
+   start = u64_stats_fetch_begin_irq(&vstats->syncp);
+   tbytes = vstats->rx_bytes;
+   tpackets = vstats->rx_packets;
+   } while (u64_stats_fetch_retry_irq(&vstats->syncp, start));
+
+   packets += tpackets;
+   bytes += tbytes;
+   }
+
+   stats->rx_packets = packets;
+   stats->tx_packets = 0;
+
+   stats->rx_bytes = bytes;
+   stats->tx_bytes = 0;
+
+   return stats;
+}
+
+static int vsockmon_is_valid_mtu(int new_mtu)
+{
+   return new_mtu >= (int) sizeof(struct af_vsockmon_hdr);
+}
+
+static int vsockmon_change_mtu(struct net_device *dev, int new_mtu)
+{
+   if (!vsockmon_is_valid_mtu(new_mtu))
+   return -EINVAL;
+
+   dev->mtu = new_mtu;
+   return 0;
+}
+
+static const struct net_device_ops vsockmon_ops = {
+   .ndo_init = vsockmon_dev_init,
+   .ndo_uninit = vsockmon_dev_uninit,
+   .ndo_open = vsockmon_open,
+   .ndo_stop = vsockmon_close,
+   .ndo_start_xmit = vsockmon_xmit,
+   .ndo_get_stats64 = vsockmon_get_stats64,
+   .ndo_change_mtu = vsockmon_change_mtu,
+};
+
+static u32 always_on(struct net_device *dev)
+{
+   return 1;
+}
+
+static const struct ethtool_ops vsockmon_ethtool_ops = {
+   .get_link = always_on,
+};
+
+static void vsockmon_setup(struct net_device *dev)
+{
+   dev->type =

Re: [PATCH net 2/2] net: vxlan: lwt: Fix vxlan local traffic.

2016-08-08 Thread Jiri Benc

On Fri,  5 Aug 2016 17:45:37 -0700, Pravin B Shelar wrote:
> The vxlan driver has a bypass for local vxlan traffic, but that
> depends on the vxlan driver having information about all VNIs on
> the local system. This is not available in the case of LWT.
> Therefore the following patch disables encap bypass for LWT
> vxlan traffic.
> 
> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
> Reported-by: Jakub Libosvar 
> Signed-off-by: Pravin B Shelar 

Acked-by: Jiri Benc

Re: [PATCH net 1/2] net: vxlan: lwt: Use source ip address during route lookup.

2016-08-08 Thread Jiri Benc

On Fri,  5 Aug 2016 17:45:36 -0700, Pravin B Shelar wrote:
> An LWT user can specify the destination as well as the source IP
> address for a given tunnel endpoint, but vxlan ignores the given
> source IP address. The following patch uses both IP addresses to
> route the tunnel packet. This is consistent with other LWT
> implementations, like GENEVE and GRE.
> 
> Fixes: ee122c79d42 ("vxlan: Flow based tunneling").
> Signed-off-by: Pravin B Shelar 

Acked-by: Jiri Benc

Re: [patch iproute2 1/2] devlink: write usage help messages to stderr

2016-08-08 Thread Stephen Hemminger

On Mon, 8 Aug 2016 09:19:21 +0200
Jiri Pirko  wrote:

> Sat, Jul 23, 2016 at 06:46:59PM CEST, step...@networkplumber.org wrote:
> >On Fri, 22 Jul 2016 18:34:29 +0200
> >Jiri Pirko  wrote:
> >  
> >> From: Jiri Pirko 
> >> 
> >> In order not to confuse the reader, write help messages to stderr.
> >> 
> >> Signed-off-by: Jiri Pirko   
> >
> >This does make devlink consistent with other parts of iproute2.
> >But the most common coding standards, back to Unix, and GNU are
> >that help messages should go to stdout so that:
> >  $ ip -h | more
> >would work as expected.  
> 
> The thing is I wanted to make stdout only for json. Putting non-json
> help out there does not look correct to me. Is it?
> 

I applied both of these patches, just wanted to mention that iproute2 is not
following the GNU convention. At this point, it really doesn't matter, there
are arguments to be made for both behaviors.

Re: [PATCH] tc/m_gact: Fix action_a2n() return code check

2016-08-08 Thread Stephen Hemminger

On Sun,  7 Aug 2016 13:19:01 +0200
Phil Sutter  wrote:

> The function returns zero on success.
> 
> Reported-by: Mark Bloch 
> Fixes: 69f5aff63c770b ("tc: use action_a2n() everywhere")
> Signed-off-by: Phil Sutter 

Applied

RE: [5.3] ucc_geth: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-08 Thread David Laight

From: Arnd Bergmann
> Sent: 08 August 2016 16:13
> 
> On Monday, August 8, 2016 2:49:11 PM CEST David Laight wrote:
> >
> > > If qe_muram_alloc will return any error, Then IS_ERR_VALUE will always
> > > return 0. it'll not call ucc_fast_free for any failure. Inside 'if code'
> > > will be a dead code on 64bit. Even qe_muram_addr will return wrong
> > > virtual address. Which can cause an error.
> > >
> > >  kfree((void *)ugeth->tx_bd_ring_offset[i]);
> >
> > Erm, kfree() isn't the right function for things allocated by qe_muram_alloc().
> >
> > I still think you need to stop this code using IS_ERR_VALUE() at all.
> 
> Those are two separate issues:
> 
> a) The ucc_geth driver mixing kmalloc() memory with muram, and assigning
>the result to "u32" and "void __iomem *" variables, both of which
>are wrong at least half of the time.
> 
> b) calling conventions of qe_muram_alloc() being defined in a way that
>requires the use of IS_ERR_VALUE(), because '0' is a valid address
>here.

Yep, it is all a big bag of worms...
'0' being valid is going to make tidying up after failure 'problematic'.

> The first one can be solved by updating the network driver, ideally
> by getting rid of the casts and using proper types and accessors,
> while the second would require updating all users of that interface.

It might be worth (at least as a compilation option) embedding the
'muram offset' in a structure (passed and returned by value).

The compiler can then check that the driver code is never looking
directly at the value.

For 'b' zero can be made invalid by changing the places where the
offset is added/subtracted.
It could even be used to offset the saved physical and virtual
addresses of the area - so not needing any extra code when the values
are converted to physical/virtual addresses.
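
A minimal sketch of that idea, assuming a wrapper type that stores the offset plus one so that an all-zero handle means "nothing allocated". The sketch_muram_* names are hypothetical and are not part of the QE/CPM code; here the bias is removed in the accessor rather than folded into the saved base addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opaque handle: 'biased' is offset + 1, so 0 is never valid. */
struct sketch_muram_off {
    unsigned long biased;
};

/* Wrap a raw offset (0 is a legal offset) into a handle. */
static struct sketch_muram_off sketch_muram_wrap(unsigned long offset)
{
    struct sketch_muram_off h = { offset + 1 };

    return h;
}

/* The handle a failed allocation would hand back. */
static struct sketch_muram_off sketch_muram_none(void)
{
    struct sketch_muram_off h = { 0 };

    return h;
}

static bool sketch_muram_valid(struct sketch_muram_off h)
{
    return h.biased != 0;
}

/* The only place that turns a handle back into an address. */
static uint8_t *sketch_muram_addr(uint8_t *base, struct sketch_muram_off h)
{
    assert(sketch_muram_valid(h));
    return base + (h.biased - 1);
}
```

Because callers hold a struct rather than an integer, comparing it against 0 or feeding it to kfree() no longer compiles, and cleanup after a failure can test the handle with sketch_muram_valid() instead of IS_ERR_VALUE().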

David

Re: [PATCH iproute2] bridge: vlan json: skip ports with empty vlans

2016-08-08 Thread Stephen Hemminger

On Sun,  7 Aug 2016 12:37:03 -0700
Roopa Prabhu  wrote:

> From: Roopa Prabhu 
> 
> The non-json output prints 'None' for such vlans.
> And this can garble json output.
> 
> Fixes: d82a49ce85f0 ("bridge: add json support for bridge vlan show")
> Signed-off-by: Roopa Prabhu 

Applied

[v5.1] ucc_fast: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-08 Thread Arvind Yadav

IS_ERR_VALUE() assumes that its parameter is an unsigned long.
It cannot be used to check an 'unsigned int' that holds an error code.
On 64-bit architectures sizeof(int) == 4 and sizeof(long) == 8.
IS_ERR_VALUE(x) is ((x) >= (unsigned long)-4095).
IS_ERR_VALUE() of an 'unsigned int' is always false there because the
32-bit value is zero-extended to 64 bits.

The problem shows up in the UCC fast driver, drivers/soc/fsl/qe/ucc_fast.c:

/* Allocate memory for Tx Virtual Fifo */
uccf->ucc_fast_tx_virtual_fifo_base_offset =
  qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
__func__);
uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
}

/* Allocate memory for Rx Virtual Fifo */
uccf->ucc_fast_rx_virtual_fifo_base_offset =
   qe_muram_alloc(uf_info->urfs +
   UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
printk(KERN_ERR "%s: cannot allocate MURAM for RX FIFO\n",
__func__);
uccf->ucc_fast_rx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
}

qe_muram_alloc() (a.k.a. cpm_muram_alloc()) returns unsigned long, but the
return value is stored in a u32 (ucc_fast_tx_virtual_fifo_base_offset
and ucc_fast_rx_virtual_fifo_base_offset). If qe_muram_alloc() returns
an error, IS_ERR_VALUE() still evaluates to 0, so ucc_fast_free() is
never called on failure and the body of the 'if' is dead code on 64-bit.
This patch avoids the problem on 64-bit machines.
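
The effect can be reproduced in user space with a small sketch. It uses the simplified form of the macro quoted above rather than the kernel's own definition, and the SKETCH_ and sketch_ names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095UL
#define SKETCH_ENOMEM 12

/* Simplified form of the check quoted in the message above. */
#define SKETCH_IS_ERR_VALUE(x) \
    ((unsigned long)(x) >= (unsigned long)-SKETCH_MAX_ERRNO)

/* An allocator that fails, reporting the error as an unsigned long. */
static unsigned long sketch_alloc_fail(void)
{
    return (unsigned long)-SKETCH_ENOMEM;
}

/* Store the result in a 32-bit field first, as the u32 members did. */
static int sketch_check_u32(void)
{
    uint32_t off = (uint32_t)sketch_alloc_fail();

    return SKETCH_IS_ERR_VALUE(off);
}

/* Store the result in an unsigned long, as the patch does. */
static int sketch_check_ulong(void)
{
    unsigned long off = sketch_alloc_fail();

    return SKETCH_IS_ERR_VALUE(off);
}

/* The u32 check only sees the error where long is as narrow as u32. */
static void sketch_selfcheck(void)
{
    assert(sketch_check_ulong());
    assert(sketch_check_u32() == (sizeof(unsigned long) == sizeof(uint32_t)));
}
```

On a build where unsigned long is 64 bits, sketch_check_u32() returns 0 for a failed allocation, while sketch_check_ulong() returns 1.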

Signed-off-by: Arvind Yadav 
---
 include/soc/fsl/qe/ucc_fast.h | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/soc/fsl/qe/ucc_fast.h b/include/soc/fsl/qe/ucc_fast.h
index df8ea79..ada9070 100644
--- a/include/soc/fsl/qe/ucc_fast.h
+++ b/include/soc/fsl/qe/ucc_fast.h
@@ -165,10 +165,12 @@ struct ucc_fast_private {
int stopped_tx; /* Whether channel has been stopped for Tx
   (STOP_TX, etc.) */
int stopped_rx; /* Whether channel has been stopped for Rx */
-   u32 ucc_fast_tx_virtual_fifo_base_offset;/* pointer to base of Tx
-   virtual fifo */
-   u32 ucc_fast_rx_virtual_fifo_base_offset;/* pointer to base of Rx
-   virtual fifo */
+   unsigned long ucc_fast_tx_virtual_fifo_base_offset;/* pointer to base of
+   * Tx virtual fifo
+   */
+   unsigned long ucc_fast_rx_virtual_fifo_base_offset;/* pointer to base of
+   * Rx virtual fifo
+   */
 #ifdef STATISTICS
u32 tx_frames;  /* Transmitted frames counter. */
u32 rx_frames;  /* Received frames counter (only frames
-- 
1.9.1

Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support

2016-08-08 Thread Jiri Benc

On Mon, 8 Aug 2016 17:17:17 +0200, Simon Horman wrote:
> +bool skb_mac_header_present(struct sk_buff *skb)
> +{
> + return skb->dev->type == ARPHRD_ETHER ||
> + (skb->dev->type == ARPHRD_NONE &&
> +  skb->protocol == htons(ETH_P_TEB));
> +}
> +EXPORT_SYMBOL(skb_mac_header_present);

I'd suggest a different name, this looks like it has something to do
with skb->mac_header, which it doesn't. skb_eth_header_present, perhaps?

 Jiri

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread David Ahern

On 8/8/16 1:42 AM, Lorenzo Colitti wrote:
> ping_v6_sendmsg never sets flowi6_oif, so it is not possible to
> ping an IPv6 address on a different interface. Instead, it sets
> flowi6_iif, which is incorrect but harmless. Also, it returns an
> error if a passed-in scope ID doesn't match sk_bound_dev_if.
> 
> Get rid of the error, stop setting flowi6_iif, and support
> various ways of setting oif in the same priority order used by
> udpv6_sendmsg.
> 
> Tested: https://android-review.googlesource.com/#/c/254470/
> Signed-off-by: Lorenzo Colitti 
> ---
>  net/ipv6/ping.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index fed40d1..eabf1ea 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c

-8<-

> @@ -106,16 +111,12 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>   fl6.flowi6_proto = IPPROTO_ICMPV6;
>   fl6.saddr = np->saddr;
>   fl6.daddr = *daddr;
> + fl6.flowi6_oif = oif;
>   fl6.flowi6_mark = sk->sk_mark;
>   fl6.fl6_icmp_type = user_icmph.icmp6_type;
>   fl6.fl6_icmp_code = user_icmph.icmp6_code;
>   security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
>  
> - if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr))
> - fl6.flowi6_oif = np->mcast_oif;
> - else if (!fl6.flowi6_oif)
> - fl6.flowi6_oif = np->ucast_oif;
> -
>   ipc6.tclass = np->tclass;
>   fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
>  
> 

That code removal is contrary to your patch description regarding flowi6_oif.

Re: Buggy rhashtable walking

2016-08-08 Thread Herbert Xu

On Fri, Aug 05, 2016 at 04:46:43AM -0700, Ben Greear wrote:
>
> It would not be fun to have to revert to the old way of hashing
> stations in mac80211...
> 
> I'll be happy to test the patches when you have them ready.

Thanks for the offer.  Unfortunately it'll be a few days before
I'm ready because I need to work through some crypto patches first.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: problem with MPLS and TSO/GSO

2016-08-08 Thread Simon Horman

On Sun, Jul 31, 2016 at 12:07:10AM -0700, Roopa Prabhu wrote:
> On 7/27/16, 12:02 AM, zhuyj wrote:
> > On ubuntu16.04 server 64 bit
> > The attached script is run, the following will appear.
> >
> > Error: either "to" is duplicate, or "encap" is a garbage.
> 
> This maybe just because the iproute2 version on ubuntu does not
> support the route encap attributes yet.
> 
> [snip]
> 
> >
> > On Tue, Jul 26, 2016 at 12:39 AM, Lennert Buytenhek 
> > wrote:
> >
> >> Hi!
> >>
> >> I am seeing pretty horrible TCP transmit performance (anywhere between
> >> 1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
> >> route that involves MPLS labeling, and this seems to be due to an
> >> interaction between MPLS and TSO/GSO that causes all segmentable TCP
> >> frames that are MPLS-labeled to be dropped on egress.
> >>
> >> I initially ran into this issue with the ixgbe driver, but it is easily
> >> reproduced with veth interfaces, and the script attached below this
> >> email reproduces the issue.  The script configures three network
> >> namespaces: one that transmits TCP data (netperf) with MPLS labels,
> >> one that takes the MPLS traffic and pops the labels and forwards the
> >> traffic on, and one that receives the traffic (netserver).  When not
> >> using MPLS labeling, I get ~3 Mb/s single-stream TCP performance
> >> in this setup on my test box, and with MPLS labeling, I get ~2 Mb/s.
> >>
> >> Some investigating shows that egress TCP frames that need to be
> >> segmented are being dropped in validate_xmit_skb(), which calls
> >> skb_gso_segment() which calls skb_mac_gso_segment() which returns
> >> -EPROTONOSUPPORT because we apparently didn't have the right kernel
> >> module (mpls_gso) loaded.
> >>
> >> (It's somewhat poor design, IMHO, to degrade network performance by
> >> 15000x if someone didn't load a kernel module they didn't know they
> >> should have loaded, and in a way that doesn't log any warnings or
> >> errors and can only be diagnosed by adding printk calls to net/core/
> >> and recompiling your kernel.)
> 
> It's possible that the right way to do this is to always auto select MPLS_GSO
> if MPLS_IPTUNNEL is selected. I am guessing this by looking at the
> openvswitch mpls Kconfig entries and comparing with MPLS_IPTUNNEL.
> will look some more.
> 
> >>
> >> (Also, I'm not sure why mpls_gso is needed when ixgbe seems to be
> >> able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe
> >> doesn't advertise the necessary features in ->mpls_features?  But
> >> adding those bits doesn't seem to change much.)
> >>
> >> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
> >> starts returning -EINVAL instead, which is due to the
> >> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
> >> And looking at skb_network_protocol(), I don't see how this is
> >> supposed to work -- skb->protocol is 0 at this point, and there is no
> >> way to figure out that what we are encapsulating is IP traffic, because
> >> unlike what is the case with VLAN tags, MPLS labels aren't followed by
> >> an inner ethertype that says what kind of traffic is in here, you have
> >> to have explicit knowledge of the payload type for MPLS.
> >>
> >> Any ideas?
> I was looking at the history of net/mpls/mpls_gso.c and the initial git log comment
> says that the driver expects the mpls tunnel driver to do a few things which I think
> might be the problem. I do see mpls_iptunnel.c setting the skb->protocol but not the
> skb->inner_protocol. wonder if fixing anything there will help ?.

If the inner protocol is not set then I don't think that segmentation can
function as there is (or at least was for the use case the code was added)
no way for the stack to know the protocol of the inner packet otherwise.
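
To make that concrete: a label stack entry as defined in RFC 3032 carries a label, a traffic class, a bottom-of-stack bit and a TTL, and nothing that names the payload type. The sketch below decodes entries and finds where the stack ends; the sketch_mpls_* names are made up for this example and are not kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_mpls_entry {
    uint32_t label; /* 20-bit label value */
    uint8_t tc;     /* 3-bit traffic class */
    uint8_t bos;    /* 1 when this is the bottom of the stack */
    uint8_t ttl;    /* 8-bit time to live */
};

/* Decode one 4-byte label stack entry from network byte order. */
static struct sketch_mpls_entry sketch_mpls_decode(const uint8_t *p)
{
    uint32_t w = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    struct sketch_mpls_entry e;

    e.label = w >> 12;
    e.tc = (uint8_t)((w >> 9) & 0x7);
    e.bos = (uint8_t)((w >> 8) & 0x1);
    e.ttl = (uint8_t)(w & 0xff);
    return e;
}

/* Bytes of label stack before the payload, or 0 if no entry has BoS set. */
static size_t sketch_mpls_stack_len(const uint8_t *p, size_t len)
{
    size_t off;

    assert(p);
    for (off = 0; off + 4 <= len; off += 4)
        if (sketch_mpls_decode(p + off).bos)
            return off + 4;
    return 0;
}
```

After the last entry the parser knows where the payload starts but not what it is, which is why the tunnel code has to record the inner protocol for the segmentation path to use.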

On another note I was recently poking around the code and I wonder if the
following may be needed (this was in the context of my under-construction
l3 tunnel work for OvS and it may only be needed in that context):

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 2055e57ed1c3..113cba89653d 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -39,16 +39,18 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
mpls_features = skb->dev->mpls_features & features;
segs = skb_mac_gso_segment(skb, mpls_features);
 
-
-   /* Restore outer protocol. */
-   skb->protocol = mpls_protocol;
-
/* Re-pull the mac header that the call to skb_mac_gso_segment()
 * above pulled.  It will be re-pushed after returning
 * skb_mac_gso_segment(), an indirect caller of this function.
 */
__skb_pull(skb, skb->data - skb_mac_header(skb));
 
+   /* Restore outer protocol. */
+   skb->protocol = mpls_protocol;
+   if (!IS_ERR(segs))
+   for (skb = segs; skb; skb = skb->next)
+   skb->protocol = mpls_protocol;
+
return segs;

Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support

2016-08-08 Thread Simon Horman

On Wed, Jul 20, 2016 at 11:06:37AM -0700, pravin shelar wrote:
> On Tue, Jul 19, 2016 at 5:02 PM, Simon Horman
>  wrote:
> > On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
> >> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
> >>  wrote:
> >> > [CC Jiri Benc for portion regarding GRE]
> >> >
> >> > Hi Pravin,
> >> >
> >> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
> >> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
> >> >>  wrote:
> >> >> > Hi Pravin,
> >> >> >
> >> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
> >> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> >> >> >>  wrote:
> >> >> >
> >> >> > ...
> >> >>
> >> >> >
> >> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
> >> >> >> > --- a/net/openvswitch/flow.c
> >> >> >> > +++ b/net/openvswitch/flow.c
> >> >> >> ...
> >> >> >>
> >> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
> >> >> >> > key->phy.skb_mark = skb->mark;
> >> >> >> > ovs_ct_fill_key(skb, key);
> >> >> >> > key->ovs_flow_hash = 0;
> >> >> >> > +   key->phy.is_layer3 = skb->mac_len == 0;
> >> >> >>
> >> >> >> I do not think mac_len can be used. mac_header needs to be checked.
> >> >> >> ...
> >> >> >
> >> >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidentally
> >> >> > slipped into the following patch, sorry about that.
> >> >> >
> >> >> > With that change in place I believe that this patch is internally
> >> >> > consistent because mac_header and mac_len are set correctly by the
> >> >> > call to key_extract() which is called by ovs_flow_key_extract() just
> >> >> > after where the excerpt above ends.
> >> >> >
> >> >> > That said, I do think that it is possible to rely on skb_mac_header_was_set
> >> >> > throughout the datapath, including action processing etc... I have provided
> >> >> > an incremental patch - which I created on top of this entire series - at
> >> >> > the end of this email. If you prefer that approach I am happy to take it,
> >> >> > though I do feel that using mac_len leads to slightly cleaner code. Let me
> >> >> > know what you think.
> >> >> >
> >> >>
> >> >>
> >> >> I am not sure if you can use only mac_len to detect L3 packet. This
> >> >> does not work with MPLS packets, mac_len is used to account MPLS
> >> >> headers pushed on skb. Therefore in case of a MPLS header on L3
> >> >> packet, mac_len would be non zero and we have to look at either
> >> >> mac_header or some other metadata like is_layer3 flag from key to
> >> >> check for L3 packet.
> >> >
> >> > At least within OvS mac_len does not include the length of the MPLS label
> >> > stack. Rather, the MPLS label stack length is the difference between the
> >> > end of (mac_header + mac_len) and network_header.
> >> >
> >> > So I think that the scheme does work as mac_len is 0 if there is no L2
> >> > header regardless of if an MPLS label stack is present or not.
> >> >
> >>
> >> I was thinking in overall networking stack rather than just ovs
> >> datapath. I think we should have consistent method of detecting L3
> >> packet. As commented in previous mail it could be achieved using
> >> skb->protocol and device type.
> >
> > This is somewhat of a surprise to me. As far as I recall when MPLS support
> > was added to OvS it and the accompanying support for MPLS GSO was the only
> > MPLS support present in the kernel. And at the time the scheme developed by
> > Jesse Gross, myself and others was as I describe above.
> >
> > Internally OvS relies on this scheme and in particular it is used
> > by skb_mpls_header() to calculate the beginning of the MPLS label stack
> > accurately in the presence of VLAN tags.
> >
> > Is it mpls_gso_segment() that you are concerned about?
> > If so, perhaps the problem could be addressed there.
> 
> Yes.
> Can you read the comment I made in the previous mail in the context of
> function skb_mpls_header(). I have given the rationale for the requested
> change.

Hi Pravin,

I have made an attempt to implement your suggestion to the extent that
I understand it. The following is an incremental change on top
of this patch-set. Does it move things closer to what you have in mind?

Light testing seems to indicate that it works for GSO skbs
received over both L3 and L2 GRE tunnels by OvS with both
IP-in-MPLS and IP (without MPLS) payloads.

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72ece516535d..42033537eb4d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2171,17 +2171,14 @@ static inline void skb_reset_mac_header(struct sk_buff *skb)
skb->mac_header = skb->data - skb->head;
 }
 
-static inline void skb_unset_mac_header(struct

Re: [PATCH v4 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-08-08 Thread Philp Prindeville


No, I was referring to anonymous structures, which is a feature of C11.

Please see the link I sent.


On 08/08/2016 03:13 AM, Feng Gao wrote:

Hi Philip,

Do you mean like the following?

struct gre_full_hdr {
 struct {
 __be16 flags;
 __be16 protocol;
 } fixed_header;
 __be16 csum;
 __be16 reserved1;
 __be32 key;
 __be32 seq;
} __packed;

But we need struct gre_base_hdr to get the fixed GRE header in
__skb_flow_dissect(), as in the following code:
struct gre_base_hdr *hdr, _hdr;
hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr);

BTW, the original code defines a local struct gre_hdr. I now use the
unified struct gre_base_hdr instead.

Best Regards
Feng


On Mon, Aug 8, 2016 at 11:27 AM, Philp Prindeville
 wrote:

Feng,

An anonymous structure is defined here:
https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html

i.e.:

struct gre_full_hdr {
 struct gre_base_hdr;
 ...

so yes, I'm talking about making fixed_header be anonymous instead.

-Philip
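
To make the distinction concrete, here is a minimal userspace sketch of a C11 anonymous struct member. The type name gre_full_hdr_sketch and the uint16_t/uint32_t stand-ins for __be16/__be32 are assumptions for illustration, not the layout from the patch. Note that the tagged form "struct gre_base_hdr;" shown above is the GCC extension (it needs -fms-extensions), while plain C11 only allows the untagged form used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the header under discussion; uint16_t and
 * uint32_t replace the kernel's __be16/__be32 so this builds in userspace. */
struct gre_full_hdr_sketch {
	struct {                /* C11 anonymous struct: no tag, no member name */
		uint16_t flags;
		uint16_t protocol;
	};
	uint16_t csum;
	uint16_t reserved1;
	uint32_t key;
	uint32_t seq;
};

/* The inner members are addressed as if they belonged to the outer struct. */
static_assert(offsetof(struct gre_full_hdr_sketch, protocol) == 2,
	      "protocol follows flags directly");
static_assert(sizeof(struct gre_full_hdr_sketch) == 16,
	      "no padding with this member order");

static uint16_t gre_sketch_protocol(const struct gre_full_hdr_sketch *hdr)
{
	assert(hdr);
	return hdr->protocol;   /* not hdr->fixed_header.protocol */
}
```

The trade-off Feng raises still applies: with an untagged anonymous member there is no separate type to pass to sizeof() or __skb_header_pointer(), so a named struct gre_base_hdr would have to be kept alongside it.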



On 08/07/2016 08:50 PM, Feng Gao wrote:

Hi Philp,

Forgive my poor English, I am not clear about the comment #1.
"Can you make gre_base_hdr be anonymous?".

+struct gre_full_hdr {
+   struct gre_base_hdr fixed_header;

Do you mean making the member "fixed_header" anonymous?

Best Regards
Feng


On Mon, Aug 8, 2016 at 5:03 AM, Philp Prindeville
 wrote:

Inline...



On 08/04/2016 01:06 AM, f...@48lvckh6395k16k5.yundunddos.com wrote:

From: Gao Feng 

PPTP is encapsulated in a GRE header whose GRE_VERSION bits
must contain one. But the current GRE RPS code requires GRE_VERSION
to be zero, so RPS does not work for PPTP traffic.
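
As a rough illustration of the version check described above, the following userspace sketch classifies the first four bytes of a GRE header. The mask name GRE_VERSION_MASK_SKETCH and both helper functions are hypothetical; PPTP_GRE_PROTO and PPTP_GRE_VER use the values from the pptp.c hunk of this patch. It is not the flow dissector code.

```c
#include <assert.h>
#include <stdint.h>

#define GRE_VERSION_MASK_SKETCH 0x0007  /* low three bits of the flags word, host order */
#define PPTP_GRE_PROTO          0x880B  /* value from the patch's pptp.c hunk */
#define PPTP_GRE_VER            0x1     /* value from the patch's pptp.c hunk */

/* Read a 16-bit big-endian field from the wire. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 1 when the first four GRE bytes look like PPTP's GRE variant. */
static int gre_is_pptp(const uint8_t *gre)
{
	uint16_t flags;
	uint16_t proto;

	assert(gre);
	flags = read_be16(gre);
	proto = read_be16(gre + 2);

	return (flags & GRE_VERSION_MASK_SKETCH) == PPTP_GRE_VER &&
	       proto == PPTP_GRE_PROTO;
}
```

A dissector that only accepts version 0 would reject every header for which gre_is_pptp() returns 1, which matches the single busy core described below.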

In my test environment, there are four MIPS cores, and all traffic
passes through PPTP. As a result, only one core is 100% busy
while the other three cores are nearly idle. After this patch, the load
is well balanced across the four cores.

Signed-off-by: Gao Feng 
---
v4: 1) Define struct gre_full_hdr, and use sizeof its member
directly;
2) Move version and routing check ahead;
3) Only PPTP in GRE check the ack flag;
v3: 1) Move struct pptp_gre_header definition into new file pptp.h
2) Use sizeof GRE and PPTP type instead of literal value;
3) Remove strict flag check for PPTP for robustness;
4) Consolidate the code again;
v2: Update according to Tom and Philp's advice.
1) Consolidate the code with GRE version 0 path;
2) Use PPP_PROTOCOL to get ppp protocol;
3) Set the FLOW_DIS_ENCAPSULATION flag;
v1: Initial patch

drivers/net/ppp/pptp.c |  36 +
include/net/gre.h  |  10 +++-
include/net/pptp.h |  40 +++
include/uapi/linux/if_tunnel.h |   7 ++-
net/core/flow_dissector.c  | 113
-
5 files changed, 135 insertions(+), 71 deletions(-)
create mode 100644 include/net/pptp.h

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index ae0905e..3e68dbc 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -37,6 +37,7 @@
#include 
#include 
#include 
+#include 
  #include 
@@ -53,41 +54,6 @@ static struct proto pptp_sk_proto __read_mostly;
static const struct ppp_channel_ops pptp_chan_ops;
static const struct proto_ops pptp_ops;
-#define PPP_LCP_ECHOREQ 0x09
-#define PPP_LCP_ECHOREP 0x0A
-#define SC_RCV_BITS (SC_RCV_B7_1|SC_RCV_B7_0|SC_RCV_ODDP|SC_RCV_EVNP)
-
-#define MISSING_WINDOW 20
-#define WRAPPED(curseq, lastseq)\
-   ((((curseq) & 0xffffff00) == 0) &&\
-   (((lastseq) & 0xffffff00) == 0xffffff00))
-
-#define PPTP_GRE_PROTO  0x880B
-#define PPTP_GRE_VER    0x1
-
-#define PPTP_GRE_FLAG_C 0x80
-#define PPTP_GRE_FLAG_R 0x40
-#define PPTP_GRE_FLAG_K 0x20
-#define PPTP_GRE_FLAG_S 0x10
-#define PPTP_GRE_FLAG_A 0x80
-
-#define PPTP_GRE_IS_C(f) ((f)&PPTP_GRE_FLAG_C)
-#define PPTP_GRE_IS_R(f) ((f)&PPTP_GRE_FLAG_R)
-#define PPTP_GRE_IS_K(f) ((f)&PPTP_GRE_FLAG_K)
-#define PPTP_GRE_IS_S(f) ((f)&PPTP_GRE_FLAG_S)
-#define PPTP_GRE_IS_A(f) ((f)&PPTP_GRE_FLAG_A)
-
-#define PPTP_HEADER_OVERHEAD (2+sizeof(struct pptp_gre_header))
-struct pptp_gre_header {
-   u8  flags;
-   u8  ver;
-   __be16 protocol;
-   __be16 payload_len;
-   __be16 call_id;
-   __be32 seq;
-   __be32 ack;
-} __packed;
-
static struct pppox_sock *lookup_chan(u16 call_id, __be32 s_addr)
{
  struct pppox_sock *sock;
diff --git a/include/net/gre.h b/include/net/gre.h
index 7a54a31..c469dcc 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -7,9 +7,17 @@
struct gre_base_hdr {
  __be16 flags;
  __be16 protocol;
-};
+} __packed;
#define GRE_HEADER_SECTION 4
+struct gre_full_hdr {
+   struct gre_base_hdr fixed_header;

Re: [5.3] ucc_geth: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-08 Thread Arnd Bergmann

On Monday, August 8, 2016 2:49:11 PM CEST David Laight wrote:
> 
> > If qe_muram_alloc will return any error, Then IS_ERR_VALUE will always
> > return 0. it'll not call ucc_fast_free for any failure. Inside 'if code'
> > will be a dead code on 64bit. Even qe_muram_addr will return wrong
> > virtual address. Which can cause an error.
> > 
> >  kfree((void *)ugeth->tx_bd_ring_offset[i]);
> 
> Erm, kfree() isn't the right function for things allocated by 
> qe_muram_alloc().
> 
> I still think you need to stop this code using IS_ERR_VALUE() at all.

Those are two separate issues:

a) The ucc_geth driver mixing kmalloc() memory with muram, and assigning
   the result to "u32" and "void __iomem *" variables, both of which
   are wrong at least half of the time.

b) calling conventions of qe_muram_alloc() being defined in a way that
   requires the use of IS_ERR_VALUE(), because '0' is a valid address
   here.

The first one can be solved by updating the network driver, ideally
by getting rid of the casts and using proper types and accessors,
while the second would require updating all users of that interface.

Arnd

RE: [5.3] ucc_geth: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-08 Thread David Laight

From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
Behalf Of Arvind Yadav
> IS_ERR_VALUE() assumes that parameter is an unsigned long.
> It cannot be used to check if 'unsigned int' is passed instead.
> Which tends to reflect an error.
> In 64bit architectures sizeof (int) == 4 && sizeof (long) == 8.
> IS_ERR_VALUE(x) is ((x) >= (unsigned long)-4095).
> IS_ERR_VALUE() of 'unsigned int' is always false because the 32bit
> value is zero extended to 64 bits.

You are being far too wordy above, and definitely below.

> 
> Now problem in Freescale QEGigabit Ethernet-:
>  drivers/net/ethernet/freescale/ucc_geth.c
> 
...
>  qe_muram_addr(init_enet_pram_offset);
> 
> qe_muram_alloc (a.k.a. cpm_muram_alloc) returns unsigned long.
> Return value store in a u32 (init_enet_offset, exf_glbl_param_offset,
> rx_glbl_pram_offset, tx_glbl_pram_offset, send_q_mem_reg_offset,
> thread_dat_tx_offset, thread_dat_rx_offset, scheduler_offset,
> tx_fw_statistics_pram_offset, rx_fw_statistics_pram_offset,
> rx_irq_coalescing_tbl_offset, rx_bd_qs_tbl_offset, tx_bd_ring_offset,
> init_enet_pram_offset and rx_bd_ring_offset).

Impenetrable...

> If qe_muram_alloc will return any error, Then IS_ERR_VALUE will always
> return 0. it'll not call ucc_fast_free for any failure. Inside 'if code'
> will be a dead code on 64bit. Even qe_muram_addr will return wrong
> virtual address. Which can cause an error.
> 
>  kfree((void *)ugeth->tx_bd_ring_offset[i]);

Erm, kfree() isn't the right function for things allocated by qe_muram_alloc().

I still think you need to stop this code using IS_ERR_VALUE() at all.

David

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread Lorenzo Colitti

On Mon, Aug 8, 2016 at 11:26 PM, Hannes Frederic Sowa
 wrote:
>> - if (sk->sk_bound_dev_if &&
>> - sk->sk_bound_dev_if != u->sin6_scope_id) {
>> - return -EINVAL;
>> - }
>
> Hmm, sk->sk_bound_dev_if always has highest prio for the selection of
> the output interface. Thus this code made sense to me.

Removing it is consistent with the other sendmsg functions such as
udpv6_sendmsg or rawv6_sendmsg.

There is similar code in __ip6_datagram_connect, but that seems a bit
different because that code also *sets* sk_bound_dev_if.

Personally I think it's better for pingv6_sendmsg to be consistent with
the other *_sendmsg functions than with ip6_datagram_connect, and thus
the code should be removed. But I don't feel particularly strongly
about it.

Re: [PATCH RESEND] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-08-08 Thread Wolfgang Grandegger


Am 08.08.2016 um 16:05 schrieb Andreas Werner:

On Mon, Aug 08, 2016 at 02:28:39PM +0200, Wolfgang Grandegger wrote:

Hello,

Am 08.08.2016 um 13:39 schrieb Andreas Werner:

On Mon, Aug 08, 2016 at 11:27:25AM +0200, Wolfgang Grandegger wrote:

Hello Andreas,

a first quick review

Am 26.07.2016 um 11:16 schrieb Andreas Werner:

This CAN Controller is found on MEN Chameleon FPGAs.

The driver/device supports the CAN2.0 specification.
There are 255 RX and 255 Tx buffer within the IP. The
pointer for the buffer are handled by HW to make the
access from within the driver as simple as possible.

The driver also supports parameters to configure the
buffer level interrupt for RX/TX as well as a RX timeout
interrupt.

With this configuration options, the driver/device
provides flexibility for different types of usecases.

Signed-off-by: Andreas Werner 
---
drivers/net/can/Kconfig|  10 +
drivers/net/can/Makefile   |   1 +
drivers/net/can/men_z192_can.c | 989 +
3 files changed, 1000 insertions(+)
create mode 100644 drivers/net/can/men_z192_can.c


---snip---


+/* Buffer level control values */
+#define MEN_Z192_MIN_BUF_LVL      0
+#define MEN_Z192_MAX_BUF_LVL      254
+#define MEN_Z192_RX_BUF_LVL_DEF   5
+#define MEN_Z192_TX_BUF_LVL_DEF   5
+#define MEN_Z192_RX_TOUT_MIN      0
+#define MEN_Z192_RX_TOUT_MAX      65535
+#define MEN_Z192_RX_TOUT_DEF      1000
+
+static int txlvl = MEN_Z192_TX_BUF_LVL_DEF;
+module_param(txlvl, int, S_IRUGO);
+MODULE_PARM_DESC(txlvl, "TX IRQ trigger level (in frames) 0-254, default="
+__MODULE_STRING(MEN_Z192_TX_BUF_LVL_DEF) ")");
+
+static int rxlvl = MEN_Z192_RX_BUF_LVL_DEF;
+module_param(rxlvl, int, S_IRUGO);
+MODULE_PARM_DESC(rxlvl, "RX IRQ trigger level (in frames) 0-254, default="
+__MODULE_STRING(MEN_Z192_RX_BUF_LVL_DEF) ")");
+


What impact does the level have on the latency? Could you please add some
comments.


It has an impact on the latency.
rxlvl = 0 -> if one frame is received, an IRQ will be generated
rxlvl = 254 -> if 255 frames are received, an IRQ will be generated


Well, what's your usecase for rxlvl > 0? For me it's not obvious what it can
be good for. The application usually wants the message as soon as possible.
Anyway, the default should be *0*. For RX and TX.



The HW provides this feature and the driver should be able to control it.
It was developed to control the IRQ load (like NAPI) by defining the buffer level
at which the IRQ is asserted.

I agree with you to set the default to "0", which is the main use case.


+static int rx_timeout = MEN_Z192_RX_TOUT_DEF;
+module_param(rx_timeout, int, S_IRUGO);
+MODULE_PARM_DESC(rx_timeout, "RX IRQ timeout (in 100usec steps), default="
+__MODULE_STRING(MEN_Z192_RX_TOUT_DEF) ")");


Ditto. What is "rx_timeout" good for.



The rx timeout is used in combination with the rxlvl to assert the
RX IRQ if the buffer level is not reached within this timeout.


What event will the application receive in case of a timeout.



It's just to control the time when the RX IRQ will be asserted if the buffer
level is not reached.
Imagine that rx_timeout did not exist, rxlvl is set to 50 and
only 30 packets are received. The RX IRQ would never be asserted.

By defining rx_timeout, we can control the time when the RX IRQ is asserted
if the buffer level is not reached.

The application does not receive any special signal, it's just the RX IRQ.


Now I get it. After the timeout an interrupt will be triggered regardless of
the thresholds. The default settings should result in minimum latencies.



Both the timeout and the level are used to give the user as much
control over the latency and the IRQ handling as possible.
With these two options, the driver can be configured for different
use cases.

I will add this as a comment to make it clearer.


Even a bit more would be appreciated.



Sure...



---snip---


+static int men_z192_read_frame(struct net_device *ndev, unsigned int frame_nr)
+{
+   struct net_device_stats *stats = &ndev->stats;
+   struct men_z192 *priv = netdev_priv(ndev);
+   struct men_z192_cf_buf __iomem *cf_buf;
+   struct can_frame *cf;
+   struct sk_buff *skb;
+   u32 cf_offset;
+   u32 length;
+   u32 data;
+   u32 id;
+
+   skb = alloc_can_skb(ndev, &cf);
+   if (unlikely(!skb)) {
+   stats->rx_dropped++;
+   return 0;
+   }
+
+   cf_offset = sizeof(struct men_z192_cf_buf) * frame_nr;
+
+   cf_buf = priv->dev_base + MEN_Z192_RX_BUF_START + cf_offset;
+   length = readl(&cf_buf->length) & MEN_Z192_CFBUF_LEN;
+   id = readl(&cf_buf->can_id);
+
+   if (id & MEN_Z192_CFBUF_IDE) {
+   /* Extended frame */
+   cf->can_id = (id & MEN_Z192_CFBUF_ID1) >> 3;
+   cf->can_id |= (id & MEN_Z192_CFBUF_ID2) >>
+   MEN_Z192_CFBUF_ID2_SHIFT;
+
+   cf->can_id

Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.

2016-08-08 Thread Hannes Frederic Sowa

On 08.08.2016 09:42, Lorenzo Colitti wrote:
> ping_v6_sendmsg never sets flowi6_oif, so it is not possible to
> ping an IPv6 address on a different interface. Instead, it sets
> flowi6_iif, which is incorrect but harmless. Also, it returns an
> error if a passed-in scope ID doesn't match sk_bound_dev_if.
> 
> Get rid of the error, stop setting flowi6_iif, and support
> various ways of setting oif in the same priority order used by
> udpv6_sendmsg.
> 
> Tested: https://android-review.googlesource.com/#/c/254470/
> Signed-off-by: Lorenzo Colitti 
> ---
>  net/ipv6/ping.c | 29 +++--
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index fed40d1..eabf1ea 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c
> @@ -55,7 +55,7 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t len)
>   struct icmp6hdr user_icmph;
>   int addr_type;
>   struct in6_addr *daddr;
> - int iif = 0;
> + int oif = 0;
>   struct flowi6 fl6;
>   int err;
>   struct dst_entry *dst;
> @@ -78,23 +78,28 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t len)
>   if (u->sin6_family != AF_INET6) {
>   return -EAFNOSUPPORT;
>   }
> - if (sk->sk_bound_dev_if &&
> - sk->sk_bound_dev_if != u->sin6_scope_id) {
> - return -EINVAL;
> - }

Hmm, sk->sk_bound_dev_if always has highest prio for the selection of
the output interface. Thus this code made sense to me.

Bye,
Hannes

[5.3] ucc_geth: Fix to avoid IS_ERR_VALUE abuses and dead code on 64bit systems.

2016-08-08 Thread Arvind Yadav

IS_ERR_VALUE() assumes that parameter is an unsigned long.
It cannot be used to check if 'unsigned int' is passed instead.
Which tends to reflect an error.
In 64bit architectures sizeof (int) == 4 && sizeof (long) == 8.
IS_ERR_VALUE(x) is ((x) >= (unsigned long)-4095).
IS_ERR_VALUE() of 'unsigned int' is always false because the 32bit
value is zero extended to 64 bits.
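
The zero-extension effect can be reproduced in userspace. The sketch below models the 64-bit case by spelling unsigned long as uint64_t; is_err_value64() and is_err_after_u32_store() are hypothetical helpers, not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Errors occupy the top 4095 values of the unsigned long range. */
static_assert((uint64_t)-MAX_ERRNO == 0xfffffffffffff001ULL,
	      "start of the error range on a 64-bit machine");

/* Userspace model of the check on a 64-bit machine: unsigned long is
 * spelled uint64_t so the result is the same on any host. */
static int is_err_value64(uint64_t x)
{
	return x >= (uint64_t)-MAX_ERRNO;
}

/* What happens when the allocator's return value is first stored in a u32. */
static int is_err_after_u32_store(uint64_t ret)
{
	uint32_t offset = (uint32_t)ret;   /* upper 32 bits are dropped here */

	return is_err_value64(offset);     /* zero-extended back to 64 bits */
}
```

With -ENOMEM as input, the first helper reports an error and the second does not, which is the dead-code situation described for the u32 offset fields.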

Now problem in Freescale QEGigabit Ethernet-:
 drivers/net/ethernet/freescale/ucc_geth.c

   init_enet_offset =
  qe_muram_alloc(thread_size, thread_alignment);
   if (IS_ERR_VALUE(init_enet_offset)) {
  if (netif_msg_ifup(ugeth))
 pr_err("Can not allocate DPRAM memory\n");
  qe_put_snum((u8) snum);
  return -ENOMEM;
   }
   ugeth->tx_bd_ring_offset[j] =
  qe_muram_alloc(length,
UCC_GETH_TX_BD_RING_ALIGNMENT);
   if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
  ugeth->p_tx_bd_ring[j] = (u8 __iomem *) qe_muram_addr(ugeth->
tx_bd_ring_offset[j]);

   ugeth->rx_bd_ring_offset[j] =
  qe_muram_alloc(length,
 UCC_GETH_RX_BD_RING_ALIGNMENT);
   if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
   ugeth->p_rx_bd_ring[j] =
   (u8 __iomem *) qe_muram_addr(ugeth->
rx_bd_ring_offset[j]);

   /* Allocate global tx parameter RAM page */
ugeth->tx_glbl_pram_offset =
qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
   UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
if (IS_ERR_VALUE(ugeth->tx_glbl_pram_offset)) {
   if (netif_msg_ifup(ugeth))
  pr_err("Can not allocate DPRAM memory for p_tx_glbl_pram\n");
   return -ENOMEM;
}

/* Size varies with number of Tx threads */
ugeth->thread_dat_tx_offset =
qe_muram_alloc(numThreadsTxNumerical *
   sizeof(struct ucc_geth_thread_data_tx) +
   32 * (numThreadsTxNumerical == 1),
   UCC_GETH_THREAD_DATA_ALIGNMENT);
if (IS_ERR_VALUE(ugeth->thread_dat_tx_offset)) {
   if (netif_msg_ifup(ugeth))
  pr_err
  ("Can not allocate DPRAM memory for p_thread_data_tx\n");
   return -ENOMEM;
}

/* Size varies with number of Tx queues */
ugeth->send_q_mem_reg_offset =
qe_muram_alloc(ug_info->numQueuesTx *
   sizeof(struct ucc_geth_send_queue_qd),
   UCC_GETH_SEND_QUEUE_QUEUE_DESCRIPTOR_ALIGNMENT);

if (IS_ERR_VALUE(ugeth->send_q_mem_reg_offset)) {
   if (netif_msg_ifup(ugeth))
pr_err("Can not allocate DPRAM memory for p_send_q_mem_reg\n");
   return -ENOMEM;
}

ugeth->scheduler_offset =
   qe_muram_alloc(sizeof(struct ucc_geth_scheduler),
   UCC_GETH_SCHEDULER_ALIGNMENT);
if (IS_ERR_VALUE(ugeth->scheduler_offset)) {
   if (netif_msg_ifup(ugeth))
  pr_err("Can not allocate DPRAM memory for p_scheduler\n");
   return -ENOMEM;
}

ugeth->tx_fw_statistics_pram_offset =
 qe_muram_alloc(sizeof
   (struct ucc_geth_tx_firmware_statistics_pram),
UCC_GETH_TX_STATISTICS_ALIGNMENT);
if (IS_ERR_VALUE(ugeth->tx_fw_statistics_pram_offset)) {
   if (netif_msg_ifup(ugeth))
   pr_err(
   "Can not allocate DPRAM memory for p_tx_fw_statistics_pram\n");
  return -ENOMEM;
}
/* Allocate global rx parameter RAM page */
ugeth->rx_glbl_pram_offset =
qe_muram_alloc(sizeof(struct ucc_geth_rx_global_pram),
   UCC_GETH_RX_GLOBAL_PRAM_ALIGNMENT);
if (IS_ERR_VALUE(ugeth->rx_glbl_pram_offset)) {
   if (netif_msg_ifup(ugeth))
 pr_err("Can not allocate DPRAM memory for p_rx_glbl_pram\n");
   return -ENOMEM;
}
   /* Size varies with number of Rx threads */
ugeth->thread_dat_rx_offset =
qe_muram_alloc(numThreadsRxNumerical *
   sizeof(struct ucc_geth_thread_data_rx),
   UCC_GETH_THREAD_DATA_ALIGNMENT);
if (IS_ERR_VALUE(ugeth->thread_dat_rx_offset)) {
   if (netif_msg_ifup(ugeth))
pr_err("Can not allocate DPRAM memory for p_thread_data_rx\n");
   return -ENOMEM;
}
ugeth->rx_fw_statistics_pram_offset =
qe_muram_alloc(sizeof
  (struct ucc_geth_rx_firmware_statistics_pram),
   UCC_GETH_RX_STATISTICS_ALIGNMENT);
   if (IS_ERR_VALUE(ugeth->rx_fw_statistics_pram_offset)) {
   if (netif_msg_ifup(ugeth))
pr_err(

Re: [PATCH RESEND] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-08-08 Thread Andreas Werner

On Mon, Aug 08, 2016 at 03:06:33PM +0200, Kurt Van Dijck wrote:
> 
> --- Original message ---
> > Date:   Mon, 8 Aug 2016 14:28:39 +0200
> > From: Wolfgang Grandegger 
> > 
> [...]
> > >>>+
> > >>>+if (!(cf->can_id & CAN_RTR_FLAG)) {
> > >>>+writel(data[0], &cf_buf->data[0]);
> > >>>+writel(data[1], &cf_buf->data[1]);
> > >>
> > >>Why do you not check cf->can_dlc here as well. And is the extra copy
> > >>necessary.
> > >>
> > >
> > >Yes, I agree with you. The extra copy could be also avoided.
> > >
> > >>>+
> > >>>+stats->tx_bytes += cf->can_dlc;
> > >>>+}
> > >>
> > >>If I look to other drivers, they write the data even in case of RTR.
> > >>
> > >
> > >But why?
> > >
> > >A RTR does not have any data, therefore there is no need to write the data.
> > >Only the length is required as the request size.
> > 
> > Yes; I'm wondering as well.
> > 
> > >
> > >If there is a reason behind writing the data of a RTR frame, I can
> > >change that, but for now there is no reason.
> > 
> > Yep.
> 
> I _think_ that copying the data without checking the RTR bit clearly
> avoids a condition and might produce faster code on some machines.
> In any case, it reads easier.
> I'm not sure how that interacts with caches etc etc.
> 
> On the other hand, giving unused data is a bad habit that may reveal
> security information on some places, so better avoid it.
> 
> Kurt

Hi Kurt,
thanks for your comment.

In my opinion, I really prefer NOT to copy such data if the RTR flag is set.

Regards
Andy
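
For reference, the behaviour preferred above could look roughly like the following userspace sketch. The structs frame_sketch and txbuf_sketch and the function fill_tx_sketch() are hypothetical stand-ins, not the driver's types; only CAN_RTR_FLAG uses the value from linux/can.h. The buffer is zeroed first so the example is deterministic, which a real MMIO path would not need.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAN_RTR_FLAG 0x40000000U   /* same value as in linux/can.h */
#define CAN_MAX_DLEN 8

/* Hypothetical stand-ins for struct can_frame and the device TX buffer. */
struct frame_sketch {
	uint32_t can_id;
	uint8_t  can_dlc;
	uint8_t  data[CAN_MAX_DLEN];
};

struct txbuf_sketch {
	uint32_t length;
	uint32_t data[2];
};

/* Fill the TX buffer; returns the payload bytes to add to tx_bytes. */
static unsigned int fill_tx_sketch(struct txbuf_sketch *buf,
				   const struct frame_sketch *cf)
{
	assert(buf && cf && cf->can_dlc <= CAN_MAX_DLEN);

	buf->length = cf->can_dlc;   /* an RTR frame still carries the requested length */
	buf->data[0] = 0;            /* zeroed only to keep this sketch deterministic */
	buf->data[1] = 0;

	if (cf->can_id & CAN_RTR_FLAG)
		return 0;            /* no payload: nothing copied, nothing exposed */

	memcpy(buf->data, cf->data, cf->can_dlc);   /* copy only can_dlc bytes */
	return cf->can_dlc;
}
```

This also covers the earlier review remark about checking can_dlc, since no more than can_dlc bytes reach the buffer.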

Re: [PATCH RESEND] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-08-08 Thread Andreas Werner

On Mon, Aug 08, 2016 at 02:28:39PM +0200, Wolfgang Grandegger wrote:
> Hello,
> 
> Am 08.08.2016 um 13:39 schrieb Andreas Werner:
> >On Mon, Aug 08, 2016 at 11:27:25AM +0200, Wolfgang Grandegger wrote:
> >>Hello Andreas,
> >>
> >>a first quick review
> >>
> >>Am 26.07.2016 um 11:16 schrieb Andreas Werner:
> >>>This CAN Controller is found on MEN Chameleon FPGAs.
> >>>
> >>>The driver/device supports the CAN2.0 specification.
> >>>There are 255 RX and 255 Tx buffer within the IP. The
> >>>pointer for the buffer are handled by HW to make the
> >>>access from within the driver as simple as possible.
> >>>
> >>>The driver also supports parameters to configure the
> >>>buffer level interrupt for RX/TX as well as a RX timeout
> >>>interrupt.
> >>>
> >>>With this configuration options, the driver/device
> >>>provides flexibility for different types of usecases.
> >>>
> >>>Signed-off-by: Andreas Werner 
> >>>---
> >>>drivers/net/can/Kconfig|  10 +
> >>>drivers/net/can/Makefile   |   1 +
> >>>drivers/net/can/men_z192_can.c | 989 
> >>>+
> >>>3 files changed, 1000 insertions(+)
> >>>create mode 100644 drivers/net/can/men_z192_can.c
> 
> ---snip---
> 
> >>>+/* Buffer level control values */
> >>>+#define MEN_Z192_MIN_BUF_LVL  0
> >>>+#define MEN_Z192_MAX_BUF_LVL  254
> >>>+#define MEN_Z192_RX_BUF_LVL_DEF   5
> >>>+#define MEN_Z192_TX_BUF_LVL_DEF   5
> >>>+#define MEN_Z192_RX_TOUT_MIN  0
> >>>+#define MEN_Z192_RX_TOUT_MAX  65535
> >>>+#define MEN_Z192_RX_TOUT_DEF  1000
> >>>+
> >>>+static int txlvl = MEN_Z192_TX_BUF_LVL_DEF;
> >>>+module_param(txlvl, int, S_IRUGO);
> >>>+MODULE_PARM_DESC(txlvl, "TX IRQ trigger level (in frames) 0-254, default="
> >>>+   __MODULE_STRING(MEN_Z192_TX_BUF_LVL_DEF) ")");
> >>>+
> >>>+static int rxlvl = MEN_Z192_RX_BUF_LVL_DEF;
> >>>+module_param(rxlvl, int, S_IRUGO);
> >>>+MODULE_PARM_DESC(rxlvl, "RX IRQ trigger level (in frames) 0-254, default="
> >>>+   __MODULE_STRING(MEN_Z192_RX_BUF_LVL_DEF) ")");
> >>>+
> >>
> >>What impact does the level have on the latency? Could you please add some
> >>comments.
> >
> >It has an impact on the latency.
> >rxlvl = 0 -> if one frame is received, an IRQ will be generated
> >rxlvl = 254 -> if 255 frames are received, an IRQ will be generated
> 
> Well, what's your usecase for rxlvl > 0? For me it's not obvious what it can
> be good for. The application usually wants the message as soon as possible.
> Anyway, the default should be *0*. For RX and TX.
> 

The HW provides this feature and the driver should be able to control it.
It was developed to control the IRQ load (like NAPI) by defining the buffer level
at which the IRQ is asserted.

I agree with you to set the default to "0", which is the main use case.

> >>>+static int rx_timeout = MEN_Z192_RX_TOUT_DEF;
> >>>+module_param(rx_timeout, int, S_IRUGO);
> >>>+MODULE_PARM_DESC(rx_timeout, "RX IRQ timeout (in 100usec steps), default="
> >>>+   __MODULE_STRING(MEN_Z192_RX_TOUT_DEF) ")");
> >>
> >>Ditto. What is "rx_timeout" good for.
> >>
> >
> >The rx timeout is used in combination with the rxlvl to assert the
> >RX IRQ if the buffer level is not reached within this timeout.
> 
> What event will the application receive in case of a timeout.
> 

It's just to control the time when the RX IRQ will be asserted if the buffer
level is not reached.
Imagine that rx_timeout did not exist, rxlvl is set to 50 and
only 30 packets are received. The RX IRQ would never be asserted.

By defining rx_timeout, we can control the time when the RX IRQ is asserted
if the buffer level is not reached.

The application does not receive any special signal, it's just the RX IRQ.
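
To summarise the interaction of the two parameters, here is a small model of the interrupt condition as I understand it from this thread. The struct and function names are made up for illustration, and measuring elapsed_steps from the arrival of the first unread frame is an assumption; the reference point of the hardware timer is not stated here.

```c
#include <assert.h>

/* Hypothetical model of the interrupt condition described above; the
 * names are illustrative and do not come from the driver. */
struct rx_irq_cfg_sketch {
	unsigned int rxlvl;        /* 0-254: IRQ once more than rxlvl frames are buffered */
	unsigned int rx_timeout;   /* in 100 usec steps */
};

/* Returns 1 if the RX IRQ would be asserted. */
static int rx_irq_asserted_sketch(const struct rx_irq_cfg_sketch *cfg,
				  unsigned int frames_buffered,
				  unsigned int elapsed_steps)
{
	assert(cfg);

	if (frames_buffered > cfg->rxlvl)
		return 1;   /* buffer level reached */
	if (frames_buffered > 0 && elapsed_steps >= cfg->rx_timeout)
		return 1;   /* level not reached, but the timeout expired */
	return 0;
}
```

With rxlvl = 50 and 30 buffered frames, the function returns 0 until elapsed_steps reaches rx_timeout, which is the case the timeout exists for.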

> >Both the timeout and the level are used to give the user as much
> >control over the latency and the IRQ handling as possible.
> >With these two options, the driver can be configured for different
> >use cases.
> >
> >I will add this as a comment to make it clearer.
> 
> Even a bit more would be appreciated.
> 

Sure...

> 
> ---snip---
> 
> >>>+static int men_z192_read_frame(struct net_device *ndev, unsigned int 
> >>>frame_nr)
> >>>+{
> >>>+  struct net_device_stats *stats = &ndev->stats;
> >>>+  struct men_z192 *priv = netdev_priv(ndev);
> >>>+  struct men_z192_cf_buf __iomem *cf_buf;
> >>>+  struct can_frame *cf;
> >>>+  struct sk_buff *skb;
> >>>+  u32 cf_offset;
> >>>+  u32 length;
> >>>+  u32 data;
> >>>+  u32 id;
> >>>+
> >>>+  skb = alloc_can_skb(ndev, &cf);
> >>>+  if (unlikely(!skb)) {
> >>>+  stats->rx_dropped++;
> >>>+  return 0;
> >>>+  }
> >>>+
> >>>+  cf_offset = sizeof(struct men_z192_cf_buf) * frame_nr;
> >>>+
> >>>+  cf_buf = priv->dev_base + MEN_Z192_RX_BUF_START + cf_offset;
> >>>+  length = readl(&cf_buf->length) & MEN_Z192_CFBUF_LEN;
> >>>+  id = readl(&cf_buf->can_id);
> >>>+
> >>>+  if (id & MEN_Z192_CFBUF_IDE) {
> >>>+  /* Extended frame */
> >>>+  cf->can_id = (id &

Re: [PATCH net-next] cdc_ether: Improve ZTE MF823/831/910 handling

2016-08-08 Thread Oliver Neukum

On Mon, 2016-08-08 at 14:44 +0200, Bjørn Mork wrote:
> Oliver Neukum  writes:

> > I don't see how it would be specific for a subsystem. If the patch
> > is correct, it belongs into the networking core.
> 
> The bug is in the firmware implementation of the "read unique vendor
> assigned mac address" functions, and should therefore be fixed there.

Well, if you define the semantics of the operation in that manner, it
certainly does. That however is not given. You could also define it
as returning the MAC the hardware listens to. The difference was just
unclear when the API was defined.

> It cannot be put into the networking core because "read unique vendor
> assigned mac address" is a hardware dependent function.  It's
> implemented in each ethernet driver based of whatever interface the
> firmware provides to that register.

Again, that depends on that particular semantics.

> IMHO, usbnet_get_ethernet_addr() would be the most appropriate place for
> cdc_ether and other CDC drivers. And generic_rndis_bind() is the correct
> place for rndis_host.
> 
> Putting this in usbnet_get_ethernet_addr() will also fix the XMM7160
> based devices having an FF:FF:FF:FF:FF:FF mac address (sic).  I'm pretty
> sure there are other examples too.  There is no end to the creative
> craziness of firmware engineers...

It is clear that that would work. No question about that.

But why fix similar issues at two different places? And what about
PCI or other cards that show the same problem?

Regards
Oliver
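
Wherever the check ends up, the test itself is small. The following userspace sketch is modelled on what the kernel's is_valid_ether_addr() tests (not multicast, not all zero); the function names with a _sketch suffix are my own, and the fallback policy after a failed check is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_ALEN 6

/* All six octets zero? */
static int mac_is_zero_sketch(const uint8_t *addr)
{
	int i;

	for (i = 0; i < ETH_ALEN; i++)
		if (addr[i])
			return 0;
	return 1;
}

/* The multicast (group) bit is the lowest bit of the first octet;
 * FF:FF:FF:FF:FF:FF has it set as well. */
static int mac_is_multicast_sketch(const uint8_t *addr)
{
	return addr[0] & 0x01;
}

/* Returns 1 if the address could serve as a station address. */
static int mac_is_usable_sketch(const uint8_t *addr)
{
	assert(addr);
	return !mac_is_multicast_sketch(addr) && !mac_is_zero_sketch(addr);
}
```

A device reporting FF:FF:FF:FF:FF:FF fails this check, so the caller could then fall back to another address source.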

[PATCH v1 1/1] net: phy: Add edge-rate, mac-if, read, write func to Microsemi PHYs.

2016-08-08 Thread Nagaraju Lakkaraju


Hello,

As the 2nd patch, this adds edge-rate control, MAC interface, and read and write
driver functions for Microsemi PHYs.

Please review and send your comments.

Thanks,
Raju.

>From 6303576768b5c5dcc0e35fb46c525337a3845557 Mon Sep 17 00:00:00 2001
From: Nagaraju Lakkaraju 
Date: Mon, 8 Aug 2016 18:51:36 +0530
Subject: [PATCH v1 1/1]  net: phy: Add edge-rate, mac-if, read, write func to
 Microsemi PHYs.

Signed-off-by: Nagaraju Lakkaraju 
---
 drivers/net/phy/mscc.c | 234 +++--
 drivers/net/phy/mscc_reg.h | 135 ++
 include/linux/mscc.h   |  45 +
 include/linux/phy.h|   2 +
 4 files changed, 387 insertions(+), 29 deletions(-)
 mode change 100644 => 100755 drivers/net/phy/mscc.c
 create mode 100644 drivers/net/phy/mscc_reg.h
 create mode 100644 include/linux/mscc.h
 mode change 100644 => 100755 include/linux/phy.h

diff --git a/drivers/net/phy/mscc.c b/drivers/net/phy/mscc.c
old mode 100644
new mode 100755
index 49c7506..af7a441
--- a/drivers/net/phy/mscc.c
+++ b/drivers/net/phy/mscc.c
@@ -11,34 +11,9 @@
 #include 
 #include 
 #include 
+#include 

-enum rgmii_rx_clock_delay {
-   RGMII_RX_CLK_DELAY_0_2_NS = 0,
-   RGMII_RX_CLK_DELAY_0_8_NS = 1,
-   RGMII_RX_CLK_DELAY_1_1_NS = 2,
-   RGMII_RX_CLK_DELAY_1_7_NS = 3,
-   RGMII_RX_CLK_DELAY_2_0_NS = 4,
-   RGMII_RX_CLK_DELAY_2_3_NS = 5,
-   RGMII_RX_CLK_DELAY_2_6_NS = 6,
-   RGMII_RX_CLK_DELAY_3_4_NS = 7
-};
-
-#define MII_VSC85XX_INT_MASK      25
-#define MII_VSC85XX_INT_MASK_MASK 0xa000
-#define MII_VSC85XX_INT_STATUS    26
-
-#define MSCC_EXT_PAGE_ACCESS      31
-#define MSCC_PHY_PAGE_STANDARD    0x0000 /* Standard registers */
-#define MSCC_PHY_PAGE_EXTENDED_2  0x0002 /* Extended reg - page 2 */
-
-/* Extended Page 2 Registers */
-#define MSCC_PHY_RGMII_CNTL       20
-#define RGMII_RX_CLK_DELAY_MASK   0x0070
-#define RGMII_RX_CLK_DELAY_POS    4
-
-/* Microsemi PHY ID's */
-#define PHY_ID_VSC8531            0x00070570
-#define PHY_ID_VSC8541            0x00070770
+#include "mscc_reg.h"

 static int vsc85xx_phy_page_set(struct phy_device *phydev, u8 page)
 {
@@ -84,7 +59,7 @@ static int vsc85xx_config_init(struct phy_device *phydev)

 static int vsc85xx_ack_interrupt(struct phy_device *phydev)
 {
-   int rc;
+   int rc = 0;

if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
rc = phy_read(phydev, MII_VSC85XX_INT_STATUS);
@@ -98,7 +73,7 @@ static int vsc85xx_config_intr(struct phy_device *phydev)

if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
rc = phy_write(phydev, MII_VSC85XX_INT_MASK,
-  MII_VSC85XX_INT_MASK_MASK);
+  MII_VSC85XX_INT_MASK_MASK);
} else {
rc = phy_write(phydev, MII_VSC85XX_INT_MASK, 0);
if (rc < 0)
@@ -109,6 +84,203 @@ static int vsc85xx_config_intr(struct phy_device *phydev)
return rc;
 }

+static int vsc85xx_soft_reset(struct phy_device *phydev)
+{
+   int rc;
+   u16 reg_val;
+
+   reg_val = phy_read(phydev, MII_BMCR);
+   reg_val |= BMCR_RESET;
+   rc = phy_write(phydev, MII_BMCR, reg_val);
+
+   return rc;
+}
+
+static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev,
+ u8 *rate)
+{
+   int rc;
+   u16 reg_val;
+   u8  edge_rate = *rate;
+
+   mutex_lock(&phydev->lock);
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
+   if (rc != 0)
+   goto out_unlock;
+   reg_val = phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL);
+   reg_val &= ~(EDGE_RATE_CNTL_MASK);
+   reg_val |= (edge_rate << EDGE_RATE_CNTL_POS);
+   phy_write(phydev, MSCC_PHY_WOL_MAC_CONTROL, reg_val);
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
+
+out_unlock:
+   mutex_unlock(&phydev->lock);
+
+   return rc;
+}
+
+static int vsc85xx_edge_rate_cntl_get(struct phy_device *phydev,
+ u8 *rate)
+{
+   int rc;
+   u16 reg_val;
+
+   mutex_lock(&phydev->lock);
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_EXTENDED_2);
+   if (rc != 0)
+   goto out_unlock;
+   reg_val = phy_read(phydev, MSCC_PHY_WOL_MAC_CONTROL);
+   reg_val &= EDGE_RATE_CNTL_MASK;
+   *rate = reg_val >> EDGE_RATE_CNTL_POS;
+   rc = vsc85xx_phy_page_set(phydev, MSCC_PHY_PAGE_STANDARD);
+
+out_unlock:
+   mutex_unlock(&phydev->lock);
+
+   return rc;
+}
+
+static int vsc85xx_mac_if_set(struct phy_device *phydev,
+ phy_interface_t   *interface)

Re: [PATCH] sunrpc: Remove unnecessary variable

2016-08-08 Thread Anna Schumaker

On 08/08/2016 05:13 AM, Amitoj Kaur Chawla wrote:
> The variable `err` is not used anywhere and just returns the
> predefined value `0` at the end of the function. Hence, remove the
> variable and return 0 explicitly.

Makes sense to me.  Thanks!

Anna

> 
> Signed-off-by: Amitoj Kaur Chawla 
> ---
>  net/sunrpc/clnt.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index b7f2104..0a775fa 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -184,7 +184,6 @@ static int __rpc_clnt_handle_event(struct rpc_clnt *clnt, 
> unsigned long event,
>  struct super_block *sb)
>  {
>   struct dentry *dentry;
> - int err = 0;
>  
>   switch (event) {
>   case RPC_PIPEFS_MOUNT:
> @@ -201,7 +200,7 @@ static int __rpc_clnt_handle_event(struct rpc_clnt *clnt, 
> unsigned long event,
>   printk(KERN_ERR "%s: unknown event: %ld\n", __func__, event);
>   return -ENOTSUPP;
>   }
> - return err;
> + return 0;
>  }
>  
>  static int __rpc_pipefs_event(struct rpc_clnt *clnt, unsigned long event,
>

Re: [PATCH net] sctp: use event->chunk when it's valid

2016-08-08 Thread Neil Horman

On Sun, Aug 07, 2016 at 02:15:13PM +0800, Xin Long wrote:
> Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
> it's available") used event->chunk->head_skb to get the head_skb in
> sctp_ulpevent_set_owner().
> 
> But at that moment, the event->chunk was NULL, as it cloned the skb
> in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
> work.
> 
> This patch is to move the event->chunk initialization before calling
> sctp_ulpevent_receive_data() so that it uses event->chunk when it's
> valid.
> 
> Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's 
> available")
> Signed-off-by: Xin Long 
> ---
>  net/sctp/ulpevent.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index 1bc4f71..d85b803 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -702,14 +702,14 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct 
> sctp_association *asoc,
>*/
>   sctp_ulpevent_init(event, 0, skb->len + sizeof(struct sk_buff));
>  
> - sctp_ulpevent_receive_data(event, asoc);
> -
>   /* And hold the chunk as we need it for getting the IP headers
>* later in recvmsg
>*/
>   sctp_chunk_hold(chunk);
>   event->chunk = chunk;
>  
> + sctp_ulpevent_receive_data(event, asoc);
> +
>   event->stream = ntohs(chunk->subh.data_hdr->stream);
>   event->ssn = ntohs(chunk->subh.data_hdr->ssn);
>   event->ppid = chunk->subh.data_hdr->ppid;
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Acked-by: Neil Horman
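
The ordering problem described in the commit message can be sketched in userspace C. This is only an illustration: `demo_chunk`, `demo_event`, `demo_receive_data` and the two `demo_make_rcvmsg_*` functions are hypothetical stand-ins, not the SCTP code. The sketch assumes, as the commit message describes, that the accounting step can only reach the head skb through `event->chunk`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the chunk and the ulpevent. */
struct demo_chunk {
	int head_skb_owned;	/* 1 once the head skb was charged to the socket */
};

struct demo_event {
	struct demo_chunk *chunk;
};

/* Stand-in for the accounting step: it can only reach the head skb
 * through event->chunk, so a NULL chunk means the step is skipped.
 */
static void demo_receive_data(struct demo_event *event)
{
	assert(event != NULL);
	if (event->chunk)
		event->chunk->head_skb_owned = 1;
}

/* Order before the patch: the consumer runs while chunk is still NULL. */
static void demo_make_rcvmsg_old(struct demo_event *event,
				 struct demo_chunk *chunk)
{
	event->chunk = NULL;
	demo_receive_data(event);
	event->chunk = chunk;
}

/* Order after the patch: chunk is assigned first, then consumed. */
static void demo_make_rcvmsg_new(struct demo_event *event,
				 struct demo_chunk *chunk)
{
	event->chunk = chunk;
	demo_receive_data(event);
}
```

With the old order the head skb is never marked as owned; with the new order it is.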

Re: [PATCH RESEND] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-08-08 Thread Kurt Van Dijck


--- Original message ---
> Date: Mon, 8 Aug 2016 14:28:39 +0200
> From: Wolfgang Grandegger 
> 
[...]
> >>>+
> >>>+  if (!(cf->can_id & CAN_RTR_FLAG)) {
> >>>+  writel(data[0], _buf->data[0]);
> >>>+  writel(data[1], _buf->data[1]);
> >>
> >>Why do you not check cf->can_dlc here as well. And is the extra copy
> >>necessary.
> >>
> >
> >Yes, I agree with you. The extra copy could be also avoided.
> >
> >>>+
> >>>+  stats->tx_bytes += cf->can_dlc;
> >>>+  }
> >>
> >>If I look to other drivers, they write the data even in case of RTR.
> >>
> >
> >But why?
> >
> >A RTR does not have any data, therefore there is no need to write the data.
> >Only the length is required as the request size.
> 
> Yes; I'm wondering as well.
> 
> >
> >If there is a reason behind writing the data of a RTR frame, I can
> >change that, but for now there is no reason.
> 
> Yep.

I _think_ that copying the data without checking the RTR bit avoids a
conditional branch and might produce faster code on some machines.
In any case, it is easier to read.
I'm not sure how that interacts with caches etc.

On the other hand, passing along unused data is a bad habit that may leak
sensitive information in some places, so it is better avoided.

Kurt
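
The trade-off discussed above can be sketched in userspace C: copy the payload only for data frames, and only as many bytes as the DLC announces, so that stale buffer contents never reach the hardware. The names `demo_frame`, `DEMO_RTR_FLAG`, `DEMO_MAX_DLC` and `demo_tx_payload` are hypothetical and this is not the driver's code; the flag uses the same bit as CAN_RTR_FLAG.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RTR_FLAG 0x40000000U	/* same bit as CAN_RTR_FLAG */
#define DEMO_MAX_DLC  8

/* Hypothetical, simplified CAN frame. */
struct demo_frame {
	uint32_t can_id;
	uint8_t dlc;
	uint8_t data[DEMO_MAX_DLC];
};

/* Fill a zeroed image of the hardware data buffer and return the number
 * of payload bytes to account as transmitted.
 */
static unsigned int demo_tx_payload(const struct demo_frame *cf,
				    uint8_t hw[DEMO_MAX_DLC])
{
	unsigned int len;

	assert(cf != NULL && hw != NULL);

	/* Start from zeros so unused bytes never carry old data. */
	memset(hw, 0, DEMO_MAX_DLC);

	/* An RTR frame has no data field: nothing to copy or count. */
	if (cf->can_id & DEMO_RTR_FLAG)
		return 0;

	/* Copy only what the DLC announces, clamped to the buffer size. */
	len = cf->dlc > DEMO_MAX_DLC ? DEMO_MAX_DLC : cf->dlc;
	memcpy(hw, cf->data, len);
	return len;
}
```

This costs one extra branch compared with an unconditional copy, which is the price for not handing out unused bytes.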

RE: [RFC PATCH v4 1/2] Documentation: DT: net: Add Xilinx gmiitorgmii converter device tree binding documentation

2016-08-08 Thread Appana Durga Kedareswara Rao

Hi Michal,

> On 8.8.2016 09:15, Kedareswara rao Appana wrote:
> > Device-tree binding documentation for xilinx gmiitorgmii converter.
> >
> > Signed-off-by: Kedareswara rao Appana 
> > ---
> > Changes for v4:
> > --> Modified compatible as suggested by Rob.
> > --> Removed underscores from the converter node name as suggested by Rob.
> > Changes for v3:
> > --> None.
> > Changes for v2:
> > --> New patch.
> >
> >  .../devicetree/bindings/net/xilinx_gmii2rgmii.txt  | 38
> > ++
> >  1 file changed, 38 insertions(+)
> >  create mode 100644
> > Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
> >
> > diff --git
> > a/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
> > b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
> > new file mode 100644
> > index 000..453680d
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
> > @@ -0,0 +1,38 @@
> > +XILINX GMIITORGMII Converter Driver Device Tree Bindings
> > +
> > +
> > +The Gigabit Media Independent Interface (GMII) to Reduced Gigabit
> > +Media Independent Interface (RGMII) core provides the RGMII between
> > +RGMII-compliant Ethernet physical media devices (PHY) and the Gigabit
> Ethernet controller.
> > +This core can be used in all three modes of operation(10/100/1000 Mb/s).
> > +The Management Data Input/Output (MDIO) interface is used to
> > +configure the Speed of operation. This core can switch dynamically
> > +between the three Different speed modes by configuring the conveter
> register through mdio write.
> > +
> > +The MDIO is a bus to which the PHY devices are connected.  For each
> > +device that exists on this bus, a child node should be created.  See
> > +the definition of the PHY node in booting-without-of.txt for an
> > +example of how to define a PHY.
> > +
> > +This converter sits between the MAC and the external phy.
> > +MAC <==> GMII2RGMII <==> RGMII_PHY
> > +
> > +Required properties:
> > +  - compatible : Should be "xlnx,gmii-to-rgmii-1.0"
> > +  - reg : The ID number for the phy, usually a small integer
> > +  - phy-handle: Should point to the external phy device.
> > +   See ethernet.txt file in the same directory.
> > +
> > +Example:
> > +   mdio {
> > +#address-cells = <1>;
> > +#size-cells = <0>;
> > +   phy: ethernet-phy@0 {
> > +   ..
> > +   };
> > +gmiitorgmii: gmiitorgmii@8 {
> > +compatible = "xlnx,gmii-to-rgmii-1.0";
> > +reg = <8>;
> > +   phy-handle = <&phy>;
> > +};
> 
> Indentation in this example is quite weird. You are mixing tabs and spaces.

Sure, will fix in the next version.

Regards,
Kedar.

> 
> Thanks,
> Michal

Re: [RFC PATCH 1/3] net: Remove unnecessary memset in __snmp6_fill_stats64

2016-08-08 Thread hejianet


Yes, sorry about that, I was too hasty.

B.R.

Jia He

On 8/8/16 7:12 PM, Florian Westphal wrote:

Jia He  wrote:

buff[] will be assigned later, so memset is not necessary.

Signed-off-by: Jia He 
Cc: "David S. Miller" 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
---
  net/ipv6/addrconf.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ab3e796..43fa8d0 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4967,7 +4967,6 @@ static inline void __snmp6_fill_stats64(u64 *stats, void 
__percpu *mib,
  
  	BUG_ON(pad < 0);
  
-	memset(buff, 0, sizeof(buff));

buff[0] = IPSTATS_MIB_MAX;
  
  	for_each_possible_cpu(c) {

 for (i = 1; i < IPSTATS_MIB_MAX; i++)
 buff[i] += snmp_get_cpu_field64(mib, c, i, syncpoff);

Without the memset, the result of buff[i] += ... is undefined.
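
A small userspace sketch of why the zeroing matters when a buffer is accumulated with `+=`. The names `NFIELDS`, `NCPUS` and `fill_stats` are hypothetical stand-ins, not the kernel's; the only thing taken from the quoted loop is that slot 0 carries the field count and the remaining slots are summed across CPUs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFIELDS 4	/* hypothetical stand-in for IPSTATS_MIB_MAX */
#define NCPUS   2	/* hypothetical number of possible CPUs */

/* Sum per-CPU counters (a flat NCPUS x NFIELDS array) into out[].
 * out[0] carries the field count, as in the quoted code.
 */
static void fill_stats(uint64_t out[NFIELDS], const uint64_t *percpu)
{
	uint64_t buff[NFIELDS];
	int c, i;

	assert(out != NULL && percpu != NULL);

	/* buff[] is only ever updated with +=, so it must start at zero.
	 * Dropping this memset would make every sum depend on whatever
	 * happened to be on the stack.
	 */
	memset(buff, 0, sizeof(buff));
	buff[0] = NFIELDS;

	for (c = 0; c < NCPUS; c++)
		for (i = 1; i < NFIELDS; i++)
			buff[i] += percpu[c * NFIELDS + i];

	memcpy(out, buff, sizeof(buff));
}
```

The memset would only be redundant if every slot were written with a plain assignment before being read.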

Re: [PATCH] [net] rxrpc: fix uninitialized pointer dereference in debug code

2016-08-08 Thread David Howells

Arnd Bergmann  wrote:

> A newly added bugfix caused an uninitialized variable to be
> used for printing debug output. This is harmless as long
> as the debug setting is disabled, but otherwise leads to an
> immediate crash.
> 
> gcc warns about this when -Wmaybe-uninitialized is enabled:
> 
> net/rxrpc/call_object.c: In function 'rxrpc_release_call':
> net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in 
> this function [-Werror=maybe-uninitialized]
> 
> The initialization was removed but one of the users remains.
> This adds back the initialization.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: 372ee16386bb ("rxrpc: Fix races between skb free, ACK generation and 
> replying")

Applied, thanks.

David

Re: [PATCH net-next] cdc_ether: Improve ZTE MF823/831/910 handling

2016-08-08 Thread Bjørn Mork

Oliver Neukum  writes:
> On Mon, 2016-07-18 at 16:10 +0200, Kristian Evensen wrote:
>> On Mon, Jul 18, 2016 at 3:50 PM, Oliver Neukum  wrote:
>> > No, you misunderstand me. I don't want quirks if we can avoid it.
>> > But if you need to do this for rndis_host and cdc_ether and maybe other
>> > drivers you should not be touching drivers. This belongs into the core
>> > ethernet code. Your code is good, but you are putting it in the wrong
>> > places.
>> 
>> Ok, sounds good. So far, I have only seen the random MAC issue with
>> the three previously mentioned devices, but who knows how many else is
>> out there with the same error ... I don't think it should be in the
>> core ethernet code, at least not yet, but I agree it would make sense
>> to move it to for example usbnet_core(). If you agree, I can prepare a
>> patch for it.
>
> I don't see how it would be specific for a subsystem. If the patch
> is correct, it belongs into the networking core.

The bug is in the firmware implementation of the "read unique vendor
assigned mac address" functions, and should therefore be fixed there.
It cannot be put into the networking core because "read unique vendor
assigned mac address" is a hardware dependent function.  It's
implemented in each ethernet driver based on whatever interface the
firmware provides to that register.

IMHO, usbnet_get_ethernet_addr() would be the most appropriate place for
cdc_ether and other CDC drivers. And generic_rndis_bind() is the correct
place for rndis_host.

Putting this in usbnet_get_ethernet_addr() will also fix the XMM7160
based devices having an FF:FF:FF:FF:FF:FF mac address (sic).  I'm pretty
sure there are other examples too.  There is no end to the creative
craziness of firmware engineers...

An lsusb snippet example:

Interface Descriptor:
  bLength                 9
  bDescriptorType         4
  bInterfaceNumber        0
  bAlternateSetting       0
  bNumEndpoints           1
  bInterfaceClass         2 Communications
  bInterfaceSubClass     13
  bInterfaceProtocol      0
  iInterface              5 Sierra Wireless EM7345 4G LTE (NCM)
  CDC Header:
    bcdCDC               1.20
  CDC Union:
    bMasterInterface        0
    bSlaveInterface         1
  CDC NCM:
    bcdNcmVersion        1.00
    bmNetworkCapabilities 0x00
  CDC Ethernet:
    iMacAddress                      6
    bmEthernetStatistics    0x
    wMaxSegmentSize               1514
    wNumberMCFilters            0x
    bNumberPowerFilters              0

FWIW, putting the fix in usbnet_get_ethernet_addr() will not be a
problem for qmi_wwan.  It will further fix up the resulting random
address if required.

Bjørn
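
One possible form of the check suggested above for usbnet_get_ethernet_addr(), sketched in userspace C. The helpers `demo_mac_is_valid` and `demo_fixup_mac` are hypothetical; in the kernel, is_valid_ether_addr() and eth_hw_addr_random() cover the same ground. The sketch assumes the caller supplies the random bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* A usable unicast address is neither all-zero nor multicast; the
 * multicast test also rejects the broadcast address ff:ff:ff:ff:ff:ff.
 */
static bool demo_mac_is_valid(const uint8_t mac[DEMO_ETH_ALEN])
{
	static const uint8_t zero[DEMO_ETH_ALEN];

	assert(mac != NULL);
	if (memcmp(mac, zero, DEMO_ETH_ALEN) == 0)
		return false;
	return (mac[0] & 0x01) == 0;
}

/* Keep the firmware-provided address if it is usable; otherwise build
 * a locally administered unicast address from the given random bytes.
 */
static void demo_fixup_mac(uint8_t mac[DEMO_ETH_ALEN],
			   const uint8_t rnd[DEMO_ETH_ALEN])
{
	if (demo_mac_is_valid(mac))
		return;

	memcpy(mac, rnd, DEMO_ETH_ALEN);
	mac[0] &= 0xfe;	/* clear the multicast bit */
	mac[0] |= 0x02;	/* set the locally administered bit */
}
```

A driver that later replaces the address again, as described for qmi_wwan, would not be disturbed by such a fallback.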

Re: [PATCH RESEND] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-08-08 Thread Wolfgang Grandegger


Hello,

Am 08.08.2016 um 13:39 schrieb Andreas Werner:

On Mon, Aug 08, 2016 at 11:27:25AM +0200, Wolfgang Grandegger wrote:

Hello Andreas,

a first quick review

Am 26.07.2016 um 11:16 schrieb Andreas Werner:

This CAN Controller is found on MEN Chameleon FPGAs.

The driver/device supports the CAN2.0 specification.
There are 255 RX and 255 TX buffers within the IP. The
buffer pointers are handled by HW to make the
access from within the driver as simple as possible.

The driver also supports parameters to configure the
buffer level interrupt for RX/TX as well as a RX timeout
interrupt.

With these configuration options, the driver/device
provides flexibility for different types of use cases.

Signed-off-by: Andreas Werner 
---
drivers/net/can/Kconfig|  10 +
drivers/net/can/Makefile   |   1 +
drivers/net/can/men_z192_can.c | 989 +
3 files changed, 1000 insertions(+)
create mode 100644 drivers/net/can/men_z192_can.c


---snip---


+/* Buffer level control values */
+#define MEN_Z192_MIN_BUF_LVL   0
+#define MEN_Z192_MAX_BUF_LVL   254
+#define MEN_Z192_RX_BUF_LVL_DEF 5
+#define MEN_Z192_TX_BUF_LVL_DEF 5
+#define MEN_Z192_RX_TOUT_MIN   0
+#define MEN_Z192_RX_TOUT_MAX   65535
+#define MEN_Z192_RX_TOUT_DEF   1000
+
+static int txlvl = MEN_Z192_TX_BUF_LVL_DEF;
+module_param(txlvl, int, S_IRUGO);
+MODULE_PARM_DESC(txlvl, "TX IRQ trigger level (in frames) 0-254, default="
+__MODULE_STRING(MEN_Z192_TX_BUF_LVL_DEF) ")");
+
+static int rxlvl = MEN_Z192_RX_BUF_LVL_DEF;
+module_param(rxlvl, int, S_IRUGO);
+MODULE_PARM_DESC(rxlvl, "RX IRQ trigger level (in frames) 0-254, default="
+__MODULE_STRING(MEN_Z192_RX_BUF_LVL_DEF) ")");
+


What impact does the level have on the latency? Could you please add some
comments.


It has an impact on the latency.
rxlvl = 0 -> an IRQ is generated as soon as one frame has been received
rxlvl = 254 -> an IRQ is generated once 255 frames have been received


Well, what's your usecase for rxlvl > 0? For me it's not obvious what it 
can be good for. The application usually wants the message as soon as 
possible. Anyway, the default should be *0*. For RX and TX.



+static int rx_timeout = MEN_Z192_RX_TOUT_DEF;
+module_param(rx_timeout, int, S_IRUGO);
+MODULE_PARM_DESC(rx_timeout, "RX IRQ timeout (in 100usec steps), default="
+__MODULE_STRING(MEN_Z192_RX_TOUT_DEF) ")");


Ditto. What is "rx_timeout" good for.



The rx timeout is used in combination with rxlvl to assert the IRQ
if the buffer level is not reached within this timeout.


What event will the application receive in case of a timeout.


Both, the timeout and the level are used to give the user as much
control over the latency and the IRQ handling as possible.
With these two options, the driver can be configured for different
use cases.


I will add this as a comment to make it clearer.


Even a bit more would be appreciated.
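
The level and timeout semantics described in this exchange can be modelled roughly as follows. This is a sketch under assumptions, not the hardware's documented behaviour: `demo_rx_irq_pending` is a hypothetical name, the timeout is counted in the same 100 usec steps as the module parameter, and the timeout is assumed to fire only when at least one frame is waiting.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_BUF_LVL 254	/* mirrors MEN_Z192_MAX_BUF_LVL */

/* frames:        frames currently waiting in the RX buffer
 * lvl:           configured rxlvl (0..254)
 * elapsed_steps: time since the oldest waiting frame, in 100 usec steps
 * timeout_steps: configured rx_timeout, in 100 usec steps
 */
static bool demo_rx_irq_pending(unsigned int frames, unsigned int lvl,
				unsigned int elapsed_steps,
				unsigned int timeout_steps)
{
	assert(lvl <= DEMO_MAX_BUF_LVL);

	/* Level trigger: lvl = 0 fires on the first frame,
	 * lvl = 254 fires on the 255th.
	 */
	if (frames > lvl)
		return true;

	/* Timeout trigger (assumption): frames are waiting, but the
	 * level was not reached within the timeout.
	 */
	return frames > 0 && elapsed_steps >= timeout_steps;
}
```

With lvl = 0 the timeout branch is never needed, which matches the suggestion to default both levels to 0 when low latency is the goal.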


---snip---


+static int men_z192_read_frame(struct net_device *ndev, unsigned int frame_nr)
+{
+   struct net_device_stats *stats = &ndev->stats;
+   struct men_z192 *priv = netdev_priv(ndev);
+   struct men_z192_cf_buf __iomem *cf_buf;
+   struct can_frame *cf;
+   struct sk_buff *skb;
+   u32 cf_offset;
+   u32 length;
+   u32 data;
+   u32 id;
+
+   skb = alloc_can_skb(ndev, &cf);
+   if (unlikely(!skb)) {
+   stats->rx_dropped++;
+   return 0;
+   }
+
+   cf_offset = sizeof(struct men_z192_cf_buf) * frame_nr;
+
+   cf_buf = priv->dev_base + MEN_Z192_RX_BUF_START + cf_offset;
+   length = readl(&cf_buf->length) & MEN_Z192_CFBUF_LEN;
+   id = readl(&cf_buf->can_id);
+
+   if (id & MEN_Z192_CFBUF_IDE) {
+   /* Extended frame */
+   cf->can_id = (id & MEN_Z192_CFBUF_ID1) >> 3;
+   cf->can_id |= (id & MEN_Z192_CFBUF_ID2) >>
+   MEN_Z192_CFBUF_ID2_SHIFT;
+
+   cf->can_id |= CAN_EFF_FLAG;
+
+   if (id & MEN_Z192_CFBUF_E_RTR)
+   cf->can_id |= CAN_RTR_FLAG;
+   } else {
+   /* Standard frame */
+   cf->can_id = (id & MEN_Z192_CFBUF_ID1) >>
+   MEN_Z192_CFBUF_ID1_SHIFT;
+
+   if (id & MEN_Z192_CFBUF_S_RTR)
+   cf->can_id |= CAN_RTR_FLAG;
+   }
+
+   cf->can_dlc = get_can_dlc(length);
+
+   /* remote transmission request frame
+* contains no data field even if the
+* data length is set to a value > 0
+*/
+   if (!(cf->can_id & CAN_RTR_FLAG)) {
+   if (cf->can_dlc > 0) {
+   data = readl(&cf_buf->data[0]);
+   *(__be32 *)cf->data = cpu_to_be32(data);


Do you really need the extra copy?


+   }
+   if (cf->can_dlc > 4) {
+   data = readl(&cf_buf->data[1]);
+

Re: [PATCH net] sctp: use event->chunk when it's valid

2016-08-08 Thread Marcelo Ricardo Leitner

On Sun, Aug 07, 2016 at 02:15:13PM +0800, Xin Long wrote:
> Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
> it's available") used event->chunk->head_skb to get the head_skb in
> sctp_ulpevent_set_owner().
> 
> But at that moment, the event->chunk was NULL, as it cloned the skb
> in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
> work.
> 
> This patch is to move the event->chunk initialization before calling
> sctp_ulpevent_receive_data() so that it uses event->chunk when it's
> valid.
> 
> Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's 
> available")
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/ulpevent.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index 1bc4f71..d85b803 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -702,14 +702,14 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct 
> sctp_association *asoc,
>*/
>   sctp_ulpevent_init(event, 0, skb->len + sizeof(struct sk_buff));
>  
> - sctp_ulpevent_receive_data(event, asoc);
> -
>   /* And hold the chunk as we need it for getting the IP headers
>* later in recvmsg
>*/
>   sctp_chunk_hold(chunk);
>   event->chunk = chunk;
>  
> + sctp_ulpevent_receive_data(event, asoc);
> +
>   event->stream = ntohs(chunk->subh.data_hdr->stream);
>   event->ssn = ntohs(chunk->subh.data_hdr->ssn);
>   event->ppid = chunk->subh.data_hdr->ppid;
> -- 
> 2.1.0
> 
