Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device

2016-11-07 Thread Jesper Dangaard Brouer

On Mon, 7 Nov 2016 22:14:37 -0800 Maciej Żenczykowski wrote:

> Just FYI:
> 
> I'm tangentially aware of internal Google code that:
> - expects a bonding device running HTB with non-zero txqueuelen
> - wants to remove HTB and get a noqueue interface (the normal default
> for bonding)
> 
> The code currently removes HTB, which gets us to mq, sets txqueuelen
> to 0, adds a pfifo, removes the pfifo, which gets us to noqueue.

This clearly shows that the older userspace interface, of tx_queue_len
having double meaning, was a mess!

> After this patch this would ?possibly? break (adding pfifo, would
> change txqueuelen, so when we remove it we wouldn't end up with
> noqueue).

No, you will still end up with "noqueue".  It is now the flag
IFF_NO_QUEUE that determines whether a device gets "noqueue" when the
default qdisc is attached. The tx_queue_len no longer has any effect on
getting "noqueue".  The IFF_NO_QUEUE system removed this double meaning
of tx_queue_len.


> From what I fuzzily recall, HTB with txqueuelen == 0 drops traffic
> hard, while pfifo continues to function, hence the ordering...
> 
> Obviously our code can be fixed, but I'm worried there's a more
> generic backwards compatibility problem here.

It is good you bring it up, but I don't see a backwards compatibility
problem with your usage after the patchset.
 
> (note: this is mostly about 3.11 and 4.3 and might no longer be
> relevant with 4.10... maybe the new kernel's default qdisc selection
> logic doesn't depend on txqueuelen and checks the flag instead???)

If I were you, I would now implement a validation check that reports
a problem if the device does not end up in the expected "noqueue" state.
Then, when you eventually upgrade to a more recent kernel, you will be
alerted to any improper state.

Something like:

noqueue=$(ip link show dev "$DEV" 2> /dev/null | grep -q "noqueue" && echo "noqueue" || echo "bad")
if [[ "$noqueue" != "noqueue" ]]; then
    echo "report-problem"
fi

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH v3] xen-netback: prefer xenbus_scanf() over xenbus_gather()

2016-11-07 Thread Jan Beulich
For single items being collected this should be preferred as being more
typesafe (as the compiler can check format string and to-be-written-to
variable match) and more efficient (requiring one less parameter to be
passed).

Signed-off-by: Jan Beulich 
---
v3: For consistency with other code don't consider zero an error
(utilizing that xenbus_scanf() at present won't return zero).
v2: Avoid commit message to continue from subject.
---
 drivers/net/xen-netback/xenbus.c |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

--- 4.9-rc4/drivers/net/xen-netback/xenbus.c
+++ 4.9-rc4-xen-netback-prefer-xenbus_scanf/drivers/net/xen-netback/xenbus.c
@@ -889,16 +889,16 @@ static int connect_ctrl_ring(struct back
unsigned int evtchn;
int err;
 
-   err = xenbus_gather(XBT_NIL, dev->otherend,
-   "ctrl-ring-ref", "%u", &val, NULL);
-   if (err)
+   err = xenbus_scanf(XBT_NIL, dev->otherend,
+  "ctrl-ring-ref", "%u", &val);
+   if (err < 0)
goto done; /* The frontend does not have a control ring */
 
ring_ref = val;
 
-   err = xenbus_gather(XBT_NIL, dev->otherend,
-   "event-channel-ctrl", "%u", &val, NULL);
-   if (err) {
+   err = xenbus_scanf(XBT_NIL, dev->otherend,
+  "event-channel-ctrl", "%u", &val);
+   if (err < 0) {
xenbus_dev_fatal(dev, err,
 "reading %s/event-channel-ctrl",
 dev->otherend);





Re: [PATCH] usbnet: prevent device rpm suspend in usbnet_probe function

2016-11-07 Thread Kai-Heng Feng
Hi,

On Mon, Nov 7, 2016 at 7:02 PM, Oliver Neukum  wrote:
> On Fri, 2016-11-04 at 17:57 +0800, Kai-Heng Feng wrote:
>> Sometimes cdc_mbim failed to probe if runtime pm is enabled:
>> [9.305626] cdc_mbim: probe of 2-2:1.12 failed with error -22
>>
>> This can be solved by increasing its pm usage counter.
>>
>> Signed-off-by: Kai-Heng Feng 
>
> For the record:
>
> NAK. This fixes a symptom. If this patch helps something is broken in
> device core. We need to find that.
>

Please check attached dmesg with usbcore.dyndbg="+p".

Thanks!

> Regards
> Oliver
>
>


dmesg
Description: Binary data


RE: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-07 Thread Madalin-Cristian Bucur
> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, November 07, 2016 5:55 PM
> 
> From: Madalin-Cristian Bucur 
> Date: Mon, 7 Nov 2016 15:43:26 +
> 
> >> From: David Miller [mailto:da...@davemloft.net]
> >> Sent: Thursday, November 03, 2016 9:58 PM
> >>
> >> Why?  By clearing this, you disallow an important fundamental way to do
> >> performance testing, via pktgen.
> >
> > The Tx path in DPAA requires one to insert a back-pointer to the skb into
> > the Tx buffer. On the Tx confirmation path the back-pointer in the buffer
> > is used to release the skb. If Tx buffer is shared we'd alter the
> > back-pointer and leak/double free skbs. See also
> 
> Then have your software state store an array of SKB pointers, one for each
> TX ring entry, just like every other driver does.

There is no Tx ring in DPAA. Frames are sent out on QMan HW queues towards
the FMan for Tx and then received back on Tx confirmation queues for cleanup.
Array traversal would for sure cost more than using the back-pointer. Also,
we can now process confirmations on a different core than the one doing Tx;
with arrays we'd have to keep them percpu and force the Tx conf onto the same
core. Or add locks.

Madalin


[PATCH] igb: use igb_adapter->io_addr instead of e1000_hw->hw_addr

2016-11-07 Thread Cao jin
When running as a guest, under certain conditions, the kernel oopses as shown
below. writel() in igb_configure_tx_ring() results in an oops because
hw->hw_addr is NULL. Other register accesses don't oops the kernel because
they use wr32/rd32, which guard against a NULL pointer.

[  141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal)
error received: id=0101
[  141.225523] igb :01:00.1: PCIe Bus Error:
severity=Uncorrected (Fatal), type=Unaccessible,
id=0101(Unregistered Agent ID)
[  141.299442] igb :01:00.1: broadcast error_detected message
[  141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now
detached
[  141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now
detached
[  143.465904] pcieport :00:1c.0: Root Port link has been reset
[  143.465994] igb :01:00.1: broadcast slot_reset message
[  143.466039] igb :01:00.0: enabling device ( -> 0002)
[  144.389078] igb :01:00.1: enabling device ( -> 0002)
[  145.312078] igb :01:00.1: broadcast resume message
[  145.322211] BUG: unable to handle kernel paging request at
3818
[  145.361275] IP: []
igb_configure_tx_ring+0x14d/0x280 [igb]
[  145.400048] PGD 0
[  145.438007] Oops: 0002 [#1] SMP

A similar issue and solution can be found at:
http://patchwork.ozlabs.org/patch/689592/

Signed-off-by: Cao jin 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index edc9a6a..3f240ac 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3390,7 +3390,7 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
 tdba & 0xULL);
wr32(E1000_TDBAH(reg_idx), tdba >> 32);
 
-   ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
+   ring->tail = adapter->io_addr + E1000_TDT(reg_idx);
wr32(E1000_TDH(reg_idx), 0);
writel(0, ring->tail);
 
@@ -3729,7 +3729,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 ring->count * sizeof(union e1000_adv_rx_desc));
 
/* initialize head and tail */
-   ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
+   ring->tail = adapter->io_addr + E1000_RDT(reg_idx);
wr32(E1000_RDH(reg_idx), 0);
writel(0, ring->tail);
 
-- 
2.1.0





[PATCH v2 5/6] qedi: Add support for iSCSI session management.

2016-11-07 Thread Manish Rangankar
This patch adds support for iscsi_transport LLD Login,
Logout, NOP-IN/NOP-OUT, Async, Reject PDU processing
and firmware async event handling.

Signed-off-by: Nilesh Javali 
Signed-off-by: Adheer Chandravanshi 
Signed-off-by: Chad Dupuis 
Signed-off-by: Saurav Kashyap 
Signed-off-by: Arun Easi 
Signed-off-by: Manish Rangankar 
---
 drivers/scsi/qedi/qedi_fw.c| 1106 +++
 drivers/scsi/qedi/qedi_gbl.h   |   67 ++
 drivers/scsi/qedi/qedi_iscsi.c | 1611 
 drivers/scsi/qedi/qedi_iscsi.h |  232 ++
 drivers/scsi/qedi/qedi_main.c  |  166 +
 5 files changed, 3182 insertions(+)
 create mode 100644 drivers/scsi/qedi/qedi_fw.c
 create mode 100644 drivers/scsi/qedi/qedi_gbl.h
 create mode 100644 drivers/scsi/qedi/qedi_iscsi.c
 create mode 100644 drivers/scsi/qedi/qedi_iscsi.h

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
new file mode 100644
index 000..5ee62a2
--- /dev/null
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -0,0 +1,1106 @@
+/*
+ * QLogic iSCSI Offload Driver
+ * Copyright (c) 2016 Cavium Inc.
+ *
+ * This software is available under the terms of the GNU General Public License
+ * (GPL) Version 2, available from the file COPYING in the main directory of
+ * this source tree.
+ */
+
+#include 
+#include 
+#include 
+
+#include "qedi.h"
+#include "qedi_iscsi.h"
+#include "qedi_gbl.h"
+
+static int qedi_send_iscsi_tmf(struct qedi_conn *qedi_conn,
+  struct iscsi_task *mtask);
+
+void qedi_iscsi_unmap_sg_list(struct qedi_cmd *cmd)
+{
+   struct scsi_cmnd *sc = cmd->scsi_cmd;
+
+   if (cmd->io_tbl.sge_valid && sc) {
+   cmd->io_tbl.sge_valid = 0;
+   scsi_dma_unmap(sc);
+   }
+}
+
+static void qedi_process_logout_resp(struct qedi_ctx *qedi,
+union iscsi_cqe *cqe,
+struct iscsi_task *task,
+struct qedi_conn *qedi_conn)
+{
+   struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
+   struct iscsi_logout_rsp *resp_hdr;
+   struct iscsi_session *session = conn->session;
+   struct iscsi_logout_response_hdr *cqe_logout_response;
+   struct qedi_cmd *cmd;
+
+   cmd = (struct qedi_cmd *)task->dd_data;
+   cqe_logout_response = &cqe->cqe_common.iscsi_hdr.logout_response;
+   spin_lock(&session->back_lock);
+   resp_hdr = (struct iscsi_logout_rsp *)&qedi_conn->gen_pdu.resp_hdr;
+   memset(resp_hdr, 0, sizeof(struct iscsi_hdr));
+   resp_hdr->opcode = cqe_logout_response->opcode;
+   resp_hdr->flags = cqe_logout_response->flags;
+   resp_hdr->hlength = 0;
+
+   resp_hdr->itt = build_itt(cqe->cqe_solicited.itid, conn->session->age);
+   resp_hdr->statsn = cpu_to_be32(cqe_logout_response->stat_sn);
+   resp_hdr->exp_cmdsn = cpu_to_be32(cqe_logout_response->exp_cmd_sn);
+   resp_hdr->max_cmdsn = cpu_to_be32(cqe_logout_response->max_cmd_sn);
+
+   resp_hdr->t2wait = cpu_to_be32(cqe_logout_response->time2wait);
+   resp_hdr->t2retain = cpu_to_be32(cqe_logout_response->time2retain);
+
+   QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_TID,
+ "Freeing tid=0x%x for cid=0x%x\n",
+ cmd->task_id, qedi_conn->iscsi_conn_id);
+
+   if (likely(cmd->io_cmd_in_list)) {
+   cmd->io_cmd_in_list = false;
+   list_del_init(&cmd->io_cmd);
+   qedi_conn->active_cmd_count--;
+   } else {
+   QEDI_INFO(&qedi->dbg_ctx, QEDI_LOG_INFO,
+ "Active cmd list node already deleted, tid=0x%x, cid=0x%x, io_cmd_node=%p\n",
+ cmd->task_id, qedi_conn->iscsi_conn_id,
+ &cmd->io_cmd);
+   }
+
+   cmd->state = RESPONSE_RECEIVED;
+   qedi_clear_task_idx(qedi, cmd->task_id);
+   __iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr, NULL, 0);
+
+   spin_unlock(&session->back_lock);
+}
+
+static void qedi_process_text_resp(struct qedi_ctx *qedi,
+  union iscsi_cqe *cqe,
+  struct iscsi_task *task,
+  struct qedi_conn *qedi_conn)
+{
+   struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
+   struct iscsi_session *session = conn->session;
+   struct iscsi_task_context *task_ctx;
+   struct iscsi_text_rsp *resp_hdr_ptr;
+   struct iscsi_text_response_hdr *cqe_text_response;
+   struct qedi_cmd *cmd;
+   int pld_len;
+   u32 *tmp;
+
+   cmd = (struct qedi_cmd *)task->dd_data;
+   task_ctx = qedi_get_task_mem(&qedi->tasks, cmd->task_id);
+
+   cqe_text_response = &cqe->cqe_common.iscsi_hdr.text_response;
+   spin_lock(&session->back_lock);
+   resp_hdr_ptr =  (struct iscsi_text_rsp *)&qedi_conn->gen_pdu.resp_hdr;
+ 

[PATCH v2 6/6] qedi: Add support for data path.

2016-11-07 Thread Manish Rangankar
This patch adds support for data path and TMF handling.

Signed-off-by: Nilesh Javali 
Signed-off-by: Adheer Chandravanshi 
Signed-off-by: Chad Dupuis 
Signed-off-by: Saurav Kashyap 
Signed-off-by: Arun Easi 
Signed-off-by: Manish Rangankar 
---
 drivers/scsi/qedi/qedi_fw.c| 1272 
 drivers/scsi/qedi/qedi_gbl.h   |6 +
 drivers/scsi/qedi/qedi_iscsi.c |   13 +
 drivers/scsi/qedi/qedi_main.c  |4 +
 4 files changed, 1295 insertions(+)

diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index 5ee62a2..560c3e6 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -146,6 +146,114 @@ static void qedi_process_text_resp(struct qedi_ctx *qedi,
spin_unlock(&session->back_lock);
 }
 
+static void qedi_tmf_resp_work(struct work_struct *work)
+{
+   struct qedi_cmd *qedi_cmd =
+   container_of(work, struct qedi_cmd, tmf_work);
+   struct qedi_conn *qedi_conn = qedi_cmd->conn;
+   struct qedi_ctx *qedi = qedi_conn->qedi;
+   struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
+   struct iscsi_session *session = conn->session;
+   struct iscsi_tm_rsp *resp_hdr_ptr;
+   struct iscsi_cls_session *cls_sess;
+   int rval = 0;
+
+   set_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+   resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
+   cls_sess = iscsi_conn_to_session(qedi_conn->cls_conn);
+
+   iscsi_block_session(session->cls_session);
+   rval = qedi_cleanup_all_io(qedi, qedi_conn, qedi_cmd->task, true);
+   if (rval) {
+   clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+   qedi_clear_task_idx(qedi, qedi_cmd->task_id);
+   iscsi_unblock_session(session->cls_session);
+   return;
+   }
+
+   iscsi_unblock_session(session->cls_session);
+   qedi_clear_task_idx(qedi, qedi_cmd->task_id);
+
+   spin_lock(&session->back_lock);
+   __iscsi_complete_pdu(conn, (struct iscsi_hdr *)resp_hdr_ptr, NULL, 0);
+   spin_unlock(&session->back_lock);
+   kfree(resp_hdr_ptr);
+   clear_bit(QEDI_CONN_FW_CLEANUP, &qedi_conn->flags);
+}
+
+static void qedi_process_tmf_resp(struct qedi_ctx *qedi,
+ union iscsi_cqe *cqe,
+ struct iscsi_task *task,
+ struct qedi_conn *qedi_conn)
+
+{
+   struct iscsi_conn *conn = qedi_conn->cls_conn->dd_data;
+   struct iscsi_session *session = conn->session;
+   struct iscsi_tmf_response_hdr *cqe_tmp_response;
+   struct iscsi_tm_rsp *resp_hdr_ptr;
+   struct iscsi_tm *tmf_hdr;
+   struct qedi_cmd *qedi_cmd = NULL;
+   u32 *tmp;
+
+   cqe_tmp_response = &cqe->cqe_common.iscsi_hdr.tmf_response;
+
+   qedi_cmd = task->dd_data;
+   qedi_cmd->tmf_resp_buf = kzalloc(sizeof(*resp_hdr_ptr), GFP_KERNEL);
+   if (!qedi_cmd->tmf_resp_buf) {
+   QEDI_ERR(&qedi->dbg_ctx,
+"Failed to allocate resp buf, cid=0x%x\n",
+ qedi_conn->iscsi_conn_id);
+   return;
+   }
+
+   spin_lock(&session->back_lock);
+   resp_hdr_ptr =  (struct iscsi_tm_rsp *)qedi_cmd->tmf_resp_buf;
+   memset(resp_hdr_ptr, 0, sizeof(struct iscsi_tm_rsp));
+
+   /* Fill up the header */
+   resp_hdr_ptr->opcode = cqe_tmp_response->opcode;
+   resp_hdr_ptr->flags = cqe_tmp_response->hdr_flags;
+   resp_hdr_ptr->response = cqe_tmp_response->hdr_response;
+   resp_hdr_ptr->hlength = 0;
+
+   hton24(resp_hdr_ptr->dlength,
+  (cqe_tmp_response->hdr_second_dword &
+   ISCSI_TMF_RESPONSE_HDR_DATA_SEG_LEN_MASK));
+   tmp = (u32 *)resp_hdr_ptr->dlength;
+   resp_hdr_ptr->itt = build_itt(cqe->cqe_solicited.itid,
+ conn->session->age);
+   resp_hdr_ptr->statsn = cpu_to_be32(cqe_tmp_response->stat_sn);
+   resp_hdr_ptr->exp_cmdsn  = cpu_to_be32(cqe_tmp_response->exp_cmd_sn);
+   resp_hdr_ptr->max_cmdsn = cpu_to_be32(cqe_tmp_response->max_cmd_sn);
+
+   tmf_hdr = (struct iscsi_tm *)qedi_cmd->task->hdr;
+
+   if (likely(qedi_cmd->io_cmd_in_list)) {
+   qedi_cmd->io_cmd_in_list = false;
+   list_del_init(&qedi_cmd->io_cmd);
+   qedi_conn->active_cmd_count--;
+   }
+
+   if (((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
+ ISCSI_TM_FUNC_LOGICAL_UNIT_RESET) ||
+   ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
+ ISCSI_TM_FUNC_TARGET_WARM_RESET) ||
+   ((tmf_hdr->flags & ISCSI_FLAG_TM_FUNC_MASK) ==
+ ISCSI_TM_FUNC_TARGET_COLD_RESET)) {
+   INIT_WORK(&qedi_cmd->tmf_work, qedi_tmf_resp_work);
+   queue_work(qedi->tmf_thread, &qedi_cmd->tmf_work);
+   

RE: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl

2016-11-07 Thread Y.B. Lu
Hi Arnd,


> -----Original Message-----
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Tuesday, November 08, 2016 5:20 AM
> To: Y.B. Lu
> Cc: linuxppc-...@lists.ozlabs.org; linux-...@vger.kernel.org;
> ulf.hans...@linaro.org; Scott Wood; Mark Rutland; Greg Kroah-Hartman; X.B.
> Xie; M.H. Lian; linux-...@vger.kernel.org; linux-...@vger.kernel.org;
> Qiang Zhao; Russell King; Bhupesh Sharma; Joerg Roedel; Claudiu Manoil;
> devicet...@vger.kernel.org; Rob Herring; Santosh Shilimkar; linux-arm-
> ker...@lists.infradead.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; Leo Li; io...@lists.linux-foundation.org; Kumar
> Gala
> Subject: Re: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl
> 
> On Monday, October 31, 2016 9:35:33 AM CET Y.B. Lu wrote:
> > >
> > > I don't see any of the contents of this header referenced by the soc
> > > driver any more. I think you can just drop this patch.
> > >
> >
> > [Lu Yangbo-B47093] This header file was included by guts.c.
> > The guts driver used macro SVR_MAJ/SVR_MIN for calculation.
> >
> > This header file was for powerpc arch before. And this patch is to
> > make it a common header file for both ARM and PPC.
> > Sooner or later this is needed.
> 
> Let's discuss it once we actually need the header then, ok?

[Lu Yangbo-B47093] As I said, this header file is included by guts.c in patch 4.
The guts driver uses the SVR_MAJ/SVR_MIN macros, which are defined in this
header file, for its calculation.
Are you suggesting that we drop this patch and just calculate them in the driver?

Thanks :)


> 
>   Arnd


Re: [PATCH v4] Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

2016-11-07 Thread Greg KH
On Mon, Nov 07, 2016 at 04:44:20PM -0600, Chris Roth wrote:
> From: Allan Chou 
> 
> Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
> Bridge Controller (Vendor=04b4 ProdID=3610).
> 
> Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
> with the Kensington SD4600P USB-C Universal Dock with Power,
> which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
> Controller.
> 
> A similar patch was signed-off and tested-by Allan Chou on 2015-12-01.
> 
> Allan verified his similar patch on x86 Linux kernel 4.1.6 system
> with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.
> 
> Tested-by: Allan Chou 
> Tested-by: Chris Roth 
> Tested-by: Artjom Simon 
> 
> Signed-off-by: Allan Chou 
> Signed-off-by: Chris Roth 
> ---
>  drivers/net/usb/ax88179_178a.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
> index e6338c1..8a6675d 100644
> --- a/drivers/net/usb/ax88179_178a.c
> +++ b/drivers/net/usb/ax88179_178a.c
> @@ -1656,6 +1656,19 @@ static const struct driver_info ax88178a_info = {
>  .tx_fixup = ax88179_tx_fixup,
>  };
> 
> +static const struct driver_info cypress_GX3_info = {
> +.description = "Cypress GX3 SuperSpeed to Gigabit Ethernet Controller",
> +.bind = ax88179_bind,
> +.unbind = ax88179_unbind,
> +.status = ax88179_status,
> +.link_reset = ax88179_link_reset,
> +.reset = ax88179_reset,
> +.stop = ax88179_stop,
> +.flags = FLAG_ETHER | FLAG_FRAMING_AX,
> +.rx_fixup = ax88179_rx_fixup,
> +.tx_fixup = ax88179_tx_fixup,
> +};

Your tabs got eaten and converted to spaces, making this patch
impossible to apply :(

And you forgot to list what changed from v3, please put that below the
--- line.
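
One way to catch this kind of whitespace damage before sending is to look
for added lines whose indentation starts with spaces. The following is only
a rough sketch: the function name is made up for illustration, and
scripts/checkpatch.pl or "git apply --check" remain the more thorough checks.

```shell
# space_indented_additions FILE
# Print how many added lines ("+" lines) in a patch start their
# indentation with two or more spaces. Kernel code indents with tabs,
# so a non-zero count hints that a mailer mangled the patch.
# The "+++ b/file" header is not counted because it does not match.
space_indented_additions() {
    grep -c '^+  ' "$1" || true
}
```

Running it as "space_indented_additions my.patch" (my.patch being a
placeholder file name) prints the count; anything above zero is worth a
closer look before the patch goes out.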

thanks,

greg k-h


Re: linux-next: manual merge of the net-next tree with the net tree

2016-11-07 Thread Cong Wang
On Mon, Nov 7, 2016 at 5:25 PM, Stephen Rothwell  wrote:
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   net/netlink/genetlink.c
>
> between commit:
>
>   00ffc1ba02d8 ("genetlink: fix a memory leak on error path")
>
> from the net tree and commit:
>
>   2ae0f17df1cd ("genetlink: use idr to track families")
>
> from the net-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks good to me.

Thanks!


Re: [net-next PATCH 3/3] qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device

2016-11-07 Thread Maciej Żenczykowski
Just FYI:

I'm tangentially aware of internal Google code that:
- expects a bonding device running HTB with non-zero txqueuelen
- wants to remove HTB and get a noqueue interface (the normal default
for bonding)

The code currently removes HTB, which gets us to mq, sets txqueuelen
to 0, adds a pfifo, removes the pfifo, which gets us to noqueue.
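
For concreteness, that sequence corresponds roughly to the commands below.
This is only an illustrative sketch, not the internal code: the function
name, the DEV and RUN variables, and the default interface bond0 are all
assumptions made for the example.

```shell
# htb_to_noqueue: replay the four steps described above.
# DEV is the interface (default: a hypothetical bond0).
# RUN is a command prefix; it defaults to "echo", so the commands are
# only printed. Calling with RUN set to an empty string executes them
# for real (needs root).
htb_to_noqueue() {
    local dev="${DEV:-bond0}"
    local run="${RUN-echo}"
    $run tc qdisc del dev "$dev" root          # remove HTB, root falls back to mq
    $run ip link set dev "$dev" txqueuelen 0   # set txqueuelen to 0
    $run tc qdisc add dev "$dev" root pfifo    # add a pfifo
    $run tc qdisc del dev "$dev" root          # remove the pfifo again
}
```

By default this is a dry run that prints the four commands in order, which
makes it easy to compare the sequence against what a given kernel does.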

After this patch this would ?possibly? break (adding pfifo, would
change txqueuelen, so when we remove it we wouldn't end up with
noqueue).

From what I fuzzily recall, HTB with txqueuelen == 0 drops traffic
hard, while pfifo continues to function, hence the ordering...

Obviously our code can be fixed, but I'm worried there's a more
generic backwards compatibility problem here.

(note: this is mostly about 3.11 and 4.3 and might no longer be
relevant with 4.10... maybe the new kernel's default qdisc selection
logic doesn't depend on txqueuelen and checks the flag instead???)


Re: [Intel-wired-lan] [PATCH] igb: drop field "tail" of struct igb_ring

2016-11-07 Thread Cao jin



On 11/08/2016 12:12 PM, Alexander Duyck wrote:



On Monday, November 7, 2016, Cao jin wrote:






We removed head because it isn't really accessed very often; it is only
really used when the ring is configured.  Tail is accessed every
time we add a descriptor to a ring.  The pointer chasing from ring to
netdev to adapter to hw is expensive.  That is one of the reasons why
we cache the pointer to the tail register.


I see. I can submit the patch as you suggested.



Signed-off-by: Cao jin 
---
   drivers/net/ethernet/intel/igb/igb.h  |  1 -
   drivers/net/ethernet/intel/igb/igb_main.c | 16
+---




hw->hw_addr could be altered to NULL (in igb_rd32); this is why
writel oopses the kernel. You give a fine solution.

But from the oops message we can see that register reads return all
F's. I also have a question: when an igb device is
reset, would reading a register (whether config space or non-PCIe
configuration registers) during the reset return all F's? (I guess this
is the core of my issue.)


An all F's value means the read failed.  The device is likely off of the
bus and the hw_addr may not have been repopulated after the reset.

You might want to check the mailing list as I thought someone had
submitted a patch recently for one of the drivers to repopulate hw_addr
after a reset.



I guess you are saying this one:
  http://patchwork.ozlabs.org/patch/689592/

It seems they have an issue similar to mine.


--
Yours Sincerely,

Cao jin




Re: [PATCH] net: alteon: acenic: use new api ethtool_{get|set}_link_ksettings

2016-11-07 Thread Jes Sorensen
On 11/05/16 11:17, Philippe Reynes wrote:
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/net/ethernet/alteon/acenic.c |   65 ++---
>  1 files changed, 35 insertions(+), 30 deletions(-)

Nothing that sticks out to me

Acked-by: Jes Sorensen 

Jes


> diff --git a/drivers/net/ethernet/alteon/acenic.c 
> b/drivers/net/ethernet/alteon/acenic.c
> index a5c1e29..16f0c70 100644
> --- a/drivers/net/ethernet/alteon/acenic.c
> +++ b/drivers/net/ethernet/alteon/acenic.c
> @@ -429,14 +429,16 @@
>"acenic.c: v0.92 08/05/2002  Jes Sorensen, linux-ace...@sunsite.dk\n"
>"http://home.cern.ch/~jes/gige/acenic.html\n;;
>  
> -static int ace_get_settings(struct net_device *, struct ethtool_cmd *);
> -static int ace_set_settings(struct net_device *, struct ethtool_cmd *);
> +static int ace_get_link_ksettings(struct net_device *,
> +   struct ethtool_link_ksettings *);
> +static int ace_set_link_ksettings(struct net_device *,
> +   const struct ethtool_link_ksettings *);
>  static void ace_get_drvinfo(struct net_device *, struct ethtool_drvinfo *);
>  
>  static const struct ethtool_ops ace_ethtool_ops = {
> - .get_settings = ace_get_settings,
> - .set_settings = ace_set_settings,
>   .get_drvinfo = ace_get_drvinfo,
> + .get_link_ksettings = ace_get_link_ksettings,
> + .set_link_ksettings = ace_set_link_ksettings,
>  };
>  
>  static void ace_watchdog(struct net_device *dev);
> @@ -2579,43 +2581,44 @@ static int ace_change_mtu(struct net_device *dev, int 
> new_mtu)
>   return 0;
>  }
>  
> -static int ace_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
> +static int ace_get_link_ksettings(struct net_device *dev,
> +   struct ethtool_link_ksettings *cmd)
>  {
>   struct ace_private *ap = netdev_priv(dev);
>   struct ace_regs __iomem *regs = ap->regs;
>   u32 link;
> + u32 supported;
>  
> - memset(ecmd, 0, sizeof(struct ethtool_cmd));
> - ecmd->supported =
> - (SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
> -  SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full |
> -  SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full |
> -  SUPPORTED_Autoneg | SUPPORTED_FIBRE);
> + memset(cmd, 0, sizeof(struct ethtool_link_ksettings));
>  
> - ecmd->port = PORT_FIBRE;
> - ecmd->transceiver = XCVR_INTERNAL;
> + supported = (SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full |
> +  SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full |
> +  SUPPORTED_1000baseT_Half | SUPPORTED_1000baseT_Full |
> +  SUPPORTED_Autoneg | SUPPORTED_FIBRE);
> +
> + cmd->base.port = PORT_FIBRE;
>  
>   link = readl(&regs->GigLnkState);
> - if (link & LNK_1000MB)
> - ethtool_cmd_speed_set(ecmd, SPEED_1000);
> - else {
> + if (link & LNK_1000MB) {
> + cmd->base.speed = SPEED_1000;
> + } else {
>   link = readl(&regs->FastLnkState);
>   if (link & LNK_100MB)
> - ethtool_cmd_speed_set(ecmd, SPEED_100);
> + cmd->base.speed = SPEED_100;
>   else if (link & LNK_10MB)
> - ethtool_cmd_speed_set(ecmd, SPEED_10);
> + cmd->base.speed = SPEED_10;
>   else
> - ethtool_cmd_speed_set(ecmd, 0);
> + cmd->base.speed = 0;
>   }
>   if (link & LNK_FULL_DUPLEX)
> - ecmd->duplex = DUPLEX_FULL;
> + cmd->base.duplex = DUPLEX_FULL;
>   else
> - ecmd->duplex = DUPLEX_HALF;
> + cmd->base.duplex = DUPLEX_HALF;
>  
>   if (link & LNK_NEGOTIATE)
> - ecmd->autoneg = AUTONEG_ENABLE;
> + cmd->base.autoneg = AUTONEG_ENABLE;
>   else
> - ecmd->autoneg = AUTONEG_DISABLE;
> + cmd->base.autoneg = AUTONEG_DISABLE;
>  
>  #if 0
>   /*
> @@ -2626,13 +2629,15 @@ static int ace_get_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
>   ecmd->txcoal = readl(&regs->TuneTxCoalTicks);
>   ecmd->rxcoal = readl(&regs->TuneRxCoalTicks);
>  #endif
> - ecmd->maxtxpkt = readl(&regs->TuneMaxTxDesc);
> - ecmd->maxrxpkt = readl(&regs->TuneMaxRxDesc);
> +
> + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
> + supported);
>  
>   return 0;
>  }
>  
> -static int ace_set_settings(struct net_device *dev, struct ethtool_cmd *ecmd)
> +static int ace_set_link_ksettings(struct net_device *dev,
> +   const struct ethtool_link_ksettings *cmd)
>  {
>   struct ace_private *ap = netdev_priv(dev);
>   struct ace_regs __iomem *regs = 

net/sunrpc/clnt.c:2773 suspicious rcu_dereference_check() usage!

2016-11-07 Thread Ross Zwisler
I've got a virtual machine that has some NFS mounts, and with a newly compiled
kernel based on v4.9-rc3 I see the following warning/info message:

[   42.750181] ===
[   42.750192] [ INFO: suspicious RCU usage. ]
[   42.750203] 4.9.0-rc3-2-g7b6e7de #3 Not tainted
[   42.750213] ---
[   42.750225] net/sunrpc/clnt.c:2773 suspicious rcu_dereference_check() usage!
[   42.750235] 
[   42.750235] other info that might help us debug this:
[   42.750235] 
[   42.750246] 
[   42.750246] rcu_scheduler_active = 1, debug_locks = 0
[   42.750257] 1 lock held by mount.nfs4/6440:
[   42.750278]  #0: 
[   42.750299]  (
[   42.750319] &(>nfs_client_lock)->rlock
[   42.750340] ){+.+...}
[   42.750362] , at: 
[   42.750372] [] nfs_get_client+0x105/0x5e0
[   42.750383] 
[   42.750383] stack backtrace:
[   42.750394] CPU: 0 PID: 6440 Comm: mount.nfs4 Not tainted 
4.9.0-rc3-2-g7b6e7de #3
[   42.750406] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS 
PLYDCRB1.MBH.0096.D23.1608240105 08/24/2016
[   42.750429]  c992fa68 8150730f 88014ec8da40 
0001
[   42.750452]  c992fa98 810bc3f7 880150b0b228 
88015068dbb0
[   42.750475]  c992fb38 88014fc99180 c992fac0 
81b243e5
[   42.750486] Call Trace:
[   42.750498]  [] dump_stack+0x67/0x98
[   42.750511]  [] lockdep_rcu_suspicious+0xe7/0x120
[   42.750524]  [] rpc_clnt_xprt_switch_has_addr+0x115/0x150
[   42.750536]  [] nfs_get_client+0x244/0x5e0
[   42.750549]  [] ? nfs_get_client+0xfc/0x5e0
[   42.750561]  [] nfs4_set_client+0x98/0x130
[   42.750574]  [] nfs4_create_server+0x13e/0x390
[   42.750588]  [] nfs4_remote_mount+0x2e/0x60
[   42.750600]  [] mount_fs+0x39/0x170
[   42.750614]  [] vfs_kern_mount+0x6b/0x150
[   42.750626]  [] ? nfs_do_root_mount+0x3c/0xc0
[   42.750639]  [] nfs_do_root_mount+0x86/0xc0
[   42.750652]  [] nfs4_try_mount+0x44/0xc0
[   42.750664]  [] ? get_nfs_version+0x27/0x90
[   42.750677]  [] nfs_fs_mount+0x4ac/0xd80
[   42.750689]  [] ? lockdep_init_map+0x88/0x1f0
[   42.750701]  [] ? nfs_clone_super+0x130/0x130
[   42.750713]  [] ? param_set_portnr+0x70/0x70
[   42.750726]  [] mount_fs+0x39/0x170
[   42.750740]  [] vfs_kern_mount+0x6b/0x150
[   42.750752]  [] do_mount+0x1f1/0xd10
[   42.750765]  [] ? copy_mount_options+0xa1/0x140
[   42.750777]  [] SyS_mount+0x83/0xd0
[   42.750790]  [] do_syscall_64+0x5c/0x130
[   42.750802]  [] entry_SYSCALL64_slow_path+0x25/0x25

This rcu_dereference_check() was introduced by the following commit:

commit 39e5d2df959dd4aea81fa33d765d2a5cc67a0512
Author: Andy Adamson 
Date:   Fri Sep 9 09:22:25 2016 -0400

SUNRPC search xprt switch for sockaddr

Signed-off-by: Andy Adamson 
Signed-off-by: Anna Schumaker 

Thanks,
- Ross


Re: Is there a maximum bytes in flight limitation in the tcp stack?

2016-11-07 Thread Yuchung Cheng
On Thu, Nov 3, 2016 at 9:37 AM, De Schepper, Koen (Nokia - BE)
 wrote:
>
> Hi,
>
> We experience some limit on the maximum packets in flight which seem not to 
> be related with the receive or write buffers. Does somebody know if there is 
> an issue with a maximum of around 1MByte (or sometimes 2Mbyte) of data in 
> flight per TCP flow?

Does not ring a bell. I've definitely seen cubic reaching >2MB cwnd (inflight).
Some packet traces would help.

btw, tcp_rmem is the maximum receive buffer including all header and
control overhead. The receive window announced is (very roughly) half
of your rcvbuf.
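
As a rough sanity check on the numbers in the report quoted below, the
bandwidth-delay product of the link can be compared with what roughly 1 MB
or 2 MB in flight can sustain. The sketch uses figures taken from the
report (200 Mbit/s, 200 ms RTT, and unacked * mss from the two ss samples);
the function names are made up for illustration.

```shell
# bdp_bytes RATE_MBPS RTT_MS
# Bandwidth-delay product: bytes that must be in flight to fill the link.
bdp_bytes() {
    local rate_mbps="$1" rtt_ms="$2"
    echo $(( rate_mbps * 1000000 / 8 * rtt_ms / 1000 ))
}

# inflight_mbps BYTES RTT_MS
# Throughput in Mbit/s (rounded down) that a fixed amount of
# in-flight data can sustain over the given round-trip time.
inflight_mbps() {
    local bytes="$1" rtt_ms="$2"
    echo $(( bytes * 8 * 1000 / rtt_ms / 1000000 ))
}

bdp_bytes 200 200                      # link from the report
inflight_mbps $(( 728 * 1448 )) 200    # first ss sample: unacked * mss
inflight_mbps $(( 1457 * 1448 )) 200   # second ss sample
```

With these inputs the link needs about 5000000 bytes in flight to be full,
while the two samples sustain roughly 42 and 84 Mbit/s, which is in line
with the 20% and 40% utilization mentioned in the report.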

>
> It seems to be a strict and stable limit independent from the CC (tested with 
> Cubic, Reno and DCTCP). On a link of 200Mbps and 200ms RTT our link is only 
> 20% (sometimes 40%, see conditions below) utilized for a single TCP flow with 
> no drop experienced at all (no bottleneck in the AQM or RTT emulation, as it 
> supports more throughput if multiple flows are active).
>
> Some configuration changes we already tried on both client and server (kernel 
> 3.18.9):
>
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 87380 6291456
> net.ipv4.tcp_wmem = 4096 16384 4194304
>
> SERVER# ss -i
> tcp  ESTAB  0  1049728  10.187.255.211:46642 10.187.16.194:ssh
>  dctcp wscale:7,7 rto:408 rtt:204.333/0.741 ato:40 mss:1448 cwnd:1466 
> send 83.1Mbps unacked:728 rcv_rtt:212 rcv_space:29200
> CLIENT# ss -i
> tcp  ESTAB  0  288  10.187.16.194:ssh  10.187.255.211:46642
>  dctcp wscale:7,7 rto:404 rtt:203.389/0.213 ato:40 mss:1448 cwnd:78 
> send 4.4Mbps unacked:8 rcv_rtt:204 rcv_space:1074844
>
> When increasing the write and receive mem further (they were already way 
> above 1 or 2 MB) it steps to double (40%; 2Mbytes in flight):
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 800 16291456
> net.ipv4.tcp_wmem = 4096 800 16291456
>
> SERVER # ss -i
> tcp  ESTAB  0  2068976  10.187.255.212:54637 10.187.16.112:ssh
>  cubic wscale:8,8 rto:404 rtt:202.622/0.061 ato:40 mss:1448 cwnd:1849 
> ssthresh:1140 send 105.7Mbps unacked:1457 rcv_rtt:217.5 rcv_space:29200
> CLIENT# ss -i
> tcp  ESTAB  0  648  10.187.16.112:ssh  10.187.255.212:54637
>  cubic wscale:8,8 rto:404 rtt:201.956/0.038 ato:40 mss:1448 cwnd:132 
> send 7.6Mbps unacked:18 rcv_rtt:204 rcv_space:2093044
>
> Further increasing (x10) does not help anymore...
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_rmem = 4096 8000 162914560
> net.ipv4.tcp_wmem = 4096 8000 162914560
>
> As all these parameters autotune, it is hard to find out which one is 
> limiting... In the examples, above unacked does not want to go higher, while 
> congestion window in the server is big enough... rcv_space could be limiting, 
> but it tunes up if I change the server with the higher buffers (switching to 
> 2MByte in flight).
>
> We also tried tcp_limit_output_bytes, setting it bigger (x10) and 
> smaller(/10), without effect. We've put it in /etc/sysctl.conf and rebooted, 
> to make sure that it is effective.
>
> Some more detailed tests that had an effect on the 1 or 2MByte:
> - It seems that with TSO off, if we configure a bigger wmem buffer, an 
> ongoing flow suddenly is able to immediately double its bytes in flight 
> limit. We configured further up to more than 10x the buffer, but no further 
> increase helps, and the limits we saw are only 1MByte and 2Mbyte (no 
> intermediate values depending on any parameter). When setting tcp_wmem 
> smaller again, the 2MByte limit stays on the ongoing flow. We have to restart 
> the flow to make the buffer reduction to 1MByte effective.
> - With TSO on, only the 2MByte limit is effective, independent from the wmem 
> buffer. We have to restart the flow to make a tso change effective.
>
> Koen.
>
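
A quick cross-check of the figures quoted above may help when reading the ss output. The sketch below is illustrative only and not part of the original thread: the helper names are made up here, and it assumes that `unacked` counts MSS-sized segments and that the `send` figure is roughly one cwnd of MSS-sized segments per RTT. Under those assumptions it compares the observed bytes in flight with the bandwidth-delay product of a 200 Mbps, 200 ms link.

```python
# Illustrative arithmetic only; helper names are hypothetical, values are
# copied from the link parameters and ss output quoted in the thread.

def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a given rate and RTT."""
    return rate_bps * rtt_ms // 8000  # bits/s * ms -> bytes


def inflight_bytes(packets: int, mss: int) -> int:
    """Approximate payload bytes represented by a packet count."""
    return packets * mss


def send_rate_mbps(cwnd: int, mss: int, rtt_ms: float) -> float:
    """Rate implied by one cwnd of MSS-sized segments per RTT, in Mbit/s."""
    return cwnd * mss * 8 / (rtt_ms / 1000) / 1e6


LINK_BDP = bdp_bytes(200_000_000, 200)  # 200 Mbps link, 200 ms RTT

# (label, unacked, cwnd, rtt in ms) taken from the two SERVER samples.
for label, unacked, cwnd, rtt_ms in [
    ("dctcp", 728, 1466, 204.333),
    ("cubic", 1457, 1849, 202.622),
]:
    inflight = inflight_bytes(unacked, 1448)
    print(f"{label}: inflight={inflight} B "
          f"({100 * inflight / LINK_BDP:.1f}% of BDP), "
          f"cwnd-implied rate={send_rate_mbps(cwnd, 1448, rtt_ms):.1f} Mbps")
```

With these inputs the in-flight data comes out at roughly 1 MB and 2 MB against a 5 MB bandwidth-delay product, which lines up with the 20% and 40% utilization reported, while the cwnd alone would allow more.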


Re: [PATCH net 1/1] driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.

2016-11-07 Thread Gao Feng
Hi David,

On Tue, Nov 8, 2016 at 9:33 AM, David Miller  wrote:
> From: f...@ikuai8.com
> Date: Fri,  4 Nov 2016 10:28:49 +0800
>
>> From: Gao Feng 
>>
>> When there is no existing macvlan port in lowdev, one new macvlan port
>> would be created. But it is not destroyed when something fails later,
>> which causes a memory leak.
>>
>> Now add one flag to indicate if new macvlan port is created.
>>
>> Signed-off-by: Gao Feng 
>
> You need to be more patient, it sometimes takes several days before
> your patch gets reviewed or applied.  Sometimes nobody reviews a change
> for some time because it is obscure or everyone is busy.

Sorry, it is my fault.
I thought this patch might have been lost because there are so many emails
every day, so I pinged just as a reminder.

>
> All patches are tracked in patchwork, so it is never an issue of a
> change getting "lost".  Therefore, it never makes sense to ping the
> list again and ask if a change is "ok".
>
> Personally, when people ping like that, it makes me want to review
> that patch _less_ not more.  So please do not do it.
>
> Thank you.
>

Now I get it. I broke some rules as a newcomer to the kernel community.
Sorry again.


Best Regards
Feng




Re: Why are IPv6 host and anycast routes referencing lo device?

2016-11-07 Thread David Ahern
On 11/7/16 7:26 PM, YOSHIFUJI Hideaki wrote:
> I tried once and it did not work.
> You could try again to see what happens.

I did and both worked fine in quick POC testing. I'll do more in-depth testing 
and send a patch. Thanks.


Re: [Intel-wired-lan] [PATCH] igb: drop field "tail" of struct igb_ring

2016-11-07 Thread Cao jin



On 11/08/2016 02:49 AM, Alexander Duyck wrote:
> On Mon, Nov 7, 2016 at 4:44 AM, Cao jin wrote:
>> Under certain conditions, I find the guest will oops on writel() in
>> igb_configure_tx_ring(), because hw->hw_addr is NULL. Other register
>> accesses won't oops the kernel because they use wr32/rd32, which have
>> a defense against a NULL pointer. The oops messages are as follows:
>>
>>  [  141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal)
>>  error received: id=0101
>>  [  141.225523] igb :01:00.1: PCIe Bus Error: severity=Uncorrected
>>  (Fatal), type=Unaccessible, id=0101(Unregistered Agent ID)
>>  [  141.299442] igb :01:00.1: broadcast error_detected message
>>  [  141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now
>>  detached
>>  [  141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now
>>  detached
>>  [  143.465904] pcieport :00:1c.0: Root Port link has been reset
>>  [  143.465994] igb :01:00.1: broadcast slot_reset message
>>  [  143.466039] igb :01:00.0: enabling device ( -> 0002)
>>  [  144.389078] igb :01:00.1: enabling device ( -> 0002)
>>  [  145.312078] igb :01:00.1: broadcast resume message
>>  [  145.322211] BUG: unable to handle kernel paging request at
>>  3818
>>  [  145.361275] IP: [] igb_configure_tx_ring+0x14d/0x280
>> [igb]
>>  [  145.438007] Oops: 0002 [#1] SMP
>>
>> On the other hand, commit 238ac817 does some optimization which
>> dropped the field "head". So I think it is time to drop "tail" as well.
>
> There is a bug here, but removing tail isn't the fix.

Yes, I totally agree with you. The oops just led me to notice that
"tail" could perhaps be dropped the way "head" was; at least that would
avoid oopsing the kernel when this driver bug shows up.

Couldn't we remove "tail" anyway? Removing "head" while leaving "tail"
untouched seems odd, and every other register access goes through
rd32/wr32 except "tail", which also seems odd, doesn't it?

>> Signed-off-by: Cao jin
>> ---
>>   drivers/net/ethernet/intel/igb/igb.h  |  1 -
>>   drivers/net/ethernet/intel/igb/igb_main.c | 16 +---
>>   2 files changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb.h
>> b/drivers/net/ethernet/intel/igb/igb.h
>> index d11093d..0df06bc 100644
>> --- a/drivers/net/ethernet/intel/igb/igb.h
>> +++ b/drivers/net/ethernet/intel/igb/igb.h
>> @@ -247,7 +247,6 @@ struct igb_ring {
>>  };
>>  void *desc; /* descriptor ring memory */
>>  unsigned long flags;/* ring specific flags */
>> -   void __iomem *tail; /* pointer to ring tail register */
>>  dma_addr_t dma; /* phys address of the ring */
>>  unsigned int  size; /* length of desc. ring in bytes */
>>
>> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
>> b/drivers/net/ethernet/intel/igb/igb_main.c
>> index edc9a6a..e177d0e 100644
>> --- a/drivers/net/ethernet/intel/igb/igb_main.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
>> @@ -3390,9 +3390,8 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
>>   tdba & 0xULL);
>>  wr32(E1000_TDBAH(reg_idx), tdba >> 32);
>>
>> -   ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
>
> This line is where the bug is.  This should be adapter->io_addr, not
> hw->hw_addr.

hw->hw_addr can be set to NULL (in igb_rd32), which is why writel()
oopses the kernel. Your fix looks good.

But the oops message shows that register reads return all F's. I also
have a question: while the igb device is being reset, do register reads
(whether of config space or of registers outside PCIe configuration
space) return all F's? (I guess this is the core of my issue.)

>>  wr32(E1000_TDH(reg_idx), 0);
>> -   writel(0, ring->tail);
>> +wr32(E1000_TDT(reg_idx), 0);
>>
>>  txdctl |= IGB_TX_PTHRESH;
>>  txdctl |= IGB_TX_HTHRESH << 8;
>> @@ -3729,9 +3728,8 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
>>   ring->count * sizeof(union e1000_adv_rx_desc));
>>
>>  /* initialize head and tail */
>> -   ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
>
> Same thing here.  It looks like the wrong values were used.
>
>>  wr32(E1000_RDH(reg_idx), 0);
>> -   writel(0, ring->tail);
>> +   wr32(E1000_RDT(reg_idx), 0);
>>
>>  /* set descriptor configuration */
>
> Would you prefer to submit the patch for this or should I?  Basically
> all you need to do is change the two lines where ring->tail is
> populated so that you use adapter->io_addr instead of hw->hw_addr.
>
> Thanks.
>
> - Alex



--
Yours Sincerely,

Cao jin




Re: Why are IPv6 host and anycast routes referencing lo device?

2016-11-07 Thread YOSHIFUJI Hideaki
Hi,

David Ahern wrote:
> 
> Can anyone explain why host routes and anycast routes for IPv6 are added with 
> the device set to loopback versus the device with the address:
> 
> local ::1 dev lo  proto none  metric 0  pref medium
> local 2000:1:: dev lo  proto none  metric 0  pref medium
> local 2000:1::3 dev lo  proto none  metric 0  pref medium
> local 2100:2:: dev lo  proto none  metric 0  pref medium
> local 2100:2::3 dev lo  proto none  metric 0  pref medium
> 
> 
> This behavior differs from IPv4 where host routes use the device with the 
> address:
> 
> broadcast 10.1.1.0 dev eth0  proto kernel  scope link  src 10.1.1.3
> local 10.1.1.3 dev eth0  proto kernel  scope host  src 10.1.1.3
> broadcast 10.1.1.255 dev eth0  proto kernel  scope link  src 10.1.1.3
> broadcast 10.100.2.0 dev eth2  proto kernel  scope link  src 10.100.2.3
> local 10.100.2.3 dev eth2  proto kernel  scope host  src 10.100.2.3
> broadcast 10.100.2.255 dev eth2  proto kernel  scope link  src 10.100.2.3
> 
> The use of loopback pre-dates the git history, so wondering if someone 
> recalls the reason why. We would like to change that to make it consistent 
> with IPv4 - with a sysctl to maintain backwards compatibility.

I tried once and it did not work.
You could try again to see what happens.

--yoshfuji

> 
> Thanks,
> David
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


linux-next: manual merge of the rdma-leon-test tree with the net-next tree

2016-11-07 Thread Stephen Rothwell
Hi Leon,

Today's linux-next merge of the rdma-leon-test tree got a conflict in:

  drivers/infiniband/core/roce_gid_mgmt.c

between commit:

  453d39329ad0 ("IB/core: Flip to the new dev walk API")

from the net-next tree and commit:

  e4b4d6b5d8c2 ("IB/core: Remove debug prints after allocation failure")

from the rdma-leon-test tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/infiniband/core/roce_gid_mgmt.c
index 3a64a0881882,c86ddcea7675..
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@@ -437,28 -434,6 +434,26 @@@ static void callback_for_addr_gid_devic
  >gid_attr);
  }
  
 +struct upper_list {
 +  struct list_head list;
 +  struct net_device *upper;
 +};
 +
 +static int netdev_upper_walk(struct net_device *upper, void *data)
 +{
 +  struct upper_list *entry = kmalloc(sizeof(*entry), GFP_ATOMIC);
 +  struct list_head *upper_list = data;
 +
-   if (!entry) {
-   pr_info("roce_gid_mgmt: couldn't allocate entry to delete 
ndev\n");
++  if (!entry)
 +  return 0;
-   }
 +
 +  list_add_tail(&entry->list, upper_list);
 +  dev_hold(upper);
 +  entry->upper = upper;
 +
 +  return 0;
 +}
 +
  static void handle_netdev_upper(struct ib_device *ib_dev, u8 port,
void *cookie,
void (*handle_netdev)(struct ib_device *ib_dev,


Re: [lkp] [net] af1fee9821: BUG:spinlock_trylock_failure_on_UP_on_CPU

2016-11-07 Thread Ye Xiaolong
On 11/07, Allan W. Nielsen wrote:
>Hi,
>
>I tried to get this "lkp" up and running, but I had some trouble getting
>these scripts to work.

Hi, Allan

Could you tell us what trouble you ran into when trying the "lkp qemu"
tool? It would help if you could paste some logs so we can improve it.

Thanks,
Xiaolong

>
>But it seems like it can be reproduced using the provided config file and
>qemu.



Re: [net PATCH] fib_trie: Correct /proc/net/route off by one error

2016-11-07 Thread David Miller
From: Alexander Duyck 
Date: Fri, 04 Nov 2016 15:11:57 -0400

> The display of /proc/net/route has had a couple issues due to the fact that
> when I originally rewrote most of fib_trie I made it so that the iterator
> was tracking the next value to use instead of the current.
> 
> In addition it had an off by 1 error where I was tracking the first piece
> of data as position 0, even though in reality that belonged to the
> SEQ_START_TOKEN.
> 
> This patch updates the code so the iterator tracks the last reported
> position and key instead of the next expected position and key.  In
> addition it shifts things so that all of the leaves start at 1 instead of
> trying to report leaves starting with offset 0 as being valid.  With these
> two issues addressed this should resolve any off by one errors that were
> present in the display of /proc/net/route.
> 
> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in 
> /proc/net/route")
> Cc: Andy Whitcroft 
> Reported-by: Jason Baron 
> Signed-off-by: Alexander Duyck 

Applied and queued up for -stable.
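
One way to picture the position convention described in the commit message is a small model in which position 0 is reserved for the header token and leaves occupy positions 1 and up. The sketch below is a simplified illustration, not the fib_trie code: `HEADER`, `seq_start` and `read_in_chunks` are hypothetical names, and a plain list stands in for the trie leaves.

```python
# Simplified model of a seq_file-style iterator; names are hypothetical and
# this is not the fib_trie implementation.

HEADER = object()  # stands in for SEQ_START_TOKEN


def seq_start(leaves, pos):
    """Return the item shown at position pos, or None past the end."""
    if pos == 0:
        return HEADER  # position 0 belongs to the header line
    index = pos - 1    # leaves occupy positions 1..len(leaves)
    return leaves[index] if index < len(leaves) else None


def read_in_chunks(leaves, chunk):
    """Emulate a reader that restarts the iterator every `chunk` records."""
    out, pos = [], 0
    while True:
        batch = []
        for p in range(pos, pos + chunk):
            item = seq_start(leaves, p)
            if item is None:
                break
            batch.append("HEADER" if item is HEADER else item)
        out.extend(batch)
        pos += len(batch)  # resume after the last reported position
        if len(batch) < chunk:
            return out


print(read_in_chunks(["10.0.0.0/8", "192.168.1.0/24"], 1))
```

In this model every leaf is reported exactly once regardless of the chunk size, because each restart resumes from the last reported position and no leaf is ever mapped to position 0.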


Re: [PATCH] Documentation: networking: dsa: Update tagging protocols

2016-11-07 Thread David Miller
From: Fabian Mewes 
Date: Fri,  4 Nov 2016 13:16:14 +0100

> Add Qualcomm QCA tagging introduced in cafdc45c9 to the
> list of supported protocols.
> 
> Signed-off-by: Fabian Mewes 

Applied.


Re: [PATCH net-next v3] cadence: Add LSO support.

2016-11-07 Thread David Miller
From: Rafal Ozieblo 
Date: Fri, 4 Nov 2016 11:40:18 +

> + if (IPPROTO_UDP == (((struct iphdr 
> *)skb_network_header(skb))->protocol))

This is simply "ip_hdr(skb)->protocol", please use it everywhere you
have this ugly cast thing in this change.

Thanks.


Re: [PATCH] virtio-net: drop legacy features in virtio 1 mode

2016-11-07 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Fri, 4 Nov 2016 12:55:36 +0200

> Virtio 1.0 spec says VIRTIO_F_ANY_LAYOUT and VIRTIO_NET_F_GSO are
> legacy-only feature bits. Do not negotiate them in virtio 1 mode.  Note
> this is a spec violation so we need to backport it to stable/downstream
> kernels.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Michael S. Tsirkin 

Applied.


Re: [PATCH net 1/1] driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.

2016-11-07 Thread David Miller
From: f...@ikuai8.com
Date: Fri,  4 Nov 2016 10:28:49 +0800

> From: Gao Feng 
> 
> When there is no existing macvlan port in lowdev, one new macvlan port
> would be created. But it is not destroyed when something fails later,
> which causes a memory leak.
> 
> Now add one flag to indicate if new macvlan port is created.
> 
> Signed-off-by: Gao Feng 

You need to be more patient, it sometimes takes several days before
your patch gets reviewed or applied.  Sometimes nobody reviews a change
for some time because it is obscure or everyone is busy.

All patches are tracked in patchwork, so it is never an issue of a
change getting "lost".  Therefore, it never makes sense to ping the
list again and ask if a change is "ok".

Personally, when people ping like that, it makes me want to review
that patch _less_ not more.  So please do not do it.

Thank you.



Re: [PATCH] net: icmp6_send should use dst dev to determine L3 domain

2016-11-07 Thread David Miller
From: David Ahern 
Date: Thu,  3 Nov 2016 16:17:26 -0700

> icmp6_send is called in response to some event. The skb may not have
> the device set (skb->dev is NULL), but it is expected to have a dst set.
> Update icmp6_send to use the dst on the skb to determine L3 domain.
> 
> Fixes: ca254490c8dfd ("net: Add VRF support to IPv6 stack")
> Signed-off-by: David Ahern 

Applied and queued up for -stable, thanks David.


Re: [PATCH net-next] sock: do not set sk_err in sock_dequeue_err_skb

2016-11-07 Thread David Miller
From: Soheil Hassas Yeganeh 
Date: Thu,  3 Nov 2016 18:24:27 -0400

> From: Soheil Hassas Yeganeh 
> 
> Do not set sk_err when dequeuing errors from the error queue.
> Doing so results in:
> a) Bugs: By overwriting existing sk_err values, it possibly
>hides legitimate errors. It is also incorrect when local
>errors are queued with ip_local_error. That happens in the
>context of a system call, which already returns the error
>code.
> b) Inconsistent behavior: When there are pending errors on
>the error queue, sk_err is sometimes 0 (e.g., for
>the first timestamp on the error queue) and sometimes
>set to an error code (after dequeuing the first
>timestamp).
> c) Suboptimality: Setting sk_err to ENOMSG on simple
>TX timestamps can abort parallel reads and writes.
> 
> Removing this line doesn't break userspace. This is because
> userspace code cannot rely on sk_err for detecting whether
> there is something on the error queue. Except for ICMP messages
> received for UDP and RAW, sk_err is not set at enqueue time,
> and as a result sk_err can be 0 while there are plenty of
> errors on the error queue.
> 
> For ICMP packets in UDP and RAW, sk_err is set when they are
> enqueued on the error queue, but that does not result in aborting
> reads and writes. For such cases, sk_err is only readable via
> getsockopt(SO_ERROR) which will reset the value of sk_err on
> its own. More importantly, prior to this patch,
> recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e.,
> sk_err is set by sock_dequeue_err_skb without atomic ops or
> locks) which can store 0 in sk_err even when we have ICMP
> messages pending. Removing this line from sock_dequeue_err_skb
> eliminates that race.
> 
> Signed-off-by: Soheil Hassas Yeganeh 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Willem de Bruijn 
> Signed-off-by: Neal Cardwell 

Ok, applied.


linux-next: manual merge of the net-next tree with the net tree

2016-11-07 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/netlink/genetlink.c

between commit:

  00ffc1ba02d8 ("genetlink: fix a memory leak on error path")

from the net tree and commit:

  2ae0f17df1cd ("genetlink: use idr to track families")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc net/netlink/genetlink.c
index 49c28e8ef01b,bbd3bff885a1..
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@@ -402,11 -360,17 +360,17 @@@ int genl_register_family(struct genl_fa
} else
family->attrbuf = NULL;
  
+   family->id = idr_alloc(&genl_fam_idr, family,
+  start, end + 1, GFP_KERNEL);
+   if (family->id < 0) {
+   err = family->id;
 -  goto errout_locked;
++  goto errout_free;
+   }
+ 
err = genl_validate_assign_mc_groups(family);
if (err)
-   goto errout_free;
+   goto errout_remove;
  
-   list_add_tail(&family->family_list, genl_family_chain(family->id));
genl_unlock_all();
  
/* send all events */
@@@ -417,14 -381,13 +381,15 @@@
  
return 0;
  
+ errout_remove:
+   idr_remove(&genl_fam_idr, family->id);
 +errout_free:
 +  kfree(family->attrbuf);
  errout_locked:
genl_unlock_all();
- errout:
return err;
  }
- EXPORT_SYMBOL(__genl_register_family);
+ EXPORT_SYMBOL(genl_register_family);
  
  /**
   * genl_unregister_family - unregister generic netlink family


Re: [PATCH] vxlan: hide unused local variable

2016-11-07 Thread David Miller
From: Pravin Shelar 
Date: Mon, 7 Nov 2016 16:25:54 -0800

> On Mon, Nov 7, 2016 at 2:21 PM, Arnd Bergmann  wrote:
>> On Monday, November 7, 2016 2:16:30 PM CET Pravin Shelar wrote:
>>> On Monday, November 7, 2016, Arnd Bergmann  wrote:
>>>
>>> > A bugfix introduced a harmless warning in v4.9-rc4:
>>> >
>>> > drivers/net/vxlan.c: In function 'vxlan_group_used':
>>> > drivers/net/vxlan.c:947:21: error: unused variable 'sock6'
>>> > [-Werror=unused-variable]
>>> >
>>> > This hides the variable inside of the same #ifdef that is
>>> > around its user. The extraneous initialization is removed
>>> > at the same time, it was accidentally introduced in the
>>> > same commit.
>>> >
>>> > Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
>>> > Signed-off-by: Arnd Bergmann >
>>> > ---
>>>
>>>
>>> I have already submitted a patch to fix this issue.
>>>
>>> https://patchwork.ozlabs.org/patch/691588/
>>
>> You have tagged those seven patches for net-next which seems
>> appropriate, but as I wrote above the commit that introduced
>> it was merged between -rc3 and -rc4, so I think we still need a
>> fix for v4.9, right?
>>
> 
> This is not an actual bug, so I am not sure we need the fix for the net branch.

Really ugly warnings like this really need to be fixed in 'net'.


RE: [PATCH] igb/e1000: correct register comments

2016-11-07 Thread Brown, Aaron F
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Cao jin
> Sent: Wednesday, November 2, 2016 12:20 AM
> To: linux-ker...@vger.kernel.org; netdev@vger.kernel.org
> Cc: intel-wired-...@lists.osuosl.org; Kirsher, Jeffrey T
> 
> Subject: [PATCH] igb/e1000: correct register comments
> 
> Signed-off-by: Cao jin 
> ---
>  drivers/net/ethernet/intel/igb/e1000_regs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Tested-by: Aaron Brown 


Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices

2016-11-07 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Thu, 03 Nov 2016 14:55:56 +0100

> This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> hopefully help userspace get a more consistent behavior when attaching
> qdisc to such virtual devices.

Series applied, thanks Jesper.


Re: [PATCH] igmp: Make igmp group member RFC 3376 compliant

2016-11-07 Thread David Miller
From: Michal Tesar 
Date: Thu, 3 Nov 2016 10:38:34 +0100

>  2. If the received Query is a General Query, the interface timer is
> used to schedule a response to the General Query after the
> selected delay.  Any previously pending response to a General
> Query is canceled.
> --8<--
> 
> Currently the timer is rearmed with a new random expiration time for
> every incoming query, regardless of any already pending report, which
> is not aligned with the above RFC.

I don't read it that way.  #2 says if this is a general query then any
pending response to a general query is cancelled.  And that's
effectively what the code is doing right now.


Why are IPv6 host and anycast routes referencing lo device?

2016-11-07 Thread David Ahern

Can anyone explain why host routes and anycast routes for IPv6 are added with 
the device set to loopback versus the device with the address:

local ::1 dev lo  proto none  metric 0  pref medium
local 2000:1:: dev lo  proto none  metric 0  pref medium
local 2000:1::3 dev lo  proto none  metric 0  pref medium
local 2100:2:: dev lo  proto none  metric 0  pref medium
local 2100:2::3 dev lo  proto none  metric 0  pref medium


This behavior differs from IPv4 where host routes use the device with the 
address:

broadcast 10.1.1.0 dev eth0  proto kernel  scope link  src 10.1.1.3
local 10.1.1.3 dev eth0  proto kernel  scope host  src 10.1.1.3
broadcast 10.1.1.255 dev eth0  proto kernel  scope link  src 10.1.1.3
broadcast 10.100.2.0 dev eth2  proto kernel  scope link  src 10.100.2.3
local 10.100.2.3 dev eth2  proto kernel  scope host  src 10.100.2.3
broadcast 10.100.2.255 dev eth2  proto kernel  scope link  src 10.100.2.3

The use of loopback pre-dates the git history, so wondering if someone recalls 
the reason why. We would like to change that to make it consistent with IPv4 - 
with a sysctl to maintain backwards compatibility.

Thanks,
David


Re: [PATCH] vxlan: hide unused local variable

2016-11-07 Thread Pravin Shelar
On Mon, Nov 7, 2016 at 2:21 PM, Arnd Bergmann  wrote:
> On Monday, November 7, 2016 2:16:30 PM CET Pravin Shelar wrote:
>> On Monday, November 7, 2016, Arnd Bergmann  wrote:
>>
>> > A bugfix introduced a harmless warning in v4.9-rc4:
>> >
>> > drivers/net/vxlan.c: In function 'vxlan_group_used':
>> > drivers/net/vxlan.c:947:21: error: unused variable 'sock6'
>> > [-Werror=unused-variable]
>> >
>> > This hides the variable inside of the same #ifdef that is
>> > around its user. The extraneous initialization is removed
>> > at the same time, it was accidentally introduced in the
>> > same commit.
>> >
>> > Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
>> > Signed-off-by: Arnd Bergmann >
>> > ---
>>
>>
>> I have already submitted a patch to fix this issue.
>>
>> https://patchwork.ozlabs.org/patch/691588/
>
> You have tagged those seven patches for net-next which seems
> appropriate, but as I wrote above the commit that introduced
> it was merged between -rc3 and -rc4, so I think we still need a
> fix for v4.9, right?
>

This is not an actual bug, so I am not sure we need the fix for the net branch.


RE: [PATCH] igb: Workaround for igb i210 firmware issue.

2016-11-07 Thread Brown, Aaron F
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Chris J Arges
> Sent: Wednesday, November 2, 2016 7:14 AM
> To: j...@henneberg-systemdesign.com
> Cc: intel-wired-...@lists.osuosl.org; Chris J Arges
> ; Kirsher, Jeffrey T
> ; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] igb: Workaround for igb i210 firmware issue.
> 
> Sometimes firmware may not properly initialize I347AT4_PAGE_SELECT,
> causing the probe of an igb i210 NIC to fail. This patch adds an
> additional zeroing of this register during igb_get_phy_id to work
> around this issue.
> 
> Thanks to Jochen Henneberg for the idea and original patch.
> 
> Signed-off-by: Chris J Arges 
> ---
>  drivers/net/ethernet/intel/igb/e1000_phy.c | 4 
>  1 file changed, 4 insertions(+)

Tested-by: Aaron Brown 


RE: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-07 Thread Madalin-Cristian Bucur
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, November 07, 2016 6:40 PM
> 
> From: Madalin-Cristian Bucur 
> Date: Mon, 7 Nov 2016 16:32:16 +
> 
> >> From: David Miller [mailto:da...@davemloft.net]
> >> Sent: Monday, November 07, 2016 5:55 PM
> >>
> >> From: Madalin-Cristian Bucur 
> >> Date: Mon, 7 Nov 2016 15:43:26 +
> >>
> >> >> From: David Miller [mailto:da...@davemloft.net]
> >> >> Sent: Thursday, November 03, 2016 9:58 PM
> >> >>
> >> >> Why?  By clearing this, you disallow an important fundamental way to
> >> >> do performance testing, via pktgen.
> >> >
> >> > The Tx path in DPAA requires one to insert a back-pointer to the skb
> >> > into
> >> > the Tx buffer. On the Tx confirmation path the back-pointer in the
> >> > buffer
> >> > is used to release the skb. If Tx buffer is shared we'd alter the
> >> > back-pointer
> >> > and leak/double free skbs. See also
> >>
> >> Then have your software state store an array of SKB pointers, one for
> each
> >> TX ring entry, just like every other driver does.
> >
> > > There is no Tx ring in DPAA. Frames are sent out on QMan HW queues
> > towards the FMan for Tx and then received back on Tx confirmation queues
> > for cleanup.
> > Array traversal would for sure cost more than using the back-pointer.
> > Also, we can now process confirmations on a different core than the one
> > doing Tx,
> > we'd have to keep the arrays percpu and force the Tx conf on the same
> > core. Or add locks.
> 
> Report back an integer index, like every scsi driver out there which
> completes tagged queued block I/O operations asynchronously.  You can
> associate the array with a specific TX confirmation queue.

From HW? It only gives you back the buffer start address (plus length, etc.).
"buff_2_skb()" needs to be solved in SW, either expensively using an array
(or lists, as the number of frames in flight can be large/variable) or cheaply
with the back-pointer. The back-pointer approach has its tradeoffs: no shared
skbs, imposed non-zero needed_headroom.



Re: [PATCH v3 2/4] kconfig: regenerate *.c_shipped files after previous changes

2016-11-07 Thread Josh Triplett
On Mon, Nov 07, 2016 at 05:41:38PM -0500, Nicolas Pitre wrote:
> On Mon, 7 Nov 2016, Josh Triplett wrote:
> 
> > [snipping large patch]
> > 
> > One suggestion that might make this patch easier to review: you might
> > consider first regenerating the unchanged parser with Bison 3.0.4, then
> > regenerating it again after the "imply" change.  I think that'd
> > eliminate quite a lot of noise in this patch.
> 
> I tried that. This made two large patches instead of just one, both 
> equally obscure.
> 
> So this patch stands on its own, containing changes that are 
> mechanically generated and therefore shouldn't require manual review.

Fair enough. I hadn't expected that the changes from "imply" would still
be huge.


Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Stephen Suryaputra Lin
I did the temporary clearing/restoring of rt_gateway following the deleted
function check_peer_redir(). But, looking at that function again, the
assignment of peer->redirect_learned.a4 to rt_gateway can be permanent,
because restoring old_gw only happens on errors.

I have updated the patch to use __ipv4_neigh_lookup().

Thank you.

On Mon, Nov 07, 2016 at 11:20:16AM -0500, David Miller wrote:
> From: Eric Dumazet 
> Date: Mon, 07 Nov 2016 08:08:52 -0800
> 
> > In any case, rt is a shared object at that time, so even temporarily
> > clearing/restoring rt_gateway seems wrong to me.
> > 
> > I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> > this point.
> 
> Agreed.


Re: net/l2tp: use-after-free write in l2tp_ip6_close

2016-11-07 Thread Cong Wang
On Mon, Nov 7, 2016 at 2:35 PM, Andrey Konovalov  wrote:
> Hi,
>
> I've got the following error report while running the syzkaller fuzzer:
>
> ==
> BUG: KASAN: use-after-free in l2tp_ip6_close+0x239/0x2a0 at addr
> 8800677276d8
> Write of size 8 by task a.out/8668
> CPU: 0 PID: 8668 Comm: a.out Not tainted 4.9.0-rc4+ #354
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  8800694d7b00 81b46a64 88006adb5780 8800677276c0
>  880067727c68 8800677276c0 8800694d7b28 8150a86c
>  8800694d7bb8 88006adb5780 8800e77276d8 8800694d7ba8
> Call Trace:
>  [< inline >] __dump_stack lib/dump_stack.c:15
>  [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
>  [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
>  [< inline >] print_address_description mm/kasan/report.c:194
>  [] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283
>  [< inline >] kasan_report mm/kasan/report.c:303
>  [] __asan_report_store8_noabort+0x3e/0x40
> mm/kasan/report.c:329
>  [< inline >] __write_once_size ./include/linux/compiler.h:272
>  [< inline >] __hlist_del ./include/linux/list.h:622
>  [< inline >] hlist_del_init ./include/linux/list.h:637
>  [] l2tp_ip6_close+0x239/0x2a0 net/l2tp/l2tp_ip6.c:239
>  [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
>  [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
>  [] sock_release+0x8e/0x1d0 net/socket.c:570
>  [] sock_close+0x16/0x20 net/socket.c:1017
>  [] __fput+0x29d/0x720 fs/file_table.c:208
>  [] fput+0x15/0x20 fs/file_table.c:244
>  [] task_work_run+0xf8/0x170 kernel/task_work.c:116
>  [< inline >] exit_task_work ./include/linux/task_work.h:21
>  [] do_exit+0x883/0x2ac0 kernel/exit.c:828
>  [] do_group_exit+0x10e/0x340 kernel/exit.c:931
>  [< inline >] SYSC_exit_group kernel/exit.c:942
>  [] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
>  [] entry_SYSCALL_64_fastpath+0x1f/0xc2
> arch/x86/entry/entry_64.S:209

I guess we need to lock the sock for l2tp_ip6_disconnect() too.

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index ad3468c..ea2ae66 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -410,7 +410,7 @@ static int l2tp_ip6_disconnect(struct sock *sk, int flags)
if (sock_flag(sk, SOCK_ZAPPED))
return 0;

-   return __udp_disconnect(sk, flags);
+   return udp_disconnect(sk, flags);
 }

 static int l2tp_ip6_getname(struct socket *sock, struct sockaddr *uaddr,
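
For anyone following along, the change above swaps a lockless helper for its locking wrapper. Below is a minimal userspace sketch of that pattern, not the kernel code itself: `struct fake_sock`, `fake_lock_sock()`, `fake_release_sock()`, `fake_disconnect_locked()` and `fake_disconnect()` are all hypothetical names, and the "lock" is just a flag in a single-threaded model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket: only the fields the sketch needs. */
struct fake_sock {
	bool locked;     /* models ownership of the socket lock */
	bool connected;
};

static void fake_lock_sock(struct fake_sock *sk)
{
	assert(!sk->locked);   /* single-threaded model: no recursive locking */
	sk->locked = true;
}

static void fake_release_sock(struct fake_sock *sk)
{
	assert(sk->locked);
	sk->locked = false;
}

/* Lockless helper: the caller must already hold the lock. */
static int fake_disconnect_locked(struct fake_sock *sk)
{
	assert(sk->locked);
	sk->connected = false;
	return 0;
}

/* Locking wrapper: safe to call without holding the lock. */
static int fake_disconnect(struct fake_sock *sk)
{
	int err;

	fake_lock_sock(sk);
	err = fake_disconnect_locked(sk);
	fake_release_sock(sk);
	return err;
}
```

A caller that does not hold the lock should go through the wrapper; calling the helper directly trips the assertion in this model, which is the kind of misuse the one-line fix is meant to avoid.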


[PATCH v4] Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

2016-11-07 Thread Chris Roth
From: Allan Chou 

Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
Bridge Controller (Vendor=04b4 ProdID=3610).
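
As a rough illustration of what the new table entry does, here is a userspace sketch of matching a vendor/product pair against an ID table. It is only a model: `struct toy_usb_id`, `toy_products`, `toy_match()` and `toy_check_unique()` are hypothetical, and the strings merely stand in for the `driver_info` pointers used by the real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a USB match-table entry. */
struct toy_usb_id {
	uint16_t vendor;
	uint16_t product;
	const char *info;   /* stands in for the driver_info pointer */
};

static const struct toy_usb_id toy_products[] = {
	{ 0x0b95, 0x178a, "ax88178a_info" },
	{ 0x04b4, 0x3610, "cypress_GX3_info" },   /* entry added by the patch */
	{ 0x2001, 0x4a00, "dlink_dub1312_info" },
};

#define TOY_NR_PRODUCTS (sizeof(toy_products) / sizeof(toy_products[0]))

/* Return the info string for a VID/PID pair, or NULL if nothing matches. */
static const char *toy_match(uint16_t vendor, uint16_t product)
{
	for (size_t i = 0; i < TOY_NR_PRODUCTS; i++)
		if (toy_products[i].vendor == vendor &&
		    toy_products[i].product == product)
			return toy_products[i].info;
	return NULL;
}

/* Duplicate IDs would shadow each other, so check the table is unique. */
static void toy_check_unique(void)
{
	for (size_t i = 0; i < TOY_NR_PRODUCTS; i++)
		for (size_t j = i + 1; j < TOY_NR_PRODUCTS; j++)
			assert(toy_products[i].vendor != toy_products[j].vendor ||
			       toy_products[i].product != toy_products[j].product);
}
```

Without the 04b4:3610 entry, `toy_match()` returns NULL for the dock's controller, which corresponds to the driver not binding to the device.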

Patch verified on x64 linux kernel 4.7.4, 4.8.6, 4.9-rc4 systems
with the Kensington SD4600P USB-C Universal Dock with Power,
which uses the Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge
Controller.

A similar patch was signed off and tested by Allan Chou on 2015-12-01.

Allan verified his similar patch on x86 Linux kernel 4.1.6 system
with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.

Tested-by: Allan Chou 
Tested-by: Chris Roth 
Tested-by: Artjom Simon 

Signed-off-by: Allan Chou 
Signed-off-by: Chris Roth 
---
 drivers/net/usb/ax88179_178a.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index e6338c1..8a6675d 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1656,6 +1656,19 @@ static const struct driver_info ax88178a_info = {
 .tx_fixup = ax88179_tx_fixup,
 };

+static const struct driver_info cypress_GX3_info = {
+.description = "Cypress GX3 SuperSpeed to Gigabit Ethernet Controller",
+.bind = ax88179_bind,
+.unbind = ax88179_unbind,
+.status = ax88179_status,
+.link_reset = ax88179_link_reset,
+.reset = ax88179_reset,
+.stop = ax88179_stop,
+.flags = FLAG_ETHER | FLAG_FRAMING_AX,
+.rx_fixup = ax88179_rx_fixup,
+.tx_fixup = ax88179_tx_fixup,
+};
+
 static const struct driver_info dlink_dub1312_info = {
 .description = "D-Link DUB-1312 USB 3.0 to Gigabit Ethernet Adapter",
 .bind = ax88179_bind,
@@ -1718,6 +1731,10 @@ static const struct usb_device_id products[] = {
 USB_DEVICE(0x0b95, 0x178a),
 .driver_info = (unsigned long)&ax88178a_info,
 }, {
+/* Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller */
+USB_DEVICE(0x04b4, 0x3610),
+.driver_info = (unsigned long)&cypress_GX3_info,
+}, {
 /* D-Link DUB-1312 USB 3.0 to Gigabit Ethernet Adapter */
 USB_DEVICE(0x2001, 0x4a00),
 .driver_info = (unsigned long)&dlink_dub1312_info,
-- 
2.7.4


[PATCH net,v2] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Stephen Suryaputra Lin
ICMP redirect behavior is different after the commit above. An email
requesting an explanation of why the behavior needs to be different
was sent earlier to netdev (https://patchwork.ozlabs.org/patch/687728/).
Since there has been no reply yet, I decided to prepare this formal patch.

In v2.6 kernels, ip_rt_redirect() calls arp_bind_neighbour(), which
returns 0, and then the state of the neigh for the new_gw is checked.
If the state isn't valid, the redirected route is deleted. This
behavior is maintained up to v3.5.7 by check_peer_redirect(), because
rt->rt_gateway is set to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
being set to the new_gw. When rt_gateway (old_gw) isn't zero, the
function uses it as the key. That neigh is most likely valid, since the
old_gw is the one that sent the ICMP redirect message. The new_gw is then
assigned to fib_nh_exception. The problem is that the new_gw ARP may never
get resolved, and the traffic is blackholed.
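
To make the keying issue concrete, here is a small userspace model of "look up the neighbour for new_gw, create it if missing, and only accept the redirect once that entry is valid". It is a sketch under assumptions, not the kernel neighbour code: `struct toy_neigh`, `toy_table`, `toy_lookup()`, `toy_create()` and `toy_redirect_ok()` are hypothetical, and `TOY_NUD_VALID` only mimics the idea of NUD_VALID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUD_VALID  0x1
#define TOY_TABLE_SIZE 8

/* Hypothetical toy neighbour entry, keyed by IPv4 address. */
struct toy_neigh {
	uint32_t key;
	unsigned int state;
	int in_use;
};

static struct toy_neigh toy_table[TOY_TABLE_SIZE];

static struct toy_neigh *toy_lookup(uint32_t key)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		if (toy_table[i].in_use && toy_table[i].key == key)
			return &toy_table[i];
	return NULL;
}

/* New entries start unresolved (state 0) until something validates them. */
static struct toy_neigh *toy_create(uint32_t key)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!toy_table[i].in_use) {
			toy_table[i] = (struct toy_neigh){ .key = key, .in_use = 1 };
			return &toy_table[i];
		}
	}
	return NULL; /* table full */
}

/* Return 1 if a redirect to new_gw may be installed, 0 otherwise. */
static int toy_redirect_ok(uint32_t new_gw)
{
	struct toy_neigh *n;

	assert(new_gw != 0);   /* a zero address is not a gateway in this model */
	n = toy_lookup(new_gw);
	if (!n)
		n = toy_create(new_gw);
	if (!n)
		return 0;
	return (n->state & TOY_NUD_VALID) != 0;
}
```

In this model a valid entry for the old gateway has no influence on the decision, because the lookup is keyed on new_gw alone.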

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Signed-off-by: Stephen Suryaputra Lin 
---
 net/ipv4/route.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..2a57566e6e91 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
goto reject_redirect;
}
 
-   n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+   n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+   if (!n)
+   n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
if (!IS_ERR(n)) {
if (!(n->nud_state & NUD_VALID)) {
neigh_event_send(n, NULL);
-- 
2.7.4



net/l2tp: use-after-free write in l2tp_ip6_close

2016-11-07 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

==
BUG: KASAN: use-after-free in l2tp_ip6_close+0x239/0x2a0 at addr
8800677276d8
Write of size 8 by task a.out/8668
CPU: 0 PID: 8668 Comm: a.out Not tainted 4.9.0-rc4+ #354
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 8800694d7b00 81b46a64 88006adb5780 8800677276c0
 880067727c68 8800677276c0 8800694d7b28 8150a86c
 8800694d7bb8 88006adb5780 8800e77276d8 8800694d7ba8
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
 [< inline >] print_address_description mm/kasan/report.c:194
 [] kasan_report_error+0x1f7/0x4d0 mm/kasan/report.c:283
 [< inline >] kasan_report mm/kasan/report.c:303
 [] __asan_report_store8_noabort+0x3e/0x40
mm/kasan/report.c:329
 [< inline >] __write_once_size ./include/linux/compiler.h:272
 [< inline >] __hlist_del ./include/linux/list.h:622
 [< inline >] hlist_del_init ./include/linux/list.h:637
 [] l2tp_ip6_close+0x239/0x2a0 net/l2tp/l2tp_ip6.c:239
 [] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
 [] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
 [] sock_release+0x8e/0x1d0 net/socket.c:570
 [] sock_close+0x16/0x20 net/socket.c:1017
 [] __fput+0x29d/0x720 fs/file_table.c:208
 [] fput+0x15/0x20 fs/file_table.c:244
 [] task_work_run+0xf8/0x170 kernel/task_work.c:116
 [< inline >] exit_task_work ./include/linux/task_work.h:21
 [] do_exit+0x883/0x2ac0 kernel/exit.c:828
 [] do_group_exit+0x10e/0x340 kernel/exit.c:931
 [< inline >] SYSC_exit_group kernel/exit.c:942
 [] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Object at 8800677276c0, in cache L2TP/IPv6 size: 1448
Allocated:
PID = 8692
[] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
[] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
[< inline >] set_track mm/kasan/kasan.c:507
[] kasan_kmalloc+0xab/0xe0 mm/kasan/kasan.c:598
[] kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:537
[< inline >] slab_post_alloc_hook mm/slab.h:417
[< inline >] slab_alloc_node mm/slub.c:2708
[< inline >] slab_alloc mm/slub.c:2716
[] kmem_cache_alloc+0xb4/0x270 mm/slub.c:2721
[] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1327
[] sk_alloc+0x38/0xaf0 net/core/sock.c:1389
[] inet6_create+0x2e5/0xf60 net/ipv6/af_inet6.c:182
[] __sock_create+0x37f/0x640 net/socket.c:1153
[< inline >] sock_create net/socket.c:1193
[< inline >] SYSC_socket net/socket.c:1223
[] SyS_socket+0xf0/0x1b0 net/socket.c:1203
[] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Freed:
PID = 8668
[] save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
[] save_stack+0x46/0xd0 mm/kasan/kasan.c:495
[< inline >] set_track mm/kasan/kasan.c:507
[] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
[< inline >] slab_free_hook mm/slub.c:1352
[< inline >] slab_free_freelist_hook mm/slub.c:1374
[< inline >] slab_free mm/slub.c:2951
[] kmem_cache_free+0xb3/0x2c0 mm/slub.c:2973
[< inline >] sk_prot_free net/core/sock.c:1370
[] __sk_destruct+0x319/0x480 net/core/sock.c:1445
[] sk_destruct+0x44/0x80 net/core/sock.c:1453
[] __sk_free+0x54/0x230 net/core/sock.c:1461
[] sk_free+0x23/0x30 net/core/sock.c:1472
[< inline >] sock_put ./include/net/sock.h:1591
[] sk_common_release+0x294/0x3e0 net/core/sock.c:2745
[] l2tp_ip6_close+0x209/0x2a0 net/l2tp/l2tp_ip6.c:243
[] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
[] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
[] sock_release+0x8e/0x1d0 net/socket.c:570
[] sock_close+0x16/0x20 net/socket.c:1017
[] __fput+0x29d/0x720 fs/file_table.c:208
[] fput+0x15/0x20 fs/file_table.c:244
[] task_work_run+0xf8/0x170 kernel/task_work.c:116
[< inline >] exit_task_work ./include/linux/task_work.h:21
[] do_exit+0x883/0x2ac0 kernel/exit.c:828
[] do_group_exit+0x10e/0x340 kernel/exit.c:931
[< inline >] SYSC_exit_group kernel/exit.c:942
[] SyS_exit_group+0x1d/0x20 kernel/exit.c:940
[] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Memory state around the buggy address:
 880067727580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 880067727600: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
>880067727680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
 880067727700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 880067727780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==

To reproduce run the attached program in a tight parallel loop using
stress (https://godoc.org/golang.org/x/tools/cmd/stress):
$ gcc -lpthread tmp.c

Re: [PATCH v3 2/4] kconfig: regenerate *.c_shipped files after previous changes

2016-11-07 Thread Nicolas Pitre
On Mon, 7 Nov 2016, Josh Triplett wrote:

> [snipping large patch]
> 
> One suggestion that might make this patch easier to review: you might
> consider first regenerating the unchanged parser with Bison 3.0.4, then
> regenerating it again after the "imply" change.  I think that'd
> eliminate quite a lot of noise in this patch.

I tried that. This made two large patches instead of just one, both 
equally obscure.

So this patch stands on its own, containing changes that are 
mechanically generated and therefore shouldn't require manual review.


Nicolas


Re: [PATCH v3 0/4] make POSIX timers optional with some Kconfig help

2016-11-07 Thread Nicolas Pitre
On Mon, 7 Nov 2016, Nicolas Pitre wrote:

> Many embedded systems don't need the full POSIX timer support.
> Configuring them out provides a nice kernel image size reduction.
> 
> When POSIX timers are configured out, the PTP clock subsystem should be
> left out as well. However a bunch of ethernet drivers currently *select*
> the latter in their Kconfig entries. Therefore some more work was needed
> to break that hard dependency from those drivers without preventing their
> usage altogether.
> 
> Therefore this series also includes kconfig changes to implement a new
> keyword to express some reverse dependencies like "select" does, named
> "imply", and still allowing for the target config symbol to be disabled
> if the user or a direct dependency says so.
> 
> At this point I'd like to gather ACKs especially from people in the "To"
> field. Ideally this would need to go upstream as a single series to avoid
> cross subsystem dependency issues.  So far it was suggested that this
> should go via the kbuild tree.

This is also available here for those who prefer a git tree:

git://git.linaro.org/people/nicolas.pitre/linux/ configurable_posix_timers


Nicolas


Re: [PATCH v3 2/4] kconfig: regenerate *.c_shipped files after previous changes

2016-11-07 Thread Josh Triplett
[snipping large patch]

One suggestion that might make this patch easier to review: you might
consider first regenerating the unchanged parser with Bison 3.0.4, then
regenerating it again after the "imply" change.  I think that'd
eliminate quite a lot of noise in this patch.

- Josh Triplett


[PATCH] rtnl: reset calcit fptr in rtnl_unregister()

2016-11-07 Thread Mathias Krause
To avoid having dangling function pointers left behind, reset calcit in
rtnl_unregister(), too.

This is no issue so far, as only the rtnl core registers a netlink
handler with a calcit hook which won't be unregistered, but may become
one if new code makes use of the calcit hook.
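
As an illustration of the dangling-pointer concern, here is a userspace sketch of a handler table whose unregister path clears every callback. The names (`struct toy_link`, `toy_handlers`, `toy_register()`, `toy_unregister()`, `toy_noop()`) are hypothetical and the table is far simpler than the rtnetlink one.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_MSGTYPES 4

typedef int (*toy_fn)(void);

/* Hypothetical per-message handler slot with three callbacks. */
struct toy_link {
	toy_fn doit;
	toy_fn dumpit;
	toy_fn calcit;
};

static struct toy_link toy_handlers[TOY_NR_MSGTYPES];

static int toy_register(size_t idx, toy_fn doit, toy_fn dumpit, toy_fn calcit)
{
	assert(idx < TOY_NR_MSGTYPES);
	toy_handlers[idx] = (struct toy_link){ doit, dumpit, calcit };
	return 0;
}

/* Clear all three pointers so no stale callback survives unregistration. */
static int toy_unregister(size_t idx)
{
	assert(idx < TOY_NR_MSGTYPES);
	toy_handlers[idx].doit = NULL;
	toy_handlers[idx].dumpit = NULL;
	toy_handlers[idx].calcit = NULL;
	return 0;
}

/* Trivial callback used only to populate a slot in the sketch. */
static int toy_noop(void)
{
	return 0;
}
```

If `toy_unregister()` skipped `calcit`, a later caller could still reach a function that belongs to code which has since gone away.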

Fixes: c7ac8679bec9 ("rtnetlink: Compute and store minimum ifinfo...")
Cc: Jeff Kirsher 
Cc: Greg Rose 
Signed-off-by: Mathias Krause 
---
 net/core/rtnetlink.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 189cc78c77eb..d4c601604bf7 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -275,6 +275,7 @@ int rtnl_unregister(int protocol, int msgtype)
 
rtnl_msg_handlers[protocol][msgindex].doit = NULL;
rtnl_msg_handlers[protocol][msgindex].dumpit = NULL;
+   rtnl_msg_handlers[protocol][msgindex].calcit = NULL;
 
return 0;
 }
-- 
1.7.10.4



Re: [PATCH] vxlan: hide unused local variable

2016-11-07 Thread Arnd Bergmann
On Monday, November 7, 2016 2:16:30 PM CET Pravin Shelar wrote:
> On Monday, November 7, 2016, Arnd Bergmann  wrote:
> 
> > A bugfix introduced a harmless warning in v4.9-rc4:
> >
> > drivers/net/vxlan.c: In function 'vxlan_group_used':
> > drivers/net/vxlan.c:947:21: error: unused variable 'sock6'
> > [-Werror=unused-variable]
> >
> > This hides the variable inside of the same #ifdef that is
> > around its user. The extraneous initialization is removed
> > at the same time, it was accidentally introduced in the
> > same commit.
> >
> > Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
> > Signed-off-by: Arnd Bergmann
> > ---
> 
> 
> I have already submitted a patch to fix this issue.
> 
> https://patchwork.ozlabs.org/patch/691588/

You have tagged those seven patches for net-next which seems
appropriate, but as I wrote above the commit that introduced
it was merged between -rc3 and -rc4, so I think we still need a
fix for v4.9, right?

Arnd


[PATCH v3 2/4] kconfig: regenerate *.c_shipped files after previous changes

2016-11-07 Thread Nicolas Pitre
Signed-off-by: Nicolas Pitre 
---
 scripts/kconfig/zconf.hash.c_shipped |   30 +-
 scripts/kconfig/zconf.tab.c_shipped  | 1581 --
 2 files changed, 753 insertions(+), 858 deletions(-)

diff --git a/scripts/kconfig/zconf.hash.c_shipped b/scripts/kconfig/zconf.hash.c_shipped
index 360a62df2b..d51b15de07 100644
--- a/scripts/kconfig/zconf.hash.c_shipped
+++ b/scripts/kconfig/zconf.hash.c_shipped
@@ -55,10 +55,10 @@ kconf_id_hash (register const char *str, register unsigned int len)
   73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
   73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
   73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
-  73, 73, 73, 73, 73, 73, 73,  5, 25, 25,
+  73, 73, 73, 73, 73, 73, 73, 10, 25, 25,
0,  0,  0,  5,  0,  0, 73, 73,  5,  0,
   10,  5, 45, 73, 20, 20,  0, 15, 15, 73,
-  20,  5, 73, 73, 73, 73, 73, 73, 73, 73,
+  20,  0, 73, 73, 73, 73, 73, 73, 73, 73,
   73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
   73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
   73, 73, 73, 73, 73, 73, 73, 73, 73, 73,
@@ -120,6 +120,7 @@ struct kconf_id_strings_t
 char kconf_id_strings_str43[sizeof("hex")];
 char kconf_id_strings_str46[sizeof("config")];
 char kconf_id_strings_str47[sizeof("boolean")];
+char kconf_id_strings_str50[sizeof("imply")];
 char kconf_id_strings_str51[sizeof("string")];
 char kconf_id_strings_str54[sizeof("help")];
 char kconf_id_strings_str56[sizeof("prompt")];
@@ -157,6 +158,7 @@ static const struct kconf_id_strings_t kconf_id_strings_contents =
 "hex",
 "config",
 "boolean",
+"imply",
 "string",
 "help",
 "prompt",
@@ -174,7 +176,7 @@ kconf_id_lookup (register const char *str, register unsigned int len)
 {
   enum
 {
-  TOTAL_KEYWORDS = 34,
+  TOTAL_KEYWORDS = 35,
   MIN_WORD_LENGTH = 2,
   MAX_WORD_LENGTH = 14,
   MIN_HASH_VALUE = 2,
@@ -205,15 +207,15 @@ kconf_id_lookup (register const char *str, register unsigned int len)
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str12, T_DEFAULT,  TF_COMMAND, S_TRISTATE},
 #line 36 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str13, T_DEFAULT,  TF_COMMAND, S_BOOLEAN},
-#line 46 "scripts/kconfig/zconf.gperf"
+#line 47 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str14, T_OPT_DEFCONFIG_LIST,TF_OPTION},
   {-1}, {-1},
-#line 44 "scripts/kconfig/zconf.gperf"
+#line 45 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str17, T_ON,   TF_PARAM},
 #line 29 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str18, T_OPTIONAL, TF_COMMAND},
   {-1}, {-1},
-#line 43 "scripts/kconfig/zconf.gperf"
+#line 44 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str21, T_OPTION,   TF_COMMAND},
 #line 17 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str22, T_ENDMENU,  TF_COMMAND},
@@ -223,9 +225,9 @@ kconf_id_lookup (register const char *str, register unsigned int len)
 #line 23 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str25, T_MENUCONFIG,   TF_COMMAND},
   {-1},
-#line 45 "scripts/kconfig/zconf.gperf"
+#line 46 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str27, T_OPT_MODULES,  TF_OPTION},
-#line 48 "scripts/kconfig/zconf.gperf"
+#line 49 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str28, T_OPT_ALLNOCONFIG_Y,TF_OPTION},
 #line 16 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str29, T_MENU, TF_COMMAND},
@@ -234,10 +236,10 @@ kconf_id_lookup (register const char *str, register unsigned int len)
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str31, T_SELECT,   TF_COMMAND},
 #line 21 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str32, T_COMMENT,  TF_COMMAND},
-#line 47 "scripts/kconfig/zconf.gperf"
+#line 48 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str33, T_OPT_ENV,  TF_OPTION},
   {-1},
-#line 41 "scripts/kconfig/zconf.gperf"
+#line 42 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str35, T_RANGE,TF_COMMAND},
 #line 19 "scripts/kconfig/zconf.gperf"
   {(int)(long)&((struct kconf_id_strings_t *)0)->kconf_id_strings_str36, T_CHOICE,   TF_COMMAND},
@@ -247,7 +249,7 @@ 

[PATCH v3 4/4] posix-timers: make it configurable

2016-11-07 Thread Nicolas Pitre
Some embedded systems have no use for POSIX timers.  Configuring them
out removes about 22KB from the kernel binary.

Corresponding syscalls are routed to a stub that logs the attempt to
use them, which should be enough of a clue if they were disabled
without proper consideration. They are: timer_create, timer_gettime,
timer_getoverrun, timer_settime, timer_delete, clock_adjtime.

The clock_settime, clock_gettime, clock_getres and clock_nanosleep
syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
majority of use cases with very little code.
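
The wrapper idea described above can be sketched in userspace as follows. This is only an approximation on Linux with glibc, not the contents of posix-stubs.c: `toy_clock_gettime()` is a hypothetical name, and it simply forwards the three supported clock IDs to the C library while rejecting everything else.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical reduced clock_gettime(): only the basic clocks are
 * accepted, every other clock ID is rejected with -EINVAL.
 * Returns 0 on success or a negative errno value.
 */
static int toy_clock_gettime(clockid_t which, struct timespec *tp)
{
	assert(tp != NULL);

	switch (which) {
	case CLOCK_REALTIME:
	case CLOCK_MONOTONIC:
#ifdef CLOCK_BOOTTIME
	case CLOCK_BOOTTIME:
#endif
		return clock_gettime(which, tp) == 0 ? 0 : -errno;
	default:
		return -EINVAL;
	}
}
```

A per-process CPU-time clock such as CLOCK_PROCESS_CPUTIME_ID falls into the default branch here, which mirrors the "basic clocks only" trade-off the commit message describes.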

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
Acked-by: Richard Cochran 
---
 drivers/ptp/Kconfig  |   2 +-
 include/linux/posix-timers.h |  28 +-
 include/linux/sched.h|  10 
 init/Kconfig |  17 +++
 kernel/signal.c  |   4 ++
 kernel/time/Makefile |  10 +++-
 kernel/time/posix-stubs.c| 118 +++
 7 files changed, 184 insertions(+), 5 deletions(-)
 create mode 100644 kernel/time/posix-stubs.c

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index 0f7492f8ea..bdce332911 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -6,7 +6,7 @@ menu "PTP clock support"
 
 config PTP_1588_CLOCK
tristate "PTP clock support"
-   depends on NET
+   depends on NET && POSIX_TIMERS
select PPS
select NET_PTP_CLASSIFY
help
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 62d44c1760..2288c5c557 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -118,6 +118,8 @@ struct k_clock {
 extern struct k_clock clock_posix_cpu;
 extern struct k_clock clock_posix_dynamic;
 
+#ifdef CONFIG_POSIX_TIMERS
+
 void posix_timers_register_clock(const clockid_t clock_id, struct k_clock *new_clock);
 
 /* function to call to trigger timer event */
@@ -131,8 +133,30 @@ void posix_cpu_timers_exit_group(struct task_struct *task);
 void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
   cputime_t *newval, cputime_t *oldval);
 
-long clock_nanosleep_restart(struct restart_block *restart_block);
-
 void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
+#else
+
+#include <linux/random.h>
+
+static inline void posix_timers_register_clock(const clockid_t clock_id,
+  struct k_clock *new_clock) {}
+static inline int posix_timer_event(struct k_itimer *timr, int si_private)
+{ return 0; }
+static inline void run_posix_cpu_timers(struct task_struct *task) {}
+static inline void posix_cpu_timers_exit(struct task_struct *task)
+{
+   add_device_randomness((const void*) &task->se.sum_exec_runtime,
+ sizeof(unsigned long long));
+}
+static inline void posix_cpu_timers_exit_group(struct task_struct *task) {}
+static inline void set_process_cpu_timer(struct task_struct *task,
+   unsigned int clock_idx, cputime_t *newval, cputime_t *oldval) {}
+static inline void update_rlimit_cpu(struct task_struct *task,
+unsigned long rlim_new) {}
+
+#endif
+
+long clock_nanosleep_restart(struct restart_block *restart_block);
+
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b0ec..ad716d5559 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2946,8 +2946,13 @@ static inline void exit_thread(struct task_struct *tsk)
 extern void exit_files(struct task_struct *);
 extern void __cleanup_sighand(struct sighand_struct *);
 
+#ifdef CONFIG_POSIX_TIMERS
 extern void exit_itimers(struct signal_struct *);
 extern void flush_itimer_signals(void);
+#else
+static inline void exit_itimers(struct signal_struct *s) {}
+static inline void flush_itimer_signals(void) {}
+#endif
 
 extern void do_group_exit(int);
 
@@ -3450,7 +3455,12 @@ static __always_inline bool need_resched(void)
  * Thread group CPU time accounting.
  */
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times);
+#ifdef CONFIG_POSIX_TIMERS
 void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times);
+#else
+static inline void thread_group_cputimer(struct task_struct *tsk,
+struct task_cputime *times) {}
+#endif
 
 /*
  * Reevaluate whether the task has signals pending delivery.
diff --git a/init/Kconfig b/init/Kconfig
index 34407f15e6..f430f776e8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1445,6 +1445,23 @@ config SYSCTL_SYSCALL
 
  If unsure say N here.
 
+config POSIX_TIMERS
+   bool "Posix Clocks & timers" if EXPERT
+   default y
+   help
+ This includes native support for POSIX timers to the kernel.
+ Some embedded systems have no use for them and 

[PATCH v3 3/4] ptp_clock: allow for it to be optional

2016-11-07 Thread Nicolas Pitre
In order to break the hard dependency between the PTP clock subsystem and
ethernet drivers capable of being clock providers, this patch provides
simple PTP stub functions to allow linkage of those drivers into the
kernel even when the PTP subsystem is configured out. Drivers must be
ready to accept NULL from ptp_clock_register() in that case.
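
A minimal sketch of the stub pattern, assuming a compile-time switch: everything here is hypothetical (`TOY_HAVE_PTP`, `struct toy_ptp_clock`, `toy_ptp_clock_register()`, `struct toy_driver`), and the real ptp_clock_register() takes arguments that this model omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock object handed out by the optional subsystem. */
struct toy_ptp_clock {
	int index;
};

#ifdef TOY_HAVE_PTP
static struct toy_ptp_clock toy_clock_instance = { .index = 0 };

static inline struct toy_ptp_clock *toy_ptp_clock_register(void)
{
	return &toy_clock_instance;
}
#else
/* Stub used when the subsystem is configured out: no clock available. */
static inline struct toy_ptp_clock *toy_ptp_clock_register(void)
{
	return NULL;
}
#endif

struct toy_driver {
	struct toy_ptp_clock *ptp;   /* NULL when PTP is unavailable */
	int probed;
};

static int toy_driver_probe(struct toy_driver *drv)
{
	assert(drv != NULL);
	drv->ptp = toy_ptp_clock_register();
	/* A NULL clock is not an error: the driver just runs without PTP. */
	drv->probed = 1;
	return 0;
}

static int toy_driver_has_ptp(const struct toy_driver *drv)
{
	return drv->ptp != NULL;
}
```

Built without `TOY_HAVE_PTP`, the driver still links and probes successfully; it merely reports that no clock is attached.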

To make it possible for PTP to be configured out, the select statement
in those drivers' Kconfig menu entries is converted to the new "imply"
statement. This way the PTP subsystem may have Kconfig dependencies of
its own, such as POSIX_TIMERS, without making those ethernet drivers
unavailable if POSIX timers are configured out. And when support for
POSIX timers is selected again, the default config option for PTP clock
support is automatically adjusted accordingly.

The pch_gbe driver is a bit special as it relies on extra code in
drivers/ptp/ptp_pch.c. Therefore we let the make process descend into
drivers/ptp/ even if PTP_1588_CLOCK is unselected.

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
Acked-by: Richard Cochran 
---
 drivers/Makefile|  2 +-
 drivers/net/ethernet/adi/Kconfig|  2 +-
 drivers/net/ethernet/amd/Kconfig|  2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-main.c   |  6 ++-
 drivers/net/ethernet/broadcom/Kconfig   |  4 +-
 drivers/net/ethernet/cavium/Kconfig |  2 +-
 drivers/net/ethernet/freescale/Kconfig  |  2 +-
 drivers/net/ethernet/intel/Kconfig  | 10 ++--
 drivers/net/ethernet/mellanox/mlx4/Kconfig  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |  2 +-
 drivers/net/ethernet/renesas/Kconfig|  2 +-
 drivers/net/ethernet/samsung/Kconfig|  2 +-
 drivers/net/ethernet/sfc/Kconfig|  2 +-
 drivers/net/ethernet/stmicro/stmmac/Kconfig |  2 +-
 drivers/net/ethernet/ti/Kconfig |  2 +-
 drivers/net/ethernet/tile/Kconfig   |  2 +-
 drivers/ptp/Kconfig |  8 +--
 include/linux/ptp_clock_kernel.h| 65 -
 18 files changed, 69 insertions(+), 50 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index f0afdfb3c7..8cfa1ff8f6 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -107,7 +107,7 @@ obj-$(CONFIG_INPUT) += input/
 obj-$(CONFIG_RTC_LIB)  += rtc/
 obj-y  += i2c/ media/
 obj-$(CONFIG_PPS)  += pps/
-obj-$(CONFIG_PTP_1588_CLOCK)   += ptp/
+obj-y  += ptp/
 obj-$(CONFIG_W1)   += w1/
 obj-y  += power/
 obj-$(CONFIG_HWMON)+= hwmon/
diff --git a/drivers/net/ethernet/adi/Kconfig b/drivers/net/ethernet/adi/Kconfig
index 6b94ba6103..98cc8f5350 100644
--- a/drivers/net/ethernet/adi/Kconfig
+++ b/drivers/net/ethernet/adi/Kconfig
@@ -58,7 +58,7 @@ config BFIN_RX_DESC_NUM
 config BFIN_MAC_USE_HWSTAMP
bool "Use IEEE 1588 hwstamp"
depends on BFIN_MAC && BF518
-   select PTP_1588_CLOCK
+   imply PTP_1588_CLOCK
default y
---help---
  To support the IEEE 1588 Precision Time Protocol (PTP), select y here
diff --git a/drivers/net/ethernet/amd/Kconfig b/drivers/net/ethernet/amd/Kconfig
index 0038709fd3..713ea7ad22 100644
--- a/drivers/net/ethernet/amd/Kconfig
+++ b/drivers/net/ethernet/amd/Kconfig
@@ -177,7 +177,7 @@ config AMD_XGBE
depends on ARM64 || COMPILE_TEST
select BITREVERSE
select CRC32
-   select PTP_1588_CLOCK
+   imply PTP_1588_CLOCK
---help---
  This driver supports the AMD 10GbE Ethernet device found on an
  AMD SoC.
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-main.c b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
index 9de078819a..e10e569c0d 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-main.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-main.c
@@ -773,7 +773,8 @@ static int xgbe_probe(struct platform_device *pdev)
goto err_wq;
}
 
-   xgbe_ptp_register(pdata);
+   if (IS_REACHABLE(CONFIG_PTP_1588_CLOCK))
+   xgbe_ptp_register(pdata);
 
xgbe_debugfs_init(pdata);
 
@@ -812,7 +813,8 @@ static int xgbe_remove(struct platform_device *pdev)
 
xgbe_debugfs_exit(pdata);
 
-   xgbe_ptp_unregister(pdata);
+   if (IS_REACHABLE(CONFIG_PTP_1588_CLOCK))
+   xgbe_ptp_unregister(pdata);
 
flush_workqueue(pdata->an_workqueue);
destroy_workqueue(pdata->an_workqueue);
diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index bd8c80c0b7..6a8d74aeb6 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -110,7 +110,7 @@ config TIGON3
depends on PCI
select PHYLIB
select HWMON
-   select 

[PATCH v3 1/4] kconfig: introduce the "imply" keyword

2016-11-07 Thread Nicolas Pitre
The "imply" keyword is a weak version of "select" where the target
config symbol can still be turned off, avoiding those pitfalls that come
with the "select" keyword.

This is useful e.g. with multiple drivers that want to indicate their
ability to hook into a secondary subsystem while allowing the user to
configure that subsystem out without also having to unset these drivers.

Currently, the same effect can almost be achieved with:

config DRIVER_A
tristate

config DRIVER_B
tristate

config DRIVER_C
tristate

config DRIVER_D
tristate

[...]

config SUBSYSTEM_X
tristate
default DRIVER_A || DRIVER_B || DRIVER_C || DRIVER_D || [...]

This is unwieldy to maintain especially with a large number of drivers.
Furthermore, there is no easy way to restrict the choice for SUBSYSTEM_X
to y or n, excluding m, when some drivers are built-in. The "select"
keyword allows for excluding m, but it excludes n as well. Hence
this "imply" keyword.  The above becomes:

config DRIVER_A
tristate
imply SUBSYSTEM_X

config DRIVER_B
tristate
imply SUBSYSTEM_X

[...]

config SUBSYSTEM_X
tristate

This is much cleaner, and way more flexible than "select". SUBSYSTEM_X
can still be configured out, and it can be set as a module when none of
the drivers are configured in or all of them are modular.
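
To illustrate the difference in a few lines, here is a toy model of tristate values, not the kconfig implementation. The names (`enum toy_tristate`, `toy_implied_default()`, `toy_user_may_set_n()`) are hypothetical, and taking the minimum of the implying symbol and the direct dependency is an assumption that happens to match the four rows of the table in the documentation hunk of this patch; cases outside that table are not claimed.

```c
#include <assert.h>

enum toy_tristate { TOY_N = 0, TOY_M = 1, TOY_Y = 2 };

static enum toy_tristate toy_min(enum toy_tristate a, enum toy_tristate b)
{
	return a < b ? a : b;
}

/*
 * Default of an implied symbol in this model: the value of the implying
 * symbol, capped by the implied symbol's direct dependency (assumption).
 */
static enum toy_tristate toy_implied_default(enum toy_tristate implier,
					     enum toy_tristate dep)
{
	assert(implier <= TOY_Y && dep <= TOY_Y);
	return toy_min(implier, dep);
}

/*
 * With "select", an enabled selector forbids n on the target;
 * with "imply", the user may still configure the target out.
 */
static int toy_user_may_set_n(int is_select, enum toy_tristate selector)
{
	return !(is_select && selector != TOY_N);
}
```

The second helper captures the point of the keyword: the default follows the drivers, but n remains a legal choice for the subsystem.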

Signed-off-by: Nicolas Pitre 
Reviewed-by: Josh Triplett 
Acked-by: Richard Cochran 
---
 Documentation/kbuild/kconfig-language.txt | 29 
 scripts/kconfig/expr.h|  2 ++
 scripts/kconfig/menu.c| 55 ++-
 scripts/kconfig/symbol.c  | 24 +-
 scripts/kconfig/zconf.gperf   |  1 +
 scripts/kconfig/zconf.y   | 16 +++--
 6 files changed, 108 insertions(+), 19 deletions(-)

diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index 069fcb3eef..262722d886 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -113,6 +113,34 @@ applicable everywhere (see syntax).
That will limit the usefulness but on the other hand avoid
the illegal configurations all over.
 
+- weak reverse dependencies: "imply" <symbol> ["if" <expr>]
+  This is similar to "select" as it enforces a lower limit on another
+  symbol except that the "implied" symbol's value may still be set to n
+  from a direct dependency or with a visible prompt.
+
+  Given the following example:
+
+  config FOO
+   tristate
+   imply BAZ
+
+  config BAZ
+   tristate
+   depends on BAR
+
+  The following values are possible:
+
+   FOO   BAR   BAZ's default   choice for BAZ
+   ---   ---   -------------   --------------
+   n     y     n               N/m/y
+   m     y     m               M/y/n
+   y     y     y               Y/n
+   y     n     *               N
+
+  This is useful e.g. with multiple drivers that want to indicate their
+  ability to hook into a secondary subsystem while allowing the user to
+  configure that subsystem out without also having to unset these drivers.
+
 - limiting menu display: "visible if" <expr>
   This attribute is only applicable to menu blocks, if the condition is
   false, the menu block is not displayed to the user (the symbols
@@ -481,6 +509,7 @@ historical issues resolved through these different solutions.
   b) Match dependency semantics:
b1) Swap all "select FOO" to "depends on FOO" or,
b2) Swap all "depends on FOO" to "select FOO"
+  c) Consider the use of "imply" instead of "select"
 
 The resolution to a) can be tested with the sample Kconfig file
 Documentation/kbuild/Kconfig.recursion-issue-01 through the removal
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 973b6f7333..a73f762c48 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -85,6 +85,7 @@ struct symbol {
struct property *prop;
struct expr_value dir_dep;
struct expr_value rev_dep;
+   struct expr_value implied;
 };
 
 #define for_all_symbols(i, sym) for (i = 0; i < SYMBOL_HASHSIZE; i++) for (sym = symbol_hash[i]; sym; sym = sym->next) if (sym->type != S_OTHER)
@@ -136,6 +137,7 @@ enum prop_type {
P_DEFAULT,  /* default y */
P_CHOICE,   /* choice value */
P_SELECT,   /* select BAR */
+   P_IMPLY,/* imply BAR */
P_RANGE,/* range 7..100 (for a symbol) */
P_ENV,  /* value from environment variable */
P_SYMBOL,   /* where a symbol is defined */
diff --git a/scripts/kconfig/menu.c b/scripts/kconfig/menu.c
index aed678e8a7..e9357931b4 100644
--- a/scripts/kconfig/menu.c
+++ b/scripts/kconfig/menu.c
@@ -233,6 +233,8 @@ 

[PATCH v3 0/4] make POSIX timers optional with some Kconfig help

2016-11-07 Thread Nicolas Pitre
Many embedded systems don't need the full POSIX timer support.
Configuring them out provides a nice kernel image size reduction.

When POSIX timers are configured out, the PTP clock subsystem should be
left out as well. However a bunch of ethernet drivers currently *select*
the latter in their Kconfig entries. Therefore some more work was needed
to break that hard dependency from those drivers without preventing their
usage altogether.

Therefore this series also includes kconfig changes to implement a new
keyword to express some reverse dependencies like "select" does, named
"imply", and still allowing for the target config symbol to be disabled
if the user or a direct dependency says so.

At this point I'd like to gather ACKs especially from people in the "To"
field. Ideally this would need to go upstream as a single series to avoid
cross subsystem dependency issues.  So far it was suggested that this should go
via the kbuild tree.

Changes from v2:

- Dropped the patch adding the "suggest" keyword as nothing uses it.
  Requested by Paul Bolle.
- Various documentation and commit log clarifications, prompted also
  by Paul Bolle.
- Collected more ACKs.

Changes from v1:

- added "suggest" to kconfig for completeness
- various typo fixes
- small "imply" effect visibility fix

The bulk of the diffstat comes from the kconfig lex parser regeneration.

Diffstat:

 Documentation/kbuild/kconfig-language.txt   |   29 +
 drivers/Makefile|2 +-
 drivers/net/ethernet/adi/Kconfig|2 +-
 drivers/net/ethernet/amd/Kconfig|2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-main.c   |6 +-
 drivers/net/ethernet/broadcom/Kconfig   |4 +-
 drivers/net/ethernet/cavium/Kconfig |2 +-
 drivers/net/ethernet/freescale/Kconfig  |2 +-
 drivers/net/ethernet/intel/Kconfig  |   10 +-
 drivers/net/ethernet/mellanox/mlx4/Kconfig  |2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig |2 +-
 drivers/net/ethernet/renesas/Kconfig|2 +-
 drivers/net/ethernet/samsung/Kconfig|2 +-
 drivers/net/ethernet/sfc/Kconfig|2 +-
 drivers/net/ethernet/stmicro/stmmac/Kconfig |2 +-
 drivers/net/ethernet/ti/Kconfig |2 +-
 drivers/net/ethernet/tile/Kconfig   |2 +-
 drivers/ptp/Kconfig |   10 +-
 include/linux/posix-timers.h|   28 +-
 include/linux/ptp_clock_kernel.h|   65 +-
 include/linux/sched.h   |   10 +
 init/Kconfig|   17 +
 kernel/signal.c |4 +
 kernel/time/Makefile|   10 +-
 kernel/time/posix-stubs.c   |  118 ++
 scripts/kconfig/expr.h  |2 +
 scripts/kconfig/menu.c  |   55 +-
 scripts/kconfig/symbol.c|   24 +-
 scripts/kconfig/zconf.gperf |1 +
 scripts/kconfig/zconf.hash.c_shipped|   30 +-
 scripts/kconfig/zconf.tab.c_shipped | 1581 -
 scripts/kconfig/zconf.y |   16 +-
 32 files changed, 1114 insertions(+), 932 deletions(-)


net/sctp: null-ptr-deref in sctp_inet_listen

2016-11-07 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 3851 Comm: a.out Not tainted 4.9.0-rc4+ #354
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: 880065f1d800 task.stack: 88006384
RIP: 0010:[]  []
sctp_inet_listen+0x29b/0x790 net/sctp/socket.c:6870
RSP: 0018:880063847dd0  EFLAGS: 00010202
RAX: dc00 RBX: 11000c708fbd RCX: 
RDX:  RSI:  RDI: 0002
RBP: 880063847e70 R08: dc00 R09: dc00
R10: 0002 R11: 0002 R12: 88006b350800
R13:  R14: 11000d66a1a5 R15: 
FS:  7fd1f0f3d7c0() GS:88006cd0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2000 CR3: 64af9000 CR4: 06e0
Stack:
 880063847de0 880066165900 88006b350d20 41b58ab3
 847ff589 83941280 dc00 
 880069b9f740  880063847e38 819f04ef
Call Trace:
 [< inline >] SYSC_listen net/socket.c:1396
 [] SyS_listen+0x206/0x250 net/socket.c:1382
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
Code: 00 0f 85 f4 04 00 00 4d 8b ac 24 28 05 00 00 49 b8 00 00 00 00
00 fc ff df 49 8d 7d 02 48 89 fe 49 89 fa 48 c1 ee 03 41 83 e2 07 <46>
0f b6 0c 06 41 83 c2 01 45 38 ca 7c 09 45 84 c9 0f 85 87 04
RIP  [] sctp_inet_listen+0x29b/0x790 net/sctp/socket.c:6870
 RSP 
---[ end trace f2b501fc22999b37 ]---

A reproducer is attached.

On commit bc33b0ca11e3df46a4fa7639ba488c9d4911 (Nov 5).

Thanks!


sctp-listen-null-poc.c
Description: Binary data


Re: [v15, 3/7] powerpc/fsl: move mpc85xx.h to include/linux/fsl

2016-11-07 Thread Arnd Bergmann
On Monday, October 31, 2016 9:35:33 AM CET Y.B. Lu wrote:
> > 
> > I don't see any of the contents of this header referenced by the soc
> > driver any more. I think you can just drop this patch.
> > 
> 
> [Lu Yangbo-B47093] This header file was included by guts.c.
> The guts driver used macro SVR_MAJ/SVR_MIN for calculation.
> 
> This header file was for the powerpc arch before, and this patch makes it a
> common header file for both ARM and PPC.
> Sooner or later this is needed.

Let's discuss it once we actually need the header then, ok?

Arnd


[PATCH] vxlan: hide unused local variable

2016-11-07 Thread Arnd Bergmann
A bugfix introduced a harmless warning in v4.9-rc4:

drivers/net/vxlan.c: In function 'vxlan_group_used':
drivers/net/vxlan.c:947:21: error: unused variable 'sock6' 
[-Werror=unused-variable]

This hides the variable inside of the same #ifdef that is
around its user. The extraneous initialization is removed
at the same time; it was accidentally introduced in the
same commit.

Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/vxlan.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cb5cc7c03160..5264c1a49d86 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -944,7 +944,9 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct 
vxlan_dev *dev)
 {
struct vxlan_dev *vxlan;
struct vxlan_sock *sock4;
-   struct vxlan_sock *sock6 = NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+   struct vxlan_sock *sock6;
+#endif
unsigned short family = dev->default_dst.remote_ip.sa.sa_family;
 
sock4 = rtnl_dereference(dev->vn4_sock);
-- 
2.9.0
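
For readers unfamiliar with this class of warning, here is a minimal
standalone sketch of the pattern the patch applies. It is not the vxlan
code: FEATURE_V6, lookup_v4() and lookup_v6() are made-up names standing
in for IS_ENABLED(CONFIG_IPV6) and the per-family socket lookups.

```c
#include <assert.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_IPV6); build with
 * -DFEATURE_V6=1 to compile the v6 path. */
#ifndef FEATURE_V6
#define FEATURE_V6 0
#endif

static int lookup_v4(int key) { return key > 0; }
#if FEATURE_V6
static int lookup_v6(int key) { return key > 100; }
#endif

/* Returns nonzero if the key is "in use" by any enabled family. */
int group_used(int key)
{
	int sock4;
#if FEATURE_V6
	int sock6;	/* declared only where its single user is compiled */
#endif

	assert(key >= 0);
	sock4 = lookup_v4(key);
	if (sock4)
		return 1;
#if FEATURE_V6
	sock6 = lookup_v6(key);
	if (sock6)
		return 1;
#endif
	return 0;
}
```

With FEATURE_V6 left at 0, declaring sock6 outside the #if would leave an
unused variable, which is the situation the compiler complained about.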



[PATCH net-next v2 4/5] net: l2tp: cleanup: remove redundant condition

2016-11-07 Thread Asbjoern Sloth Toennesen
These assignments follow this pattern:

unsigned int foo:1;
struct nlattr *nla = info->attrs[bar];

if (nla)
foo = nla_get_flag(nla); /* expands to: foo = !!nla */

This could be simplified to: if (nla) foo = 1;
but let's just remove the condition and use the macro,

foo = nla_get_flag(nla);

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_netlink.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 494910d..3620fba 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -220,14 +220,14 @@ static int l2tp_nl_cmd_tunnel_create(struct sk_buff *skb, 
struct genl_info *info
cfg.local_udp_port = 
nla_get_u16(info->attrs[L2TP_ATTR_UDP_SPORT]);
if (info->attrs[L2TP_ATTR_UDP_DPORT])
cfg.peer_udp_port = 
nla_get_u16(info->attrs[L2TP_ATTR_UDP_DPORT]);
-   if (info->attrs[L2TP_ATTR_UDP_CSUM])
-   cfg.use_udp_checksums = 
nla_get_flag(info->attrs[L2TP_ATTR_UDP_CSUM]);
+   cfg.use_udp_checksums = nla_get_flag(
+   info->attrs[L2TP_ATTR_UDP_CSUM]);
 
 #if IS_ENABLED(CONFIG_IPV6)
-   if (info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX])
-   cfg.udp6_zero_tx_checksums = 
nla_get_flag(info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
-   if (info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX])
-   cfg.udp6_zero_rx_checksums = 
nla_get_flag(info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
+   cfg.udp6_zero_tx_checksums = nla_get_flag(
+   info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
+   cfg.udp6_zero_rx_checksums = nla_get_flag(
+   info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
 #endif
}
 
-- 
2.10.1
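
A small userspace sketch of the simplification described in the commit
message above. Everything here is a stand-in (struct attr, struct cfg,
get_flag()), not the kernel definitions. Note that the two forms only
agree when the bit starts out as 0; I am assuming that holds for cfg in
l2tp_nl_cmd_tunnel_create().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel definitions. */
struct attr { int unused; };

struct cfg {
	unsigned int use_csum:1;
};

/* Models nla_get_flag(): true iff the attribute is present. */
static int get_flag(const struct attr *a)
{
	return !!a;
}

/* Old form: only touches the bit when the attribute is present. */
void set_conditional(struct cfg *c, const struct attr *a)
{
	assert(c != NULL);
	if (a)
		c->use_csum = get_flag(a);
}

/* New form: always assigns, so an absent attribute stores 0. */
void set_unconditional(struct cfg *c, const struct attr *a)
{
	assert(c != NULL);
	c->use_csum = get_flag(a);
}
```

Starting from a zeroed cfg both functions leave the same value behind; if
the bit were already 1, only set_unconditional() would clear it when the
attribute is missing.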



[PATCH net-next v2 5/5] net: l2tp: fix negative assignment to unsigned int

2016-11-07 Thread Asbjoern Sloth Toennesen
recv_seq, send_seq and lns_mode are all defined as
unsigned int foo:1;

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_core.c | 2 +-
 net/l2tp/l2tp_ppp.c  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index a2ed3bd..85948c6 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -715,7 +715,7 @@ void l2tp_recv_common(struct l2tp_session *session, struct 
sk_buff *skb,
l2tp_info(session, L2TP_MSG_SEQ,
  "%s: requested to enable seq numbers by 
LNS\n",
  session->name);
-   session->send_seq = -1;
+   session->send_seq = 1;
l2tp_session_set_header_len(session, tunnel->version);
}
} else {
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 41d47bf..2ddfec1 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1272,7 +1272,7 @@ static int pppol2tp_session_setsockopt(struct sock *sk,
err = -EINVAL;
break;
}
-   session->recv_seq = val ? -1 : 0;
+   session->recv_seq = !!val;
l2tp_info(session, PPPOL2TP_MSG_CONTROL,
  "%s: set recv_seq=%d\n",
  session->name, session->recv_seq);
@@ -1283,7 +1283,7 @@ static int pppol2tp_session_setsockopt(struct sock *sk,
err = -EINVAL;
break;
}
-   session->send_seq = val ? -1 : 0;
+   session->send_seq = !!val;
{
struct sock *ssk  = ps->sock;
struct pppox_sock *po = pppox_sk(ssk);
@@ -1301,7 +1301,7 @@ static int pppol2tp_session_setsockopt(struct sock *sk,
err = -EINVAL;
break;
}
-   session->lns_mode = val ? -1 : 0;
+   session->lns_mode = !!val;
l2tp_info(session, PPPOL2TP_MSG_CONTROL,
  "%s: set lns_mode=%d\n",
  session->name, session->lns_mode);
-- 
2.10.1
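
To illustrate why the old assignments happened to work, here is a
minimal sketch with a made-up struct (session_flags is not the kernel's
l2tp_session). Storing -1 into an unsigned 1-bit field is converted
modulo 2 and ends up as 1, so behaviour does not change; "!!val" simply
says 0 or 1 directly instead of relying on that conversion.

```c
#include <assert.h>

/* Hypothetical miniature of the session flags; not the kernel struct. */
struct session_flags {
	unsigned int send_seq:1;
};

/* Old style: "val ? -1 : 0". The -1 is reduced modulo 2 and stored as 1. */
unsigned int set_old(struct session_flags *s, int val)
{
	int v = val ? -1 : 0;

	assert(s != 0);
	s->send_seq = v;
	return s->send_seq;
}

/* New style: "!!val" yields exactly 0 or 1, nothing relies on truncation. */
unsigned int set_new(struct session_flags *s, int val)
{
	assert(s != 0);
	s->send_seq = !!val;
	return s->send_seq;
}
```

Both helpers return the same stored value for any input, which is why
the patch is a cleanup rather than a behaviour change.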



[PATCH net-next v2 2/5] net: l2tp: only set L2TP_ATTR_UDP_CSUM if AF_INET

2016-11-07 Thread Asbjoern Sloth Toennesen
Only set L2TP_ATTR_UDP_CSUM in l2tp_nl_tunnel_send()
when it's running over IPv4.

This prepares the code to also have IPv6 specific attributes.

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_netlink.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 59aa2d2..2abd100 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -379,9 +379,14 @@ static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 
portid, u32 seq, int fla
 
switch (tunnel->encap) {
case L2TP_ENCAPTYPE_UDP:
+   switch (sk->sk_family) {
+   case AF_INET:
+   if (nla_put_u8(skb, L2TP_ATTR_UDP_CSUM, 
!sk->sk_no_check_tx))
+   goto nla_put_failure;
+   break;
+   }
if (nla_put_u16(skb, L2TP_ATTR_UDP_SPORT, 
ntohs(inet->inet_sport)) ||
-   nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, 
ntohs(inet->inet_dport)) ||
-   nla_put_u8(skb, L2TP_ATTR_UDP_CSUM, !sk->sk_no_check_tx))
+   nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, 
ntohs(inet->inet_dport)))
goto nla_put_failure;
/* NOBREAK */
case L2TP_ENCAPTYPE_IP:
-- 
2.10.1



[PATCH net-next v2 1/5] net: l2tp: change L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX} attribute types

2016-11-07 Thread Asbjoern Sloth Toennesen
The attributes L2TP_ATTR_UDP_ZERO_CSUM6_RX and
L2TP_ATTR_UDP_ZERO_CSUM6_TX are used as flags,
but are defined as u8 in a comment.

This patch redocuments them as flags.

Adding nla_policy entries would break API, so not doing that.

CC: Tom Herbert 
Signed-off-by: Asbjoern Sloth Toennesen 
---
 include/uapi/linux/l2tp.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index 4bd27d0..5daa48e 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -124,8 +124,8 @@ enum {
L2TP_ATTR_STATS,/* nested */
L2TP_ATTR_IP6_SADDR,/* struct in6_addr */
L2TP_ATTR_IP6_DADDR,/* struct in6_addr */
-   L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* u8 */
-   L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* u8 */
+   L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* flag */
+   L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* flag */
L2TP_ATTR_PAD,
__L2TP_ATTR_MAX,
 };
-- 
2.10.1



Re: [PATCH net-next 1/5] net: l2tp: fix L2TP_ATTR_UDP_CSUM attribute type

2016-11-07 Thread Asbjørn Sloth Tønnesen
Hi David,

Thanks for the review.

On Mon, 07 Nov 2016 13:08:45 -0500 (EST), David Miller  
wrote:
> From: Asbjoern Sloth Toennesen 
> Date: Fri,  4 Nov 2016 22:48:34 +
> 
> > L2TP_ATTR_UDP_CSUM is a flag, and gets read with
> > nla_get_flag, but it is defined as NLA_U8 in
> > the nla_policy.
> > 
> > It appears that this is only publicly used in
> > iproute2, where it's broken, because it's used as
> > a NLA_FLAG, and fails validation as a NLA_U8.
> > 
> > The only place it's used as a NLA_U8 is in
> > l2tp_nl_tunnel_send(), but iproute2 again reads that
> > as a flag, it's therefore always set. Fortunately
> > it is never used for anything, just read.
> > 
> > CC: Miao Wang 
> > Signed-off-by: Asbjoern Sloth Toennesen 
> 
> This is definitely the wrong way to go about this.
> 
> The kernel is everywhere and updating iproute2 is infinitely
> easier for users to do than updating the kernel.
> 
> And in any event, once exported we really should never change
> the API of anything shown to userspace like this.  Just because
> you can't find a user out there doesn't mean it doesn't exist.

Sure, I have submitted a v2 of the patchset, that keeps the
current netlink API intact.

I was unsure how frozen the API was in these outlying corners. I also only
tried changing the cases where the kernel side is inconsistently implemented,
i.e. I kept L2TP_ATTR_{SEND,RECV}_SEQ as u8-flags since they were used
consistently.


> Please instead fix iproute2 to use u8 attributes for this.

Will do (set with u8-flag, read as u8).

-- 
Best regards
Asbjørn Sloth Tønnesen


[PATCH net-next v2 3/5] net: l2tp: netlink: l2tp_nl_tunnel_send: set UDP6 checksum flags

2016-11-07 Thread Asbjoern Sloth Toennesen
This patch causes the proper attribute flags to be set,
in the case that IPv6 UDP checksums are disabled, so that
userspace (i.e. `ip l2tp show tunnel`) knows about it.

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_netlink.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 2abd100..494910d 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -384,6 +384,16 @@ static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 
portid, u32 seq, int fla
if (nla_put_u8(skb, L2TP_ATTR_UDP_CSUM, 
!sk->sk_no_check_tx))
goto nla_put_failure;
break;
+#if IS_ENABLED(CONFIG_IPV6)
+   case AF_INET6:
+   if (udp_get_no_check6_tx(sk) &&
+   nla_put_flag(skb, L2TP_ATTR_UDP_ZERO_CSUM6_TX))
+   goto nla_put_failure;
+   if (udp_get_no_check6_rx(sk) &&
+   nla_put_flag(skb, L2TP_ATTR_UDP_ZERO_CSUM6_RX))
+   goto nla_put_failure;
+   break;
+#endif
}
if (nla_put_u16(skb, L2TP_ATTR_UDP_SPORT, 
ntohs(inet->inet_sport)) ||
nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, 
ntohs(inet->inet_dport)))
-- 
2.10.1



[PATCH net] ibmvnic: Start completion queue negotiation at server-provided optimum values

2016-11-07 Thread John Allen
Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.

Signed-off-by: John Allen 
---
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index d54405b4..ee66164 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1493,9 +1493,8 @@ static void init_sub_crqs(struct ibmvnic_adapter 
*adapter, int retry)
adapter->max_rx_add_entries_per_subcrq > entries_page ?
entries_page : adapter->max_rx_add_entries_per_subcrq;

-   /* Choosing the maximum number of queues supported by firmware*/
-   adapter->req_tx_queues = adapter->max_tx_queues;
-   adapter->req_rx_queues = adapter->max_rx_queues;
+   adapter->req_tx_queues = adapter->opt_tx_comp_sub_queues;
+   adapter->req_rx_queues = adapter->opt_rx_comp_queues;
adapter->req_rx_add_queues = adapter->max_rx_add_queues;

adapter->req_mtu = adapter->max_mtu;



Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices

2016-11-07 Thread Jesper Dangaard Brouer
On Mon, 07 Nov 2016 13:13:44 -0500 (EST)
David Miller  wrote:

> From: Jesper Dangaard Brouer 
> Date: Thu, 03 Nov 2016 14:55:56 +0100
> 
> > This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> > hopefully help userspace get a more consistent behavior when attaching
> > qdisc to such virtual devices.  
> 
> I'm still thinking about this.
> 
> My reservation about this is basically since the one known offender in
> userspace acknowledged that what it was doing was wrong, and fixed it
> quickly already, I see no reason to explicitly accommodate this.

The situation I worry about is that a sysadm cannot manually apply a tc
qdisc on a Docker container's veth without getting bitten.  Docker will
forever run a "loophole-script" to accommodate older kernels, and yes
the shiny management scripts will get fixed, but how should a mortal
sysadm know (to change tx_queue_len before applying a qdisc)?

Besides, it was only fixed in OpenShift, which inherited the "bug"
from Docker.  Thus, it is per se not fixed in Docker or in other projects
that (like OpenShift) use components from Docker.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH v2] net: icmp_route_lookup should use rt dev to determine L3 domain

2016-11-07 Thread David Ahern
icmp_send is called in response to some event. The skb may not have
the device set (skb->dev is NULL), but it is expected to have an rt.
Update icmp_route_lookup to use the rt on the skb to determine L3
domain.

Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern 
---
v2
- use skb_dst versus skb_rtable

 net/ipv4/icmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 53a890b605fc..691146abde2d 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -479,7 +479,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
fl4->flowi4_proto = IPPROTO_ICMP;
fl4->fl4_icmp_type = type;
fl4->fl4_icmp_code = code;
-   fl4->flowi4_oif = l3mdev_master_ifindex(skb_in->dev);
+   fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
 
security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
rt = __ip_route_output_key_hash(net, fl4,
@@ -504,7 +504,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
if (err)
goto relookup_failed;
 
-   if (inet_addr_type_dev_table(net, skb_in->dev,
+   if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
 fl4_dec.saddr) == RTN_LOCAL) {
rt2 = __ip_route_output_key(net, &fl4_dec);
if (IS_ERR(rt2))
-- 
2.1.4



Re: [PATCH] net-ipv6: on device mtu change do not add mtu to mtu-less routes

2016-11-07 Thread Hannes Frederic Sowa
On 04.11.2016 22:51, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> Routes can specify an mtu explicitly or inherit the mtu from
> the underlying device - this inheritance is implemented in
> dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().
> 
> Currently changing the mtu of a device adds mtu explicitly
> to routes using that device.
> 
> ie.
>   # ip link set dev lo mtu 65536
>   # ip -6 route add local 2000::1 dev lo
>   # ip -6 route get 2000::1
>   local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
> 
>   # ip link set dev lo mtu 65535
>   # ip -6 route get 2000::1
>   local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref 
> medium
> 
>   # ip link set dev lo mtu 65536
>   # ip -6 route get 2000::1
>   local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref 
> medium
> 
>   # ip -6 route del local 2000::1
> 
> After this patch the route entry no longer changes unless it already has an 
> mtu.
> There is no need: this inheritance is already done in ip6_mtu()
> 
>   # ip link set dev lo mtu 65536
>   # ip -6 route add local 2000::1 dev lo
>   # ip -6 route add local 2000::2 dev lo mtu 2000
>   # ip -6 route get 2000::1; ip -6 route get 2000::2
>   local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
>   local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref 
> medium
> 
>   # ip link set dev lo mtu 65535
>   # ip -6 route get 2000::1; ip -6 route get 2000::2
>   local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
>   local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref 
> medium
> 
>   # ip link set dev lo mtu 1501
>   # ip -6 route get 2000::1; ip -6 route get 2000::2
>   local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
>   local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref 
> medium
> 
>   # ip link set dev lo mtu 65536
>   # ip -6 route get 2000::1; ip -6 route get 2000::2
>   local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
>   local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref 
> medium
> 
>   # ip -6 route del local 2000::1
>   # ip -6 route del local 2000::2
> 
> This is desirable because changing device mtu and then resetting it
> to the previous value shouldn't change the user visible routing table.
> 
> Signed-off-by: Maciej Żenczykowski 
> CC: Eric Dumazet 
> ---
>  net/ipv6/route.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 947ed1ded026..fa90d14302f7 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -2758,6 +2758,7 @@ static int rt6_mtu_change_route(struct rt6_info *rt, 
> void *p_arg)
>  PMTU discouvery.
>*/
>   if (rt->dst.dev == arg->dev &&
> + dst_metric_raw(&rt->dst, RTAX_MTU) &&
>   !dst_metric_locked(&rt->dst, RTAX_MTU)) {
>   if (rt->rt6i_flags & RTF_CACHE) {
>   /* For RTF_CACHE with rt6i_pmtu == 0
> 

Yep, that makes sense.

Acked-by: Hannes Frederic Sowa 
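
For reference, the before/after output in the quoted message can be
reproduced with a small toy model. This is only a sketch inferred from
that example output, not the code in rt6_mtu_change_route(): toy_route,
effective_mtu() and on_dev_mtu_change() are made-up names, and a stored
mtu of 0 is my stand-in for "no explicit mtu on the route".

```c
#include <assert.h>

/* Toy route: mtu == 0 means "no explicit mtu, inherit from the device". */
struct toy_route {
	unsigned int mtu;
};

/* What a lookup would report as the route's usable mtu. */
unsigned int effective_mtu(const struct toy_route *rt, unsigned int dev_mtu)
{
	assert(rt != 0);
	return rt->mtu ? rt->mtu : dev_mtu;
}

/*
 * Device mtu changes from old_dev_mtu to new_dev_mtu.
 * skip_mtu_less selects the patched behaviour (nonzero) or the old one (0).
 */
void on_dev_mtu_change(struct toy_route *rt, unsigned int old_dev_mtu,
		       unsigned int new_dev_mtu, int skip_mtu_less)
{
	unsigned int cur;

	assert(rt != 0);
	if (skip_mtu_less && rt->mtu == 0)
		return;		/* nothing stored, inheritance handles it */

	cur = effective_mtu(rt, old_dev_mtu);
	/* Shrink to fit, or follow the device if the route was tracking it. */
	if (cur >= new_dev_mtu || cur == old_dev_mtu)
		rt->mtu = new_dev_mtu;
}
```

Walking the mtu-less route through 65536 -> 65535 -> 65536 with
skip_mtu_less == 0 leaves an explicit mtu behind, as in the first
listing; with skip_mtu_less != 0 the route stays mtu-less while the
route created with "mtu 2000" goes 2000, 1501, 65536 as in the second.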




net/tcp: warning in tcp_recvmsg

2016-11-07 Thread Andrey Konovalov
Hi,

I've got the following error report while running the syzkaller fuzzer:

[ cut here ]
WARNING: CPU: 1 PID: 9957 at net/ipv4/tcp.c:1766
tcp_recvmsg+0x19d7/0x26e0 net/ipv4/tcp.c:1765
Modules linked in:
CPU: 1 PID: 9957 Comm: syz-executor Not tainted 4.9.0-rc4+ #352
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 880065bd7788 81b46a64  
 8445f8e0  880065bd77d0 8387
 880068414980 06e6 8445f8e0 06e6
Call Trace:
 [< inline >] __dump_stack lib/dump_stack.c:15
 [] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [] __warn+0x1a7/0x1f0 kernel/panic.c:550
 [] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
 [] tcp_recvmsg+0x19d7/0x26e0 net/ipv4/tcp.c:1765
 [] inet_recvmsg+0x308/0x4b0 net/ipv4/af_inet.c:765
 [< inline >] sock_recvmsg_nosec net/socket.c:708
 [] sock_recvmsg+0xd9/0x110 net/socket.c:715
 [] sock_read_iter+0x247/0x360 net/socket.c:792
 [] do_iter_readv_writev+0x2bb/0x3f0 fs/read_write.c:695
 [] do_readv_writev+0x431/0x730 fs/read_write.c:872
 [] vfs_readv+0x8c/0xc0 fs/read_write.c:898
 [] do_readv+0xe1/0x240 fs/read_write.c:924
 [< inline >] SYSC_readv fs/read_write.c:1011
 [] SyS_readv+0x27/0x30 fs/read_write.c:1008
 [] entry_SYSCALL_64_fastpath+0x1f/0xc2
arch/x86/entry/entry_64.S:209
---[ end trace 8efae7c5dcb2bd76 ]---

On commit bc33b0ca11e3df46a4fa7639ba488c9d4911 (Nov 5).

Thanks!


[PATCH net-next] net-gro: avoid reorders

2016-11-07 Thread Eric Dumazet
From: Eric Dumazet 

Receiving a GSO packet in dev_gro_receive() is not uncommon
in stacked devices, or devices partially implementing LRO/GRO
like bnx2x. GRO is implementing the aggregation the device
was not able to do itself.

Current code causes reorders, as in the following case:

For a given flow where the sender sent 4 packets P1,P2,P3,P4

Receiver might receive P1 as a single packet, stored in GRO engine.

Then P2-P4 are received as a single GSO packet, immediately given to
upper stack, while P1 is held in GRO engine.

This patch will make sure P1 is given to upper stack, then P2-P4
immediately after.

Signed-off-by: Eric Dumazet 
---
 net/core/dev.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f23e28668f32..b77cde68967c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4482,7 +4482,7 @@ static enum gro_result dev_gro_receive(struct napi_struct 
*napi, struct sk_buff
if (!(skb->dev->features & NETIF_F_GRO))
goto normal;
 
-   if (skb_is_gso(skb) || skb_has_frag_list(skb) || skb->csum_bad)
+   if (skb->csum_bad)
goto normal;
 
gro_list_prepare(napi, skb);
@@ -4495,7 +4495,7 @@ static enum gro_result dev_gro_receive(struct napi_struct 
*napi, struct sk_buff
skb_set_network_header(skb, skb_gro_offset(skb));
skb_reset_mac_len(skb);
NAPI_GRO_CB(skb)->same_flow = 0;
-   NAPI_GRO_CB(skb)->flush = 0;
+   NAPI_GRO_CB(skb)->flush = skb_is_gso(skb) || 
skb_has_frag_list(skb);
NAPI_GRO_CB(skb)->free = 0;
NAPI_GRO_CB(skb)->encap_mark = 0;
NAPI_GRO_CB(skb)->recursion_counter = 0;
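
The reordering scenario from the changelog can be played through with a
toy model. This is a rough sketch, not the GRO implementation: struct
toy_gro, toy_receive() and toy_flush() are invented for illustration,
and the engine holds at most one packet. '1' stands for P1 and 'G' for
the already-aggregated P2-P4.

```c
#include <assert.h>
#include <stddef.h>

/* Toy GRO engine: at most one held packet, plus a log of delivery order. */
struct toy_gro {
	char held;		/* 0 when nothing is held */
	char delivered[8];	/* labels in the order the upper stack saw them */
	size_t n;
};

static void deliver(struct toy_gro *g, char label)
{
	assert(g->n < sizeof(g->delivered));
	g->delivered[g->n++] = label;
}

/*
 * Receive one packet. is_gso marks a packet that arrives already
 * aggregated. flush_first selects the patched ordering (nonzero) or
 * the old one (0).
 */
void toy_receive(struct toy_gro *g, char label, int is_gso, int flush_first)
{
	if (!is_gso) {
		/* Plain packet: park it, pushing out whatever was parked. */
		if (g->held)
			deliver(g, g->held);
		g->held = label;
		return;
	}
	if (flush_first && g->held) {
		deliver(g, g->held);	/* held packet goes up first */
		g->held = 0;
	}
	deliver(g, label);		/* GSO packet is never held */
}

/* End of the poll cycle: hand over anything still parked. */
void toy_flush(struct toy_gro *g)
{
	if (g->held) {
		deliver(g, g->held);
		g->held = 0;
	}
}
```

Feeding '1' then 'G' and flushing gives the order G,1 with
flush_first == 0 and 1,G with flush_first != 0, which is the difference
the patch is after.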




Re: [Intel-wired-lan] [PATCH] igb: drop field "tail" of struct igb_ring

2016-11-07 Thread Alexander Duyck
On Mon, Nov 7, 2016 at 4:44 AM, Cao jin  wrote:
> Under certain condition, I find guest will oops on writel() in
> igb_configure_tx_ring(), because hw->hw_address is NULL. While other
> register access won't oops kernel because they use wr32/rd32 which have
> a defense against NULL pointer. The oops message are as following:
>
> [  141.225449] pcieport :00:1c.0: AER: Multiple Uncorrected (Fatal)
> error received: id=0101
> [  141.225523] igb :01:00.1: PCIe Bus Error: severity=Uncorrected
> (Fatal), type=Unaccessible, id=0101(Unregistered Agent ID)
> [  141.299442] igb :01:00.1: broadcast error_detected message
> [  141.300539] igb :01:00.0 enp1s0f0: PCIe link lost, device now
> detached
> [  141.351019] igb :01:00.1 enp1s0f1: PCIe link lost, device now
> detached
> [  143.465904] pcieport :00:1c.0: Root Port link has been reset
> [  143.465994] igb :01:00.1: broadcast slot_reset message
> [  143.466039] igb :01:00.0: enabling device ( -> 0002)
> [  144.389078] igb :01:00.1: enabling device ( -> 0002)
> [  145.312078] igb :01:00.1: broadcast resume message
> [  145.322211] BUG: unable to handle kernel paging request at
> 3818
> [  145.361275] IP: [] igb_configure_tx_ring+0x14d/0x280 
> [igb]
> [  145.438007] Oops: 0002 [#1] SMP
>
> On the other hand, commit 238ac817 does some optimization which
> dropped the field "head". So I think it is time to drop "tail" as well.

There is a bug here, but removing tail isn't the fix.

> Signed-off-by: Cao jin 
> ---
>  drivers/net/ethernet/intel/igb/igb.h  |  1 -
>  drivers/net/ethernet/intel/igb/igb_main.c | 16 +---
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb.h 
> b/drivers/net/ethernet/intel/igb/igb.h
> index d11093d..0df06bc 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -247,7 +247,6 @@ struct igb_ring {
> };
> void *desc; /* descriptor ring memory */
> unsigned long flags;/* ring specific flags */
> -   void __iomem *tail; /* pointer to ring tail register */
> dma_addr_t dma; /* phys address of the ring */
> unsigned int  size; /* length of desc. ring in bytes */
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index edc9a6a..e177d0e 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -3390,9 +3390,8 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
>  tdba & 0xULL);
> wr32(E1000_TDBAH(reg_idx), tdba >> 32);
>
> -   ring->tail = hw->hw_addr + E1000_TDT(reg_idx);

This line is where the bug is.  This should be adapter->io_addr, not
hw->hw_addr.

> wr32(E1000_TDH(reg_idx), 0);
> -   writel(0, ring->tail);
> +wr32(E1000_TDT(reg_idx), 0);
>
> txdctl |= IGB_TX_PTHRESH;
> txdctl |= IGB_TX_HTHRESH << 8;
> @@ -3729,9 +3728,8 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
>  ring->count * sizeof(union e1000_adv_rx_desc));
>
> /* initialize head and tail */
> -   ring->tail = hw->hw_addr + E1000_RDT(reg_idx);

Same thing here.  It looks like the wrong values were used.

> wr32(E1000_RDH(reg_idx), 0);
> -   writel(0, ring->tail);
> +   wr32(E1000_RDT(reg_idx), 0);
>
> /* set descriptor configuration */

Would you prefer to submit the patch for this or should I?  Basically
all you need to do is change the two lines where ring->tail is
populated so that you use adapter->io_addr instead of hw->hw_addr.

Thanks.

- Alex


Re: [PATCH 7/8] tools lib bpf: fix maps resolution

2016-11-07 Thread Eric Leblond
Hi,

On Tue, 2016-11-08 at 02:23 +0800, Wangnan (F) wrote:
> Hi Eric,
> 
> Are you still working on this patch set?

Sorry for the lag on this, I've been taken up by a series of other projects. I
have not reworked it yet, but I was planning to do a bit on it this
week.

> 
> Now I know why maps section is not a simple array
> from a patch set from Joe Stringer:
> 
> https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html
> 
> So I think this patch is really useful.
> 
> Are you going to resend the whole patch set? If not, let me collect
> this patch 7/8 into my local code base and send to Arnaldo
> with my other patches.

If that is OK with you, I propose that you collect patch 7/8 if you have no
news from me by Friday. If that is an issue for you, just collect it now and I
will synchronize with the updated code when resending my patchset.

BR,
-- 
Eric Leblond 
Blog: https://home.regit.org/


Re: [PATCH 7/8] tools lib bpf: fix maps resolution

2016-11-07 Thread Wangnan (F)

Hi Eric,

Are you still working on this patch set?

Now I know why maps section is not a simple array
from a patch set from Joe Stringer:

https://www.mail-archive.com/netdev@vger.kernel.org/msg135088.html

So I think this patch is really useful.

Are you going to resend the whole patch set? If not, let me collect
this patch 7/8 into my local code base and send to Arnaldo
with my other patches.

Thank you.

On 2016/10/17 5:18, Eric Leblond wrote:

It is not correct to treat the ELF data of the maps section
as an array of map definitions. In fact the sizes differ. The
offset provided in the symbol section has to be used instead.

This patch fixes a bug causing an ELF file with two maps not to load
correctly.

Signed-off-by: Eric Leblond 
---
  tools/lib/bpf/libbpf.c | 50 +++---
  1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1fe4532..f72628b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -186,6 +186,7 @@ struct bpf_program {
  struct bpf_map {
int fd;
char *name;
+   size_t offset;
struct bpf_map_def def;
void *priv;
bpf_map_clear_priv_t clear_priv;
@@ -529,13 +530,6 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,
  
  	pr_debug("maps in %s: %zd bytes\n", obj->path, size);
  
-	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));

-   if (!obj->maps) {
-   pr_warning("alloc maps for object failed\n");
-   return -ENOMEM;
-   }
-   obj->nr_maps = nr_maps;
-
for (i = 0; i < nr_maps; i++) {
struct bpf_map_def *def = &obj->maps[i].def;
  
@@ -547,23 +541,42 @@ bpf_object__init_maps(struct bpf_object *obj, void *data,

obj->maps[i].fd = -1;
  
  		/* Save map definition into obj->maps */

-   *def = ((struct bpf_map_def *)data)[i];
+   *def = *(struct bpf_map_def *)(data + obj->maps[i].offset);
}
return 0;
  }
  
  static int

-bpf_object__init_maps_name(struct bpf_object *obj)
+bpf_object__init_maps_symbol(struct bpf_object *obj)
  {
int i;
+   int nr_maps = 0;
Elf_Data *symbols = obj->efile.symbols;
+   size_t map_idx = 0;
  
  	if (!symbols || obj->efile.maps_shndx < 0)

return -EINVAL;
  
+	/* get the number of maps */

+   for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+   GElf_Sym sym;
+
+   if (!gelf_getsym(symbols, i, &sym))
+   continue;
+   if (sym.st_shndx != obj->efile.maps_shndx)
+   continue;
+   nr_maps++;
+   }
+
+   obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
+   if (!obj->maps) {
+   pr_warning("alloc maps for object failed\n");
+   return -ENOMEM;
+   }
+   obj->nr_maps = nr_maps;
+
for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
GElf_Sym sym;
-   size_t map_idx;
const char *map_name;
  
  		if (!gelf_getsym(symbols, i, &sym))

@@ -574,12 +587,12 @@ bpf_object__init_maps_name(struct bpf_object *obj)
map_name = elf_strptr(obj->efile.elf,
  obj->efile.strtabidx,
  sym.st_name);
-   map_idx = sym.st_value / sizeof(struct bpf_map_def);
if (map_idx >= obj->nr_maps) {
pr_warning("index of map \"%s\" is buggy: %zu > %zu\n",
   map_name, map_idx, obj->nr_maps);
continue;
}
+   obj->maps[map_idx].offset = sym.st_value;
obj->maps[map_idx].name = strdup(map_name);
if (!obj->maps[map_idx].name) {
pr_warning("failed to alloc map name\n");
@@ -587,6 +600,7 @@ bpf_object__init_maps_name(struct bpf_object *obj)
}
pr_debug("map %zu is \"%s\"\n", map_idx,
 obj->maps[map_idx].name);
+   map_idx++;
}
return 0;
  }
@@ -647,8 +661,6 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
data->d_buf,
data->d_size);
else if (strcmp(name, "maps") == 0) {
-   err = bpf_object__init_maps(obj, data->d_buf,
-   data->d_size);
obj->efile.maps_shndx = idx;
} else if (sh.sh_type == SHT_SYMTAB) {
if (obj->efile.symbols) {
@@ -698,8 +710,16 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
pr_warning("Corrupted ELF file: index of strtab invalid\n");
return LIBBPF_ERRNO__FORMAT;
}
-   if 
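
A minimal sketch of the indexing problem described in the commit message above, under stated assumptions: `struct map_def` and `struct elf_map_entry` are hypothetical stand-ins (not libbpf types), and the object file is assumed to carry extra fields per map, so each entry is larger than the loader's definition. Treating the section as a packed array of `struct map_def` then reads the second map from the wrong place, while the symbol offset (`st_value` in the patch) points at the right one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical loader-side definition (stand-in for struct bpf_map_def). */
struct map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Hypothetical layout in the object file: same fields plus extras. */
struct elf_map_entry {
	struct map_def def;
	unsigned int extra[2];
};

/* Old approach: assume the section is a packed array of struct map_def. */
static struct map_def load_by_index(const void *data, size_t i)
{
	struct map_def def;

	memcpy(&def, (const char *)data + i * sizeof(struct map_def),
	       sizeof(def));
	return def;
}

/* Patched approach: use the offset recorded for the map's symbol. */
static struct map_def load_by_offset(const void *data, size_t size,
				     size_t offset)
{
	struct map_def def;

	/* Refuse to read past the end of the section. */
	assert(offset + sizeof(def) <= size);
	memcpy(&def, (const char *)data + offset, sizeof(def));
	return def;
}
```

With two entries in the section, `load_by_index(sec, 1)` starts reading inside the first entry's extra fields, whereas `load_by_offset(sec, sizeof(sec), sizeof(sec[0]))` returns the second map's definition.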

Re: [PATCH net-next 0/2] udp: do fwd memory scheduling on dequeue

2016-11-07 Thread David Miller
From: Paolo Abeni 
Date: Fri,  4 Nov 2016 11:28:57 +0100

> After commit 850cbaddb52d ("udp: use it's own memory accounting schema"),
> the udp code needs to acquire twice the receive queue spinlock on dequeue.
> 
> This patch series removes the need for the second lock at skb free time,
> moving the udp memory scheduling inside the dequeue operation; the skb
> destructor field is not used anymore and an additional sk argument is added
> to ip_cmsg_recv_offset() to cope with null skb->sk after dequeue.
> 
> Many thanks to Eric Dumazet for suggesting pretty much all of the above.

Series applied, thanks.


Re: [PATCH net] bpf: fix map not being uncharged during map creation failure

2016-11-07 Thread David Miller
From: Daniel Borkmann 
Date: Fri,  4 Nov 2016 00:56:31 +0100

> In map_create(), we first find and create the map, then once that
> succeeded, we charge it to the user's RLIMIT_MEMLOCK, and then fetch
> a new anon fd through anon_inode_getfd(). The problem is, once the
> latter fails f.e. due to RLIMIT_NOFILE limit, then we only destruct
> the map via map->ops->map_free(), but without uncharging the previously
> locked memory first. That means that the user_struct allocation is
> leaked as well as the accounted RLIMIT_MEMLOCK memory not released.
> Make the label names in the fix consistent with bpf_prog_load().
> 
> Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Applied and queued up for -stable.
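
The unwind ordering described in the quoted commit message can be sketched in userspace as follows. This is only an illustration under stated assumptions: `fake_map`, `charge_memlock()`, `uncharge_memlock()` and `fake_map_create()` are hypothetical names, the accounting is a plain counter rather than the kernel's per-user accounting, and the label names are chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simulated accounting; stands in for the per-user locked-memory charge. */
static long charged_pages;

struct fake_map {
	long pages;
};

static int charge_memlock(struct fake_map *map, long limit)
{
	if (charged_pages + map->pages > limit)
		return -1;
	charged_pages += map->pages;
	return 0;
}

static void uncharge_memlock(struct fake_map *map)
{
	assert(charged_pages >= map->pages);
	charged_pages -= map->pages;
}

/* fd_ok simulates whether a new file descriptor can be obtained. */
static int fake_map_create(long pages, long limit, bool fd_ok,
			   struct fake_map **out)
{
	struct fake_map *map = malloc(sizeof(*map));
	int err;

	if (!map)
		return -1;
	map->pages = pages;

	err = charge_memlock(map, limit);
	if (err)
		goto free_map_nouncharge;	/* nothing was charged */

	if (!fd_ok) {
		err = -1;
		goto free_map;			/* undo the charge first */
	}

	*out = map;
	return 0;

free_map:
	uncharge_memlock(map);
free_map_nouncharge:
	free(map);
	return err;
}
```

If the `free_map` path skipped `uncharge_memlock()`, a failed creation would leave `charged_pages` permanently raised, which is the leak the fix addresses.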


Re: [PATCH net v2 0/4] net: fix device reference leaks

2016-11-07 Thread David Miller
From: Johan Hovold 
Date: Thu,  3 Nov 2016 18:40:18 +0100

> This series fixes a number of device reference leaks (and one of_node
> leak) due to failure to drop the references taken by bus_find_device()
> and friends.
> 
> Note that the final two patches have been compile tested only.
 ...
> v2
>  - hold reference to cpsw-phy-sel device while accessing private data as
>requested by David. Also update the commit message. (patch 1/4)
>  - add linux-omap on CC where appropriate

Series applied, thanks.


Re: [PATCH net] bpf: fix htab map destruction when extra reserve is in use

2016-11-07 Thread David Miller
From: Daniel Borkmann 
Date: Fri,  4 Nov 2016 00:01:19 +0100

> Commit a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
> added an extra per-cpu reserve to the hash table map to restore old
> behaviour from pre prealloc times. When non-prealloc is in use for a
> map, then problem is that once a hash table extra element has been
> linked into the hash-table, and the hash table is destroyed due to
> refcount dropping to zero, then htab_map_free() -> delete_all_elements()
> will walk the whole hash table and drop all elements via htab_elem_free().
> The problem is that the element from the extra reserve is first fed
> to the wrong backend allocator and eventually freed twice.
> 
> Fixes: a6ed3ea65d98 ("bpf: restore behavior of bpf_map_update_elem")
> Reported-by: Dmitry Vyukov 
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Applied and queued up for -stable, thanks!


Re: [PATCH net-next 0/2] sfc: enable 4-tuple UDP RSS hashing

2016-11-07 Thread David Miller
From: Edward Cree 
Date: Thu, 3 Nov 2016 22:10:31 +

> EF10 based NICs have configurable RSS hash fields, and can be made to take the
> ports into the hash on UDP (they already do so for TCP).  This patch series
> enables this, in order to improve spreading of UDP traffic.

What does the chip do with fragmented traffic?


Re: [PATCH net] sctp: assign assoc_id earlier in __sctp_connect

2016-11-07 Thread David Miller
From: Marcelo Ricardo Leitner 
Date: Thu,  3 Nov 2016 17:03:41 -0200

> sctp_wait_for_connect() currently already holds the asoc to keep it
> alive during the sleep, in case another thread releases it. But Andrey
> Konovalov and Dmitry Vyukov reported a use-after-free in such a
> situation.
> 
> Problem is that __sctp_connect() doesn't get a ref on the asoc and will
> do a read on the asoc after calling sctp_wait_for_connect(), but by then
> another thread may have closed it and the _put on sctp_wait_for_connect
> will actually release it, causing the use-after-free.
> 
> Fix is, instead of doing the read after waiting for the connect, do it
> before so, and avoid this issue as the socket is still locked by then.
> There should be no issue on returning the asoc id in case of failure as
> the application shouldn't trust on that number in such situations
> anyway.
> 
> This issue doesn't exist in sctp_sendmsg() path.
> 
> Reported-by: Dmitry Vyukov 
> Reported-by: Andrey Konovalov 
> Tested-by: Andrey Konovalov 
> Signed-off-by: Marcelo Ricardo Leitner 

Applied and queued up for -stable, thanks.


Re: [PATCH] net: icmp_route_lookup should use rt dev to determine L3 domain

2016-11-07 Thread David Miller
From: David Ahern 
Date: Thu,  3 Nov 2016 10:13:39 -0700

> icmp_send is called in response to some event. The skb may not have
> the device set (skb->dev is NULL), but it is expected to have an rt.
> Update icmp_route_lookup to use the rt on the skb to determine L3
> domain.
> 
> Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX")
> Signed-off-by: David Ahern 

"skb_dst(...)->dev" would be more direct and look nicer.  No need to
use skb_rtable() just to walk backwards to the 'dst'.



Re: [PATCH net-next] net: Update raw socket bind to consider l3 domain

2016-11-07 Thread David Miller
From: David Ahern 
Date: Thu,  3 Nov 2016 09:25:00 -0700

> Binding a raw socket to a local address fails if the socket is bound
> to an L3 domain:
> 
> $ vrf-test  -s -l 10.100.1.2 -R -I red
> error binding socket: 99: Cannot assign requested address
> 
> Update raw_bind to consider whether sk_bound_dev_if is bound to an L3
> domain and use inet_addr_type_table to look up the address.
> 
> Signed-off-by: David Ahern 

Applied.


Re: [net-next PATCH 0/3] qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices

2016-11-07 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Thu, 03 Nov 2016 14:55:56 +0100

> This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
> hopefully help userspace get a more consistent behavior when attaching
> qdisc to such virtual devices.

I'm still thinking about this.

My reservation about this is basically that, since the one known
offender in userspace acknowledged that what it was doing was wrong and
already fixed it quickly, I see no reason to explicitly accommodate this.


Re: [PATCH v6 0/7] add NS2 support to bgmac

2016-11-07 Thread David Miller
From: Jon Mason 
Date: Fri,  4 Nov 2016 01:10:55 -0400

> Changes in v6:
> * Use a common bgmac_phy_connect_direct (per Rafal Milecki) 
> * Rebased on latest net-next
> * Added Reviewed-by to the relevant patches
> 
> 
> Changes in v5:
> * Change a pr_err to netdev_err (per Scott Branden)
> * Reword the lane swap binding documentation (per Andrew Lunn)
> 
> 
> Changes in v4:
> * Actually send out the lane swap binding doc patch (Per Scott Branden)
> * Remove unused #define (Per Andrew Lunn)
> 
> 
> Changes in v3:
> * Clean-up the bgmac DT binding doc (per Rob Herring)
> * Document the lane swap binding and make it generic (Per Andrew Lunn)
> 
> 
> Changes in v2:
> * Remove the PHY power-on (per Andrew Lunn)
> * Misc PHY clean-ups regarding comments and #defines (per Andrew Lunn)
>   This results on none of the original PHY code from Vikas being
>   present.  So, I'm removing him as an author and giving him
>   "Inspired-by" credit.
> * Move PHY lane swapping to PHY driver (per Andrew Lunn and Florian
>   Fainelli)
> * Remove bgmac sleep (per Florian Fainelli)
> * Re-add bgmac chip reset (per Florian Fainelli and Ray Jui)
> * Rebased on latest net-next
> * Added patch for bcm54xx_auxctl_read, which is used in the BCM54810

Series applied, thanks.


Re: [PATCH net-next 1/5] net: l2tp: fix L2TP_ATTR_UDP_CSUM attribute type

2016-11-07 Thread David Miller
From: Asbjoern Sloth Toennesen 
Date: Fri,  4 Nov 2016 22:48:34 +

> L2TP_ATTR_UDP_CSUM is a flag, and gets read with
> nla_get_flag, but it is defined as NLA_U8 in
> the nla_policy.
> 
> It appears that this is only publicly used in
> iproute2, where it's broken, because it's used as
> a NLA_FLAG, and fails validation as a NLA_U8.
> 
> The only place it's used as a NLA_U8 is in
> l2tp_nl_tunnel_send(), but iproute2 again reads that
> as a flag, it's therefore always set. Fortunately
> it is never used for anything, just read.
> 
> CC: Miao Wang 
> Signed-off-by: Asbjoern Sloth Toennesen 

This is definitely the wrong way to go about this.

The kernel is everywhere and updating iproute2 is infinitely
easier for users to do than updating the kernel.

And in any event, once exported we really should never change
the API of anything shown to userspace like this.  Just because
you can't find a user out there doesn't mean it doesn't exist.

Please instead fix iproute2 to use u8 attributes for this.

Thanks.
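
A simplified model of the mismatch discussed above, not the kernel's netlink code: it assumes only that a flag attribute is signalled by presence and carries no payload, while a u8 attribute carries at least one byte. `enum attr_type` and `validate_attr()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum attr_type { ATTR_FLAG, ATTR_U8 };	/* hypothetical policy types */

/* Returns 0 if a payload of payload_len bytes is acceptable for the type. */
static int validate_attr(enum attr_type policy, size_t payload_len)
{
	switch (policy) {
	case ATTR_FLAG:
		/* A flag is signalled by presence alone: no payload allowed. */
		return payload_len == 0 ? 0 : -1;
	case ATTR_U8:
		/* A u8 needs at least one byte of payload. */
		return payload_len >= 1 ? 0 : -1;
	}
	return -1;
}
```

In this model a sender that emits the attribute as a flag (zero-length payload) is rejected by a receiver whose policy says u8, for instance `assert(validate_attr(ATTR_U8, 0) != 0)` holds. Sending a one-byte payload satisfies the existing policy without touching the exported API.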


Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems

2016-11-07 Thread Martin Blumenstingl
Hi Peppe,

On Mon, Nov 7, 2016 at 11:59 AM, Giuseppe CAVALLARO
 wrote:
> In the meantime, I will read again the thread just to see if
> there is something I am missing.
if you are re-reading this thread: please note that there are two
devices in discussion here!
Both are using the Amlogic S905 (GXBB) SoC and both are experiencing
the same issue (Gbit TX issues, RX with Gbit speeds and RX/TX with
100Mbit speed are NOT affected):
- Odroid-C2 (used by Jerome and André Roth)
- Tronsmart Vega S95 Meta (my device)

The (Gbit TX) problem seems to be gone on the Odroid-C2 with Jerome's
patch which disables EEE in drivers/net/phy/realtek.c (at least in his
tests, I don't have that device so I can't verify).
The same problem still appears on my Tronsmart Vega S95 Meta even with
the patched PHY driver.

Unfortunately I don't have a second device to rule out that my
Tronsmart Vega S95 Meta could be broken (not unlikely, I get DDR
errors from time to time in u-boot). Maybe Andreas Faerber can test
ethernet with and without Jerome's patch on one of his Tronsmart
devices.


Regards,
Martin


Re: [lkp] [net] af1fee9821: BUG:spinlock_trylock_failure_on_UP_on_CPU

2016-11-07 Thread Andrew Lunn
On Mon, Nov 07, 2016 at 02:27:14PM +0100, Allan W. Nielsen wrote:
> Hi,
> 
> I tried to get this "lkp" up and running, but I had some trouble getting
> these scripts to work.
> 
> But it seems like it can be reproduced using the provided config file and
> qemu.
> 
> Here is what I did:
> 
> # reproduce original bug
> git reset --hard af1fee98219992ba2c12441a447719652ed7e983
> mkdir bug-build
> cp config-4.8.0-14895-gaf1fee9 bug-build/.config
> make O=bug-build oldconfig
> make O=bug-build -j8
> qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G -kernel \
> ../net-next/bug-build/arch/x86_64/boot/bzImage -nographic
> 
> # bug seemed to be re-produced
> 
> 
> # Try previous version
> git reset --hard 32ab0a38f0bd554cc45203ff4fdb6b0fdea6f025
> make O=bug-build oldconfig
> make O=bug-build -j8
> qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G -kernel \
> ../net-next/bug-build/arch/x86_64/boot/bzImage -nographic
> 
> # bug seemed to disappear
> 
> 
> # Try the buggy revision again - but without MICROSEMI_PHY
> git reset --hard af1fee98219992ba2c12441a447719652ed7e983
> sed -e "/MICROSEMI_PHY/d" -i bug-build/.config
> make O=bug-build oldconfig
> cat bug-build/.config | grep MICROSEMI_PHY
> qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G -kernel \
> ../net-next/bug-build/arch/x86_64/boot/bzImage -nographic
> 
> # bug still seem to be there...
> 
> 
> Not sure what this tells me, any hints are more than welcome.

If the bug happens without your code being compiled, it cannot be your
code. It suggests the patch is moving code around in such a way to
trigger the issue, but it is not the source of the issue itself. To me
it seems like memory corruption or uninitialised variables in some
other code, or maybe DMA from the stack, which was never allowed but
mostly worked on some platforms, and which the recent change to
virtually mapped stacks has broken.

Your code is off the hook, thanks for the testing you did.

 Andrew



[PATCH 1/2] net: qcom/emac: configure the external phy to allow pause frames

2016-11-07 Thread Timur Tabi
Pause frames are used to enable flow control.  A MAC can send and
receive pause frames in order to throttle traffic.  However, the PHY
must be configured to allow those frames to pass through.

Reviewed-by: Florian Fainelli 
Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c 
b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 6fb3bee..70a55dc 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -1003,6 +1003,12 @@ int emac_mac_up(struct emac_adapter *adpt)
writel((u32)~DIS_INT, adpt->base + EMAC_INT_STATUS);
writel(adpt->irq.mask, adpt->base + EMAC_INT_MASK);
 
+   /* Enable pause frames.  Without this feature, the EMAC has been shown
+* to receive (and drop) frames with FCS errors at gigabit connections.
+*/
+   adpt->phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+   adpt->phydev->advertising |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+
adpt->phydev->irq = PHY_IGNORE_INTERRUPT;
phy_start(adpt->phydev);
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.



[PATCH 2/2] [v2] net: qcom/emac: enable flow control if requested

2016-11-07 Thread Timur Tabi
If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.

Signed-off-by: Timur Tabi 
---

v2: fix calculation when TXFC should be set

 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c 
b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 70a55dc..0b4deb3 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -575,10 +575,11 @@ void emac_mac_start(struct emac_adapter *adpt)
 
mac |= TXEN | RXEN; /* enable RX/TX */
 
-   /* We don't have ethtool support yet, so force flow-control mode
-* to 'full' always.
-*/
-   mac |= TXFC | RXFC;
+   /* Configure MAC flow control to match the PHY's settings. */
+   if (phydev->pause)
+   mac |= RXFC;
+   if (phydev->pause != phydev->asym_pause)
+   mac |= TXFC;
 
/* setup link speed */
mac &= ~SPEED_MASK;
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
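
The two conditionals in the hunk above can be tabulated with a small standalone sketch. The `TXFC` and `RXFC` bit values here are placeholders, not the EMAC register layout, and `flow_control_bits()` is a hypothetical helper that mirrors only the patch's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TXFC (1u << 0)	/* placeholder bit positions, for illustration only */
#define RXFC (1u << 1)

/* Mirror of the patch: derive MAC flow-control bits from the PHY state. */
static uint32_t flow_control_bits(bool pause, bool asym_pause)
{
	uint32_t mac = 0;

	if (pause)
		mac |= RXFC;
	if (pause != asym_pause)
		mac |= TXFC;
	return mac;
}
```

Under these assumptions the four input combinations give: neither flag set, no flow control; `pause` only, both `RXFC` and `TXFC`; `asym_pause` only, `TXFC` alone; both set, `RXFC` alone. Each row can be checked with an assertion such as `assert(flow_control_bits(true, true) == RXFC)`.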



[PATCH 0/2] net: qcom/emac: ensure that pause frames are enabled

2016-11-07 Thread Timur Tabi
The qcom emac driver experiences significant packet loss (through frame
check sequence errors) if flow control is not enabled and the phy is
not configured to allow pause frames to pass through it.  Therefore, we
need to enable flow control and force the phy to pass pause frames.

Timur Tabi (2):
  net: qcom/emac: configure the external phy to allow pause frames
  [v2] net: qcom/emac: enable flow control if requested

 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.



Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-07 Thread David Miller
From: Madalin-Cristian Bucur 
Date: Mon, 7 Nov 2016 16:32:16 +

>> -Original Message-
>> From: David Miller [mailto:da...@davemloft.net]
>> Sent: Monday, November 07, 2016 5:55 PM
>> 
>> From: Madalin-Cristian Bucur 
>> Date: Mon, 7 Nov 2016 15:43:26 +
>> 
>> >> From: David Miller [mailto:da...@davemloft.net]
>> >> Sent: Thursday, November 03, 2016 9:58 PM
>> >>
>> >> Why?  By clearing this, you disallow an important fundamental way to do
>> >> performance testing, via pktgen.
>> >
>> > The Tx path in DPAA requires one to insert a back-pointer to the skb
>> into
>> > the Tx buffer. On the Tx confirmation path the back-pointer in the
>> buffer
>> > is used to release the skb. If Tx buffer is shared we'd alter the back-
>> pointer
>> > and leak/double free skbs. See also
>> 
>> Then have your software state store an array of SKB pointers, one for each
>> TX ring entry, just like every other driver does.
> 
> There is no Tx ring in DPAA. Frames are sent out on QMan HW queues towards
> the FMan for Tx and then received back on Tx confirmation queues for cleanup.
> Array traversal would for sure cost more than using the back-pointer. Also,
> we can now process confirmations on a different core than the one doing Tx,
> we'd have to keep the arrays percpu and force the Tx conf on the same core.
> Or add locks.

Report back an integer index, like every scsi driver out there which
completes tagged queued block I/O operations asynchronously.  You can
associate the array with a specific TX confirmation queue.
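
A rough sketch of the index-based completion scheme suggested above, assuming one table per Tx confirmation queue. None of this is DPAA or SCSI code: `struct tag_table`, `tag_alloc()` and `tag_complete()` are hypothetical names, and `NUM_TAGS` is an arbitrary depth. The point is that the buffer itself is never written to, so it could be shared, and completion is a constant-time lookup rather than a traversal.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TAGS 8	/* hypothetical depth of one Tx confirmation queue */

struct tag_table {
	void *slot[NUM_TAGS];		/* in-flight buffers, indexed by tag */
	int free_list[NUM_TAGS];	/* stack of unused tags */
	int nfree;
};

static void tag_table_init(struct tag_table *t)
{
	for (int i = 0; i < NUM_TAGS; i++) {
		t->slot[i] = NULL;
		t->free_list[i] = i;
	}
	t->nfree = NUM_TAGS;
}

/* On transmit: remember buf and return its tag, or -1 if the table is full. */
static int tag_alloc(struct tag_table *t, void *buf)
{
	int tag;

	if (t->nfree == 0)
		return -1;
	tag = t->free_list[--t->nfree];
	t->slot[tag] = buf;
	return tag;
}

/* On confirmation: look the buffer up by tag and recycle the tag. */
static void *tag_complete(struct tag_table *t, int tag)
{
	void *buf;

	assert(tag >= 0 && tag < NUM_TAGS);
	buf = t->slot[tag];
	assert(buf != NULL);	/* tag must currently be in flight */
	t->slot[tag] = NULL;
	t->free_list[t->nfree++] = tag;
	return buf;
}
```

Whether the hardware can carry such an index through to the confirmation queue is not established in this thread; the sketch only shows the software side of the idea.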



Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-07 Thread Joakim Tjernlund
On Wed, 2016-11-02 at 22:17 +0200, Madalin Bucur wrote:
> This introduces the Freescale Data Path Acceleration Architecture
> (DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
> BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
> the Freescale DPAA QorIQ platforms.

Nice to see DPAA support soon entering the kernel(not a day too early:)
I would like to see BQL supported from day one though, if possible.

 Regards
          Joakim Tjernlund


Re: [PATCH] [RFC] net: phy: phy drivers should not set SUPPORTED_Pause or SUPPORTED_Asym_Pause

2016-11-07 Thread Timur Tabi

On 11/01/2016 01:35 PM, Florian Fainelli wrote:

So in premise, this is good, and is exactly what I have in mind for the
series that I am cooking, but if we apply this alone, without a change
in drivers/net/phy/phy.c which adds SUPPORTED_Pause |
SUPPORTED_AsymPause to phydev->features, we are basically breaking the
Ethernet MAC drivers that don't explicitly override phydev->features and
yet rely on that to get flow control to work.


Can you tell me where I should set the SUPPORTED_Pause and 
SUPPORTED_AsymPause flags?  I have two candidates:


1. The settings[] array.  Add these flags to each entry.

2. In phy_sanitize_settings().  Add

phydev->supported |= SUPPORTED_Pause | SUPPORTED_AsymPause;

at the end of the function.


I still don't understand 100% how these flags really work, because I 
just can't shake the feeling that they should not be set for every phy. 
 If these flags are supposed to be turned on universally, then why are 
they even an option?


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCH 00/12] xen: add common function for reading optional value

2016-11-07 Thread Jarkko Sakkinen
On Mon, Nov 07, 2016 at 11:08:09AM +, David Vrabel wrote:
> On 31/10/16 16:48, Juergen Gross wrote:
> > There are multiple instances of code reading an optional unsigned
> > parameter from Xenstore via xenbus_scanf(). Instead of repeating the
> > same code over and over add a service function doing the job and
> > replace the call of xenbus_scanf() with the call of the new function
> > where appropriate.
> 
> Acked-by: David Vrabel 
> 
> Please queue for the next release.

If you want this change to tpmdd, please resend it to tpmdd mailing
list and CC it to linux-security-module. Thanks.

> David

/Jarkko


Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread David Miller
From: Eric Dumazet 
Date: Mon, 07 Nov 2016 08:08:52 -0800

> In any case, rt is a shared object at that time, so even temporarily
> clearing/restoring rt_gateway seems wrong to me.
> 
> I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
> this point.

Agreed.


Re: [PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Eric Dumazet
On Mon, 2016-11-07 at 10:04 -0500, Stephen Suryaputra Lin wrote:
> ICMP redirects behavior is different after the commit above. An email
> requesting the explanation on why the behavior needs to be different
> was sent earlier to netdev (https://patchwork.ozlabs.org/patch/687728/).
> Since there isn't a reply yet, I decided to prepare this formal patch.
> 
> In v2.6 kernel, it used to be that ip_rt_redirect() calls
> arp_bind_neighbour() which returns 0 and then the state of the neigh for
> the new_gw is checked. If the state isn't valid then the redirected
> route is deleted. This behavior is maintained up to v3.5.7 by
> check_peer_redirect() because rt->rt_gateway is assigned to
> peer->redirect_learned.a4 before calling ipv4_neigh_lookup().
> 
> After the commit, ipv4_neigh_lookup() is performed without the
> rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
> isn't zero, the function uses it as the key. The neigh is most likely valid
> since the old_gw is the one that sends the ICMP redirect message. Then the
> new_gw is assigned to fib_nh_exception. The problem is: the new_gw ARP may
> never get resolved and the traffic is blackholed.
> 
> Signed-off-by: Stephen Suryaputra Lin 
> ---
>  net/ipv4/route.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 62d4d90c1389..510045cefcab 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct 
> sk_buff *skb, struct flow
>   goto reject_redirect;
>   }
>  
> + rt->rt_gateway = 0;
>   n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
> + rt->rt_gateway = old_gw;
>   if (!IS_ERR(n)) {
>   if (!(n->nud_state & NUD_VALID)) {
>   neigh_event_send(n, NULL);

In any case, rt is a shared object at that time, so even temporarily
clearing/restoring rt_gateway seems wrong to me.

I would rather call __ipv4_neigh_lookup(dst->dev, new_gw) directly at
this point.




Re: [net PATCH] fib_trie: Correct /proc/net/route off by one error

2016-11-07 Thread Jason Baron



On 11/04/2016 03:11 PM, Alexander Duyck wrote:

The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.

In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key.  In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid.  With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.

Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in 
/proc/net/route")
Cc: Andy Whitcroft 
Reported-by: Jason Baron 
Signed-off-by: Alexander Duyck 
---
 net/ipv4/fib_trie.c |   21 +
 1 file changed, 9 insertions(+), 12 deletions(-)



Ok. Works for me.

Feel free to add:
Reviewed-and-Tested-by: Jason Baron 

Thanks,

-Jason
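
The position convention described in the quoted patch (position 0 belongs to the header token, leaves start at 1) can be illustrated with a toy lookup. This is not the fib_trie iterator: `START_TOKEN` is a stand-in for `SEQ_START_TOKEN`, and `leaves[]` and `entry_at()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define START_TOKEN ((const char *)1)	/* stand-in for SEQ_START_TOKEN */

/* Hypothetical table contents, in display order. */
static const char *const leaves[] = { "10.0.0.0/8", "192.168.1.0/24" };
#define NUM_LEAVES (sizeof(leaves) / sizeof(leaves[0]))

/* Map a read position to what should be shown at that position. */
static const char *entry_at(size_t pos)
{
	if (pos == 0)
		return START_TOKEN;	/* header line */
	if (pos - 1 < NUM_LEAVES)
		return leaves[pos - 1];	/* leaves occupy positions 1..N */
	return NULL;			/* end of table */
}
```

If the first leaf were also treated as position 0, a reader resuming at a saved position would either skip or repeat an entry, which is the kind of off-by-one the patch description refers to. Here `assert(entry_at(1) == leaves[0])` holds and `entry_at(NUM_LEAVES + 1)` is `NULL`.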


RE: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-07 Thread Madalin-Cristian Bucur
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, November 03, 2016 9:58 PM
> 
> From: Madalin Bucur 
> Date: Wed, 2 Nov 2016 22:17:26 +0200
> 
> > This introduces the Freescale Data Path Acceleration Architecture
> > +static inline size_t bpool_buffer_raw_size(u8 index, u8 cnt)
> > +{
> > +   u8 i;
> > +   size_t res = DPAA_BP_RAW_SIZE / 2;
> 
> Always order local variable declarations from longest to shortest line,
> also known as Reverse Christmas Tree Format.
> 
> Please audit your entire submission for this problem, it occurs
> everywhere.

Thank you, I'll resolve this.

> > +   /* we do not want shared skbs on TX */
> > +   net_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
> 
> Why?  By clearing this, you disallow an important fundamental way to do
> performance testing, via pktgen.

The Tx path in DPAA requires one to insert a back-pointer to the skb into
the Tx buffer. On the Tx confirmation path the back-pointer in the buffer
is used to release the skb. If Tx buffer is shared we'd alter the back-pointer
and leak/double free skbs. See also 

static int dpaa_start_xmit(struct sk_buff *skb, struct net_device 
*net_dev)
{
...
  if (!nonlinear) {
/* We're going to store the skb backpointer at the 
beginning
 * of the data buffer, so we need a privately owned skb
   *
 * We've made sure skb is not shared in dev->priv_flags,
 * we need to verify the skb head is not cloned
   */
if (skb_cow_head(skb, priv->tx_headroom))
goto enomem;

  WARN_ON(skb_is_nonlinear(skb));
}
...

> > +   int numstats = sizeof(struct rtnl_link_stats64) / sizeof(u64);
>  ...
> > +   cpustats = (u64 *)&percpu_priv->stats;
> > +
> > +   for (j = 0; j < numstats; j++)
> > +   netstats[j] += cpustats[j];
> 
> This is a memcpy() on well-typed datastructures which requires no
> casting or special handling whatsoever, so use memcpy instead of
> needlessly open coding the operation.

Will fix.

> > +static int dpaa_change_mtu(struct net_device *net_dev, int new_mtu)
> > +{
> > +   const int max_mtu = dpaa_get_max_mtu();
> > +
> > +   /* Make sure we don't exceed the Ethernet controller's MAXFRM */
> > +   if (new_mtu < 68 || new_mtu > max_mtu) {
> > +   netdev_err(net_dev, "Invalid L3 mtu %d (must be between %d and
> %d).\n",
> > +  new_mtu, 68, max_mtu);
> > +   return -EINVAL;
> > +   }
> > +   net_dev->mtu = new_mtu;
> > +
> > +   return 0;
> > +}
> 
> MTU restrictions are handled in the net-next tree via net_dev->min_mtu and
> net_dev->max_mtu.  Use that and do not define this NDO operation as you do
> not need it.

OK
 
> > +static int dpaa_set_features(struct net_device *dev, netdev_features_t
> features)
> > +{
> > +   /* Not much to do here for now */
> > +   dev->features = features;
> > +   return 0;
> > +}
> 
> Do not define unnecessary NDO operations, let the defaults do their job.
> 
> > +static netdev_features_t dpaa_fix_features(struct net_device *dev,
> > +  netdev_features_t features)
> > +{
> > +   netdev_features_t unsupported_features = 0;
> > +
> > +   /* In theory we should never be requested to enable features that
> > +* we didn't set in netdev->features and netdev->hw_features at
> probe
> > +* time, but double check just to be on the safe side.
> > +*/
> > +   unsupported_features |= NETIF_F_RXCSUM;
> > +
> > +   features &= ~unsupported_features;
> > +
> > +   return features;
> > +}
> 
> Unless you can show that you need this, do not "guess" by implementing this
> NDO operation.  You don't need it.

Will remove it.
 
> > +#ifdef CONFIG_FSL_DPAA_ETH_FRIENDLY_IF_NAME
> > +static int dpaa_mac_hw_index_get(struct platform_device *pdev)
> > +{
> > +   struct device *dpaa_dev;
> > +   struct dpaa_eth_data *eth_data;
> > +
> > +   dpaa_dev = &pdev->dev;
> > +   eth_data = dpaa_dev->platform_data;
> > +
> > +   return eth_data->mac_hw_id;
> > +}
> > +
> > +static int dpaa_mac_fman_index_get(struct platform_device *pdev)
> > +{
> > +   struct device *dpaa_dev;
> > +   struct dpaa_eth_data *eth_data;
> > +
> > +   dpaa_dev = &pdev->dev;
> > +   eth_data = dpaa_dev->platform_data;
> > +
> > +   return eth_data->fman_hw_id;
> > +}
> > +#endif
> 
> Do not play network device naming games like this, use the standard name
> assignment done by the kernel and have userspace entities do geographic or
> device type specific naming.
> 
> I want to see this code completely removed.

I'll remove the option, udev rules like these can achieve the same effect:

SUBSYSTEM=="net", DRIVERS=="fsl_dpa*", ATTR{device_addr}=="ffe4e", 
NAME="fm1-mac1"
SUBSYSTEM=="net", DRIVERS=="fsl_dpa*", ATTR{device_addr}=="ffe4e2000", 
NAME="fm1-mac2"
 
> > +static int 

Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet

2016-11-07 Thread David Miller
From: Madalin-Cristian Bucur 
Date: Mon, 7 Nov 2016 15:43:26 +

>> From: David Miller [mailto:da...@davemloft.net]
>> Sent: Thursday, November 03, 2016 9:58 PM
>> 
>> Why?  By clearing this, you disallow an important fundamental way to do
>> performance testing, via pktgen.
> 
> The Tx path in DPAA requires one to insert a back-pointer to the skb into
> the Tx buffer. On the Tx confirmation path the back-pointer in the buffer
> is used to release the skb. If Tx buffer is shared we'd alter the back-pointer
> and leak/double free skbs. See also 

Then have your software state store an array of SKB pointers, one for each
TX ring entry, just like every other driver does.


Re: [PATCH net-next 06/11] net: l3mdev: remove redundant calls

2016-11-07 Thread David Ahern
On 11/7/16 3:13 AM, Lorenzo Colitti wrote:
> What should we do here? It would seem that now that
> netif_index_is_l3_master has been resurrected, it's appropriate to use
> it here as well. The user-visible behaviour changed only two months
> ago. Unless we think that RSTs should always mirror the iif, in which
> case I can change our tests accordingly.

Using ingress information (iif in this case) to determine the route for a 
response seems appropriate.

I can send a patch to revert back to:

if (!oif && netif_index_is_vrf(net, skb->skb_iif))
oif = skb->skb_iif;

thanks for the report.


[PATCH] net/netfilter: Fix use uninitialized warn in nft_range_eval()

2016-11-07 Thread Shuah Khan
Fix the following warn:

   CC [M]  net/netfilter/nft_range.o
8601,8605c9105
 net/netfilter/nft_range.c: In function ‘nft_range_eval’:
 net/netfilter/nft_range.c:45:5: warning: ‘mismatch’ may be used uninitialized 
in this function [-Wmaybe-uninitialized]
   if (mismatch)
  ^

Signed-off-by: Shuah Khan 
---
 net/netfilter/nft_range.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nft_range.c b/net/netfilter/nft_range.c
index c6d5358..fe5f69b 100644
--- a/net/netfilter/nft_range.c
+++ b/net/netfilter/nft_range.c
@@ -28,7 +28,7 @@ static void nft_range_eval(const struct nft_expr *expr,
 const struct nft_pktinfo *pkt)
 {
const struct nft_range_expr *priv = nft_expr_priv(expr);
-   bool mismatch;
+   bool mismatch = false;
int d1, d2;
 
d1 = memcmp(&regs->data[priv->sreg], &priv->data_from, priv->len);
-- 
2.9.3
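
For readers wondering why the compiler complains, here is a standalone
sketch of the pattern that typically triggers -Wmaybe-uninitialized. It
assumes the flag is assigned only inside the cases of a switch with no
default branch; the enum, function name and comparison logic are
hypothetical and only loosely modelled on a range match, not copied
from nft_range.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum range_op { RANGE_EQ, RANGE_NEQ };	/* hypothetical stand-ins */

static bool range_mismatch(enum range_op op, const void *val,
			   const void *from, const void *to, size_t len)
{
	/* Without this initializer, an op matching no case below would
	 * leave the variable undefined when it is read at the end.
	 */
	bool mismatch = false;
	int d1 = memcmp(val, from, len);
	int d2 = memcmp(val, to, len);

	switch (op) {
	case RANGE_EQ:
		mismatch = (d1 < 0 || d2 > 0);	/* outside [from, to] */
		break;
	case RANGE_NEQ:
		mismatch = (d1 >= 0 && d2 <= 0);	/* inside [from, to] */
		break;
	}
	return mismatch;
}
```

Initializing at the declaration silences the warning; adding a default
case that sets the flag would be an alternative with the same effect.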



[PATCH/RFC net-next] ravb: Add dma_unmap_single in ravb_ring_free

2016-11-07 Thread Simon Horman
From: Kazuya Mizuguchi 

A kernel panic occurs with a "swiotlb buffer is full" message
after repeated suspend and resume, because the buffers mapped with
dma_map_single in ravb_ring_format and ravb_start_xmit are never
unmapped. This patch adds dma_unmap_single to ravb_ring_free to fix
the problem.

Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Simon Horman 
---
Sergei, this is a patch from the Gen3 3.3.2 BSP.
Please consider if it is appropriate for mainline.
---
 drivers/net/ethernet/renesas/ravb_main.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 27cfec3154c8..e44629b75c83 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -185,6 +185,9 @@ static void ravb_ring_free(struct net_device *ndev, int q)
struct ravb_private *priv = netdev_priv(ndev);
int ring_size;
int i;
+   struct ravb_ex_rx_desc *rx_desc;
+   struct ravb_tx_desc *tx_desc;
+   u32 size;
 
/* Free RX skb ringbuffer */
if (priv->rx_skb[q]) {
@@ -207,6 +210,16 @@ static void ravb_ring_free(struct net_device *ndev, int q)
priv->tx_align[q] = NULL;
 
if (priv->rx_ring[q]) {
+   for (i = 0; i < priv->num_rx_ring[q]; i++) {
+   rx_desc = &priv->rx_ring[q][i];
+   if (rx_desc->dptr != 0) {
+   dma_unmap_single(ndev->dev.parent,
+le32_to_cpu(rx_desc->dptr),
+PKT_BUF_SZ,
+DMA_FROM_DEVICE);
+   rx_desc->dptr = 0;
+   }
+   }
ring_size = sizeof(struct ravb_ex_rx_desc) *
(priv->num_rx_ring[q] + 1);
dma_free_coherent(ndev->dev.parent, ring_size, priv->rx_ring[q],
@@ -215,6 +228,16 @@ static void ravb_ring_free(struct net_device *ndev, int q)
}
 
if (priv->tx_ring[q]) {
+   for (i = 0; i < priv->num_tx_ring[q]; i++) {
+   tx_desc = &priv->tx_ring[q][i];
+   size = le16_to_cpu(tx_desc->ds_tagl) & TX_DS;
+   if (tx_desc->dptr != 0) {
+   dma_unmap_single(ndev->dev.parent,
+le32_to_cpu(tx_desc->dptr),
+size, DMA_TO_DEVICE);
+   tx_desc->dptr = 0;
+   }
+   }
ring_size = sizeof(struct ravb_tx_desc) *
(priv->num_tx_ring[q] * NUM_TX_DESC + 1);
dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q],
-- 
2.7.0.rc3.207.g0ac5344
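
The unmap loop in the patch can be illustrated outside the kernel with
a small sketch. fake_desc, fake_unmap and ring_unmap_all are
hypothetical names; fake_unmap only counts calls and stands in for
dma_unmap_single, and a zero dptr is assumed to mean "no buffer
mapped", as the patch's own check suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fake_desc {	/* stand-in for a hardware descriptor */
	uint32_t dptr;	/* DMA address, 0 when no buffer is mapped */
	uint16_t len;	/* mapped length */
};

static unsigned int unmap_calls;	/* how many mappings were released */

/* Stand-in for dma_unmap_single(); only records that it was called. */
static void fake_unmap(uint32_t addr, size_t size)
{
	(void)addr;
	(void)size;
	unmap_calls++;
}

/* Release every live mapping in the ring before the ring is freed. */
static void ring_unmap_all(struct fake_desc *ring, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ring[i].dptr == 0)
			continue;	/* nothing mapped in this slot */
		fake_unmap(ring[i].dptr, ring[i].len);
		ring[i].dptr = 0;	/* mark as released */
	}
}
```

Clearing dptr after the unmap makes a second pass over the same ring a
no-op, which matters if the free path can run more than once across
suspend and resume cycles.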



[PATCH net] Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")

2016-11-07 Thread Stephen Suryaputra Lin
ICMP redirect behavior changed after the commit above. An email
asking why the behavior needs to be different was sent earlier to
netdev (https://patchwork.ozlabs.org/patch/687728/).
Since there has been no reply yet, I decided to prepare this formal patch.

In the v2.6 kernel, ip_rt_redirect() called arp_bind_neighbour(),
which returned 0, and then the state of the neigh for the new_gw was
checked. If the state wasn't valid, the redirected route was deleted.
This behavior was maintained up to v3.5.7 by check_peer_redirect(),
because rt->rt_gateway was assigned peer->redirect_learned.a4 before
calling ipv4_neigh_lookup().

After the commit, ipv4_neigh_lookup() is performed without rt_gateway
being set to the new_gw. When rt_gateway (old_gw) isn't zero, the
function uses it as the key. That neigh is most likely valid, since the
old_gw is the one that sent the ICMP redirect message. The new_gw is then
assigned to fib_nh_exception. The problem is that ARP for the new_gw may
never get resolved, and the traffic is blackholed.

Signed-off-by: Stephen Suryaputra Lin 
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 62d4d90c1389..510045cefcab 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -753,7 +753,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
goto reject_redirect;
}
 
+   rt->rt_gateway = 0;
n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+   rt->rt_gateway = old_gw;
if (!IS_ERR(n)) {
if (!(n->nud_state & NUD_VALID)) {
neigh_event_send(n, NULL);
-- 
2.7.4
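
The effect of the two added lines can be sketched with a toy model of
the key selection described in the commit message: a non-zero
rt_gateway is used as the lookup key in preference to the address
passed in. fake_rt, lookup_key and redirect_lookup_key are hypothetical
names, and the model only mirrors that description; it is not the
kernel's ipv4_neigh_lookup().

```c
#include <assert.h>
#include <stdint.h>

struct fake_rt { uint32_t rt_gateway; };	/* stand-in for struct rtable */

/* Models the described key choice: a non-zero gateway wins over daddr. */
static uint32_t lookup_key(const struct fake_rt *rt, uint32_t daddr)
{
	return rt->rt_gateway ? rt->rt_gateway : daddr;
}

/* Models the patched sequence: hide the old gateway during the lookup
 * so that new_gw becomes the key, then restore it.
 */
static uint32_t redirect_lookup_key(struct fake_rt *rt, uint32_t new_gw)
{
	uint32_t old_gw = rt->rt_gateway;
	uint32_t key;

	rt->rt_gateway = 0;
	key = lookup_key(rt, new_gw);
	rt->rt_gateway = old_gw;
	return key;
}
```

In this model the unpatched path would check the neighbour state of the
old gateway, while the patched path checks the new gateway, which is
the one whose ARP resolution actually matters for the redirect.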


