date:20171111

Re: [PATCH net v3 00/12] Fixes, cleanup and modernization for some legacy ethernet NIC drivers

2017-11-11 Thread David Miller

From: Finn Thain 
Date: Sat, 11 Nov 2017 01:20:58 -0500 (EST)

> This patch series adds support for the Linux Driver Model for Mac NIC
> drivers, fixes some logging bugs, removes dead code, and adopts netif_*
> calls to reduce code duplication.
> 
> All up, about 100 lines of code are eliminated.
> 
> This patch series has been tested on a variety of Macs, with coverage
> for the changes to lib8390.c, mac8390.c, macsonic.c, sonic.[ch] and
> macmace.c.
> 
> This patch series should be applied after the NuBus subsystem
> modernization patch series.

Then you can't be submitting this for the networking tree.

[PATCH net-next 0/2] cxgb4: collect LE-TCAM and SGE queue contexts

2017-11-11 Thread Rahul Lakkireddy

Collect hardware dumps via ethtool --get-dump facility.

Patch 1 collects LE-TCAM dump.

Patch 2 collects SGE queue context dumps.

Thanks,
Rahul

Rahul Lakkireddy (2):
  cxgb4: collect LE-TCAM dump
  cxgb4: collect SGE queue context dump

 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  38 
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h |   2 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c| 253 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h|  11 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h|   4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |  11 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c|  62 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h|   7 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  |  68 ++
 9 files changed, 456 insertions(+)

-- 
2.14.1
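Patch 1's cudbg_get_le_type() assigns each TID to an LE region by comparing it against region start offsets that the driver reads from hardware into struct cudbg_tcam. Below is a minimal standalone sketch of that comparison chain. It assumes the boundaries are ascending, and the layout in example_regions is hypothetical, not taken from real T5/T6 hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from enum cudbg_le_entry_types in patch 1. */
enum le_entry_type {
	LE_ET_UNKNOWN = 0,
	LE_ET_TCAM_CON = 1,
	LE_ET_TCAM_SERVER = 2,
	LE_ET_TCAM_FILTER = 3,
	LE_ET_TCAM_CLIP = 4,
	LE_ET_TCAM_ROUTING = 5,
	LE_ET_HASH_CON = 6,
	LE_ET_INVALID_TID = 8,
};

/* Start offset of each LE region (the driver reads these from hardware). */
struct tcam_regions {
	uint32_t server_start;
	uint32_t filter_start;
	uint32_t clip_start;
	uint32_t routing_start;
	uint32_t tid_hash_base;
	uint32_t max_tid;
};

/* Hypothetical layout for illustration only. */
const struct tcam_regions example_regions = {
	.server_start = 0x100, .filter_start = 0x200, .clip_start = 0x300,
	.routing_start = 0x380, .tid_hash_base = 0x400, .max_tid = 0x800,
};

/* Same chain as cudbg_get_le_type(): a region starts at its own offset
 * and ends just before the next region's offset. */
enum le_entry_type le_type(uint32_t tid, const struct tcam_regions *r)
{
	/* The chain only makes sense if the boundaries are ascending. */
	assert(r->server_start <= r->filter_start &&
	       r->filter_start <= r->clip_start &&
	       r->clip_start <= r->routing_start &&
	       r->routing_start <= r->tid_hash_base &&
	       r->tid_hash_base <= r->max_tid);

	if (tid < r->server_start)
		return LE_ET_TCAM_CON;
	if (tid < r->filter_start)
		return LE_ET_TCAM_SERVER;
	if (tid < r->clip_start)
		return LE_ET_TCAM_FILTER;
	if (tid < r->routing_start)
		return LE_ET_TCAM_CLIP;
	if (tid < r->tid_hash_base)
		return LE_ET_TCAM_ROUTING;
	if (tid < r->max_tid)
		return LE_ET_HASH_CON;
	return LE_ET_INVALID_TID;
}
```

Under that hypothetical layout, TID 0x100 is a server entry and 0x800 is LE_ET_INVALID_TID. Each region includes its own start offset and excludes the next region's.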

[PATCH net-next 1/2] cxgb4: collect LE-TCAM dump

2017-11-11 Thread Rahul Lakkireddy

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  30 
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c| 175 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h|   7 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |   7 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  |  41 +
 6 files changed, 261 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index 1de1d811fde3..f99db7b283fc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -185,6 +185,36 @@ struct cudbg_vpd_data {
u32 vpd_vers;
 };
 
+#define CUDBG_MAX_TCAM_TID 0x800
+
+enum cudbg_le_entry_types {
+   LE_ET_UNKNOWN = 0,
+   LE_ET_TCAM_CON = 1,
+   LE_ET_TCAM_SERVER = 2,
+   LE_ET_TCAM_FILTER = 3,
+   LE_ET_TCAM_CLIP = 4,
+   LE_ET_TCAM_ROUTING = 5,
+   LE_ET_HASH_CON = 6,
+   LE_ET_INVALID_TID = 8,
+};
+
+struct cudbg_tcam {
+   u32 filter_start;
+   u32 server_start;
+   u32 clip_start;
+   u32 routing_start;
+   u32 tid_hash_base;
+   u32 max_tid;
+};
+
+struct cudbg_tid_data {
+   u32 tid;
+   u32 dbig_cmd;
+   u32 dbig_conf;
+   u32 dbig_rsp_stat;
+   u32 data[NUM_LE_DB_DBGI_RSP_DATA_INSTANCES];
+};
+
 #define CUDBG_NUM_ULPTX 11
 #define CUDBG_NUM_ULPTX_READ 512
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index e484c514e9ae..4e5d189eae62 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -65,6 +65,7 @@ enum cudbg_dbg_entity_type {
CUDBG_TID_INFO = 54,
CUDBG_MPS_TCAM = 57,
CUDBG_VPD_DATA = 58,
+   CUDBG_LE_TCAM = 59,
CUDBG_CCTRL = 60,
CUDBG_MA_INDIRECT = 61,
CUDBG_ULPTX_LA = 62,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 32c9858da110..dd7e26be98cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -1367,6 +1367,181 @@ int cudbg_collect_vpd_data(struct cudbg_init *pdbg_init,
return rc;
 }
 
+static int cudbg_read_tid(struct cudbg_init *pdbg_init, u32 tid,
+                          struct cudbg_tid_data *tid_data)
+{
+   struct adapter *padap = pdbg_init->adap;
+   int i, cmd_retry = 8;
+   u32 val;
+
+   /* Fill REQ_DATA regs with 0's */
+   for (i = 0; i < NUM_LE_DB_DBGI_REQ_DATA_INSTANCES; i++)
+       t4_write_reg(padap, LE_DB_DBGI_REQ_DATA_A + (i << 2), 0);
+
+   /* Write DBIG command */
+   val = DBGICMD_V(4) | DBGITID_V(tid);
+   t4_write_reg(padap, LE_DB_DBGI_REQ_TCAM_CMD_A, val);
+   tid_data->dbig_cmd = val;
+
+   val = DBGICMDSTRT_F | DBGICMDMODE_V(1); /* LE mode */
+   t4_write_reg(padap, LE_DB_DBGI_CONFIG_A, val);
+   tid_data->dbig_conf = val;
+
+   /* Poll the DBGICMDBUSY bit */
+   val = 1;
+   while (val) {
+       val = t4_read_reg(padap, LE_DB_DBGI_CONFIG_A);
+       val = val & DBGICMDBUSY_F;
+       cmd_retry--;
+       if (!cmd_retry)
+           return CUDBG_SYSTEM_ERROR;
+   }
+
+   /* Check RESP status */
+   val = t4_read_reg(padap, LE_DB_DBGI_RSP_STATUS_A);
+   tid_data->dbig_rsp_stat = val;
+   if (!(val & 1))
+       return CUDBG_SYSTEM_ERROR;
+
+   /* Read RESP data */
+   for (i = 0; i < NUM_LE_DB_DBGI_RSP_DATA_INSTANCES; i++)
+       tid_data->data[i] = t4_read_reg(padap,
+                                       LE_DB_DBGI_RSP_DATA_A +
+                                       (i << 2));
+   tid_data->tid = tid;
+   return 0;
+}
+
+static int cudbg_get_le_type(u32 tid, struct cudbg_tcam tcam_region)
+{
+   int type = LE_ET_UNKNOWN;
+
+   if (tid < tcam_region.server_start)
+       type = LE_ET_TCAM_CON;
+   else if (tid < tcam_region.filter_start)
+       type = LE_ET_TCAM_SERVER;
+   else if (tid < tcam_region.clip_start)
+       type = LE_ET_TCAM_FILTER;
+   else if (tid < tcam_region.routing_start)
+       type = LE_ET_TCAM_CLIP;
+   else if (tid < tcam_region.tid_hash_base)
+       type = LE_ET_TCAM_ROUTING;
+   else if (tid < tcam_region.max_tid)
+       type = LE_ET_HASH_CON;
+   else
+       type = LE_ET_INVALID_TID;
+
+   return type;
+}
+
+static int cudbg_is_ipv6_entry(struct cudbg_tid_data *tid_data,
+  struct cudbg_tcam tcam_region)
+{
+   int ipv6 = 0;
+   int le_type;
+
+   le_type = cudbg_get_le_type(tid_data->tid,
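
The DBGICMDBUSY poll in cudbg_read_tid() above gives the hardware a bounded number of reads. As written, cmd_retry is decremented and tested after every read regardless of the busy bit, so a busy bit that clears on the eighth read still appears to produce CUDBG_SYSTEM_ERROR. Below is a minimal sketch of the same bounded poll with the busy test ordered first. The names read_busy_fn and fake_busy are hypothetical stand-ins for the register read, and a real driver may also want a delay between reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register accessor: returns true while the engine is busy. */
typedef bool (*read_busy_fn)(void *ctx);

/* Poll up to max_reads times. Returns 0 as soon as the busy bit is clear,
 * -1 if it is still set after the last read. The busy bit is tested before
 * the budget runs out, so a clear bit on the final read counts as success. */
int poll_until_idle(read_busy_fn read_busy, void *ctx, int max_reads)
{
	assert(max_reads > 0);

	for (int i = 0; i < max_reads; i++) {
		if (!read_busy(ctx))
			return 0;
	}
	return -1;
}

/* Test double: reports busy for the first *ctx reads, then idle. */
bool fake_busy(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return true;
	}
	return false;
}
```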

[PATCH net-next 2/2] cxgb4: collect SGE queue context dump

2017-11-11 Thread Rahul Lakkireddy

Collect SGE freelist queue and congestion manager contexts.

Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  8 +++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c| 78 +++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h|  4 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h|  4 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |  4 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c| 62 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h|  7 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h  | 27 
 9 files changed, 195 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index f99db7b283fc..605689957496 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -145,6 +145,14 @@ struct cudbg_tid_info_region_rev1 {
u32 reserved[16];
 };
 
+#define CUDBG_MAX_FL_QIDS 1024
+
+struct cudbg_ch_cntxt {
+   u32 cntxt_type;
+   u32 cntxt_id;
+   u32 data[SGE_CTXT_SIZE / 4];
+};
+
 #define CUDBG_MAX_RPLC_SIZE 128
 
 struct cudbg_mps_tcam {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 4e5d189eae62..e10ff1ee62c5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -63,6 +63,7 @@ enum cudbg_dbg_entity_type {
CUDBG_PCIE_INDIRECT = 50,
CUDBG_PM_INDIRECT = 51,
CUDBG_TID_INFO = 54,
+   CUDBG_DUMP_CONTEXT = 56,
CUDBG_MPS_TCAM = 57,
CUDBG_VPD_DATA = 58,
CUDBG_LE_TCAM = 59,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index dd7e26be98cf..d699bf88d18f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -1115,6 +1115,84 @@ int cudbg_collect_tid(struct cudbg_init *pdbg_init,
return rc;
 }
 
+int cudbg_dump_context_size(struct adapter *padap)
+{
+   u32 value, size;
+   u8 flq;
+
+   value = t4_read_reg(padap, SGE_FLM_CFG_A);
+
+   /* Get number of data freelist queues */
+   flq = HDRSTARTFLQ_G(value);
+   size = CUDBG_MAX_FL_QIDS >> flq;
+
+   /* Add extra space for congestion manager contexts.
+    * The number of CONM contexts are same as number of freelist
+    * queues.
+    */
+   size += size;
+   return size * sizeof(struct cudbg_ch_cntxt);
+}
+
+static void cudbg_read_sge_ctxt(struct cudbg_init *pdbg_init, u32 cid,
+                                enum ctxt_type ctype, u32 *data)
+{
+   struct adapter *padap = pdbg_init->adap;
+   int rc = -1;
+
+   /* Under heavy traffic, the SGE Queue contexts registers will be
+    * frequently accessed by firmware.
+    *
+    * To avoid conflicts with firmware, always ask firmware to fetch
+    * the SGE Queue contexts via mailbox. On failure, fallback to
+    * accessing hardware registers directly.
+    */
+   if (is_fw_attached(pdbg_init))
+       rc = t4_sge_ctxt_rd(padap, padap->mbox, cid, ctype, data);
+   if (rc)
+       t4_sge_ctxt_rd_bd(padap, cid, ctype, data);
+}
+
+int cudbg_collect_dump_context(struct cudbg_init *pdbg_init,
+                               struct cudbg_buffer *dbg_buff,
+                               struct cudbg_error *cudbg_err)
+{
+   struct adapter *padap = pdbg_init->adap;
+   struct cudbg_buffer temp_buff = { 0 };
+   struct cudbg_ch_cntxt *buff;
+   u32 size, i = 0;
+   int rc;
+
+   rc = cudbg_dump_context_size(padap);
+   if (rc <= 0)
+       return CUDBG_STATUS_ENTITY_NOT_FOUND;
+
+   size = rc;
+   rc = cudbg_get_buff(dbg_buff, size, &temp_buff);
+   if (rc)
+       return rc;
+
+   buff = (struct cudbg_ch_cntxt *)temp_buff.data;
+   while (size > 0) {
+       buff->cntxt_type = CTXT_FLM;
+       buff->cntxt_id = i;
+       cudbg_read_sge_ctxt(pdbg_init, i, CTXT_FLM, buff->data);
+       buff++;
+       size -= sizeof(struct cudbg_ch_cntxt);
+
+       buff->cntxt_type = CTXT_CNM;
+       buff->cntxt_id = i;
+       cudbg_read_sge_ctxt(pdbg_init, i, CTXT_CNM, buff->data);
+       buff++;
+       size -= sizeof(struct cudbg_ch_cntxt);
+
+       i++;
+   }
+
+   cudbg_write_and_release_buff(&temp_buff, dbg_buff);
+   return rc;
+}
+
 static inline void cudbg_tcamxy2valmask(u64 x, u64 y, u8 *addr, u64 *mask)
 {
*mask = x | y;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h
index ebb2d9907fc9..caeee8e33e86
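
The sizing in cudbg_dump_context_size() and the loop in cudbg_collect_dump_context() can be checked with a little arithmetic. There are CUDBG_MAX_FL_QIDS >> HDRSTARTFLQ freelist queues, the count is doubled for the congestion manager contexts, each context is stored as one struct cudbg_ch_cntxt, and entries are written as FLM i followed by CNM i. The sketch below reproduces that in plain C. It assumes SGE_CTXT_SIZE is 24 bytes (making each entry 32 bytes) and uses hypothetical numeric values for the two context types. The hardware reads are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FL_QIDS 1024	/* CUDBG_MAX_FL_QIDS in the patch */
#define SGE_CTXT_SIZE 24	/* assumed value; see the cxgb4 headers */

/* Hypothetical stand-ins for the driver's CTXT_FLM / CTXT_CNM values. */
enum ctxt_kind { KIND_FLM = 1, KIND_CNM = 2 };

struct ch_cntxt {
	uint32_t cntxt_type;
	uint32_t cntxt_id;
	uint32_t data[SGE_CTXT_SIZE / 4];
};

/* Bytes needed for one FLM and one CNM context per freelist queue. */
size_t dump_context_size(unsigned int hdrstartflq)
{
	size_t nqids = MAX_FL_QIDS >> hdrstartflq;

	return 2 * nqids * sizeof(struct ch_cntxt);
}

/* Lay out entries the way the collector does: FLM i, then CNM i.
 * Context data stays zeroed instead of being read from hardware.
 * Returns the number of entries written. */
size_t fill_layout(struct ch_cntxt *buf, size_t size)
{
	size_t n = 0;
	uint32_t qid = 0;

	assert(size % (2 * sizeof(struct ch_cntxt)) == 0);
	while (size > 0) {
		buf[n++] = (struct ch_cntxt){ .cntxt_type = KIND_FLM, .cntxt_id = qid };
		buf[n++] = (struct ch_cntxt){ .cntxt_type = KIND_CNM, .cntxt_id = qid };
		size -= 2 * sizeof(struct ch_cntxt);
		qid++;
	}
	return n;
}
```

With HDRSTARTFLQ = 0, the dump needs 2 * 1024 * 32 = 65536 bytes under the 24-byte assumption, and each increment of HDRSTARTFLQ halves that. Because the size is always a whole number of FLM/CNM pairs, the while (size > 0) loop in the patch ends exactly at zero.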

Re: [PATCH net-next] cxgb4: collect vpd info directly from hardware

2017-11-11 Thread David Miller

From: Rahul Lakkireddy 
Date: Fri, 10 Nov 2017 13:03:37 +0530

> Collect vpd information directly from hardware instead of software
> adapter context. Move EEPROM physical address to virtual address
> translation logic to t4_hw.c and update relevant files.
> 
> Fixes: 6f92a6544f1a ("cxgb4: collect hardware misc dumps")
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 

Applied, thanks.

Re: [net-next:master 622/639] net/dsa/port.c:255: undefined reference to `br_vlan_enabled'

2017-11-11 Thread Andrew Lunn

On Sat, Nov 11, 2017 at 06:42:21PM +0900, David Miller wrote:
> From: kbuild test robot 
> Date: Sat, 11 Nov 2017 16:57:08 +0800
> 
> > All errors (new ones prefixed by >>):
> > 
> >net/dsa/port.o: In function `dsa_port_vlan_add':
> >>> net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
> >net/dsa/port.o: In function `dsa_port_vlan_del':
> >net/dsa/port.c:270: undefined reference to `br_vlan_enabled'
> 
> Problem is NET_DSA=y and BRIDGE_VLAN_FILTERING=m
> 
> We need some Kconfig dependency foo to prevent this.

Yes. Let's see if I can put together a patch before Arnd!

 Andrew

Re: [PATCH RFC,WIP 5/5] netfilter: nft_flow_offload: add ndo hooks for hardware offload

2017-11-11 Thread Felix Fietkau

On 2017-11-03 16:26, Pablo Neira Ayuso wrote:
> This patch adds the infrastructure to offload flows to hardware, in case
> the nic/switch comes with built-in flow tables capabilities.
> 
> If the hardware comes with no hardware flow tables or they have
> limitations in terms of features, this falls back to the software
> generic flow table implementation.
> 
> The software flow table aging thread skips entries that reside in the
> hardware, so the hardware will be responsible for releasing this flow
> table entry too.
> 
> Signed-off-by: Pablo Neira Ayuso 
Hi Pablo,

I'd like to start playing with those patches in OpenWrt/LEDE soon. I'm
also considering making a patch that adds iptables support.
For that to work, I think it would be a good idea to keep the code that
tries to offload flows to hardware in nf_flow_offload.c instead, so that
it can be shared with iptables integration.

By the way, do you have a git tree where you keep the current version of
your patch set?

Thanks,

- Felix

Re: [PATCH net-next 0/3] l2tp: avoid aliasing tunnels socket pointer

2017-11-11 Thread David Miller

From: Guillaume Nault 
Date: Sat, 11 Nov 2017 06:06:23 +0900

> We don't need to copy the tunnel's socket pointer in the pseudo-wire
> specific session structures. This uselessly complicates the code
> and hampers evolution.
> 
> This series was part of an effort to protect tunnels socket pointer
> with RCU. But since it provides nice cleanup, I submit it separately.

Nice simplification, applied, thanks.

Re: [PATCH] net: Remove unused skb_shared_info member

2017-11-11 Thread David Miller

From: Mat Martineau 
Date: Fri, 10 Nov 2017 14:03:51 -0800

> ip6_frag_id was only used by UFO, which has been removed.
> ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
> in-tree callers.
> 
> Signed-off-by: Mat Martineau 

Applied to net-next, thanks.

Re: [PATCH] tcp: Export to userspace the TCP state names for the trace events

2017-11-11 Thread Yafang Shao

2017-11-11 3:32 GMT+00:00 Steven Rostedt :
> On Sat, 11 Nov 2017 02:06:00 +
> Yafang Shao  wrote:
>
>> 2017-11-10 15:07 GMT+00:00 Steven Rostedt :
>> > On Fri, 10 Nov 2017 12:56:06 +0800
>> > Yafang Shao  wrote:
>> >
>> >> Could the macro tcp_state_name() be renamed?
>> >> If  is included in include/net/tcp.h, it will
>> >
>> > Ideally, you don't want to include trace/events/*.h headers in other
>> > headers, as they can have side effects if those headers are included in
>> > other trace/events/*.h headers.
>> >
>>
>> Actually I find trace/events/*.h is included in lots of other headers,
>> for example,
>>
>> net/rxrpc/ar-internal.h
>
> This is an internal header, so it's not that likely to be used where it
> shouldn't be.
>
>> include/linux/bpf_trace.h
>> fs/f2fs/trace.h
>
> The above two are actually headers specifically used to pull in the
> trace/events/*.h headers.
>
>> fs/afs/internal.h
>
> another internal header. Unlikely to be misused.
>
>> arch/x86/include/asm/mmu_context.h
>
> This one, hmm, probably should be fixed.
>
>> ...
>>
>> Are these files doing this properly?
>
> Most yes, some probably not.
>
>> Should we fix them?
>
> Probably, but if they are used incorrectly, it would usually fail on
> build (The same global functions and variables would be defined).
>
>>
>> But per my understanding, it is ok to include trace/events/*.h in
>> other headers because we defined TRACE_SYSTEM as well; as a
>> consequence, those headers should not be included in trace/events/*.h.
>> If that happens, it may mean that one of these two TRACE_SYSTEMs is
>> not defined properly. Maybe these two TRACE_SYSTEMs should be merged
>> into one TRACE_SYSTEM.
>
> Two different files may have the same TRACE_SYSTEM defined. That's not
> an issue.
>
> The issue is, if you have a trace/events/*.h header in a popular file
> (like it used to be in include/linux/slab.h), then it can cause issues
> if another trace/events/*.h header includes it. That's because each
> trace/events/*.h header must be included with CREATE_TRACE_POINTS only
> once.
>

Understood.
Thanks for explanation.

>>
>>
>> >> cause compile error, because there's another function tcp_state_name()
>> >> defined in net/netfilter/ipvs/ip_vs_proto_tcp.c.
>> >> static const char * tcp_state_name(int state)
>> >> {
>> >>     if (state >= IP_VS_TCP_S_LAST)
>> >>         return "ERR!";
>> >>
>> >>     return tcp_state_name_table[state] ? tcp_state_name_table[state] : "?";
>> >> }
>> >
>> > But that said, I didn't make up the trace_state_name(), it was already
>> > there in net-next before this patch.
>> >
>>
>> I know that is not your fault.
>
> :-)
>
>> But as you are modifying this file, it is better to modify it in your
>> patch as well.
>> So we need not submit another new patch to fix it.
>
> I could whip up a patch 2.
>
>>
>> > But yeah, in actuality, I would have just done:
>> >
>> > #define EM(a)   { a, #a },
>> > #define EMe(a)  { a, #a }
>> >
>> > directly. Which we can still do.
>> >
>> > -- Steve
>> >
>>
>> The suggestion from Song is good to fix it.
>
> Song's suggestion seems like it could simply be a patch added on top of
> mine, as it is somewhat agnostic to the fix I'm making. That is, it's a
> different problem, and thus should be a different patch.
>

Got it. These two issues should be fixed in two different patches :-)

Thanks
Yafang
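
Steven's EM(a) { a, #a }, / EMe(a) { a, #a } suggestion is the X-macro pattern. One list of symbols is expanded several times with different definitions of EM and EMe, so the enum and its value-to-name table cannot drift apart, and no separately named helper such as tcp_state_name() is needed. Below is a minimal standalone sketch with a hypothetical three-state list. It uses neither the kernel's TCP states nor the TRACE_DEFINE_ENUM()/__print_symbolic() plumbing that trace event headers typically pair this pattern with.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state list; every entry is expanded by EM/EMe below.
 * EMe marks the last entry so no trailing comma is emitted. */
#define DEMO_STATE_LIST		\
	EM(DEMO_ESTABLISHED)	\
	EM(DEMO_SYN_SENT)	\
	EMe(DEMO_CLOSE)

/* First expansion: the enum itself. */
#define EM(a)  a,
#define EMe(a) a
enum demo_state { DEMO_STATE_LIST };
#undef EM
#undef EMe

struct sym { int value; const char *name; };

/* Second expansion: a { value, "NAME" } table, as in Steven's mail. */
#define EM(a)  { a, #a },
#define EMe(a) { a, #a }
static const struct sym demo_state_names[] = { DEMO_STATE_LIST };
#undef EM
#undef EMe

#define N_DEMO_STATES (sizeof(demo_state_names) / sizeof(demo_state_names[0]))
static_assert(N_DEMO_STATES == 3, "one table entry per state");

/* Value to name, "?" for unknown values. */
const char *demo_state_name(int value)
{
	for (size_t i = 0; i < N_DEMO_STATES; i++)
		if (demo_state_names[i].value == value)
			return demo_state_names[i].name;
	return "?";
}

/* Name to value, -1 for unknown names. */
int demo_state_value(const char *name)
{
	for (size_t i = 0; i < N_DEMO_STATES; i++)
		if (strcmp(demo_state_names[i].name, name) == 0)
			return demo_state_names[i].value;
	return -1;
}
```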

Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-11 Thread Johannes Berg

On Sat, 2017-11-11 at 23:09 +0900, David Miller wrote:
> From: "Jason A. Donenfeld" 
> Date: Thu,  9 Nov 2017 13:04:44 +0900
> 
> > @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
> >   return 0;
> >   }
> >  
> > - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> > - if (!nlh)
> > + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> > +sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> > + if (WARN_ON(!nlh))
> >   goto errout_skb;
> 
> If you're handling this by forcing another read() to produce the
> NLMSG_DONE, then you have no reason to WARN_ON() here.
> 
> In fact you are adding a WARN_ON() which is trivially triggerable by
> any user.

I added this in my suggestion for how this could work, but I don't
think you're right, since we previously check if there's enough space.
The patch is missing the full context, but this is:


+   if (nlk->dump_done_errno > 0 ||
+       skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
        mutex_unlock(nlk->cb_mutex);
 
        if (sk_filter(sk, skb))
            kfree_skb(skb);
        else
            __netlink_sendskb(sk, skb);
        return 0;
    }
 
-   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-   if (!nlh)
+   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+                          sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+   if (WARN_ON(!nlh))


So unless the nlmsg_total_size() vs. nlmsg_put_answer() suddenly gets a
different idea of how much space is needed, nlh shouldn't ever be NULL
once we get here.

johannes
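
Johannes's argument is that the tailroom check and nlmsg_put_answer() compute the same size for an NLMSG_DONE message carrying an int, so once the check passes the put cannot fail. The userspace <linux/netlink.h> macro NLMSG_SPACE() performs the same rounding as the kernel's nlmsg_total_size(): aligned header plus payload, rounded up to NLMSG_ALIGNTO. Below is a minimal sketch of the check with a hypothetical helper name. It is not the af_netlink code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* An NLMSG_DONE message with an int payload: 16-byte aligned header plus
 * 4 bytes of errno, rounded up to NLMSG_ALIGNTO (4 bytes). */
static_assert(NLMSG_SPACE(sizeof(int)) == 20, "16-byte header + 4-byte errno");

/* Hypothetical helper: true if the DONE message fits in the remaining
 * tailroom. When this holds, reserving exactly NLMSG_SPACE(sizeof(int))
 * bytes afterwards cannot fail, as long as both sides use the same size. */
bool done_fits(size_t tailroom)
{
	return tailroom >= NLMSG_SPACE(sizeof(int));
}
```

With a 16-byte struct nlmsghdr, the DONE message needs 20 bytes, so any tailroom of at least 20 lets the put succeed.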

Re: pull-request: wireless-drivers-next 2017-11-11

2017-11-11 Thread Kalle Valo

David Miller  writes:

> From: Kalle Valo 
> Date: Sat, 11 Nov 2017 15:03:14 +0200
>
>> some more patches to net-next for v4.15. Even though I applied the last
>> patch only on Saturday morning, all these have been tested by kbuild bot
>> and most of them should also be in linux-next. Please let me know if
>> there are any problems.
>
> Pulled, but looking at your merge commit message:

Thanks!

>> Major changes:
>> 
>> iwlwifi
>> 
>> * some new PCI IDs
>
> I doubt this was the only major change in here :-)))

Yeah, you're right. I wrote that too hastily. My excuse this time is
that I tagged it at the airport :) But of course I should have prepared
it better.

-- 
Kalle Valo

[GIT] Networking

2017-11-11 Thread David Miller


1) Use after free in vlan, from Cong Wang.

2) Handle NAPI poll with a zero budget properly in mlx5 driver,
   from Saeed Mahameed.

3) If DMA mapping fails in mlx5 driver, NULL out page, from Inbar
   Karmy.

4) Handle overrun in RX FIFO of sun4i CAN driver, from Gerhard
   Bertelsmann.

5) Missing return in mdb and vlan prepare phase of DSA layer, from
   Vivien Didelot.

Please pull, thanks a lot!

The following changes since commit 3fefc31843cfe2b5f072efe11ed9ccaf6a7a5092:

  Merge tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2017-11-09 11:16:28 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 92d28828179675176cd90293699b394b6d22ce68:

  Merge tag 'linux-can-fixes-for-4.14-20171110' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can (2017-11-11 21:52:01 +0900)


Cong Wang (1):
  vlan: fix a use-after-free in vlan_device_event()

David S. Miller (2):
  Merge tag 'mlx5-fixes-2017-11-08' of git://git.kernel.org/.../saeed/linux
  Merge tag 'linux-can-fixes-for-4.14-20171110' of git://git.kernel.org/.../mkl/linux-can

Eric Dumazet (1):
  tcp: gso: avoid refcount_t warning from tcp_gso_segment()

Eugenia Emantayev (1):
  net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs

Gerhard Bertelsmann (1):
  can: sun4i: handle overrun in RX FIFO

Huy Nguyen (2):
  net/mlx5: Loop over temp list to release delay events
  net/mlx5: Cancel health poll before sending panic teardown command

Håkon Bugge (1):
  rds: ib: Fix NULL pointer dereference in debug code

Inbar Karmy (1):
  net/mlx5e: Set page to null in case dma mapping fails

Marek Vasut (1):
  can: ifi: Fix transmitter delay calculation

Richard Schütz (1):
  can: c_can: don't indicate triple sampling support for D_CAN

Saeed Mahameed (1):
  net/mlx5e: Fix napi poll with zero budget

Stephane Grosjean (1):
  can: peak: Add support for new PCIe/M2 CAN FD interfaces

Vivien Didelot (2):
  net: dsa: return after mdb prepare phase
  net: dsa: return after vlan prepare phase

Yuchung Cheng (1):
  tcp: fix tcp_fastretrans_alert warning

 drivers/net/can/c_can/c_can_pci.c |  1 -
 drivers/net/can/c_can/c_can_platform.c|  1 -
 drivers/net/can/ifi_canfd/ifi_canfd.c |  6 +++---
 drivers/net/can/peak_canfd/peak_pciefd_main.c | 14 --
 drivers/net/can/sun4i_can.c   | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/dev.c |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 12 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 10 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c|  7 +++
 net/8021q/vlan.c  |  6 +++---
 net/dsa/switch.c  |  4 
 net/ipv4/tcp_input.c  |  3 +--
 net/ipv4/tcp_offload.c| 12 ++--
 net/rds/ib_recv.c | 10 +-
 15 files changed, 68 insertions(+), 34 deletions(-)

[PATCH net-next] net: dsa: Fix dependencies on bridge

2017-11-11 Thread Andrew Lunn

DSA now uses one of the symbols exported by the bridge,
br_vlan_enabled(). A stub is provided when the bridge is not
enabled. However, if the bridge is enabled, DSA cannot be built in
while the bridge is a module; otherwise we get undefined symbols at
link time:

   net/dsa/port.o: In function `dsa_port_vlan_add':
   net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
   net/dsa/port.o: In function `dsa_port_vlan_del':
   net/dsa/port.c:270: undefined reference to `br_vlan_enabled'

Reported-by: kbuild test robot 
Signed-off-by: Andrew Lunn 
---
 net/dsa/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index cc5f8f971689..6246254e9a2b 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -7,6 +7,7 @@ config HAVE_NET_DSA
 config NET_DSA
    tristate "Distributed Switch Architecture"
    depends on HAVE_NET_DSA && MAY_USE_DEVLINK
+   depends on BRIDGE || BRIDGE=n
    select NET_SWITCHDEV
    select PHYLIB
    ---help---
-- 
2.15.0
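
The depends on BRIDGE || BRIDGE=n idiom relies on Kconfig's tristate arithmetic (n < m < y). In that arithmetic, || takes the maximum of its operands, BRIDGE=n evaluates to y or n, and depends on caps the value NET_DSA may take. Below is a small C model of that rule, assuming the other dependencies (HAVE_NET_DSA, MAY_USE_DEVLINK) evaluate to y. It is a hypothetical illustration, not Kconfig's implementation.

```c
#include <assert.h>

/* Kconfig tristate values, ordered so that max/min work. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;		/* "||" is the maximum */
}

enum tristate tri_eq(enum tristate a, enum tristate b)
{
	return a == b ? TRI_Y : TRI_N;	/* "A=B" is y or n */
}

/* Highest value NET_DSA may take for a given BRIDGE setting. */
enum tristate net_dsa_limit(enum tristate bridge)
{
	assert(bridge <= TRI_Y);
	return tri_or(bridge, tri_eq(bridge, TRI_N));
}
```

With BRIDGE=m the cap is m, so NET_DSA can no longer be built in while the bridge is modular. That is exactly the combination that produced the undefined br_vlan_enabled references.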

Re: [PATCH net-next 2/4] net: dsa: tag_brcm: Prepare for supporting prepended tag

2017-11-11 Thread Andrew Lunn

> +static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
> +                                       struct net_device *dev,
> +                                       struct packet_type *pt,
> +                                       unsigned int offset)
>  {
>   int source_port;
>   u8 *brcm_tag;
> @@ -103,8 +114,7 @@ static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev,
>   if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN)))
>   return NULL;
>  
> - /* skb->data points to the EtherType, the tag is right before it */
> - brcm_tag = skb->data - 2;
> + brcm_tag = skb->data - offset;

A minor nit.

The first part of the comment is still true. And having it gives you
an anchor point to understanding where are we going from when we go
backwards in the packet. Yes, the comment appears later, but at that
point we are not dealing with skb->data.

Otherwise:

Reviewed-by: Andrew Lunn 

Andrew

[PATCH] wcn36xx: fix iris child-node lookup

2017-11-11 Thread Johan Hovold

Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

To make things worse, the parent mmio node was also prematurely freed.

Fixes: fd52bdae9ab0 ("wcn36xx: Disable 5GHz for wcn3620")
Cc: stable  # 4.14
Cc: Loic Poulain 
Signed-off-by: Johan Hovold 
---
 drivers/net/wireless/ath/wcn36xx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c b/drivers/net/wireless/ath/wcn36xx/main.c
index 71812a2dd513..f7d228b5ba93 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -1233,7 +1233,7 @@ static int wcn36xx_platform_get_resources(struct wcn36xx *wcn,
    }
 
    /* External RF module */
-   iris_node = of_find_node_by_name(mmio_node, "iris");
+   iris_node = of_get_child_by_name(mmio_node, "iris");
    if (iris_node) {
        if (of_device_is_compatible(iris_node, "qcom,wcn3620"))
            wcn->rf_id = RF_IRIS_WCN3620;
-- 
2.15.0
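
In rough terms, of_find_node_by_name(from, name) walks the global list of all device-tree nodes starting after from, and also calls of_node_put() on from. It can therefore match an unrelated node elsewhere in the tree. of_get_child_by_name(parent, name), by contrast, only inspects the direct children of parent and leaves the parent's reference alone. The toy model below uses hypothetical types and a flattened node array in depth-first order, with exact name matching and no real OF calls. It shows how the first form can return a node outside mmio_node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy device tree flattened in depth-first order, like the kernel's
 * global node list. Names and indices are hypothetical. */
struct toy_node {
	const char *name;
	int parent;		/* index of the parent, -1 for the root */
};

const struct toy_node toy_tree[] = {
	{ "root",  -1 },	/* 0 */
	{ "mmio",   0 },	/* 1: node passed as the search start */
	{ "xo",     1 },	/* 2: the only child of mmio */
	{ "other",  0 },	/* 3: sibling of mmio */
	{ "iris",   3 },	/* 4: unrelated node that happens to match */
};
#define TOY_COUNT (sizeof(toy_tree) / sizeof(toy_tree[0]))

/* Like of_find_node_by_name(from, ...): scan every node after 'from'. */
int toy_find_by_name(int from, const char *name)
{
	assert(from >= 0 && (size_t)from < TOY_COUNT);
	for (size_t i = (size_t)from + 1; i < TOY_COUNT; i++)
		if (strcmp(toy_tree[i].name, name) == 0)
			return (int)i;
	return -1;
}

/* Like of_get_child_by_name(parent, ...): only direct children match. */
int toy_get_child_by_name(int parent, const char *name)
{
	for (size_t i = 0; i < TOY_COUNT; i++)
		if (toy_tree[i].parent == parent &&
		    strcmp(toy_tree[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

Here the child-only lookup correctly finds no iris node under mmio, while the list walk returns the iris node that sits under other.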

Re: [PATCH net] vxlan: fix the issue that neigh proxy blocks all icmpv6 packets

2017-11-11 Thread Vincent Bernat

 ❦ 11 November 2017 19:58 +0800, Xin Long  :

> Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport
> header offset") removed icmp6_code and icmp6_type check before calling
> neigh_reduce when doing neigh proxy.
>
> It means all icmpv6 packets would be blocked by this, not only ns packet.
> In Jianlin's env, even ping6 couldn't work through it.
>
> This patch brings the icmp6_code and icmp6_type check back and also
> removes the same check from neigh_reduce().

I am very sorry for not having spotted this bug earlier. I have tested
your fix and I can confirm it works as expected.

> Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset")
> Reported-by: Jianlin Shi 
> Signed-off-by: Xin Long 
Reviewed-by: Vincent Bernat 
-- 
Don't just echo the code with comments - make every comment count.
- The Elements of Programming Style (Kernighan & Plauger)
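
The restored check boils down to a predicate on the ICMPv6 header. Only Neighbour Solicitations (type 135, code 0) should be diverted to neigh_reduce(), and everything else, including the echo requests sent by ping6, should pass through. Below is a minimal userspace sketch with hypothetical helper names. It uses ND_NEIGHBOR_SOLICIT from <netinet/icmp6.h> in place of the kernel's NDISC_NEIGHBOUR_SOLICITATION.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/icmp6.h>

/* Hypothetical helper mirroring the restored vxlan check: true only for
 * Neighbour Solicitations, which the ND proxy is meant to answer. */
bool nd_proxy_should_handle(uint8_t type, uint8_t code)
{
	return type == ND_NEIGHBOR_SOLICIT && code == 0;
}

/* Same check applied to a parsed ICMPv6 header. */
bool nd_proxy_should_handle_hdr(const struct icmp6_hdr *h)
{
	assert(h != NULL);
	return nd_proxy_should_handle(h->icmp6_type, h->icmp6_code);
}
```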

Re: pull-request: can 2017-11-10

2017-11-11 Thread David Miller

From: Marc Kleine-Budde 
Date: Fri, 10 Nov 2017 14:07:26 +0100

> this is a pull request for net/master.
> 
> The first patch by Richard Schütz for the c_can driver removes the false
> indication to support triple sampling for d_can. Gerhard Bertelsmann's
> patch for the sun4i driver improves the RX overrun handling. The patch
> by Stephane Grosjean for the peak_canfd driver adds the PCI ids for
> various new PCIe/M2 interfaces. Marek Vasut's patch for the ifi driver
> fixes the transmitter delay calculation.

Pulled, thanks Marc.

pull-request: wireless-drivers-next 2017-11-11

2017-11-11 Thread Kalle Valo

Hi Dave,

some more patches to net-next for v4.15. Even though I applied the last
patch only on Saturday morning, all these have been tested by kbuild bot
and most of them should also be in linux-next. Please let me know if
there are any problems.

Kalle

The following changes since commit 2798b80b385384d51a81832556ee9ad25d175f9b:

  Merge branch 'eBPF-based-device-cgroup-controller' (2017-11-05 23:26:51 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2017-11-11

for you to fetch changes up to fdd0bd88ceaecf729db103ac8836af5805dd2dc1:

  brcmfmac: add CLM download support (2017-11-11 03:04:09 +0200)


wireless-drivers-next patches for 4.15

Last minute patches before the merge window. Not really anything
special standing out, mostly fixes or cleanup and some minor new
features.

Major changes:

iwlwifi

* some new PCI IDs


Arend Van Spriel (6):
  brcmfmac: handle FWHALT mailbox indication
  brcmfmac: cleanup brcmf_cfg80211_escan() function
  brcmfmac: use msecs_to_jiffies() instead of calculation using HZ
  brcmfmac: get rid of brcmf_cfg80211_escan() function
  brcmfmac: get rid of struct brcmf_cfg80211_info::active_scan field
  brcmfmac: move configuration of probe request IEs

Arnd Bergmann (4):
  rtlwifi: fix uninitialized rtlhal->last_suspend_sec time
  rtlwifi: use ktime_get_real_seconds() for suspend time
  rtlwifi: drop unused ppsc->last_wakeup_time
  rt2x00: use monotonic timestamps for frame dump

Chung-Hsien Hsu (1):
  brcmfmac: add CLM download support

Colin Ian King (5):
  rtlwifi: remove redundant pointer tid_data
  rtlwifi: remove redundant initialization to cfg_cmd
  iwlegacy: remove redundant pointer sta_priv
  orinoco_usb: remove redundant pointer dev
  zd1201: remove unused variable framelen

Emmanuel Grumbach (3):
  iwlwifi: mvm: rs: remove the ANT C from the toogle antenna logic
  iwlwifi: remove dead code for internal devices only
  iwlwifi: remove host assisted paging

Franky Lin (1):
  brcmfmac: disable packet filtering in promiscuous mode

Gustavo A. R. Silva (1):
  rsi: rsi_91x_ps: remove redundant code in str_psstate

Igor Mitsyanko (9):
  qtnfmac: use per-band HT/VHT info from wireless device
  qtnfmac: initialize HT/VHT caps "can override" masks
  qtnfmac: get rid of PHYMODE capabilities flags
  qtnfmac: extend "IE set" TLV to include frame type info
  qtnfmac: SCAN results: retreive frame type information from "IE set" TLV
  qtnfmac: convert "Append IEs" command to QTN_TLV_ID_IE_SET usage
  qtnfmac: configure and start AP interface with a single command
  qtnfmac: include HTCAP and VHTCAP into config AP command
  qtnfmac: pass all CONNECT cmd params to wireless card for processing

Ihab Zhaika (3):
  iwlwifi: add new cards for 8260 series
  iwlwifi: add new cards for 8265 series
  iwlwifi: add new cards for a000 series

Kalle Valo (1):
  Merge tag 'iwlwifi-next-for-kalle-2017-11-03' of git://git.kernel.org/.../iwlwifi/iwlwifi-next

Kees Cook (1):
  iwlwifi: mvm: Convert timers to use timer_setup()

Kirtika Ruchandani (1):
  iwlwifi: Add more call-sites for pcie reg dumper

Larry Finger (3):
  rtlwifi: rtl_pci: Fix formatting errors in pci.h
  rtlwifi: rtl_pci: Fix formatting problems in pci.c
  rtlwifi: rtl_pci: Simplify some code be eliminating extraneous variables

Liad Kaufman (1):
  iwlwifi: mvm: reset seq num after restart

Luca Coelho (1):
  iwlwifi: mvm: hold mutex when flushing in iwl_mvm_flush_no_vif()

Ping-Ke Shih (4):
  rtlwifi: rtl_pci: Add support for 8822be TX/RX BD
  rtlwifi: rtl_pci: Add fill_tx_special_desc to issue H2C data, and process TXOK in interrupt.
  rtlwifi: rtl_pci: Add ID for 8822BE
  rtlwifi: rtl_pci: Extend recognized interrupt parameters from two to four ISR

Sara Sharon (6):
  iwlwifi: mvm: use RS macro instead of duplicating the code
  iwlwifi: mvm: cleanup references to aggregation count limit
  iwlwifi: mvm: improve latency when there is a reorder timeout
  iwlwifi: fix multi queue notification for a000 devices
  iwlwifi: mvm: refactor iwl_mvm_flush_no_vif
  iwlwifi: mvm: add missing implementation of flush for a000 devices

Shahar S Matityahu (1):
  iwlwifi: drop RX frames during hardware restart

Stanislaw Gruszka (1):
  rt2x00usb: mark device removed when get ENOENT usb error

 .../net/wireless/broadcom/brcm80211/brcmfmac/bus.h |  10 +
 .../broadcom/brcm80211/brcmfmac/cfg80211.c | 162 ++--
 .../broadcom/brcm80211/brcmfmac/cfg80211.h |   2 -
 .../wireless/broadcom/brcm80211/brcmfmac/common.c  | 157 
 .../wireless/broadcom/brcm80211/brcmfmac/core.c

Re: [net-next] tcp: allow drivers to tweak TSQ logic

2017-11-11 Thread Johannes Berg

Thanks Eric!

> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)

I suppose we should test each driver or so.

> They would have to use following template :
> 
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>  skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Hm. I wish we wouldn't have to do this on every skb, but perhaps it
doesn't matter that much.
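For context on what the new field controls: a minimal userspace sketch of the TSQ budget arithmetic, assuming the limit is computed as max(2 * skb->truesize, sk_pacing_rate >> sk_pacing_shift) with a default shift of 10 (roughly 1 ms worth of data at the pacing rate). The helper name `sketch_tsq_limit` is hypothetical, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a kernel API): bytes TSQ would let a socket
 * keep queued below it, assuming
 * limit = max(2 * truesize, pacing_rate >> pacing_shift). */
static uint64_t sketch_tsq_limit(uint64_t pacing_rate, /* bytes per second */
                                 unsigned int pacing_shift,
                                 uint64_t skb_truesize)
{
	uint64_t by_rate, floor_bytes;

	assert(pacing_shift < 64);  /* shifting a u64 by >= 64 is undefined */
	by_rate = pacing_rate >> pacing_shift;
	floor_bytes = 2 * skb_truesize;
	return by_rate > floor_bytes ? by_rate : floor_bytes;
}
```

At 1 Gbit/s (125,000,000 bytes/s) this gives 122070 bytes with shift 10 and 488281 bytes with shift 8, so a smaller shift lets a driver keep more data queued below the socket.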


>   u16 sk_gso_max_segs;
> + u8  sk_pacing_shift;

I guess you tried to fill a hole, but weren't we saying that it would
be better in the same cacheline? Then again, perhaps both cachelines
are resident anyway, haven't looked at this now.

Unrelated to that, I think this is missing a documentation update since
the struct has kernel-doc comments.

johannes

[PATCH net] vxlan: fix the issue that neigh proxy blocks all icmpv6 packets

2017-11-11 Thread Xin Long

Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport
header offset") removed the icmp6_code and icmp6_type checks before calling
neigh_reduce() when doing neigh proxy.

This means all ICMPv6 packets, not only NS packets, would be blocked by
this. In Jianlin's environment, even ping6 couldn't work through it.

This patch brings the icmp6_code and icmp6_type checks back and also
removes the same checks from neigh_reduce().

Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset")
Reported-by: Jianlin Shi 
Signed-off-by: Xin Long 
---
 drivers/net/vxlan.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d7c49cf..a2f4e52 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1623,26 +1623,19 @@ static struct sk_buff *vxlan_na_create(struct sk_buff *request,
 static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
-   struct nd_msg *msg;
-   const struct ipv6hdr *iphdr;
const struct in6_addr *daddr;
-   struct neighbour *n;
+   const struct ipv6hdr *iphdr;
struct inet6_dev *in6_dev;
+   struct neighbour *n;
+   struct nd_msg *msg;
 
in6_dev = __in6_dev_get(dev);
if (!in6_dev)
goto out;
 
-   if (!pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct nd_msg)))
-   goto out;
-
iphdr = ipv6_hdr(skb);
daddr = &iphdr->daddr;
-
msg = (struct nd_msg *)(iphdr + 1);
-   if (msg->icmph.icmp6_code != 0 ||
-   msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
-   goto out;
 
if (ipv6_addr_loopback(daddr) ||
ipv6_addr_is_multicast(&msg->target))
@@ -2240,11 +2233,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
+   struct vxlan_rdst *rdst, *fdst = NULL;
const struct ip_tunnel_info *info;
-   struct ethhdr *eth;
bool did_rsc = false;
-   struct vxlan_rdst *rdst, *fdst = NULL;
struct vxlan_fdb *f;
+   struct ethhdr *eth;
__be32 vni = 0;
 
info = skb_tunnel_info(skb);
@@ -2269,12 +2262,14 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
if (ntohs(eth->h_proto) == ETH_P_ARP)
return arp_reduce(dev, skb, vni);
 #if IS_ENABLED(CONFIG_IPV6)
-   else if (ntohs(eth->h_proto) == ETH_P_IPV6) {
-   struct ipv6hdr *hdr, _hdr;
-   if ((hdr = skb_header_pointer(skb,
- skb_network_offset(skb),
- sizeof(_hdr), &_hdr)) &&
-   hdr->nexthdr == IPPROTO_ICMPV6)
+   else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+pskb_may_pull(skb, sizeof(struct ipv6hdr) +
+   sizeof(struct nd_msg)) &&
+ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+   struct nd_msg *m = (struct nd_msg *)(ipv6_hdr(skb) + 1);
+
+   if (m->icmph.icmp6_code == 0 &&
+   m->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION)
return neigh_reduce(dev, skb, vni);
}
 #endif
-- 
2.1.0
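The restored check hands only ICMPv6 Neighbour Solicitations to neigh_reduce(). A minimal userspace sketch of that predicate, assuming the IPv6 and ICMPv6 headers have already been pulled (the patch does this with pskb_may_pull()); the helper and macro names are hypothetical, while 58 and 135 are the IANA values for ICMPv6 and Neighbor Solicitation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IPPROTO_ICMPV6       58   /* IPv6 next header for ICMPv6 */
#define SKETCH_ND_NEIGHBOR_SOLICIT  135  /* RFC 4861 Neighbor Solicitation */

/* Hypothetical helper: should the ND proxy answer this packet itself?
 * Only NS (type 135, code 0) qualifies; every other ICMPv6 message,
 * e.g. an echo request from ping6, must be forwarded normally. */
static bool sketch_should_neigh_reduce(uint8_t nexthdr, uint8_t icmp6_type,
                                       uint8_t icmp6_code)
{
	return nexthdr == SKETCH_IPPROTO_ICMPV6 &&
	       icmp6_type == SKETCH_ND_NEIGHBOR_SOLICIT &&
	       icmp6_code == 0;
}
```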

Re: [PATCH net-next 1/4] net: dsa: Pass a port to get_tag_protocol()

2017-11-11 Thread Andrew Lunn

On Fri, Nov 10, 2017 at 03:22:52PM -0800, Florian Fainelli wrote:
> A number of drivers want to check whether the configured CPU port is a
> possible configuration for enabling tagging, pass down the CPU port
> number so they can verify that.
> 
> -static bool b53_can_enable_brcm_tags(struct dsa_switch *ds)
> +static bool b53_can_enable_brcm_tags(struct dsa_switch *ds, int port)
>  {
> - unsigned int brcm_tag_mask;
> - unsigned int i;
> -
>   /* Broadcom switches will accept enabling Broadcom tags on the
>* following ports: 5, 7 and 8, any other port is not supported
>*/
> - brcm_tag_mask = BIT(B53_CPU_PORT_25) | BIT(7) | BIT(B53_CPU_PORT);
> -
> - for (i = 0; i < ds->num_ports; i++) {
> - if (dsa_is_cpu_port(ds, i)) {
> - if (!(BIT(i) & brcm_tag_mask)) {
> - dev_warn(ds->dev,
> -  "Port %d is not Broadcom tag capable\n",
> -  i);
> - return false;
> - }
> - }
> + switch (port) {
> + case B53_CPU_PORT_25:
> + case 7:
> + case B53_CPU_PORT:
> + return true;
>   }
>  
> - return true;
> + dev_warn(ds->dev, "Port %d is not Broadcom tag capable\n", port);
> + return false;
>  }

Hi Florian

This looks a lot better than the previous implementation.

Reviewed-by: Andrew Lunn 

Andrew
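The new b53_can_enable_brcm_tags() reduces to a membership test on the CPU port. A userspace sketch, assuming B53_CPU_PORT_25 is 5 and B53_CPU_PORT is 8 as the comment about ports 5, 7 and 8 implies; the macro and function names are hypothetical and the dev_warn() on the failure path is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, taken from the "ports 5, 7 and 8" comment. */
#define SKETCH_B53_CPU_PORT_25  5
#define SKETCH_B53_CPU_PORT     8

/* Returns true if Broadcom tags can be enabled with this CPU port. */
static bool sketch_can_enable_brcm_tags(int port)
{
	switch (port) {
	case SKETCH_B53_CPU_PORT_25:
	case 7:
	case SKETCH_B53_CPU_PORT:
		return true;
	default:
		return false;
	}
}
```

Checking a single port passed in by the caller avoids the old loop over all ports.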

[PATCHv3 1/1] bnx2x: fix slowpath null crash

2017-11-11 Thread Zhu Yanjun

When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs,
BNX2X_SP_RTNL_TX_TIMEOUT is set. In bnx2x_sp_rtnl_task(),
bnx2x_nic_unload() and bnx2x_nic_load() are executed to shut down and
reopen the NIC. In bnx2x_nic_load(), bnx2x_alloc_mem() fails to allocate
DMA memory, the message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't
allocate memory" is printed, and the variable slowpath is set to NULL.
When the NIC is shut down, bnx2x_nic_unload() is called, and it executes
the following functions:
bnx2x_chip_cleanup
bnx2x_set_storm_rx_mode
bnx2x_set_q_rx_mode
bnx2x_set_q_rx_mode
bnx2x_config_rx_mode
bnx2x_set_rx_mode_e2
bnx2x_set_rx_mode_e2() dereferences slowpath, and the crash occurs.
To fix this crash, slowpath is now checked before use. In addition, in
bnx2x_sp_rtnl_task(), if DMA memory allocation fails, the NIC is shut
down and reopened once more.

CC: Joe Jin 
CC: Junxiao Bi 
Signed-off-by: Zhu Yanjun 
Acked-by: Ariel Elior 
---
v2->v3
Changes: fix the style of comments, add the leading space
V1->v2
Changes: add Acker and remove unnecessary brackets
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index c12b4d3..fbd302a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9332,7 +9332,7 @@ void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode, bool keep_link)
/* Schedule the rx_mode command */
if (test_bit(BNX2X_FILTER_RX_MODE_PENDING, &bp->sp_state))
set_bit(BNX2X_FILTER_RX_MODE_SCHED, &bp->sp_state);
-   else
+   else if (bp->slowpath)
bnx2x_set_storm_rx_mode(bp);
 
/* Cleanup multicast configuration */
@@ -10271,8 +10271,15 @@ static void bnx2x_sp_rtnl_task(struct work_struct *work)
smp_mb();
 
bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
-   bnx2x_nic_load(bp, LOAD_NORMAL);
-
+   /* If the load fails with -ENOMEM (memory allocation failure),
+* unload and load the NIC once more. If that load also fails,
+* print an error message to notify the user.
+*/
+   if (bnx2x_nic_load(bp, LOAD_NORMAL) == -ENOMEM) {
+   bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
+   if (bnx2x_nic_load(bp, LOAD_NORMAL))
+   BNX2X_ERR("Open the NIC fails again!\n");
+   }
rtnl_unlock();
return;
}
-- 
2.7.4
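The retry added to bnx2x_sp_rtnl_task() follows a "reload once on -ENOMEM" pattern. A userspace sketch with hypothetical stand-ins for bnx2x_nic_load() and bnx2x_nic_unload(); it does not model the separate bp->slowpath check added to bnx2x_chip_cleanup().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical NIC model: counts calls and fails a set number of loads. */
struct sketch_nic {
	int loads_failing;  /* how many upcoming loads return -ENOMEM */
	int load_calls;
	int unload_calls;
};

static int sketch_nic_load(struct sketch_nic *nic)
{
	nic->load_calls++;
	if (nic->loads_failing > 0) {
		nic->loads_failing--;
		return -ENOMEM;
	}
	return 0;
}

static void sketch_nic_unload(struct sketch_nic *nic)
{
	nic->unload_calls++;
}

/* Unload and load; after a -ENOMEM failure, unload and load exactly
 * once more and report an error if that second load also fails. */
static int sketch_reset_nic(struct sketch_nic *nic)
{
	int rc;

	sketch_nic_unload(nic);
	rc = sketch_nic_load(nic);
	if (rc == -ENOMEM) {
		sketch_nic_unload(nic);
		rc = sketch_nic_load(nic);
		if (rc)
			fprintf(stderr, "NIC load failed again: %d\n", rc);
	}
	return rc;
}
```

Bounding the retry to one attempt keeps a persistent allocation failure from looping inside the work item.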

[PATCH net-next 0/5] net: improve the process of redirect and toobig for ipv6 tunnels

2017-11-11 Thread Xin Long

There are three kinds of ICMP packets that tunnels need to process:
toobig (needfrag), redirect, and others. They should be handled as follows:

 - toobig(needfrag)
   update the lower dst's PMTU via the route cache, and also update the sk
   dst's PMTU if possible; otherwise it is fine for the sk dst's PMTU to be
   updated on the tx path.

 - redirect
   update the lower dst's gateway via the route cache and return; there is
   no need to send this redirect packet to the user's sk.

 - others
   send the packet to the user's sk; alternatively, it is also fine to count
   it with err_count and report the link failure on the tx path.

All ipv4 tunnels basically follow this, while some ipv6 tunnels do it
differently: ip6gre and ip6_tunnel update the tunnel device's MTU instead
of the lower dst's PMTU, and their err_handlers do not process redirects
at all. This does not make sense and even causes performance problems.

This patchset improves redirect and toobig processing for the ip6gre,
ip4ip6 and ip6ip6 tunnels so that it matches ipv4 tunnels.
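The per-type policy above can be summarised as a small dispatch. This is a hedged userspace sketch, not code from the series: the enum and function names are hypothetical, while the type values are the standard ICMPv6 numbers (Packet Too Big = 2 per RFC 4443, Redirect = 137 per RFC 4861).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ICMPV6_PKT_TOOBIG  2    /* RFC 4443 Packet Too Big */
#define SKETCH_NDISC_REDIRECT     137  /* RFC 4861 Redirect */

enum sketch_tnl_action {
	SKETCH_UPDATE_LOWER_PMTU,  /* toobig: update lower dst PMTU */
	SKETCH_UPDATE_LOWER_GW,    /* redirect: update lower dst gw, stop */
	SKETCH_NOTIFY_USER_SK,     /* others: pass to user sk / err_count */
};

/* Map an ICMPv6 error type to the handling described in the cover letter. */
static enum sketch_tnl_action sketch_classify_icmpv6(uint8_t type)
{
	switch (type) {
	case SKETCH_ICMPV6_PKT_TOOBIG:
		return SKETCH_UPDATE_LOWER_PMTU;
	case SKETCH_NDISC_REDIRECT:
		return SKETCH_UPDATE_LOWER_GW;
	default:
		return SKETCH_NOTIFY_USER_SK;
	}
}
```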

Xin Long (5):
  ip6_gre: add the process for redirect in ip6gre_err
  ip6_gre: process toobig in a better way
  ip6_tunnel: add the process for redirect in ip6_tnl_err
  ip6_tunnel: process toobig in a better way
  ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers

 net/ipv6/ip6_gre.c| 20 ++--
 net/ipv6/ip6_tunnel.c | 64 ++-
 2 files changed, 34 insertions(+), 50 deletions(-)

-- 
2.1.0

[PATCH net-next 3/5] ip6_tunnel: add the process for redirect in ip6_tnl_err

2017-11-11 Thread Xin Long

The same redirect processing that "ip6_gre: add the process for redirect
in ip6gre_err" adds to ip6gre is needed by ip4ip6 and ip6ip6 as well.

Signed-off-by: Xin Long 
---
 net/ipv6/ip6_tunnel.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 439d65f..a1f704c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -471,15 +471,16 @@ static int
 ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
u8 *type, u8 *code, int *msg, __u32 *info, int offset)
 {
-   const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb->data;
-   struct ip6_tnl *t;
-   int rel_msg = 0;
+   const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)skb->data;
+   struct net *net = dev_net(skb->dev);
u8 rel_type = ICMPV6_DEST_UNREACH;
u8 rel_code = ICMPV6_ADDR_UNREACH;
-   u8 tproto;
__u32 rel_info = 0;
-   __u16 len;
+   struct ip6_tnl *t;
int err = -ENOENT;
+   int rel_msg = 0;
+   u8 tproto;
+   __u16 len;
 
/* If the packet doesn't contain the original IPv6 header we are
   in trouble since we might need the source address for further
@@ -543,6 +544,10 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
rel_msg = 1;
}
break;
+   case NDISC_REDIRECT:
+   ip6_redirect(skb, net, skb->dev->ifindex, 0,
+sock_net_uid(net, NULL));
+   break;
}
 
*type = rel_type;
-- 
2.1.0

[PATCH net-next 5/5] ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers

2017-11-11 Thread Xin Long

This patch removes some useless redirect code and fixes some
indentation in the ip4ip6 and ip6ip6 err_handlers.

Note that redirect ICMP packets are already processed in ip6_tnl_err(),
and the old redirect code in ip4ip6_err() never actually worked, even
before this patch. Besides, there is no need to send a redirect to the
user's sk, since it applies to the lower dst, so this patch simply
removes it.

Signed-off-by: Xin Long 
---
 net/ipv6/ip6_tunnel.c | 42 ++
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7e9e205..00882fd 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -563,13 +563,12 @@ static int
 ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
   u8 type, u8 code, int offset, __be32 info)
 {
-   int rel_msg = 0;
-   u8 rel_type = type;
-   u8 rel_code = code;
__u32 rel_info = ntohl(info);
-   int err;
-   struct sk_buff *skb2;
const struct iphdr *eiph;
+   struct sk_buff *skb2;
+   int err, rel_msg = 0;
+   u8 rel_type = type;
+   u8 rel_code = code;
struct rtable *rt;
struct flowi4 fl4;
 
@@ -594,10 +593,6 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
rel_type = ICMP_DEST_UNREACH;
rel_code = ICMP_FRAG_NEEDED;
break;
-   case NDISC_REDIRECT:
-   rel_type = ICMP_REDIRECT;
-   rel_code = ICMP_REDIR_HOST;
-   /* fall through */
default:
return 0;
}
@@ -616,33 +611,26 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
eiph = ip_hdr(skb2);
 
/* Try to guess incoming interface */
-   rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
-  eiph->saddr, 0,
-  0, 0,
-  IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
+   rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL, eiph->saddr,
+  0, 0, 0, IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
if (IS_ERR(rt))
goto out;
 
skb2->dev = rt->dst.dev;
+   ip_rt_put(rt);
 
/* route "incoming" packet */
if (rt->rt_flags & RTCF_LOCAL) {
-   ip_rt_put(rt);
-   rt = NULL;
rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
-  eiph->daddr, eiph->saddr,
-  0, 0,
-  IPPROTO_IPIP,
-  RT_TOS(eiph->tos), 0);
-   if (IS_ERR(rt) ||
-   rt->dst.dev->type != ARPHRD_TUNNEL) {
+  eiph->daddr, eiph->saddr, 0, 0,
+  IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
+   if (IS_ERR(rt) || rt->dst.dev->type != ARPHRD_TUNNEL) {
if (!IS_ERR(rt))
ip_rt_put(rt);
goto out;
}
skb_dst_set(skb2, &rt->dst);
} else {
-   ip_rt_put(rt);
if (ip_route_input(skb2, eiph->daddr, eiph->saddr, eiph->tos,
   skb2->dev) ||
skb_dst(skb2)->dev->type != ARPHRD_TUNNEL)
@@ -654,10 +642,9 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
if (rel_info > dst_mtu(skb_dst(skb2)))
goto out;
 
-   skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);
+   skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2,
+   rel_info);
}
-   if (rel_type == ICMP_REDIRECT)
-   skb_dst(skb2)->ops->redirect(skb_dst(skb2), NULL, skb2);
 
icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
 
@@ -670,11 +657,10 @@ static int
 ip6ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
   u8 type, u8 code, int offset, __be32 info)
 {
-   int rel_msg = 0;
+   __u32 rel_info = ntohl(info);
+   int err, rel_msg = 0;
u8 rel_type = type;
u8 rel_code = code;
-   __u32 rel_info = ntohl(info);
-   int err;
 
err = ip6_tnl_err(skb, IPPROTO_IPV6, opt, &rel_type, &rel_code,
  &rel_msg, &rel_info, offset);
-- 
2.1.0

Re: [PATCH 1/2] bpf: add a bpf_override_function helper

2017-11-11 Thread Josef Bacik

On Sat, Nov 11, 2017 at 09:14:55AM +0100, Ingo Molnar wrote:
> 
> * Josef Bacik  wrote:
> 
> > On Fri, Nov 10, 2017 at 10:34:59AM +0100, Ingo Molnar wrote:
> > > 
> > > * Josef Bacik  wrote:
> > > 
> > > > @@ -551,6 +578,10 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
> > > > return &bpf_get_stackid_proto;
> > > > case BPF_FUNC_perf_event_read_value:
> > > > return &bpf_perf_event_read_value_proto;
> > > > +   case BPF_FUNC_override_return:
> > > > +   pr_warn_ratelimited("%s[%d] is installing a program with bpf_override_return helper that may cause unexpected behavior!",
> > > > +   current->comm, task_pid_nr(current));
> > > > +   return &bpf_override_return_proto;
> > > 
> > > So if this new functionality is used we'll always print this into the syslog?
> > > 
> > > The warning is also a bit passive aggressive about informing the user: what
> > > unexpected behavior can happen, what is the worst case?
> > 
> > It's modeled after the other warnings bpf will spit out, but with this feature
> > you are skipping a function and instead returning some arbitrary value, so
> > anything could go wrong if you mess something up.  For instance I screwed up my
> > initial test case and made every IO submitted return an error instead of just on
> > the one file system I was attempting to test, so all sorts of hilarity ensued.
> 
> Ok, then for the x86 bits:
> 
>   NAK-ed-by: Ingo Molnar 
> 
> One of the major advantages of having an in-kernel BPF sandbox is to never crash
> the kernel - and allowing BPF programs to just randomly modify the return value of
> kernel functions sounds immensely broken to me.
> 
> (And yes, I realize that kprobes are used here as a vehicle, but the point remains.)
>

Only root can use this feature, and did you read the first email?  The whole
point of this is that error path checking fucking sucks, and this gives us the
ability to systematically check our error paths and make the kernel way more
robust than it currently is.  Can things go wrong?  Sure, that's why it's a
config option and root only.  You only want to turn this on for testing and not
have it on in production.  This is a valuable tool and well worth the risk.
Thanks,

Josef
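As a userspace analogy only (this is not the bpf_override_return helper or its API), the testing idea Josef describes is to force one call to return an error so that the caller's error path runs on demand. `sketch_override_ret` and the other names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical injection knob: when nonzero, the next sketch_alloc()
 * returns this negative errno instead of doing any work. */
static int sketch_override_ret;

static int sketch_alloc(void **out, size_t len)
{
	if (sketch_override_ret) {
		int ret = sketch_override_ret;

		sketch_override_ret = 0;  /* one-shot injection */
		*out = NULL;
		return ret;
	}
	*out = malloc(len);
	return *out ? 0 : -ENOMEM;
}

/* Caller whose error path we want to exercise. */
static int sketch_setup(void **buf)
{
	int ret = sketch_alloc(buf, 64);

	if (ret)
		return ret;  /* error path under test */
	return 0;
}
```

Arming the knob before calling sketch_setup() drives the failure branch deterministically; the next call takes the normal path again.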

Re: [PATCH v2 net-next 1/3] netem: convert to qdisc_watchdog_schedule_ns

2017-11-11 Thread kbuild test robot

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Dave-Taht/netem-convert-to-qdisc_watchdog_schedule_ns/2017-184934
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   In file included from ./arch/xtensa/include/generated/asm/div64.h:1:0,
from include/linux/kernel.h:173,
from include/asm-generic/bug.h:16,
from ./arch/xtensa/include/generated/asm/bug.h:1,
from include/linux/bug.h:5,
from include/linux/mmdebug.h:5,
from include/linux/mm.h:9,
from net/sched/sch_netem.c:16:
   net/sched/sch_netem.c: In function 'packet_len_2_sched_time':
   include/asm-generic/div64.h:208:28: warning: comparison of distinct pointer types lacks a cast
 (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
   ^
>> net/sched/sch_netem.c:349:2: note: in expansion of macro 'do_div'
 do_div(offset, q->rate);
 ^

vim +/do_div +349 net/sched/sch_netem.c

   334  
   335  static s64 packet_len_2_sched_time(unsigned int len,
   336 struct netem_sched_data *q)
   337  {
   338  s64 offset;
   339  len += q->packet_overhead;
   340  
   341  if (q->cell_size) {
   342  u32 cells = reciprocal_divide(len, q->cell_size_reciprocal);
   343  
   344  if (len > cells * q->cell_size) /* extra cell needed for remainder */
   345  cells++;
   346  len = cells * (q->cell_size + q->cell_overhead);
   347  }
   348  offset = (s64)len * NSEC_PER_SEC;
 > 349  do_div(offset, q->rate);
   350  return offset;
   351  }
   352  
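The warning comes from do_div() type-checking its first argument against `uint64_t *` (the expansion shown above), while `offset` is declared `s64`. A userspace sketch of the same computation with an unsigned 64-bit dividend; plain `/` stands in for do_div() and reciprocal_divide(), the overheads are assumed non-negative, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Transmission time in ns for len bytes at rate bytes/second, with
 * optional cell rounding; mirrors packet_len_2_sched_time(). */
static uint64_t sketch_len_to_ns(uint32_t len, uint32_t packet_overhead,
                                 uint32_t cell_size, uint32_t cell_overhead,
                                 uint32_t rate)
{
	uint64_t offset;  /* unsigned 64-bit, as do_div() expects */

	assert(rate != 0);
	len += packet_overhead;
	if (cell_size) {
		uint32_t cells = len / cell_size;  /* kernel: reciprocal_divide() */

		if (len > cells * cell_size)  /* extra cell for the remainder */
			cells++;
		len = cells * (cell_size + cell_overhead);
	}
	offset = (uint64_t)len * SKETCH_NSEC_PER_SEC;
	return offset / rate;  /* in-kernel: do_div(offset, rate) or div_u64() */
}
```

For example, 1000 bytes at 125000 bytes/s takes 8,000,000 ns.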

---
0-DAY kernel test infrastructure    Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH net-next v3 0/3] net: dsa: b53: Turn on Broadcom tags

2017-11-11 Thread David Miller

From: Florian Fainelli 
Date: Fri, 10 Nov 2017 11:33:24 -0800

> Hi all,
> 
> This was long overdue, with this patch series, the b53 driver now
> turns on Broadcom tags except for 5325 and 5365 which use an older
> format that we do not support yet (TBD).
> 
> First patch is necessary in order for bgmac, used on BCM5301X and Northstar
> Plus to work correctly and successfully send ARP packets back to the requester.
> 
> Second patch is actually a bug fix, but because net/master and net-next/master
> diverge in that area, I am targeting net-next/master here.
> 
> Finally, the last patch enables Broadcom tags after checking that the CPU port
> selected is either 5, 7 or 8, since those are the only valid combinations
> given currently supported HW.
 ...

Series applied.

Re: pull-request: wireless-drivers-next 2017-11-11

2017-11-11 Thread David Miller

From: Kalle Valo 
Date: Sat, 11 Nov 2017 15:03:14 +0200

> some more patches to net-next for v4.15. Even though I applied the last
> patch only on Saturday morning, all these have been tested by kbuild bot
> and most of them should also be in linux-next. Please let me know if
> there are any problems.

Pulled, but looking at your merge commit message:

> Major changes:
> 
> iwlwifi
> 
> * some new PCI IDs

I doubt this was the only major change in here :-)))

Re: [PATCH net-next 4/4] net: dsa: b53: Support prepended Broadcom tags

2017-11-11 Thread Andrew Lunn

On Fri, Nov 10, 2017 at 03:22:55PM -0800, Florian Fainelli wrote:
> On BCM58xx devices (Northstar Plus), there is an accelerator attached to
> port 8 which would only work if we use prepended Broadcom tags. Resolve
> that difference in our get_tag_protocol() function by setting the
> appropriate tagging protocol in that case. We need to change
> b53_brcm_hdr_setup() a little bit now since we can deal with two types
> of Broadcom tags.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 3/4] net: dsa: Support prepended Broadcom tag

2017-11-11 Thread Andrew Lunn

On Fri, Nov 10, 2017 at 03:22:54PM -0800, Florian Fainelli wrote:
> Add a new type: DSA_TAG_PROTO_PREPEND which allows us to support the
> 4-byte Broadcom tag that we already support, but in a format where it
> is prepended to the packet instead of located between the MAC SA and
> the Ethertype (DSA_TAG_PROTO_BRCM).
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2 net-next 1/3] netem: convert to qdisc_watchdog_schedule_ns

2017-11-11 Thread kbuild test robot

Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Dave-Taht/netem-convert-to-qdisc_watchdog_schedule_ns/2017-184934
config: i386-randconfig-i1-201745 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/sched/sch_netem.o: In function `netem_enqueue':
>> net/sched/sch_netem.c:323: undefined reference to `__moddi3'

vim +323 net/sched/sch_netem.c

661b7972 stephen hemminger 2011-02-23  302  
661b7972 stephen hemminger 2011-02-23  303  
^1da177e Linus Torvalds 2005-04-16  304  /* tabledist - return a pseudo-randomly distributed value with mean mu and
^1da177e Linus Torvalds 2005-04-16  305   * std deviation sigma.  Uses table lookup to approximate the desired
^1da177e Linus Torvalds 2005-04-16  306   * distribution, and a uniformly-distributed pseudo-random source.
^1da177e Linus Torvalds 2005-04-16  307   */
9d0cec66 Dave Taht 2017-11-08  308  static s64 tabledist(s64 mu, s64 sigma,
b407621c Stephen Hemminger 2007-03-22  309   struct crndstate *state,
b407621c Stephen Hemminger 2007-03-22  310   const struct disttable *dist)
^1da177e Linus Torvalds 2005-04-16  311  {
9d0cec66 Dave Taht 2017-11-08  312  s64 x;
b407621c Stephen Hemminger 2007-03-22  313  long t;
b407621c Stephen Hemminger 2007-03-22  314  u32 rnd;
^1da177e Linus Torvalds 2005-04-16  315  
^1da177e Linus Torvalds 2005-04-16  316  if (sigma == 0)
^1da177e Linus Torvalds 2005-04-16  317  return mu;
^1da177e Linus Torvalds 2005-04-16  318  
^1da177e Linus Torvalds 2005-04-16  319  rnd = get_crandom(state);
^1da177e Linus Torvalds 2005-04-16  320  
^1da177e Linus Torvalds 2005-04-16  321  /* default uniform distribution */
^1da177e Linus Torvalds 2005-04-16  322  if (dist == NULL)
^1da177e Linus Torvalds 2005-04-16 @323  return (rnd % (2*sigma)) - sigma + mu;
^1da177e Linus Torvalds 2005-04-16  324  
^1da177e Linus Torvalds 2005-04-16  325  t = dist->table[rnd % dist->size];
^1da177e Linus Torvalds 2005-04-16  326  x = (sigma % NETEM_DIST_SCALE) * t;
^1da177e Linus Torvalds 2005-04-16  327  if (x >= 0)
^1da177e Linus Torvalds 2005-04-16  328  x += NETEM_DIST_SCALE/2;
^1da177e Linus Torvalds 2005-04-16  329  else
^1da177e Linus Torvalds 2005-04-16  330  x -= NETEM_DIST_SCALE/2;
^1da177e Linus Torvalds 2005-04-16  331  
^1da177e Linus Torvalds 2005-04-16  332  return  x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
^1da177e Linus Torvalds 2005-04-16  333  }
^1da177e Linus Torvalds 2005-04-16  334  
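The link error points at line 323: with sigma now `s64`, `rnd % (2*sigma)` is a 64-bit signed modulo, which gcc on i386 lowers to a call to the libgcc helper __moddi3 that the kernel does not provide. One way around it, sketched here in userspace and assuming sigma is positive and below 2^31, is to perform the modulo in 32 bits; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Uniform branch of tabledist() with a 32-bit modulo, so no 64-bit
 * modulo helper is needed. Assumes 0 < sigma < 2^31 so that
 * 2 * sigma fits in a uint32_t; the caller handles sigma == 0. */
static int64_t sketch_uniform_dist(int64_t mu, int64_t sigma, uint32_t rnd)
{
	assert(sigma > 0 && sigma < (INT64_C(1) << 31));
	return (int64_t)(rnd % (2 * (uint32_t)sigma)) + mu - sigma;
}
```

The result stays in [mu - sigma, mu + sigma), matching the original expression for sigma in that range.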

:: The code at line 323 was first introduced by commit
:: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:: TO: Linus Torvalds <torva...@ppc970.osdl.org>
:: CC: Linus Torvalds <torva...@ppc970.osdl.org>

---
0-DAY kernel test infrastructure    Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH net-next 0/2] net: dsa: lan9303: IGMP handling

2017-11-11 Thread David Miller

From: Egil Hjelmeland 
Date: Fri, 10 Nov 2017 12:54:33 +0100

> Set up the HW switch to trap IGMP packets to CPU port. 
> And make sure skb->offload_fwd_mark is cleared for incoming IGMP packets.
> 
> skb->offload_fwd_mark calculation is a candidate for consolidation into the
> DSA core. The calculation can probably be more polished when done at a point
> where DSA has updated skb.  

Series applied, thank you.

Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-11 Thread Johannes Berg


> > If you're handling this by forcing another read() to produce the
> > NLMSG_DONE, then you have no reason to WARN_ON() here.
> > 
> > In fact you are adding a WARN_ON() which is trivially triggerable by
> > any user.
> 
> I added this in my suggestion for how this could work, but I don't
> think you're right, since we previously check if there's enough space.

Or perhaps I should say this differently:

Forcing another read happens through the

skb_tailroom(skb) < nlmsg_total_size(...)

check, so the nlmsg_put_answer() can't really fail.


Handling nlmsg_put_answer() failures by forcing another read would have
required jumping to the existing if code with a goto, or restructuring
the whole thing completely somehow, and I didn't see how to do that.

johannes
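Johannes's argument can be modelled in userspace: check the tailroom against the full aligned message size first, and only append once that check passes, so the append itself cannot fail. The buffer type, its 64-byte size and the sketch_* helpers are hypothetical; the 16-byte header and 4-byte alignment mirror struct nlmsghdr and NLMSG_ALIGN().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_ALIGN(len)  (((len) + 3) & ~(size_t)3)
#define SKETCH_HDRLEN      ((size_t)16)

/* Hypothetical stand-in for an skb. */
struct sketch_buf {
	unsigned char data[64];
	size_t len;
};

static size_t sketch_tailroom(const struct sketch_buf *b)
{
	return sizeof(b->data) - b->len;
}

static size_t sketch_total_size(size_t payload)
{
	return SKETCH_ALIGN(SKETCH_HDRLEN + payload);
}

/* Like nlmsg_put_answer(): NULL when there is not enough room. */
static unsigned char *sketch_put(struct sketch_buf *b, size_t payload)
{
	size_t need = sketch_total_size(payload);
	unsigned char *p;

	if (sketch_tailroom(b) < need)
		return NULL;
	p = b->data + b->len;
	memset(p, 0, need);
	b->len += need;
	return p;
}

/* Returns 1 if DONE was appended, 0 if the caller must read() again. */
static int sketch_append_done(struct sketch_buf *b, size_t payload)
{
	unsigned char *p;

	if (sketch_tailroom(b) < sketch_total_size(payload))
		return 0;          /* defer NLMSG_DONE to the next read() */
	p = sketch_put(b, payload);
	assert(p != NULL);     /* cannot fail: space was checked above */
	return 1;
}
```

The assert plays the role of the WARN_ON(): it can only fire if the two size calculations ever disagree.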

Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007

2017-11-11 Thread Fengguang Wu

On Fri, Nov 10, 2017 at 10:29:59PM +0100, Thomas Gleixner wrote:
> On Fri, 10 Nov 2017, Linus Torvalds wrote:
>
>> On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu  wrote:
>> >
>> > Yes it's accessing the list. Here is the faddr2line output.
>>
>> Ok, so it's a corrupted timer list. Which is not a big surprise.
>>
>> It's
>>
>>     next->pprev = pprev;
>>
>> in __hlist_del(), and the trapping instruction decodes as
>>
>>     mov    %rdx,0x8(%rax)
>>
>> with %rax having the value dead0200,
>>
>> Which is just LIST_POISON2.
>>
>> So we've deleted that entry twice - LIST_POISON2 is what hlist_del()
>> sets pprev to after already deleting it once.
>>
>> Although in this case it might not be hlist_del(), because
>> detach_timer() also sets entry->next to LIST_POISON2.
>>
>> Which is pretty bogus, we are supposed to use LIST_POISON1 for the
>> "next" pointer. Oh well. Nobody cares, except for the list entry
>> debugging code, which isn't run on the hlist cases.
>>
>> Adding Thomas Gleixner to the cc. It should not be possible to delete
>> the same timer twice.
>
> Right, it shouldn't.
>
> Fengguang, can you please enable:
>
>     CONFIG_DEBUG_OBJECTS
>     CONFIG_DEBUG_OBJECTS_TIMERS
>
> and try to reproduce? Debugobject should catch that hopefully.

Sure. However I've not got any results until now -- it's rather hard
to reproduce. I'll check possible results tomorrow.

Regards,
Fengguang
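Linus's diagnosis relies on hlist deletion poisoning the node. A userspace model, assuming the kernel's hlist_del() sets `next` to LIST_POISON1 and `pprev` to LIST_POISON2 (0x100 and 0x200 here, without the arch-specific offset). The sketch_* names are hypothetical, and the checked delete only illustrates detecting a double delete; it is not how debugobjects works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder poison values; never dereferenced in this sketch. */
#define SKETCH_POISON1 ((uintptr_t)0x100)
#define SKETCH_POISON2 ((uintptr_t)0x200)

struct hnode {
	struct hnode *next;
	struct hnode **pprev;
};

struct hhead {
	struct hnode *first;
};

static void sketch_hlist_add_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static bool sketch_is_poison(const void *p)
{
	uintptr_t v = (uintptr_t)p;

	return v == SKETCH_POISON1 || v == SKETCH_POISON2;
}

/* Same unlink steps as __hlist_del(). */
static void sketch_hlist_unlink(struct hnode *n)
{
	struct hnode *next = n->next;
	struct hnode **pprev = n->pprev;

	*pprev = next;
	if (next)
		next->pprev = pprev;  /* the statement identified as trapping */
}

/* Refuse to unlink an already poisoned node instead of writing
 * through a poison pointer; returns false on a double delete. */
static bool sketch_hlist_del_checked(struct hnode *n)
{
	if (sketch_is_poison(n->next) || sketch_is_poison(n->pprev))
		return false;
	sketch_hlist_unlink(n);
	n->next = (struct hnode *)SKETCH_POISON1;
	n->pprev = (struct hnode **)SKETCH_POISON2;
	return true;
}
```

Without the poison check, a second delete would write through an address near 0x200, which is the kind of fault seen in the report.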

Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-11 Thread David Miller

From: Johannes Berg 
Date: Sat, 11 Nov 2017 15:15:21 +0100

> On Sat, 2017-11-11 at 23:09 +0900, David Miller wrote:
>> From: "Jason A. Donenfeld" 
>> Date: Thu,  9 Nov 2017 13:04:44 +0900
>> 
>> > @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>> >   return 0;
>> >   }
>> >  
>> > - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>> > - if (!nlh)
>> > + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
>> > +sizeof(nlk->dump_done_errno), NLM_F_MULTI);
>> > + if (WARN_ON(!nlh))
>> >   goto errout_skb;
>> 
>> If you're handling this by forcing another read() to produce the
>> NLMSG_DONE, then you have no reason to WARN_ON() here.
>> 
>> In fact you are adding a WARN_ON() which is trivially triggerable by
>> any user.
> 
> I added this in my suggestion for how this could work, but I don't
> think you're right, since we previously check if there's enough space.
> The patch is missing the full context, but this is:
 ...
> So unless the nlmsg_total_size() vs. nlmsg_put_answer() suddenly gets a
> different idea of how much space is needed, nlh shouldn't ever be NULL
> once we get here.

Aha, that's what I missed.  Indeed, it cannot happen.

My bad.

Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-11 Thread Jason A. Donenfeld

On Sat, Nov 11, 2017 at 11:18 PM, Johannes Berg
 wrote:
>
>> > If you're handling this by forcing another read() to produce the
>> > NLMSG_DONE, then you have no reason to WARN_ON() here.
>> >
>> > In fact you are adding a WARN_ON() which is trivially triggerable by
>> > any user.
>>
>> I added this in my suggestion for how this could work, but I don't
>> think you're right, since we previously check if there's enough space.
>
> Or perhaps I should say this differently:
>
> Forcing another read happens through the
>
> skb_tailroom(skb) < nlmsg_total_size(...)
>
> check, so the nlmsg_put_answer() can't really fail.
>
>
> Handling nlmsg_put_answer() failures by forcing another read would have
> required jumping to the existing if code with a goto, or restructuring
> the whole thing completely somehow, and I didn't see how to do that.

Exactly. And if something _does_ go wrong in our logic, and we can't
add NLMSG_DONE, we really do want people to report this to us, since
dumps must always end that way. We'd probably have caught this a
number of years ago when userspace developers were twiddling with
their receive buffers if we had had the WARN_ON. Nice suggestion from
Johannes.
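Johannes' argument (the tailroom check and the later put use the same size formula, so the put cannot fail) can be checked with a small userspace model. This is a minimal sketch, not kernel code: `struct fake_skb`, `fake_nlmsg_put()` and `append_done()` are hypothetical stand-ins for an `sk_buff`, `nlmsg_put_answer()` and the tail of `netlink_dump()`. It relies on the userspace `<linux/netlink.h>` macros; `NLMSG_SPACE(len)` computes the same aligned size as the kernel's `nlmsg_total_size(len)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>

/* Hypothetical stand-in for an skb: a fixed buffer plus bytes used. */
struct fake_skb {
	unsigned char data[256];
	size_t len;
};

static size_t fake_tailroom(const struct fake_skb *skb)
{
	return sizeof(skb->data) - skb->len;
}

/* Stand-in for nlmsg_put_answer(): reserve an aligned header + payload,
 * or return NULL when it does not fit. */
static struct nlmsghdr *fake_nlmsg_put(struct fake_skb *skb, int type,
				       size_t payload, int flags)
{
	size_t space = NLMSG_SPACE(payload);
	struct nlmsghdr *nlh;

	if (fake_tailroom(skb) < space)
		return NULL;
	nlh = (struct nlmsghdr *)(skb->data + skb->len);
	memset(nlh, 0, space);
	nlh->nlmsg_len = NLMSG_LENGTH(payload);
	nlh->nlmsg_type = type;
	nlh->nlmsg_flags = flags;
	skb->len += space;
	return nlh;
}

/* Append NLMSG_DONE carrying an errno. Returning false means "no room,
 * let the next read() produce it". */
static bool append_done(struct fake_skb *skb, int dump_done_errno)
{
	struct nlmsghdr *nlh;

	/* Same size formula as the put below: passing this check
	 * guarantees the put succeeds, so the assert cannot fire. */
	if (fake_tailroom(skb) < NLMSG_SPACE(sizeof(dump_done_errno)))
		return false;

	nlh = fake_nlmsg_put(skb, NLMSG_DONE, sizeof(dump_done_errno),
			     NLM_F_MULTI);
	assert(nlh != NULL);
	memcpy(NLMSG_DATA(nlh), &dump_done_errno, sizeof(dump_done_errno));
	return true;
}
```

The assert plays the role of the WARN_ON(): it documents an invariant rather than handling an expected error path.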

[PATCH net-next 0/3] bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics

2017-11-11 Thread Yonghong Song

This patch set changes the verifier's ARG_CONST_SIZE_OR_ZERO
semantics so that simpler bpf programs can pass the verifier.
The commit message of patch #1 provides detailed examples, and
the patch itself implements the new semantics. Patch #2
changes the bpf_probe_read helper's arg2 type from
ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO. Patch #3 fixes a few
test cases and adds some for better coverage.

Yonghong Song (3):
  bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics
  bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO
  bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change

 kernel/bpf/verifier.c   |  40 +
 kernel/trace/bpf_trace.c|   8 +-
 tools/testing/selftests/bpf/test_verifier.c | 131 
 3 files changed, 142 insertions(+), 37 deletions(-)

-- 
2.9.5

[PATCH net-next 3/3] bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change

2017-11-11 Thread Yonghong Song

Fix a few test cases to allow a non-NULL map/packet/stack pointer
with size = 0. Change a few tests using bpf_probe_read to use
bpf_probe_write_user so that the ARG_CONST_SIZE arg can still be properly
tested. One existing test case already covers size = 0 with a non-NULL
packet pointer, so add tests so that all cases of
size = 0 and 0 <= size <= legal_upper_bound with a non-NULL
map/packet/stack pointer are covered.

Signed-off-by: Yonghong Song 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_verifier.c | 131 
 1 file changed, 112 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index bb3c4ad..bf092b8 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3579,7 +3579,7 @@ static struct bpf_test tests[] = {
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
-   "helper access to packet: test19, cls helper fail range zero",
+   "helper access to packet: test19, cls helper range zero",
.insns = {
BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
offsetof(struct __sk_buff, data)),
@@ -3599,8 +3599,7 @@ static struct bpf_test tests[] = {
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
-   .result = REJECT,
-   .errstr = "invalid access to packet",
+   .result = ACCEPT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
@@ -4379,10 +4378,10 @@ static struct bpf_test tests[] = {
BPF_LD_MAP_FD(BPF_REG_1, 0),
BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
-   BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
-   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_MOV64_IMM(BPF_REG_1, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_3, 0),
-   BPF_EMIT_CALL(BPF_FUNC_probe_read),
+   BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
@@ -4486,9 +4485,10 @@ static struct bpf_test tests[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
offsetof(struct test_val, foo)),
-   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+   BPF_MOV64_IMM(BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_3, 0),
-   BPF_EMIT_CALL(BPF_FUNC_probe_read),
+   BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
@@ -4622,13 +4622,14 @@ static struct bpf_test tests[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_3, 0),
BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
-   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+   BPF_MOV64_IMM(BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_3, 0),
-   BPF_EMIT_CALL(BPF_FUNC_probe_read),
+   BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
-   .errstr = "R1 min value is outside of the array range",
+   .errstr = "R2 min value is outside of the array range",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
@@ -4765,13 +4766,14 @@ static struct bpf_test tests[] = {
BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
offsetof(struct test_val, foo), 4),
BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
-   BPF_MOV64_IMM(BPF_REG_2, 0),
+   BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+   BPF_MOV64_IMM(BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_3, 0),
-   BPF_EMIT_CALL(BPF_FUNC_probe_read),
+   BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
BPF_EXIT_INSN(),
},
.fixup_map2 = { 3 },
-   .errstr = "R1 min value is outside of the array range",
+   .errstr = "R2 min value is outside of the array range",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_TRACEPOINT,
},
@@ -5350,7

Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Fengguang Wu


On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote:

On 11/12/17 8:23 AM, kbuild test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
# save the attached .config to linux build tree
make.cross ARCH=mips

All errors (new ones prefixed by >>):

   kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':

verifier.c:(.text+0x36fc): undefined reference to `__multi3'


that's a known issue with gcc 7 on mips that is "optimizing"
normal 64-bit multiply into 128-bit variant.
Nothing to fix on the kernel side.


Good to know that! Do you think it's a good idea to blacklist __multi3
errors in mips builds?

Thanks,
Fengguang


   crypto/scompress.o: In function `.L82':
   scompress.c:(.text+0x55c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul1.o: In function `.L2':
   generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul2.o: In function `.L2':
   generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul3.o: In function `.L2':
   generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow

Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Alexei Starovoitov


On 11/12/17 9:18 AM, Fengguang Wu wrote:

On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote:

On 11/12/17 8:23 AM, kbuild test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
# save the attached .config to linux build tree
make.cross ARCH=mips

All errors (new ones prefixed by >>):

   kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':

verifier.c:(.text+0x36fc): undefined reference to `__multi3'


that's a known issue with gcc 7 on mips that is "optimizing"
normal 64-bit multiply into 128-bit variant.
Nothing to fix on the kernel side.


Good to know that! Do you think it's a good idea to blacklist __multi3
errors in mips builds?


I would do so. yes.
Though digging further this function was added to
arch/sparc/lib/multi3.S
since gcc does the same "optimization" there.
Adding asm code doesn't look right to me. I'd rather push
gcc folks to avoid such codegen.
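For context: `__multi3` is the libgcc helper that multiplies two 128-bit integers. A 64x64->128 multiply, such as one used to detect overflow in a `count * size` calculation, is a 128-bit multiply from gcc's point of view. When the back end cannot expand it inline, gcc typically emits a call to `__multi3`, and nothing in this kernel build provides that symbol, hence the undefined reference. The sketch below contrasts two ways of writing an overflow-checked multiply. It is illustrative only and does not claim to reproduce the code in `realloc_verifier_state()`. It assumes a 64-bit gcc or clang target with `unsigned __int128`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Overflow check using only 64-bit operations: a division instead of a
 * wide multiply, so no 128-bit arithmetic is requested. */
static bool mul_u64_checked_div(uint64_t count, uint64_t size, uint64_t *out)
{
	if (size != 0 && count > UINT64_MAX / size)
		return false;
	*out = count * size;
	return true;
}

/* Same check via an explicit 128-bit product. On a 64-bit target this is
 * a 128-bit multiply; gcc normally expands it inline when the back end has
 * a widening 64x64->128 multiply, and may otherwise call __multi3. */
static bool mul_u64_checked_wide(uint64_t count, uint64_t size, uint64_t *out)
{
	unsigned __int128 prod = (unsigned __int128)count * size;

	if (prod >> 64)
		return false;
	*out = (uint64_t)prod;
	return true;
}
```

Both functions return the same result for every input; they differ only in what the compiler is asked to generate.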

[net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
# save the attached .config to linux build tree
make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':
>> verifier.c:(.text+0x36fc): undefined reference to `__multi3'
   crypto/scompress.o: In function `.L82':
   scompress.c:(.text+0x55c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul1.o: In function `.L2':
   generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul2.o: In function `.L2':
   generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul3.o: In function `.L2':
   generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH] ethernet: cavium: octeon: Switch to using netdev_info().

2017-11-11 Thread Joe Perches

On Wed, 2017-10-25 at 14:41 +0800, kbuild test robot wrote:
> Hi Steven,
> 
> [auto build test WARNING on net-next/master]
> [also build test WARNING on v4.14-rc6]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Steven-J-Hill/ethernet-cavium-octeon-Switch-to-using-netdev_info/20171024-071910
> config: mips-cavium_octeon_defconfig (attached as .config)
> compiler: mips64-linux-gnuabi64-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=mips 
> 
> All warnings (new ones prefixed by >>):
> 
>    drivers/net/ethernet/cavium/octeon/octeon_mgmt.c: In function 'octeon_mgmt_adjust_link':
> >> drivers/net/ethernet/cavium/octeon/octeon_mgmt.c:929:5: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]
> 
>  if (link_changed != 0)
> ^
> 
> vim +/else +929 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
> 
>    896
>    897  static void octeon_mgmt_adjust_link(struct net_device *netdev)
>    898  {
>    899          struct octeon_mgmt *p = netdev_priv(netdev);
>    900          struct phy_device *phydev = netdev->phydev;
>    901          unsigned long flags;
>    902          int link_changed = 0;
>    903
>    904          if (!phydev)
>    905                  return;
>    906
>    907          spin_lock_irqsave(&p->lock, flags);
>    908
>    909
>    910          if (!phydev->link && p->last_link)
>    911                  link_changed = -1;
>    912
>    913          if (phydev->link &&
>    914              (p->last_duplex != phydev->duplex ||
>    915               p->last_link != phydev->link ||
>    916               p->last_speed != phydev->speed)) {
>    917                  octeon_mgmt_disable_link(p);
>    918                  link_changed = 1;
>    919                  octeon_mgmt_update_link(p);
>    920                  octeon_mgmt_enable_link(p);
>    921          }
>    922
>    923          p->last_link = phydev->link;
>    924          p->last_speed = phydev->speed;
>    925          p->last_duplex = phydev->duplex;
>    926
>    927          spin_unlock_irqrestore(&p->lock, flags);
>    928
>  > 929          if (link_changed != 0)
>    930                  if (link_changed > 0)
>    931                          netdev_info(netdev, "Link is up - %d/%s\n",
>    932                                      phydev->speed, phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
>    933                  else
>    934                          netdev_info(netdev, "Link is down\n");
>    935  }
>    936

I think this would be better as

        if (!phydev->link) {
                if (p->last_link)
                        link_changed = -1;
        } else if (p->last_duplex != phydev->duplex ||
                   p->last_link != phydev->link ||
                   p->last_speed != phydev->speed) {
                link_changed = 1;
                octeon_mgmt_disable_link(p);
                octeon_mgmt_update_link(p);
                octeon_mgmt_enable_link(p);
        }

        ...

        if (link_changed > 0)
                netdev_info(netdev, "Link is up - %d/%s\n",
                            phydev->speed,
                            phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
        else if (link_changed < 0)
                netdev_info(netdev, "Link is down\n");
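The decision logic in the suggested restructuring can be exercised outside the driver. This is a minimal userspace sketch: `struct link_state` and `classify_link_change()` are hypothetical (the driver keeps these fields in `struct octeon_mgmt` and `phydev`). It returns 1 for "up or parameters changed", -1 for "went down" and 0 for "nothing to report", which is what lets the final message use a flat if/else-if with no ambiguous `else`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the fields compared in octeon_mgmt_adjust_link(). */
struct link_state {
	bool link;
	int speed;
	int duplex;
};

/* Mirrors the if / else-if structure suggested above. */
static int classify_link_change(const struct link_state *last,
				const struct link_state *now)
{
	if (!now->link)
		return last->link ? -1 : 0;
	if (last->duplex != now->duplex ||
	    last->link != now->link ||
	    last->speed != now->speed)
		return 1;
	return 0;
}
```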

Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Florian Fainelli

Le 11/11/17 à 17:34, Fengguang Wu a écrit :
> On Sun, Nov 12, 2017 at 09:23:52AM +0800, Alexei Starovoitov wrote:
>> On 11/12/17 9:18 AM, Fengguang Wu wrote:
>>> On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote:
>>>> On 11/12/17 8:23 AM, kbuild test robot wrote:
>>>>> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
>>>>> head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
>>>>> commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
>>>>> config: mips-64r6el_defconfig (attached as .config)
>>>>> compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
>>>>> reproduce:
>>>>>     git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
>>>>>     # save the attached .config to linux build tree
>>>>>     make.cross ARCH=mips
>>>>>
>>>>> All errors (new ones prefixed by >>):
>>>>>
>>>>>    kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':
>>>>>>> verifier.c:(.text+0x36fc): undefined reference to `__multi3'
>>>>
>>>> that's a known issue with gcc 7 on mips that is "optimizing"
>>>> normal 64-bit multiply into 128-bit variant.
>>>> Nothing to fix on the kernel side.
>>>
>>> Good to know that! Do you think it's a good idea to blacklist __multi3
>>> errors in mips builds?
>>
>> I would do so. yes.
> 
> OK.
> 
>> Though digging further this function was added to
>> arch/sparc/lib/multi3.S
>> since gcc does the same "optimization" there.
>> Adding asm code doesn't look right to me. I'd rather push
>> gcc folks to avoid such codegen.
> 
> Sure, I just forwarded the original report to GCC list.

Thomas encountered a similar problem, reported on linux-mips here:

https://www.linux-mips.org/archives/linux-mips/2017-08/msg00041.html
-- 
Florian

[PATCH v2 net-next] tcp: allow drivers to tweak TSQ logic

2017-11-11 Thread Eric Dumazet

From: Eric Dumazet 

I had many reports that TSQ logic breaks wifi aggregation.

The current logic allows up to 1 ms worth of bytes to be queued in the qdisc
and driver queues.

But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by the various TCP Congestion Control algorithms.

This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.

The initial value is 10, meaning that this patch does not change the current
behavior.

We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9).

They would have to use the following template:

if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
 skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
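Since `sk_pacing_rate` is in bytes per second, `sk_pacing_rate >> sk_pacing_shift` keeps roughly 2^-shift seconds of data queued: about 1 ms for 10 and about 15.6 ms for 6. The sketch below is a userspace model of the limit computed in `tcp_small_queue_check()` after this patch (the lines changed in the diff). `tsq_limit()` and `budget_usec()` are hypothetical names, and the 262144-byte cap used in the usage lines is an assumed value for `tcp_limit_output_bytes`, not a claim about the default.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the TSQ limit: max(2 * truesize, rate >> shift),
 * capped by tcp_limit_output_bytes, then scaled by 2^factor. */
static uint32_t tsq_limit(uint32_t truesize, uint32_t pacing_rate,
			  uint8_t pacing_shift, uint32_t limit_output_bytes,
			  unsigned int factor)
{
	uint32_t limit;

	assert(pacing_shift < 32);
	limit = pacing_rate >> pacing_shift;
	if (limit < 2 * truesize)
		limit = 2 * truesize;
	if (limit > limit_output_bytes)
		limit = limit_output_bytes;
	return limit << factor;
}

/* Microseconds of data the shift keeps at any rate: 10^6 / 2^shift. */
static uint32_t budget_usec(uint8_t pacing_shift)
{
	return 1000000u >> pacing_shift;
}
```

At 1 Gbit/s (125000000 bytes/s), a shift of 10 gives 122070 bytes, while a shift of 6 gives 1953125 bytes, which the assumed 262144-byte cap then clamps. So a smaller shift only helps up to whatever `tcp_limit_output_bytes` allows.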


Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
Signed-off-by: Eric Dumazet 
Cc: Johannes Berg 
Cc: Toke Høiland-Jørgensen 
Cc: Kir Kolyshkin 
---
v2: added kernel-doc comment, based on Johannes feedback.

 include/net/sock.h|2 ++
 net/core/sock.c   |1 +
 net/ipv4/tcp_output.c |4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 688a823dccc306bd21f47da167c6922161af5a6a..f8715c5af37d4e598770dbe5c5f83246241f18d5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -267,6 +267,7 @@ struct sock_common {
   *@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *@sk_gso_max_size: Maximum GSO segment size to build
   *@sk_gso_max_segs: Maximum number of GSO segments
+  *@sk_pacing_shift: scaling factor for TCP Small Queues
   *@sk_lingertime: %SO_LINGER l_linger setting
   *@sk_backlog: always used with the per-socket spinlock held
   *@sk_callback_lock: used with the callbacks in the end of this struct
@@ -451,6 +452,7 @@ struct sock {
kmemcheck_bitfield_end(flags);
 
u16 sk_gso_max_segs;
+   u8  sk_pacing_shift;
unsigned long   sk_lingertime;
struct proto    *sk_prot_creator;
rwlock_t        sk_callback_lock;
diff --git a/net/core/sock.c b/net/core/sock.c
index 57bbd6040eb6a3c072ce4e024687786079552ddf..13719af7b4e35d2050ccba51d44c7f691a889b37 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2746,6 +2746,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
sk->sk_max_pacing_rate = ~0U;
sk->sk_pacing_rate = ~0U;
+   sk->sk_pacing_shift = 10;
sk->sk_incoming_cpu = -1;
/*
 * Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0256f7a410417d93c9edab9d25a3ce5a81c2b296..76dbe884f2469660028684a46fc19afa000a1353 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1720,7 +1720,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 {
u32 bytes, segs;
 
-   bytes = min(sk->sk_pacing_rate >> 10,
+   bytes = min(sk->sk_pacing_rate >> sk->sk_pacing_shift,
sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
 
/* Goal is to send at least one packet per ms,
@@ -2198,7 +2198,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
unsigned int limit;
 
-   limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+   limit = max(2 * skb->truesize, sk->sk_pacing_rate >> sk->sk_pacing_shift);
limit = min_t(u32, limit,
  sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
limit <<= factor;

Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Fengguang Wu


On Sun, Nov 12, 2017 at 09:23:52AM +0800, Alexei Starovoitov wrote:

On 11/12/17 9:18 AM, Fengguang Wu wrote:

On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote:

On 11/12/17 8:23 AM, kbuild test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
# save the attached .config to linux build tree
make.cross ARCH=mips

All errors (new ones prefixed by >>):

   kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':

verifier.c:(.text+0x36fc): undefined reference to `__multi3'


that's a known issue with gcc 7 on mips that is "optimizing"
normal 64-bit multiply into 128-bit variant.
Nothing to fix on the kernel side.


Good to know that! Do you think it's a good idea to blacklist __multi3
errors in mips builds?


I would do so. yes.


OK.


Though digging further this function was added to
arch/sparc/lib/multi3.S
since gcc does the same "optimization" there.
Adding asm code doesn't look right to me. I'd rather push
gcc folks to avoid such codegen.


Sure, I just forwarded the original report to GCC list.

Thanks,
Fengguang

[net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Fengguang Wu

CC gcc list. According to Alexei:

 This is a known issue with gcc 7 on mips that is "optimizing"
 normal 64-bit multiply into 128-bit variant.
 Nothing to fix on the kernel side.

 Digging further, this function was added to
 arch/sparc/lib/multi3.S
 since gcc does the same "optimization" there.
 Adding asm code doesn't look right to me. I'd rather push
 gcc folks to avoid such codegen.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
 wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
 chmod +x ~/bin/make.cross
 git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
 # save the attached .config to linux build tree
 make.cross ARCH=mips 

All errors (new ones prefixed by >>):

kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':
>> verifier.c:(.text+0x36fc): undefined reference to `__multi3'
crypto/scompress.o: In function `.L82':
scompress.c:(.text+0x55c): undefined reference to `__multi3'
lib/mpi/generic_mpih-mul1.o: In function `.L2':
generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3'
lib/mpi/generic_mpih-mul2.o: In function `.L2':
generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3'
lib/mpi/generic_mpih-mul3.o: In function `.L2':
generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3'
lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all

Re: [net-next] tcp: allow drivers to tweak TSQ logic

2017-11-11 Thread Eric Dumazet

On Sat, 2017-11-11 at 15:27 +0100, Johannes Berg wrote:
> Thanks Eric!
> 
> > We expect wifi drivers to set this field to smaller values (tests have
> > been done with values from 6 to 9)
> 
> I suppose we should test each driver or so.
> 
> > They would have to use following template :
> > 
> > if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
> >  skb->sk->sk_pacing_shift = MY_PACING_SHIFT;
> 
> Hm. I wish we wouldn't have to do this on every skb, but perhaps it
> doesn't matter that much.

Yes, it does not matter, even at 40Gbit ;)

> 
> 
> > u16 sk_gso_max_segs;
> > +   u8  sk_pacing_shift;
> 
> I guess you tried to fill a hole, but weren't we saying that it would
> be better in the same cacheline? Then again, perhaps both cachelines
> are resident anyway, haven't looked at this now.

Same cache line already ;)

u32sk_pacing_rate;   /* 0x1c0   0x4 */
u32sk_max_pacing_rate;   /* 0x1c4   0x4 */
struct page_frag   sk_frag;  /* 0x1c8  0x10 */
netdev_features_t  sk_route_caps;/* 0x1d8   0x8 */
netdev_features_t  sk_route_nocaps;  /* 0x1e0   0x8 */
intsk_gso_type;  /* 0x1e8   0x4 */
unsigned int   sk_gso_max_size;  /* 0x1ec   0x4 */
gfp_t  sk_allocation;/* 0x1f0   0x4 */
__u32  sk_txhash;/* 0x1f4   0x4 */
unsigned int   __sk_flags_offset[0]; /* 0x1f8 0 */
unsigned int   sk_padding:1; /* 0x1f8:0x1f 0x4 */
unsigned int   sk_kern_sock:1;   /* 0x1f8:0x1e 0x4 */
unsigned int   sk_no_check_tx:1; /* 0x1f8:0x1d 0x4 */
unsigned int   sk_no_check_rx:1; /* 0x1f8:0x1c 0x4 */
unsigned int   sk_userlocks:4;   /* 0x1f8:0x18 0x4 */
unsigned int   sk_protocol:8;/* 0x1f8:0x10 0x4 */
unsigned int   sk_type:16;   /* 0x1f8: 0 0x4 */
u16sk_gso_max_segs;  /* 0x1fc   0x2 */
u8 sk_pacing_shift;  /* 0x1fe   0x1 */
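The "same cache line" claim follows from the offsets in this pahole output. Assuming 64-byte cache lines (common on x86-64), `sk_pacing_rate` at 0x1c0 and `sk_pacing_shift` at 0x1fe both fall in line 7 (bytes 0x1c0 to 0x1ff). The helper below is a hypothetical sketch of that arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when both fields (offset, size in bytes; sizes must be non-zero)
 * lie entirely within the same cache line of `line` bytes. */
static bool same_cacheline(size_t off_a, size_t size_a,
			   size_t off_b, size_t size_b, size_t line)
{
	size_t first = off_a / line;

	return (off_a + size_a - 1) / line == first &&
	       off_b / line == first &&
	       (off_b + size_b - 1) / line == first;
}
```

With these offsets, a field placed at 0x200 would start the next line.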






> 
> Unrelated to that, I think this is missing a documentation update since
> the struct has kernel-doc comments.

Yeah, I believe these kernel-doc comments on the gigantic struct sock are useless
and we should remove them; they have zero useful info.

Re: [PATCH iproute2 2/2] devlink: add batch command support

2017-11-11 Thread Jiri Pirko

Fri, Nov 10, 2017 at 08:47:35PM CET, l...@kernel.org wrote:
>On Fri, Nov 10, 2017 at 08:10:43AM +0100, Ivan Vecera wrote:
>> On 10.11.2017 07:57, Leon Romanovsky wrote:
>> > On Fri, Nov 10, 2017 at 07:20:14AM +0100, Ivan Vecera wrote:
>> >> The patch adds support to batch devlink commands.
>> >>
>> >> Cc: Jiri Pirko 
>> >> Cc: Arkadi Sharshevsky 
>> >> Signed-off-by: Ivan Vecera 
>> >> ---
>> >>  devlink/devlink.c  | 70 +++---
>> >>  man/man8/devlink.8 | 16 +
>> >>  2 files changed, 78 insertions(+), 8 deletions(-)
>> >>
>> >
>> > <..>
>> >
>> >> diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
>> >> index a480766c..a975ef34 100644
>> >> --- a/man/man8/devlink.8
>> >> +++ b/man/man8/devlink.8
>> >> @@ -12,6 +12,12 @@ devlink \- Devlink tool
>> >>  .sp
>> >>
>> >>  .ti -8
>> >> +.B devlink
>> >> +.RB "[ " -force " ] "
>> >> +.BI "-batch " filename
>> >> +.sp
>> >> +
>> >> +.ti -8
>> >>  .IR OBJECT " := { "
>> >>  .BR dev " | " port " | " monitor " }"
>> >>  .sp
>> >> @@ -32,6 +38,16 @@ Print the version of the
>> >>  utility and exit.
>> >>
>> >>  .TP
>> >> +.BR "\-b", " \-batch " 
>> >> +Read commands from provided file or standard input and invoke them.
>> >> +First failure will cause termination of devlink.
>> >
>> > It is worth documenting the expected format of that file.
>> > And IMHO, it is better to have the ability to load a JSON file which was
>> > generated by -j, instead of declaring a new format/knob.
>> It's just a list of command-lines... like other utils (bridge,ip...)
>
>I'm implementing a similar thing in RDMAtool (part of iproute2) and chose the JSON
>approach; it is more user and script friendly.

Leon, we should really do things the way they are currently done and
used. Batching has been implemented in "ip" for a long time. It makes perfect
sense to have one command line per line of the batch file.

By contrast, json output sounds really odd in this case. With json,
there is no relation to ordinary ip command line params. Or do you want
to extend it to accept json as well?
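The format Jiri describes (one command line per line, the first failure terminates) can be illustrated with a small loop. This is a hypothetical sketch, not iproute2's implementation: `run_batch()`, `cmd_fn` and `demo_cmd()` are invented for illustration, and the naive whitespace split does not reflect how iproute2 actually tokenizes batch lines (quoting, for example, is not handled).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 32

/* Hypothetical callback standing in for devlink's command dispatcher. */
typedef int (*cmd_fn)(int argc, char **argv, void *ctx);

/* Run each line as one command; stop at the first failure and return its
 * status (0 if every command succeeds). Blank lines are skipped. */
static int run_batch(FILE *in, cmd_fn fn, void *ctx)
{
	char line[512];

	while (fgets(line, sizeof(line), in)) {
		char *argv[MAX_ARGS];
		int argc = 0;
		char *tok;
		int err;

		for (tok = strtok(line, " \t\n"); tok && argc < MAX_ARGS;
		     tok = strtok(NULL, " \t\n"))
			argv[argc++] = tok;
		if (argc == 0)
			continue;
		err = fn(argc, argv, ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: counts commands and fails on a line starting "fail". */
static int demo_cmd(int argc, char **argv, void *ctx)
{
	int *count = ctx;

	(void)argc;
	(*count)++;
	return strcmp(argv[0], "fail") == 0 ? -1 : 0;
}
```

Each line maps directly to an ordinary command line, which is the property Jiri argues a JSON input would lose.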

Re: [PATCH iproute2 2/2] devlink: add batch command support

2017-11-11 Thread Jiri Pirko

Fri, Nov 10, 2017 at 07:20:14AM CET, ivec...@redhat.com wrote:
>The patch adds support to batch devlink commands.
>
>Cc: Jiri Pirko 
>Cc: Arkadi Sharshevsky 
>Signed-off-by: Ivan Vecera 

Acked-by: Jiri Pirko 

Thanks!

Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

2017-11-11 Thread Alexei Starovoitov


On 11/12/17 4:46 AM, Daniel Borkmann wrote:

On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:

On 11/11/17 6:07 AM, Daniel Borkmann wrote:

On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:

From: Vlad Dumitrescu 

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu 
---
  include/uapi/linux/bpf.h   |  1 +
  net/core/filter.c  | 11 +++
  tools/include/uapi/linux/bpf.h |  1 +
  3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
  __u32 local_ip6[4];/* Stored in network byte order */
  __u32 remote_port;/* Stored in network byte order */
  __u32 local_port;/* stored in host byte order */
+__u32 priority;
  };
/* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 61c791f9f628..a6329642d047 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
  *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
offsetof(struct sock_common, skc_num));
  break;
+
+case offsetof(struct bpf_sock_ops, priority):
+BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+struct bpf_sock_ops_kern, sk),
+  si->dst_reg, si->src_reg,
+  offsetof(struct bpf_sock_ops_kern, sk));
+*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+  offsetof(struct sock, sk_priority));
+break;


Hm, I don't think this would work, I actually think your initial patch
was ok.
bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk) right
before accessing options on either socket or TCP level, and bail out with
error otherwise; in such cases we'd read something else here and assume it's
sk_priority.


even if it's not fullsock, it will just read zero, no? what's a problem
with that?
In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
the program author will know that it's meaningless to read sk_priority,
so returning zero with minimal checks is fine.
While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
since the safety is not compromised.


Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
struct request_sock itself is only 232 byte long in total, and the struct
inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
that way, so it cannot be ignored and assumed zero.


I thought we always pass fully allocated sock but technically not 
fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req))

so yeah ctx rewrite approach won't work.
Let's go back to access via helper.
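Daniel's point is that a load at `sk_priority`'s offset is only safe when the object really is a full `struct sock`. Request and timewait sockets share only a common prefix and are much smaller. The minimal userspace sketch below uses hypothetical `mini_*` types (sizes chosen arbitrarily, not the kernel's) to show why a state check must come before the read. A helper can perform that check at runtime, which a plain ctx rewrite does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states; values are arbitrary, not the kernel's. */
enum mini_state { MINI_ESTABLISHED = 1, MINI_TIME_WAIT, MINI_NEW_SYN_RECV };

/* All flavours share a common header; only the full socket is large
 * enough to contain "priority". */
struct mini_common { int state; };
struct mini_full_sock { struct mini_common common; char other[400]; int priority; };
struct mini_req_sock { struct mini_common common; char other[200]; };

/* Mirrors the idea of sk_fullsock(): some states mean "not a full socket". */
static bool mini_fullsock(const struct mini_common *sk)
{
	return sk->state != MINI_TIME_WAIT && sk->state != MINI_NEW_SYN_RECV;
}

/* Read priority only from a full socket; otherwise fail instead of
 * reading past the end of the smaller object. */
static int mini_get_priority(const struct mini_common *sk, int *prio)
{
	if (!mini_fullsock(sk))
		return -1;
	*prio = ((const struct mini_full_sock *)sk)->priority;
	return 0;
}
```

As in Daniel's numbers (a 232-byte request sock against `sk_priority` at offset 440), the smaller object here ends before the offset of `priority`.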

Re: [PATCH v4] scripts: add leaking_addresses.pl

2017-11-11 Thread Kirill A. Shutemov

On Tue, Nov 07, 2017 at 09:32:11PM +1100, Tobin C. Harding wrote:
> Currently we are leaking addresses from the kernel to user space. This
> script is an attempt to find some of those leakages. Script parses
> `dmesg` output and /proc and /sys files for hex strings that look like
> kernel addresses.
> 
> Only works for 64 bit kernels, the reason being that kernel addresses
> on 64 bit kernels have 'ffff' as the leading bit pattern, making grepping
> possible. On 32 bit kernels we don't have this luxury.

Well, it's not going to work as well as intended on x86 machines with
5-level paging. Kernel address space there starts at 0xff10.
It will still catch pointers to kernel/modules text, but the rest is
outside of the 0xffff... space. See Documentation/x86/x86_64/mm.txt.

Not sure if we care. It also won't work for other 64-bit architectures that
have more than 256TB of virtual address space.

Just wanted to point to the limitation.
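The limitation Kirill describes can be shown with the prefix heuristic applied to numeric values. This is a hypothetical C sketch: the script itself is Perl and matches hex strings, and `looks_like_kernel_addr()` is invented for illustration. `0xffffffff81000000` is a typical x86-64 kernel text address. `0xff10000000000000` is an example value with the 0xff10 prefix quoted above for 5-level paging, and the heuristic misses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The script's heuristic: a 64-bit kernel address starts with 0xffff.
 * This holds for 4-level paging but not for the wider 5-level layout. */
static bool looks_like_kernel_addr(uint64_t v)
{
	return (v >> 48) == 0xffff;
}
```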

-- 
 Kirill A. Shutemov

[PATCH net-next 1/3] bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics

2017-11-11 Thread Yonghong Song

For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the
access size to be 0 when accessing the previous argument (arg).
Currently, it requires arg to be NULL when the size passed
is 0 or could be 0. It also requires a non-NULL arg when the size
is proved to be non-0.

This patch changes the verifier's ARG_CONST_SIZE_OR_ZERO behavior
such that when the size is 0 or may be 0, arg is no longer
required to be NULL.
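The new rule for map values can be modelled on the bounds test in `__check_map_access()` shown in the diff below. This is a minimal userspace sketch. `map_access_ok()` is a hypothetical name, and it assumes the rest of the condition (`off + size > value_size`) is unchanged from the old line. A zero size is rejected only when the caller did not allow it.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the patched check: size 0 passes only if zero_size_allowed. */
static bool map_access_ok(int off, int size, int value_size,
			  bool zero_size_allowed)
{
	if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
	    off + size > value_size)
		return false;
	return true;
}
```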

There are a couple of reasons for this semantics change, and
all of them are intended to simplify user bpf programs, which
may improve user experience and/or increase the chances of
verifier acceptance. Together with the next patch, which
changes the bpf_probe_read arg2 type from ARG_CONST_SIZE to
ARG_CONST_SIZE_OR_ZERO, the following two examples, which
currently fail the verifier, are accepted.

Example 1:
==
   unsigned long len = pend - pstart;
   len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
   len &= MAX_PAYLOAD_LEN;
   bpf_probe_read(data->payload, len, pstart);

It has no "len > 0" test, so it fails the verifier.
Users may not be aware that they have to add this test.
Converting the bpf_probe_read helper to
ARG_CONST_SIZE_OR_ZERO lets the above code pass
the verifier.
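The reason Example 1 needs ARG_CONST_SIZE_OR_ZERO is that the clamped length can legitimately be 0. A minimal userspace model follows, treating the pointers as integers. `payload_len()` is a hypothetical name, and MAX_PAYLOAD_LEN = 4095 is an assumption (it matches the `0xffe` bound in Example 2's log, and being 2^12 - 1 it also works as the mask the example applies).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; 2^12 - 1, so "&=" also bounds the length. */
#define MAX_PAYLOAD_LEN 4095UL

/* The length computation from Example 1; the result is in [0, 4095]. */
static unsigned long payload_len(uintptr_t pstart, uintptr_t pend)
{
	unsigned long len = pend - pstart;

	len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
	len &= MAX_PAYLOAD_LEN;
	return len;
}
```

An empty payload yields 0, which ARG_CONST_SIZE rejects and ARG_CONST_SIZE_OR_ZERO accepts.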

Example 2:
==
Here is one example where llvm "messed up" the code and
the verifier fails.

..
   unsigned long len = pend - pstart;
   if (len > 0 && len <= MAX_PAYLOAD_LEN)
 bpf_probe_read(data->payload, len, pstart);
..

The compiler generates the following code and verifier fails:
..
39: (79) r2 = *(u64 *)(r10 -16)
40: (1f) r2 -= r8
41: (bf) r1 = r2
42: (07) r1 += -1
43: (25) if r1 > 0xffe goto pc+3
  R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff))
  R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0)
  R8=inv(id=0) R9=inv0 R10=fp0
44: (bf) r1 = r6
45: (bf) r3 = r8
46: (85) call bpf_probe_read#45
R2 min value is negative, either use unsigned or 'var &= const'
..

The compiler optimization is correct. If r1 = 0,
r1 - 1 = 0xffffffffffffffff > 0xffe.  If r1 != 0, r1 - 1 will not wrap.
r1 > 0xffe at insn #43 can actually capture
both "r1 > 0" and "len <= MAX_PAYLOAD_LEN".
This, however, causes an issue in the verifier, as the value range of arg2
"r2" does not get properly refined, which leads to verification failure.
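The equivalence between the source condition and insn #43 can be checked with unsigned 64-bit arithmetic. A minimal sketch, assuming MAX_PAYLOAD_LEN is 4095 (0xfff); that value is inferred from the 0xffe bound and is not stated in the patch, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MAX_PAYLOAD_LEN is 4095 (0xfff), inferred from the
 * "0xffe" bound in insn #43; the patch does not state its value. */
#define MAX_PAYLOAD_LEN 4095u

/* The condition as written in the C source. */
static bool in_range_source(uint64_t len)
{
	return len > 0 && len <= MAX_PAYLOAD_LEN;
}

/* The condition as llvm emitted it: a single unsigned compare on len - 1.
 * For len == 0, len - 1 wraps to UINT64_MAX, so the range test fails. */
static bool in_range_compiled(uint64_t len)
{
	return !(len - 1 > MAX_PAYLOAD_LEN - 1);
}
```

The two predicates agree for every 64-bit len, including 0 and UINT64_MAX. What the verifier lacks, per the register dump above, is the link back from R1 to R2: R1 gets umax_value=4094 while R2 stays unbounded.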

Relaxing bpf_probe_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO
allows the following simplified code:
   unsigned long len = pend - pstart;
   if (len <= MAX_PAYLOAD_LEN)
 bpf_probe_read(data->payload, len, pstart);

The llvm compiler will generate less complex code and the
verifier is able to verify that the program is okay.

Signed-off-by: Yonghong Song 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4a942e2..dd54d20 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -799,12 +799,13 @@ static int check_stack_read(struct bpf_verifier_env *env,
 
 /* check read/write into map element returned by bpf_map_lookup_elem() */
 static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
-   int size)
+ int size, bool zero_size_allowed)
 {
struct bpf_reg_state *regs = cur_regs(env);
struct bpf_map *map = regs[regno].map_ptr;
 
-   if (off < 0 || size <= 0 || off + size > map->value_size) {
+   if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
+   off + size > map->value_size) {
verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
map->value_size, off, size);
return -EACCES;
@@ -814,7 +815,7 @@ static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 
 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
-   int off, int size)
+   int off, int size, bool zero_size_allowed)
 {
struct bpf_verifier_state *state = env->cur_state;
struct bpf_reg_state *reg = &state->regs[regno];
@@ -837,7 +838,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
regno);
return -EACCES;
}
-   err = __check_map_access(env, regno, reg->smin_value + off, size);
+   err = __check_map_access(env, regno, reg->smin_value + off, size,
+zero_size_allowed);
if (err) {
verbose(env, "R%d min value is outside of the array range\n",
regno);
@@ -853,7 +855,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
regno);
return -EACCES;
}
-

[PATCH net-next 2/3] bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO

2017-11-11 Thread Yonghong Song

The helper bpf_probe_read arg2 type is changed
from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO to permit
size-0 buffer. Together with newer ARG_CONST_SIZE_OR_ZERO
semantics which allows non-NULL buffer with size 0,
this allows simpler bpf programs with verifier acceptance.
The previous commit, which changes ARG_CONST_SIZE_OR_ZERO semantics,
has details and examples.
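The new control flow of bpf_probe_read() can be exercised outside the kernel by swapping probe_kernel_read() for a stub. This is a sketch under that assumption: fake_probe_read() and probe_read_model() are hypothetical, and the stub simulates a fault when the source pointer is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for probe_kernel_read(): a NULL source simulates
 * a faulting read, anything else is copied with memcpy(). */
static int fake_probe_read(void *dst, const void *src, unsigned int size)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, size);
	return 0;
}

/* Same control flow as bpf_probe_read() after this patch. */
static int probe_read_model(void *dst, unsigned int size, const void *src)
{
	int ret = 0;

	if (size == 0)
		goto out;	/* nothing to copy: succeed without touching dst */

	ret = fake_probe_read(dst, src, size);
	if (ret < 0)
		memset(dst, 0, size);	/* don't expose stale buffer contents */
out:
	return ret;
}
```

With size 0 the helper returns 0 and leaves dst untouched; a failed non-empty read still zeroes dst, as it did before the patch.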

Signed-off-by: Yonghong Song 
Acked-by: Alexei Starovoitov 
---
 kernel/trace/bpf_trace.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 506efe6..a5580c6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -78,12 +78,16 @@ EXPORT_SYMBOL_GPL(trace_call_bpf);
 
 BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
 {
-   int ret;
+   int ret = 0;
+
+   if (unlikely(size == 0))
+   goto out;
 
ret = probe_kernel_read(dst, unsafe_ptr, size);
if (unlikely(ret < 0))
memset(dst, 0, size);
 
+ out:
return ret;
 }
 
@@ -92,7 +96,7 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
.gpl_only   = true,
.ret_type   = RET_INTEGER,
.arg1_type  = ARG_PTR_TO_UNINIT_MEM,
-   .arg2_type  = ARG_CONST_SIZE,
+   .arg2_type  = ARG_CONST_SIZE_OR_ZERO,
.arg3_type  = ARG_ANYTHING,
 };
 
-- 
2.9.5

Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'

2017-11-11 Thread Alexei Starovoitov


On 11/12/17 8:23 AM, kbuild test robot wrote:

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   7c5556decd0a629e9ee02e93653f75ba7b7da03c
commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption
config: mips-64r6el_defconfig (attached as .config)
compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c
# save the attached .config to linux build tree
make.cross ARCH=mips

All errors (new ones prefixed by >>):

   kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19':

verifier.c:(.text+0x36fc): undefined reference to `__multi3'


that's a known issue with gcc 7 on mips, which "optimizes" a
normal 64-bit multiply into the 128-bit variant.
Nothing to fix on the kernel side.


   crypto/scompress.o: In function `.L82':
   scompress.c:(.text+0x55c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul1.o: In function `.L2':
   generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul2.o: In function `.L2':
   generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/generic_mpih-mul3.o: In function `.L2':
   generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3'
   lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow

gpl-only change on bpf_setsockopt

2017-11-11 Thread David Ahern

Hi Lawrence:

I noticed that commit cd86d1fd21025 ("bpf: Adding helper function
bpf_getsockops") changed the gpl_only on bpf_setsockopt. The commit log
does not specify why. Is there any reason you changed it and made both
bpf_setsockopt and bpf_getsockopt available for non-gpl programs?

David

Re: gpl-only change on bpf_setsockopt

2017-11-11 Thread Alexei Starovoitov

On Sat, Nov 11, 2017 at 08:17:37PM -0700, David Ahern wrote:
> Hi Lawrence:
> 
> I noticed that commit cd86d1fd21025 ("bpf: Adding helper function
> bpf_getsockops") changed the gpl_only on bpf_setsockopt. The commit log
> does not specify why. Is there any reason you changed it and made both
> bpf_setsockopt and bpf_getsockopt available for non-gpl programs?

that was my request, to match the rest of networking programs.
There is nothing linux-specific about get/setsockopt; any
user space process can do just as well.
Compare that to tracing programs, which are gpl only.

[PATCH net-next 1/3] rxrpc: Lock around calling a kernel service Rx notification

2017-11-11 Thread David Howells

Place a spinlock around the invocation of call->notify_rx() for a kernel
service call, and take the same lock when ending the call in order to
replace the notification pointer with a pointer to a dummy function.

This is required because it's possible for rxrpc_notify_socket() to be
called after the call has been ended by the kernel service if called from
the asynchronous work function rxrpc_process_call().

However, rxrpc_notify_socket() currently only holds the RCU read lock when
invoking ->notify_rx(), which means that the afs_call struct would need to
be disposed of by call_rcu() rather than by kfree().

But we shouldn't see any notifications from a call after calling
rxrpc_kernel_end_call(), so a lock is required in rxrpc code.
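In outline, the fix pairs two critical sections on the same lock: the notifier invokes the callback under it, and ending the call swaps in a dummy under it. The following is a user-space sketch under stated assumptions: the struct, the function names and the pthread mutex (standing in for the kernel's spinlock_t taken with spin_lock_bh()) are hypothetical, not rxrpc code.

```c
#include <assert.h>
#include <pthread.h>

struct call;
typedef void (*notify_fn)(struct call *call, unsigned long user_id);

/* Hypothetical cut-down call object; a pthread mutex stands in for the
 * kernel's spinlock_t notify_lock. */
struct call {
	pthread_mutex_t notify_lock;
	notify_fn notify_rx;
	unsigned long user_id;
};

static void dummy_notify_rx(struct call *call, unsigned long user_id)
{
	(void)call;
	(void)user_id;	/* deliberately does nothing */
}

/* Sample kernel-service callback that just counts notifications. */
static int notifications;
static void counting_notify_rx(struct call *call, unsigned long user_id)
{
	(void)call;
	(void)user_id;
	notifications++;
}

/* Notifier side (rxrpc_notify_socket() analogue): call back under the lock. */
static void notify_socket(struct call *call)
{
	pthread_mutex_lock(&call->notify_lock);
	if (call->notify_rx)
		call->notify_rx(call, call->user_id);
	pthread_mutex_unlock(&call->notify_lock);
}

/* End-call side (rxrpc_kernel_end_call() analogue): once the dummy is
 * installed under the same lock, any in-flight notification has finished
 * and no later one can reach the kernel service. */
static void end_call(struct call *call)
{
	if (call->notify_rx) {
		pthread_mutex_lock(&call->notify_lock);
		call->notify_rx = dummy_notify_rx;
		pthread_mutex_unlock(&call->notify_lock);
	}
}
```

Swapping in a no-op instead of NULL keeps the notifier's fast path unchanged; it never has to know whether the service is still listening.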

Without this, we may see the call wait queue as having a corrupt spinlock:

BUG: spinlock bad magic on CPU#0, kworker/0:2/1612
general protection fault:  [#1] SMP
...
Workqueue: krxrpcd rxrpc_process_call
task: 88040b83c400 task.stack: 88040adfc000
RIP: 0010:spin_bug+0x161/0x18f
RSP: 0018:88040adffcc0 EFLAGS: 00010002
RAX: 0032 RBX: 6b6b6b6b6b6b6b6b RCX: 81ab16cf
RDX: 88041fa14c01 RSI: 88041fa0ccb8 RDI: 88041fa0ccb8
RBP: 88040adffcd8 R08:  R09: 
R10: 88040adffc60 R11: 022c R12: 88040aca2208
R13: 81a58114 R14:  R15: 

Call Trace:
 do_raw_spin_lock+0x1d/0x89
 _raw_spin_lock_irqsave+0x3d/0x49
 ? __wake_up_common_lock+0x4c/0xa7
 __wake_up_common_lock+0x4c/0xa7
 ? __lock_is_held+0x47/0x7a
 __wake_up+0xe/0x10
 afs_wake_up_call_waiter+0x11b/0x122 [kafs]
 rxrpc_notify_socket+0x12b/0x258
 rxrpc_process_call+0x18e/0x7d0
 process_one_work+0x298/0x4de
 ? rescuer_thread+0x280/0x280
 worker_thread+0x1d1/0x2ae
 ? rescuer_thread+0x280/0x280
 kthread+0x12c/0x134
 ? kthread_create_on_node+0x3a/0x3a
 ret_from_fork+0x27/0x40

In this case, note the corrupt data in RBX.  The address of the offending
afs_call is in R12, plus the offset to the spinlock.

Signed-off-by: David Howells 
---

 net/rxrpc/af_rxrpc.c|   16 
 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/call_object.c |1 +
 net/rxrpc/recvmsg.c |2 ++
 4 files changed, 20 insertions(+)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 344b2dcad52d..9b5c46b052fd 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -322,6 +322,14 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
 }
 EXPORT_SYMBOL(rxrpc_kernel_begin_call);
 
+/*
+ * Dummy function used to stop the notifier talking to recvmsg().
+ */
+static void rxrpc_dummy_notify_rx(struct sock *sk, struct rxrpc_call *rxcall,
+ unsigned long call_user_ID)
+{
+}
+
 /**
  * rxrpc_kernel_end_call - Allow a kernel service to end a call it was using
  * @sock: The socket the call is on
@@ -336,6 +344,14 @@ void rxrpc_kernel_end_call(struct socket *sock, struct rxrpc_call *call)
 
mutex_lock(&call->user_mutex);
rxrpc_release_call(rxrpc_sk(sock->sk), call);
+
+   /* Make sure we're not going to call back into a kernel service */
+   if (call->notify_rx) {
+   spin_lock_bh(&call->notify_lock);
+   call->notify_rx = rxrpc_dummy_notify_rx;
+   spin_unlock_bh(&call->notify_lock);
+   }
+
mutex_unlock(&call->user_mutex);
rxrpc_put_call(call, rxrpc_call_put_kernel);
 }
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ea5600b747cc..b2151993d384 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -525,6 +525,7 @@ struct rxrpc_call {
unsigned long   flags;
unsigned long   events;
spinlock_t  lock;
+   spinlock_t  notify_lock;/* Kernel notification lock */
rwlock_tstate_lock; /* lock for state transition */
u32 abort_code; /* Local/remote abort code */
int error;  /* Local error incurred */
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index fcdd6555a820..4c7fbc6dcce7 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -124,6 +124,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
INIT_LIST_HEAD(&call->sock_link);
init_waitqueue_head(&call->waitq);
spin_lock_init(&call->lock);
+   spin_lock_init(&call->notify_lock);
rwlock_init(&call->state_lock);
atomic_set(&call->usage, 1);
call->debug_id = atomic_inc_return(&rxrpc_debug_id);
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index e4937b3f3685..8510a98b87e1 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -40,7 +40,9 @@ void rxrpc_notify_socket(struct rxrpc_call *call)
sk = &rx->sk;
if (rx && sk->sk_state < RXRPC_CLOSE) {
if

[PATCH net-next 2/3] rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()

2017-11-11 Thread David Howells

rxrpc_fill_out_ack() needs to be passed the connection pointer from its
caller rather than using call->conn as the call may be disconnected in
parallel with it, clearing call->conn, leading to:

BUG: unable to handle kernel NULL pointer dereference at 
0010
IP: rxrpc_send_ack_packet+0x231/0x6a4

Signed-off-by: David Howells 
---

 net/rxrpc/output.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 71e6f713fbe7..8ee8b2d4a3eb 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -35,7 +35,8 @@ struct rxrpc_abort_buffer {
 /*
  * Fill out an ACK packet.
  */
-static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
+static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn,
+struct rxrpc_call *call,
 struct rxrpc_ack_buffer *pkt,
 rxrpc_seq_t *_hard_ack,
 rxrpc_seq_t *_top,
@@ -77,8 +78,8 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
} while (before_eq(seq, top));
}
 
-   mtu = call->conn->params.peer->if_mtu;
-   mtu -= call->conn->params.peer->hdrsize;
+   mtu = conn->params.peer->if_mtu;
+   mtu -= conn->params.peer->hdrsize;
jmax = (call->nr_jumbo_bad > 3) ? 1 : rxrpc_rx_jumbo_max;
pkt->ackinfo.rxMTU  = htonl(rxrpc_rx_mtu);
pkt->ackinfo.maxMTU = htonl(mtu);
@@ -148,7 +149,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, bool ping)
}
call->ackr_reason = 0;
}
-   n = rxrpc_fill_out_ack(call, pkt, &hard_ack, &top, reason);
+   n = rxrpc_fill_out_ack(conn, call, pkt, &hard_ack, &top, reason);
 
spin_unlock_bh(&call->lock);

[PATCH net-next 0/3] rxrpc: Fixes

2017-11-11 Thread David Howells


Here are some patches that fix some things in AF_RXRPC:

 (1) Prevent notifications from being passed to a kernel service for a call
 that it has ended.

 (2) Fix a null pointer dereference that occurs under some circumstances when an
 ACK is generated.

 (3) Fix a number of things to do with call expiration.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-next-2017

David
---
David Howells (3):
  rxrpc: Lock around calling a kernel service Rx notification
  rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()
  rxrpc: Fix call expiry handling


 net/rxrpc/af_rxrpc.c|   16 
 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/call_event.c  |2 +-
 net/rxrpc/call_object.c |1 +
 net/rxrpc/input.c   |2 --
 net/rxrpc/output.c  |   19 +++
 net/rxrpc/recvmsg.c |2 ++
 7 files changed, 36 insertions(+), 7 deletions(-)

Re: Regression in throughput between kvm guests over virtual bridge

2017-11-11 Thread Matthew Rosato

>> This case should be quite similar to pktgen; if you got an improvement with
>> pktgen, usually it was also the same for UDP. Could you please try to disable
>> tso, gso, gro, ufo on all host tap devices and guest virtio-net devices?
>> Currently the most significant tests would be like this AFAICT:
>>
>> Host->VM     4.12    4.13
>>  TCP:
>>  UDP:
>> pktgen:
>>
>> Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's
>> patch should work since we have seen positive numbers for that; you can also
>> temporarily skip net-next as well.
> 
> Here are the requested numbers, averaged over numerous runs --  guest is
> 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
> pinned to other unique host CPUs.  tso, gso, gro, ufo disabled on host
> taps / guest virtio-net devs as requested:
> 
> Host->VM  4.12         4.13
> TCP:      9.92Gb/s     6.44Gb/s
> UDP:      5.77Gb/s     6.63Gb/s
> pktgen:   1572403pps   1904265pps
> 
> UDP/pktgen both show improvement from 4.12->4.13.  More interesting,
> however, is that I am seeing the TCP regression for the first time from
> host->VM.  I wonder if the combination of CPU binding + disabling of one
> or more of tso/gso/gro/ufo is related.
> 
>>
>> If you see UDP and pktgen are aligned, then it might be helpful to continue
>> the other two cases, otherwise we fail in the first place.
> 

I continued running many iterations of these tests between 4.12 and
4.13.  My throughput findings can be summarized as:

VM->VM case:
UDP:  roughly equivalent
TCP:  Consistent regression (5-10%)

VM->Host
Both UDP and TCP traffic are roughly equivalent.

Host->VM
UDP+pktgen: improvement (5-10%), but inconsistent
TCP: Consistent regression (25-30%)

Host->VM UDP and pktgen seemed to show improvement in some runs, and in
others seemed to mirror 4.12-level performance.

The TCP regression for VM->VM is no surprise, we started with that.
It's still consistent, but smaller in this specific environment.

The TCP regression in Host->VM is interesting because I wasn't seeing it
consistently before binding CPUs + disabling tso/gso/gro/ufo.  Also
interesting because of how large it is -- By any chance can you see this
regression on x86 with the same configuration?

[PATCH net-next 3/3] rxrpc: Fix call expiry handling

2017-11-11 Thread David Howells

Fix call expiry handling in the following ways:

 (1) If all the request data from a client call is acked, don't send a
 follow-up IDLE ACK with firstPacket == 1 and previousPacket == 0, as
 this appears to fool some servers into thinking everything has been
 accepted.

 (2) Never send an abort back to the server once it has ACK'd all the
 request packets; rather just try to reuse the channel for the next
 call.  The first request DATA packet of the next call on the same
 channel will implicitly ACK the entire reply of the dead call - even
 if we haven't transmitted it yet.

 (3) Don't send RX_CALL_TIMEOUT in an ABORT packet; librx uses abort codes
 to pass local errors to the caller in addition to remote errors, and
 this is meant to be local only.
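Point (2) comes down to an early return in rxrpc_send_abort_packet(), added in the output.c hunk of this patch. Its decision can be sketched as a standalone predicate; struct call_model, CALL_TX_LAST and should_send_abort() are hypothetical stand-ins for the rxrpc call, RXRPC_CALL_TX_LAST and the inline test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the call flags; only TX_LAST matters here. */
#define CALL_TX_LAST (1u << 0)	/* models RXRPC_CALL_TX_LAST */

struct call_model {
	bool is_client;		/* models rxrpc_is_client_call() */
	unsigned int flags;
};

/* Returns true if an ABORT packet should actually be transmitted. */
static bool should_send_abort(const struct call_model *call)
{
	/* Client call whose request data is complete: rather than abort,
	 * let the next call on the same channel close this one off. */
	if (call->is_client && (call->flags & CALL_TX_LAST))
		return false;
	return true;
}
```

Service calls, and client calls still transmitting their request, keep the old behavior and send the abort.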

The following also need to be addressed in future patches:

 (4) Service calls should send PING ACKs as 'keep alives' if the server is
 still processing the call.

 (5) VERSION REPLY packets should be sent to the peers of service
 connections to act as keep-alives.  This is used to keep firewall
 routes in place.  The AFS CM should enable this.

Signed-off-by: David Howells 
---

 net/rxrpc/call_event.c |2 +-
 net/rxrpc/input.c  |2 --
 net/rxrpc/output.c |   10 ++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 7a77844aab16..3574508baf9a 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -386,7 +386,7 @@ void rxrpc_process_call(struct work_struct *work)
 
now = ktime_get_real();
if (ktime_before(call->expire_at, now)) {
-   rxrpc_abort_call("EXP", call, 0, RX_CALL_TIMEOUT, -ETIME);
+   rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ETIME);
set_bit(RXRPC_CALL_EV_ABORT, &call->events);
goto recheck_state;
}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 1e37eb1c0c66..1b592073ec96 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -298,8 +298,6 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
 
write_unlock(&call->state_lock);
if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) {
-   rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, false, true,
- rxrpc_propose_ack_client_tx_end);
trace_rxrpc_transmit(call, rxrpc_transmit_await_reply);
} else {
trace_rxrpc_transmit(call, rxrpc_transmit_end);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 8ee8b2d4a3eb..f47659c7b224 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -222,6 +222,16 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
rxrpc_serial_t serial;
int ret;
 
+   /* Don't bother sending aborts for a client call once the server has
+* hard-ACK'd all of its request data.  After that point, we're not
+* going to stop the operation proceeding, and whilst we might limit
+* the reply, it's not worth it if we can send a new call on the same
+* channel instead, thereby closing off this call.
+*/
+   if (rxrpc_is_client_call(call) &&
+   test_bit(RXRPC_CALL_TX_LAST, &call->flags))
+   return 0;
+
spin_lock_bh(&call->lock);
if (call->conn)
conn = rxrpc_get_connection_maybe(call->conn);

Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops

2017-11-11 Thread Daniel Borkmann


On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:

On 11/11/17 6:07 AM, Daniel Borkmann wrote:

On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:

From: Vlad Dumitrescu 

Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.

Signed-off-by: Vlad Dumitrescu 
---
  include/uapi/linux/bpf.h   |  1 +
  net/core/filter.c  | 11 +++
  tools/include/uapi/linux/bpf.h |  1 +
  3 files changed, 13 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e880ae6434ee..9757a2002513 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -947,6 +947,7 @@ struct bpf_sock_ops {
  __u32 local_ip6[4];    /* Stored in network byte order */
  __u32 remote_port;    /* Stored in network byte order */
  __u32 local_port;    /* stored in host byte order */
+    __u32 priority;
  };
    /* List of known BPF sock_ops operators.
diff --git a/net/core/filter.c b/net/core/filter.c
index 61c791f9f628..a6329642d047 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
  *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
    offsetof(struct sock_common, skc_num));
  break;
+
+    case offsetof(struct bpf_sock_ops, priority):
+    BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
+
+    *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+    struct bpf_sock_ops_kern, sk),
+  si->dst_reg, si->src_reg,
+  offsetof(struct bpf_sock_ops_kern, sk));
+    *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+  offsetof(struct sock, sk_priority));
+    break;


Hm, I don't think this would work, I actually think your initial patch
was ok. bpf_setsockopt() as well as bpf_getsockopt() check for
sk_fullsock(sk) right before accessing options on either socket or TCP
level, and bail out with error otherwise; in such cases we'd read
something else here and assume it's sk_priority.


even if it's not fullsock, it will just read zero, no? what's the problem
with that?
In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
the program author will know that it's meaningless to read sk_priority,
so returning zero with minimal checks is fine.
Adding an extra runtime if (sk_fullsock(sk)) check is unnecessary,
since safety is not compromised.


Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
struct request_sock itself is only 232 byte long in total, and the struct
inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
that way, so it cannot be ignored and assumed zero.

If we don't care about error when !fullsock, then you could code the
sk_fullsock(sk) check in BPF itself above in the ctx conversion, and
set it to 0 manually when !fullsock. It might make it harder in the
future to change sk_fullsock() itself, but in any case sk_fullsock()
helper should get a comment in its function saying that when contents
are changed, also above BPF bits need to be adjusted to remain an
equivalent test.
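One way to picture Daniel's suggestion is a guarded read that only dereferences sk_priority for full sockets and yields 0 otherwise. The sketch below is user-space C, not the BPF instruction sequence the real ctx conversion would have to emit; the struct layouts and M_-prefixed names are hypothetical, while the state numbers and the bit test follow sk_fullsock() and include/net/tcp_states.h as of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TCP state numbers as in the kernel's include/net/tcp_states.h. */
enum {
	M_TCP_ESTABLISHED = 1,
	M_TCP_TIME_WAIT = 6,
	M_TCP_NEW_SYN_RECV = 12,
};

/* Hypothetical cut-down layouts: every socket flavour starts with the
 * common part, but only a full socket is long enough to hold sk_priority. */
struct sock_common_model {
	unsigned char skc_state;
};

struct sock_model {
	struct sock_common_model common;
	uint32_t sk_priority;
};

/* Same bit test as sk_fullsock(): timewait and request sockets are "mini". */
static bool fullsock_model(const struct sock_common_model *skc)
{
	return (1u << skc->skc_state) &
	       ~((1u << M_TCP_TIME_WAIT) | (1u << M_TCP_NEW_SYN_RECV));
}

/* Read the priority only from full sockets; report 0 otherwise instead of
 * reading past the end of a smaller structure. */
static uint32_t read_priority(const struct sock_common_model *skc)
{
	if (!fullsock_model(skc))
		return 0;
	return ((const struct sock_model *)skc)->sk_priority;
}
```

If the state test were open-coded in the ctx conversion like this, it would have to be kept equivalent to sk_fullsock() by hand, which is the maintenance cost Daniel points out.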

Re: [PATCH 1/2] bpf: add a bpf_override_function helper

2017-11-11 Thread Alexei Starovoitov


On 11/11/17 4:14 PM, Ingo Molnar wrote:


* Josef Bacik  wrote:


On Fri, Nov 10, 2017 at 10:34:59AM +0100, Ingo Molnar wrote:


* Josef Bacik  wrote:


@@ -551,6 +578,10 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
return &bpf_get_stackid_proto;
case BPF_FUNC_perf_event_read_value:
return &bpf_perf_event_read_value_proto;
+   case BPF_FUNC_override_return:
+   pr_warn_ratelimited("%s[%d] is installing a program with bpf_override_return helper that may cause unexpected behavior!",
+   current->comm, task_pid_nr(current));
+   return &bpf_override_return_proto;


So if this new functionality is used we'll always print this into the syslog?

The warning is also a bit passive aggressive about informing the user: what
unexpected behavior can happen, what is the worst case?



It's modeled after the other warnings bpf will spit out, but with this feature
you are skipping a function and instead returning some arbitrary value, so
anything could go wrong if you mess something up.  For instance I screwed up my
initial test case and made every IO submitted return an error instead of just on
the one file system I was attempting to test, so all sorts of hilarity ensued.


Ok, then for the x86 bits:

  NAK-ed-by: Ingo Molnar 

One of the major advantages of having an in-kernel BPF sandbox is to never crash
the kernel - and allowing BPF programs to just randomly modify the return value of
kernel functions sounds immensely broken to me.

(And yes, I realize that kprobes are used here as a vehicle, but the point
remains.)


yeah. modifying arbitrary function returns pushes bpf outside of
its safety guarantees, and in that sense the same
override_return could be done from a kernel module if the kernel
provided the x64 side of the facility introduced by this patch.
On the other hand, adding parts of this feature to the kernel only
to be used by an external kernel module is quite ugly too and not
something that was ever done before.
How about we restrict bpf_override_return() to only the functions
whose callers are expected to handle errors?
We can add something similar to NOKPROBE_SYMBOL(). Like
ALLOW_RETURN_OVERRIDE() and on btrfs side mark the functions
we're going to test with this feature.
Then 'not crashing kernel' requirement will be preserved.
btrfs or whatever else we will be testing with override_return
will be functioning in 'stress test' mode and if bpf program
is not careful and returns error all the time then one particular
subsystem (like btrfs) will not be functional, but the kernel
will not be crashing.
Thoughts?
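The opt-in idea amounts to checking the probed function against a list of annotated functions before accepting an override. A minimal sketch, assuming a plain name table; ALLOW_RETURN_OVERRIDE() is only a proposal at this point in the thread, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical opt-in table. In the proposal, entries would come from an
 * ALLOW_RETURN_OVERRIDE() annotation placed next to each function, similar
 * in spirit to NOKPROBE_SYMBOL(); these function names are made up. */
static const char *const override_allowed[] = {
	"example_fs_alloc_extent",
	"example_fs_submit_io",
};

/* Refuse to attach an override unless the target function opted in. */
static bool override_permitted(const char *func)
{
	size_t n = sizeof(override_allowed) / sizeof(override_allowed[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(override_allowed[i], func) == 0)
			return true;
	return false;
}
```

The point of the design is that only functions whose callers already handle errors appear in the table, so an overly aggressive program can break one subsystem but not crash the kernel.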

[PATCH net-next 4/5] ip6_tunnel: process toobig in a better way

2017-11-11 Thread Xin Long

The same improvement as in "ip6_gre: process toobig in a better way"
is needed by ip4ip6 and ip6ip6 as well.

Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
err_handlers. As I said before, gre6 could not do this as its
inner proto is not certain. But for all of them, sk dst pmtu will
be updated in the tx path if needed.
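The MTU handling that ip6_tnl_err() keeps for ICMPV6_PKT_TOOBIG in the hunk below boils down to a subtraction, a clamp and a comparison. A sketch, assuming offset is the offset argument of ip6_tnl_err(); inner_mtu() and relay_toobig() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define IPV6_MIN_MTU 1280u	/* same value as the kernel constant */

/* MTU used for the inner packet: the reported MTU minus offset, never
 * below the IPv6 minimum. Like the kernel's __u32 arithmetic, this
 * model assumes offset <= reported_mtu. */
static unsigned int inner_mtu(unsigned int reported_mtu, unsigned int offset)
{
	unsigned int mtu = reported_mtu - offset;

	if (mtu < IPV6_MIN_MTU)
		mtu = IPV6_MIN_MTU;
	return mtu;
}

/* Whether an inner packet of length len must be answered with TOOBIG. */
static bool relay_toobig(unsigned int len, unsigned int reported_mtu,
			 unsigned int offset)
{
	return len > inner_mtu(reported_mtu, offset);
}
```

After this patch the computed value only drives that decision; the tunnel device's MTU is no longer overwritten.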

Signed-off-by: Xin Long 
---
 net/ipv6/ip6_tunnel.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index a1f704c..7e9e205 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -498,9 +498,8 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
err = 0;
 
switch (*type) {
-   __u32 teli;
struct ipv6_tlv_tnl_enc_lim *tel;
-   __u32 mtu;
+   __u32 mtu, teli;
case ICMPV6_DEST_UNREACH:
net_dbg_ratelimited("%s: Path to destination invalid or inactive!\n",
t->parms.name);
@@ -531,11 +530,11 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
}
break;
case ICMPV6_PKT_TOOBIG:
+   ip6_update_pmtu(skb, net, htonl(*info), 0, 0,
+   sock_net_uid(net, NULL));
mtu = *info - offset;
if (mtu < IPV6_MIN_MTU)
mtu = IPV6_MIN_MTU;
-   t->dev->mtu = mtu;
-
len = sizeof(*ipv6h) + ntohs(ipv6h->payload_len);
if (len > mtu) {
rel_type = ICMPV6_PKT_TOOBIG;
-- 
2.1.0

[PATCH net-next 1/5] ip6_gre: add the process for redirect in ip6gre_err

2017-11-11 Thread Xin Long

This patch adds processing of ICMP redirect packets for ip6gre by
calling ip6_redirect() in ip6gre_err(), as in vti6_err.

Prior to this patch, no route cache entry was even generated after
receiving a redirect.

Reported-by: Jianlin Shi 
Signed-off-by: Xin Long 
---
 net/ipv6/ip6_gre.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 3e10c51..0684d0c 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -369,6 +369,7 @@ static void ip6gre_tunnel_uninit(struct net_device *dev)
 static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
   u8 type, u8 code, int offset, __be32 info)
 {
+   struct net *net = dev_net(skb->dev);
const struct gre_base_hdr *greh;
const struct ipv6hdr *ipv6h;
int grehlen = sizeof(*greh);
@@ -442,6 +443,10 @@ static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
mtu = IPV6_MIN_MTU;
t->dev->mtu = mtu;
return;
+   case NDISC_REDIRECT:
+   ip6_redirect(skb, net, skb->dev->ifindex, 0,
+sock_net_uid(net, NULL));
+   return;
}
 
if (time_before(jiffies, t->err_time + IP6TUNNEL_ERR_TIMEO))
-- 
2.1.0

[PATCH net-next 2/5] ip6_gre: process toobig in a better way

2017-11-11 Thread Xin Long

Currently ip6gre processes a toobig ICMP packet by setting the gre dev's mtu
in ip6gre_err, which causes a few problems:

  - It can't set the mtu with dev_set_mtu because it's not in user context,
which means the route cache and idev->cnf.mtu6 are not updated.

  - For ip6gre it has to update the sk dst pmtu in the tx path according to
gredev->mtu, while the pmtu is updated again according to the lower dst
pmtu in ip6_tnl_xmit.

  - Changing dev->mtu in response to a toobig ICMP packet is not a good
idea; it should only affect the pmtu.

This patch processes toobig by updating the lower dst's pmtu; the sk dst
pmtu is later updated in ip6_tnl_xmit, the same way as in ip4gre.

Note that the gre dev's mtu will no longer be updated; it doesn't make any
sense to change the dev's mtu after receiving a toobig packet.

Signed-off-by: Xin Long 
---
 net/ipv6/ip6_gre.c | 15 ++-
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 0684d0c..b90bad7 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -403,9 +403,8 @@ static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
return;
 
switch (type) {
-   __u32 teli;
struct ipv6_tlv_tnl_enc_lim *tel;
-   __u32 mtu;
+   __u32 teli;
case ICMPV6_DEST_UNREACH:
net_dbg_ratelimited("%s: Path to destination invalid or inactive!\n",
t->parms.name);
@@ -436,12 +435,7 @@ static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
}
return;
case ICMPV6_PKT_TOOBIG:
-   mtu = be32_to_cpu(info) - offset - t->tun_hlen;
-   if (t->dev->type == ARPHRD_ETHER)
-   mtu -= ETH_HLEN;
-   if (mtu < IPV6_MIN_MTU)
-   mtu = IPV6_MIN_MTU;
-   t->dev->mtu = mtu;
+   ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
return;
case NDISC_REDIRECT:
ip6_redirect(skb, net, skb->dev->ifindex, 0,
@@ -508,7 +502,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
   __u32 *pmtu, __be16 proto)
 {
struct ip6_tnl *tunnel = netdev_priv(dev);
-   struct dst_entry *dst = skb_dst(skb);
__be16 protocol;
 
if (dev->type == ARPHRD_ETHER)
@@ -527,10 +520,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags,
 protocol, tunnel->parms.o_key, htonl(tunnel->o_seqno));
 
-   /* TooBig packet may have updated dst->dev's mtu */
-   if (dst && dst_mtu(dst) > dst->dev->mtu)
-   dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu);
-
return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu,
NEXTHDR_GRE);
 }
-- 
2.1.0
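For reference, the arithmetic that the removed ICMPV6_PKT_TOOBIG branch performed can be restated as a small standalone C function. This is a sketch: ETH_HLEN (14) and IPV6_MIN_MTU (1280) carry their usual values, while the function name and the plain-integer parameters are hypothetical stand-ins for be32_to_cpu(info), offset, t->tun_hlen and the ARPHRD_ETHER check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_HLEN     14u   /* Ethernet header length in bytes */
#define IPV6_MIN_MTU 1280u /* minimum IPv6 link MTU */

/* Hypothetical restatement of the removed branch:
 *   info     ~ be32_to_cpu(info), the MTU reported in the ICMPv6 message
 *   offset   ~ the offset argument of ip6gre_err()
 *   tun_hlen ~ t->tun_hlen, the GRE header length
 *   is_ether ~ t->dev->type == ARPHRD_ETHER (ip6gretap)
 * As in the original, the arithmetic is unsigned. */
static uint32_t old_toobig_mtu(uint32_t info, uint32_t offset,
                               uint32_t tun_hlen, bool is_ether)
{
    uint32_t mtu = info - offset - tun_hlen;

    if (is_ether)
        mtu -= ETH_HLEN; /* leave room for the inner Ethernet header */
    if (mtu < IPV6_MIN_MTU)
        mtu = IPV6_MIN_MTU;
    return mtu;
}
```

After the patch this value is never written to t->dev->mtu; ip6_update_pmtu() records the reported MTU on the lower route instead.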

Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps

2017-11-11 Thread David Miller

From: "Jason A. Donenfeld" 
Date: Thu,  9 Nov 2017 13:04:44 +0900

> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>   return 0;
>   }
>  
> - nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> - if (!nlh)
> + nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> +sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> + if (WARN_ON(!nlh))
>   goto errout_skb;

If you're handling this by forcing another read() to produce the
NLMSG_DONE, then you have no reason to WARN_ON() here.

In fact you are adding a WARN_ON() which is trivially triggerable by
any user.

Re: [PATCH 1/2] bpf: add a bpf_override_function helper

2017-11-11 Thread Ingo Molnar


* Josef Bacik  wrote:

> On Fri, Nov 10, 2017 at 10:34:59AM +0100, Ingo Molnar wrote:
> > 
> > * Josef Bacik  wrote:
> > 
> > > @@ -551,6 +578,10 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
> > >   return &bpf_get_stackid_proto;
> > >   case BPF_FUNC_perf_event_read_value:
> > >   return &bpf_perf_event_read_value_proto;
> > > + case BPF_FUNC_override_return:
> > > + pr_warn_ratelimited("%s[%d] is installing a program with bpf_override_return helper that may cause unexpected behavior!",
> > > + current->comm, task_pid_nr(current));
> > > + return &bpf_override_return_proto;
> > 
> > So if this new functionality is used we'll always print this into the syslog?
> > 
> > The warning is also a bit passive aggressive about informing the user: what
> > unexpected behavior can happen, what is the worst case?
> > 
> 
> It's modeled after the other warnings bpf will spit out, but with this feature
> you are skipping a function and instead returning some arbitrary value, so
> anything could go wrong if you mess something up.  For instance I screwed up my
> initial test case and made every IO submitted return an error instead of just on
> the one file system I was attempting to test, so all sorts of hilarity ensued.

Ok, then for the x86 bits:

  NAK-ed-by: Ingo Molnar 

One of the major advantages of having an in-kernel BPF sandbox is to never crash
the kernel - and allowing BPF programs to just randomly modify the return value of
kernel functions sounds immensely broken to me.

(And yes, I realize that kprobes are used here as a vehicle, but the point remains.)

Thanks,

Ingo

Re: [PATCH 0/2][v5] Add the ability to do BPF directed error injection

2017-11-11 Thread Ingo Molnar


* David Miller  wrote:

> From: Josef Bacik 
> Date: Tue,  7 Nov 2017 15:28:41 -0500
> 
> > I'm sending this through Dave since it'll conflict with other BPF changes in his
> > tree, but since it touches tracing as well Dave would like a review from
> > somebody on the tracing side.
>  ...
> > A lot of our error paths are not well tested because we have no good way of
> > injecting errors generically.  Some subsystems (block, memory) have ways to
> > inject errors, but they are random so it's hard to get reproducible results.
> > 
> > With BPF we can add determinism to our error injection.  We can use kprobes and
> > other things to verify we are injecting errors at the exact case we are trying
> > to test.  This patch gives us the tool to actually do the error injection part.
> > It is very simple, we just set the return value of the pt_regs we're given to
> > whatever we provide, and then override the PC with a dummy function that simply
> > returns.
> > Right now this only works on x86, but it would be simple enough to expand to
> > other architectures.  Thanks,
> 
> Series applied, thanks Josef.

Please don't apply it yet as the series is still under active discussion - for now
I'm NAK-ing the x86 bits because I have second thoughts about the whole premise of
the feature being added here.

Thanks,

Ingo
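The mechanism Josef describes (skip a function and hand back a chosen return value) is deterministic, unlike random fault injection. Purely as a userspace analogy, it can be modeled with an override hook checked on entry. None of this is the BPF or kprobes API: submit_io(), arm_override() and disarm_override() are hypothetical, and the real feature rewrites pt_regs and the instruction pointer from a kprobe rather than testing a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical override state for one "probed" function. */
static bool override_enabled;
static int override_value;

/* When an override is armed, skip the body and return the injected value,
 * mimicking "set the return register, jump to a stub that just returns". */
static int submit_io(int sector)
{
    if (override_enabled)
        return override_value;
    return sector >= 0 ? 0 : -EINVAL; /* normal behavior */
}

static void arm_override(int value)
{
    override_value = value;
    override_enabled = true;
}

static void disarm_override(void)
{
    override_enabled = false;
}
```

The sketch also illustrates the concern raised in the thread: while the override is armed, every caller of submit_io() gets the injected error, not only the one under test, which is the "every IO submitted return an error" mishap Josef describes.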

[net-next:master 622/639] net/dsa/port.c:255: undefined reference to `br_vlan_enabled'

2017-11-11 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   bee955cd3ab4f1a1eb8fc16e7ed69364143df8d7
commit: 2ea7a679ca2abd251c1ec03f20508619707e1749 [622/639] net: dsa: Don't add vlans when vlan filtering is disabled
config: x86_64-randconfig-s2-1208 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
git checkout 2ea7a679ca2abd251c1ec03f20508619707e1749
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   net/dsa/port.o: In function `dsa_port_vlan_add':
>> net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
   net/dsa/port.o: In function `dsa_port_vlan_del':
   net/dsa/port.c:270: undefined reference to `br_vlan_enabled'

vim +255 net/dsa/port.c

   243  
   244  int dsa_port_vlan_add(struct dsa_port *dp,
   245const struct switchdev_obj_port_vlan *vlan,
   246struct switchdev_trans *trans)
   247  {
   248  struct dsa_notifier_vlan_info info = {
   249  .sw_index = dp->ds->index,
   250  .port = dp->index,
   251  .trans = trans,
   252  .vlan = vlan,
   253  };
   254  
 > 255  if (br_vlan_enabled(dp->bridge_dev))
   256  return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
   257  
   258  return 0;
   259  }
   260  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCH 0/2][v5] Add the ability to do BPF directed error injection

2017-11-11 Thread David Miller

From: Ingo Molnar 
Date: Sat, 11 Nov 2017 09:16:00 +0100

> Please don't apply it yet as the series is still under active
> discussion - for now

Fine, reverted.

Re: [net-next:master 622/639] net/dsa/port.c:255: undefined reference to `br_vlan_enabled'

2017-11-11 Thread David Miller

From: kbuild test robot 
Date: Sat, 11 Nov 2017 16:57:08 +0800

> All errors (new ones prefixed by >>):
> 
>net/dsa/port.o: In function `dsa_port_vlan_add':
>>> net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
>net/dsa/port.o: In function `dsa_port_vlan_del':
>net/dsa/port.c:270: undefined reference to `br_vlan_enabled'

Problem is NET_DSA=y and BRIDGE_VLAN_FILTERING=m

We need some Kconfig dependency foo to prevent this.

Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter

2017-11-11 Thread David Miller

From: Simon Guinot 
Date: Wed,  8 Nov 2017 17:58:35 +0100

> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct 
> net_device *dev)
>   if (txq->count >= txq->tx_stop_threshold)
>   netif_tx_stop_queue(nq);
>  
> - if (!skb->xmit_more || netif_xmit_stopped(nq) ||
> - txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
> + if (!skb->xmit_more || netif_xmit_stopped(nq))
>   mvneta_txq_pend_desc_add(pp, txq, frags);
>   else
>   txq->pending += frags;

As David Laight said, you should not allow unlimited amounts of
->xmit_more frames to be queued without a TX doorbell update.

Therefore, please keep some kind of limit here otherwise latency
will spike in some circumstances.
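The batching rule David asks for can be sketched as follows: descriptors accumulate while xmit_more promises more packets, but the pending count is never allowed past a fixed bound, and the doorbell is rung at the end of a batch or when the queue stops. The struct, ring_doorbell() and TXQ_PENDING_MAX are hypothetical; in the quoted code the removed condition used MVNETA_TXQ_DEC_SENT_MASK as that bound and mvneta_txq_pend_desc_add() as the flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bound on deferred descriptors; the quoted code used
 * MVNETA_TXQ_DEC_SENT_MASK for this purpose. */
#define TXQ_PENDING_MAX 255

struct txq_sketch {
    int pending;   /* descriptors queued but not yet announced to the NIC */
    int doorbells; /* doorbell writes issued so far */
    int announced; /* total descriptors announced to the NIC */
};

/* Stand-in for the register write that tells the NIC about new descriptors. */
static void ring_doorbell(struct txq_sketch *q)
{
    q->announced += q->pending;
    q->doorbells++;
    q->pending = 0;
}

/* Queue 'frags' descriptors for one skb. The doorbell is deferred while
 * xmit_more is set, but the deferred count never exceeds TXQ_PENDING_MAX,
 * which bounds the extra latency. */
static void tx_queue_frags(struct txq_sketch *q, int frags,
                           bool xmit_more, bool queue_stopped)
{
    assert(frags > 0 && frags <= TXQ_PENDING_MAX);

    if (q->pending + frags > TXQ_PENDING_MAX)
        ring_doorbell(q); /* flush what is already pending first */
    q->pending += frags;

    if (!xmit_more || queue_stopped)
        ring_doorbell(q); /* end of batch: announce everything now */
}
```

Flushing the old pending count before adding the new skb keeps every single doorbell write within the bound, instead of announcing pending + frags in one go.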

Re: [PATCH net-next] ibmvnic: Add vnic client data to login buffer

2017-11-11 Thread David Miller

From: Nathan Fontenot 
Date: Wed, 08 Nov 2017 11:23:56 -0600

> Update the login buffer to include client data for the vnic driver,
> this includes the OS name, LPAR name, and device name. This update
> alolows thius information to be available in the VIOS.
  ^^^ ^

So many typos...

> Signed-off-by: Nathan Fontenot 

Applied, tanks.

Re: [PATCH] net: ethernet: bgmac: mark expected switch fall-through

2017-11-11 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 8 Nov 2017 11:24:57 -0600

> In preparation for enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 1397972
> Signed-off-by: Gustavo A. R. Silva 

Applied to net-next.
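For readers unfamiliar with the annotation these patches add: -Wimplicit-fallthrough (GCC 7 and later) warns when one case label runs into the next without a break, unless the fall-through is marked. A minimal sketch follows; stages_to_run() is made up for illustration, and the /* fall through */ comment is one of the marker forms GCC accepts at the default level of that warning.

```c
#include <assert.h>

/* Hypothetical example: count how many setup stages run for a given
 * starting level. Each case deliberately continues into the next one,
 * and the comment right before the next label tells the compiler so. */
static int stages_to_run(int level)
{
    int stages = 0;

    switch (level) {
    case 3:
        stages++;
        /* fall through */
    case 2:
        stages++;
        /* fall through */
    case 1:
        stages++;
        break;
    default:
        break;
    }
    return stages;
}
```

Compiled with -Wimplicit-fallthrough, this function should stay quiet; deleting one of the comments is expected to produce a warning at the following case label.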

Re: [PATCH] fsl/fman_port: mark expected switch fall-throughs

2017-11-11 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 8 Nov 2017 11:57:13 -0600

> In preparation for enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 1397960
> Signed-off-by: Gustavo A. R. Silva 

Applied to net-next.

Re: [PATCH net-next 0/2] remove FACK loss recovery

2017-11-11 Thread David Miller

From: Yuchung Cheng 
Date: Wed,  8 Nov 2017 13:01:25 -0800

> This patch set removes the forward-acknowledgment (FACK)
> packet-based loss and reordering detection. This simplifies TCP
> loss recovery since the SACK scoreboard no longer needs to track
> the number of pending packets under highest SACKed sequence. FACK
> is subsumed by the time-based RACK loss detection which is more
> robust under reordering and second order losses.

Series applied, thank you.
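The difference between the two detection rules can be sketched in C. FACK counts packets: an un-SACKed packet far enough below the highest SACKed one is presumed lost. RACK uses time: a packet is presumed lost once a packet sent after it has been delivered and its RTT plus a reordering window has elapsed. Both functions, their parameters and the simplifications are illustrative assumptions, not the kernel's tcp_input.c or tcp_recovery.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FACK-style (packet counting), simplified: packet indices count from
 * snd_una; an un-SACKed packet is presumed lost once it sits at least
 * 'dupthresh' packets below the highest SACKed packet. */
static bool fack_lost(uint32_t pkt_idx, uint32_t highest_sacked_idx,
                      uint32_t dupthresh)
{
    return highest_sacked_idx > pkt_idx &&
           highest_sacked_idx - pkt_idx >= dupthresh;
}

/* RACK-style (time based), simplified: rack_xmit_us is the send time of
 * the most recently delivered packet. A packet sent before it is presumed
 * lost once rtt + reo_wnd has passed since the packet was sent.
 * All times are in microseconds and now_us >= pkt_xmit_us. */
static bool rack_lost(uint64_t pkt_xmit_us, uint64_t rack_xmit_us,
                      uint64_t rack_rtt_us, uint64_t reo_wnd_us,
                      uint64_t now_us)
{
    if (rack_xmit_us <= pkt_xmit_us)
        return false; /* nothing sent later has been delivered yet */
    return now_us - pkt_xmit_us >= rack_rtt_us + reo_wnd_us;
}
```

This shows why RACK is more robust under reordering: FACK marks a merely reordered packet lost as soon as dupthresh later packets are SACKed, while RACK waits out the reordering window no matter how many packets overtook it.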

Re: [PATCHv2 net-next 1/3] ip_gre: Refector the erpsan tunnel code.

2017-11-11 Thread David Miller

From: William Tu 
Date: Wed,  8 Nov 2017 16:13:02 -0800

> +static void erspan_build_header(struct sk_buff *skb,
> + __be32 id, u32 index,
> + bool truncate, bool is_ipv4)
> +{

Please do not put large non-inline functions into header files.

Re: [PATCH net-next] l2tp: don't close sessions in l2tp_tunnel_destruct()

2017-11-11 Thread David Miller

From: Guillaume Nault 
Date: Thu, 9 Nov 2017 08:29:52 +0900

> Sessions are already removed by the proto ->destroy() handlers, and
> since commit f3c66d4e144a ("l2tp: prevent creation of sessions on terminated tunnels"),
> we're guaranteed that no new session can be created afterwards.
> 
> Furthermore, l2tp_tunnel_closeall() can sleep when there are sessions
> left to close. So we really shouldn't call it in a ->sk_destruct()
> handler, as it can be used from atomic context.
> 
> Signed-off-by: Guillaume Nault 

Applied, thank you.

Re: [PATCH] net: wan: x25_asy: mark expected switch fall-through

2017-11-11 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 8 Nov 2017 22:25:08 -0600

> In preparation for enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 114928
> Signed-off-by: Gustavo A. R. Silva 

Applied.

Re: [PATCH] net: decnet: dn_table: mark expected switch fall-through

2017-11-11 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 8 Nov 2017 21:38:28 -0600

> In preparation for enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 115106
> Signed-off-by: Gustavo A. R. Silva 

Applied.

Re: [PATCH] net: 3com: 3c574_cs: mark expected switch fall-through

2017-11-11 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 8 Nov 2017 21:49:33 -0600

> In preparation for enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 114888
> Signed-off-by: Gustavo A. R. Silva 

Applied.

Re: [PATCH] net: 8390: pcnet_cs: mark expected switch fall-through

2017-11-11 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Wed, 8 Nov 2017 21:44:38 -0600

> In preparation for enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Addresses-Coverity-ID: 114891
> Signed-off-by: Gustavo A. R. Silva 

Applied.

Re: [PATCH] net: sfc: remove redundant variable start

2017-11-11 Thread David Miller

From: Colin King 
Date: Thu,  9 Nov 2017 08:01:22 +

> From: Colin Ian King 
> 
> Variable start is assigned but never read hence it is redundant
> and can be removed. Cleans up clang warning:
> 
> drivers/net/ethernet/sfc/ptp.c:655:2: warning: Value stored to 'start' is never read
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH] qlge: remove duplicated assignment to mbcp

2017-11-11 Thread David Miller

From: Colin King 
Date: Thu,  9 Nov 2017 07:52:15 +

> From: Colin Ian King 
> 
> The assignment to mbcp is identical to the initialized value assigned
> to mbcp at declaration time a few lines earlier, hence we can remove the
> second redundant assignment.  Cleans up clang warning:
> 
> drivers/net/ethernet/qlogic/qlge/qlge_mpi.c:209:22: warning:
> Value stored to 'mbcp' during its initialization is never read
> 
> Signed-off-by: Colin Ian King 

Applied, thanks.

Re: [PATCH] sock: Remove the global prot_inuse counter.

2017-11-11 Thread David Miller

From: Tonghao Zhang 
Date: Thu,  9 Nov 2017 00:03:15 -0800

> The per-cpu counter for init_net is prepared in core_initcall.
> Commits 7d720c3e ("percpu: add __percpu sparse annotations to net")
> and d6d9ca0fe ("net: this_cpu_xxx conversions") optimized these
> routines, so remove the old global counter.
> 
> Cc: Pavel Emelyanov 
> Signed-off-by: Tonghao Zhang 

Yeah, why did we keep this global counter around :-)

Applied to net-next, thanks.
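The per-CPU counting pattern that made the global counter redundant can be sketched in userspace C: each CPU updates only its own slot, and a reader sums all slots. NR_CPUS_SKETCH and the explicit cpu argument are assumptions standing in for this_cpu_add() and for_each_possible_cpu(); this is an illustration of the pattern, not the sock_prot_inuse_add()/sock_prot_inuse_get() code.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count */

struct prot_inuse_sketch {
    int val[NR_CPUS_SKETCH]; /* one slot per CPU, so writers never share a counter */
};

/* Writer side: touch only the given CPU's slot. */
static void inuse_add(struct prot_inuse_sketch *c, int cpu, int delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    c->val[cpu] += delta;
}

/* Reader side: sum every slot. A single slot can go negative when a socket
 * is created on one CPU and released on another, so clamp the total at 0. */
static int inuse_get(const struct prot_inuse_sketch *c)
{
    int sum = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += c->val[cpu];
    return sum < 0 ? 0 : sum;
}
```

Writes stay cheap and contention-free; only the comparatively rare reads pay for walking every CPU.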

Re: [PATCH net-next] net: thunderbolt: Clear finished Tx frame bus address in tbnet_tx_callback()

2017-11-11 Thread David Miller

From: Mika Westerberg 
Date: Thu,  9 Nov 2017 13:46:28 +0300

> When Thunderbolt network interface is disabled or when the cable is
> unplugged the driver releases all allocated buffers by calling
> tbnet_free_buffers() for each ring. This function then calls
> dma_unmap_page() for each buffer it finds where bus address is non-zero.
> Now, we only clear this bus address when the Tx buffer is sent to the
> hardware so it is possible that the function finds an entry that has
> already been unmapped.
> 
> Enabling DMA-API debugging catches this as well:
> 
>   thunderbolt :06:00.0: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x68321000] [size=4096 bytes]
> 
> Fix this by clearing the bus address of a Tx frame right after we have
> unmapped the buffer.
> 
> Signed-off-by: Mika Westerberg 

Applied, but assuming zero is a non-valid DMA address is never a good
idea.  That's why we have the DMA error code signaling abstracted.
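David's point is that bus address 0 can be a perfectly valid mapping, so "non-zero means mapped" is fragile. One way around it is to track the mapping state explicitly. In this sketch the frame structure and both helpers are hypothetical: frame_map() stands in for recording the result of a successful dma_map_page() (in a driver, failure is detected with dma_mapping_error(), never by comparing against 0), and frame_unmap() marks where dma_unmap_page() would be called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t dma_addr_sketch_t; /* stand-in for dma_addr_t */

struct tx_frame_sketch {
    dma_addr_sketch_t bus; /* may legitimately be 0 */
    bool mapped;           /* explicit state instead of "bus != 0" */
};

static int unmap_calls; /* counts unmaps, for illustration */

static void frame_map(struct tx_frame_sketch *f, dma_addr_sketch_t addr)
{
    f->bus = addr;
    f->mapped = true;
}

/* Safe to call from both the Tx completion path and teardown:
 * the buffer is unmapped at most once, whatever its address is. */
static void frame_unmap(struct tx_frame_sketch *f)
{
    if (!f->mapped)
        return;
    unmap_calls++; /* dma_unmap_page() would go here */
    f->mapped = false;
}
```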

Re: [PATCH] netdev: add netdev_pagefrag_enabled sysctl

2017-11-11 Thread David Miller

From: Hongbo Li 
Date: Thu, 9 Nov 2017 16:12:27 +0800

> From: Hongbo Li 
> 
> This patch solves a memory frag issue when allocating skb.
> I found this issue in a udp scenario, here is my test model:
> 1. About five hundred udp threads listen on the server,
>    and five hundred client threads send udp pkts to them.
>    Some threads send pkts faster than others.
> 2. The user processes on the server can't keep up with
>    receiving these pkts.
>
> Then I got the following result:
> 1. Some udp sockets' recv-q reach the queue's limit, while others
>    don't because of the global rmem limit.
> 2. The "free" command shows "used" memory is more than 62GB.
>    But cat /proc/net/sockstat shows that udp uses only 12GB.
>
> This will confuse the user as to why the system consumes so
> much memory. This is caused by the memory frags in the netdev layer.
> __netdev_alloc_frag() allocs a page block which has 8 pages.
>
> Then in this scenario, most skbs are freed when the recv-q
> is full, but if any skb in the same page block is queued to
> another recv-q which is not full, the whole page block can't
> be freed.
>
> So from the view of kernel, these pages are used, but from
> the view of tcp/udp, only the skbs in recv-q are used.
>
> To avoid exhausting memory in such a scenario, I add a sysctl
> that lets the user disable allocating skbs in page frags.
> 
> Signed-off-by: Hongbo Li 

When something like page fragments don't work properly, we fix
them rather than providing a way to disable them.

Thank you.
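The accounting effect described in the report can be modeled with a reference-counted block: every skb carved from the block holds a reference, and the block is released only when the last reference drops. The 8-page block comes from the quoted description; the 4 KiB page size, the 2 KiB fragment size and all names are assumptions, the allocator's own reference on the block is ignored for simplicity, and this is not the __netdev_alloc_frag() implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK_PAGES 8    /* from the description: one block = 8 pages */
#define PAGE_BYTES  4096 /* assumption: 4 KiB pages */
#define FRAG_BYTES  2048 /* hypothetical per-skb data size */

struct frag_block {
    int refs;   /* one reference per outstanding fragment */
    int used;   /* bytes handed out so far */
    bool freed; /* whole block returned to the page allocator */
};

/* Carve one fragment; returns false once the block is exhausted. */
static bool frag_alloc(struct frag_block *b)
{
    if (b->used + FRAG_BYTES > BLOCK_PAGES * PAGE_BYTES)
        return false;
    b->used += FRAG_BYTES;
    b->refs++;
    return true;
}

/* Drop one fragment; the whole block is released only at zero refs. */
static void frag_free(struct frag_block *b)
{
    assert(b->refs > 0);
    if (--b->refs == 0)
        b->freed = true;
}
```

In this model a single skb still sitting in a receive queue pins all 32 KiB of its block, so "free" counts the whole block as used while sockstat counts only that one skb, which matches the discrepancy in the report.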

Re: [PATCH net-next] net: thunderx: fix double free error

2017-11-11 Thread David Miller

From: Aleksey Makarov 
Date: Thu,  9 Nov 2017 14:58:57 +0300

> This patch fixes an error in memory allocation/freeing in
> ThunderX PF driver.
> 
> I moved the allocation to the probe() function and made it managed.
> 
> From Colin's email:
> 
> While running static analysis on linux-next with CoverityScan I found 3
> double free errors in the Cavium thunder driver.
> 
> The issue occurs on the err_disable_device: label of function nic_probe
> when nic_free_lmacmem(nic) is called and a double free occurs on
> nic->duplex, nic->link and nic->speed.  This occurs when nic_init_hw()
> fails:
> 
> /* Initialize hardware */
> err = nic_init_hw(nic);
> if (err)
> goto err_release_regions;
> 
> nic_init_hw() calls nic_get_hw_info() and this calls nic_free_lmacmem()
> if any of the allocations fail. This free'ing occurs again by the call
> to nic_free_lmacmem() on the err_release_regions exit path in nic_probe().
> 
> Reported-by: Colin Ian King 
> Signed-off-by: Aleksey Makarov 

Applied, thank you.
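The double free arises because a helper frees memory on its own error path and the caller's unwind label frees it again. Besides devm-managed allocation (the approach this patch takes), the usual remedy is a single owner per allocation plus goto-based unwinding in reverse order. The sketch below is hypothetical userspace code; its field names only echo nic->speed, nic->duplex and nic->link, and it is not the thunderx driver code.

```c
#include <assert.h>
#include <stdlib.h>

struct nic_sketch {
    int *speed;
    int *duplex;
    int *link;
};

/* Single place that frees the per-LMAC arrays. free(NULL) is a no-op and
 * the pointers are cleared, so calling this twice cannot double free. */
static void nic_free_lmac_sketch(struct nic_sketch *nic)
{
    free(nic->speed);
    free(nic->duplex);
    free(nic->link);
    nic->speed = NULL;
    nic->duplex = NULL;
    nic->link = NULL;
}

/* All-or-nothing allocation: on failure, unwind in reverse order and leave
 * every pointer NULL, so the caller's error path has nothing left to free. */
static int nic_alloc_lmac_sketch(struct nic_sketch *nic, size_t lmacs)
{
    nic->speed = nic->duplex = nic->link = NULL;

    nic->speed = calloc(lmacs, sizeof(*nic->speed));
    if (!nic->speed)
        goto err;
    nic->duplex = calloc(lmacs, sizeof(*nic->duplex));
    if (!nic->duplex)
        goto err_free_speed;
    nic->link = calloc(lmacs, sizeof(*nic->link));
    if (!nic->link)
        goto err_free_duplex;
    return 0;

err_free_duplex:
    free(nic->duplex);
    nic->duplex = NULL;
err_free_speed:
    free(nic->speed);
    nic->speed = NULL;
err:
    return -1;
}
```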

Re: [PATCH] ipvlan: fix ipv6 outbound device

2017-11-11 Thread David Miller

From: 
Date: Thu, 9 Nov 2017 20:09:31 +0800

> From: Keefe Liu 
> 
> When processing outbound ipv6 packets, we should assign the master
> device as the output device rather than the input device.
> 
> Signed-off-by: Keefe Liu 

Applied.

Re: pull-request: net-next: ieee802154 2017-11-09

2017-11-11 Thread David Miller

From: Stefan Schmidt 
Date: Thu,  9 Nov 2017 18:12:49 +0100

> A small update on ieee802154 patches for net-next. Nothing dramatic, but simply
> housekeeping this time around.
> A fix for the correct mask to be applied in the mrf24j40 driver by Gustavo A. R. Silva
> Removal of a non existing email user for the ca8210 driver by Harry Morris
> A bunch of checkpatch cleanups across the subsystem from myself
> 
> Please pull, thanks a lot!

Pulled, thanks.

Re: [PATCH net-next] bindings: net: stmmac: correctify note about LPI interrupt

2017-11-11 Thread David Miller

From: Niklas Cassel 
Date: Thu,  9 Nov 2017 18:09:26 +0100

> There are two different combined signals for various interrupt events:
> In EQOS-CORE and EQOS-MTL configurations, mci_intr_o is the interrupt
> signal.
> In EQOS-DMA, EQOS-AHB and EQOS-AXI configurations, these interrupt events
> are combined with the events in the DMA on the sbd_intr_o signal.
> 
> Depending on configuration, the device tree irq "macirq" will refer to
> either mci_intr_o or sbd_intr_o.
> 
> The databook states:
> "The MAC generates the LPI interrupt when the Tx or Rx side enters or exits
> the LPI state. The interrupt mci_intr_o (sbd_intr_o in certain
> configurations) is asserted when the LPI interrupt status is set.
> 
> When the MAC exits the Rx LPI state, then in addition to the mci_intr_o
> (sbd_intr_o in certain configurations), the sideband signal lpi_intr_o is
> asserted.
> 
> If you do not want to gate-off the application clock during the Rx LPI
> state, you can leave the lpi_intr_o signal unconnected and use the
> mci_intr_o (sbd_intr_o in certain configurations) signal to detect Rx LPI
> exit."
> 
> Since the "macirq" is always raised when Tx or Rx enters/exits the LPI
> state, "eth_lpi" must therefore refer to lpi_intr_o, which is only raised
> when Rx exits the LPI state. Update the DT binding description to reflect
> reality.
> 
> Signed-off-by: Niklas Cassel 

Applied.

Re: [PATCH v3 net-next 0/6] mv88e6xxx broadcast flooding in hardware

2017-11-11 Thread David Miller

From: Andrew Lunn 
Date: Thu,  9 Nov 2017 22:29:50 +0100

> This patchset makes the mv88e6xxx driver perform flooding in hardware,
> rather than let the software bridge perform the flooding. This is a
> prerequisite for IGMP snooping on the bridge interface.
> 
> In order to make hardware broadcasting work, a few other issues need
> fixing or improving. SWITCHDEV_ATTR_ID_PORT_PARENT_ID is broken, which
> is apparent when testing on the ZII devel board with multiple
> switches.
> 
> Some of these patches are taken from a previous RFC patchset of IGMP
> support.
> 
> Rebased onto net-next, with fixup for Vivien's refactoring.

Series applied, thanks Andrew.

Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix stats histogram mode

2017-11-11 Thread David Miller

From: Andrew Lunn 
Date: Fri, 10 Nov 2017 00:36:41 +0100

> The statistics histogram mode was not being explicitly initialized on
> devices other than the 6390 family. Clearing the statistics then
> overwrote the default setting, setting the histogram to a reserved
> mode.
> 
> Explicitly set the histogram mode for all devices. Change the
> statistics clear into a read/modify/write, and since it is now more
> complex, move it into global1.c.
> 
> Signed-off-by: Andrew Lunn 

Applied, thanks.
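The read/modify/write change can be illustrated with a made-up register layout in which a histogram-mode field shares a register with the statistics operation field, so a plain write of the operation bits also zeroes the histogram field. Every bit position and name below is hypothetical and not the mv88e6xxx global 1 register definition; in this invented layout a histogram value of 0 plays the role of the reserved mode.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a 16-bit stats operation register. */
#define STATS_OP_MASK      0xf000u /* operation field */
#define STATS_OP_FLUSH_ALL 0x1000u /* "clear all counters" operation */
#define STATS_HIST_MASK    0x0c00u /* histogram mode field (0 = reserved here) */
#define STATS_HIST_RX_TX   0x0c00u /* count both received and sent frames */

static uint16_t stats_reg; /* stands in for the switch register */

/* Plain write: clears the counters but also zeroes the histogram field. */
static void stats_clear_write(void)
{
    stats_reg = STATS_OP_FLUSH_ALL;
}

/* Read/modify/write: replace only the operation field, keep the rest. */
static void stats_clear_rmw(void)
{
    uint16_t val = stats_reg;

    val &= (uint16_t)~STATS_OP_MASK;
    val |= STATS_OP_FLUSH_ALL;
    stats_reg = val;
}
```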

Re: [Patch net] vlan: fix a use-after-free in vlan_device_event()

2017-11-11 Thread David Miller

From: Cong Wang 
Date: Thu,  9 Nov 2017 16:43:13 -0800

> After refcnt reaches zero, vlan_vid_del() could free
> dev->vlan_info via RCU:
> 
>   RCU_INIT_POINTER(dev->vlan_info, NULL);
>   call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
> 
> However, the pointer 'grp' still points to that memory
> since it is set before vlan_vid_del():
> 
> vlan_info = rtnl_dereference(dev->vlan_info);
> if (!vlan_info)
> goto out;
> grp = &vlan_info->grp;
> 
> Depends on when that RCU callback is scheduled, we could
> trigger a use-after-free in vlan_group_for_each_dev()
> right following this vlan_vid_del().
> 
> Fix it by moving vlan_vid_del() before setting grp. This
> is also symmetric to the vlan_vid_add() we call in
> vlan_device_event().
> 
> Reported-by: Fengguang Wu 
> Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
> Cc: Alexander Duyck 
> Cc: Linus Torvalds 
> Cc: Girish Moodalbail 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks Cong!
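The ordering fix can be sketched in miniature: a call that may drop the last reference and free an object must come before any pointer into that object is taken. The sketch frees immediately with free() where the kernel defers the free with call_rcu(), and its types and functions are hypothetical simplifications of dev->vlan_info, vlan_vid_del() and the group lookup in vlan_device_event().

```c
#include <assert.h>
#include <stdlib.h>

struct vlan_group_sketch { int ndevs; };

struct vlan_info_sketch {
    int refcnt;
    struct vlan_group_sketch grp;
};

struct dev_sketch { struct vlan_info_sketch *vlan_info; };

/* Drop one reference; on the last one, unpublish and free the info
 * (the kernel uses RCU_INIT_POINTER() plus call_rcu() instead). */
static void vid_del_sketch(struct dev_sketch *dev)
{
    struct vlan_info_sketch *vi = dev->vlan_info;

    if (vi && --vi->refcnt == 0) {
        dev->vlan_info = NULL;
        free(vi);
    }
}

/* Fixed ordering: delete first, then look up the group. Taking
 * &vi->grp before vid_del_sketch() could leave a dangling pointer. */
static int count_vlan_devs(struct dev_sketch *dev)
{
    struct vlan_info_sketch *vi;

    vid_del_sketch(dev);
    vi = dev->vlan_info;
    if (!vi)
        return 0; /* info is gone, nothing to iterate */
    return vi->grp.ndevs;
}
```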
