Re: [PATCH net v3 00/12] Fixes, cleanup and modernization for some legacy ethernet NIC drivers
From: Finn Thain
Date: Sat, 11 Nov 2017 01:20:58 -0500 (EST)

> This patch series adds support for the Linux Driver Model for Mac NIC
> drivers, fixes some logging bugs, removes dead code, and adopts netif_*
> calls to reduce code duplication.
>
> All up, about 100 lines of code are eliminated.
>
> This patch series has been tested on a variety of Macs, with coverage
> for the changes to lib8390.c, mac8390.c, macsonic.c, sonic.[ch] and
> macmace.c.
>
> This patch series should be applied after the NuBus subsystem
> modernization patch series.

Then you can't be submitting this for the networking tree.
[PATCH net-next 0/2] cxgb4: collect LE-TCAM and SGE queue contexts
Collect hardware dumps via ethtool --get-dump facility.

Patch 1 collects LE-TCAM dump.

Patch 2 collects SGE queue context dumps.

Thanks,
Rahul

Rahul Lakkireddy (2):
  cxgb4: collect LE-TCAM dump
  cxgb4: collect SGE queue context dump

 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  38
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h     |   2 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c    | 253 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h    |  11 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h        |   4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |  11 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c        |  62 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h        |   7 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h      |  68 ++
 9 files changed, 456 insertions(+)

--
2.14.1
[PATCH net-next 1/2] cxgb4: collect LE-TCAM dump
Signed-off-by: Rahul Lakkireddy
Signed-off-by: Ganesh Goudar
---
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  30
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h     |   1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c    | 175 ++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h    |   7 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |   7 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h      |  41 +
 6 files changed, 261 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index 1de1d811fde3..f99db7b283fc 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -185,6 +185,36 @@ struct cudbg_vpd_data {
 	u32 vpd_vers;
 };
 
+#define CUDBG_MAX_TCAM_TID 0x800
+
+enum cudbg_le_entry_types {
+	LE_ET_UNKNOWN = 0,
+	LE_ET_TCAM_CON = 1,
+	LE_ET_TCAM_SERVER = 2,
+	LE_ET_TCAM_FILTER = 3,
+	LE_ET_TCAM_CLIP = 4,
+	LE_ET_TCAM_ROUTING = 5,
+	LE_ET_HASH_CON = 6,
+	LE_ET_INVALID_TID = 8,
+};
+
+struct cudbg_tcam {
+	u32 filter_start;
+	u32 server_start;
+	u32 clip_start;
+	u32 routing_start;
+	u32 tid_hash_base;
+	u32 max_tid;
+};
+
+struct cudbg_tid_data {
+	u32 tid;
+	u32 dbig_cmd;
+	u32 dbig_conf;
+	u32 dbig_rsp_stat;
+	u32 data[NUM_LE_DB_DBGI_RSP_DATA_INSTANCES];
+};
+
 #define CUDBG_NUM_ULPTX 11
 #define CUDBG_NUM_ULPTX_READ 512
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index e484c514e9ae..4e5d189eae62 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -65,6 +65,7 @@ enum cudbg_dbg_entity_type {
 	CUDBG_TID_INFO = 54,
 	CUDBG_MPS_TCAM = 57,
 	CUDBG_VPD_DATA = 58,
+	CUDBG_LE_TCAM = 59,
 	CUDBG_CCTRL = 60,
 	CUDBG_MA_INDIRECT = 61,
 	CUDBG_ULPTX_LA = 62,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index 32c9858da110..dd7e26be98cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -1367,6 +1367,181 @@ int cudbg_collect_vpd_data(struct cudbg_init *pdbg_init,
 	return rc;
 }
 
+static int cudbg_read_tid(struct cudbg_init *pdbg_init, u32 tid,
+			  struct cudbg_tid_data *tid_data)
+{
+	struct adapter *padap = pdbg_init->adap;
+	int i, cmd_retry = 8;
+	u32 val;
+
+	/* Fill REQ_DATA regs with 0's */
+	for (i = 0; i < NUM_LE_DB_DBGI_REQ_DATA_INSTANCES; i++)
+		t4_write_reg(padap, LE_DB_DBGI_REQ_DATA_A + (i << 2), 0);
+
+	/* Write DBIG command */
+	val = DBGICMD_V(4) | DBGITID_V(tid);
+	t4_write_reg(padap, LE_DB_DBGI_REQ_TCAM_CMD_A, val);
+	tid_data->dbig_cmd = val;
+
+	val = DBGICMDSTRT_F | DBGICMDMODE_V(1); /* LE mode */
+	t4_write_reg(padap, LE_DB_DBGI_CONFIG_A, val);
+	tid_data->dbig_conf = val;
+
+	/* Poll the DBGICMDBUSY bit */
+	val = 1;
+	while (val) {
+		val = t4_read_reg(padap, LE_DB_DBGI_CONFIG_A);
+		val = val & DBGICMDBUSY_F;
+		cmd_retry--;
+		if (!cmd_retry)
+			return CUDBG_SYSTEM_ERROR;
+	}
+
+	/* Check RESP status */
+	val = t4_read_reg(padap, LE_DB_DBGI_RSP_STATUS_A);
+	tid_data->dbig_rsp_stat = val;
+	if (!(val & 1))
+		return CUDBG_SYSTEM_ERROR;
+
+	/* Read RESP data */
+	for (i = 0; i < NUM_LE_DB_DBGI_RSP_DATA_INSTANCES; i++)
+		tid_data->data[i] = t4_read_reg(padap,
+						LE_DB_DBGI_RSP_DATA_A +
+						(i << 2));
+	tid_data->tid = tid;
+	return 0;
+}
+
+static int cudbg_get_le_type(u32 tid, struct cudbg_tcam tcam_region)
+{
+	int type = LE_ET_UNKNOWN;
+
+	if (tid < tcam_region.server_start)
+		type = LE_ET_TCAM_CON;
+	else if (tid < tcam_region.filter_start)
+		type = LE_ET_TCAM_SERVER;
+	else if (tid < tcam_region.clip_start)
+		type = LE_ET_TCAM_FILTER;
+	else if (tid < tcam_region.routing_start)
+		type = LE_ET_TCAM_CLIP;
+	else if (tid < tcam_region.tid_hash_base)
+		type = LE_ET_TCAM_ROUTING;
+	else if (tid < tcam_region.max_tid)
+		type = LE_ET_HASH_CON;
+	else
+		type = LE_ET_INVALID_TID;
+
+	return type;
+}
+
+static int cudbg_is_ipv6_entry(struct cudbg_tid_data *tid_data,
+			       struct cudbg_tcam tcam_region)
+{
+	int ipv6 = 0;
+	int le_type;
+
+	le_type = cudbg_get_le_type(tid_data->tid,
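
The region lookup in cudbg_get_le_type() can be tried outside the driver. What follows is only a sketch under stated assumptions: the enum values and the comparison order are copied from the patch, but the names toy_regions, toy_le_type and example_regions are made up, and the boundary values are invented for illustration (real values come from the adapter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Entry types, with the same values as in the patch. */
enum toy_le_entry_type {
	LE_ET_UNKNOWN = 0,
	LE_ET_TCAM_CON = 1,
	LE_ET_TCAM_SERVER = 2,
	LE_ET_TCAM_FILTER = 3,
	LE_ET_TCAM_CLIP = 4,
	LE_ET_TCAM_ROUTING = 5,
	LE_ET_HASH_CON = 6,
	LE_ET_INVALID_TID = 8,
};

/* Hypothetical stand-in for struct cudbg_tcam. */
struct toy_regions {
	uint32_t server_start;
	uint32_t filter_start;
	uint32_t clip_start;
	uint32_t routing_start;
	uint32_t tid_hash_base;
	uint32_t max_tid;
};

/* Walk the region boundaries in ascending order; the first boundary
 * that lies above the TID decides the entry type.
 */
static int toy_le_type(uint32_t tid, const struct toy_regions *r)
{
	assert(r != NULL);

	if (tid < r->server_start)
		return LE_ET_TCAM_CON;
	if (tid < r->filter_start)
		return LE_ET_TCAM_SERVER;
	if (tid < r->clip_start)
		return LE_ET_TCAM_FILTER;
	if (tid < r->routing_start)
		return LE_ET_TCAM_CLIP;
	if (tid < r->tid_hash_base)
		return LE_ET_TCAM_ROUTING;
	if (tid < r->max_tid)
		return LE_ET_HASH_CON;
	return LE_ET_INVALID_TID;
}

/* Invented layout, for illustration only. */
static const struct toy_regions example_regions = {
	.server_start = 100,
	.filter_start = 200,
	.clip_start = 300,
	.routing_start = 400,
	.tid_hash_base = 500,
	.max_tid = 600,
};
```

The lookup only gives sensible answers if the boundaries are non-decreasing in the order they are compared.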
[PATCH net-next 2/2] cxgb4: collect SGE queue context dump
Collect SGE freelist queue and congestion manager contexts.

Signed-off-by: Rahul Lakkireddy
Signed-off-by: Ganesh Goudar
---
 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  8 +++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h     |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c    | 78 +++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h    |  4 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h        |  4 ++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |  4 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c        | 62 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.h        |  7 ++
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h      | 27
 9 files changed, 195 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index f99db7b283fc..605689957496 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -145,6 +145,14 @@ struct cudbg_tid_info_region_rev1 {
 	u32 reserved[16];
 };
 
+#define CUDBG_MAX_FL_QIDS 1024
+
+struct cudbg_ch_cntxt {
+	u32 cntxt_type;
+	u32 cntxt_id;
+	u32 data[SGE_CTXT_SIZE / 4];
+};
+
 #define CUDBG_MAX_RPLC_SIZE 128
 
 struct cudbg_mps_tcam {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 4e5d189eae62..e10ff1ee62c5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -63,6 +63,7 @@ enum cudbg_dbg_entity_type {
 	CUDBG_PCIE_INDIRECT = 50,
 	CUDBG_PM_INDIRECT = 51,
 	CUDBG_TID_INFO = 54,
+	CUDBG_DUMP_CONTEXT = 56,
 	CUDBG_MPS_TCAM = 57,
 	CUDBG_VPD_DATA = 58,
 	CUDBG_LE_TCAM = 59,
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index dd7e26be98cf..d699bf88d18f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -1115,6 +1115,84 @@ int cudbg_collect_tid(struct cudbg_init *pdbg_init,
 	return rc;
 }
 
+int cudbg_dump_context_size(struct adapter *padap)
+{
+	u32 value, size;
+	u8 flq;
+
+	value = t4_read_reg(padap, SGE_FLM_CFG_A);
+
+	/* Get number of data freelist queues */
+	flq = HDRSTARTFLQ_G(value);
+	size = CUDBG_MAX_FL_QIDS >> flq;
+
+	/* Add extra space for congestion manager contexts.
+	 * The number of CONM contexts are same as number of freelist
+	 * queues.
+	 */
+	size += size;
+	return size * sizeof(struct cudbg_ch_cntxt);
+}
+
+static void cudbg_read_sge_ctxt(struct cudbg_init *pdbg_init, u32 cid,
+				enum ctxt_type ctype, u32 *data)
+{
+	struct adapter *padap = pdbg_init->adap;
+	int rc = -1;
+
+	/* Under heavy traffic, the SGE Queue contexts registers will be
+	 * frequently accessed by firmware.
+	 *
+	 * To avoid conflicts with firmware, always ask firmware to fetch
+	 * the SGE Queue contexts via mailbox. On failure, fallback to
+	 * accessing hardware registers directly.
+	 */
+	if (is_fw_attached(pdbg_init))
+		rc = t4_sge_ctxt_rd(padap, padap->mbox, cid, ctype, data);
+	if (rc)
+		t4_sge_ctxt_rd_bd(padap, cid, ctype, data);
+}
+
+int cudbg_collect_dump_context(struct cudbg_init *pdbg_init,
+			       struct cudbg_buffer *dbg_buff,
+			       struct cudbg_error *cudbg_err)
+{
+	struct adapter *padap = pdbg_init->adap;
+	struct cudbg_buffer temp_buff = { 0 };
+	struct cudbg_ch_cntxt *buff;
+	u32 size, i = 0;
+	int rc;
+
+	rc = cudbg_dump_context_size(padap);
+	if (rc <= 0)
+		return CUDBG_STATUS_ENTITY_NOT_FOUND;
+
+	size = rc;
+	rc = cudbg_get_buff(dbg_buff, size, &temp_buff);
+	if (rc)
+		return rc;
+
+	buff = (struct cudbg_ch_cntxt *)temp_buff.data;
+	while (size > 0) {
+		buff->cntxt_type = CTXT_FLM;
+		buff->cntxt_id = i;
+		cudbg_read_sge_ctxt(pdbg_init, i, CTXT_FLM, buff->data);
+		buff++;
+		size -= sizeof(struct cudbg_ch_cntxt);
+
+		buff->cntxt_type = CTXT_CNM;
+		buff->cntxt_id = i;
+		cudbg_read_sge_ctxt(pdbg_init, i, CTXT_CNM, buff->data);
+		buff++;
+		size -= sizeof(struct cudbg_ch_cntxt);
+
+		i++;
+	}
+
+	cudbg_write_and_release_buff(&temp_buff, dbg_buff);
+	return rc;
+}
+
 static inline void cudbg_tcamxy2valmask(u64 x, u64 y, u8 *addr, u64 *mask)
 {
 	*mask = x | y;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h
index ebb2d9907fc9..caeee8e33e86
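
The buffer sizing done by cudbg_dump_context_size() can be checked with a small stand-alone calculation. This is a sketch, not the driver code: TOY_SGE_CTXT_SIZE is assumed to be 24 bytes (the patch does not show the value of SGE_CTXT_SIZE), the shift is passed in directly instead of being read from SGE_FLM_CFG_A, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_FL_QIDS   1024 /* CUDBG_MAX_FL_QIDS in the patch */
#define TOY_SGE_CTXT_SIZE 24   /* assumption, not shown in the patch */

/* Same layout as struct cudbg_ch_cntxt in the patch. */
struct toy_ch_cntxt {
	uint32_t cntxt_type;
	uint32_t cntxt_id;
	uint32_t data[TOY_SGE_CTXT_SIZE / 4];
};

/* One freelist context plus one congestion manager context per
 * freelist queue, so the entry count is doubled before scaling by
 * the per-entry size.
 */
static size_t toy_dump_context_size(unsigned int flq_shift)
{
	size_t entries;

	assert(flq_shift < 32);
	entries = (size_t)(TOY_MAX_FL_QIDS >> flq_shift);
	entries += entries;
	return entries * sizeof(struct toy_ch_cntxt);
}
```

With a shift of 2 this gives 256 freelist contexts plus 256 congestion manager contexts, which is the pairing the collection loop writes out two entries at a time.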
Re: [PATCH net-next] cxgb4: collect vpd info directly from hardware
From: Rahul Lakkireddy
Date: Fri, 10 Nov 2017 13:03:37 +0530

> Collect vpd information directly from hardware instead of software
> adapter context. Move EEPROM physical address to virtual address
> translation logic to t4_hw.c and update relevant files.
>
> Fixes: 6f92a6544f1a ("cxgb4: collect hardware misc dumps")
> Signed-off-by: Rahul Lakkireddy
> Signed-off-by: Ganesh Goudar

Applied, thanks.
Re: [net-next:master 622/639] net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
On Sat, Nov 11, 2017 at 06:42:21PM +0900, David Miller wrote:
> From: kbuild test robot
> Date: Sat, 11 Nov 2017 16:57:08 +0800
>
> > All errors (new ones prefixed by >>):
> >
> >    net/dsa/port.o: In function `dsa_port_vlan_add':
> > >> net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
> >    net/dsa/port.o: In function `dsa_port_vlan_del':
> >    net/dsa/port.c:270: undefined reference to `br_vlan_enabled'
>
> Problem is NET_DSA=y and BRIDGE_VLAN_FILTERING=m
>
> We need some Kconfig dependency foo to prevent this.

Yes. Let's see if I can put together a patch before Arnd!

	Andrew
Re: [PATCH RFC,WIP 5/5] netfilter: nft_flow_offload: add ndo hooks for hardware offload
On 2017-11-03 16:26, Pablo Neira Ayuso wrote:
> This patch adds the infrastructure to offload flows to hardware, in case
> the nic/switch comes with built-in flow tables capabilities.
>
> If the hardware comes with not hardware flow tables or they have
> limitations in terms of features, this falls back to the software
> generic flow table implementation.
>
> The software flow table aging thread skips entries that resides in the
> hardware, so the hardware will be responsible for releasing this flow
> table entry too.
>
> Signed-off-by: Pablo Neira Ayuso

Hi Pablo,

I'd like to start playing with those patches in OpenWrt/LEDE soon. I'm
also considering making a patch that adds iptables support.
For that to work, I think it would be a good idea to keep the code that
tries to offload flows to hardware in nf_flow_offload.c instead, so that
it can be shared with iptables integration.

By the way, do you have a git tree where you keep the current version of
your patch set?

Thanks,

- Felix
Re: [PATCH net-next 0/3] l2tp: avoid aliasing tunnels socket pointer
From: Guillaume Nault
Date: Sat, 11 Nov 2017 06:06:23 +0900

> We don't need to copy the tunnel's socket pointer in the pseudo-wire
> specific session structures. This uselessly complicates the code
> and hampers evolution.
>
> This series was part of an effort to protect tunnels socket pointer
> with RCU. But since it provides nice cleanup, I submit it separately.

Nice simplification, applied, thanks.
Re: [PATCH] net: Remove unused skb_shared_info member
From: Mat Martineau
Date: Fri, 10 Nov 2017 14:03:51 -0800

> ip6_frag_id was only used by UFO, which has been removed.
> ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no
> in-tree callers.
>
> Signed-off-by: Mat Martineau

Applied to net-next, thanks.
Re: [PATCH] tcp: Export to userspace the TCP state names for the trace events
2017-11-11 3:32 GMT+00:00 Steven Rostedt:
> On Sat, 11 Nov 2017 02:06:00 +
> Yafang Shao wrote:
>
>> 2017-11-10 15:07 GMT+00:00 Steven Rostedt :
>> > On Fri, 10 Nov 2017 12:56:06 +0800
>> > Yafang Shao wrote:
>> >
>> >> Could the macro tcp_state_name() be renamed ?
>> >> If is included in include/net/tcp.h, it will
>> >
>> > Ideally, you don't want to include trace/events/*.h headers in other
>> > headers, as they can have side effects if those headers are included in
>> > other trace/events/*.h headers.
>> >
>>
>> Actually I find trace/events/*.h is included in lots of other headers,
>> for example,
>>
>> net/rxrpc/ar-internal.h
>
> This is an internal header, so it's not that likely to be used where it
> shouldn't be.
>
>> include/linux/bpf_trace.h
>> fs/f2fs/trace.h
>
> The above two are actually headers specifically used to pull in the
> trace/events/*.h headers.
>
>> fs/afs/internal.h
>
> another internal header. Unlikely to be misused.
>
>> arch/x86/include/asm/mmu_context.h
>
> This one, hmm, probably should be fixed.
>
>> ...
>>
>> Are these files doing properly ?
>
> Most yes, some probably not.
>
>> Should we fix them ?
>
> Probably, but if they are used incorrectly, it would usually fail on
> build (The same global functions and variables would be defined).
>
>>
>> But per my understanding, it is ok to include trace/events/*.h in
>> other headers because we defined TRACE_SYSTEM as well, as a
>> consequence those headers should not included in trace/events/*.h. If
>> that happens, it may means that one of the these two TRACE_SYSTEM is
>> not defined properly. Maybe these two TRACE_SYSTEM should be merged to
>> one TRACE_SYSTEM.
>
> Two different files may have the same TRACE_SYSTEM defined. That's not
> an issue.
>
> The issue is, if you have a trace/events/*.h header in a popular file
> (like it use to be in include/linux/slab.h), then it can cause issues
> if another trace/events/*.h header includes it. That's because each
> trace/events/*.h header must be included with CREATE_TRACE_POINTS only
> once.
>

Understood.
Thanks for explanation.

>>
>> >>
>> >> cause compile error, because there's another function tcp_state_name()
>> >> defined in net/netfilter/ipvs/ip_vs_proto_tcp.c.
>> >> static const char * tcp_state_name(int state)
>> >> {
>> >>
>> >> if (state >= IP_VS_TCP_S_LAST)
>> >>
>> >> return "ERR!";
>> >>
>> >> return tcp_state_name_table[state] ? tcp_state_name_table[state]
>> >> : "?";
>> >>
>> >> }
>> >
>> > But that said, I didn't make up the trace_state_name(), it was already
>> > there in net-next before this patch.
>> >
>>
>> I know that is not your fault.
>
> :-)
>
>> But as you are modifying this file, it is better to modify it in your
>> patch as well.
>> So we need not submit another new patch to fix it.
>
> I could whip up a patch 2.
>
>>
>> > But yeah, in actuality, I would have just done:
>> >
>> > #define EM(a) { a, #a },
>> > #define EMe(a) { a, #a }
>> >
>> > directly. Which we can still do.
>> >
>> > -- Steve
>> >
>>
>> The suggestion from Song is good to fix it.
>
> Song's suggestion seems like it can simple be a patch added on top of
> mine. As it is somewhat agnostic to the fix I'm making. That is, it's a
> different problem, and thus should be a different patch.
>

Got it.
These two issues should be fixed in two different patches :-)

Thanks
Yafang
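
The EM()/EMe() idea quoted above (each list entry expands to a { value, "name" } pair) can be shown in plain C outside the tracing headers. This is a minimal sketch of the general technique only: the ST_ constants, TOY_STATE_LIST and toy_state_name() are hypothetical, and nothing here reproduces the actual trace event macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state values, for illustration only. */
enum { ST_ESTABLISHED = 1, ST_SYN_SENT = 2, ST_CLOSE = 7 };

/* One list of states; EMe() marks the last entry (no trailing comma). */
#define TOY_STATE_LIST		\
	EM(ST_ESTABLISHED)	\
	EM(ST_SYN_SENT)		\
	EMe(ST_CLOSE)

struct toy_state_name {
	int value;
	const char *name;
};

/* Expand every entry to { value, "name" }; #a stringifies the token. */
#define EM(a)	{ a, #a },
#define EMe(a)	{ a, #a }
static const struct toy_state_name toy_state_names[] = { TOY_STATE_LIST };
#undef EM
#undef EMe

/* Linear lookup of the printable name for a state value. */
static const char *toy_state_name(int value)
{
	size_t i;

	assert(sizeof(toy_state_names) > 0);
	for (i = 0; i < sizeof(toy_state_names) / sizeof(toy_state_names[0]); i++)
		if (toy_state_names[i].value == value)
			return toy_state_names[i].name;
	return "?";
}
```

Because the list is written once and expanded by redefining EM()/EMe(), the same list can be reused to generate other tables without the names drifting out of sync with the values.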
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
On Sat, 2017-11-11 at 23:09 +0900, David Miller wrote:
> From: "Jason A. Donenfeld"
> Date: Thu, 9 Nov 2017 13:04:44 +0900
>
> > @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
> >  		return 0;
> >  	}
> >
> > -	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> > -	if (!nlh)
> > +	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> > +			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> > +	if (WARN_ON(!nlh))
> >  		goto errout_skb;
>
> If you're handling this by forcing another read() to produce the
> NLMSG_DONE, then you have no reason to WARN_ON() here.
>
> In fact you are adding a WARN_ON() which is trivially triggerable by
> any user.

I added this in my suggestion for how this could work, but I don't think
you're right, since we previously check if there's enough space.

The patch is missing the full context, but this is:

+	if (nlk->dump_done_errno > 0 ||
+	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);

 		if (sk_filter(sk, skb))
 			kfree_skb(skb);
 		else
 			__netlink_sendskb(sk, skb);
 		return 0;
 	}

-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))

So unless the nlmsg_total_size() vs. nlmsg_put_answer() suddenly gets a
different idea of how much space is needed, nlh shouldn't ever be NULL
once we get here.

johannes
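
The argument above is a "check the room first, then the append cannot fail" pattern. The following toy model is only a sketch of that reasoning: struct toy_skb, toy_put() and toy_finish_dump() are invented stand-ins and do not correspond to real netlink or skb APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer: capacity and bytes already used. */
struct toy_skb {
	size_t cap;
	size_t len;
};

static size_t toy_tailroom(const struct toy_skb *skb)
{
	return skb->cap - skb->len;
}

/* Append n bytes; returns 0 on success, -1 if there is no room. */
static int toy_put(struct toy_skb *skb, size_t n)
{
	if (toy_tailroom(skb) < n)
		return -1;
	skb->len += n;
	return 0;
}

enum toy_dump_result { TOY_SENT_PARTIAL, TOY_DONE_APPENDED };

/* If the final record does not fit, send what we have and let the
 * next read deliver it. Otherwise the append is guaranteed to succeed
 * because the room was checked with the same size.
 */
static enum toy_dump_result toy_finish_dump(struct toy_skb *skb,
					    size_t done_size)
{
	int rc;

	if (toy_tailroom(skb) < done_size)
		return TOY_SENT_PARTIAL;

	rc = toy_put(skb, done_size);
	assert(rc == 0); /* cannot trigger: same size was checked above */
	(void)rc;
	return TOY_DONE_APPENDED;
}
```

The guarantee only holds while the size used in the check and the size used in the append stay identical, which is the caveat Johannes raises about nlmsg_total_size() and nlmsg_put_answer().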
Re: pull-request: wireless-drivers-next 2017-11-11
David Miller writes:

> From: Kalle Valo
> Date: Sat, 11 Nov 2017 15:03:14 +0200
>
>> some more patches to net-next for v4.15. Even though I applied the last
>> patch only on Saturday morning, all these have been tested by kbuild bot
>> and most of them should also be in linux-next. Please let me know if
>> there are any problems.
>
> Pulled, but looking at your merge commit message:

Thanks!

>> Major changes:
>>
>> iwlwifi
>>
>> * some new PCI IDs
>
> I doubt this was the only major change in here :-)))

Yeah, you're right. I wrote that too hastily. My excuse this time is
that I tagged it at the airport :) But of course I should have prepared
it better.

--
Kalle Valo
[GIT] Networking
1) Use after free in vlan, from Cong Wang.

2) Handle NAPI poll with a zero budget properly in mlx5 driver, from
   Saeed Mahameed.

3) If DMA mapping fails in mlx5 driver, NULL out page, from Inbar Karmy.

4) Handle overrun in RX FIFO of sun4i CAN driver, from Gerhard
   Bertelsmann.

5) Missing return in mdb and vlan prepare phase of DSA layer, from
   Vivien Didelot.

Please pull, thanks a lot!

The following changes since commit 3fefc31843cfe2b5f072efe11ed9ccaf6a7a5092:

  Merge tag 'pm-final-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2017-11-09 11:16:28 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to 92d28828179675176cd90293699b394b6d22ce68:

  Merge tag 'linux-can-fixes-for-4.14-20171110' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can (2017-11-11 21:52:01 +0900)

Cong Wang (1):
      vlan: fix a use-after-free in vlan_device_event()

David S. Miller (2):
      Merge tag 'mlx5-fixes-2017-11-08' of git://git.kernel.org/.../saeed/linux
      Merge tag 'linux-can-fixes-for-4.14-20171110' of git://git.kernel.org/.../mkl/linux-can

Eric Dumazet (1):
      tcp: gso: avoid refcount_t warning from tcp_gso_segment()

Eugenia Emantayev (1):
      net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs

Gerhard Bertelsmann (1):
      can: sun4i: handle overrun in RX FIFO

Huy Nguyen (2):
      net/mlx5: Loop over temp list to release delay events
      net/mlx5: Cancel health poll before sending panic teardown command

Håkon Bugge (1):
      rds: ib: Fix NULL pointer dereference in debug code

Inbar Karmy (1):
      net/mlx5e: Set page to null in case dma mapping fails

Marek Vasut (1):
      can: ifi: Fix transmitter delay calculation

Richard Schütz (1):
      can: c_can: don't indicate triple sampling support for D_CAN

Saeed Mahameed (1):
      net/mlx5e: Fix napi poll with zero budget

Stephane Grosjean (1):
      can: peak: Add support for new PCIe/M2 CAN FD interfaces

Vivien Didelot (2):
      net: dsa: return after mdb prepare phase
      net: dsa: return after vlan prepare phase

Yuchung Cheng (1):
      tcp: fix tcp_fastretrans_alert warning

 drivers/net/can/c_can/c_can_pci.c                 |  1 -
 drivers/net/can/c_can/c_can_platform.c            |  1 -
 drivers/net/can/ifi_canfd/ifi_canfd.c             |  6 +++---
 drivers/net/can/peak_canfd/peak_pciefd_main.c     | 14 --
 drivers/net/can/sun4i_can.c                       | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/dev.c     |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 12 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c | 10 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  7 +++
 net/8021q/vlan.c                                  |  6 +++---
 net/dsa/switch.c                                  |  4
 net/ipv4/tcp_input.c                              |  3 +--
 net/ipv4/tcp_offload.c                            | 12 ++--
 net/rds/ib_recv.c                                 | 10 +-
 15 files changed, 68 insertions(+), 34 deletions(-)
[PATCH net-next] net: dsa: Fix dependencies on bridge
DSA now uses one of the symbols exported by the bridge,
br_vlan_enabled(). This has a stub, if the bridge is not
enabled. However, if the bridge is enabled, we cannot have DSA built
in and the bridge as a module, otherwise we get undefined symbols at
link time:

   net/dsa/port.o: In function `dsa_port_vlan_add':
   net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
   net/dsa/port.o: In function `dsa_port_vlan_del':
   net/dsa/port.c:270: undefined reference to `br_vlan_enabled'

Reported-by: kbuild test robot
Signed-off-by: Andrew Lunn
---
 net/dsa/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index cc5f8f971689..6246254e9a2b 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -7,6 +7,7 @@ config HAVE_NET_DSA
 config NET_DSA
 	tristate "Distributed Switch Architecture"
 	depends on HAVE_NET_DSA && MAY_USE_DEVLINK
+	depends on BRIDGE || BRIDGE=n
 	select NET_SWITCHDEV
 	select PHYLIB
 	---help---
--
2.15.0
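
The effect of "depends on BRIDGE || BRIDGE=n" can be worked through with Kconfig's tristate rules, where n < m < y, "||" takes the maximum, and a comparison such as BRIDGE=n evaluates to y or n. The sketch below models only that one dependency line; the enum and function names are made up, and the other dependencies of NET_DSA are ignored.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum toy_tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Highest value NET_DSA may take given the value of BRIDGE, according
 * to "depends on BRIDGE || BRIDGE=n":
 *   "BRIDGE=n" is y when BRIDGE is n, otherwise n;
 *   "A || B" is the maximum of A and B.
 */
static enum toy_tristate toy_dsa_upper_bound(enum toy_tristate bridge)
{
	enum toy_tristate bridge_is_n;

	assert(bridge >= TRI_N && bridge <= TRI_Y);
	bridge_is_n = (bridge == TRI_N) ? TRI_Y : TRI_N;
	return bridge > bridge_is_n ? bridge : bridge_is_n;
}
```

So with the bridge built as a module, DSA is limited to a module as well, which rules out the built-in DSA plus modular bridge combination that produced the link error; with the bridge disabled, the stub is used and DSA may still be built in.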
Re: [PATCH net-next 2/4] net: dsa: tag_brcm: Prepare for supporting prepended tag
> +static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
> +				       struct net_device *dev,
> +				       struct packet_type *pt,
> +				       unsigned int offset)
>  {
>  	int source_port;
>  	u8 *brcm_tag;
> @@ -103,8 +114,7 @@ static struct sk_buff *brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (unlikely(!pskb_may_pull(skb, BRCM_TAG_LEN)))
>  		return NULL;
>
> -	/* skb->data points to the EtherType, the tag is right before it */
> -	brcm_tag = skb->data - 2;
> +	brcm_tag = skb->data - offset;

A minor nit. The first part of the comment is still true. And having
it gives you an anchor point to understanding where are we going from
when we go backwards in the packet. Yes, the comment appears later,
but at that point we are not dealing with skb->data.

Otherwise:

Reviewed-by: Andrew Lunn

    Andrew
[PATCH] wcn36xx: fix iris child-node lookup
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.

To make things worse, the parent mmio node was also prematurely freed.

Fixes: fd52bdae9ab0 ("wcn36xx: Disable 5GHz for wcn3620")
Cc: stable # 4.14
Cc: Loic Poulain
Signed-off-by: Johan Hovold
---
 drivers/net/wireless/ath/wcn36xx/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/wcn36xx/main.c b/drivers/net/wireless/ath/wcn36xx/main.c
index 71812a2dd513..f7d228b5ba93 100644
--- a/drivers/net/wireless/ath/wcn36xx/main.c
+++ b/drivers/net/wireless/ath/wcn36xx/main.c
@@ -1233,7 +1233,7 @@ static int wcn36xx_platform_get_resources(struct wcn36xx *wcn,
 	}
 
 	/* External RF module */
-	iris_node = of_find_node_by_name(mmio_node, "iris");
+	iris_node = of_get_child_by_name(mmio_node, "iris");
 	if (iris_node) {
 		if (of_device_is_compatible(iris_node, "qcom,wcn3620"))
 			wcn->rf_id = RF_IRIS_WCN3620;
--
2.15.0
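
The difference between a deep search and a direct-child match can be seen on a toy tree. This is a simplified sketch and not the device tree API: struct toy_node and both toy_ functions are hypothetical, the deep search here only descends below the starting node, and reference counting (the source of the premature free mentioned above) is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy tree node: first child plus next sibling. */
struct toy_node {
	const char *name;
	struct toy_node *child;
	struct toy_node *sibling;
};

/* Depth-first search below 'from': matches at any depth. */
static struct toy_node *toy_find_node_by_name(struct toy_node *from,
					      const char *name)
{
	struct toy_node *c, *hit;

	assert(from != NULL);
	for (c = from->child; c; c = c->sibling) {
		if (strcmp(c->name, name) == 0)
			return c;
		hit = toy_find_node_by_name(c, name);
		if (hit)
			return hit;
	}
	return NULL;
}

/* Direct children only: what the probe code actually wants. */
static struct toy_node *toy_get_child_by_name(struct toy_node *parent,
					      const char *name)
{
	struct toy_node *c;

	assert(parent != NULL);
	for (c = parent->child; c; c = c->sibling)
		if (strcmp(c->name, name) == 0)
			return c;
	return NULL;
}
```

If a node named "iris" exists only deeper in the tree, the deep search returns it even though it is not a child of the parent, while the child-only lookup correctly reports that there is none.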
Re: [PATCH net] vxlan: fix the issue that neigh proxy blocks all icmpv6 packets
 ❦ 11 November 2017 19:58 +0800, Xin Long:

> Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport
> header offset") removed icmp6_code and icmp6_type check before calling
> neigh_reduce when doing neigh proxy.
>
> It means all icmpv6 packets would be blocked by this, not only ns packet.
> In Jianlin's env, even ping6 couldn't work through it.
>
> This patch is to bring the icmp6_code and icmp6_type check back and also
> removed the same check from neigh_reduce().

I am very sorry for not having spotted this bug earlier. I have tested
your fix and I can confirm it works as expected.

> Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport
> header offset")
> Reported-by: Jianlin Shi
> Signed-off-by: Xin Long

Reviewed-by: Vincent Bernat
--
Don't just echo the code with comments - make every comment count.
            - The Elements of Programming Style (Kernighan & Plauger)
Re: pull-request: can 2017-11-10
From: Marc Kleine-Budde
Date: Fri, 10 Nov 2017 14:07:26 +0100

> this is a pull request for net/master.
>
> The first patch by Richard Schütz for the c_can driver removes the false
> indication to support triple sampling for d_can. Gerhard Bertelsmann's
> patch for the sun4i driver improves the RX overrun handling. The patch
> by Stephane Grosjean for the peak_canfd driver adds the PCI ids for
> various new PCIe/M2 interfaces. Marek Vasut's patch for the ifi driver
> fix transmitter delay calculation.

Pulled, thanks Marc.
pull-request: wireless-drivers-next 2017-11-11
Hi Dave, some more patches to net-next for v4.15. Even though I applied the last patch only on Saturday morning, all these have been tested by kbuild bot and most of them should also be in linux-next. Please let me know if there are any problems. Kalle The following changes since commit 2798b80b385384d51a81832556ee9ad25d175f9b: Merge branch 'eBPF-based-device-cgroup-controller' (2017-11-05 23:26:51 +0900) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git tags/wireless-drivers-next-for-davem-2017-11-11 for you to fetch changes up to fdd0bd88ceaecf729db103ac8836af5805dd2dc1: brcmfmac: add CLM download support (2017-11-11 03:04:09 +0200) wireless-drivers-next patches for 4.15 Last minute patches before the merge window. Not really anything special standing out, mostly fixes or cleanup and some minor new features. Major changes: iwlwifi * some new PCI IDs Arend Van Spriel (6): brcmfmac: handle FWHALT mailbox indication brcmfmac: cleanup brcmf_cfg80211_escan() function brcmfmac: use msecs_to_jiffies() instead of calculation using HZ brcmfmac: get rid of brcmf_cfg80211_escan() function brcmfmac: get rid of struct brcmf_cfg80211_info::active_scan field brcmfmac: move configuration of probe request IEs Arnd Bergmann (4): rtlwifi: fix uninitialized rtlhal->last_suspend_sec time rtlwifi: use ktime_get_real_seconds() for suspend time rtlwifi: drop unused ppsc->last_wakeup_time rt2x00: use monotonic timestamps for frame dump Chung-Hsien Hsu (1): brcmfmac: add CLM download support Colin Ian King (5): rtlwifi: remove redundant pointer tid_data rtlwifi: remove redundant initialization to cfg_cmd iwlegacy: remove redundant pointer sta_priv orinoco_usb: remove redundant pointer dev zd1201: remove unused variable framelen Emmanuel Grumbach (3): iwlwifi: mvm: rs: remove the ANT C from the toogle antenna logic iwlwifi: remove dead code for internal devices only iwlwifi: remove host assisted paging Franky Lin (1): 
brcmfmac: disable packet filtering in promiscuous mode Gustavo A. R. Silva (1): rsi: rsi_91x_ps: remove redundant code in str_psstate Igor Mitsyanko (9): qtnfmac: use per-band HT/VHT info from wireless device qtnfmac: initialize HT/VHT caps "can override" masks qtnfmac: get rid of PHYMODE capabilities flags qtnfmac: extend "IE set" TLV to include frame type info qtnfmac: SCAN results: retreive frame type information from "IE set" TLV qtnfmac: convert "Append IEs" command to QTN_TLV_ID_IE_SET usage qtnfmac: configure and start AP interface with a single command qtnfmac: include HTCAP and VHTCAP into config AP command qtnfmac: pass all CONNECT cmd params to wireless card for processing Ihab Zhaika (3): iwlwifi: add new cards for 8260 series iwlwifi: add new cards for 8265 series iwlwifi: add new cards for a000 series Kalle Valo (1): Merge tag 'iwlwifi-next-for-kalle-2017-11-03' of git://git.kernel.org/.../iwlwifi/iwlwifi-next Kees Cook (1): iwlwifi: mvm: Convert timers to use timer_setup() Kirtika Ruchandani (1): iwlwifi: Add more call-sites for pcie reg dumper Larry Finger (3): rtlwifi: rtl_pci: Fix formatting errors in pci.h rtlwifi: rtl_pci: Fix formatting problems in pci.c rtlwifi: rtl_pci: Simplify some code be eliminating extraneous variables Liad Kaufman (1): iwlwifi: mvm: reset seq num after restart Luca Coelho (1): iwlwifi: mvm: hold mutex when flushing in iwl_mvm_flush_no_vif() Ping-Ke Shih (4): rtlwifi: rtl_pci: Add support for 8822be TX/RX BD rtlwifi: rtl_pci: Add fill_tx_special_desc to issue H2C data, and process TXOK in interrupt. 
rtlwifi: rtl_pci: Add ID for 8822BE rtlwifi: rtl_pci: Extend recognized interrupt parameters from two to four ISR Sara Sharon (6): iwlwifi: mvm: use RS macro instead of duplicating the code iwlwifi: mvm: cleanup references to aggregation count limit iwlwifi: mvm: improve latency when there is a reorder timeout iwlwifi: fix multi queue notification for a000 devices iwlwifi: mvm: refactor iwl_mvm_flush_no_vif iwlwifi: mvm: add missing implementation of flush for a000 devices Shahar S Matityahu (1): iwlwifi: drop RX frames during hardware restart Stanislaw Gruszka (1): rt2x00usb: mark device removed when get ENOENT usb error .../net/wireless/broadcom/brcm80211/brcmfmac/bus.h | 10 + .../broadcom/brcm80211/brcmfmac/cfg80211.c | 162 ++-- .../broadcom/brcm80211/brcmfmac/cfg80211.h | 2 - .../wireless/broadcom/brcm80211/brcmfmac/common.c | 157 .../wireless/broadcom/brcm80211/brcmfmac/core.c
Re: [net-next] tcp: allow drivers to tweak TSQ logic
Thanks Eric!

> We expect wifi drivers to set this field to smaller values (tests have
> been done with values from 6 to 9)

I suppose we should test each driver or so.

> They would have to use following template :
>
> if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
>      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Hm. I wish we wouldn't have to do this on every skb, but perhaps it
doesn't matter that much.

>  	u16	sk_gso_max_segs;
> +	u8	sk_pacing_shift;

I guess you tried to fill a hole, but weren't we saying that it would be
better in the same cacheline? Then again, perhaps both cachelines are
resident anyway, haven't looked at this now.

Unrelated to that, I think this is missing a documentation update since
the struct has kernel-doc comments.

johannes
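
To get a feel for what values "from 6 to 9" mean, here is a small arithmetic sketch. It assumes the queue limit is derived as pacing rate (bytes per second) shifted right by the pacing shift, with some floor applied; that formula is not shown in this thread, and the function name and the floor parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model: bytes allowed in the lower layers is roughly
 * pacing_rate >> shift, but never below floor_bytes. A shift of 10
 * is about 1/1024 s (close to 1 ms) worth of data; each step down
 * doubles that.
 */
static uint64_t toy_tsq_limit_bytes(uint64_t pacing_rate_bps,
				    unsigned int shift,
				    uint64_t floor_bytes)
{
	uint64_t limit;

	assert(shift < 64);
	limit = pacing_rate_bps >> shift;
	return limit > floor_bytes ? limit : floor_bytes;
}
```

Under this model, at 125000000 bytes per second (1 Gbit/s) a shift of 10 allows about 122 KB in flight below TCP, while a shift of 6 allows about 1.95 MB, which is why a smaller shift would suit wifi drivers that need to build large aggregates.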
[PATCH net] vxlan: fix the issue that neigh proxy blocks all icmpv6 packets
Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport
header offset") removed icmp6_code and icmp6_type check before calling
neigh_reduce when doing neigh proxy.

It means all icmpv6 packets would be blocked by this, not only ns packet.
In Jianlin's env, even ping6 couldn't work through it.

This patch is to bring the icmp6_code and icmp6_type check back and also
removed the same check from neigh_reduce().

Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset")
Reported-by: Jianlin Shi
Signed-off-by: Xin Long
---
 drivers/net/vxlan.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d7c49cf..a2f4e52 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1623,26 +1623,19 @@ static struct sk_buff *vxlan_na_create(struct sk_buff *request,
 static int neigh_reduce(struct net_device *dev, struct sk_buff *skb, __be32 vni)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
-	struct nd_msg *msg;
-	const struct ipv6hdr *iphdr;
 	const struct in6_addr *daddr;
-	struct neighbour *n;
+	const struct ipv6hdr *iphdr;
 	struct inet6_dev *in6_dev;
+	struct neighbour *n;
+	struct nd_msg *msg;
 
 	in6_dev = __in6_dev_get(dev);
 	if (!in6_dev)
 		goto out;
 
-	if (!pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct nd_msg)))
-		goto out;
-
 	iphdr = ipv6_hdr(skb);
 	daddr = &iphdr->daddr;
-
 	msg = (struct nd_msg *)(iphdr + 1);
-	if (msg->icmph.icmp6_code != 0 ||
-	    msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
-		goto out;
 
 	if (ipv6_addr_loopback(daddr) ||
 	    ipv6_addr_is_multicast(&msg->target))
@@ -2240,11 +2233,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct vxlan_rdst *rdst, *fdst = NULL;
 	const struct ip_tunnel_info *info;
-	struct ethhdr *eth;
 	bool did_rsc = false;
-	struct vxlan_rdst *rdst, *fdst = NULL;
 	struct vxlan_fdb *f;
+	struct ethhdr *eth;
 	__be32 vni = 0;
 
 	info = skb_tunnel_info(skb);
@@ -2269,12 +2262,14 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		if (ntohs(eth->h_proto) == ETH_P_ARP)
 			return arp_reduce(dev, skb, vni);
 #if IS_ENABLED(CONFIG_IPV6)
-		else if (ntohs(eth->h_proto) == ETH_P_IPV6) {
-			struct ipv6hdr *hdr, _hdr;
-			if ((hdr = skb_header_pointer(skb,
-						      skb_network_offset(skb),
-						      sizeof(_hdr), &_hdr)) &&
-			    hdr->nexthdr == IPPROTO_ICMPV6)
+		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+			 pskb_may_pull(skb, sizeof(struct ipv6hdr) +
+					    sizeof(struct nd_msg)) &&
+			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+			struct nd_msg *m = (struct nd_msg *)(ipv6_hdr(skb) + 1);
+
+			if (m->icmph.icmp6_code == 0 &&
+			    m->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION)
 				return neigh_reduce(dev, skb, vni);
 		}
 #endif
-- 
2.1.0
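The condition this patch restores in vxlan_xmit() can be read as a three-step filter: enough bytes for an IPv6 header plus an ND message, next header is ICMPv6, and the ICMPv6 message is a neighbour solicitation with code 0. The sketch below models that filter in userspace. `struct pkt_view` and the constant names are illustrative assumptions (the numeric values 58 and 135 are the standard ICMPv6 protocol number and NS type); it does not parse real packets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_ICMPV6 58   /* IPv6 next-header value for ICMPv6 */
#define ND_TYPE_NS     135  /* ICMPv6 type of a neighbour solicitation */
#define IPV6_HDR_LEN   40
#define ND_MSG_LEN     24   /* 8-byte ICMPv6 header + 16-byte target address */

/* Hypothetical reduced view of a packet: only what the check reads. */
struct pkt_view {
	size_t  len;        /* bytes available after the Ethernet header */
	uint8_t nexthdr;    /* IPv6 next header */
	uint8_t icmp6_type;
	uint8_t icmp6_code;
};

/* True only for packets that should be handed to the ND proxy path. */
static bool wants_neigh_proxy(const struct pkt_view *p)
{
	assert(p != NULL);
	if (p->len < IPV6_HDR_LEN + ND_MSG_LEN)
		return false;
	if (p->nexthdr != NEXTHDR_ICMPV6)
		return false;
	return p->icmp6_code == 0 && p->icmp6_type == ND_TYPE_NS;
}
```

An echo request (type 128) fails the last test and therefore continues down the normal transmit path, which is the ping6 case the commit message describes.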
Re: [PATCH net-next 1/4] net: dsa: Pass a port to get_tag_protocol()
On Fri, Nov 10, 2017 at 03:22:52PM -0800, Florian Fainelli wrote:
> A number of drivers want to check whether the configured CPU port is a
> possible configuration for enabling tagging, pass down the CPU port
> number so they verify that.
>
> -static bool b53_can_enable_brcm_tags(struct dsa_switch *ds)
> +static bool b53_can_enable_brcm_tags(struct dsa_switch *ds, int port)
>  {
> -	unsigned int brcm_tag_mask;
> -	unsigned int i;
> -
>  	/* Broadcom switches will accept enabling Broadcom tags on the
>  	 * following ports: 5, 7 and 8, any other port is not supported
>  	 */
> -	brcm_tag_mask = BIT(B53_CPU_PORT_25) | BIT(7) | BIT(B53_CPU_PORT);
> -
> -	for (i = 0; i < ds->num_ports; i++) {
> -		if (dsa_is_cpu_port(ds, i)) {
> -			if (!(BIT(i) & brcm_tag_mask)) {
> -				dev_warn(ds->dev,
> -					 "Port %d is not Broadcom tag capable\n",
> -					 i);
> -				return false;
> -			}
> -		}
> +	switch (port) {
> +	case B53_CPU_PORT_25:
> +	case 7:
> +	case B53_CPU_PORT:
> +		return true;
>  	}
>
> -	return true;
> +	dev_warn(ds->dev, "Port %d is not Broadcom tag capable\n", port);
> +	return false;
>  }

Hi Florian

This looks a lot better than the previous implementation.

Reviewed-by: Andrew Lunn

    Andrew
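For readers who want to try the simplified check outside the driver, here is a small standalone sketch of the new switch-based form. The values 5 and 8 for `B53_CPU_PORT_25` and `B53_CPU_PORT` are inferred from the quoted comment ("ports: 5, 7 and 8") and are an assumption here; `fprintf` stands in for `dev_warn`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed values, inferred from the comment in the quoted patch. */
#define B53_CPU_PORT_25 5
#define B53_CPU_PORT    8

/* Takes the CPU port directly instead of scanning every switch port. */
static bool can_enable_brcm_tags(int port)
{
	assert(port >= 0);
	switch (port) {
	case B53_CPU_PORT_25:
	case 7:
	case B53_CPU_PORT:
		return true;
	}

	fprintf(stderr, "Port %d is not Broadcom tag capable\n", port);
	return false;
}
```

Passing the port in removes both the loop and the bitmask, so the function answers one question about one port.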
[PATCHv3 1/1] bnx2x: fix slowpath null crash
When "NETDEV WATCHDOG: em4 (bnx2x): transmit queue 2 timed out" occurs,
BNX2X_SP_RTNL_TX_TIMEOUT is set. In the function bnx2x_sp_rtnl_task,
bnx2x_nic_unload and bnx2x_nic_load are executed to shut down and reopen
the NIC. In the function bnx2x_nic_load, bnx2x_alloc_mem fails to allocate
DMA memory. The message "bnx2x: [bnx2x_alloc_mem:8399(em4)]Can't allocate
memory" pops out. The variable slowpath is set to NULL.

When shutting down the NIC, the function bnx2x_nic_unload is called. In
the function bnx2x_nic_unload, the following functions are executed.

bnx2x_chip_cleanup
    bnx2x_set_storm_rx_mode
        bnx2x_set_q_rx_mode
            bnx2x_set_q_rx_mode
                bnx2x_config_rx_mode
                    bnx2x_set_rx_mode_e2

In the function bnx2x_set_rx_mode_e2, the variable slowpath is
dereferenced. Then the crash occurs.

To fix this crash, the variable slowpath is checked. And in the function
bnx2x_sp_rtnl_task, after the DMA memory allocation fails, another
shutdown and open of the NIC is executed.

CC: Joe Jin
CC: Junxiao Bi
Signed-off-by: Zhu Yanjun
Acked-by: Ariel Elior
---
v2->v3 Changes: fix the style of comments, add the leading space
V1->v2 Changes: add Acker and remove unnecessary brackets
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index c12b4d3..fbd302a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9332,7 +9332,7 @@ void bnx2x_chip_cleanup(struct bnx2x *bp, int unload_mode, bool keep_link)
 	/* Schedule the rx_mode command */
 	if (test_bit(BNX2X_FILTER_RX_MODE_PENDING, &bp->sp_state))
 		set_bit(BNX2X_FILTER_RX_MODE_SCHED, &bp->sp_state);
-	else
+	else if (bp->slowpath)
 		bnx2x_set_storm_rx_mode(bp);
 
 	/* Cleanup multicast configuration */
@@ -10271,8 +10271,15 @@ static void bnx2x_sp_rtnl_task(struct work_struct *work)
 		smp_mb();
 
 		bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
-		bnx2x_nic_load(bp, LOAD_NORMAL);
-
+		/* When ret value shows failure of allocation failure,
+		 * the nic is rebooted again. If open still fails, a error
+		 * message to notify the user.
+		 */
+		if (bnx2x_nic_load(bp, LOAD_NORMAL) == -ENOMEM) {
+			bnx2x_nic_unload(bp, UNLOAD_NORMAL, true);
+			if (bnx2x_nic_load(bp, LOAD_NORMAL))
+				BNX2X_ERR("Open the NIC fails again!\n");
+		}
 		rtnl_unlock();
 		return;
 	}
-- 
2.7.4
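The second hunk of this patch amounts to "unload, load, and if the load failed with -ENOMEM, unload and load once more". The sketch below isolates that control flow behind a hypothetical `struct nic_ops` callback table with a fake device for exercising it; none of these names come from the bnx2x driver, and it is only meant to make the retry behaviour easy to reason about.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback table standing in for the driver entry points. */
struct nic_ops {
	int  (*load)(void *ctx);    /* 0 on success, negative errno on failure */
	void (*unload)(void *ctx);
};

/* Unload and reload; retry exactly once if the load ran out of memory. */
static int reload_with_retry(const struct nic_ops *ops, void *ctx)
{
	int rc;

	assert(ops && ops->load && ops->unload);
	ops->unload(ctx);
	rc = ops->load(ctx);
	if (rc == -ENOMEM) {
		ops->unload(ctx);   /* must tolerate a half-initialised device */
		rc = ops->load(ctx);
	}
	return rc;
}

/* Fake device: the first enomem_left loads fail with -ENOMEM. */
struct fake_nic { int enomem_left, loads, unloads; };

static int fake_load(void *ctx)
{
	struct fake_nic *n = ctx;

	n->loads++;
	if (n->enomem_left > 0) {
		n->enomem_left--;
		return -ENOMEM;
	}
	return 0;
}

static void fake_unload(void *ctx)
{
	struct fake_nic *n = ctx;

	n->unloads++;
}

static const struct nic_ops fake_ops = { fake_load, fake_unload };
```

The second unload runs against a device whose load did not complete, which is the situation the first hunk guards against by checking `bp->slowpath` before touching it.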
[PATCH net-next 0/5] net: improve the process of redirect and toobig for ipv6 tunnels
Now let's say there are 3 kinds of icmp packets to process for tunnels,
toobig(needfrag), redirect, others, their process should be:

  - toobig(needfrag)
    update the lower dst's pmtu by route cache, also update sk dst's
    pmtu if possible, or it will be fine if sk dst pmtu will get updated
    on tx path.

  - redirect
    update the lower dst's gw by route cache and return, no need to send
    this redirect packet to user sk.

  - others
    send the packet to user's sk, or it will also be fine to use err_count
    to count it and report fail link on tx path.

All ipv4 tunnels basically follow this while some of ipv6 tunnels are
doing in different ways, like ip6gre and ip6_tunnels update tnl dev's
mtu instead of updating lower dst pmtu, no redirect process on their
err_handlers, which doesn't make any sense and even causes performance
problems.

This patchset is to improve the process of redirect and toobig for ip6gre
ip4ip6, ip6ip6 tunnels, as in ipv4 tunnels.

Xin Long (5):
  ip6_gre: add the process for redirect in ip6gre_err
  ip6_gre: process toobig in a better way
  ip6_tunnel: add the process for redirect in ip6_tnl_err
  ip6_tunnel: process toobig in a better way
  ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers

 net/ipv6/ip6_gre.c    | 20 ++--
 net/ipv6/ip6_tunnel.c | 64 ++-
 2 files changed, 34 insertions(+), 50 deletions(-)

-- 
2.1.0
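The three cases in this cover letter map onto a dispatch on the ICMPv6 type. A minimal sketch follows, assuming an illustrative `enum tnl_err_action` and constant names that are not part of the series; the numeric values 2 and 137 are the standard ICMPv6 "packet too big" and ND redirect types.

```c
#include <assert.h>
#include <stdint.h>

#define ICMP6_TYPE_PKT_TOOBIG 2    /* ICMPv6 "packet too big" */
#define ICMP6_TYPE_REDIRECT   137  /* ND redirect */

/* Illustrative action names for the three cases in the cover letter. */
enum tnl_err_action {
	TNL_UPDATE_PMTU,       /* update the lower dst's pmtu */
	TNL_UPDATE_GATEWAY,    /* update the lower dst's gw, nothing for the sk */
	TNL_REPORT_TO_SOCKET,  /* hand the error to the user's sk */
};

static enum tnl_err_action classify_tnl_err(uint8_t icmp6_type)
{
	switch (icmp6_type) {
	case ICMP6_TYPE_PKT_TOOBIG:
		return TNL_UPDATE_PMTU;
	case ICMP6_TYPE_REDIRECT:
		return TNL_UPDATE_GATEWAY;
	default:
		return TNL_REPORT_TO_SOCKET;
	}
}
```

Only the last category ever needs to reach the user's socket; the first two are handled against the lower route.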
[PATCH net-next 3/5] ip6_tunnel: add the process for redirect in ip6_tnl_err
The same process for redirect in "ip6_gre: add the process for redirect
in ip6gre_err" is needed by ip4ip6 and ip6ip6 as well.

Signed-off-by: Xin Long
---
 net/ipv6/ip6_tunnel.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 439d65f..a1f704c 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -471,15 +471,16 @@ static int
 ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	    u8 *type, u8 *code, int *msg, __u32 *info, int offset)
 {
-	const struct ipv6hdr *ipv6h = (const struct ipv6hdr *) skb->data;
-	struct ip6_tnl *t;
-	int rel_msg = 0;
+	const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)skb->data;
+	struct net *net = dev_net(skb->dev);
 	u8 rel_type = ICMPV6_DEST_UNREACH;
 	u8 rel_code = ICMPV6_ADDR_UNREACH;
-	u8 tproto;
 	__u32 rel_info = 0;
-	__u16 len;
+	struct ip6_tnl *t;
 	int err = -ENOENT;
+	int rel_msg = 0;
+	u8 tproto;
+	__u16 len;
 
 	/* If the packet doesn't contain the original IPv6 header we are
 	   in trouble since we might need the source address for further
@@ -543,6 +544,10 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 			rel_msg = 1;
 		}
 		break;
+	case NDISC_REDIRECT:
+		ip6_redirect(skb, net, skb->dev->ifindex, 0,
+			     sock_net_uid(net, NULL));
+		break;
 	}
 
 	*type = rel_type;
-- 
2.1.0
[PATCH net-next 5/5] ip6_tunnel: clean up ip4ip6 and ip6ip6's err_handlers
This patch is to remove some useless codes of redirect and fix some
indents on ip4ip6 and ip6ip6's err_handlers.

Note that redirect icmp packet is already processed in ip6_tnl_err,
the old redirect codes in ip4ip6_err actually never worked even
before this patch. Besides, there's no need to send redirect to
user's sk, it's for lower dst, so just remove it in this patch.

Signed-off-by: Xin Long
---
 net/ipv6/ip6_tunnel.c | 42 ++
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7e9e205..00882fd 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -563,13 +563,12 @@ static int
 ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	   u8 type, u8 code, int offset, __be32 info)
 {
-	int rel_msg = 0;
-	u8 rel_type = type;
-	u8 rel_code = code;
 	__u32 rel_info = ntohl(info);
-	int err;
-	struct sk_buff *skb2;
 	const struct iphdr *eiph;
+	struct sk_buff *skb2;
+	int err, rel_msg = 0;
+	u8 rel_type = type;
+	u8 rel_code = code;
 	struct rtable *rt;
 	struct flowi4 fl4;
 
@@ -594,10 +593,6 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		rel_type = ICMP_DEST_UNREACH;
 		rel_code = ICMP_FRAG_NEEDED;
 		break;
-	case NDISC_REDIRECT:
-		rel_type = ICMP_REDIRECT;
-		rel_code = ICMP_REDIR_HOST;
-		/* fall through */
 	default:
 		return 0;
 	}
@@ -616,33 +611,26 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	eiph = ip_hdr(skb2);
 
 	/* Try to guess incoming interface */
-	rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
-				   eiph->saddr, 0,
-				   0, 0,
-				   IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
+	rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL, eiph->saddr,
+				   0, 0, 0, IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
 	if (IS_ERR(rt))
 		goto out;
 
 	skb2->dev = rt->dst.dev;
+	ip_rt_put(rt);
 
 	/* route "incoming" packet */
 	if (rt->rt_flags & RTCF_LOCAL) {
-		ip_rt_put(rt);
-		rt = NULL;
 		rt = ip_route_output_ports(dev_net(skb->dev), &fl4, NULL,
-					   eiph->daddr, eiph->saddr,
-					   0, 0,
-					   IPPROTO_IPIP,
-					   RT_TOS(eiph->tos), 0);
-		if (IS_ERR(rt) ||
-		    rt->dst.dev->type != ARPHRD_TUNNEL) {
+					   eiph->daddr, eiph->saddr, 0, 0,
+					   IPPROTO_IPIP, RT_TOS(eiph->tos), 0);
+		if (IS_ERR(rt) || rt->dst.dev->type != ARPHRD_TUNNEL) {
 			if (!IS_ERR(rt))
 				ip_rt_put(rt);
 			goto out;
 		}
 		skb_dst_set(skb2, &rt->dst);
 	} else {
-		ip_rt_put(rt);
 		if (ip_route_input(skb2, eiph->daddr, eiph->saddr, eiph->tos,
 				   skb2->dev) ||
 		    skb_dst(skb2)->dev->type != ARPHRD_TUNNEL)
@@ -654,10 +642,9 @@ ip4ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		if (rel_info > dst_mtu(skb_dst(skb2)))
 			goto out;
 
-		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2, rel_info);
+		skb_dst(skb2)->ops->update_pmtu(skb_dst(skb2), NULL, skb2,
+						rel_info);
 	}
-	if (rel_type == ICMP_REDIRECT)
-		skb_dst(skb2)->ops->redirect(skb_dst(skb2), NULL, skb2);
 
 	icmp_send(skb2, rel_type, rel_code, htonl(rel_info));
 
@@ -670,11 +657,10 @@ static int
 ip6ip6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 	   u8 type, u8 code, int offset, __be32 info)
 {
-	int rel_msg = 0;
+	__u32 rel_info = ntohl(info);
+	int err, rel_msg = 0;
 	u8 rel_type = type;
 	u8 rel_code = code;
-	__u32 rel_info = ntohl(info);
-	int err;
 
 	err = ip6_tnl_err(skb, IPPROTO_IPV6, opt, &rel_type, &rel_code,
 			  &rel_msg, &rel_info, offset);
-- 
2.1.0
Re: [PATCH 1/2] bpf: add a bpf_override_function helper
On Sat, Nov 11, 2017 at 09:14:55AM +0100, Ingo Molnar wrote:
>
> * Josef Bacik wrote:
>
> > On Fri, Nov 10, 2017 at 10:34:59AM +0100, Ingo Molnar wrote:
> > >
> > > * Josef Bacik wrote:
> > >
> > > > @@ -551,6 +578,10 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
> > > >  		return &bpf_get_stackid_proto;
> > > >  	case BPF_FUNC_perf_event_read_value:
> > > >  		return &bpf_perf_event_read_value_proto;
> > > > +	case BPF_FUNC_override_return:
> > > > +		pr_warn_ratelimited("%s[%d] is installing a program with bpf_override_return helper that may cause unexpected behavior!",
> > > > +				    current->comm, task_pid_nr(current));
> > > > +		return &bpf_override_return_proto;
> > >
> > > So if this new functionality is used we'll always print this into the
> > > syslog?
> > >
> > > The warning is also a bit passive aggressive about informing the user:
> > > what unexpected behavior can happen, what is the worst case?
> > >
> >
> > It's modeled after the other warnings bpf will spit out, but with this
> > feature you are skipping a function and instead returning some arbitrary
> > value, so anything could go wrong if you mess something up.  For instance
> > I screwed up my initial test case and made every IO submitted return an
> > error instead of just on the one file system I was attempting to test,
> > so all sorts of hilarity ensued.
>
> Ok, then for the x86 bits:
>
>   NAK-ed-by: Ingo Molnar
>
> One of the major advantages of having an in-kernel BPF sandbox is to never
> crash the kernel - and allowing BPF programs to just randomly modify the
> return value of kernel functions sounds immensely broken to me.
>
> (And yes, I realize that kprobes are used here as a vehicle, but the point
> remains.)
>

Only root can use this feature, and did you read the first email?  The
whole point of this is that error path checking fucking sucks, and this
gives us the ability to systematically check our error paths and make the
kernel way more robust than it currently is.  Can things go wrong?  Sure,
that's why it's a config option and root only.  You only want to turn this
on for testing and not have it on in production.  This is a valuable tool
and well worth the risk.  Thanks,

Josef
Re: [PATCH v2 net-next 1/3] netem: convert to qdisc_watchdog_schedule_ns
Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Dave-Taht/netem-convert-to-qdisc_watchdog_schedule_ns/2017-184934
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa

All warnings (new ones prefixed by >>):

   In file included from ./arch/xtensa/include/generated/asm/div64.h:1:0,
                    from include/linux/kernel.h:173,
                    from include/asm-generic/bug.h:16,
                    from ./arch/xtensa/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/mmdebug.h:5,
                    from include/linux/mm.h:9,
                    from net/sched/sch_netem.c:16:
   net/sched/sch_netem.c: In function 'packet_len_2_sched_time':
   include/asm-generic/div64.h:208:28: warning: comparison of distinct pointer types lacks a cast
     (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
                               ^
>> net/sched/sch_netem.c:349:2: note: in expansion of macro 'do_div'
     do_div(offset, q->rate);
     ^

vim +/do_div +349 net/sched/sch_netem.c

   334
   335	static s64 packet_len_2_sched_time(unsigned int len,
   336					   struct netem_sched_data *q)
   337	{
   338		s64 offset;
   339		len += q->packet_overhead;
   340
   341		if (q->cell_size) {
   342			u32 cells = reciprocal_divide(len, q->cell_size_reciprocal);
   343
   344			if (len > cells * q->cell_size)	/* extra cell needed for remainder */
   345				cells++;
   346			len = cells * (q->cell_size + q->cell_overhead);
   347		}
   348		offset = (s64)len * NSEC_PER_SEC;
 > 349		do_div(offset, q->rate);
   350		return offset;
   351	}
   352

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
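The warning above comes from the type check inside do_div(), which compares a pointer to its first argument against `uint64_t *`; `offset` is declared `s64`, so the pointer types differ. One way to avoid that class of warning is to keep the dividend unsigned for the division. The sketch below shows the same arithmetic in userspace with an explicit `uint64_t` dividend. It is an illustration under that assumption, not the fix that was applied to the patch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Nanoseconds needed to send len bytes at rate bytes per second. */
static uint64_t packet_len_to_ns(uint32_t len, uint32_t rate)
{
	/* Unsigned 64-bit dividend: len * 1e9 fits for any 32-bit len. */
	uint64_t ns = (uint64_t)len * NSEC_PER_SEC;

	assert(rate != 0);
	return ns / rate;
}
```

For example, a 1500-byte packet at 125000000 bytes per second (1 Gbit/s) takes 12000 ns.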
Re: [PATCH net-next v3 0/3] net: dsa: b53: Turn on Broadcom tags
From: Florian Fainelli
Date: Fri, 10 Nov 2017 11:33:24 -0800

> Hi all,
>
> This was long overdue, with this patch series, the b53 driver now
> turns on Broadcom tags except for 5325 and 5365 which use an older
> format that we do not support yet (TBD).
>
> First patch is necessary in order for bgmac, used on BCM5301X and Northstar
> Plus to work correctly and successfully send ARP packets back to the
> requester.
>
> Second patch is actually a bug fix, but because net/master and net-next/master
> diverge in that area, I am targeting net-next/master here.
>
> Finally, the last patch enables Broadcom tags after checking that the CPU port
> selected is either, 5, 7 or 8, since those are the only valid combinations
> given currently supported HW.
 ...

Series applied.
Re: pull-request: wireless-drivers-next 2017-11-11
From: Kalle Valo
Date: Sat, 11 Nov 2017 15:03:14 +0200

> some more patches to net-next for v4.15. Even though I applied the last
> patch only on Saturday morning, all these have been tested by kbuild bot
> and most of them should also be in linux-next. Please let me know if
> there are any problems.

Pulled, but looking at your merge commit message:

> Major changes:
>
> iwlwifi
>
> * some new PCI IDs

I doubt this was the only major change in here :-)))
Re: [PATCH net-next 4/4] net: dsa: b53: Support prepended Broadcom tags
On Fri, Nov 10, 2017 at 03:22:55PM -0800, Florian Fainelli wrote:
> On BCM58xx devices (Northstar Plus), there is an accelerator attached to
> port 8 which would only work if we use prepended Broadcom tags. Resolve
> that difference in our get_tag_protocol() function by setting the
> appropriate tagging protocol in that case. We need to change
> b53_brcm_hdr_setup() a little bit now since we can deal with two types
> of Broadcom tags.
>
> Signed-off-by: Florian Fainelli

Reviewed-by: Andrew Lunn

    Andrew
Re: [PATCH net-next 3/4] net: dsa: Support prepended Broadcom tag
On Fri, Nov 10, 2017 at 03:22:54PM -0800, Florian Fainelli wrote:
> Add a new type: DSA_TAG_PROTO_PREPEND which allows us to support the
> 4-byte Broadcom tag that we already support, but in a format where it
> is pre-pended to the packet instead of located between the MAC SA and
> the Ethertype (DSA_TAG_PROTO_BRCM).
>
> Signed-off-by: Florian Fainelli

Reviewed-by: Andrew Lunn

    Andrew
Re: [PATCH v2 net-next 1/3] netem: convert to qdisc_watchdog_schedule_ns
Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Dave-Taht/netem-convert-to-qdisc_watchdog_schedule_ns/2017-184934
config: i386-randconfig-i1-201745 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   net/sched/sch_netem.o: In function `netem_enqueue':
>> net/sched/sch_netem.c:323: undefined reference to `__moddi3'

vim +323 net/sched/sch_netem.c

661b7972 stephen hemminger 2011-02-23  302
661b7972 stephen hemminger 2011-02-23  303
^1da177e Linus Torvalds    2005-04-16  304  /* tabledist - return a pseudo-randomly distributed value with mean mu and
^1da177e Linus Torvalds    2005-04-16  305   * std deviation sigma. Uses table lookup to approximate the desired
^1da177e Linus Torvalds    2005-04-16  306   * distribution, and a uniformly-distributed pseudo-random source.
^1da177e Linus Torvalds    2005-04-16  307   */
9d0cec66 Dave Taht         2017-11-08  308  static s64 tabledist(s64 mu, s64 sigma,
b407621c Stephen Hemminger 2007-03-22  309  		     struct crndstate *state,
b407621c Stephen Hemminger 2007-03-22  310  		     const struct disttable *dist)
^1da177e Linus Torvalds    2005-04-16  311  {
9d0cec66 Dave Taht         2017-11-08  312  	s64 x;
b407621c Stephen Hemminger 2007-03-22  313  	long t;
b407621c Stephen Hemminger 2007-03-22  314  	u32 rnd;
^1da177e Linus Torvalds    2005-04-16  315
^1da177e Linus Torvalds    2005-04-16  316  	if (sigma == 0)
^1da177e Linus Torvalds    2005-04-16  317  		return mu;
^1da177e Linus Torvalds    2005-04-16  318
^1da177e Linus Torvalds    2005-04-16  319  	rnd = get_crandom(state);
^1da177e Linus Torvalds    2005-04-16  320
^1da177e Linus Torvalds    2005-04-16  321  	/* default uniform distribution */
^1da177e Linus Torvalds    2005-04-16  322  	if (dist == NULL)
^1da177e Linus Torvalds    2005-04-16 @323  		return (rnd % (2*sigma)) - sigma + mu;
^1da177e Linus Torvalds    2005-04-16  324
^1da177e Linus Torvalds    2005-04-16  325  	t = dist->table[rnd % dist->size];
^1da177e Linus Torvalds    2005-04-16  326  	x = (sigma % NETEM_DIST_SCALE) * t;
^1da177e Linus Torvalds    2005-04-16  327  	if (x >= 0)
^1da177e Linus Torvalds    2005-04-16  328  		x += NETEM_DIST_SCALE/2;
^1da177e Linus Torvalds    2005-04-16  329  	else
^1da177e Linus Torvalds    2005-04-16  330  		x -= NETEM_DIST_SCALE/2;
^1da177e Linus Torvalds    2005-04-16  331
^1da177e Linus Torvalds    2005-04-16  332  	return x / NETEM_DIST_SCALE + (sigma / NETEM_DIST_SCALE) * t + mu;
^1da177e Linus Torvalds    2005-04-16  333  }
^1da177e Linus Torvalds    2005-04-16  334

:: The code at line 323 was first introduced by commit
:: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

:: TO: Linus Torvalds <torva...@ppc970.osdl.org>
:: CC: Linus Torvalds <torva...@ppc970.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
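The link error at line 323 arises because `sigma` is now `s64`, so `rnd % (2*sigma)` becomes a signed 64-bit modulo, which gcc on 32-bit x86 lowers to a call to `__moddi3` that the kernel does not provide. A sketch of one way to keep the modulo in 32 bits is shown below. It assumes `sigma` is positive and fits in 31 bits (an assumption stated here, not something the report guarantees), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Uniform branch of a tabledist()-style helper: returns a value in
 * [mu - sigma, mu + sigma) derived from the 32-bit random input rnd.
 */
static int64_t uniform_jitter(int64_t mu, int64_t sigma, uint32_t rnd)
{
	uint32_t span;

	/* Assumption of this sketch: sigma fits in 31 bits. */
	assert(sigma > 0 && sigma <= INT32_MAX);

	span = 2u * (uint32_t)sigma;	/* at most 2^32 - 2, no overflow */
	/* 32-bit modulo: no 64-bit division helper is required. */
	return (int64_t)(rnd % span) - sigma + mu;
}
```

With `mu = 100` and `sigma = 10`, inputs 0 through 19 map to 90 through 109, and input 20 wraps back to 90.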
Re: [PATCH net-next 0/2] net: dsa: lan9303: IGMP handling
From: Egil Hjelmeland
Date: Fri, 10 Nov 2017 12:54:33 +0100

> Set up the HW switch to trap IGMP packets to CPU port.
> And make sure skb->offload_fwd_mark is cleared for incoming IGMP packets.
>
> skb->offload_fwd_mark calculation is a candidate for consolidation into the
> DSA core. The calculation can probably be more polished when done at a point
> where DSA has updated skb.

Series applied, thank you.
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
> > If you're handling this by forcing another read() to produce the
> > NLMSG_DONE, then you have no reason to WARN_ON() here.
> >
> > In fact you are adding a WARN_ON() which is trivially triggerable by
> > any user.
>
> I added this in my suggestion for how this could work, but I don't
> think you're right, since we previously check if there's enough space.

Or perhaps I should say this differently:

Forcing another read happens through the

	skb_tailroom(skb) < nlmsg_total_size(...)

check, so the nlmsg_put_answer() can't really fail.

Handling nlmsg_put_answer() failures by forcing another read would have
required jumping to the existing if code with a goto, or restructuring
the whole thing completely somehow, and I didn't see how to do that.

johannes
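The argument in this message is "check for space first, then the append cannot fail". A minimal sketch of that shape follows, using a hypothetical `struct msgbuf` in place of an skb; the names `tailroom`, `put` and `finish_dump` are illustrative and are not netlink API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity buffer standing in for an skb. */
struct msgbuf { size_t cap, used; };

static size_t tailroom(const struct msgbuf *b)
{
	return b->cap - b->used;
}

/* Append n bytes; fails only when there is not enough room. */
static int put(struct msgbuf *b, size_t n)
{
	if (tailroom(b) < n)
		return -1;
	b->used += n;
	return 0;
}

/* Returns 1 when the DONE record was appended, 0 when the caller
 * has to come back with a fresh buffer on the next read. */
static int finish_dump(struct msgbuf *b, size_t done_size)
{
	int rc;

	if (tailroom(b) < done_size)
		return 0;	/* defer: forces another read */

	rc = put(b, done_size);
	assert(rc == 0);	/* cannot fail: space was checked above */
	(void)rc;
	return 1;
}
```

The assertion plays the role of the WARN_ON() under discussion: it can only fire if the size check and the append ever disagree about how much space is needed.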
Re: [run_timer_softirq] BUG: unable to handle kernel paging request at 0000000000010007
On Fri, Nov 10, 2017 at 10:29:59PM +0100, Thomas Gleixner wrote:
> On Fri, 10 Nov 2017, Linus Torvalds wrote:
> > On Wed, Nov 8, 2017 at 9:19 PM, Fengguang Wu wrote:
> > >
> > > Yes it's accessing the list. Here is the faddr2line output.
> >
> > Ok, so it's a corrupted timer list. Which is not a big surprise.
> >
> > It's
> >
> >         next->pprev = pprev;
> >
> > in __hlist_del(), and the trapping instruction decodes as
> >
> >         mov %rdx,0x8(%rax)
> >
> > with %rax having the value dead0200,
> >
> > Which is just LIST_POISON2.
> >
> > So we've deleted that entry twice - LIST_POISON2 is what hlist_del()
> > sets pprev to after already deleting it once.
> >
> > Although in this case it might not be hlist_del(), because
> > detach_timer() also sets entry->next to LIST_POISON2.
> >
> > Which is pretty bogus, we are supposed to use LIST_POISON1 for the
> > "next" pointer. Oh well. Nobody cares, except for the list entry
> > debugging code, which isn't run on the hlist cases.
> >
> > Adding Thomas Gleixner to the cc. It should not be possible to delete
> > the same timer twice.
>
> Right, it shouldn't.
>
> Fengguang, can you please enable:
>
>   CONFIG_DEBUG_OBJECTS
>   CONFIG_DEBUG_OBJECTS_TIMERS
>
> and try to reproduce? Debugobject should catch that hopefully.

Sure. However I've not got any results until now -- it's rather hard
to reproduce. I'll check possible results tomorrow.

Regards,
Fengguang
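To make the poison-pointer reasoning in this thread concrete, here is a small userspace model of an hlist-style delete. The node layout follows the `next`/`pprev` scheme described above; the `POISON1`/`POISON2` values and the explicit double-delete check are assumptions of this sketch (the kernel's delete path performs no such check, which is why a second delete ends up storing through a poisoned pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct hnode { struct hnode *next, **pprev; };

/* Illustrative poison values; the kernel uses LIST_POISON1/2. */
#define POISON1 ((struct hnode *)(uintptr_t)0x100)
#define POISON2 ((struct hnode **)(uintptr_t)0x200)

static void hnode_add_head(struct hnode **head, struct hnode *n)
{
	assert(head && n);
	n->next = *head;
	if (*head)
		(*head)->pprev = &n->next;
	*head = n;
	n->pprev = head;
}

/* Returns 0 on success, -1 if the node was already deleted. */
static int hnode_del(struct hnode *n)
{
	struct hnode *next;
	struct hnode **pprev;

	if (n->pprev == POISON2)
		return -1;		/* double delete detected */

	next = n->next;
	pprev = n->pprev;
	*pprev = next;
	if (next)
		next->pprev = pprev;	/* the store discussed in the thread */

	n->next = POISON1;		/* poison so that reuse is recognisable */
	n->pprev = POISON2;
	return 0;
}
```

Without the early return, the second delete would load the poison values into `next` and `pprev` and write through them, matching the faulting store quoted in the analysis.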
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
From: Johannes Berg
Date: Sat, 11 Nov 2017 15:15:21 +0100

> On Sat, 2017-11-11 at 23:09 +0900, David Miller wrote:
>> From: "Jason A. Donenfeld"
>> Date: Thu, 9 Nov 2017 13:04:44 +0900
>>
>> > @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>> >  		return 0;
>> >  	}
>> >
>> > -	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>> > -	if (!nlh)
>> > +	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
>> > +			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
>> > +	if (WARN_ON(!nlh))
>> >  		goto errout_skb;
>>
>> If you're handling this by forcing another read() to produce the
>> NLMSG_DONE, then you have no reason to WARN_ON() here.
>>
>> In fact you are adding a WARN_ON() which is trivially triggerable by
>> any user.
>
> I added this in my suggestion for how this could work, but I don't
> think you're right, since we previously check if there's enough space.
> The patch is missing the full context, but this is:
 ...
> So unless the nlmsg_total_size() vs. nlmsg_put_answer() suddenly gets a
> different idea of how much space is needed, nlh shouldn't ever be NULL
> once we get here.

Aha, that's what I missed.

Indeed, it cannot happen. My bad.
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
On Sat, Nov 11, 2017 at 11:18 PM, Johannes Berg wrote:
>
>> > If you're handling this by forcing another read() to produce the
>> > NLMSG_DONE, then you have no reason to WARN_ON() here.
>> >
>> > In fact you are adding a WARN_ON() which is trivially triggerable by
>> > any user.
>>
>> I added this in my suggestion for how this could work, but I don't
>> think you're right, since we previously check if there's enough space.
>
> Or perhaps I should say this differently:
>
> Forcing another read happens through the
>
>         skb_tailroom(skb) < nlmsg_total_size(...)
>
> check, so the nlmsg_put_answer() can't really fail.
>
>
> Handling nlmsg_put_answer() failures by forcing another read would have
> required jumping to the existing if code with a goto, or restructuring
> the whole thing completely somehow, and I didn't see how to do that.

Exactly. And if something _does_ go wrong in our logic, and we can't
add NLMSG_DONE, we really do want people to report this to us, since
dumps must always end that way. We'd probably have caught this a
number of years ago when userspace developers were twiddling with
their receive buffers if we had had the WARN_ON. Nice suggestion from
Johannes.
[PATCH net-next 0/3] bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics
This patch set intends to change verifier ARG_CONST_SIZE_OR_ZERO
semantics so that simpler bpf programs can be written with verifier
acceptance. Patch #1 comment provided the detailed examples and
the patch itself implements the new semantics. Patch #2
changes bpf_probe_read helper arg2 type from ARG_CONST_SIZE
to ARG_CONST_SIZE_OR_ZERO. Patch #3 fixed a few test cases
and added some for better coverage.

Yonghong Song (3):
  bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics
  bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO
  bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change

 kernel/bpf/verifier.c                       |  40 +
 kernel/trace/bpf_trace.c                    |   8 +-
 tools/testing/selftests/bpf/test_verifier.c | 131
 3 files changed, 142 insertions(+), 37 deletions(-)

-- 
2.9.5
[PATCH net-next 3/3] bpf: fix and add test cases for ARG_CONST_SIZE_OR_ZERO semantics change
Fix a few test cases to allow non-NULL map/packet/stack pointer
with size = 0. Change a few tests using bpf_probe_read to use
bpf_probe_write_user so ARG_CONST_SIZE arg can still be properly
tested. One existing test case already covers size = 0 with non-NULL
packet pointer, so add additional tests so all cases of
size = 0 and 0 <= size <= legal_upper_bound with non-NULL
map/packet/stack pointer are covered.

Signed-off-by: Yonghong Song
Acked-by: Alexei Starovoitov
---
 tools/testing/selftests/bpf/test_verifier.c | 131
 1 file changed, 112 insertions(+), 19 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index bb3c4ad..bf092b8 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3579,7 +3579,7 @@ static struct bpf_test tests[] = {
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
-		"helper access to packet: test19, cls helper fail range zero",
+		"helper access to packet: test19, cls helper range zero",
 		.insns = {
 			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
 				    offsetof(struct __sk_buff, data)),
@@ -3599,8 +3599,7 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_IMM(BPF_REG_0, 0),
 			BPF_EXIT_INSN(),
 		},
-		.result = REJECT,
-		.errstr = "invalid access to packet",
+		.result = ACCEPT,
 		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	},
 	{
@@ -4379,10 +4378,10 @@ static struct bpf_test tests[] = {
 			BPF_LD_MAP_FD(BPF_REG_1, 0),
 			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
 			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
-			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_1, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_0),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
-			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
@@ -4486,9 +4485,10 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
 			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
 				offsetof(struct test_val, foo)),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+			BPF_MOV64_IMM(BPF_REG_1, 0),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
-			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
@@ -4622,13 +4622,14 @@ static struct bpf_test tests[] = {
 			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+			BPF_MOV64_IMM(BPF_REG_1, 0),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
-			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is outside of the array range",
+		.errstr = "R2 min value is outside of the array range",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -4765,13 +4766,14 @@ static struct bpf_test tests[] = {
 			BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
 				    offsetof(struct test_val, foo), 4),
 			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
-			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
+			BPF_MOV64_IMM(BPF_REG_1, 0),
 			BPF_MOV64_IMM(BPF_REG_3, 0),
-			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EMIT_CALL(BPF_FUNC_probe_write_user),
 			BPF_EXIT_INSN(),
 		},
 		.fixup_map2 = { 3 },
-		.errstr = "R1 min value is outside of the array range",
+		.errstr = "R2 min value is outside of the array range",
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
@@ -5350,7
Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'
On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote: On 11/12/17 8:23 AM, kbuild test robot wrote: tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption config: mips-64r6el_defconfig (attached as .config) compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 reproduce: git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c # save the attached .config to linux build tree make.cross ARCH=mips All errors (new ones prefixed by >>): kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19': verifier.c:(.text+0x36fc): undefined reference to `__multi3' that's a known issue with gcc 7 on mips that is "optimizing" normal 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side. Good to know that! Do you think it a good idea to blacklist __multi3 errors in mips builds? Thanks, Fengguang crypto/scompress.o: In function `.L82': scompress.c:(.text+0x55c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul1.o: In function `.L2': generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3' lib/mpi/generic_mpih-mul2.o: In function `.L2': generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul3.o: In function `.L2': generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow
Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'
On 11/12/17 9:18 AM, Fengguang Wu wrote: On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote: On 11/12/17 8:23 AM, kbuild test robot wrote: tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption config: mips-64r6el_defconfig (attached as .config) compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 reproduce: git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c # save the attached .config to linux build tree make.cross ARCH=mips All errors (new ones prefixed by >>): kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19': verifier.c:(.text+0x36fc): undefined reference to `__multi3' that's a known issue with gcc 7 on mips that is "optimizing" normal 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side. Good to know that! Do you think it a good idea to blacklist __multi3 errors in mips builds? I would do so. yes. Though digging further this function was added to arch/sparc/lib/multi3.S since gcc doing the same "optimization" there. Adding asm code doesn't look right to me. I'd rather push gcc folks to avoid such codegen.
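For readers unfamiliar with the symbol: `__multi3` is the libgcc routine for a 128-bit multiplication. The kernel is not linked against libgcc, so any such call the compiler emits ends up as an undefined reference unless the architecture provides its own copy. A minimal sketch of source that involves a 128-bit product is shown here; the function name is made up for illustration, and whether a given compiler emits a libcall or inline instructions depends on its version, target and flags.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the kernel: multiply two 64-bit
 * values and report whether the result overflowed 64 bits. The product
 * is formed in 128 bits; on some targets the compiler may implement that
 * with a call to libgcc's __multi3 rather than inline instructions.
 * Needs a compiler that supports unsigned __int128 (gcc or clang on
 * 64-bit targets).
 */
int mul_u64_overflows(uint64_t a, uint64_t b, uint64_t *res)
{
	unsigned __int128 wide = (unsigned __int128)a * b;

	*res = (uint64_t)wide;      /* low 64 bits of the product */
	return (wide >> 64) != 0;   /* non-zero high half means overflow */
}
```

Building a file like this for the affected configuration and looking for `__multi3` in the object's undefined symbols would be one way to check a particular toolchain.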
[net-next:master 488/665] verifier.c:undefined reference to `__multi3'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption config: mips-64r6el_defconfig (attached as .config) compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c # save the attached .config to linux build tree make.cross ARCH=mips All errors (new ones prefixed by >>): kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19': >> verifier.c:(.text+0x36fc): undefined reference to `__multi3' crypto/scompress.o: In function `.L82': scompress.c:(.text+0x55c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul1.o: In function `.L2': generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3' lib/mpi/generic_mpih-mul2.o: In function `.L2': generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul3.o: In function `.L2': generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH] ethernet: cavium: octeon: Switch to using netdev_info().
On Wed, 2017-10-25 at 14:41 +0800, kbuild test robot wrote:
> Hi Steven,
>
> [auto build test WARNING on net-next/master]
> [also build test WARNING on v4.14-rc6]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Steven-J-Hill/ethernet-cavium-octeon-Switch-to-using-netdev_info/20171024-071910
> config: mips-cavium_octeon_defconfig (attached as .config)
> compiler: mips64-linux-gnuabi64-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=mips
>
> All warnings (new ones prefixed by >>):
>
>    drivers/net/ethernet/cavium/octeon/octeon_mgmt.c: In function 'octeon_mgmt_adjust_link':
> >> drivers/net/ethernet/cavium/octeon/octeon_mgmt.c:929:5: warning: suggest explicit braces to avoid ambiguous 'else' [-Wparentheses]
>      if (link_changed != 0)
>         ^
>
> vim +/else +929 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
>
>    896
>    897  static void octeon_mgmt_adjust_link(struct net_device *netdev)
>    898  {
>    899          struct octeon_mgmt *p = netdev_priv(netdev);
>    900          struct phy_device *phydev = netdev->phydev;
>    901          unsigned long flags;
>    902          int link_changed = 0;
>    903
>    904          if (!phydev)
>    905                  return;
>    906
>    907          spin_lock_irqsave(&p->lock, flags);
>    908
>    909
>    910          if (!phydev->link && p->last_link)
>    911                  link_changed = -1;
>    912
>    913          if (phydev->link &&
>    914              (p->last_duplex != phydev->duplex ||
>    915               p->last_link != phydev->link ||
>    916               p->last_speed != phydev->speed)) {
>    917                  octeon_mgmt_disable_link(p);
>    918                  link_changed = 1;
>    919                  octeon_mgmt_update_link(p);
>    920                  octeon_mgmt_enable_link(p);
>    921          }
>    922
>    923          p->last_link = phydev->link;
>    924          p->last_speed = phydev->speed;
>    925          p->last_duplex = phydev->duplex;
>    926
>    927          spin_unlock_irqrestore(&p->lock, flags);
>    928
>  > 929          if (link_changed != 0)
>    930                  if (link_changed > 0)
>    931                          netdev_info(netdev, "Link is up - %d/%s\n",
>    932                                  phydev->speed, phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
>    933                  else
>    934                          netdev_info(netdev, "Link is down\n");
>    935  }
>    936

I think this would be better as

        if (!phydev->link) {
                if (p->last_link)
                        link_changed = -1;
        } else if (p->last_duplex != phydev->duplex ||
                   p->last_link != phydev->link ||
                   p->last_speed != phydev->speed) {
                link_changed = 1;
                octeon_mgmt_disable_link(p);
                octeon_mgmt_update_link(p);
                octeon_mgmt_enable_link(p);
        }

        ...

        if (link_changed > 0)
                netdev_info(netdev, "Link is up - %d/%s\n",
                            phydev->speed,
                            phydev->duplex == DUPLEX_FULL ? "Full" : "Half");
        else if (link_changed < 0)
                netdev_info(netdev, "Link is down\n");
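The decision logic in the suggestion can be isolated from the driver and checked on its own. The sketch below is a userspace model under stated assumptions: `struct link_state` and `link_change()` are hypothetical names, and only the three fields compared in the quoted code are modelled.

```c
#include <stdbool.h>

/* Simplified stand-in for the driver state; field names mirror the quoted code. */
struct link_state {
	bool link;
	int speed;
	int duplex;
};

/*
 * Returns -1 if the link went down, 1 if it is up and came up or changed
 * speed/duplex, and 0 if there is nothing to report.
 */
int link_change(const struct link_state *last, const struct link_state *now)
{
	if (!now->link)
		return last->link ? -1 : 0;

	if (last->duplex != now->duplex ||
	    last->link != now->link ||
	    last->speed != now->speed)
		return 1;

	return 0;
}
```

With the result expressed as a sign, the reporting side reduces to a flat `if (> 0) ... else if (< 0) ...` chain, which has no nested `if` for the `else` to bind to ambiguously.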
Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'
Le 11/11/17 à 17:34, Fengguang Wu a écrit : > On Sun, Nov 12, 2017 at 09:23:52AM +0800, Alexei Starovoitov wrote: >> On 11/12/17 9:18 AM, Fengguang Wu wrote: >>> On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote: On 11/12/17 8:23 AM, kbuild test robot wrote: > tree: > https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git > master > head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c > commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: > reduce verifier memory consumption > config: mips-64r6el_defconfig (attached as .config) > compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 > reproduce: > git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c > # save the attached .config to linux build tree > make.cross ARCH=mips > > All errors (new ones prefixed by >>): > > kernel/bpf/verifier.o: In function > `realloc_verifier_state.isra.19': >>> verifier.c:(.text+0x36fc): undefined reference to `__multi3' that's a known issue with gcc 7 on mips that is "optimizing" normal 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side. >>> >>> Good to know that! Do you think it a good idea to blacklist __multi3 >>> errors in mips builds? >> >> I would do so. yes. > > OK. > >> Though digging further this function was added to >> arch/sparc/lib/multi3.S >> since gcc doing the same "optimization" there. >> Adding asm code doesn't look right to me. I'd rather push >> gcc folks to avoid such codegen. > > Sure, I just forwarded the original report to GCC list. Thomas encountered a similar problem, reported on linux-mips here: https://www.linux-mips.org/archives/linux-mips/2017-08/msg00041.html -- Florian
[PATCH v2 net-next] tcp: allow drivers to tweak TSQ logic
From: Eric Dumazet

I had many reports that TSQ logic breaks wifi aggregation.

Current logic is to allow up to 1 ms of bytes to be queued into qdisc
and drivers queues.

But Wifi aggregation needs a bigger budget to allow bigger rates to
be discovered by various TCP Congestion Controls algorithms.

This patch adds an extra socket field, allowing wifi drivers to select
another log scale to derive TCP Small Queue credit from current pacing
rate.

Initial value is 10, meaning that this patch does not change current
behavior.

We expect wifi drivers to set this field to smaller values (tests have
been done with values from 6 to 9)

They would have to use following template :

if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
        skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041
Signed-off-by: Eric Dumazet
Cc: Johannes Berg
Cc: Toke Høiland-Jørgensen
Cc: Kir Kolyshkin
---
v2: added kernel-doc comment, based on Johannes feedback.

 include/net/sock.h    |    2 ++
 net/core/sock.c       |    1 +
 net/ipv4/tcp_output.c |    4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 688a823dccc306bd21f47da167c6922161af5a6a..f8715c5af37d4e598770dbe5c5f83246241f18d5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -267,6 +267,7 @@ struct sock_common {
   *    @sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *    @sk_gso_max_size: Maximum GSO segment size to build
   *    @sk_gso_max_segs: Maximum number of GSO segments
+  *    @sk_pacing_shift: scaling factor for TCP Small Queues
   *    @sk_lingertime: %SO_LINGER l_linger setting
   *    @sk_backlog: always used with the per-socket spinlock held
   *    @sk_callback_lock: used with the callbacks in the end of this struct
@@ -451,6 +452,7 @@ struct sock {
        kmemcheck_bitfield_end(flags);

        u16                     sk_gso_max_segs;
+       u8                      sk_pacing_shift;
        unsigned long           sk_lingertime;
        struct proto            *sk_prot_creator;
        rwlock_t                sk_callback_lock;
diff --git a/net/core/sock.c b/net/core/sock.c
index 57bbd6040eb6a3c072ce4e024687786079552ddf..13719af7b4e35d2050ccba51d44c7f691a889b37 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2746,6 +2746,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)

        sk->sk_max_pacing_rate = ~0U;
        sk->sk_pacing_rate = ~0U;
+       sk->sk_pacing_shift = 10;
        sk->sk_incoming_cpu = -1;
        /*
         * Before updating sk_refcnt, we must commit prior changes to memory
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0256f7a410417d93c9edab9d25a3ce5a81c2b296..76dbe884f2469660028684a46fc19afa000a1353 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1720,7 +1720,7 @@ u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 {
        u32 bytes, segs;

-       bytes = min(sk->sk_pacing_rate >> 10,
+       bytes = min(sk->sk_pacing_rate >> sk->sk_pacing_shift,
                    sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);

        /* Goal is to send at least one packet per ms,
@@ -2198,7 +2198,7 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 {
        unsigned int limit;

-       limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+       limit = max(2 * skb->truesize, sk->sk_pacing_rate >> sk->sk_pacing_shift);
        limit = min_t(u32, limit,
                      sock_net(sk)->ipv4.sysctl_tcp_limit_output_bytes);
        limit <<= factor;
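To get a feel for what the shift does to the queue budget, here is a small userspace model of the limit computation in tcp_small_queue_check() as changed by the patch. It is only a sketch: the function name `tsq_limit` is made up, and the sysctl cap and the `factor` shift from the real function are left out.

```c
#include <stdint.h>

/*
 * Model of: limit = max(2 * skb->truesize,
 *                       sk->sk_pacing_rate >> sk->sk_pacing_shift);
 * pacing_rate is in bytes per second, so a shift of 10 corresponds to
 * roughly 1/1024 s (about 1 ms) worth of bytes.
 */
uint32_t tsq_limit(uint32_t pacing_rate, uint32_t truesize, uint8_t pacing_shift)
{
	uint32_t by_rate = pacing_rate >> pacing_shift;
	uint32_t min_limit = 2 * truesize;

	return by_rate > min_limit ? by_rate : min_limit;
}
```

For a flow paced at 12500000 bytes/s (100 Mbit/s) with a 2304-byte truesize, the default shift of 10 gives a budget of 12207 bytes, while a shift of 7 gives 97656 bytes, i.e. about 8 ms of data instead of about 1 ms. At low rates the `2 * truesize` floor takes over regardless of the shift.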
Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'
On Sun, Nov 12, 2017 at 09:23:52AM +0800, Alexei Starovoitov wrote: On 11/12/17 9:18 AM, Fengguang Wu wrote: On Sun, Nov 12, 2017 at 09:14:14AM +0800, Alexei Starovoitov wrote: On 11/12/17 8:23 AM, kbuild test robot wrote: tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption config: mips-64r6el_defconfig (attached as .config) compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 reproduce: git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c # save the attached .config to linux build tree make.cross ARCH=mips All errors (new ones prefixed by >>): kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19': verifier.c:(.text+0x36fc): undefined reference to `__multi3' that's a known issue with gcc 7 on mips that is "optimizing" normal 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side. Good to know that! Do you think it a good idea to blacklist __multi3 errors in mips builds? I would do so. yes. OK. Though digging further this function was added to arch/sparc/lib/multi3.S since gcc doing the same "optimization" there. Adding asm code doesn't look right to me. I'd rather push gcc folks to avoid such codegen. Sure, I just forwarded the original report to GCC list. Thanks, Fengguang
[net-next:master 488/665] verifier.c:undefined reference to `__multi3'
CC gcc list. According to Alexei: This is a known issue with gcc 7 on mips that is "optimizing" normal 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side. Digging further, this function was added to arch/sparc/lib/multi3.S since gcc doing the same "optimization" there. Adding asm code doesn't look right to me. I'd rather push gcc folks to avoid such codegen. tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption config: mips-64r6el_defconfig (attached as .config) compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c # save the attached .config to linux build tree make.cross ARCH=mips All errors (new ones prefixed by >>): kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19': >> verifier.c:(.text+0x36fc): undefined reference to `__multi3' crypto/scompress.o: In function `.L82': scompress.c:(.text+0x55c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul1.o: In function `.L2': generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3' lib/mpi/generic_mpih-mul2.o: In function `.L2': generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul3.o: In function `.L2': generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip ___ kbuild-all mailing list kbuild-...@lists.01.org https://lists.01.org/mailman/listinfo/kbuild-all
Re: [net-next] tcp: allow drivers to tweak TSQ logic
On Sat, 2017-11-11 at 15:27 +0100, Johannes Berg wrote: > Thanks Eric! > > > We expect wifi drivers to set this field to smaller values (tests have > > been done with values from 6 to 9) > > I suppose we should test each driver or so. > > > They would have to use following template : > > > > if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT) > > skb->sk->sk_pacing_shift = MY_PACING_SHIFT; > > Hm. I wish we wouldn't have to do this on every skb, but perhaps it > doesn't matter that much. Yes, it does not matter, even at 40Gbit ;) > > > > u16 sk_gso_max_segs; > > + u8 sk_pacing_shift; > > I guess you tried to fill a hole, but weren't we saying that it would > be better in the same cacheline? Then again, perhaps both cachelines > are resident anyway, haven't looked at this now. Same cache line already ;) u32sk_pacing_rate; /* 0x1c0 0x4 */ u32sk_max_pacing_rate; /* 0x1c4 0x4 */ struct page_frag sk_frag; /* 0x1c8 0x10 */ netdev_features_t sk_route_caps;/* 0x1d8 0x8 */ netdev_features_t sk_route_nocaps; /* 0x1e0 0x8 */ intsk_gso_type; /* 0x1e8 0x4 */ unsigned int sk_gso_max_size; /* 0x1ec 0x4 */ gfp_t sk_allocation;/* 0x1f0 0x4 */ __u32 sk_txhash;/* 0x1f4 0x4 */ unsigned int __sk_flags_offset[0]; /* 0x1f8 0 */ unsigned int sk_padding:1; /* 0x1f8:0x1f 0x4 */ unsigned int sk_kern_sock:1; /* 0x1f8:0x1e 0x4 */ unsigned int sk_no_check_tx:1; /* 0x1f8:0x1d 0x4 */ unsigned int sk_no_check_rx:1; /* 0x1f8:0x1c 0x4 */ unsigned int sk_userlocks:4; /* 0x1f8:0x18 0x4 */ unsigned int sk_protocol:8;/* 0x1f8:0x10 0x4 */ unsigned int sk_type:16; /* 0x1f8: 0 0x4 */ u16sk_gso_max_segs; /* 0x1fc 0x2 */ u8 sk_pacing_shift; /* 0x1fe 0x1 */ > > Unrelated to that, I think this is missing a documentation update since > the struct has kernel-doc comments. Yeah, I believe these kernel-doc on gigantic struct sock are useless and we should remove them, they have zero useful info.
Re: [PATCH iproute2 2/2] devlink: add batch command support
Fri, Nov 10, 2017 at 08:47:35PM CET, l...@kernel.org wrote: >On Fri, Nov 10, 2017 at 08:10:43AM +0100, Ivan Vecera wrote: >> On 10.11.2017 07:57, Leon Romanovsky wrote: >> > On Fri, Nov 10, 2017 at 07:20:14AM +0100, Ivan Vecera wrote: >> >> The patch adds support to batch devlink commands. >> >> >> >> Cc: Jiri Pirko>> >> Cc: Arkadi Sharshevsky >> >> Signed-off-by: Ivan Vecera >> >> --- >> >> devlink/devlink.c | 70 >> >> +++--- >> >> man/man8/devlink.8 | 16 + >> >> 2 files changed, 78 insertions(+), 8 deletions(-) >> >> >> > >> > <..> >> > >> >> diff --git a/man/man8/devlink.8 b/man/man8/devlink.8 >> >> index a480766c..a975ef34 100644 >> >> --- a/man/man8/devlink.8 >> >> +++ b/man/man8/devlink.8 >> >> @@ -12,6 +12,12 @@ devlink \- Devlink tool >> >> .sp >> >> >> >> .ti -8 >> >> +.B devlink >> >> +.RB "[ " -force " ] " >> >> +.BI "-batch " filename >> >> +.sp >> >> + >> >> +.ti -8 >> >> .IR OBJECT " := { " >> >> .BR dev " | " port " | " monitor " }" >> >> .sp >> >> @@ -32,6 +38,16 @@ Print the version of the >> >> utility and exit. >> >> >> >> .TP >> >> +.BR "\-b", " \-batch " >> >> +Read commands from provided file or standard input and invoke them. >> >> +First failure will cause termination of devlink. >> > >> > It is worth to document the expected format of that file. >> > And IMHO, it is better to have ability to load JSON fie which was >> > generated by -j, instead of declaring new format/knob. >> It's just a list of command-lines... like other utils (bridge,ip...) > >I'm implementing similar thing in RDMAtool (part of iproute2) and choose JSON >approach, it is more user and script friendly. Leon, we should really do things in a way they are currently done and used. Batching is implemented in "ip" for a long time. It makes perfect sense to have one command line per line of the batch file. In contrary, json output sounds really odd in this case. With json, there is no relation to ordinary ip command line params. Or do you want to extend it to accept json as well?
Re: [PATCH iproute2 2/2] devlink: add batch command support
Fri, Nov 10, 2017 at 07:20:14AM CET, ivec...@redhat.com wrote: >The patch adds support to batch devlink commands. > >Cc: Jiri Pirko>Cc: Arkadi Sharshevsky >Signed-off-by: Ivan Vecera Acked-by: Jiri Pirko Thanks!
Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops
On 11/12/17 4:46 AM, Daniel Borkmann wrote: On 11/11/2017 05:06 AM, Alexei Starovoitov wrote: On 11/11/17 6:07 AM, Daniel Borkmann wrote: On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote: From: Vlad DumitrescuAllows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority. Signed-off-by: Vlad Dumitrescu --- include/uapi/linux/bpf.h | 1 + net/core/filter.c | 11 +++ tools/include/uapi/linux/bpf.h | 1 + 3 files changed, 13 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e880ae6434ee..9757a2002513 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -947,6 +947,7 @@ struct bpf_sock_ops { __u32 local_ip6[4];/* Stored in network byte order */ __u32 remote_port;/* Stored in network byte order */ __u32 local_port;/* stored in host byte order */ +__u32 priority; }; /* List of known BPF sock_ops operators. diff --git a/net/core/filter.c b/net/core/filter.c index 61c791f9f628..a6329642d047 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type, *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg, offsetof(struct sock_common, skc_num)); break; + +case offsetof(struct bpf_sock_ops, priority): +BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4); + +*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( +struct bpf_sock_ops_kern, sk), + si->dst_reg, si->src_reg, + offsetof(struct bpf_sock_ops_kern, sk)); +*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, + offsetof(struct sock, sk_priority)); +break; Hm, I don't think this would work, I actually think your initial patch was ok. bpf_setsockopt() as well as bpf_getsockopt() check for sk_fullsock(sk) right before accessing options on either socket or TCP level, and bail out with error otherwise; in such cases we'd read something else here and assume it's sk_priority. even if it's not fullsock, it will just read zero, no? what's a problem with that? 
In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB the program author will know that it's meaningless to read sk_priority, so returning zero with minimal checks is fine. While adding extra runtime if (sk_fullsock(sk)) is unnecessary, since the safety is not compromised. Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440, struct request_sock itself is only 232 byte long in total, and the struct inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds that way, so it cannot be ignored and assumed zero. I thought we always pass fully allocated sock but technically not fullsock yet. My mistake. We do: tcp_timeout_init((struct sock *)req)) so yeah ctx rewrite approach won't work. Let's go back to access via helper.
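The size argument above comes down to a plain bounds check. The sketch below uses the numbers Daniel quotes for his kernel (sk_priority: 4 bytes at offset 440; struct request_sock: 232 bytes; struct inet_timewait_sock: 208 bytes); the helper `field_in_object()` is hypothetical and not a kernel API.

```c
#include <stddef.h>

/*
 * Hypothetical check: does a field occupying [off, off + len) lie
 * entirely inside an object of obj_size bytes? Written so that the
 * arithmetic cannot wrap.
 */
int field_in_object(size_t obj_size, size_t off, size_t len)
{
	return off <= obj_size && len <= obj_size - off;
}
```

With those sizes, `field_in_object(232, 440, 4)` and `field_in_object(208, 440, 4)` are both false, so a context rewrite that loads from the sk_priority offset of a request or timewait socket reads memory past the end of the object rather than a well-defined zero.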
Re: [PATCH v4] scripts: add leaking_addresses.pl
On Tue, Nov 07, 2017 at 09:32:11PM +1100, Tobin C. Harding wrote:
> Currently we are leaking addresses from the kernel to user space. This
> script is an attempt to find some of those leakages. Script parses
> `dmesg` output and /proc and /sys files for hex strings that look like
> kernel addresses.
>
> Only works for 64 bit kernels, the reason being that kernel addresses
> on 64 bit kernels have '' as the leading bit pattern making grepping
> possible. On 32 bit kernels we don't have this luxury.

Well, it's not going to work as well as intended on x86 machines with
5-level paging. Kernel address space there starts at 0xff10. It will
still catch pointers to kernel/modules text, but the rest is outside of
0x... space. See Documentation/x86/x86_64/mm.txt.

Not sure if we care. It won't work either for other 64-bit architectures
that have more than 256TB of virtual address space.

Just wanted to point out the limitation.

--
Kirill A. Shutemov
[PATCH net-next 1/3] bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics
For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the
access size to be 0 when accessing the previous argument (arg).
Right now, it requires the arg to be NULL when the size passed
is 0 or could be 0. It also requires a non-NULL arg when the size
is proved to be non-0.

This patch changes verifier ARG_CONST_SIZE_OR_ZERO behavior
such that for size-0 or possible size-0, it is not required
that the arg be equal to NULL.

There are a couple of reasons for this semantics change, and
all of them intend to simplify user bpf programs which
may improve user experience and/or increase chances of
verifier acceptance. Together with the next patch which
changes bpf_probe_read arg2 type from ARG_CONST_SIZE to
ARG_CONST_SIZE_OR_ZERO, the following two examples, which
fail the verifier currently, are able to get verifier acceptance.

Example 1:
==========
   unsigned long len = pend - pstart;
   len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
   len &= MAX_PAYLOAD_LEN;
   bpf_probe_read(data->payload, len, pstart);

It does not have a test for "len > 0" and it failed the verifier.
Users may not be aware that they have to add this test.
Converting the bpf_probe_read helper to have
ARG_CONST_SIZE_OR_ZERO helps the above code get
verifier acceptance.

Example 2:
==========
Here is one example where llvm "messed up" the code and
the verifier fails.

......
   unsigned long len = pend - pstart;
   if (len > 0 && len <= MAX_PAYLOAD_LEN)
     bpf_probe_read(data->payload, len, pstart);
......

The compiler generates the following code and verifier fails:
......
39: (79) r2 = *(u64 *)(r10 -16)
40: (1f) r2 -= r8
41: (bf) r1 = r2
42: (07) r1 += -1
43: (25) if r1 > 0xffe goto pc+3
  R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff))
  R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0)
  R8=inv(id=0) R9=inv0 R10=fp0
44: (bf) r1 = r6
45: (bf) r3 = r8
46: (85) call bpf_probe_read#45
R2 min value is negative, either use unsigned or 'var &= const'
......

The compiler optimization is correct. If r1 = 0,
r1 - 1 = 0xffffffffffffffff > 0xffe. If r1 != 0, r1 - 1 will not wrap.
r1 > 0xffe at insn #43 can actually capture
both "r1 > 0" and "len <= MAX_PAYLOAD_LEN".
This however causes an issue in the verifier as the value range of arg2
"r2" does not properly get refined and leads to verification failure.

Relaxing bpf_probe_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO
allows the following simplified code:
   unsigned long len = pend - pstart;
   if (len <= MAX_PAYLOAD_LEN)
     bpf_probe_read(data->payload, len, pstart);

The llvm compiler will generate less complex code and the
verifier is able to verify that the program is okay.

Signed-off-by: Yonghong Song
Acked-by: Alexei Starovoitov
---
 kernel/bpf/verifier.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4a942e2..dd54d20 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -799,12 +799,13 @@ static int check_stack_read(struct bpf_verifier_env *env,

 /* check read/write into map element returned by bpf_map_lookup_elem() */
 static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
-                             int size)
+                             int size, bool zero_size_allowed)
 {
        struct bpf_reg_state *regs = cur_regs(env);
        struct bpf_map *map = regs[regno].map_ptr;

-       if (off < 0 || size <= 0 || off + size > map->value_size) {
+       if (off < 0 || size < 0 || (size == 0 && !zero_size_allowed) ||
+           off + size > map->value_size) {
                verbose(env, "invalid access to map value, value_size=%d off=%d size=%d\n",
                        map->value_size, off, size);
                return -EACCES;
@@ -814,7 +815,7 @@ static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,

 /* check read/write into a map element with possible variable offset */
 static int check_map_access(struct bpf_verifier_env *env, u32 regno,
-                           int off, int size)
+                           int off, int size, bool zero_size_allowed)
 {
        struct bpf_verifier_state *state = env->cur_state;
        struct bpf_reg_state *reg = &state->regs[regno];
@@ -837,7 +838,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
                        regno);
                return -EACCES;
        }
-       err = __check_map_access(env, regno, reg->smin_value + off, size);
+       err = __check_map_access(env, regno, reg->smin_value + off, size,
+                                zero_size_allowed);
        if (err) {
                verbose(env, "R%d min value is outside of the array range\n",
                        regno);
@@ -853,7 +855,8 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
                        regno);
                return -EACCES;
        }
-
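The claim in Example 2 that one unsigned comparison covers both source-level tests can be checked in isolation. This is a sketch under an assumption: MAX_PAYLOAD_LEN is taken to be 4095, which matches the `0xffe` constant and `vs=4095` in the quoted verifier log but is not stated in the message. The function names are made up.

```c
#include <stdint.h>

#define MAX_PAYLOAD_LEN 4095	/* assumed value, see lead-in */

/* The two tests as written in the C source of Example 2. */
int in_range_two_tests(uint64_t len)
{
	return len > 0 && len <= MAX_PAYLOAD_LEN;
}

/*
 * The single test the compiler emitted: subtract 1, then compare
 * unsigned against MAX_PAYLOAD_LEN - 1. For len == 0 the subtraction
 * wraps to the largest 64-bit value, so the comparison rejects it.
 */
int in_range_one_test(uint64_t len)
{
	return !(len - 1 > (uint64_t)MAX_PAYLOAD_LEN - 1);
}
```

Both functions agree for every input, which is why the optimization is legal. The difficulty described in the commit message is on the verifier side: the range it learns applies to the temporary holding `len - 1`, not to the register that is later passed as the size argument.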
[PATCH net-next 2/3] bpf: change helper bpf_probe_read arg2 type to ARG_CONST_SIZE_OR_ZERO
The helper bpf_probe_read arg2 type is changed
from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO to permit
size-0 buffer. Together with newer ARG_CONST_SIZE_OR_ZERO
semantics which allows non-NULL buffer with size 0,
this allows simpler bpf programs with verifier acceptance.
The previous commit which changes ARG_CONST_SIZE_OR_ZERO semantics
has details on examples.

Signed-off-by: Yonghong Song
Acked-by: Alexei Starovoitov
---
 kernel/trace/bpf_trace.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 506efe6..a5580c6 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -78,12 +78,16 @@ EXPORT_SYMBOL_GPL(trace_call_bpf);

 BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
 {
-       int ret;
+       int ret = 0;
+
+       if (unlikely(size == 0))
+               goto out;

        ret = probe_kernel_read(dst, unsafe_ptr, size);
        if (unlikely(ret < 0))
                memset(dst, 0, size);

+ out:
        return ret;
 }

@@ -92,7 +96,7 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
        .gpl_only       = true,
        .ret_type       = RET_INTEGER,
        .arg1_type      = ARG_PTR_TO_UNINIT_MEM,
-       .arg2_type      = ARG_CONST_SIZE,
+       .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
        .arg3_type      = ARG_ANYTHING,
 };

--
2.9.5
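The behavior of the patched helper body can be modelled in userspace. The sketch below is illustrative only: `fake_probe_kernel_read()` is a stand-in invented here (a NULL source plays the role of an unreadable address), and `model_bpf_probe_read()` mirrors the control flow of the patch rather than being the kernel function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for probe_kernel_read(), for illustration only: a NULL source
 * is treated as a faulting address and yields -14 (the value of -EFAULT).
 */
int fake_probe_kernel_read(void *dst, const void *src, size_t size)
{
	if (!src)
		return -14;
	memcpy(dst, src, size);
	return 0;
}

/* Userspace model of the helper body after the patch. */
int model_bpf_probe_read(void *dst, unsigned int size, const void *unsafe_ptr)
{
	int ret = 0;

	if (size == 0)		/* new: nothing to copy, dst is left untouched */
		return ret;

	ret = fake_probe_kernel_read(dst, unsafe_ptr, size);
	if (ret < 0)
		memset(dst, 0, size);	/* unchanged: zero the buffer on failure */

	return ret;
}
```

In this model a size-0 call returns 0 without touching the destination, while a failing non-zero read still clears the destination buffer as before.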
Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'
On 11/12/17 8:23 AM, kbuild test robot wrote: tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 7c5556decd0a629e9ee02e93653f75ba7b7da03c commit: 638f5b90d46016372a8e3e0a434f199cc5e12b8c [488/665] bpf: reduce verifier memory consumption config: mips-64r6el_defconfig (attached as .config) compiler: mips64el-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0 reproduce: git checkout 638f5b90d46016372a8e3e0a434f199cc5e12b8c # save the attached .config to linux build tree make.cross ARCH=mips All errors (new ones prefixed by >>): kernel/bpf/verifier.o: In function `realloc_verifier_state.isra.19': verifier.c:(.text+0x36fc): undefined reference to `__multi3' that's a known issue with gcc 7 on mips that is "optimizing" normal 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side. crypto/scompress.o: In function `.L82': scompress.c:(.text+0x55c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul1.o: In function `.L2': generic_mpih-mul1.c:(.text+0x60): undefined reference to `__multi3' lib/mpi/generic_mpih-mul2.o: In function `.L2': generic_mpih-mul2.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/generic_mpih-mul3.o: In function `.L2': generic_mpih-mul3.c:(.text+0x5c): undefined reference to `__multi3' lib/mpi/mpih-div.o:mpih-div.c:(.text+0x1b8): more undefined references to `__multi3' follow
gpl-only change on bpf_setsockopt
Hi Lawrence: I noticed that commit cd86d1fd21025 ("bpf: Adding helper function bpf_getsockops") changed the gpl_only on bpf_setsockopt. The commit log does not specify why. Is there any reason you changed it and made both bpf_setsockopt and bpf_getsockopt available for non-gpl programs? David
Re: gpl-only change on bpf_setsockopt
On Sat, Nov 11, 2017 at 08:17:37PM -0700, David Ahern wrote:
> Hi Lawrence:
>
> I noticed that commit cd86d1fd21025 ("bpf: Adding helper function
> bpf_getsockops") changed the gpl_only on bpf_setsockopt. The commit log
> does not specify why. Is there any reason you changed it and made both
> bpf_setsockopt and bpf_getsockopt available for non-gpl programs?

that was my request to match the rest of networking programs.
There is nothing linux specific about get/setsockopt that any
user space process can do just as well.
Compare that to tracing programs which are gpl only.
[PATCH net-next 1/3] rxrpc: Lock around calling a kernel service Rx notification
Place a spinlock around the invocation of call->notify_rx() for a kernel
service call and lock again when ending the call and replace the
notification pointer with a pointer to a dummy function.

This is required because it's possible for rxrpc_notify_socket() to be
called after the call has been ended by the kernel service if called from
the asynchronous work function rxrpc_process_call().

However, rxrpc_notify_socket() currently only holds the RCU read lock when
invoking ->notify_rx(), which means that the afs_call struct would need to
be disposed of by call_rcu() rather than by kfree().

But we shouldn't see any notifications from a call after calling
rxrpc_kernel_end_call(), so a lock is required in rxrpc code.

Without this, we may see the call wait queue as having a corrupt spinlock:

    BUG: spinlock bad magic on CPU#0, kworker/0:2/1612
    general protection fault: [#1] SMP
    ...
    Workqueue: krxrpcd rxrpc_process_call
    task: 88040b83c400 task.stack: 88040adfc000
    RIP: 0010:spin_bug+0x161/0x18f
    RSP: 0018:88040adffcc0 EFLAGS: 00010002
    RAX: 0032 RBX: 6b6b6b6b6b6b6b6b RCX: 81ab16cf
    RDX: 88041fa14c01 RSI: 88041fa0ccb8 RDI: 88041fa0ccb8
    RBP: 88040adffcd8 R08: R09:
    R10: 88040adffc60 R11: 022c R12: 88040aca2208
    R13: 81a58114 R14: R15:
    Call Trace:
     do_raw_spin_lock+0x1d/0x89
     _raw_spin_lock_irqsave+0x3d/0x49
     ? __wake_up_common_lock+0x4c/0xa7
     __wake_up_common_lock+0x4c/0xa7
     ? __lock_is_held+0x47/0x7a
     __wake_up+0xe/0x10
     afs_wake_up_call_waiter+0x11b/0x122 [kafs]
     rxrpc_notify_socket+0x12b/0x258
     rxrpc_process_call+0x18e/0x7d0
     process_one_work+0x298/0x4de
     ? rescuer_thread+0x280/0x280
     worker_thread+0x1d1/0x2ae
     ? rescuer_thread+0x280/0x280
     kthread+0x12c/0x134
     ? kthread_create_on_node+0x3a/0x3a
     ret_from_fork+0x27/0x40

In this case, note the corrupt data in EBX.  The address of the offending
afs_call is in R12, plus the offset to the spinlock.
Signed-off-by: David Howells
---

 net/rxrpc/af_rxrpc.c    |   16
 net/rxrpc/ar-internal.h |    1 +
 net/rxrpc/call_object.c |    1 +
 net/rxrpc/recvmsg.c     |    2 ++
 4 files changed, 20 insertions(+)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 344b2dcad52d..9b5c46b052fd 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -322,6 +322,14 @@ struct rxrpc_call *rxrpc_kernel_begin_call(struct socket *sock,
 }
 EXPORT_SYMBOL(rxrpc_kernel_begin_call);
 
+/*
+ * Dummy function used to stop the notifier talking to recvmsg().
+ */
+static void rxrpc_dummy_notify_rx(struct sock *sk, struct rxrpc_call *rxcall,
+				  unsigned long call_user_ID)
+{
+}
+
 /**
  * rxrpc_kernel_end_call - Allow a kernel service to end a call it was using
  * @sock: The socket the call is on
@@ -336,6 +344,14 @@ void rxrpc_kernel_end_call(struct socket *sock, struct rxrpc_call *call)
 
 	mutex_lock(&call->user_mutex);
 	rxrpc_release_call(rxrpc_sk(sock->sk), call);
+
+	/* Make sure we're not going to call back into a kernel service */
+	if (call->notify_rx) {
+		spin_lock_bh(&call->notify_lock);
+		call->notify_rx = rxrpc_dummy_notify_rx;
+		spin_unlock_bh(&call->notify_lock);
+	}
+
 	mutex_unlock(&call->user_mutex);
 	rxrpc_put_call(call, rxrpc_call_put_kernel);
 }
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index ea5600b747cc..b2151993d384 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -525,6 +525,7 @@ struct rxrpc_call {
 	unsigned long		flags;
 	unsigned long		events;
 	spinlock_t		lock;
+	spinlock_t		notify_lock;	/* Kernel notification lock */
 	rwlock_t		state_lock;	/* lock for state transition */
 	u32			abort_code;	/* Local/remote abort code */
 	int			error;		/* Local error incurred */
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index fcdd6555a820..4c7fbc6dcce7 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -124,6 +124,7 @@ struct rxrpc_call *rxrpc_alloc_call(gfp_t gfp)
 	INIT_LIST_HEAD(&call->sock_link);
 	init_waitqueue_head(&call->waitq);
 	spin_lock_init(&call->lock);
+	spin_lock_init(&call->notify_lock);
 	rwlock_init(&call->state_lock);
 	atomic_set(&call->usage, 1);
 	call->debug_id = atomic_inc_return(&rxrpc_debug_id);
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index e4937b3f3685..8510a98b87e1 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -40,7 +40,9 @@ void rxrpc_notify_socket(struct rxrpc_call *call)
 	sk = &rx->sk;
 	if (rx && sk->sk_state < RXRPC_CLOSE) {
 		if
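The locking scheme described in this patch (invoke the callback only under a lock, and swap it for a dummy under the same lock when the service ends the call) can be sketched in userspace as follows. This is only an illustration under stated assumptions: the tiny atomic_flag spinlock stands in for spin_lock_bh(), and all *_sketch names are made up.

```c
#include <stdatomic.h>

struct call_sketch;
typedef void (*notify_fn)(struct call_sketch *call);

struct call_sketch {
	atomic_flag notify_lock;	/* plays the role of call->notify_lock */
	notify_fn notify_rx;		/* current notification target */
	int notifications;		/* deliveries seen by the service */
};

static void sketch_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the holder clears the flag */
}

static void sketch_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

static void service_notify(struct call_sketch *call)
{
	call->notifications++;
}

/* Does nothing; installed once the service has ended the call. */
static void dummy_notify(struct call_sketch *call)
{
	(void)call;
}

static void call_init_sketch(struct call_sketch *call)
{
	atomic_flag_clear(&call->notify_lock);
	call->notify_rx = service_notify;
	call->notifications = 0;
}

/* Notifier side: the callback runs only while the lock is held. */
static void notify_socket_sketch(struct call_sketch *call)
{
	sketch_lock(&call->notify_lock);
	call->notify_rx(call);
	sketch_unlock(&call->notify_lock);
}

/* Service side: once this returns, service_notify() cannot run again. */
static void end_call_sketch(struct call_sketch *call)
{
	sketch_lock(&call->notify_lock);
	call->notify_rx = dummy_notify;
	sketch_unlock(&call->notify_lock);
}
```

Because both sides take the same lock, a notifier that is already inside the callback finishes before end_call_sketch() returns, and any later notifier sees only the dummy.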
[PATCH net-next 2/3] rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()
rxrpc_fill_out_ack() needs to be passed the connection pointer from its
caller rather than using call->conn as the call may be disconnected in
parallel with it, clearing call->conn, leading to:

	BUG: unable to handle kernel NULL pointer dereference at 0010
	IP: rxrpc_send_ack_packet+0x231/0x6a4

Signed-off-by: David Howells
---

 net/rxrpc/output.c |    9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 71e6f713fbe7..8ee8b2d4a3eb 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -35,7 +35,8 @@ struct rxrpc_abort_buffer {
 /*
  * Fill out an ACK packet.
  */
-static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
+static size_t rxrpc_fill_out_ack(struct rxrpc_connection *conn,
+				 struct rxrpc_call *call,
 				 struct rxrpc_ack_buffer *pkt,
 				 rxrpc_seq_t *_hard_ack,
 				 rxrpc_seq_t *_top,
@@ -77,8 +78,8 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
 		} while (before_eq(seq, top));
 	}
 
-	mtu = call->conn->params.peer->if_mtu;
-	mtu -= call->conn->params.peer->hdrsize;
+	mtu = conn->params.peer->if_mtu;
+	mtu -= conn->params.peer->hdrsize;
 	jmax = (call->nr_jumbo_bad > 3) ? 1 : rxrpc_rx_jumbo_max;
 	pkt->ackinfo.rxMTU	= htonl(rxrpc_rx_mtu);
 	pkt->ackinfo.maxMTU	= htonl(mtu);
@@ -148,7 +149,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, bool ping)
 		}
 		call->ackr_reason = 0;
 	}
-	n = rxrpc_fill_out_ack(call, pkt, &hard_ack, &top, reason);
+	n = rxrpc_fill_out_ack(conn, call, pkt, &hard_ack, &top, reason);
 
 	spin_unlock_bh(&call->lock);
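The idea of this fix is that the caller reads call->conn once and passes that snapshot down, so the helper never reads the field again. A minimal sketch, with made-up *_sketch types and with the locking and reference counting that the real caller uses left out:

```c
#include <stddef.h>

struct peer_sketch { unsigned int if_mtu, hdrsize; };
struct conn_sketch { struct peer_sketch *peer; };
struct call_sketch { struct conn_sketch *conn; };	/* may be cleared by a disconnect */

/*
 * Takes the connection explicitly, so a disconnect that clears call->conn
 * after the caller's snapshot cannot cause a NULL dereference here.
 */
static unsigned int ack_mtu_sketch(const struct conn_sketch *conn)
{
	return conn->peer->if_mtu - conn->peer->hdrsize;
}

/* Caller: snapshot call->conn once and bail out if already disconnected. */
static int send_ack_sketch(struct call_sketch *call, unsigned int *mtu)
{
	struct conn_sketch *conn = call->conn;	/* kernel: under a lock, with a ref */

	if (!conn)
		return -1;

	call->conn = NULL;		/* simulate a disconnect racing with us */
	*mtu = ack_mtu_sketch(conn);	/* still safe: uses the snapshot */
	return 0;
}
```

The simulated disconnect in send_ack_sketch() is there only to show that the helper keeps working once call->conn has been cleared.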
[PATCH net-next 0/3] rxrpc: Fixes
Here are some patches that fix some things in AF_RXRPC:

 (1) Prevent notifications from being passed to a kernel service for a call
     that it has ended.

 (2) Fix a null pointer dereference that occurs under some circumstances
     when an ACK is generated.

 (3) Fix a number of things to do with call expiration.

The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-next

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-next-2017

David
---
David Howells (3):
      rxrpc: Lock around calling a kernel service Rx notification
      rxrpc: Fix a null ptr deref in rxrpc_fill_out_ack()
      rxrpc: Fix call expiry handling

 net/rxrpc/af_rxrpc.c    |   16
 net/rxrpc/ar-internal.h |    1 +
 net/rxrpc/call_event.c  |    2 +-
 net/rxrpc/call_object.c |    1 +
 net/rxrpc/input.c       |    2 --
 net/rxrpc/output.c      |   19 +++
 net/rxrpc/recvmsg.c     |    2 ++
 7 files changed, 36 insertions(+), 7 deletions(-)
Re: Regression in throughput between kvm guests over virtual bridge
>> This case should be quite similar with pktgen, if you got improvement with
>> pktgen, usually it was also the same for UDP, could you please try to disable
>> tso, gso, gro, ufo on all host tap devices and guest virtio-net devices?
>> Currently the most significant tests would be like this AFAICT:
>>
>> Host->VM     4.12    4.13
>>  TCP:
>>  UDP:
>> pktgen:
>>
>> Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's
>> patch should work since we have seen positive number for that, you can also
>> temporarily skip net-next as well.
>
> Here are the requested numbers, averaged over numerous runs -- guest is
> 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
> pinned to other unique host CPUs.  tso, gso, gro, ufo disabled on host
> taps / guest virtio-net devs as requested:
>
> Host->VM     4.12          4.13
> TCP:         9.92Gb/s      6.44Gb/s
> UDP:         5.77Gb/s      6.63Gb/s
> pktgen:      1572403pps    1904265pps
>
> UDP/pktgen both show improvement from 4.12->4.13.  More interesting,
> however, is that I am seeing the TCP regression for the first time from
> host->VM.  I wonder if the combination of CPU binding + disabling of one
> or more of tso/gso/gro/ufo is related.
>
>> If you see UDP and pktgen are aligned, then it might be helpful to continue
>> the other two cases, otherwise we fail in the first place.
>

I continued running many iterations of these tests between 4.12 and
4.13.  My throughput findings can be summarized as:

VM->VM case:
UDP: roughly equivalent
TCP: Consistent regression (5-10%)

VM->Host
Both UDP and TCP traffic are roughly equivalent.

Host->VM
UDP+pktgen: improvement (5-10%), but inconsistent
TCP: Consistent regression (25-30%)

Host->VM UDP and pktgen seemed to show improvement in some runs, and in
others seemed to mirror 4.12-level performance.

The TCP regression for VM->VM is no surprise, we started with that.
It's still consistent, but smaller in this specific environment.

The TCP regression in Host->VM is interesting because I wasn't seeing it
consistently before binding CPUs + disabling tso/gso/gro/ufo.  Also
interesting because of how large it is -- By any chance can you see this
regression on x86 with the same configuration?
[PATCH net-next 3/3] rxrpc: Fix call expiry handling
Fix call expiry handling in the following ways

 (1) If all the request data from a client call is acked, don't send a
     follow up IDLE ACK with firstPacket == 1 and previousPacket == 0 as
     this appears to fool some servers into thinking everything has been
     accepted.

 (2) Never send an abort back to the server once it has ACK'd all the
     request packets; rather just try to reuse the channel for the next
     call.  The first request DATA packet of the next call on the same
     channel will implicitly ACK the entire reply of the dead call - even
     if we haven't transmitted it yet.

 (3) Don't send RX_CALL_TIMEOUT in an ABORT packet, librx uses abort codes
     to pass local errors to the caller in addition to remote errors, and
     this is meant to be local only.

The following also need to be addressed in future patches:

 (4) Service calls should send PING ACKs as 'keep alives' if the server is
     still processing the call.

 (5) VERSION REPLY packets should be sent to the peers of service
     connections to act as keep-alives.  This is used to keep firewall
     routes in place.  The AFS CM should enable this.
Signed-off-by: David Howells
---

 net/rxrpc/call_event.c |    2 +-
 net/rxrpc/input.c      |    2 --
 net/rxrpc/output.c     |   10 ++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 7a77844aab16..3574508baf9a 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -386,7 +386,7 @@ void rxrpc_process_call(struct work_struct *work)
 
 	now = ktime_get_real();
 	if (ktime_before(call->expire_at, now)) {
-		rxrpc_abort_call("EXP", call, 0, RX_CALL_TIMEOUT, -ETIME);
+		rxrpc_abort_call("EXP", call, 0, RX_USER_ABORT, -ETIME);
 		set_bit(RXRPC_CALL_EV_ABORT, &call->events);
 		goto recheck_state;
 	}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 1e37eb1c0c66..1b592073ec96 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -298,8 +298,6 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
 	write_unlock(&call->state_lock);
 
 	if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) {
-		rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, false, true,
-				  rxrpc_propose_ack_client_tx_end);
 		trace_rxrpc_transmit(call, rxrpc_transmit_await_reply);
 	} else {
 		trace_rxrpc_transmit(call, rxrpc_transmit_end);
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 8ee8b2d4a3eb..f47659c7b224 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -222,6 +222,16 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
 	rxrpc_serial_t serial;
 	int ret;
 
+	/* Don't bother sending aborts for a client call once the server has
+	 * hard-ACK'd all of its request data.  After that point, we're not
+	 * going to stop the operation proceeding, and whilst we might limit
+	 * the reply, it's not worth it if we can send a new call on the same
+	 * channel instead, thereby closing off this call.
+	 */
+	if (rxrpc_is_client_call(call) &&
+	    test_bit(RXRPC_CALL_TX_LAST, &call->flags))
+		return 0;
+
 	spin_lock_bh(&call->lock);
 	if (call->conn)
 		conn = rxrpc_get_connection_maybe(call->conn);
Re: [PATCH net-next] bpf: expose sk_priority through struct bpf_sock_ops
On 11/11/2017 05:06 AM, Alexei Starovoitov wrote:
> On 11/11/17 6:07 AM, Daniel Borkmann wrote:
>> On 11/10/2017 08:17 PM, Vlad Dumitrescu wrote:
>>> From: Vlad Dumitrescu
>>>
>>> Allows BPF_PROG_TYPE_SOCK_OPS programs to read sk_priority.
>>>
>>> Signed-off-by: Vlad Dumitrescu
>>> ---
>>>  include/uapi/linux/bpf.h       |  1 +
>>>  net/core/filter.c              | 11 +++
>>>  tools/include/uapi/linux/bpf.h |  1 +
>>>  3 files changed, 13 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index e880ae6434ee..9757a2002513 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -947,6 +947,7 @@ struct bpf_sock_ops {
>>>  	__u32 local_ip6[4];	/* Stored in network byte order */
>>>  	__u32 remote_port;	/* Stored in network byte order */
>>>  	__u32 local_port;	/* stored in host byte order */
>>> +	__u32 priority;
>>>  };
>>>
>>>  /* List of known BPF sock_ops operators.
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 61c791f9f628..a6329642d047 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -4449,6 +4449,17 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>>>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>>>  				      offsetof(struct sock_common, skc_num));
>>>  		break;
>>> +
>>> +	case offsetof(struct bpf_sock_ops, priority):
>>> +		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_priority) != 4);
>>> +
>>> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>>> +						struct bpf_sock_ops_kern, sk),
>>> +				      si->dst_reg, si->src_reg,
>>> +				      offsetof(struct bpf_sock_ops_kern, sk));
>>> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
>>> +				      offsetof(struct sock, sk_priority));
>>> +		break;
>>
>> Hm, I don't think this would work, I actually think your initial patch
>> was ok. bpf_setsockopt() as well as bpf_getsockopt() check for
>> sk_fullsock(sk) right before accessing options on either socket or TCP
>> level, and bail out with error otherwise; in such cases we'd read
>> something else here and assume it's sk_priority.
>
> even if it's not fullsock, it will just read zero, no? what's a problem
> with that?
> In non-fullsock hooks like BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
> the program author will know that it's meaningless to read sk_priority,
> so returning zero with minimal checks is fine.
> While adding extra runtime if (sk_fullsock(sk)) is unnecessary,
> since the safety is not compromised.

Hm, on my kernel, struct sock has the 4 bytes sk_priority at offset 440,
struct request_sock itself is only 232 byte long in total, and the struct
inet_timewait_sock is 208 byte long, so you'd be accessing out of bounds
that way, so it cannot be ignored and assumed zero.

If we don't care about error when !fullsock, then you could code the
sk_fullsock(sk) check in BPF itself above in the ctx conversion, and
set it to 0 manually when !fullsock. It might make it harder in the future
to change sk_fullsock() itself, but in any case sk_fullsock() helper
should get a comment in its function saying that when contents are changed,
also above BPF bits need to be adjusted to remain an equivalent test.
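The out-of-bounds concern above can be illustrated with a small userspace sketch, assuming made-up structures that share a common header the way the kernel's socket variants do. The sizes and field names are hypothetical; only the pattern (check the socket kind before reading a field that exists only in the full structure, and report 0 otherwise) follows the discussion.

```c
#include <stdint.h>

enum sock_kind { KIND_FULL, KIND_REQUEST, KIND_TIMEWAIT };

/* Header shared by every variant (stands in for struct sock_common). */
struct common_sketch { enum sock_kind kind; };

/* Small variants: there is no priority field at all. */
struct mini_sock_sketch { struct common_sketch c; };

/* Full variant: priority lives far beyond the end of the small variants. */
struct full_sock_sketch {
	struct common_sketch c;
	char pad[400];
	uint32_t priority;
};

static int is_fullsock_sketch(const struct common_sketch *c)
{
	return c->kind == KIND_FULL;
}

/* Reads priority only when the object really is the full variant. */
static uint32_t read_priority_sketch(const struct common_sketch *c)
{
	if (!is_fullsock_sketch(c))
		return 0;	/* never touch memory past a mini socket */
	return ((const struct full_sock_sketch *)c)->priority;
}
```

Without the check, the load for a mini socket would land several hundred bytes past the end of the object, which is the case Daniel describes with offset 440 against a 232 or 208 byte structure.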
Re: [PATCH 1/2] bpf: add a bpf_override_function helper
On 11/11/17 4:14 PM, Ingo Molnar wrote:
>
> * Josef Bacik wrote:
>
>> On Fri, Nov 10, 2017 at 10:34:59AM +0100, Ingo Molnar wrote:
>>>
>>> * Josef Bacik wrote:
>>>
>>>> @@ -551,6 +578,10 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
>>>>  		return &bpf_get_stackid_proto;
>>>>  	case BPF_FUNC_perf_event_read_value:
>>>>  		return &bpf_perf_event_read_value_proto;
>>>> +	case BPF_FUNC_override_return:
>>>> +		pr_warn_ratelimited("%s[%d] is installing a program with bpf_override_return helper that may cause unexpected behavior!",
>>>> +				    current->comm, task_pid_nr(current));
>>>> +		return &bpf_override_return_proto;
>>>
>>> So if this new functionality is used we'll always print this into the
>>> syslog?
>>>
>>> The warning is also a bit passive aggressive about informing the user: what
>>> unexpected behavior can happen, what is the worst case?
>>>
>>
>> It's modeled after the other warnings bpf will spit out, but with this feature
>> you are skipping a function and instead returning some arbitrary value, so
>> anything could go wrong if you mess something up.  For instance I screwed up my
>> initial test case and made every IO submitted return an error instead of just on
>> the one file system I was attempting to test, so all sorts of hilarity ensued.
>
> Ok, then for the x86 bits:
>
>   NAK-ed-by: Ingo Molnar
>
> One of the major advantages of having an in-kernel BPF sandbox is to never
> crash the kernel - and allowing BPF programs to just randomly modify the
> return value of kernel functions sounds immensely broken to me.
>
> (And yes, I realize that kprobes are used here as a vehicle, but the point
> remains.)

yeah. modifying arbitrary function return pushes bpf outside of
its safety guarantees and in that sense doing the same
override_return could be done from a kernel module if kernel
provides the x64 side of the facility introduced by this patch.
On the other side adding parts of this feature to the kernel only
to be used by external kernel module is quite ugly too and not
something that was ever done before.

How about we restrict this bpf_override_return() only to the functions
which callers expect to handle errors ?
We can add something similar to NOKPROBE_SYMBOL(). Like
ALLOW_RETURN_OVERRIDE() and on btrfs side mark the functions
we're going to test with this feature.

Then 'not crashing kernel' requirement will be preserved.
btrfs or whatever else we will be testing with override_return
will be functioning in 'stress test' mode and if bpf program
is not careful and returns error all the time then one particular
subsystem (like btrfs) will not be functional, but the kernel
will not be crashing.
Thoughts?
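The opt-in idea floated above could look roughly like the following userspace sketch. Nothing here is an existing kernel interface: the table, the function names in it and may_override_return_sketch() are all hypothetical, and in the proposal the table would be filled by an ALLOW_RETURN_OVERRIDE()-style annotation rather than written by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical opt-in list; the names are made up for illustration. */
static const char *const override_allowed[] = {
	"btrfs_example_alloc",
	"btrfs_example_submit",
};

/* Returns 1 only for functions that explicitly opted in to error injection. */
static int may_override_return_sketch(const char *func)
{
	size_t i;

	for (i = 0; i < sizeof(override_allowed) / sizeof(override_allowed[0]); i++)
		if (strcmp(func, override_allowed[i]) == 0)
			return 1;
	return 0;
}
```

A check of this kind at program attach time would keep error injection confined to functions whose callers are known to handle failures.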
[PATCH net-next 4/5] ip6_tunnel: process toobig in a better way
The same improvement in "ip6_gre: process toobig in a better way"
is needed by ip4ip6 and ip6ip6 as well.

Note that ip4ip6 and ip6ip6 will also update sk dst pmtu in their
err_handlers. Like I said before, gre6 could not do this as its
inner proto is not certain.

But for all of them, sk dst pmtu will be updated in tx path if
needed.

Signed-off-by: Xin Long
---
 net/ipv6/ip6_tunnel.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index a1f704c..7e9e205 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -498,9 +498,8 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 	err = 0;
 
 	switch (*type) {
-		__u32 teli;
 		struct ipv6_tlv_tnl_enc_lim *tel;
-		__u32 mtu;
+		__u32 mtu, teli;
 	case ICMPV6_DEST_UNREACH:
 		net_dbg_ratelimited("%s: Path to destination invalid or inactive!\n",
 				    t->parms.name);
@@ -531,11 +530,11 @@ ip6_tnl_err(struct sk_buff *skb, __u8 ipproto, struct inet6_skb_parm *opt,
 		}
 		break;
 	case ICMPV6_PKT_TOOBIG:
+		ip6_update_pmtu(skb, net, htonl(*info), 0, 0,
+				sock_net_uid(net, NULL));
 		mtu = *info - offset;
 		if (mtu < IPV6_MIN_MTU)
 			mtu = IPV6_MIN_MTU;
-		t->dev->mtu = mtu;
-
 		len = sizeof(*ipv6h) + ntohs(ipv6h->payload_len);
 		if (len > mtu) {
 			rel_type = ICMPV6_PKT_TOOBIG;
-- 
2.1.0
[PATCH net-next 1/5] ip6_gre: add the process for redirect in ip6gre_err
This patch is to add redirect icmp packet process for ip6gre by
calling ip6_redirect() in ip6gre_err(), as in vti6_err.

Prior to this patch, there's even no route cache generated after
receiving redirect.

Reported-by: Jianlin Shi
Signed-off-by: Xin Long
---
 net/ipv6/ip6_gre.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 3e10c51..0684d0c 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -369,6 +369,7 @@ static void ip6gre_tunnel_uninit(struct net_device *dev)
 static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		       u8 type, u8 code, int offset, __be32 info)
 {
+	struct net *net = dev_net(skb->dev);
 	const struct gre_base_hdr *greh;
 	const struct ipv6hdr *ipv6h;
 	int grehlen = sizeof(*greh);
@@ -442,6 +443,10 @@ static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 			mtu = IPV6_MIN_MTU;
 		t->dev->mtu = mtu;
 		return;
+	case NDISC_REDIRECT:
+		ip6_redirect(skb, net, skb->dev->ifindex, 0,
+			     sock_net_uid(net, NULL));
+		return;
 	}
 
 	if (time_before(jiffies, t->err_time + IP6TUNNEL_ERR_TIMEO))
-- 
2.1.0
[PATCH net-next 2/5] ip6_gre: process toobig in a better way
Now ip6gre processes toobig icmp packet by setting gre dev's mtu in
ip6gre_err, which would cause few things not good:

  - It couldn't set mtu with dev_set_mtu due to it's not in user context,
    which causes route cache and idev->cnf.mtu6 not to be updated.

  - It has to update sk dst pmtu in tx path according to gredev->mtu for
    ip6gre, while it updates pmtu again according to lower dst pmtu in
    ip6_tnl_xmit.

  - To change dev->mtu by toobig icmp packet is not a good idea, it should
    only work on pmtu.

This patch is to process toobig by updating the lower dst's pmtu, as later
sk dst pmtu will be updated in ip6_tnl_xmit, the same way as in ip4gre.

Note that gre dev's mtu will not be updated any more, it doesn't make any
sense to change dev's mtu after receiving a toobig packet.

Signed-off-by: Xin Long
---
 net/ipv6/ip6_gre.c | 15 ++
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 0684d0c..b90bad7 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -403,9 +403,8 @@ static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		return;
 
 	switch (type) {
-		__u32 teli;
 		struct ipv6_tlv_tnl_enc_lim *tel;
-		__u32 mtu;
+		__u32 teli;
 	case ICMPV6_DEST_UNREACH:
 		net_dbg_ratelimited("%s: Path to destination invalid or inactive!\n",
 				    t->parms.name);
@@ -436,12 +435,7 @@ static void ip6gre_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		}
 		return;
 	case ICMPV6_PKT_TOOBIG:
-		mtu = be32_to_cpu(info) - offset - t->tun_hlen;
-		if (t->dev->type == ARPHRD_ETHER)
-			mtu -= ETH_HLEN;
-		if (mtu < IPV6_MIN_MTU)
-			mtu = IPV6_MIN_MTU;
-		t->dev->mtu = mtu;
+		ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
 		return;
 	case NDISC_REDIRECT:
 		ip6_redirect(skb, net, skb->dev->ifindex, 0,
@@ -508,7 +502,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 			       __u32 *pmtu, __be16 proto)
 {
 	struct ip6_tnl *tunnel = netdev_priv(dev);
-	struct dst_entry *dst = skb_dst(skb);
 	__be16 protocol;
 
 	if (dev->type == ARPHRD_ETHER)
@@ -527,10 +520,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 	gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags,
 			 protocol, tunnel->parms.o_key, htonl(tunnel->o_seqno));
 
-	/* TooBig packet may have updated dst->dev's mtu */
-	if (dst && dst_mtu(dst) > dst->dev->mtu)
-		dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu);
-
 	return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu,
 			    NEXTHDR_GRE);
 }
-- 
2.1.0
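The distinction this patch relies on, a fixed device MTU versus a per-destination path MTU that shrinks when a Packet Too Big report arrives, can be sketched as below. The structures and function names are made up for illustration; only the IPV6_MIN_MTU value of 1280 is the real IPv6 minimum link MTU.

```c
#define IPV6_MIN_MTU 1280	/* minimum IPv6 link MTU */

struct dev_sketch   { unsigned int mtu; };	/* administratively configured */
struct route_sketch { unsigned int pmtu; };	/* learned per destination */

/* Handle a Packet Too Big report: only the route's path MTU shrinks. */
static void toobig_sketch(struct route_sketch *rt, unsigned int reported)
{
	if (reported < IPV6_MIN_MTU)
		reported = IPV6_MIN_MTU;
	if (reported < rt->pmtu)
		rt->pmtu = reported;
}

/* The transmit path then uses the smaller of the two values. */
static unsigned int effective_mtu_sketch(const struct dev_sketch *dev,
					 const struct route_sketch *rt)
{
	return rt->pmtu < dev->mtu ? rt->pmtu : dev->mtu;
}
```

The device MTU is never written by the error handler, so other destinations reached through the same tunnel device are unaffected by one path's report.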
Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
From: "Jason A. Donenfeld"
Date: Thu, 9 Nov 2017 13:04:44 +0900

> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>  		return 0;
>  	}
>  
> -	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> -	if (!nlh)
> +	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> +			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> +	if (WARN_ON(!nlh))
>  		goto errout_skb;

If you're handling this by forcing another read() to produce the
NLMSG_DONE, then you have no reason to WARN_ON() here.

In fact you are adding a WARN_ON() which is trivially triggerable by
any user.
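The "another read() produces the NLMSG_DONE" approach mentioned in the reply can be sketched as a small state machine. This is an assumption-laden illustration, not the patch itself: the record and trailer sizes are arbitrary, and dump_state_sketch only borrows the dump_done_errno name from the quoted hunk.

```c
#include <stddef.h>

#define ITEM_SIZE 8	/* assumed size of one dumped record */
#define DONE_SIZE 4	/* assumed size of the final DONE record */

struct dump_state_sketch {
	int items_left;		/* records still to emit */
	int dump_done_errno;	/* > 0 while the dump is still running */
	int done_sent;		/* set once the DONE record went out */
};

/* Fill one read() worth of buffer; returns the number of bytes used. */
static size_t dump_once_sketch(struct dump_state_sketch *st, size_t room)
{
	size_t used = 0;

	while (st->items_left > 0 && room - used >= ITEM_SIZE) {
		used += ITEM_SIZE;
		st->items_left--;
	}
	if (st->items_left == 0)
		st->dump_done_errno = 0;	/* finished, final status recorded */

	/* Append DONE only when it fits; otherwise the next call emits it. */
	if (st->dump_done_errno <= 0 && !st->done_sent &&
	    room - used >= DONE_SIZE) {
		used += DONE_SIZE;
		st->done_sent = 1;
	}
	return used;
}
```

Because the final status is kept in the state between calls, a full buffer simply defers the DONE record instead of failing, which is why a warning on that path would be reachable by any user with a suitably sized buffer.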
Re: [PATCH 1/2] bpf: add a bpf_override_function helper
* Josef Bacik wrote:

> On Fri, Nov 10, 2017 at 10:34:59AM +0100, Ingo Molnar wrote:
> >
> > * Josef Bacik wrote:
> >
> > > @@ -551,6 +578,10 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
> > >  		return &bpf_get_stackid_proto;
> > >  	case BPF_FUNC_perf_event_read_value:
> > >  		return &bpf_perf_event_read_value_proto;
> > > +	case BPF_FUNC_override_return:
> > > +		pr_warn_ratelimited("%s[%d] is installing a program with bpf_override_return helper that may cause unexpected behavior!",
> > > +				    current->comm, task_pid_nr(current));
> > > +		return &bpf_override_return_proto;
> >
> > So if this new functionality is used we'll always print this into the
> > syslog?
> >
> > The warning is also a bit passive aggressive about informing the user: what
> > unexpected behavior can happen, what is the worst case?
> >
>
> It's modeled after the other warnings bpf will spit out, but with this feature
> you are skipping a function and instead returning some arbitrary value, so
> anything could go wrong if you mess something up.  For instance I screwed up my
> initial test case and made every IO submitted return an error instead of just on
> the one file system I was attempting to test, so all sorts of hilarity ensued.

Ok, then for the x86 bits:

  NAK-ed-by: Ingo Molnar

One of the major advantages of having an in-kernel BPF sandbox is to never
crash the kernel - and allowing BPF programs to just randomly modify the
return value of kernel functions sounds immensely broken to me.

(And yes, I realize that kprobes are used here as a vehicle, but the point
remains.)

Thanks,

	Ingo
Re: [PATCH 0/2][v5] Add the ability to do BPF directed error injection
* David Miller wrote:

> From: Josef Bacik
> Date: Tue, 7 Nov 2017 15:28:41 -0500
>
> > I'm sending this through Dave since it'll conflict with other BPF changes in his
> > tree, but since it touches tracing as well Dave would like a review from
> > somebody on the tracing side.
>  ...
> > A lot of our error paths are not well tested because we have no good way of
> > injecting errors generically.  Some subystems (block, memory) have ways to
> > inject errors, but they are random so it's hard to get reproduceable results.
> >
> > With BPF we can add determinism to our error injection.  We can use kprobes and
> > other things to verify we are injecting errors at the exact case we are trying
> > to test.  This patch gives us the tool to actual do the error injection part.
> > It is very simple, we just set the return value of the pt_regs we're given to
> > whatever we provide, and then override the PC with a dummy function that simply
> > returns.
> >
> > Right now this only works on x86, but it would be simple enough to expand to
> > other architectures.  Thanks,
>
> Series applied, thanks Josef.

Please don't apply it yet as the series is still under active discussion - for now
I'm NAK-ing the x86 bits because I have second thoughts about the whole premise of
the feature being added here.

Thanks,

	Ingo
[net-next:master 622/639] net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   bee955cd3ab4f1a1eb8fc16e7ed69364143df8d7
commit: 2ea7a679ca2abd251c1ec03f20508619707e1749 [622/639] net: dsa: Don't add vlans when vlan filtering is disabled
config: x86_64-randconfig-s2-1208 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        git checkout 2ea7a679ca2abd251c1ec03f20508619707e1749
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   net/dsa/port.o: In function `dsa_port_vlan_add':
>> net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
   net/dsa/port.o: In function `dsa_port_vlan_del':
   net/dsa/port.c:270: undefined reference to `br_vlan_enabled'

vim +255 net/dsa/port.c

   243	
   244	int dsa_port_vlan_add(struct dsa_port *dp,
   245			      const struct switchdev_obj_port_vlan *vlan,
   246			      struct switchdev_trans *trans)
   247	{
   248		struct dsa_notifier_vlan_info info = {
   249			.sw_index = dp->ds->index,
   250			.port = dp->index,
   251			.trans = trans,
   252			.vlan = vlan,
   253		};
   254	
 > 255		if (br_vlan_enabled(dp->bridge_dev))
   256			return dsa_port_notify(dp, DSA_NOTIFIER_VLAN_ADD, &info);
   257	
   258		return 0;
   259	}
   260	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
Re: [PATCH 0/2][v5] Add the ability to do BPF directed error injection
From: Ingo Molnar
Date: Sat, 11 Nov 2017 09:16:00 +0100

> Please don't apply it yet as the series is still under active
> discussion - for now

Fine, reverted.
Re: [net-next:master 622/639] net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
From: kbuild test robot
Date: Sat, 11 Nov 2017 16:57:08 +0800

> All errors (new ones prefixed by >>):
>
>    net/dsa/port.o: In function `dsa_port_vlan_add':
>>> net/dsa/port.c:255: undefined reference to `br_vlan_enabled'
>    net/dsa/port.o: In function `dsa_port_vlan_del':
>    net/dsa/port.c:270: undefined reference to `br_vlan_enabled'

Problem is NET_DSA=y and BRIDGE_VLAN_FILTERING=m

We need some Kconfig dependency foo to prevent this.
Re: [PATCH] net: mvneta: fix handling of the Tx descriptor counter
From: Simon Guinot
Date: Wed, 8 Nov 2017 17:58:35 +0100

> @@ -2413,8 +2416,7 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
>  	if (txq->count >= txq->tx_stop_threshold)
>  		netif_tx_stop_queue(nq);
>
> -	if (!skb->xmit_more || netif_xmit_stopped(nq) ||
> -	    txq->pending + frags > MVNETA_TXQ_DEC_SENT_MASK)
> +	if (!skb->xmit_more || netif_xmit_stopped(nq))
>  		mvneta_txq_pend_desc_add(pp, txq, frags);
>  	else
>  		txq->pending += frags;

As David Laight said, you should not allow unlimited amounts of
->xmit_more frames to be queued without a TX doorbell update.

Therefore, please keep some kind of limit here, otherwise latency
will spike in some circumstances.
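
The point about bounding ->xmit_more batching can be sketched in plain userspace C. This is only an illustration, not the mvneta code: the struct, the helper names and the MAX_PENDING_DESCS value are hypothetical stand-ins for the driver's queue state and its MVNETA_TXQ_DEC_SENT_MASK limit.

```c
#include <stdbool.h>

/* Hypothetical cap; the driver's own limit is MVNETA_TXQ_DEC_SENT_MASK. */
#define MAX_PENDING_DESCS 255

struct txq_sketch {
	unsigned int pending;   /* descriptors queued but not yet announced */
	unsigned int doorbells; /* number of doorbell updates issued */
};

/* Stand-in for the register write that announces descriptors to hardware. */
void ring_doorbell(struct txq_sketch *q, unsigned int frags)
{
	q->pending += frags;
	q->doorbells++;
	q->pending = 0;
}

/* Defer the doorbell while more frames are coming, but never past the cap. */
void queue_frame(struct txq_sketch *q, unsigned int frags,
		 bool xmit_more, bool queue_stopped)
{
	if (!xmit_more || queue_stopped ||
	    q->pending + frags > MAX_PENDING_DESCS)
		ring_doorbell(q, frags);
	else
		q->pending += frags;
}
```

With the third condition in place, a long run of xmit_more frames still triggers a doorbell once the cap would be exceeded, so the number of unannounced descriptors stays bounded.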
Re: [PATCH net-next] ibmvnic: Add vnic client data to login buffer
From: Nathan Fontenot
Date: Wed, 08 Nov 2017 11:23:56 -0600

> Update the login buffer to include client data for the vnic driver,
> this includes the OS name, LPAR name, and device name. This update
> alolows thius information to be available in the VIOS.
  ^^^ ^

So many typos...

> Signed-off-by: Nathan Fontenot

Applied, tanks.
Re: [PATCH] net: ethernet: bgmac: mark expected switch fall-through
From: "Gustavo A. R. Silva"
Date: Wed, 8 Nov 2017 11:24:57 -0600

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 1397972
> Signed-off-by: Gustavo A. R. Silva

Applied to net-next.
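
For readers unfamiliar with this series of patches, here is a minimal sketch of what "marking" a fall-through looks like. The function is hypothetical and unrelated to bgmac. It assumes the comment form that GCC's -Wimplicit-fallthrough accepts at its default level.

```c
/* Hypothetical example: count the setup steps that run from a given stage. */
int steps_from(int stage)
{
	int steps = 0;

	switch (stage) {
	case 0:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 2:
		steps++;
		break;
	default:
		break;
	}

	return steps;
}
```

The comment tells both the compiler and the reader that the missing break is intentional.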
Re: [PATCH] fsl/fman_port: mark expected switch fall-throughs
From: "Gustavo A. R. Silva"
Date: Wed, 8 Nov 2017 11:57:13 -0600

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 1397960
> Signed-off-by: Gustavo A. R. Silva

Applied to net-next.
Re: [PATCH net-next 0/2] remove FACK loss recovery
From: Yuchung Cheng
Date: Wed, 8 Nov 2017 13:01:25 -0800

> This patch set removes the forward-acknowledgment (FACK)
> packet-based loss and reordering detection. This simplifies TCP
> loss recovery since the SACK scoreboard no longer needs to track
> the number of pending packets under highest SACKed sequence. FACK
> is subsumed by the time-based RACK loss detection which is more
> robust under reordering and second order losses.

Series applied, thank you.
Re: [PATCHv2 net-next 1/3] ip_gre: Refector the erpsan tunnel code.
From: William Tu
Date: Wed, 8 Nov 2017 16:13:02 -0800

> +static void erspan_build_header(struct sk_buff *skb,
> +				__be32 id, u32 index,
> +				bool truncate, bool is_ipv4)
> +{

Please do not put large non-inline functions into header files.
Re: [PATCH net-next] l2tp: don't close sessions in l2tp_tunnel_destruct()
From: Guillaume Nault
Date: Thu, 9 Nov 2017 08:29:52 +0900

> Sessions are already removed by the proto ->destroy() handlers, and
> since commit f3c66d4e144a ("l2tp: prevent creation of sessions on
> terminated tunnels"), we're guaranteed that no new session can be
> created afterwards.
>
> Furthermore, l2tp_tunnel_closeall() can sleep when there are sessions
> left to close. So we really shouldn't call it in a ->sk_destruct()
> handler, as it can be used from atomic context.
>
> Signed-off-by: Guillaume Nault

Applied, thank you.
Re: [PATCH] net: wan: x25_asy: mark expected switch fall-through
From: "Gustavo A. R. Silva"
Date: Wed, 8 Nov 2017 22:25:08 -0600

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 114928
> Signed-off-by: Gustavo A. R. Silva

Applied.
Re: [PATCH] net: decnet: dn_table: mark expected switch fall-through
From: "Gustavo A. R. Silva"
Date: Wed, 8 Nov 2017 21:38:28 -0600

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 115106
> Signed-off-by: Gustavo A. R. Silva

Applied.
Re: [PATCH] net: 3com: 3c574_cs: mark expected switch fall-through
From: "Gustavo A. R. Silva"
Date: Wed, 8 Nov 2017 21:49:33 -0600

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 114888
> Signed-off-by: Gustavo A. R. Silva

Applied.
Re: [PATCH] net: 8390: pcnet_cs: mark expected switch fall-through
From: "Gustavo A. R. Silva"
Date: Wed, 8 Nov 2017 21:44:38 -0600

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
>
> Addresses-Coverity-ID: 114891
> Signed-off-by: Gustavo A. R. Silva

Applied.
Re: [PATCH] net: sfc: remove redundant variable start
From: Colin King
Date: Thu, 9 Nov 2017 08:01:22 +

> From: Colin Ian King
>
> Variable start is assigned but never read hence it is redundant
> and can be removed. Cleans up clang warning:
>
> drivers/net/ethernet/sfc/ptp.c:655:2: warning: Value stored to 'start'
> is never read
>
> Signed-off-by: Colin Ian King

Applied.
Re: [PATCH] qlge: remove duplicated assignment to mbcp
From: Colin King
Date: Thu, 9 Nov 2017 07:52:15 +

> From: Colin Ian King
>
> The assignment to mbcp is identical to the initialized value assigned
> to mbcp at declaration time a few lines earlier, hence we can remove the
> second redundant assignment. Cleans up clang warning:
>
> drivers/net/ethernet/qlogic/qlge/qlge_mpi.c:209:22: warning:
> Value stored to 'mbcp' during its initialization is never read
>
> Signed-off-by: Colin Ian King

Applied, thanks.
Re: [PATCH] sock: Remove the global prot_inuse counter.
From: Tonghao Zhang
Date: Thu, 9 Nov 2017 00:03:15 -0800

> The per-cpu counter for init_net is prepared in core_initcall.
> The patch 7d720c3e ("percpu: add __percpu sparse annotations to net")
> and d6d9ca0fe ("net: this_cpu_xxx conversions") optimize the
> routines. Then remove the old counter.
>
> Cc: Pavel Emelyanov
> Signed-off-by: Tonghao Zhang

Yeah, why did we keep this global counter around :-)

Applied to net-next, thanks.
Re: [PATCH net-next] net: thunderbolt: Clear finished Tx frame bus address in tbnet_tx_callback()
From: Mika Westerberg
Date: Thu, 9 Nov 2017 13:46:28 +0300

> When Thunderbolt network interface is disabled or when the cable is
> unplugged the driver releases all allocated buffers by calling
> tbnet_free_buffers() for each ring. This function then calls
> dma_unmap_page() for each buffer it finds where bus address is non-zero.
> Now, we only clear this bus address when the Tx buffer is sent to the
> hardware so it is possible that the function finds an entry that has
> already been unmapped.
>
> Enabling DMA-API debugging catches this as well:
>
>   thunderbolt :06:00.0: DMA-API: device driver tries to free DMA
>     memory it has not allocated [device address=0x68321000]
>     [size=4096 bytes]
>
> Fix this by clearing the bus address of a Tx frame right after we have
> unmapped the buffer.
>
> Signed-off-by: Mika Westerberg

Applied, but assuming zero is a non-valid DMA address is never a good
idea. That's why we have the DMA error code signaling abstracted.
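
A rough userspace sketch of both points follows: the mapping state is reset right after the unmap, and it is tracked with an explicit flag rather than by treating address zero as "unmapped". All names here are hypothetical, and unmap_calls merely stands in for dma_unmap_page(). In the kernel itself, a failed mapping is detected with dma_mapping_error() rather than by comparing the address against zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-frame bookkeeping, not the thunderbolt driver's struct. */
struct frame_sketch {
	uint64_t bus_addr; /* may legitimately be 0 on some platforms */
	bool mapped;       /* explicit state instead of "bus_addr != 0" */
};

unsigned int unmap_calls; /* counts stand-in dma_unmap_page() calls */

void unmap_frame(struct frame_sketch *f)
{
	if (!f->mapped)
		return;

	unmap_calls++;     /* stand-in for dma_unmap_page() */
	f->mapped = false; /* cleared immediately, so a later sweep skips it */
}
```

A frame whose bus address happens to be zero is still unmapped exactly once, and a second sweep over the ring does nothing.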
Re: [PATCH] netdev: add netdev_pagefrag_enabled sysctl
From: Hongbo Li
Date: Thu, 9 Nov 2017 16:12:27 +0800

> From: Hongbo Li
>
> This patch solves a memory frag issue when allocating skb.
> I found this issue in a udp scenario, here is my test model:
> 1. About five hundreds udp threads listen on server,
>    and five hundreds client threads send udp pkts to them.
>    Some threads send pkts in a faster speed than others.
> 2. The user processes on server don't have enough ability
>    to receive these pkts.
>
> Then I got following result:
> 1. Some udp sockets' recv-q reach the queue's limit, others
>    not because of the global rmem limit.
> 2. The "free" command shows "used" memory is more than 62GB.
>    But cat /proc/net/sockstat shows that udp uses only 12GB.
>
> This will confused the user that why the system consumes so
> many memory. This is caused by the memory frags in netdev layer.
> __netdev_alloc_frag() allocs a page block which has 8 pages.
>
> Then in this scenario, most skbs are freed when the recv-q
> is full, but if any skb in the same page block be queued to
> other recv-q which is not full, the whole page block can't
> be freed.
>
> So from the view of kernel, these pages are used, but from
> the view of tcp/udp, only the skbs in recv-q are used.
>
> To avoid exhausting memory in such scenario, I add a sysctl
> to make user can disable allocating skbs in page frag.
>
> Signed-off-by: Hongbo Li

When something like page fragments don't work properly, we fix them
rather than providing a way to disable them.

Thank you.
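
The pinning effect described in the quoted report can be modelled with a reference count. This is a simplified sketch with hypothetical names, not the __netdev_alloc_frag() implementation: each fragment carved from a block holds one reference, and the block is only released when the last one goes away.

```c
#include <stdbool.h>

/* Hypothetical model of a multi-page block shared by several skb fragments. */
struct page_block_sketch {
	unsigned int refcount; /* one reference per live fragment */
	bool freed;            /* set once the whole block is released */
};

void frag_alloc(struct page_block_sketch *b)
{
	b->refcount++;
}

void frag_free(struct page_block_sketch *b)
{
	if (--b->refcount == 0)
		b->freed = true;
}
```

If four fragments come from one block and three are freed, the block stays allocated. That is why memory seen as "used" by the kernel can far exceed what the sockets account for.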
Re: [PATCH net-next] net: thunderx: fix double free error
From: Aleksey Makarov
Date: Thu, 9 Nov 2017 14:58:57 +0300

> This patch fixes an error in memory allocation/freeing in
> ThunderX PF driver.
>
> I moved the allocation to the probe() function and made it managed.
>
> From the Colin's email:
>
> While running static analysis on linux-next with CoverityScan I found 3
> double free errors in the Cavium thunder driver.
>
> The issue occurs on the err_disable_device: label of function nic_probe
> when nic_free_lmacmem(nic) is called and a double free occurs on
> nic->duplex, nic->link and nic->speed. This occurs when nic_init_hw()
> fails:
>
>         /* Initialize hardware */
>         err = nic_init_hw(nic);
>         if (err)
>                 goto err_release_regions;
>
> nic_init_hw() calls nic_get_hw_info() and this calls nic_free_lmacmem()
> if any of the allocations fail. This free'ing occurs again by the call
> to nic_free_lmacmem() on the err_release_regions exit path in nic_probe().
>
> Reported-by: Colin Ian King
> Signed-off-by: Aleksey Makarov

Applied, thank you.
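
To make the double-free pattern concrete, here is a userspace sketch of a cleanup helper that is safe to call from both the inner failure path and the outer error label. Note that this is not the approach the patch takes (the patch moves the allocation into probe() and makes it managed). It shows the alternative of an idempotent free, with hypothetical names throughout.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-LMAC arrays mentioned in the report. */
struct nic_sketch {
	int *speed;
	int *duplex;
};

unsigned int buffers_freed; /* counts real frees, for illustration only */

/* Idempotent cleanup: a second call finds NULL pointers and does nothing. */
void nic_free_lmacmem_sketch(struct nic_sketch *nic)
{
	if (nic->speed) {
		free(nic->speed);
		nic->speed = NULL;
		buffers_freed++;
	}
	if (nic->duplex) {
		free(nic->duplex);
		nic->duplex = NULL;
		buffers_freed++;
	}
}
```

Managed allocation avoids the problem differently: the error paths no longer free these buffers by hand at all.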
Re: [PATCH] ipvlan: fix ipv6 outbound device
From:
Date: Thu, 9 Nov 2017 20:09:31 +0800

> From: Keefe Liu
>
> When process the outbound packet of ipv6, we should assign the master
> device to output device other than input device.
>
> Signed-off-by: Keefe Liu

Applied.
Re: pull-request: net-next: ieee802154 2017-11-09
From: Stefan Schmidt
Date: Thu, 9 Nov 2017 18:12:49 +0100

> A small update on ieee802154 patches for net-next. Nothing dramatic, but
> simply housekeeping this time around.
> A fix for the correct mask to be applied in the mrf24j40 driver by
> Gustavo A. R. Silva
> Removal of a non existing email user for the ca8210 driver by Harry Morris
> A bunch of checkpatch cleanups across the subsystem from myself
>
> Please pull, thanks a lot!

Pulled, thanks.
Re: [PATCH net-next] bindings: net: stmmac: correctify note about LPI interrupt
From: Niklas Cassel
Date: Thu, 9 Nov 2017 18:09:26 +0100

> There are two different combined signal for various interrupt events:
> In EQOS-CORE and EQOS-MTL configurations, mci_intr_o is the interrupt
> signal.
> In EQOS-DMA, EQOS-AHB and EQOS-AXI configurations, these interrupt events
> are combined with the events in the DMA on the sbd_intr_o signal.
>
> Depending on configuration, the device tree irq "macirq" will refer to
> either mci_intr_o or sbd_intr_o.
>
> The databook states:
> "The MAC generates the LPI interrupt when the Tx or Rx side enters or exits
> the LPI state. The interrupt mci_intr_o (sbd_intr_o in certain
> configurations) is asserted when the LPI interrupt status is set.
>
> When the MAC exits the Rx LPI state, then in addition to the mci_intr_o
> (sbd_intr_o in certain configurations), the sideband signal lpi_intr_o is
> asserted.
>
> If you do not want to gate-off the application clock during the Rx LPI
> state, you can leave the lpi_intr_o signal unconnected and use the
> mci_intr_o (sbd_intr_o in certain configurations) signal to detect Rx LPI
> exit."
>
> Since the "macirq" is always raised when Tx or Rx enters/exits the LPI
> state, "eth_lpi" must therefore refer to lpi_intr_o, which is only raised
> when Rx exits the LPI state. Update the DT binding description to reflect
> reality.
>
> Signed-off-by: Niklas Cassel

Applied.
Re: [PATCH v3 net-next 0/6] mv88e6xxx broadcast flooding in hardware
From: Andrew Lunn
Date: Thu, 9 Nov 2017 22:29:50 +0100

> This patchset makes the mv88e6xxx driver perform flooding in hardware,
> rather than let the software bridge perform the flooding. This is a
> prerequisite for IGMP snooping on the bridge interface.
>
> In order to make hardware broadcasting work, a few other issues need
> fixing or improving. SWITCHDEV_ATTR_ID_PORT_PARENT_ID is broken, which
> is apparent when testing on the ZII devel board with multiple
> switches.
>
> Some of these patches are taken from a previous RFC patchset of IGMP
> support.
>
> Rebased onto net-next, with fixup for Vivien's refactoring.

Series applied, thanks Andrew.
Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix stats histogram mode
From: Andrew Lunn
Date: Fri, 10 Nov 2017 00:36:41 +0100

> The statistics histogram mode was not being explicitly initialized on
> devices other than the 6390 family. Clearing the statistics then
> overwrote the default setting, setting the histogram to a reserved
> mode.
>
> Explicitly set the histogram mode for all devices. Change the
> statistics clear into a read/modify/write, and since it is now more
> complex, move it into global1.c.
>
> Signed-off-by: Andrew Lunn

Applied, thanks.
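
The read/modify/write idea from the quoted description can be sketched as a pure function on a 16-bit register value. The bit layout below is entirely hypothetical and is not taken from the mv88e6xxx register definitions. It only shows how the histogram field is carried over instead of being overwritten by the clear command.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define STATS_OP_BUSY      0x8000 /* start the operation */
#define STATS_OP_FLUSH_ALL 0x1000 /* "clear all counters" opcode */
#define STATS_OP_HIST_MASK 0x0c00 /* histogram mode field */

/*
 * Build the value to write for a statistics clear: keep whatever histogram
 * mode is currently programmed, then add the opcode and the busy bit.
 */
uint16_t stats_clear_value(uint16_t current)
{
	uint16_t val = current & STATS_OP_HIST_MASK;

	return val | STATS_OP_BUSY | STATS_OP_FLUSH_ALL;
}
```

A plain write of the opcode alone would zero the histogram field, which is how a clear could end up selecting a reserved mode.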
Re: [Patch net] vlan: fix a use-after-free in vlan_device_event()
From: Cong Wang
Date: Thu, 9 Nov 2017 16:43:13 -0800

> After refcnt reaches zero, vlan_vid_del() could free
> dev->vlan_info via RCU:
>
>         RCU_INIT_POINTER(dev->vlan_info, NULL);
>         call_rcu(&vlan_info->rcu, vlan_info_rcu_free);
>
> However, the pointer 'grp' still points to that memory
> since it is set before vlan_vid_del():
>
>         vlan_info = rtnl_dereference(dev->vlan_info);
>         if (!vlan_info)
>                 goto out;
>         grp = &vlan_info->grp;
>
> Depends on when that RCU callback is scheduled, we could
> trigger a use-after-free in vlan_group_for_each_dev()
> right following this vlan_vid_del().
>
> Fix it by moving vlan_vid_del() before setting grp. This
> is also symmetric to the vlan_vid_add() we call in
> vlan_device_event().
>
> Reported-by: Fengguang Wu
> Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
> Cc: Alexander Duyck
> Cc: Linus Torvalds
> Cc: Girish Moodalbail
> Signed-off-by: Cong Wang

Applied and queued up for -stable, thanks Cong!
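
The ordering fix can be illustrated with a small userspace model. The names are hypothetical and free() stands in for the RCU-deferred release, so this is only a sketch of the idea: do the delete first, then look the pointer up again, so a freed structure is never dereferenced.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct vlan_info_sketch {
	unsigned int nr_vids;
	int grp;
};

struct dev_sketch {
	struct vlan_info_sketch *vlan_info;
};

/* Drops one VID; frees the info when the last VID goes away. */
void vid_del_sketch(struct dev_sketch *dev)
{
	if (dev->vlan_info && --dev->vlan_info->nr_vids == 0) {
		free(dev->vlan_info); /* the kernel defers this via call_rcu() */
		dev->vlan_info = NULL;
	}
}

/* Delete first, then fetch the group pointer from the current state. */
int *grp_after_del_sketch(struct dev_sketch *dev)
{
	vid_del_sketch(dev);

	if (!dev->vlan_info)
		return NULL;

	return &dev->vlan_info->grp;
}
```

Taking the pointer before the delete would leave it dangling whenever the delete drops the last reference.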