[PATCH] rtl_bt: Update firmware for BT part of rtl8822be
These files were supplied by Realtek.

Signed-off-by: Larry Finger
---
 WHENCE                     |   3 ++-
 rtl_bt/rtl8822b_config.bin | Bin 32 -> 14 bytes
 rtl_bt/rtl8822b_fw.bin     | Bin 51756 -> 51176 bytes
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/WHENCE b/WHENCE
index dcd5011..1ac6fd0 100644
--- a/WHENCE
+++ b/WHENCE
@@ -2859,7 +2859,8 @@ Licence: Redistributable. See LICENCE.rtlwifi_firmware.txt for details.
 Found in vendor driver, linux_bt_usb_2.11.20140423_8723be.rar
 From https://github.com/troy-tan/driver_store
 
-Files rtl_bt/rtl8822b_* came directly from Realtek.
+Files rtl_bt/rtl8822b_* came directly from Realtek. These files are
+updated on April 14, 2017.
 
 --
 
diff --git a/rtl_bt/rtl8822b_config.bin b/rtl_bt/rtl8822b_config.bin
index a691e7ca258b0e7dc4ff2bdbdc1d13f2a613526b..b00270edca74c0bcd0234ceb8fe313a61ee28416 100644
GIT binary patch
literal 14
[base85 binary payload for rtl8822b_config.bin and rtl8822b_fw.bin not reproduced]
Re: [PATCH 02/22] nvmet: Make use of the new sg_map helper function
On Thu, Apr 13, 2017 at 11:06:16PM -0600, Logan Gunthorpe wrote:
> Or maybe I'll just send a patch for that
> separately seeing it doesn't depend on anything and is pretty simple. I
> can do that next week.

Yes, please just send that patch to linux-nvme, we should be able to get
it into 4.12.
Re: [PATCH 02/22] nvmet: Make use of the new sg_map helper function
On 13/04/17 10:59 PM, Christoph Hellwig wrote:
> On Thu, Apr 13, 2017 at 04:05:15PM -0600, Logan Gunthorpe wrote:
>> This is a straight forward conversion in two places. Should kmap fail,
>> the code will return an INVALD_DATA error in the completion.
>
> It really should be using nvmet_copy_from_sgl to make things safer,
> as we don't want to rely on any particular SG list layout. In fact
> I'm pretty sure I did the conversion at some point, but it must never
> have made it upstream.

Ha, I did the conversion too a couple times for my RFC series. I can
change this patch to do that. Or maybe I'll just send a patch for that
separately seeing it doesn't depend on anything and is pretty simple. I
can do that next week.

Thanks,

Logan
[PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation
RmNet driver provides a transport agnostic MAP (multiplexing and
aggregation protocol) support in embedded module. Module provides
virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.

Signed-off-by: Subash Abhinov Kasiviswanathan
---
 Documentation/networking/rmnet.txt    |  83 +
 drivers/net/Kconfig                   |   2 +
 drivers/net/Makefile                  |   1 +
 drivers/net/rmnet/Kconfig             |  23 ++
 drivers/net/rmnet/Makefile            |  14 +
 drivers/net/rmnet/rmnet_config.c      | 592 ++
 drivers/net/rmnet/rmnet_config.h      |  79 +
 drivers/net/rmnet/rmnet_handlers.c    | 517 +
 drivers/net/rmnet/rmnet_handlers.h    |  24 ++
 drivers/net/rmnet/rmnet_main.c        |  52 +++
 drivers/net/rmnet/rmnet_map.h         | 100 ++
 drivers/net/rmnet/rmnet_map_command.c | 180 +++
 drivers/net/rmnet/rmnet_map_data.c    | 145 +
 drivers/net/rmnet/rmnet_private.h     |  76 +
 drivers/net/rmnet/rmnet_stats.c       |  86 +
 drivers/net/rmnet/rmnet_stats.h       |  61
 drivers/net/rmnet/rmnet_vnd.c         | 353
 drivers/net/rmnet/rmnet_vnd.h         |  34 ++
 include/uapi/linux/Kbuild             |   1 +
 include/uapi/linux/if_arp.h           |   1 +
 include/uapi/linux/if_ether.h         |   4 +-
 include/uapi/linux/rmnet.h            |  34 ++
 22 files changed, 2461 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/rmnet/Kconfig
 create mode 100644 drivers/net/rmnet/Makefile
 create mode 100644 drivers/net/rmnet/rmnet_config.c
 create mode 100644 drivers/net/rmnet/rmnet_config.h
 create mode 100644 drivers/net/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/rmnet/rmnet_main.c
 create mode 100644 drivers/net/rmnet/rmnet_map.h
 create mode 100644 drivers/net/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/rmnet/rmnet_private.h
 create mode 100644 drivers/net/rmnet/rmnet_stats.c
 create mode 100644 drivers/net/rmnet/rmnet_stats.h
 create mode 100644 drivers/net/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/rmnet/rmnet_vnd.h
 create mode 100644 include/uapi/linux/rmnet.h

diff --git a/Documentation/networking/rmnet.txt b/Documentation/networking/rmnet.txt
new file mode 100644
index 000..58d3ea2
--- /dev/null
+++ b/Documentation/networking/rmnet.txt
@@ -0,0 +1,83 @@
+1. Introduction
+
+rmnet driver is used for supporting the Multiplexing and aggregation
+Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm
+Technologies, Inc. modems.
+
+This driver can be used to register onto any physical network device in
+IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
+
+Multiplexing allows for creation of logical netdevices (rmnet devices) to
+handle multiple private data networks (PDN) like a default internet, tethering,
+multimedia messaging service (MMS) or IP media subsystem (IMS). Hardware sends
+packets with MAP headers to rmnet. Based on the multiplexer id, rmnet
+routes to the appropriate PDN after removing the MAP header.
+
+Aggregation is required to achieve high data rates. This involves hardware
+sending aggregated bunch of MAP frames. rmnet driver will de-aggregate
+these MAP frames and send them to appropriate PDN's.
+
+2. Packet format
+
+a. MAP packet (data / control)
+
+MAP header has the same endianness of the IP packet.
+
+Packet format -
+
+Bit             0             1           2-7      8 - 15           16 - 31
+Function   Command / Data   Reserved     Pad   Multiplexer ID    Payload length
+Bit            32 - x
+Function     Raw  Bytes
+
+Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command
+or data packet. Control packet is used for transport level flow control. Data
+packets are standard IP packets.
+
+Reserved bits are usually zeroed out and to be ignored by receiver.
+
+Padding is number of bytes to be added for 4 byte alignment if required by
+hardware.
+
+Multiplexer ID is to indicate the PDN on which data has to be sent.
+
+Payload length includes the padding length but does not include MAP header
+length.
+
+b. MAP packet (command specific)
+
+Bit             0             1           2-7      8 - 15           16 - 31
+Function   Command         Reserved     Pad   Multiplexer ID    Payload length
+Bit          32 - 39        40 - 45    46 - 47       48 - 63
+Function   Command name    Reserved   Command Type   Reserved
+Bit          64 - 95
+Function   Transaction ID
+Bit          96 - 127
+Function   Command data
+
+Command 1 indicates disabling flow while 2 is enabling flow
+
+Command types -
+0 for MAP command request
+1 is to
[PATCH net-next 0/1 v3] drivers: net: Add support for rmnet driver
This patch adds support for the rmnet_data driver which is required to
support recent chipsets using Qualcomm Technologies, Inc. modems. The data
from hardware follows the multiplexing and aggregation protocol (MAP).

This driver can be used to register onto any physical network device in
IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.

rmnet_data driver helps to decode these packets and queue them to network
stack (and encode and transmit it to the physical device).

--
v1: Same as the RFC patch with some minor fixes for issues reported by
kbuild test robot.

v1->v2: Change datatypes and remove config IOCTL as mentioned by David.
Also fix checkpatch issues and remove some unused code.

v2->v3: Move location to drivers/net and rename to rmnet. Change the
userspace - netlink communication from custom netlink to rtnl_link_ops.
Refactor some code. Use a fixed config for ingress and egress.

Subash Abhinov Kasiviswanathan (1):
  drivers: net: rmnet: Initial implementation

 Documentation/networking/rmnet.txt    |  83 +
 drivers/net/Kconfig                   |   2 +
 drivers/net/Makefile                  |   1 +
 drivers/net/rmnet/Kconfig             |  23 ++
 drivers/net/rmnet/Makefile            |  14 +
 drivers/net/rmnet/rmnet_config.c      | 592 ++
 drivers/net/rmnet/rmnet_config.h      |  79 +
 drivers/net/rmnet/rmnet_handlers.c    | 517 +
 drivers/net/rmnet/rmnet_handlers.h    |  24 ++
 drivers/net/rmnet/rmnet_main.c        |  52 +++
 drivers/net/rmnet/rmnet_map.h         | 100 ++
 drivers/net/rmnet/rmnet_map_command.c | 180 +++
 drivers/net/rmnet/rmnet_map_data.c    | 145 +
 drivers/net/rmnet/rmnet_private.h     |  76 +
 drivers/net/rmnet/rmnet_stats.c       |  86 +
 drivers/net/rmnet/rmnet_stats.h       |  61
 drivers/net/rmnet/rmnet_vnd.c         | 353
 drivers/net/rmnet/rmnet_vnd.h         |  34 ++
 include/uapi/linux/Kbuild             |   1 +
 include/uapi/linux/if_arp.h           |   1 +
 include/uapi/linux/if_ether.h         |   4 +-
 include/uapi/linux/rmnet.h            |  34 ++
 22 files changed, 2461 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/rmnet/Kconfig
 create mode 100644 drivers/net/rmnet/Makefile
 create mode 100644 drivers/net/rmnet/rmnet_config.c
 create mode 100644 drivers/net/rmnet/rmnet_config.h
 create mode 100644 drivers/net/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/rmnet/rmnet_main.c
 create mode 100644 drivers/net/rmnet/rmnet_map.h
 create mode 100644 drivers/net/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/rmnet/rmnet_private.h
 create mode 100644 drivers/net/rmnet/rmnet_stats.c
 create mode 100644 drivers/net/rmnet/rmnet_stats.h
 create mode 100644 drivers/net/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/rmnet/rmnet_vnd.h
 create mode 100644 include/uapi/linux/rmnet.h

-- 
1.9.1
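
For readers following the MAP data-packet layout from the rmnet.txt patch in this thread, here is a minimal userspace sketch of decoding the 4-byte header. It is not code from the patch. The struct and function names are made up for illustration, and it assumes that "bit 0" means the most significant bit of the first byte and that the 16-bit payload length is in network (big-endian) order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the 4-byte MAP header. Names are illustrative only. */
struct map_hdr_view {
	unsigned int is_command;  /* bit 0: 1 = command, 0 = data */
	unsigned int pad_len;     /* bits 2-7: number of padding bytes */
	unsigned int mux_id;      /* bits 8-15: multiplexer ID (selects the PDN) */
	unsigned int payload_len; /* bits 16-31: payload length, padding included */
};

/*
 * Assumption: bit 0 is the MSB of buf[0] and the length is big-endian.
 * Returns 0 on success, -1 if the buffer is too short or inconsistent.
 */
static int map_parse(const uint8_t *buf, size_t len, struct map_hdr_view *out)
{
	assert(out != NULL);
	if (buf == NULL || len < 4)
		return -1;

	out->is_command = (buf[0] >> 7) & 0x1;
	out->pad_len = buf[0] & 0x3f;
	out->mux_id = buf[1];
	out->payload_len = ((unsigned int)buf[2] << 8) | buf[3];

	/* Payload length includes the padding, so padding cannot exceed it. */
	if (out->pad_len > out->payload_len)
		return -1;
	/* The buffer must hold the whole payload after the header. */
	if ((size_t)out->payload_len > len - 4)
		return -1;
	return 0;
}
```

The IP packet itself would then occupy payload_len - pad_len bytes right after the header, since the documentation says the payload length covers the padding but not the MAP header.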
Re: [PATCH 02/22] nvmet: Make use of the new sg_map helper function
On Thu, Apr 13, 2017 at 04:05:15PM -0600, Logan Gunthorpe wrote:
> This is a straight forward conversion in two places. Should kmap fail,
> the code will return an INVALD_DATA error in the completion.

It really should be using nvmet_copy_from_sgl to make things safer,
as we don't want to rely on any particular SG list layout. In fact
I'm pretty sure I did the conversion at some point, but it must never
have made it upstream.
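
To illustrate why a copy helper is safer than mapping one SG entry directly, here is a small userspace model of copying out of a segmented buffer by logical offset. It is only a sketch: struct seg and copy_from_segs() are hypothetical stand-ins, not the kernel scatterlist API and not the body of nvmet_copy_from_sgl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for one scatter/gather segment. */
struct seg {
	const unsigned char *base;
	size_t len;
};

/*
 * Copy up to len bytes starting at logical offset off, crossing segment
 * boundaries as needed. Returns the number of bytes actually copied.
 */
static size_t copy_from_segs(const struct seg *segs, size_t nsegs,
			     size_t off, void *buf, size_t len)
{
	unsigned char *dst = buf;
	size_t copied = 0;
	size_t i;

	assert(segs != NULL || nsegs == 0);
	for (i = 0; i < nsegs && copied < len; i++) {
		size_t avail, n;

		/* Skip segments that lie entirely before the offset. */
		if (off >= segs[i].len) {
			off -= segs[i].len;
			continue;
		}
		avail = segs[i].len - off;
		n = (len - copied < avail) ? len - copied : avail;
		memcpy(dst + copied, segs[i].base + off, n);
		copied += n;
		off = 0; /* later segments are read from their start */
	}
	return copied;
}
```

The caller never needs to know how the data is split across segments, which is the property the review comment asks for.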
[PATCH v2 net 2/2] net: ethernet: mediatek: fix inconsistency of port number carried in TXD
From: Sean Wang

Fix port inconsistency on the TXD caused by a hardware bug: under an
iperf test, the port number carried on the same TXD can differ between
tx_map() and tx_unmap(). This confuses the BQL logic and leads to a
kernel panic when both GMACs run concurrently.

Signed-off-by: Sean Wang
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 14 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 12 +---
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 48ba617..6313c53 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -648,6 +648,8 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	WRITE_ONCE(itxd->txd1, mapped_addr);
 	itx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
+	itx_buf->flags |= (!mac->id) ? MTK_TX_FLAGS_FPORT0 :
+			  MTK_TX_FLAGS_FPORT1;
 	dma_unmap_addr_set(itx_buf, dma_addr0, mapped_addr);
 	dma_unmap_len_set(itx_buf, dma_len0, skb_headlen(skb));
@@ -689,6 +691,9 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 			memset(tx_buf, 0, sizeof(*tx_buf));
 			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
 			tx_buf->flags |= MTK_TX_FLAGS_PAGE0;
+			tx_buf->flags |= (!mac->id) ? MTK_TX_FLAGS_FPORT0 :
+					 MTK_TX_FLAGS_FPORT1;
+
 			dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
 			dma_unmap_len_set(tx_buf, dma_len0, frag_map_size);
 			frag_size -= frag_map_size;
@@ -1011,17 +1016,16 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget)
 	while ((cpu != dma) && budget) {
 		u32 next_cpu = desc->txd2;
-		int mac;
+		int mac = 0;
 		desc = mtk_qdma_phys_to_virt(ring, desc->txd2);
 		if ((desc->txd3 & TX_DMA_OWNER_CPU) == 0)
 			break;
-		mac = (desc->txd4 >> TX_DMA_FPORT_SHIFT) &
-		      TX_DMA_FPORT_MASK;
-		mac--;
-
 		tx_buf = mtk_desc_to_tx_buf(ring, desc);
+		if (tx_buf->flags & MTK_TX_FLAGS_FPORT1)
+			mac = 1;
+
 		skb = tx_buf->skb;
 		if (!skb) {
 			condition = 1;
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 996024d..3c46a3b 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -410,12 +410,18 @@ struct mtk_hw_stats {
 	struct u64_stats_sync	syncp;
 };
-/* PDMA descriptor can point at 1-2 segments. This enum allows us to track how
- * memory was allocated so that it can be freed properly
- */
 enum mtk_tx_flags {
+	/* PDMA descriptor can point at 1-2 segments. This enum allows us to
+	 * track how memory was allocated so that it can be freed properly.
+	 */
 	MTK_TX_FLAGS_SINGLE0	= 0x01,
 	MTK_TX_FLAGS_PAGE0	= 0x02,
+
+	/* MTK_TX_FLAGS_FPORTx allows tracking which port the transmitted
+	 * SKB out instead of looking up through hardware TX descriptor.
+	 */
+	MTK_TX_FLAGS_FPORT0	= 0x04,
+	MTK_TX_FLAGS_FPORT1	= 0x08,
 };
 /* This enum allows us to identify how the clock is defined on the array of the
-- 
1.9.1
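
The software port tracking in this patch can be modelled in isolation as below. The flag values are taken from the patch. The two helper functions are hypothetical and exist only to show the round trip from MAC id to flag bit and back. The driver open-codes both steps.

```c
#include <assert.h>

/* Flag values as defined by the patch. */
enum mtk_tx_flags {
	MTK_TX_FLAGS_SINGLE0 = 0x01,
	MTK_TX_FLAGS_PAGE0   = 0x02,
	MTK_TX_FLAGS_FPORT0  = 0x04,
	MTK_TX_FLAGS_FPORT1  = 0x08,
};

/* Hypothetical helper: record the sending port at map time. */
static unsigned int tx_flags_set_port(unsigned int flags, int mac_id)
{
	assert(mac_id == 0 || mac_id == 1);
	return flags | (mac_id == 0 ? MTK_TX_FLAGS_FPORT0 : MTK_TX_FLAGS_FPORT1);
}

/* Hypothetical helper: recover the port at completion time, defaulting to 0. */
static int tx_flags_get_port(unsigned int flags)
{
	return (flags & MTK_TX_FLAGS_FPORT1) ? 1 : 0;
}
```

Because the port is read back from the driver's own buffer flags, the value seen at completion time no longer depends on what the hardware left in the descriptor.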
[PATCH v2 net 1/2] net: ethernet: mediatek: fix inconsistency between TXD and the used buffer
From: Sean Wang

Fix inconsistency between the TXD descriptor and the used buffer that
would cause unexpected logic at mtk_tx_unmap() during skb housekeeping.

Signed-off-by: Sean Wang
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 14e1bd1..48ba617 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -613,7 +613,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 	struct mtk_tx_dma *itxd, *txd;
-	struct mtk_tx_buf *tx_buf;
+	struct mtk_tx_buf *itx_buf, *tx_buf;
 	dma_addr_t mapped_addr;
 	unsigned int nr_frags;
 	int i, n_desc = 1;
@@ -627,8 +627,8 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	fport = (mac->id + 1) << TX_DMA_FPORT_SHIFT;
 	txd4 |= fport;
-	tx_buf = mtk_desc_to_tx_buf(ring, itxd);
-	memset(tx_buf, 0, sizeof(*tx_buf));
+	itx_buf = mtk_desc_to_tx_buf(ring, itxd);
+	memset(itx_buf, 0, sizeof(*itx_buf));
 	if (gso)
 		txd4 |= TX_DMA_TSO;
@@ -647,9 +647,9 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 		return -ENOMEM;
 	WRITE_ONCE(itxd->txd1, mapped_addr);
-	tx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
-	dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
-	dma_unmap_len_set(tx_buf, dma_len0, skb_headlen(skb));
+	itx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
+	dma_unmap_addr_set(itx_buf, dma_addr0, mapped_addr);
+	dma_unmap_len_set(itx_buf, dma_len0, skb_headlen(skb));
 	/* TX SG offload */
 	txd = itxd;
@@ -685,10 +685,9 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 						last_frag * TX_DMA_LS0));
 			WRITE_ONCE(txd->txd4, fport);
-			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
 			tx_buf = mtk_desc_to_tx_buf(ring, txd);
 			memset(tx_buf, 0, sizeof(*tx_buf));
-
+			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
 			tx_buf->flags |= MTK_TX_FLAGS_PAGE0;
 			dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
 			dma_unmap_len_set(tx_buf, dma_len0, frag_map_size);
@@ -698,7 +697,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	}
 	/* store skb to cleanup */
-	tx_buf->skb = skb;
+	itx_buf->skb = skb;
 	WRITE_ONCE(itxd->txd4, txd4);
 	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
-- 
1.9.1
[PATCH v2 net 0/2] Fix crash caused by reporting inconsistent skb->len to BQL
From: Sean Wang

Changes since v1:
- fix inconsistent enumeration which could easily cause a bug

This series fixes a kernel BUG caused by inconsistent SKB lengths being
reported to BQL. The inconsistency comes from a hardware bug that lets
the port number carried in the TXD change within the lifetime of an SKB.
Patch 2 therefore tracks in software which port an SKB belongs to,
instead of reading the port back from the hardware descriptor. Patch 1
fixes another issue I found, an inconsistency between the TXD and the
SKB that the initial logic did not expect, so it is corrected in this
series as well.

The log for the kernel BUG caused by the issue is posted below.

[ 120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26!
[ 120.837684] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 120.842778] Modules linked in:
[ 120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 #35
[ 120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 120.859012] task: c1007480 task.stack: c100
[ 120.863510] PC is at dql_completed+0x108/0x17c
[ 120.867915] LR is at 0x46
[ 120.870512] pc : []lr : [<0046>]psr: 8113
[ 120.870512] sp : c1001d58 ip : c1001d80 fp : c1001d7c
[ 120.881895] r10: 003e r9 : df6b3400 r8 : 0ed86506
[ 120.887075] r7 : 0001 r6 : 0001 r5 : 0ed8654c r4 : df0135d8
[ 120.893546] r3 : 0001 r2 : df016800 r1 : fece r0 : df6b3480
[ 120.900018] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 120.907093] Control: 10c5387d Table: 9e27806a DAC: 0051
[ 120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218)
[ 120.918744] Stack: (0xc1001d58 to 0xc1002000)
[ 121.085331] 1fc0: c0a52a28 c10855d4 c1003c58 c0a52a24 c100885c 8000406a
[ 121.093444] 1fe0: 410fc073 c1001ff8 8000807c c0a009cc
[ 121.101575] [] (dql_completed) from [] (mtk_napi_tx+0x1d0/0x37c)
[ 121.109263] [] (mtk_napi_tx) from [] (net_rx_action+0x24c/0x3b8)
[ 121.116951] [] (net_rx_action) from [] (__do_softirq+0xe4/0x35c)
[ 121.124638] [] (__do_softirq) from [] (irq_exit+0xe8/0x150)
[ 121.131895] [] (irq_exit) from [] (__handle_domain_irq+0x70/0xc4)
[ 121.139666] [] (__handle_domain_irq) from [] (gic_handle_irq+0x58/0x9c)
[ 121.147953] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 121.155373] Exception stack(0xc1001ef8 to 0xc1001f40)

Sean Wang (2):
  net: ethernet: mediatek: fix inconsistency between TXD and the used
    buffer
  net: ethernet: mediatek: fix inconsistency of port number carried in
    TXD

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 31 -
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 12 ---
 2 files changed, 26 insertions(+), 17 deletions(-)

-- 
1.9.1
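
A toy model may help show how a wrong port number ends up in dql_completed(). It assumes the BUG at lib/dynamic_queue_limits.c:26 is the check that a completion cannot exceed what is still queued. The struct and functions below are illustrative only and are not the kernel's struct dql or its API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-port byte accounting. Not the kernel's struct dql. */
struct toy_dql {
	unsigned int num_queued;
	unsigned int num_completed;
};

/* Bytes handed to the hardware on this port. */
static void toy_queued(struct toy_dql *q, unsigned int bytes)
{
	assert(q != NULL);
	q->num_queued += bytes;
}

/* Returns 0 on success, -1 where the assumed kernel check would BUG. */
static int toy_completed(struct toy_dql *q, unsigned int bytes)
{
	assert(q != NULL);
	if (bytes > q->num_queued - q->num_completed)
		return -1;
	q->num_completed += bytes;
	return 0;
}
```

If an SKB is accounted as queued on port 0 but the descriptor reports port 1 at completion time, the completion lands on a queue that never saw those bytes, which is the failure the series describes.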
Re: [PATCH net] xfrm: calculate L4 checksums also for GSO case before encrypting packets
On 13 April 2017 at 19:45, Ansis Atteka wrote:
>
> On 11 April 2017 at 00:07, Steffen Klassert wrote:
>>
>> On Mon, Apr 10, 2017 at 11:42:07AM -0700, Ansis Atteka wrote:
>> > Otherwise, if L4 checksum calculation is done after encryption,
>> > then all ESP packets end up being corrupted at the location
>> > where pre-encryption L4 checksum field resides.
>> >
>> > One of the ways to reproduce this bug is to have a VM with virtio_net
>> > driver (UFO set to ON in the guest VM); and then encapsulate all guest's
>> > Ethernet frames in GENEVE; and then further encrypt GENEVE with IPsec.
>> > In this case following symptoms are observed:
>> > 1. If using ixgbe NIC, then the driver will also emit following
>> >    warning message:
>> >    ixgbe :01:00.1: partial checksum but l4 proto=32!
>> > 2. Receiving VM will drop all the corrupted ESP packets, hence UDP iperf test
>> >    with large packets will fail completely or TCP iperf will get ridiculously
>> >    low performance because TCP window will never grow above MTU.
>> >
>> > Signed-off-by: Ansis Atteka
>> > ---
>> >  net/xfrm/xfrm_output.c | 19 +--
>> >  1 file changed, 13 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
>> > index 8ba29fe..7ad7e5f 100644
>> > --- a/net/xfrm/xfrm_output.c
>> > +++ b/net/xfrm/xfrm_output.c
>> > @@ -168,7 +168,8 @@ static int xfrm_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
>> >
>> >  static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb)
>> >  {
>> > -       struct sk_buff *segs;
>> > +       struct sk_buff *segs, *nskb;
>> > +       int err;
>> >
>> >         BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
>> >         BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET);
>> > @@ -180,21 +181,27 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
>> >                 return -EINVAL;
>> >
>> >         do {
>> > -               struct sk_buff *nskb = segs->next;
>> > -               int err;
>> > +               nskb = segs->next;
>> >
>> >                 segs->next = NULL;
>> > -               err = xfrm_output2(net, sk, segs);
>> > +               err = skb_checksum_help(segs);
>>
>> What's wrong with the checksum provided by the GSO layer and
>> why we have to do this unconditionally here?

I believe with "GSO layer" you meant the skb_gso_segment() function
invocation from xfrm_output_gso()? If so, then the problem with that is
that the list of the skb's returned by that function could be in
CHECKSUM_PARTIAL state, if skbs came from a UDP tunnel such as Geneve:

xfrm_output() {
  __skb_gso_segment() {
    skb_mac_gso_segment() {
      skb_network_protocol();
      inet_gso_segment() {
        udp4_ufo_fragment() {
          skb_udp_tunnel_segment() {
            skb_mac_gso_segment() {
              skb_network_protocol();
              inet_gso_segment() {
                udp4_ufo_fragment() {
                  skb_checksum() {
                    __skb_checksum() {
                      csum_partial() {
                        do_csum();
                      }
                      csum_partial() {
                        do_csum();
                      }
                    }

Since those skbs could remain in CHECKSUM_PARTIAL state even after IPsec
encryption, then ixgbe tries to calculate L4 checksums on already
encrypted skb where L4 layer is already protected through IPsec
integrity checks. Hence, ESP packets end up being corrupted and dropped
on receive side by XFRM.

I clearly see this ESP packet corruption happening by observing:
1. in wireshark that the same ESP packet differs at the offset where UDP
   checksum field should reside; AND
2. in dmesg that ixgbe driver complains on send side with "partial
   checksum but L4 proto is 0x32 (ESP)". AND
3. in /proc/net/xfrm_stat where XfrmInStateProtoError counter is
   incremented on receive side each time it receives corrupted packet.

>>
>> We don't announce any checksum capabilities, so the GSO
>> layer should provide the checksum. If this is not the case,
>> something along the path is taking wrong assumptions.
>
> The same explicit checksum calculation is done from xfrm_output() for
> non-GSO case, so it was tempting for me to simply put a similar
> skb_checksum_help() for GSO case as well.
>
>> Btw. all GSO packets on a standard IPv4 xfrm tunnel are getting
>> dropped with your patch applied.
>>

I think I just noticed possible issue with my patch that I sent out. In
your setup were packets getting dropped on receive side due to UDP
checksum failure (and not IPsec integrity check failure)? If so then I
wonder if after my patch applied skb_checksum_help() was called twice
under conditions that you tested for. Hence the skbs ended up with wrong
checksums. So, would you mind
repost: af_packet vs virtio (was packed ring layout proposal v2)
On Fri, Apr 14, 2017 at 05:42:58AM +0300, Michael S. Tsirkin wrote:
> Hi all, I wanted to raise the question of similarities between virtio
> and new zero copy af_packet interfaces.
>
> First I would like to mention that virtio device development isn't spec
> limited - spec is there to help interoperability and add peace of mind
> for people worried about IPR.
>
> So I tend to accept patches without requiring people write it up in the
> spec as work on spec proceeds at its own pace - all I ask is that the
> virtio mailing list is copied, this requires contributor to subscribe
> and in the process contributor promises that it's ok for us to add this
> to spec in the future.
>
> There shouldn't thus be a fundamental problem preventing use of virtio
> format or reusing some of the code for af_packet, but it still might or
> might not make sense - it was designed for CPU to CPU communication so
> it seems to make sense though. So I would like that discussion to
> happen even if we decide against.
>
> And even if people decide against, the problem space is very similar. You
> can look up packed ring layout proposal v2 - should I repost here? Our
> prototyping shows significant performance improvements from using it as
> compared to head/tail layout.
>
> To start this discussion I'm going to reply to this email reposting a
> copy of the simplified virtio layout that might be appropriate for
> af_packet as well.

Here's the repost (slightly cut down), sorry about the duplicates.

The idea is to have a r/w descriptor in a ring structure,
replacing the used and available ring, index and descriptor
buffer.

* Descriptor ring:

Guest adds descriptors with unique index values and DESC_HW set in flags.
Host overwrites used descriptors with correct len, index, and DESC_HW
clear. Flags are always set/cleared last.

#define DESC_HW 0x0080

struct desc {
        __le64 addr;
        __le32 len;
        __le16 index;
        __le16 flags;
};

When DESC_HW is set, descriptor belongs to device. When it is clear,
it belongs to the driver.

We can use 1 bit to set direction
/* This marks a buffer as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE      2

* Scatter/gather support

We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:

/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT       1

Unlike virtio 1.0, all descriptors must have distinct ID values.

Also unlike virtio 1.0, use of this flag will be an optional feature
(e.g. VIRTIO_F_DESC_NEXT) so both devices and drivers can opt out of it.

* Indirect buffers

Can be marked like in virtio 1.0:

/* This means the buffer contains a table of buffer descriptors. */
#define VRING_DESC_F_INDIRECT   4

Unlike virtio 1.0, this is a table, not a list:

struct indirect_descriptor_table {
        /* The actual descriptors (16 bytes each) */
        struct virtq_desc desc[len / 16];
};

The first descriptor is located at start of the indirect descriptor
table, additional indirect descriptors come immediately afterwards.
DESC_F_WRITE is the only valid flag for descriptors in the indirect
table. Others should be set to 0 and are ignored. id is also set to 0
and should be ignored.

virtio 1.0 seems to allow a s/g entry followed by
an indirect descriptor. This does not seem useful,
so we do not allow that anymore.

This support would be an optional feature, same as in virtio 1.0.

* Batching descriptors:

virtio 1.0 allows passing a batch of descriptors in both directions, by
incrementing the used/avail index by values > 1. We can support this by
chaining a list of descriptors through a bit in the flags field.
To allow use together with s/g, a different bit will be used.

#define VRING_DESC_F_BATCH_NEXT 0x0010

Batching works for both driver and device descriptors.

* Processing descriptors in and out of order

Device processing all descriptors in order can simply flip
the DESC_HW bit as it is done with descriptors.

Device can write descriptors out in order as they are used, overwriting
descriptors that are there.

Device must not use a descriptor until DESC_HW is set.
It is only required to look at the first descriptor
submitted.

Driver must not overwrite a descriptor until DESC_HW is clear.
It is only required to look at the first descriptor
submitted.

* Device specific descriptor flags

We have a lot of unused space in the descriptor. This can be put to
good use by reserving some flag bits for device use.
For example, network device can set a bit to request
that header in the descriptor is suppressed
(in case it's all 0s anyway). This reduces cache utilization.

Note: this feature can be supported in virtio 1.0 as well,
as we have unused bits in both descriptor and used ring there.

* Descriptor length in device descriptors

virtio 1.0 places strict requirements on descriptor length. For example
it must be 0 in used ring of TX VQ of a network device since nothing is
written. In practice guests do not
af_packet vs virtio
Hi all, I wanted to raise the question of similarities between virtio
and new zero copy af_packet interfaces.

First I would like to mention that virtio device development isn't spec
limited - spec is there to help interoperability and add peace of mind
for people worried about IPR.

So I tend to accept patches without requiring people write it up in the
spec as work on spec proceeds at its own pace - all I ask is that the
virtio mailing list is copied, this requires contributor to subscribe
and in the process contributor promises that it's ok for us to add this
to spec in the future.

There shouldn't thus be a fundamental problem preventing use of virtio
format or reusing some of the code for af_packet, but it still might or
might not make sense - it was designed for CPU to CPU communication so
it seems to make sense though. So I would like that discussion to
happen even if we decide against.

And even if people decide against, the problem space is very similar. You
can look up packed ring layout proposal v2 - should I repost here? Our
prototyping shows significant performance improvements from using it as
compared to head/tail layout.

To start this discussion I'm going to reply to this email reposting a
copy of the simplified virtio layout that might be appropriate for
af_packet as well.

-- 
MST
Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics
On Thu, 2017-04-13 at 17:48 +1000, Gavin Shan wrote:
> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
> packets sent and received over all packages and channels. It's useful
> to diagnose NCSI problems, especially when NCSI packages and channels
> aren't probed properly. The statistics can be gained from debugfs file
> as below:
>
>  # cat /sys/kernel/debug/ncsi/eth0/stats
>
>  CMD          OK       TIMEOUT  ERROR
>  =======================================
>  CIS          32       29       0
>  SP           10       7        0
>  DP           17       14       0
>  EC           1        0        0
>  ECNT         1        0        0
>  AE           1        0        0
>  GLS          11       0        0
>  SMA          1        0        0
>  EBF          1        0        0
>  GVI          2        0        0
>  GC           2        0        0

more trivia:

> diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
[]
> @@ -23,6 +23,235 @@
>  #include "ncsi-pkt.h"
>
>  static struct dentry *ncsi_dentry;
> +static struct ncsi_pkt_handler {
> +	unsigned char	type;
> +	const char	*name;
> +} ncsi_pkt_handlers[] = {
> +	{ NCSI_PKT_CMD_CIS,    "CIS"    },
> +	{ NCSI_PKT_CMD_SP,     "SP"     },
> +	{ NCSI_PKT_CMD_DP,     "DP"     },
> +	{ NCSI_PKT_CMD_EC,     "EC"     },
> +	{ NCSI_PKT_CMD_DC,     "DC"     },
> +	{ NCSI_PKT_CMD_RC,     "RC"     },
> +	{ NCSI_PKT_CMD_ECNT,   "ECNT"   },
> +	{ NCSI_PKT_CMD_DCNT,   "DCNT"   },
> +	{ NCSI_PKT_CMD_AE,     "AE"     },
> +	{ NCSI_PKT_CMD_SL,     "SL"     },
> +	{ NCSI_PKT_CMD_GLS,    "GLS"    },
> +	{ NCSI_PKT_CMD_SVF,    "SVF"    },
> +	{ NCSI_PKT_CMD_EV,     "EV"     },
> +	{ NCSI_PKT_CMD_DV,     "DV"     },
> +	{ NCSI_PKT_CMD_SMA,    "SMA"    },
> +	{ NCSI_PKT_CMD_EBF,    "EBF"    },
> +	{ NCSI_PKT_CMD_DBF,    "DBF"    },
> +	{ NCSI_PKT_CMD_EGMF,   "EGMF"   },
> +	{ NCSI_PKT_CMD_DGMF,   "DGMF"   },
> +	{ NCSI_PKT_CMD_SNFC,   "SNFC"   },
> +	{ NCSI_PKT_CMD_GVI,    "GVI"    },
> +	{ NCSI_PKT_CMD_GC,     "GC"     },
> +	{ NCSI_PKT_CMD_GP,     "GP"     },
> +	{ NCSI_PKT_CMD_GCPS,   "GCPS"   },
> +	{ NCSI_PKT_CMD_GNS,    "GNS"    },
> +	{ NCSI_PKT_CMD_GNPTS,  "GNPTS"  },
> +	{ NCSI_PKT_CMD_GPS,    "GPS"    },
> +	{ NCSI_PKT_CMD_OEM,    "OEM"    },
> +	{ NCSI_PKT_CMD_PLDM,   "PLDM"   },
> +	{ NCSI_PKT_CMD_GPUUID, "GPUUID" },

I don't know how common these are and how intelligible these
acronyms are to knowledgeable developer/users, but maybe
it'd be better to spell out what these are instead of having
to look up what the acronyms stand for

	CIS - Clear Initial State
	SP  - Select Package
	etc...

Maybe copy the descriptions from the ncsi-pkt.h file

#define NCSI_PKT_CMD_CIS	0x00 /* Clear Initial State */
#define NCSI_PKT_CMD_SP		0x01 /* Select Package */
#define NCSI_PKT_CMD_DP		0x02 /* Deselect Package */
#define NCSI_PKT_CMD_EC		0x03 /* Enable Channel */
#define NCSI_PKT_CMD_DC		0x04 /* Disable Channel */
#define NCSI_PKT_CMD_RC		0x05 /* Reset Channel */
#define NCSI_PKT_CMD_ECNT	0x06 /* Enable Channel Network Tx */
#define NCSI_PKT_CMD_DCNT	0x07 /* Disable Channel Network Tx */
#define NCSI_PKT_CMD_AE		0x08 /* AEN Enable */
#define NCSI_PKT_CMD_SL		0x09 /* Set Link */
#define NCSI_PKT_CMD_GLS	0x0a /* Get Link */
#define NCSI_PKT_CMD_SVF	0x0b /* Set VLAN Filter */
#define NCSI_PKT_CMD_EV		0x0c /* Enable VLAN */
#define NCSI_PKT_CMD_DV		0x0d /* Disable VLAN */
#define NCSI_PKT_CMD_SMA	0x0e /* Set MAC address */
#define NCSI_PKT_CMD_EBF	0x10 /* Enable Broadcast Filter */
#define NCSI_PKT_CMD_DBF	0x11 /* Disable Broadcast Filter */
#define NCSI_PKT_CMD_EGMF	0x12 /* Enable Global Multicast Filter */
#define NCSI_PKT_CMD_DGMF	0x13 /* Disable Global Multicast Filter */
#define NCSI_PKT_CMD_SNFC	0x14 /* Set NCSI Flow Control */
#define NCSI_PKT_CMD_GVI	0x15 /* Get Version ID */
#define NCSI_PKT_CMD_GC		0x16 /* Get Capabilities */
#define NCSI_PKT_CMD_GP		0x17 /* Get Parameters */
#define NCSI_PKT_CMD_GCPS	0x18 /* Get Controller Packet Statistics */
#define NCSI_PKT_CMD_GNS	0x19 /* Get NCSI Statistics */
#define NCSI_PKT_CMD_GNPTS	0x1a /* Get NCSI Pass-throu Statistics */
#define NCSI_PKT_CMD_GPS	0x1b /* Get package status */
#define NCSI_PKT_CMD_OEM	0x50 /* OEM */
#define NCSI_PKT_CMD_PLDM	0x51 /* PLDM request over NCSI
Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics
Hi!

On Thu, 13 Apr 2017 17:48:18 +1000, Gavin Shan wrote:
> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
> packets sent and received over all packages and channels. It's useful
> to diagnose NCSI problems, especially when NCSI packages and channels
> aren't probed properly. The statistics can be gained from debugfs file
> as below:
>
> # cat /sys/kernel/debug/ncsi/eth0/stats
>
> CMD OK TIMEOUT ERROR
> ===
> CIS 32 29 0
> SP 10 70
> DP 17 14 0
> EC 100
> ECNT 100
> AE 100
> GLS 11 00
> SMA 100
> EBF 100
> GVI 200
> GC 200
>
> RSP OK TIMEOUT ERROR
> ===
> CIS 300
> SP 300
> DP 201
> EC 100
> ECNT 100
> AE 100
> GLS 11 00
> SMA 100
> EBF 100
> GVI 002
> GC 200
>
> AEN OK TIMEOUT ERROR
> ===
>
> Signed-off-by: Gavin Shan

I'm not familiar with NC-SI but these look like some standard stats.
Would it make sense to provide a proper netlink API for them?

[...]
> +#ifdef CONFIG_NET_NCSI_DEBUG
> +	ndp->stats.aen[h->type][NCSI_PKT_STAT_ERROR]++;
> +#endif

In any case, did you consider creating a macro or inline helper to
limit the number of #ifdefs?
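The inline-helper suggestion could look roughly like the sketch below. It is a user-space illustration only, not code from the patch: every toy_* name and the array sizes are hypothetical, and CONFIG_NET_NCSI_DEBUG is defined by hand here to stand in for the Kconfig symbol.

```c
#include <assert.h>

/* Stand-in for the Kconfig symbol from the patch; delete this line to
 * build with the statistics compiled out. */
#define CONFIG_NET_NCSI_DEBUG 1

/* Hypothetical names and sizes, chosen only for illustration. */
enum toy_stat { TOY_STAT_OK, TOY_STAT_TIMEOUT, TOY_STAT_ERROR, TOY_STAT_MAX };
#define TOY_PKT_TYPES 256

struct toy_dev {
#ifdef CONFIG_NET_NCSI_DEBUG
	unsigned long aen[TOY_PKT_TYPES][TOY_STAT_MAX];
#endif
	int placeholder;	/* keeps the struct non-empty when stats are off */
};

/* The #ifdef lives here once; call sites stay unconditional. */
static inline void toy_stat_inc_aen(struct toy_dev *dev, unsigned char type,
				    enum toy_stat stat)
{
	assert((int)stat < TOY_STAT_MAX);
#ifdef CONFIG_NET_NCSI_DEBUG
	dev->aen[type][stat]++;
#else
	(void)dev;
	(void)type;
#endif
}

/* Reads a counter back; reports 0 when statistics are compiled out. */
static inline unsigned long toy_stat_get_aen(const struct toy_dev *dev,
					     unsigned char type,
					     enum toy_stat stat)
{
	assert((int)stat < TOY_STAT_MAX);
#ifdef CONFIG_NET_NCSI_DEBUG
	return dev->aen[type][stat];
#else
	(void)dev;
	(void)type;
	return 0;
#endif
}
```

With a helper of this shape, a call site shrinks to a single line such as toy_stat_inc_aen(dev, type, TOY_STAT_ERROR), and the compiler drops the call entirely when the option is off.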
[PATCH 3/9] netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
From: Gao Feng

When __nf_conntrack_helper_find is invoked, the caller needs to hold the
rcu lock so that the helper module cannot be unloaded in the meantime.

Now there are two callers, nf_conntrack_helper_try_module_get and
ctnetlink_create_expect, which don't hold the rcu lock. The remaining
callers, like ctnetlink_change_helper, ctnetlink_create_conntrack, and
ctnetlink_glue_attach_expect, already hold the rcu lock or spin_lock_bh.

Remove the rcu lock in functions nf_ct_helper_expectfn_find_by_name
and nf_ct_helper_expectfn_find_by_symbol. They return a pointer which
needs rcu protection, so the rcu lock should be held by their callers,
not inside these two functions.

Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_helper.c | 17 -
 net/netfilter/nf_conntrack_netlink.c | 10 --
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 6dc44d9b4190..4eeb3418366a 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -158,16 +158,25 @@ nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum)
 {
 	struct nf_conntrack_helper *h;
 
+	rcu_read_lock();
+
 	h = __nf_conntrack_helper_find(name, l3num, protonum);
 #ifdef CONFIG_MODULES
 	if (h == NULL) {
-		if (request_module("nfct-helper-%s", name) == 0)
+		rcu_read_unlock();
+		if (request_module("nfct-helper-%s", name) == 0) {
+			rcu_read_lock();
 			h = __nf_conntrack_helper_find(name, l3num, protonum);
+		} else {
+			return h;
+		}
 	}
 #endif
 	if (h != NULL && !try_module_get(h->me))
 		h = NULL;
 
+	rcu_read_unlock();
+
 	return h;
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get);
@@ -311,38 +320,36 @@ void nf_ct_helper_expectfn_unregister(struct nf_ct_helper_expectfn *n)
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_unregister);
 
+/* Caller should hold the rcu lock */
 struct nf_ct_helper_expectfn *
 nf_ct_helper_expectfn_find_by_name(const char *name)
 {
 	struct nf_ct_helper_expectfn *cur;
 	bool found = false;
 
-	rcu_read_lock();
 	list_for_each_entry_rcu(cur, &nf_ct_helper_expectfn_list, head) {
 		if (!strcmp(cur->name, name)) {
 			found = true;
 			break;
 		}
 	}
-	rcu_read_unlock();
 	return found ? cur : NULL;
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_find_by_name);
 
+/* Caller should hold the rcu lock */
 struct nf_ct_helper_expectfn *
 nf_ct_helper_expectfn_find_by_symbol(const void *symbol)
 {
 	struct nf_ct_helper_expectfn *cur;
 	bool found = false;
 
-	rcu_read_lock();
 	list_for_each_entry_rcu(cur, &nf_ct_helper_expectfn_list, head) {
 		if (cur->expectfn == symbol) {
 			found = true;
 			break;
 		}
 	}
-	rcu_read_unlock();
 	return found ? cur : NULL;
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_find_by_symbol);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 59ee27deb9a0..06d28ac663df 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3133,23 +3133,27 @@ ctnetlink_create_expect(struct net *net,
 		return -ENOENT;
 	ct = nf_ct_tuplehash_to_ctrack(h);
 
+	rcu_read_lock();
 	if (cda[CTA_EXPECT_HELP_NAME]) {
 		const char *helpname = nla_data(cda[CTA_EXPECT_HELP_NAME]);
 
 		helper = __nf_conntrack_helper_find(helpname, u3,
 						    nf_ct_protonum(ct));
 		if (helper == NULL) {
+			rcu_read_unlock();
 #ifdef CONFIG_MODULES
 			if (request_module("nfct-helper-%s", helpname) < 0) {
 				err = -EOPNOTSUPP;
 				goto err_ct;
 			}
+			rcu_read_lock();
 			helper = __nf_conntrack_helper_find(helpname, u3,
 							    nf_ct_protonum(ct));
 			if (helper) {
 				err = -EAGAIN;
-				goto err_ct;
+				goto err_rcu;
 			}
+			rcu_read_unlock();
 #endif
 			err = -EOPNOTSUPP;
 			goto err_ct;
@@ -3159,11 +3163,13 @@ ctnetlink_create_expect(struct net *net,
 	exp = ctnetlink_alloc_expect(cda, ct, helper, , );
 	if (IS_ERR(exp)) {
 		err = PTR_ERR(exp);
-		goto err_ct;
+		goto
[PATCH 7/9] netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
From: Liping Zhang

We should use proper RCU list APIs to manipulate help->expectations,
as we can dump the conntrack's expectations via nfnetlink, i.e. in
ctnetlink_exp_ct_dump_table(), where only rcu_read_lock is acquired.

So for list traversal, use hlist_for_each_entry_rcu; for list add/del,
use hlist_add_head_rcu and hlist_del_rcu.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_expect.c | 4 ++--
 net/netfilter/nf_conntrack_netlink.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 4b2e1fb28bb4..d80073037856 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -57,7 +57,7 @@ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
 	hlist_del_rcu(&exp->hnode);
 	net->ct.expect_count--;
 
-	hlist_del(&exp->lnode);
+	hlist_del_rcu(&exp->lnode);
 	master_help->expecting[exp->class]--;
 
 	nf_ct_expect_event_report(IPEXP_DESTROY, exp, portid, report);
@@ -363,7 +363,7 @@ static void nf_ct_expect_insert(struct nf_conntrack_expect *exp)
 	/* two references : one for hash insert, one for the timer */
 	atomic_add(2, &exp->use);
 
-	hlist_add_head(&exp->lnode, &master_help->expectations);
+	hlist_add_head_rcu(&exp->lnode, &master_help->expectations);
 	master_help->expecting[exp->class]++;
 
 	hlist_add_head_rcu(&exp->hnode, &nf_ct_expect_hash[h]);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index f78eadba343d..dc7dfd68fafe 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2680,8 +2680,8 @@ ctnetlink_exp_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	last = (struct nf_conntrack_expect *)cb->args[1];
 	for (; cb->args[0] < nf_ct_expect_hsize; cb->args[0]++) {
 restart:
-		hlist_for_each_entry(exp, &nf_ct_expect_hash[cb->args[0]],
-				     hnode) {
+		hlist_for_each_entry_rcu(exp, &nf_ct_expect_hash[cb->args[0]],
+					 hnode) {
 			if (l3proto && exp->tuple.src.l3num != l3proto)
 				continue;
 
@@ -2732,7 +2732,7 @@ ctnetlink_exp_ct_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	rcu_read_lock();
 	last = (struct nf_conntrack_expect *)cb->args[1];
 restart:
-	hlist_for_each_entry(exp, &help->expectations, lnode) {
+	hlist_for_each_entry_rcu(exp, &help->expectations, lnode) {
 		if (l3proto && exp->tuple.src.l3num != l3proto)
 			continue;
 		if (cb->args[1]) {
-- 
2.1.4
[PATCH 9/9] netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage
From: Gao Feng

The current code wrongly invokes nf_ct_netns_get in the destroy
routine; it should use nf_ct_netns_put instead. This could prevent
some modules from being unloaded.

Fixes: ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put")
Signed-off-by: Gao Feng
Signed-off-by: Pablo Neira Ayuso
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 52f26459efc3..9b8841316e7b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -461,7 +461,7 @@ static void clusterip_tg_destroy(const struct xt_tgdtor_param *par)
 
 	clusterip_config_put(cipinfo->config);
 
-	nf_ct_netns_get(par->net, par->family);
+	nf_ct_netns_put(par->net, par->family);
 }
 
 #ifdef CONFIG_COMPAT
-- 
2.1.4
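Why a get in the destroy path keeps a module pinned can be seen with a toy reference counter. This is a minimal user-space sketch under assumed semantics (a setup path takes one reference, the destroy path should drop it); the toy_* names are hypothetical and do not model the real nf_ct_netns_get/put internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for a per-user module reference. */
struct toy_ref {
	int users;
};

static void toy_get(struct toy_ref *r)
{
	r->users++;
}

static void toy_put(struct toy_ref *r)
{
	assert(r->users > 0);	/* a put without a matching get is a bug */
	r->users--;
}

/* Unloading is only possible once every reference has been dropped. */
static bool toy_can_unload(const struct toy_ref *r)
{
	return r->users == 0;
}

/* The setup path takes one reference. */
static void toy_setup(struct toy_ref *r)
{
	toy_get(r);
}

/* Buggy destroy: takes a second reference instead of dropping the first. */
static void toy_destroy_buggy(struct toy_ref *r)
{
	toy_get(r);
}

/* Fixed destroy: balances the get done in toy_setup(). */
static void toy_destroy_fixed(struct toy_ref *r)
{
	toy_put(r);
}
```

After toy_setup() followed by toy_destroy_buggy() the counter sits at two and never returns to zero, which is the toy equivalent of a module that cannot be unloaded.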
[PATCH 1/9] netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
From: Eric Dumazet

Denys provided an awesome KASAN report pointing to a use-after-free
in xt_TCPMSS.

I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems the xt_TCPMSS patch has the smallest possible
impact.

Signed-off-by: Eric Dumazet
Reported-by: Denys Fedoryshchenko
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/xt_TCPMSS.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 27241a767f17..c64aca611ac5 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -104,7 +104,7 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 	tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
 	tcp_hdrlen = tcph->doff * 4;
 
-	if (len < tcp_hdrlen)
+	if (len < tcp_hdrlen || tcp_hdrlen < sizeof(struct tcphdr))
 		return -1;
 
 	if (info->mss == XT_TCPMSS_CLAMP_PMTU) {
@@ -152,6 +152,10 @@ tcpmss_mangle_packet(struct sk_buff *skb,
 	if (len > tcp_hdrlen)
 		return 0;
 
+	/* tcph->doff has 4 bits, do not wrap it to 0 */
+	if (tcp_hdrlen >= 15 * 4)
+		return 0;
+
 	/*
 	 * MSS Option not found ?! add it..
 	 */
-- 
2.1.4
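The two checks added by this patch can be restated as small predicates. The sketch below is a stand-alone illustration, not the kernel code: the toy_* names and constants are hypothetical, and the 20-byte minimum assumes a TCP header without options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TCP_MIN_HDR_LEN 20u	/* TCP header without options, in bytes */
#define TOY_TCP_MAX_DOFF    15u	/* doff is a 4-bit count of 32-bit words */

static_assert(TOY_TCP_MAX_DOFF * 4 == 60, "largest TCP header is 60 bytes");

/*
 * First check: the header length derived from doff must cover at least
 * the fixed header and must not exceed the bytes actually available.
 */
static bool toy_doff_is_sane(unsigned int doff, size_t bytes_available)
{
	size_t hdrlen = (size_t)doff * 4;

	return doff <= TOY_TCP_MAX_DOFF &&
	       hdrlen >= TOY_TCP_MIN_HDR_LEN &&
	       hdrlen <= bytes_available;
}

/*
 * Second check: appending a 4-byte MSS option bumps doff by one, which
 * only fits if doff is not already at its 4-bit maximum.
 */
static bool toy_can_append_mss_option(unsigned int doff)
{
	return doff < TOY_TCP_MAX_DOFF;
}
```

A doff of 4 describes a 16-byte header, shorter than any valid TCP header, so it is rejected by the first predicate; a doff of 15 is valid but leaves no room to grow, so the second predicate refuses the append.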
[PATCH 4/9] netfilter: ctnetlink: make it safer when checking the ct helper name
From: Liping Zhang

One CPU is doing ctnetlink_change_helper(), while another CPU is doing
unhelp() at the same time. So even if help->helper is not NULL at first,
the later statement strcmp(help->helper->name, ...) may still access
the NULL pointer.

So we must use rcu_read_lock and rcu_dereference to avoid such a _bad_
thing happening.

Fixes: f95d7a46bc57 ("netfilter: ctnetlink: Fix regression in CTA_HELP processing")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_netlink.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 06d28ac663df..f9c643bc1a8e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1488,11 +1488,16 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 		 * treat the second attempt as a no-op instead of returning
 		 * an error.
 		 */
-		if (help && help->helper &&
-		    !strcmp(help->helper->name, helpname))
-			return 0;
-		else
-			return -EBUSY;
+		err = -EBUSY;
+		if (help) {
+			rcu_read_lock();
+			helper = rcu_dereference(help->helper);
+			if (helper && !strcmp(helper->name, helpname))
+				err = 0;
+			rcu_read_unlock();
+		}
+
+		return err;
 	}
 
 	if (!strcmp(helpname, "")) {
-- 
2.1.4
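The core of this fix is to load the pointer once into a local and then test and use only that local. The sketch below shows that half of the idea in user space with C11 atomics; it is not RCU. In particular it does not provide the lifetime guarantee that rcu_read_lock() gives (the pointed-to object staying valid for the duration of the read-side section), and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the helper and its per-conntrack holder. */
struct toy_helper {
	const char *name;
};

struct toy_help {
	_Atomic(struct toy_helper *) helper;	/* may be cleared concurrently */
};

/*
 * Returns 0 when the named helper is already attached, -EBUSY otherwise.
 * The pointer is loaded exactly once; checking help->helper and then
 * re-reading it for ->name would leave a window in which another thread
 * could set it to NULL between the two reads.
 */
static int toy_same_helper(struct toy_help *help, const char *name)
{
	struct toy_helper *h;

	assert(name != NULL);
	if (help == NULL)
		return -EBUSY;

	h = atomic_load_explicit(&help->helper, memory_order_acquire);
	if (h != NULL && strcmp(h->name, name) == 0)
		return 0;

	return -EBUSY;
}
```

In the kernel patch, rcu_dereference() plays the role of the single load and the surrounding rcu_read_lock()/rcu_read_unlock() pair covers the use of the loaded pointer.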
[PATCH 0/9] Netfilter fixes for net
Hi David,

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Missing TCP header sanity check in TCPMSS target, from Eric Dumazet.

2) Incorrect event message type for related conntracks created via
   ctnetlink, from Liping Zhang.

3) Fix incorrect rcu locking when handling helpers from ctnetlink,
   from Gao Feng.

4) Fix missing rcu locking when updating helper, from Liping Zhang.

5) Fix missing read_lock_bh when iterating over list of device addresses
   from TPROXY and redirect, also from Liping.

6) Fix crash when trying to dump expectations from conntrack with no
   helper via ctnetlink, from Liping.

7) Missing RCU protection for expectation list updates, given that
   ctnetlink iterates over the list under the rcu read lock, from
   Liping too.

8) Don't dump autogenerated seed in nft_hash to userspace, this is
   very confusing to the user, again from Liping.

9) Fix wrong conntrack netns module refcount in ipt_CLUSTERIP,
   from Gao Feng.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!
The following changes since commit 0b9aefea860063bb39e36bd7fe6c7087fed0ba87:

  tcp: minimize false-positives on TCP/GRO check (2017-04-03 18:43:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to fe50543c194e2e1aee2f3eba41fcafd187b3dbde:

  netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage (2017-04-13 23:21:40 +0200)

Eric Dumazet (1):
      netfilter: xt_TCPMSS: add more sanity tests on tcph->doff

Gao Feng (2):
      netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
      netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage

Liping Zhang (6):
      netfilter: ctnetlink: using bit to represent the ct event
      netfilter: ctnetlink: make it safer when checking the ct helper name
      netfilter: make it safer during the inet6_dev->addr_list traversal
      netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
      netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
      netfilter: nft_hash: do not dump the auto generated seed

 net/ipv4/netfilter/ipt_CLUSTERIP.c   |  2 +-
 net/netfilter/nf_conntrack_expect.c  |  4 ++--
 net/netfilter/nf_conntrack_helper.c  | 17 ++-
 net/netfilter/nf_conntrack_netlink.c | 41 +---
 net/netfilter/nf_nat_redirect.c      |  2 ++
 net/netfilter/nft_hash.c             | 10 ++---
 net/netfilter/xt_TCPMSS.c            |  6 +-
 net/netfilter/xt_TPROXY.c            |  5 -
 8 files changed, 62 insertions(+), 25 deletions(-)
[PATCH 8/9] netfilter: nft_hash: do not dump the auto generated seed
From: Liping Zhang

This can prevent the nft utility from printing out the auto generated
seed to the user, which is unnecessary and confusing.

Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nft_hash.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index eb2721af898d..c4dad1254ead 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -21,6 +21,7 @@ struct nft_hash {
 	enum nft_registers	sreg:8;
 	enum nft_registers	dreg:8;
 	u8			len;
+	bool			autogen_seed:1;
 	u32			modulus;
 	u32			seed;
 	u32			offset;
@@ -82,10 +83,12 @@ static int nft_hash_init(const struct nft_ctx *ctx,
 	if (priv->offset + priv->modulus - 1 < priv->offset)
 		return -EOVERFLOW;
 
-	if (tb[NFTA_HASH_SEED])
+	if (tb[NFTA_HASH_SEED]) {
 		priv->seed = ntohl(nla_get_be32(tb[NFTA_HASH_SEED]));
-	else
+	} else {
+		priv->autogen_seed = true;
 		get_random_bytes(&priv->seed, sizeof(priv->seed));
+	}
 
 	return nft_validate_register_load(priv->sreg, len) &&
 	       nft_validate_register_store(ctx, priv->dreg, NULL,
@@ -105,7 +108,8 @@ static int nft_hash_dump(struct sk_buff *skb,
 		goto nla_put_failure;
 	if (nla_put_be32(skb, NFTA_HASH_MODULUS, htonl(priv->modulus)))
 		goto nla_put_failure;
-	if (nla_put_be32(skb, NFTA_HASH_SEED, htonl(priv->seed)))
+	if (!priv->autogen_seed &&
+	    nla_put_be32(skb, NFTA_HASH_SEED, htonl(priv->seed)))
 		goto nla_put_failure;
 	if (priv->offset != 0)
 		if (nla_put_be32(skb, NFTA_HASH_OFFSET, htonl(priv->offset)))
-- 
2.1.4
[PATCH 6/9] netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
From: Liping Zhang

For IPCTNL_MSG_EXP_GET, if the CTA_EXPECT_MASTER attr is specified, then
the NLM_F_DUMP request will dump the expectations related to this
connection tracking.

But we forget to check whether the conntrack has nf_conn_help or not,
so if nfct_help(ct) is NULL, oops will happen:

 BUG: unable to handle kernel NULL pointer dereference at 0008
 IP: ctnetlink_exp_ct_dump_table+0xf9/0x1e0 [nf_conntrack_netlink]
 Call Trace:
  ? ctnetlink_exp_ct_dump_table+0x75/0x1e0 [nf_conntrack_netlink]
  netlink_dump+0x124/0x2a0
  __netlink_dump_start+0x161/0x190
  ctnetlink_dump_exp_ct+0x16c/0x1bc [nf_conntrack_netlink]
  ? ctnetlink_exp_fill_info.constprop.33+0xf0/0xf0 [nf_conntrack_netlink]
  ? ctnetlink_glue_seqadj+0x20/0x20 [nf_conntrack_netlink]
  ctnetlink_get_expect+0x32e/0x370 [nf_conntrack_netlink]
  ? debug_lockdep_rcu_enabled+0x1d/0x20
  nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
  ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
  [...]

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_netlink.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index f9c643bc1a8e..f78eadba343d 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2794,6 +2794,12 @@ static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
 		return -ENOENT;
 
 	ct = nf_ct_tuplehash_to_ctrack(h);
+	/* No expectation linked to this connection tracking. */
+	if (!nfct_help(ct)) {
+		nf_ct_put(ct);
+		return 0;
+	}
+
 	c.data = ct;
 
 	err = netlink_dump_start(ctnl, skb, nlh, &c);
-- 
2.1.4
[PATCH 5/9] netfilter: make it safer during the inet6_dev->addr_list traversal
From: Liping Zhang

inet6_dev->addr_list is protected by inet6_dev->lock, so only using
rcu_read_lock is not enough, we should acquire read_lock_bh(&idev->lock)
before the inet6_dev->addr_list traversal.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_nat_redirect.c | 2 ++
 net/netfilter/xt_TPROXY.c | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_redirect.c b/net/netfilter/nf_nat_redirect.c
index d43869879fcf..86067560a318 100644
--- a/net/netfilter/nf_nat_redirect.c
+++ b/net/netfilter/nf_nat_redirect.c
@@ -101,11 +101,13 @@ nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
 		rcu_read_lock();
 		idev = __in6_dev_get(skb->dev);
 		if (idev != NULL) {
+			read_lock_bh(&idev->lock);
 			list_for_each_entry(ifa, &idev->addr_list, if_list) {
 				newdst = ifa->addr;
 				addr = true;
 				break;
 			}
+			read_unlock_bh(&idev->lock);
 		}
 		rcu_read_unlock();
 
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 80cb7babeb64..df7f1df00330 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -393,7 +393,8 @@ tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr,
 
 	rcu_read_lock();
 	indev = __in6_dev_get(skb->dev);
-	if (indev)
+	if (indev) {
+		read_lock_bh(&indev->lock);
 		list_for_each_entry(ifa, &indev->addr_list, if_list) {
 			if (ifa->flags & (IFA_F_TENTATIVE | IFA_F_DEPRECATED))
 				continue;
@@ -401,6 +402,8 @@ tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr,
 			laddr = &ifa->addr;
 			break;
 		}
+		read_unlock_bh(&indev->lock);
+	}
 	rcu_read_unlock();
 
 	return laddr ? laddr : daddr;
-- 
2.1.4
[PATCH 2/9] netfilter: ctnetlink: using bit to represent the ct event
From: Liping Zhang

Otherwise, creating a new conntrack via nfnetlink:
  # conntrack -I -p udp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20

will emit the wrong ct events (where UPDATE should be NEW):
  # conntrack -E
  [UPDATE] udp 17 10 src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20
  [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 908d858034e4..59ee27deb9a0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1929,9 +1929,9 @@ static int ctnetlink_new_conntrack(struct net *net, struct sock *ctnl,
 
 			err = 0;
 			if (test_bit(IPS_EXPECTED_BIT, &ct->status))
-				events = IPCT_RELATED;
+				events = 1 << IPCT_RELATED;
 			else
-				events = IPCT_NEW;
+				events = 1 << IPCT_NEW;
 
 			if (cda[CTA_LABELS] &&
 			    ctnetlink_attach_labels(ct, cda) == 0)
-- 
2.1.4
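The difference between passing an event number and passing an event bit is easy to see in isolation. The following is a minimal sketch with hypothetical TOY_EV_* values (numbered from zero, in the style of an enum of event indexes); it does not reproduce the kernel's event delivery code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event indexes, numbered from zero. */
enum toy_event { TOY_EV_NEW = 0, TOY_EV_RELATED = 1, TOY_EV_DESTROY = 2 };

/* Consumers of the mask test one bit per event index. */
static bool toy_mask_has(unsigned int mask, enum toy_event ev)
{
	assert((unsigned int)ev < 32);	/* keep the shift well defined */
	return (mask & (1u << ev)) != 0;
}

/* Pre-patch pattern: the raw index is used as if it were a mask. */
static unsigned int toy_mask_from_index(enum toy_event ev)
{
	return (unsigned int)ev;
}

/* Post-patch pattern: the index is converted to its bit first. */
static unsigned int toy_mask_from_bit(enum toy_event ev)
{
	assert((unsigned int)ev < 32);
	return 1u << ev;
}
```

With the raw index, TOY_EV_NEW yields an empty mask and TOY_EV_RELATED yields a mask whose only set bit is the one for TOY_EV_NEW, so the consumer sees either no event or the wrong one.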
Re: [PATCH nf-next] ipvs: remove unused function ip_vs_set_state_timeout
On Mon, Apr 10, 2017 at 03:50:44PM -0400, Aaron Conole wrote:
> There are no in-tree callers of this function and it isn't exported.

Simon, let me know if you want to take this, or just add your
Signed-off-by.

Thanks!

> Signed-off-by: Aaron Conole
> ---
>  include/net/ip_vs.h | 2 --
>  net/netfilter/ipvs/ip_vs_proto.c | 22 --
>  2 files changed, 24 deletions(-)
>
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 8a4a57b8..c76fedb 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -1349,8 +1349,6 @@ int ip_vs_protocol_init(void);
>  void ip_vs_protocol_cleanup(void);
>  void ip_vs_protocol_timeout_change(struct netns_ipvs *ipvs, int flags);
>  int *ip_vs_create_timeout_table(int *table, int size);
> -int ip_vs_set_state_timeout(int *table, int num, const char *const *names,
> -			    const char *name, int to);
>  void ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
>  			       const struct sk_buff *skb, int offset,
>  			       const char *msg);
> diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
> index 8ae4807..ca880a3 100644
> --- a/net/netfilter/ipvs/ip_vs_proto.c
> +++ b/net/netfilter/ipvs/ip_vs_proto.c
> @@ -193,28 +193,6 @@ ip_vs_create_timeout_table(int *table, int size)
>  }
>
>
> -/*
> - *	Set timeout value for state specified by name
> - */
> -int
> -ip_vs_set_state_timeout(int *table, int num, const char *const *names,
> -			const char *name, int to)
> -{
> -	int i;
> -
> -	if (!table || !name || !to)
> -		return -EINVAL;
> -
> -	for (i = 0; i < num; i++) {
> -		if (strcmp(names[i], name))
> -			continue;
> -		table[i] = to * HZ;
> -		return 0;
> -	}
> -	return -ENOENT;
> -}
> -
> -
>  const char * ip_vs_state_name(__u16 proto, int state)
>  {
>  	struct ip_vs_protocol *pp = ip_vs_proto_get(proto);
> --
> 2.9.3
>
Re: [GIT 0/3] Second Round of IPVS Updates for v4.12
On Fri, Apr 14, 2017 at 08:51:19AM +0900, Simon Horman wrote:
> On Fri, Apr 14, 2017 at 01:01:34AM +0200, Pablo Neira Ayuso wrote:
> > Hi Simon,
> >
> > On Mon, Apr 10, 2017 at 09:58:32AM -0700, Simon Horman wrote:
> > > Hi Pablo,
> > >
> > > please consider these clean-ups and enhancements to IPVS for v4.12.
> > >
> > > * Removal unused variable
> > > * Use kzalloc where appropriate
> > > * More efficient detection of presence of NAT extension
> > >
> > >
> > > The following changes since commit 592d42ac7fd36408979e09bf2f170f2595dab7b8:
> > >
> > >   Merge branch 'qed-IOV-cleanups' (2017-03-21 19:02:38 -0700)
> > >
> > > are available in the git repository at:
> > >
> > >   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> >
> > This says:
> >
> > $ git pull https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> > fatal: Couldn't find remote ref ipvs2-for-v4.12
> >
> > I don't see any tag for this name in:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git/refs/tags
>
> Sorry about that, it looks like I forgot to push the tag.
> It should be there now.

I'm hitting a conflict between this and what I have in nf-next.git.

If you can, please have a look; otherwise I will check tomorrow with a
fresher mind.
Re: [GIT 0/3] Second Round of IPVS Updates for v4.12
On Fri, Apr 14, 2017 at 01:01:34AM +0200, Pablo Neira Ayuso wrote:
> Hi Simon,
>
> On Mon, Apr 10, 2017 at 09:58:32AM -0700, Simon Horman wrote:
> > Hi Pablo,
> >
> > please consider these clean-ups and enhancements to IPVS for v4.12.
> >
> > * Removal unused variable
> > * Use kzalloc where appropriate
> > * More efficient detection of presence of NAT extension
> >
> >
> > The following changes since commit 592d42ac7fd36408979e09bf2f170f2595dab7b8:
> >
> >   Merge branch 'qed-IOV-cleanups' (2017-03-21 19:02:38 -0700)
> >
> > are available in the git repository at:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
>
> This says:
>
> $ git pull https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> fatal: Couldn't find remote ref ipvs2-for-v4.12
>
> I don't see any tag for this name in:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git/refs/tags

Sorry about that, it looks like I forgot to push the tag.
It should be there now.
Re: [PATCH nf-next] ipset: remove unused function __ip_set_get_netlink
On Mon, Apr 10, 2017 at 03:52:37PM -0400, Aaron Conole wrote:
> There are no in-tree callers.

@Jozsef, let me know if I should just take this to save you a pull
request. Thanks.

> Signed-off-by: Aaron Conole
> ---
>  net/netfilter/ipset/ip_set_core.c | 8
>  1 file changed, 8 deletions(-)
>
> diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> index c296f9b..68ba531 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -501,14 +501,6 @@ __ip_set_put(struct ip_set *set)
>   * a separate reference counter
>   */
>  static inline void
> -__ip_set_get_netlink(struct ip_set *set)
> -{
> -	write_lock_bh(&ip_set_ref_lock);
> -	set->ref_netlink++;
> -	write_unlock_bh(&ip_set_ref_lock);
> -}
> -
> -static inline void
>  __ip_set_put_netlink(struct ip_set *set)
>  {
>  	write_lock_bh(&ip_set_ref_lock);
> --
> 2.9.3
>
Re: [PATCH nf-next] nf_conntrack: remove double assignment
On Wed, Apr 12, 2017 at 04:32:54PM -0400, Aaron Conole wrote: > The protonet pointer will unconditionally be rewritten, so just do the > needed assignment first. Also applied, thanks.
Re: [PATCH nf-next] nf_tables: remove double return statement
Applied, thanks.
Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers
On 04/13/2017 02:51 PM, Andrew Lunn wrote:
>> The DT binding is in tree and provides an example of what the switch
>> looks like. Below is the example, but I am also adding the MDIO bus and
>> the PHYs just so you can see how things wind up:
>>
>> switch_top@f0b0 {
>> 	compatible = "simple-bus";
>> 	#size-cells = <1>;
>> 	#address-cells = <1>;
>> 	ranges = <0 0xf0b0 0x40804>;
>>
>> 	ethernet_switch@0 {
>> 		compatible = "brcm,bcm7445-switch-v4.0";
>> 		#size-cells = <0>;
>> 		#address-cells = <1>;
>> 		reg = <0x0 0x4
>> 		       0x4 0x110
>> 		       0x40340 0x30
>> 		       0x40380 0x30
>> 		       0x40400 0x34
>> 		       0x40600 0x208>;
>> 		reg-names = "core", "reg", "intrl2_0", "intrl2_1",
>> 			    "fcb", "acb";
>> 		interrupts = <0 0x18 0
>> 			      0 0x19 0>;
>> 		brcm,num-gphy = <1>;
>> 		brcm,num-rgmii-ports = <2>;
>> 		brcm,fcb-pause-override;
>> 		brcm,acb-packets-inflight;
>>
>> 		ports {
>> 			#address-cells = <1>;
>> 			#size-cells = <0>;
>>
>> 			port@0 {
>> 				label = "gphy";
>> 				reg = <0>;
>> 				phy-handle = <>;
>> 			};
>>
>> 			sw0port1: port@1 {
>> 				label = "rgmii_1";
>> 				reg = <1>;
>> 				phy-mode = "rgmii";
>> 				fixed-link {
>> 					speed = <1000>;
>> 					full-duplex;
>> 				};
>> 			};
>> 		};
>> 	};
>>
>> 	mdio@403c0 {
>> 		reg = <0x403c0 0x8 0x40300 0x18>;
>> 		#address-cells = <0x1>;
>> 		#size-cells = <0x0>;
>> 		compatible = "brcm,unimac-mdio";
>> 		reg-names = "mdio", "mdio_indir_rw";
>>
>> 		switch: switch@0 {
>> 			broken-turn-around;
>> 			reg = <0x0>;
>> 			compatible = "brcm,bcm53125";
>> 			#address-cells = <1>;
>> 			#size-cells = <0>;
>>
>> 			ports {
>> 				..
>> 				port@8 {
>> 					ethernet = <>;
>> 				};
>> 				...
>> 			};
>> 		};
>>
>> 		phy5: ethernet-phy@5 {
>> 			reg = <0x5>;
>> 			compatible = "ethernet-phy-ieee802.3-c22";
>> 		};
>> 	};
>> };
>
> So phy5 is connected to the internal switch with a phy-handle. But
> because of your double usage of this node, it also can be mapped into
> the external switches port 5?
>
> Is that your problem?

Kind of, it does translate into an invalid mapping by virtue of the PHY
being in a bad state, see below.
The mapping per se is not the problem, but the fact that the PHY driver
is probed twice is the original problem that I have.

The double probing comes from the switch driver being probed first
(drivers/net/dsa/ comes before drivers/net/ethernet) while depending on
the master netdev to be running. We need to turn on the Gigabit PHY
clock in order to be able to read its PHY OUI and map it to a driver
(yes, a workaround could be to put its exact compatible string in DT;
that way, no need for get_phy_id()). We have a local change in
mdio-bcm-unimac.c which does exactly that (using the clock framework),
and then, to avoid artificially bumping the clock reference count, the
BCM7xxx PHY driver in its ->probe() function checks whether the clock is
enabled (yes, using __clk_is_enabled while it probably should not) and
keeps the clock turned on for the MDIO layer to successfully read/write
from the PHY.

The BCM7xxx PHY driver does properly manage the clock though, and turns
it off upon ->remove(). We get probed and removed once, and the clock is
no longer enabled because of the first probe deferral. The second time
around, when the slave MII bus probes us again, we go through the
BCM7xxx ->probe() and ->remove() callbacks again, but the clock was
already turned off due to the first probe that got deferred. When the
bcm_sf2 driver finally gets initialized, we try to attach to this
Gigabit PHY. The driver is there, good, but the clock is turned off
already, so the PHY does not respond correctly at all anymore and we end
up reading garbage.

>
> It seems like you should add an mdio node inside your switch node, and
> list your external switch internal/external phys there if needed.

I think I am going to keep this hack
Re: [GIT 0/3] Second Round of IPVS Updates for v4.12
Hi Simon,

On Mon, Apr 10, 2017 at 09:58:32AM -0700, Simon Horman wrote:
> Hi Pablo,
>
> please consider these clean-ups and enhancements to IPVS for v4.12.
>
> * Removal unused variable
> * Use kzalloc where appropriate
> * More efficient detection of presence of NAT extension
>
>
> The following changes since commit 592d42ac7fd36408979e09bf2f170f2595dab7b8:
>
>   Merge branch 'qed-IOV-cleanups' (2017-03-21 19:02:38 -0700)
>
> are available in the git repository at:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12

This says:

$ git pull https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
fatal: Couldn't find remote ref ipvs2-for-v4.12

I don't see any tag for this name in:

https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git/refs/tags
RE: [PATCH v4] smsc95xx: Add comments to the registers definition
> This chip is used by a lot of embedded devices and also by the Raspberry
> Pi 1, 2 & 3 which were created to promote the study of computer
> sciences. Students wanting to learn kernel / network device driver
> programming through those devices can only rely on the Linux kernel
> driver source to make their own.
>
> This commit adds a lot of comments to the registers definition to expand
> the register names.
>
> Cc: Steve Glendinning
> Cc: Microchip Linux Driver Support
> CC: David Miller
> Signed-off-by: Martin Wetterwald
> Reviewed-by: Andrew Lunn
> Acked-by: Steve Glendinning

Acked-by: Woojung Huh

Woojung
[PATCH 15/22] scsi: libfc, csiostor: Change to sg_copy_buffer in two drivers
These two drivers appear to duplicate the functionality of sg_copy_buffer. So we clean them up to use the common code. This helps us remove a couple of instances that would otherwise be slightly tricky sg_map usages. Signed-off-by: Logan Gunthorpe--- drivers/scsi/csiostor/csio_scsi.c | 54 +++ drivers/scsi/libfc/fc_libfc.c | 49 --- 2 files changed, 14 insertions(+), 89 deletions(-) diff --git a/drivers/scsi/csiostor/csio_scsi.c b/drivers/scsi/csiostor/csio_scsi.c index a1ff75f..bd9d062 100644 --- a/drivers/scsi/csiostor/csio_scsi.c +++ b/drivers/scsi/csiostor/csio_scsi.c @@ -1489,60 +1489,14 @@ static inline uint32_t csio_scsi_copy_to_sgl(struct csio_hw *hw, struct csio_ioreq *req) { struct scsi_cmnd *scmnd = (struct scsi_cmnd *)csio_scsi_cmnd(req); - struct scatterlist *sg; - uint32_t bytes_left; - uint32_t bytes_copy; - uint32_t buf_off = 0; - uint32_t start_off = 0; - uint32_t sg_off = 0; - void *sg_addr; - void *buf_addr; struct csio_dma_buf *dma_buf; + size_t copied; - bytes_left = scsi_bufflen(scmnd); - sg = scsi_sglist(scmnd); dma_buf = (struct csio_dma_buf *)csio_list_next(>gen_list); + copied = sg_copy_from_buffer(scsi_sglist(scmnd), scsi_sg_count(scmnd), +dma_buf->vaddr, scsi_bufflen(scmnd)); - /* Copy data from driver buffer to SGs of SCSI CMD */ - while (bytes_left > 0 && sg && dma_buf) { - if (buf_off >= dma_buf->len) { - buf_off = 0; - dma_buf = (struct csio_dma_buf *) - csio_list_next(dma_buf); - continue; - } - - if (start_off >= sg->length) { - start_off -= sg->length; - sg = sg_next(sg); - continue; - } - - buf_addr = dma_buf->vaddr + buf_off; - sg_off = sg->offset + start_off; - bytes_copy = min((dma_buf->len - buf_off), - sg->length - start_off); - bytes_copy = min((uint32_t)(PAGE_SIZE - (sg_off & ~PAGE_MASK)), -bytes_copy); - - sg_addr = kmap_atomic(sg_page(sg) + (sg_off >> PAGE_SHIFT)); - if (!sg_addr) { - csio_err(hw, "failed to kmap sg:%p of ioreq:%p\n", - sg, req); - break; - } - - csio_dbg(hw, "copy_to_sgl:sg_addr %p sg_off %d buf %p len 
%d\n", - sg_addr, sg_off, buf_addr, bytes_copy); - memcpy(sg_addr + (sg_off & ~PAGE_MASK), buf_addr, bytes_copy); - kunmap_atomic(sg_addr); - - start_off += bytes_copy; - buf_off += bytes_copy; - bytes_left -= bytes_copy; - } - - if (bytes_left > 0) + if (copied != scsi_bufflen(scmnd)) return DID_ERROR; else return DID_OK; diff --git a/drivers/scsi/libfc/fc_libfc.c b/drivers/scsi/libfc/fc_libfc.c index d623d08..ce0805a 100644 --- a/drivers/scsi/libfc/fc_libfc.c +++ b/drivers/scsi/libfc/fc_libfc.c @@ -113,45 +113,16 @@ u32 fc_copy_buffer_to_sglist(void *buf, size_t len, u32 *nents, size_t *offset, u32 *crc) { - size_t remaining = len; - u32 copy_len = 0; - - while (remaining > 0 && sg) { - size_t off, sg_bytes; - void *page_addr; - - if (*offset >= sg->length) { - /* -* Check for end and drop resources -* from the last iteration. -*/ - if (!(*nents)) - break; - --(*nents); - *offset -= sg->length; - sg = sg_next(sg); - continue; - } - sg_bytes = min(remaining, sg->length - *offset); - - /* -* The scatterlist item may be bigger than PAGE_SIZE, -* but we are limited to mapping PAGE_SIZE at a time. -*/ - off = *offset + sg->offset; - sg_bytes = min(sg_bytes, - (size_t)(PAGE_SIZE - (off & ~PAGE_MASK))); - page_addr = kmap_atomic(sg_page(sg) + (off >> PAGE_SHIFT)); - if (crc) - *crc = crc32(*crc, buf, sg_bytes); - memcpy((char *)page_addr + (off & ~PAGE_MASK), buf, sg_bytes); - kunmap_atomic(page_addr); - buf += sg_bytes; - *offset += sg_bytes; - remaining -= sg_bytes; -
[PATCH 12/22] scsi: ipr, pmcraid, isci: Make use of the new sg_map helper in 4 call sites
Very straightforward conversion of three scsi drivers. Signed-off-by: Logan Gunthorpe--- drivers/scsi/ipr.c | 27 ++- drivers/scsi/isci/request.c | 42 +- drivers/scsi/pmcraid.c | 19 --- 3 files changed, 51 insertions(+), 37 deletions(-) diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index b29afaf..f98f251 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -3853,7 +3853,7 @@ static void ipr_free_ucode_buffer(struct ipr_sglist *sglist) static int ipr_copy_ucode_buffer(struct ipr_sglist *sglist, u8 *buffer, u32 len) { - int bsize_elem, i, result = 0; + int bsize_elem, i; struct scatterlist *scatterlist; void *kaddr; @@ -3863,32 +3863,33 @@ static int ipr_copy_ucode_buffer(struct ipr_sglist *sglist, scatterlist = sglist->scatterlist; for (i = 0; i < (len / bsize_elem); i++, buffer += bsize_elem) { - struct page *page = sg_page([i]); + kaddr = sg_map([i], SG_KMAP); + if (IS_ERR(kaddr)) { + ipr_trace; + return PTR_ERR(kaddr); + } - kaddr = kmap(page); memcpy(kaddr, buffer, bsize_elem); - kunmap(page); + sg_unmap([i], kaddr, SG_KMAP); scatterlist[i].length = bsize_elem; - - if (result != 0) { - ipr_trace; - return result; - } } if (len % bsize_elem) { - struct page *page = sg_page([i]); + kaddr = sg_map([i], SG_KMAP); + if (IS_ERR(kaddr)) { + ipr_trace; + return PTR_ERR(kaddr); + } - kaddr = kmap(page); memcpy(kaddr, buffer, len % bsize_elem); - kunmap(page); + sg_unmap([i], kaddr, SG_KMAP); scatterlist[i].length = len % bsize_elem; } sglist->buffer_len = len; - return result; + return 0; } /** diff --git a/drivers/scsi/isci/request.c b/drivers/scsi/isci/request.c index 47f66e9..66d6596 100644 --- a/drivers/scsi/isci/request.c +++ b/drivers/scsi/isci/request.c @@ -1424,12 +1424,14 @@ sci_stp_request_pio_data_in_copy_data_buffer(struct isci_stp_request *stp_req, sg = task->scatter; while (total_len > 0) { - struct page *page = sg_page(sg); - copy_len = min_t(int, total_len, sg_dma_len(sg)); - kaddr = kmap_atomic(page); - memcpy(kaddr + sg->offset, src_addr, 
copy_len); - kunmap_atomic(kaddr); + kaddr = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(kaddr)) + return SCI_FAILURE; + + memcpy(kaddr, src_addr, copy_len); + sg_unmap(sg, kaddr, SG_KMAP_ATOMIC); + total_len -= copy_len; src_addr += copy_len; sg = sg_next(sg); @@ -1771,14 +1773,16 @@ sci_io_request_frame_handler(struct isci_request *ireq, case SCI_REQ_SMP_WAIT_RESP: { struct sas_task *task = isci_request_access_task(ireq); struct scatterlist *sg = >smp_task.smp_resp; - void *frame_header, *kaddr; + void *frame_header; u8 *rsp; sci_unsolicited_frame_control_get_header(>uf_control, frame_index, _header); - kaddr = kmap_atomic(sg_page(sg)); - rsp = kaddr + sg->offset; + rsp = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(rsp)) + return SCI_FAILURE; + sci_swab32_cpy(rsp, frame_header, 1); if (rsp[0] == SMP_RESPONSE) { @@ -1814,7 +1818,7 @@ sci_io_request_frame_handler(struct isci_request *ireq, ireq->sci_status = SCI_FAILURE_CONTROLLER_SPECIFIC_IO_ERR; sci_change_state(>sm, SCI_REQ_COMPLETED); } - kunmap_atomic(kaddr); + sg_unmap(sg, rsp, SG_KMAP_ATOMIC); sci_controller_release_frame(ihost, frame_index); @@ -2919,15 +2923,18 @@ static void isci_request_io_request_complete(struct isci_host *ihost, case SAS_PROTOCOL_SMP: { struct scatterlist *sg = >smp_task.smp_req; struct smp_req *smp_req; - void *kaddr; dma_unmap_sg(>pdev->dev, sg, 1, DMA_TO_DEVICE); /* need to swab it back in case the command buffer is re-used */ -
[PATCH 10/22] staging: unisys: visorbus: Make use of the new sg_map helper function
Straightforward conversion to the new function. Signed-off-by: Logan Gunthorpe--- drivers/staging/unisys/visorhba/visorhba_main.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c b/drivers/staging/unisys/visorhba/visorhba_main.c index 0ce92c8..2d8c8bc 100644 --- a/drivers/staging/unisys/visorhba/visorhba_main.c +++ b/drivers/staging/unisys/visorhba/visorhba_main.c @@ -842,7 +842,6 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct scsi_cmnd *scsicmd) struct scatterlist *sg; unsigned int i; char *this_page; - char *this_page_orig; int bufind = 0; struct visordisk_info *vdisk; struct visorhba_devdata *devdata; @@ -869,11 +868,14 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct scsi_cmnd *scsicmd) sg = scsi_sglist(scsicmd); for (i = 0; i < scsi_sg_count(scsicmd); i++) { - this_page_orig = kmap_atomic(sg_page(sg + i)); - this_page = (void *)((unsigned long)this_page_orig | -sg[i].offset); + this_page = sg_map(sg + i, SG_KMAP_ATOMIC); + if (IS_ERR(this_page)) { + scsicmd->result = DID_ERROR << 16; + return; + } + memcpy(this_page, buf + bufind, sg[i].length); - kunmap_atomic(this_page_orig); + sg_unmap(sg + i, this_page, SG_KMAP_ATOMIC); } } else { devdata = (struct visorhba_devdata *)scsidev->host->hostdata; -- 2.1.4
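The map, check, copy, unmap loop that this conversion follows can be sketched in user space roughly as below. This is only an illustration under stated assumptions: every model_* name is a hypothetical stand-in, not the kernel API, and a NULL page is used to model memory that cannot be mapped. Advancing the source index after each segment is a choice made for the sketch and is not taken from the hunk above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of this is the kernel API. */
#define MODEL_MAX_ERRNO 4095
#define MODEL_EFAULT 14

/* Encode a negative errno value in a pointer, in the spirit of ERR_PTR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer carries an encoded error, in the spirit of IS_ERR(). */
static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_sg {
	unsigned char *page;	/* backing memory; NULL models unmappable memory */
	size_t offset;		/* offset of the data within the page */
	size_t length;		/* number of bytes in this segment */
};

/* Return the address of the segment data, with the offset already applied. */
static inline void *model_sg_map(struct model_sg *sg)
{
	if (!sg->page)
		return model_err_ptr(-MODEL_EFAULT);
	return sg->page + sg->offset;
}

/* Nothing to release in this model; kept so that map and unmap stay paired. */
static inline void model_sg_unmap(struct model_sg *sg, void *addr)
{
	(void)sg;
	(void)addr;
}

/* Copy buf into the segments in order; returns 0 or a negative error code. */
int model_copy_to_sgl(struct model_sg *sg, size_t nents,
		      const unsigned char *buf)
{
	size_t bufind = 0;
	size_t i;

	for (i = 0; i < nents; i++) {
		void *dst = model_sg_map(&sg[i]);

		if (model_is_err(dst))
			return -MODEL_EFAULT;

		memcpy(dst, buf + bufind, sg[i].length);
		model_sg_unmap(&sg[i], dst);
		bufind += sg[i].length;	/* assumption: segments are consecutive */
	}

	return 0;
}
```

A caller would hand model_copy_to_sgl() an array of segments and a source buffer at least as long as the sum of the segment lengths, and treat a negative return as a failed mapping.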
[PATCH 03/22] libiscsi: Make use of the new sg_map helper function
Convert the kmap and kmap_atomic uses to the sg_map function. We now store the flags for the kmap instead of a boolean to indicate atomicitiy. We also propogate a possible kmap error down and create a new ISCSI_TCP_INTERNAL_ERR error type for this. Signed-off-by: Logan Gunthorpe--- drivers/scsi/cxgbi/libcxgbi.c | 5 + drivers/scsi/libiscsi_tcp.c | 32 include/scsi/libiscsi_tcp.h | 3 ++- 3 files changed, 27 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c index bd7d39e..e38d0c1 100644 --- a/drivers/scsi/cxgbi/libcxgbi.c +++ b/drivers/scsi/cxgbi/libcxgbi.c @@ -1556,6 +1556,11 @@ static inline int read_pdu_skb(struct iscsi_conn *conn, */ iscsi_conn_printk(KERN_ERR, conn, "Invalid pdu or skb."); return -EFAULT; + case ISCSI_TCP_INTERNAL_ERR: + pr_info("skb 0x%p, off %u, %d, TCP_INTERNAL_ERR.\n", + skb, offset, offloaded); + iscsi_conn_printk(KERN_ERR, conn, "Internal error."); + return -EFAULT; case ISCSI_TCP_SEGMENT_DONE: log_debug(1 << CXGBI_DBG_PDU_RX, "skb 0x%p, off %u, %d, TCP_SEG_DONE, rc %d.\n", diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c index 63a1d69..a2427699 100644 --- a/drivers/scsi/libiscsi_tcp.c +++ b/drivers/scsi/libiscsi_tcp.c @@ -133,25 +133,23 @@ static void iscsi_tcp_segment_map(struct iscsi_segment *segment, int recv) if (page_count(sg_page(sg)) >= 1 && !recv) return; - if (recv) { - segment->atomic_mapped = true; - segment->sg_mapped = kmap_atomic(sg_page(sg)); - } else { - segment->atomic_mapped = false; - /* the xmit path can sleep with the page mapped so use kmap */ - segment->sg_mapped = kmap(sg_page(sg)); + /* the xmit path can sleep with the page mapped so don't use atomic */ + segment->sg_map_flags = recv ? 
SG_KMAP_ATOMIC : SG_KMAP; + segment->sg_mapped = sg_map(sg, segment->sg_map_flags); + + if (IS_ERR(segment->sg_mapped)) { + segment->sg_mapped = NULL; + return; } - segment->data = segment->sg_mapped + sg->offset + segment->sg_offset; + segment->data = segment->sg_mapped + segment->sg_offset; } void iscsi_tcp_segment_unmap(struct iscsi_segment *segment) { if (segment->sg_mapped) { - if (segment->atomic_mapped) - kunmap_atomic(segment->sg_mapped); - else - kunmap(sg_page(segment->sg)); + sg_unmap(segment->sg, segment->sg_mapped, + segment->sg_map_flags); segment->sg_mapped = NULL; segment->data = NULL; } @@ -304,6 +302,9 @@ iscsi_tcp_segment_recv(struct iscsi_tcp_conn *tcp_conn, break; } + if (segment->data) + return -EFAULT; + copy = min(len - copied, segment->size - segment->copied); ISCSI_DBG_TCP(tcp_conn->iscsi_conn, "copying %d\n", copy); memcpy(segment->data + segment->copied, ptr + copied, copy); @@ -927,6 +928,13 @@ int iscsi_tcp_recv_skb(struct iscsi_conn *conn, struct sk_buff *skb, avail); rc = iscsi_tcp_segment_recv(tcp_conn, segment, ptr, avail); BUG_ON(rc == 0); + if (rc < 0) { + ISCSI_DBG_TCP(conn, "memory fault. Consumed %d\n", + consumed); + *status = ISCSI_TCP_INTERNAL_ERR; + goto skb_done; + } + consumed += rc; if (segment->total_copied >= segment->total_size) { diff --git a/include/scsi/libiscsi_tcp.h b/include/scsi/libiscsi_tcp.h index 30520d5..58c79af 100644 --- a/include/scsi/libiscsi_tcp.h +++ b/include/scsi/libiscsi_tcp.h @@ -47,7 +47,7 @@ struct iscsi_segment { struct scatterlist *sg; void*sg_mapped; unsigned intsg_offset; - boolatomic_mapped; + int sg_map_flags; iscsi_segment_done_fn_t *done; }; @@ -92,6 +92,7 @@ enum { ISCSI_TCP_SKB_DONE, /* skb is out of data */ ISCSI_TCP_CONN_ERR, /* iscsi layer has fired a conn err */ ISCSI_TCP_SUSPENDED,/* conn is suspended */ + ISCSI_TCP_INTERNAL_ERR, /* an internal error occurred */ }; extern void iscsi_tcp_hdr_recv_prep(struct iscsi_tcp_conn *tcp_conn); -- 2.1.4
[PATCH 07/22] crypto: shash, caam: Make use of the new sg_map helper function
Very straightforward conversion to the new function in two crypto drivers. Signed-off-by: Logan Gunthorpe--- crypto/shash.c| 9 ++--- drivers/crypto/caam/caamalg.c | 8 +++- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/crypto/shash.c b/crypto/shash.c index 5e31c8d..2b7de94 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -283,10 +283,13 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc) if (nbytes < min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset)) { void *data; - data = kmap_atomic(sg_page(sg)); - err = crypto_shash_digest(desc, data + offset, nbytes, + data = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(data)) + return PTR_ERR(data); + + err = crypto_shash_digest(desc, data, nbytes, req->result); - kunmap_atomic(data); + sg_unmap(sg, data, SG_KMAP_ATOMIC); crypto_yield(desc->flags); } else err = crypto_shash_init(desc) ?: diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c index 9bc80eb..76b97de 100644 --- a/drivers/crypto/caam/caamalg.c +++ b/drivers/crypto/caam/caamalg.c @@ -89,7 +89,6 @@ static void dbg_dump_sg(const char *level, const char *prefix_str, struct scatterlist *sg, size_t tlen, bool ascii) { struct scatterlist *it; - void *it_page; size_t len; void *buf; @@ -98,19 +97,18 @@ static void dbg_dump_sg(const char *level, const char *prefix_str, * make sure the scatterlist's page * has a valid virtual memory mapping */ - it_page = kmap_atomic(sg_page(it)); - if (unlikely(!it_page)) { + buf = sg_map(it, SG_KMAP_ATOMIC); + if (IS_ERR(buf)) { printk(KERN_ERR "dbg_dump_sg: kmap failed\n"); return; } - buf = it_page + it->offset; len = min_t(size_t, tlen, it->length); print_hex_dump(level, prefix_str, prefix_type, rowsize, groupsize, buf, len, ascii); tlen -= len; - kunmap_atomic(it_page); + sg_unmap(it, buf, SG_KMAP_ATOMIC); } } #endif -- 2.1.4
[PATCH 04/22] target: Make use of the new sg_map function at 16 call sites
Fairly straightforward conversions in all spots. In a couple of cases any error gets propogated up should sg_map fail. In other cases a warning is issued if the kmap fails seeing there's no clear error path. This should not be an issue until someone tries to use unmappable memory in the sgl with this driver. Signed-off-by: Logan Gunthorpe--- drivers/target/iscsi/iscsi_target.c| 27 +--- drivers/target/target_core_rd.c| 3 +- drivers/target/target_core_sbc.c | 122 +++-- drivers/target/target_core_transport.c | 18 +++-- drivers/target/target_core_user.c | 43 include/target/target_core_backend.h | 4 +- 6 files changed, 149 insertions(+), 68 deletions(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index a918024..e3e0d8f 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -579,7 +579,7 @@ iscsit_xmit_nondatain_pdu(struct iscsi_conn *conn, struct iscsi_cmd *cmd, } static int iscsit_map_iovec(struct iscsi_cmd *, struct kvec *, u32, u32); -static void iscsit_unmap_iovec(struct iscsi_cmd *); +static void iscsit_unmap_iovec(struct iscsi_cmd *, struct kvec *); static u32 iscsit_do_crypto_hash_sg(struct ahash_request *, struct iscsi_cmd *, u32, u32, u32, u8 *); static int @@ -646,7 +646,7 @@ iscsit_xmit_datain_pdu(struct iscsi_conn *conn, struct iscsi_cmd *cmd, ret = iscsit_fe_sendpage_sg(cmd, conn); - iscsit_unmap_iovec(cmd); + iscsit_unmap_iovec(cmd, >iov_data[1]); if (ret < 0) { iscsit_tx_thread_wait_for_tcp(conn); @@ -925,7 +925,10 @@ static int iscsit_map_iovec( while (data_length) { u32 cur_len = min_t(u32, data_length, sg->length - page_off); - iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; + iov[i].iov_base = sg_map_offset(sg, page_off, SG_KMAP); + if (IS_ERR(iov[i].iov_base)) + goto map_err; + iov[i].iov_len = cur_len; data_length -= cur_len; @@ -937,17 +940,25 @@ static int iscsit_map_iovec( cmd->kmapped_nents = i; return i; + +map_err: + cmd->kmapped_nents = i - 1; + 
iscsit_unmap_iovec(cmd, iov); + return -1; } -static void iscsit_unmap_iovec(struct iscsi_cmd *cmd) +static void iscsit_unmap_iovec(struct iscsi_cmd *cmd, struct kvec *iov) { u32 i; struct scatterlist *sg; + unsigned int page_off = cmd->first_data_sg_off; sg = cmd->first_data_sg; - for (i = 0; i < cmd->kmapped_nents; i++) - kunmap(sg_page([i])); + for (i = 0; i < cmd->kmapped_nents; i++) { + sg_unmap_offset([i], iov[i].iov_base, page_off, SG_KMAP); + page_off = 0; + } } static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn) @@ -1610,7 +1621,7 @@ iscsit_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, rx_got = rx_data(conn, >iov_data[0], iov_count, rx_size); - iscsit_unmap_iovec(cmd); + iscsit_unmap_iovec(cmd, iov); if (rx_got != rx_size) return -1; @@ -2626,7 +2637,7 @@ static int iscsit_handle_immediate_data( rx_got = rx_data(conn, >iov_data[0], iov_count, rx_size); - iscsit_unmap_iovec(cmd); + iscsit_unmap_iovec(cmd, cmd->iov_data); if (rx_got != rx_size) { iscsit_rx_thread_wait_for_tcp(conn); diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index ddc216c..22c5ad5 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -431,7 +431,8 @@ static sense_reason_t rd_do_prot_rw(struct se_cmd *cmd, bool is_read) cmd->t_prot_sg, 0); if (!rc) - sbc_dif_copy_prot(cmd, sectors, is_read, prot_sg, prot_offset); + rc = sbc_dif_copy_prot(cmd, sectors, is_read, prot_sg, + prot_offset); return rc; } diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c index c194063..67cb420 100644 --- a/drivers/target/target_core_sbc.c +++ b/drivers/target/target_core_sbc.c @@ -420,17 +420,17 @@ static sense_reason_t xdreadwrite_callback(struct se_cmd *cmd, bool success, offset = 0; for_each_sg(cmd->t_bidi_data_sg, sg, cmd->t_bidi_data_nents, count) { - addr = kmap_atomic(sg_page(sg)); - if (!addr) { + addr = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(addr)) { ret = 
TCM_OUT_OF_RESOURCES; goto out; } for (i = 0; i < sg->length; i++) - *(addr + sg->offset + i) ^= *(buf + offset + i); +
[PATCH 18/22] mmc: spi: Make use of the new sg_map helper function
We use the sg_map helper but it's slightly more complicated as we only check for the error when the mapping actually gets used. Such that if the mapping failed but wasn't needed then no error occurs. Signed-off-by: Logan Gunthorpe--- drivers/mmc/host/mmc_spi.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/mmc/host/mmc_spi.c b/drivers/mmc/host/mmc_spi.c index e77d79c..82f786d 100644 --- a/drivers/mmc/host/mmc_spi.c +++ b/drivers/mmc/host/mmc_spi.c @@ -676,9 +676,15 @@ mmc_spi_writeblock(struct mmc_spi_host *host, struct spi_transfer *t, struct scratch *scratch = host->data; u32 pattern; - if (host->mmc->use_spi_crc) + if (host->mmc->use_spi_crc) { + if (IS_ERR(t->tx_buf)) + return PTR_ERR(t->tx_buf); + scratch->crc_val = cpu_to_be16( crc_itu_t(0, t->tx_buf, t->len)); + t->tx_buf += t->len; + } + if (host->dma_dev) dma_sync_single_for_device(host->dma_dev, host->data_dma, sizeof(*scratch), @@ -743,7 +749,6 @@ mmc_spi_writeblock(struct mmc_spi_host *host, struct spi_transfer *t, return status; } - t->tx_buf += t->len; if (host->dma_dev) t->tx_dma += t->len; @@ -809,6 +814,11 @@ mmc_spi_readblock(struct mmc_spi_host *host, struct spi_transfer *t, } leftover = status << 1; + if (bitshift || host->mmc->use_spi_crc) { + if (IS_ERR(t->rx_buf)) + return PTR_ERR(t->rx_buf); + } + if (host->dma_dev) { dma_sync_single_for_device(host->dma_dev, host->data_dma, sizeof(*scratch), @@ -860,9 +870,10 @@ mmc_spi_readblock(struct mmc_spi_host *host, struct spi_transfer *t, scratch->crc_val, crc, t->len); return -EILSEQ; } + + t->rx_buf += t->len; } - t->rx_buf += t->len; if (host->dma_dev) t->rx_dma += t->len; @@ -936,11 +947,11 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd, } /* allow pio too; we don't allow highmem */ - kmap_addr = kmap(sg_page(sg)); + kmap_addr = sg_map(sg, SG_KMAP); if (direction == DMA_TO_DEVICE) - t->tx_buf = kmap_addr + sg->offset; + t->tx_buf = kmap_addr; else - t->rx_buf = kmap_addr + sg->offset; + 
t->rx_buf = kmap_addr; /* transfer each block, and update request status */ while (length) { @@ -970,7 +981,8 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd, /* discard mappings */ if (direction == DMA_FROM_DEVICE) flush_kernel_dcache_page(sg_page(sg)); - kunmap(sg_page(sg)); + if (!IS_ERR(kmap_addr)) + sg_unmap(sg, kmap_addr, SG_KMAP); if (dma_dev) dma_unmap_page(dma_dev, dma_addr, PAGE_SIZE, dir); -- 2.1.4
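The deferred check described in this patch (store the result of the mapping even if it is an error, and only complain when the buffer is actually dereferenced) might look roughly like the user-space sketch below. All model_* names are hypothetical stand-ins rather than kernel or mmc_spi functions, and the byte sum merely takes the place of the CRC computation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of this is the kernel API. */
#define MODEL_MAX_ERRNO 4095
#define MODEL_EFAULT 14

/* Encode a negative errno value in a pointer, in the spirit of ERR_PTR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer carries an encoded error, in the spirit of IS_ERR(). */
static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the errno value from an error pointer, in the spirit of PTR_ERR(). */
static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct model_transfer {
	const unsigned char *buf;	/* may hold an encoded error, not an address */
	size_t len;
};

/*
 * Process one block. The buffer is only touched when use_crc is set, so
 * that is the only path on which a failed mapping is reported. The pointer
 * is advanced only once it is known to be a real address.
 */
int model_write_block(struct model_transfer *t, int use_crc,
		      unsigned int *sum)
{
	if (use_crc) {
		size_t i;

		if (model_is_err(t->buf))
			return (int)model_ptr_err(t->buf);

		*sum = 0;
		for (i = 0; i < t->len; i++)
			*sum += t->buf[i];	/* stand-in for the CRC */

		t->buf += t->len;
	}

	return 0;
}
```

The trade-off is that a failed mapping goes unnoticed on paths that never touch the buffer, which is the behaviour the commit message describes.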
[PATCH 22/22] memstick: Make use of the new sg_map helper function
Straightforward conversion, but we have to WARN if unmappable memory finds its way into the sgl. Signed-off-by: Logan Gunthorpe--- drivers/memstick/host/jmb38x_ms.c | 23 ++- drivers/memstick/host/tifm_ms.c | 22 +- 2 files changed, 35 insertions(+), 10 deletions(-) diff --git a/drivers/memstick/host/jmb38x_ms.c b/drivers/memstick/host/jmb38x_ms.c index 48db922..256cf41 100644 --- a/drivers/memstick/host/jmb38x_ms.c +++ b/drivers/memstick/host/jmb38x_ms.c @@ -303,7 +303,6 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host) unsigned int off; unsigned int t_size, p_cnt; unsigned char *buf; - struct page *pg; unsigned long flags = 0; if (host->req->long_data) { @@ -318,14 +317,26 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host) unsigned int uninitialized_var(p_off); if (host->req->long_data) { - pg = nth_page(sg_page(>req->sg), - off >> PAGE_SHIFT); p_off = offset_in_page(off); p_cnt = PAGE_SIZE - p_off; p_cnt = min(p_cnt, length); local_irq_save(flags); - buf = kmap_atomic(pg) + p_off; + buf = sg_map_offset(>req->sg, +off - host->req->sg.offset, +SG_KMAP_ATOMIC); + if (IS_ERR(buf)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there doesn't +* seem to be any error path out of here, +* we can only WARN. 
+*/ + WARN(1, "Non-mappable memory used in sg!"); + break; + } + } else { buf = host->req->data + host->block_pos; p_cnt = host->req->data_len - host->block_pos; @@ -341,7 +352,9 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host) : jmb38x_ms_read_reg_data(host, buf, p_cnt); if (host->req->long_data) { - kunmap_atomic(buf - p_off); + sg_unmap_offset(>req->sg, buf, +off - host->req->sg.offset, +SG_KMAP_ATOMIC); local_irq_restore(flags); } diff --git a/drivers/memstick/host/tifm_ms.c b/drivers/memstick/host/tifm_ms.c index 7bafa72..c0bc40e 100644 --- a/drivers/memstick/host/tifm_ms.c +++ b/drivers/memstick/host/tifm_ms.c @@ -186,7 +186,6 @@ static unsigned int tifm_ms_transfer_data(struct tifm_ms *host) unsigned int off; unsigned int t_size, p_cnt; unsigned char *buf; - struct page *pg; unsigned long flags = 0; if (host->req->long_data) { @@ -203,14 +202,25 @@ static unsigned int tifm_ms_transfer_data(struct tifm_ms *host) unsigned int uninitialized_var(p_off); if (host->req->long_data) { - pg = nth_page(sg_page(>req->sg), - off >> PAGE_SHIFT); p_off = offset_in_page(off); p_cnt = PAGE_SIZE - p_off; p_cnt = min(p_cnt, length); local_irq_save(flags); - buf = kmap_atomic(pg) + p_off; + buf = sg_map_offset(>req->sg, +off - host->req->sg.offset, +SG_KMAP_ATOMIC); + if (IS_ERR(buf)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there doesn't +* seem to be any error path out of here, +* we can only WARN. +*/ + WARN(1, "Non-mappable memory used in sg!"); + break; + } } else { buf = host->req->data + host->block_pos; p_cnt = host->req->data_len - host->block_pos; @@ -221,7 +231,9 @@ static unsigned int tifm_ms_transfer_data(struct tifm_ms *host) : tifm_ms_read_data(host, buf, p_cnt); if
[PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function
Straightforward conversion to the new helper, except due to the lack of error path, we have to warn if unmapable memory is ever present in the sgl. Signed-off-by: Logan Gunthorpe--- drivers/block/xen-blkfront.c | 33 +++-- 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index 5067a0a..7dcf41d 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri BUG_ON(sg->offset + sg->length > PAGE_SIZE); if (setup.need_copy) { - setup.bvec_off = sg->offset; - setup.bvec_data = kmap_atomic(sg_page(sg)); + setup.bvec_off = 0; + setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(setup.bvec_data)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there is a +* questionable error path out of here, +* we WARN. +*/ + WARN(1, "Non-mappable memory used in sg!"); + return 1; + } } gnttab_foreach_grant_in_range(sg_page(sg), @@ -818,7 +829,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri ); if (setup.need_copy) - kunmap_atomic(setup.bvec_data); + sg_unmap(sg, setup.bvec_data, SG_KMAP_ATOMIC); } if (setup.segments) kunmap_atomic(setup.segments); @@ -1468,8 +1479,18 @@ static bool blkif_completion(unsigned long *id, for_each_sg(s->sg, sg, num_sg, i) { BUG_ON(sg->offset + sg->length > PAGE_SIZE); - data.bvec_offset = sg->offset; - data.bvec_data = kmap_atomic(sg_page(sg)); + data.bvec_offset = 0; + data.bvec_data = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(data.bvec_data)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there is no +* clear error path, we WARN. 
+*/ + WARN(1, "Non-mappable memory used in sg!"); + return 1; + } gnttab_foreach_grant_in_range(sg_page(sg), sg->offset, @@ -1477,7 +1498,7 @@ static bool blkif_completion(unsigned long *id, blkif_copy_from_grant, ); - kunmap_atomic(data.bvec_data); + sg_unmap(sg, data.bvec_data, SG_KMAP_ATOMIC); } } /* Add the persistent grant into the list of free grants */ -- 2.1.4
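The reason bvec_off and bvec_offset become 0 in this patch is that the new helper already returns an address with sg->offset applied, so adding the offset again would point past the data. A small user-space sketch of that difference follows; the model_* names are hypothetical and only mimic the shape of the old and new calls.

```c
#include <stddef.h>

/* Hypothetical user-space model of one scatterlist entry. */
struct model_sg {
	unsigned char *page;	/* start of the backing page */
	size_t offset;		/* where the data starts inside the page */
	size_t length;		/* number of data bytes */
};

/* Old style: map the page only; the caller must add sg->offset itself. */
unsigned char *model_map_page(struct model_sg *sg)
{
	return sg->page;
}

/* New style: the returned address already includes sg->offset. */
unsigned char *model_sg_map(struct model_sg *sg)
{
	return sg->page + sg->offset;
}

/* Address of byte idx of the data when starting from a page mapping. */
unsigned char *model_data_old(struct model_sg *sg, size_t idx)
{
	return model_map_page(sg) + sg->offset + idx;
}

/* Same byte when starting from the new mapping: no extra offset is needed. */
unsigned char *model_data_new(struct model_sg *sg, size_t idx)
{
	return model_sg_map(sg) + idx;
}
```

Both accessors resolve to the same byte, which is why a caller that keeps its own offset field has to reset it to 0 when switching to the new style.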
[PATCH 19/22] mmc: tmio: Make use of the new sg_map helper function
Straightforward conversion to sg_map helper. A couple paths will WARN if the memory does not end up being mappable. Signed-off-by: Logan Gunthorpe--- drivers/mmc/host/tmio_mmc.h | 12 ++-- drivers/mmc/host/tmio_mmc_dma.c | 5 + drivers/mmc/host/tmio_mmc_pio.c | 24 3 files changed, 39 insertions(+), 2 deletions(-) diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h index 2b349d4..ba68c9fed 100644 --- a/drivers/mmc/host/tmio_mmc.h +++ b/drivers/mmc/host/tmio_mmc.h @@ -198,17 +198,25 @@ void tmio_mmc_enable_mmc_irqs(struct tmio_mmc_host *host, u32 i); void tmio_mmc_disable_mmc_irqs(struct tmio_mmc_host *host, u32 i); irqreturn_t tmio_mmc_irq(int irq, void *devid); +/* Note: this function may return PTR_ERR and must be checked! */ static inline char *tmio_mmc_kmap_atomic(struct scatterlist *sg, unsigned long *flags) { + void *ret; + local_irq_save(*flags); - return kmap_atomic(sg_page(sg)) + sg->offset; + ret = sg_map(sg, SG_KMAP_ATOMIC); + + if (IS_ERR(ret)) + local_irq_restore(*flags); + + return ret; } static inline void tmio_mmc_kunmap_atomic(struct scatterlist *sg, unsigned long *flags, void *virt) { - kunmap_atomic(virt - sg->offset); + sg_unmap(sg, virt, SG_KMAP_ATOMIC); local_irq_restore(*flags); } diff --git a/drivers/mmc/host/tmio_mmc_dma.c b/drivers/mmc/host/tmio_mmc_dma.c index fa8a936..07531f7 100644 --- a/drivers/mmc/host/tmio_mmc_dma.c +++ b/drivers/mmc/host/tmio_mmc_dma.c @@ -149,6 +149,11 @@ static void tmio_mmc_start_dma_tx(struct tmio_mmc_host *host) if (!aligned) { unsigned long flags; void *sg_vaddr = tmio_mmc_kmap_atomic(sg, ); + if (IS_ERR(sg_vaddr)) { + ret = PTR_ERR(sg_vaddr); + goto pio; + } + sg_init_one(>bounce_sg, host->bounce_buf, sg->length); memcpy(host->bounce_buf, sg_vaddr, host->bounce_sg.length); tmio_mmc_kunmap_atomic(sg, , sg_vaddr); diff --git a/drivers/mmc/host/tmio_mmc_pio.c b/drivers/mmc/host/tmio_mmc_pio.c index 6b789a7..d6fdbf6 100644 --- a/drivers/mmc/host/tmio_mmc_pio.c +++ b/drivers/mmc/host/tmio_mmc_pio.c 
@@ -479,6 +479,18 @@ static void tmio_mmc_pio_irq(struct tmio_mmc_host *host) } sg_virt = tmio_mmc_kmap_atomic(host->sg_ptr, ); + if (IS_ERR(sg_virt)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there doesn't +* seem to be any error path out of here, +* we can only WARN. +*/ + WARN(1, "Non-mappable memory used in sg!"); + return; + } + buf = (unsigned short *)(sg_virt + host->sg_off); count = host->sg_ptr->length - host->sg_off; @@ -506,6 +518,18 @@ static void tmio_mmc_check_bounce_buffer(struct tmio_mmc_host *host) if (host->sg_ptr == >bounce_sg) { unsigned long flags; void *sg_vaddr = tmio_mmc_kmap_atomic(host->sg_orig, ); + if (IS_ERR(sg_vaddr)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there doesn't +* seem to be any error path out of here, +* we can only WARN. +*/ + WARN(1, "Non-mappable memory used in sg!"); + return; + } + memcpy(sg_vaddr, host->bounce_buf, host->bounce_sg.length); tmio_mmc_kunmap_atomic(host->sg_orig, , sg_vaddr); } -- 2.1.4
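The wrapper change in tmio_mmc.h follows a common rule: if a helper takes a side effect (here, disabling interrupts) before an operation that can fail, it should undo that side effect before returning the error, so callers that bail out do not need to call the unmap helper. A user-space sketch of the rule is given below. The model_* names and the depth counter are hypothetical; the counter merely stands in for local_irq_save() and local_irq_restore().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of this is the kernel API. */
#define MODEL_MAX_ERRNO 4095
#define MODEL_EFAULT 14

static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_sg {
	unsigned char *page;	/* NULL models memory that cannot be mapped */
	size_t offset;
};

/* Greater than zero means "interrupts are disabled" in this model. */
static int model_irq_depth;

static inline void *model_sg_map(struct model_sg *sg)
{
	if (!sg->page)
		return model_err_ptr(-MODEL_EFAULT);
	return sg->page + sg->offset;
}

/* Map with "interrupts off"; on failure the state is restored first. */
void *model_map_atomic(struct model_sg *sg)
{
	void *ret;

	model_irq_depth++;		/* stands in for local_irq_save() */
	ret = model_sg_map(sg);

	if (model_is_err(ret))
		model_irq_depth--;	/* undo before reporting the failure */

	return ret;
}

/* Only called for successful mappings; restores the "interrupt" state. */
void model_unmap_atomic(struct model_sg *sg, void *virt)
{
	(void)sg;
	(void)virt;
	model_irq_depth--;		/* stands in for local_irq_restore() */
}
```

With this shape a caller can return immediately when the mapping fails, as the WARN-and-return paths in the hunks above do, without leaving the saved state unbalanced.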
[PATCH 05/22] drm/i915: Make use of the new sg_map helper function
This is a single straightforward conversion from kmap to sg_map. Signed-off-by: Logan Gunthorpe--- drivers/gpu/drm/i915/i915_gem.c | 27 --- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 67b1fc5..1b1b91a 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2188,6 +2188,15 @@ static void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj) radix_tree_delete(>mm.get_page.radix, iter.index); } +static void i915_gem_object_unmap(const struct drm_i915_gem_object *obj, + void *ptr) +{ + if (is_vmalloc_addr(ptr)) + vunmap(ptr); + else + sg_unmap(obj->mm.pages->sgl, ptr, SG_KMAP); +} + void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj, enum i915_mm_subclass subclass) { @@ -2215,10 +2224,7 @@ void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj, void *ptr; ptr = ptr_mask_bits(obj->mm.mapping); - if (is_vmalloc_addr(ptr)) - vunmap(ptr); - else - kunmap(kmap_to_page(ptr)); + i915_gem_object_unmap(obj, ptr); obj->mm.mapping = NULL; } @@ -2475,8 +2481,11 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj, void *addr; /* A single page can always be kmapped */ - if (n_pages == 1 && type == I915_MAP_WB) - return kmap(sg_page(sgt->sgl)); + if (n_pages == 1 && type == I915_MAP_WB) { + addr = sg_map(sgt->sgl, SG_KMAP); + if (IS_ERR(addr)) + return NULL; + } if (n_pages > ARRAY_SIZE(stack_pages)) { /* Too big for stack -- allocate temporary array instead */ @@ -2543,11 +2552,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, goto err_unpin; } - if (is_vmalloc_addr(ptr)) - vunmap(ptr); - else - kunmap(kmap_to_page(ptr)); - + i915_gem_object_unmap(obj, ptr); ptr = obj->mm.mapping = NULL; } -- 2.1.4
[PATCH 01/22] scatterlist: Introduce sg_map helper functions
This patch introduces functions which kmap the pages inside an sgl. Two variants are provided: one if an offset is required and one if the offset is zero. These functions replace a common pattern of kmap(sg_page(sg)) that is used in about 50 places within the kernel. The motivation for this work is to eventually safely support sgls that contain io memory. In order for that to work, any access to the contents of an iomem SGL will need to be done with iomemcpy or hit some warning. (The exact details of how this will work have yet to be worked out.) Having all the kmaps in one place is just a first step in that direction. Additionally, since this helps cut down the users of sg_page, it should make any effort to go to struct-page-less DMAs a little easier (should that idea ever swing back into favour again). A flags option is added to select between a regular or atomic mapping so these functions can replace kmap(sg_page(sg)) or kmap_atomic(sg_page(sg)). Future work may expand this to have flags for using page_address or vmap. Much further in the future, there may be a flag to allocate memory and copy the data from/to iomem. We also add the semantic that sg_map can fail to create a mapping, despite the fact that the current code this is replacing is assumed to never fail and the current version of these functions cannot fail. This is to support iomem, which would either have to fail to create the mapping or allocate memory as a bounce buffer, which itself can fail. Also, in terms of cleanup, a few of the existing kmap(sg_page) users play things a bit loose in terms of whether they apply sg->offset, so using these helper functions should help avoid such issues. 
Signed-off-by: Logan Gunthorpe
---
 drivers/dma-buf/dma-buf.c   |  3 ++
 include/linux/scatterlist.h | 97 +
 2 files changed, 100 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0007b79..b95934b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,9 @@
 #include
 
+/* Prevent the highmem.h macro from aliasing ops->kunmap_atomic */
+#undef kunmap_atomic
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..acd4d73 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 #include
 
 struct scatterlist {
@@ -126,6 +127,102 @@ static inline struct page *sg_page(struct scatterlist *sg)
 	return (struct page *)((sg)->page_link & ~0x3);
 }
 
+#define SG_KMAP		(1 << 0)	/* create a mapping with kmap */
+#define SG_KMAP_ATOMIC	(1 << 1)	/* create a mapping with kmap_atomic */
+
+/**
+ * sg_map_offset - kmap a page inside an sgl
+ * @sg:		SG entry
+ * @offset:	Offset into entry
+ * @flags:	Flags for creating the mapping
+ *
+ * Description:
+ *   Use this function to map a page in the scatterlist at the specified
+ *   offset. sg->offset is already added for you. Note: the semantics of
+ *   this function are that it may fail. Thus, its output should be checked
+ *   with IS_ERR and PTR_ERR. Otherwise, a pointer to the specified offset
+ *   in the mapped page is returned.
+ *
+ *   Flags can be any of:
+ *	* SG_KMAP		- Use kmap to create the mapping
+ *	* SG_KMAP_ATOMIC	- Use kmap_atomic to map the page atomically.
+ *				  Thus, the rules of that function apply: the cpu
+ *				  may not sleep until it is unmapped.
+ *
+ *   Also, consider carefully whether this function is appropriate. It is
+ *   largely not recommended for new code and if the sgl came from another
+ *   subsystem and you don't know what kind of memory might be in the list
+ *   then you definitely should not call it. Non-mappable memory may be in
+ *   the sgl and thus this function may fail unexpectedly.
+ **/
+static inline void *sg_map_offset(struct scatterlist *sg, size_t offset,
+				   int flags)
+{
+	struct page *pg;
+	unsigned int pg_off;
+
+	offset += sg->offset;
+	pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
+	pg_off = offset_in_page(offset);
+
+	if (flags & SG_KMAP_ATOMIC)
+		return kmap_atomic(pg) + pg_off;
+	else
+		return kmap(pg) + pg_off;
+}
+
+/**
+ * sg_unmap_offset - unmap a page that was mapped with sg_map_offset
+ * @sg:		SG entry
+ * @addr:	address returned by sg_map_offset
+ * @offset:	Offset into entry (same as specified for sg_map_offset)
+ * @flags:	Flags, which are the same specified for sg_map_offset
+ *
+ * Description:
+ *   Unmap the page that was mapped with sg_map_offset
+ *
+ **/
+static inline void sg_unmap_offset(struct scatterlist *sg, void *addr,
+				     size_t offset, int flags)
+{
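The offset handling and the "may fail" semantics described in the kernel-doc for sg_map_offset can be illustrated with a small userspace sketch. This is not kernel code: `struct mock_sg`, `mock_sg_map_offset`, `MOCK_PAGE_SHIFT` and the `mock_err_ptr`/`mock_is_err`/`mock_ptr_err` helpers are hypothetical stand-ins (the last three imitate ERR_PTR, IS_ERR and PTR_ERR), and a flat buffer plays the role of the physically contiguous pages behind one SG entry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  ((size_t)1 << MOCK_PAGE_SHIFT)
#define MOCK_MAX_ERRNO  4095

/* Userspace imitations of the kernel's error-pointer helpers. */
static void *mock_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int mock_is_err(const void *p)
{
	/* Error codes live in the last MOCK_MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)(-MOCK_MAX_ERRNO);
}

static long mock_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical stand-in for one scatterlist entry. */
struct mock_sg {
	unsigned char *pages;	/* contiguous "pages" backing the entry */
	unsigned int offset;	/* offset of the data within the first page */
	unsigned int length;	/* length of the data */
	int mappable;		/* 0 simulates memory that cannot be mapped */
};

/*
 * Mirrors the arithmetic in sg_map_offset: the entry's own offset is
 * added first, then the total is split into a page index and an offset
 * within that page.
 */
static void *mock_sg_map_offset(struct mock_sg *sg, size_t offset)
{
	size_t page_idx, pg_off;

	assert(sg != NULL);

	if (!sg->mappable)
		return mock_err_ptr(-EINVAL);

	offset += sg->offset;
	page_idx = offset >> MOCK_PAGE_SHIFT;
	pg_off = offset & (MOCK_PAGE_SIZE - 1);

	return sg->pages + page_idx * MOCK_PAGE_SIZE + pg_off;
}
```

A caller would check the result with `mock_is_err` before touching it, in the same way the converted drivers check `IS_ERR` after `sg_map`.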
[PATCH 00/22] Introduce common scatterlist map function
Hi Everyone,

As part of my effort to enable P2P DMA transactions with PCI cards,
we've identified the need to be able to safely put IO memory into
scatterlists (and eventually other spots). This probably involves a
conversion from struct page to pfn_t but that migration is a ways off
and those decisions are yet to be made.

As an initial step in that direction, I've started cleaning up some of
the scatterlist code by trying to carve out a better defined layer
between it and its users. The longer term goal would be to remove
sg_page or replace it with something that can potentially fail.

This patchset is the first step in that effort. I've introduced a
common function to map scatterlist memory and converted all the common
kmap(sg_page()) cases. This removes about 66 sg_page calls (of ~331).

Seeing as this is a fairly large cleanup set that touches a wide swath
of the kernel, I have limited the people I've sent this to. I'd suggest
we look toward merging the first patch and then I can send the
individual subsystem patches on to their respective maintainers and get
them merged independently. (This is to avoid the conflicts I created
with my last cleanup set... Sorry) Though, I'm certainly open to other
suggestions to get it merged.
The patchset is based on v4.11-rc6 and can be found in the sg_map branch from this git tree: https://github.com/sbates130272/linux-p2pmem.git Thanks, Logan Logan Gunthorpe (22): scatterlist: Introduce sg_map helper functions nvmet: Make use of the new sg_map helper function libiscsi: Make use of new the sg_map helper function target: Make use of the new sg_map function at 16 call sites drm/i915: Make use of the new sg_map helper function crypto: hifn_795x: Make use of the new sg_map helper function crypto: shash, caam: Make use of the new sg_map helper function crypto: chcr: Make use of the new sg_map helper function dm-crypt: Make use of the new sg_map helper in 4 call sites staging: unisys: visorbus: Make use of the new sg_map helper function RDS: Make use of the new sg_map helper function scsi: ipr, pmcraid, isci: Make use of the new sg_map helper in 4 call sites scsi: hisi_sas, mvsas, gdth: Make use of the new sg_map helper function scsi: arcmsr, ips, megaraid: Make use of the new sg_map helper function scsi: libfc, csiostor: Change to sg_copy_buffer in two drivers xen-blkfront: Make use of the new sg_map helper function mmc: sdhci: Make use of the new sg_map helper function mmc: spi: Make use of the new sg_map helper function mmc: tmio: Make use of the new sg_map helper function mmc: sdricoh_cs: Make use of the new sg_map helper function mmc: tifm_sd: Make use of the new sg_map helper function memstick: Make use of the new sg_map helper function crypto/shash.c | 9 +- drivers/block/xen-blkfront.c| 33 +-- drivers/crypto/caam/caamalg.c | 8 +- drivers/crypto/chelsio/chcr_algo.c | 28 +++--- drivers/crypto/hifn_795x.c | 32 --- drivers/dma-buf/dma-buf.c | 3 + drivers/gpu/drm/i915/i915_gem.c | 27 +++--- drivers/md/dm-crypt.c | 38 +--- drivers/memstick/host/jmb38x_ms.c | 23 - drivers/memstick/host/tifm_ms.c | 22 - drivers/mmc/host/mmc_spi.c | 26 +++-- drivers/mmc/host/sdhci.c| 35 ++- drivers/mmc/host/sdricoh_cs.c | 14 ++- drivers/mmc/host/tifm_sd.c | 88 + 
drivers/mmc/host/tmio_mmc.h | 12 ++- drivers/mmc/host/tmio_mmc_dma.c | 5 + drivers/mmc/host/tmio_mmc_pio.c | 24 + drivers/nvme/target/fabrics-cmd.c | 16 +++- drivers/scsi/arcmsr/arcmsr_hba.c| 16 +++- drivers/scsi/csiostor/csio_scsi.c | 54 +-- drivers/scsi/cxgbi/libcxgbi.c | 5 + drivers/scsi/gdth.c | 9 +- drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 14 ++- drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 13 ++- drivers/scsi/ipr.c | 27 +++--- drivers/scsi/ips.c | 8 +- drivers/scsi/isci/request.c | 42 drivers/scsi/libfc/fc_libfc.c | 49 ++ drivers/scsi/libiscsi_tcp.c | 32 --- drivers/scsi/megaraid.c | 9 +- drivers/scsi/mvsas/mv_sas.c | 10 +- drivers/scsi/pmcraid.c | 19 ++-- drivers/staging/unisys/visorhba/visorhba_main.c | 12 ++- drivers/target/iscsi/iscsi_target.c | 27 -- drivers/target/target_core_rd.c | 3 +- drivers/target/target_core_sbc.c| 122 +---
[PATCH 08/22] crypto: chcr: Make use of the new sg_map helper function
The get_page in this area looks *highly* suspect due to there being no corresponding put_page. However, I've left that as is to avoid breaking things. I've also removed the KMAP_ATOMIC_ARGS check as it appears to be dead code that dates back to when it was first committed... Signed-off-by: Logan Gunthorpe--- drivers/crypto/chelsio/chcr_algo.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c index 41bc7f4..a993d1d 100644 --- a/drivers/crypto/chelsio/chcr_algo.c +++ b/drivers/crypto/chelsio/chcr_algo.c @@ -1489,22 +1489,21 @@ static struct sk_buff *create_authenc_wr(struct aead_request *req, return ERR_PTR(-EINVAL); } -static void aes_gcm_empty_pld_pad(struct scatterlist *sg, - unsigned short offset) +static int aes_gcm_empty_pld_pad(struct scatterlist *sg, +unsigned short offset) { - struct page *spage; unsigned char *addr; - spage = sg_page(sg); - get_page(spage); /* so that it is not freed by NIC */ -#ifdef KMAP_ATOMIC_ARGS - addr = kmap_atomic(spage, KM_SOFTIRQ0); -#else - addr = kmap_atomic(spage); -#endif - memset(addr + sg->offset, 0, offset + 1); + get_page(sg_page(sg)); /* so that it is not freed by NIC */ + + addr = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(addr)) + return PTR_ERR(addr); + + memset(addr, 0, offset + 1); + sg_unmap(sg, addr, SG_KMAP_ATOMIC); - kunmap_atomic(addr); + return 0; } static int set_msg_len(u8 *block, unsigned int msglen, int csize) @@ -1940,7 +1939,10 @@ static struct sk_buff *create_gcm_wr(struct aead_request *req, if (req->cryptlen) { write_sg_to_skb(skb, , src, req->cryptlen); } else { - aes_gcm_empty_pld_pad(req->dst, authsize - 1); + err = aes_gcm_empty_pld_pad(req->dst, authsize - 1); + if (err) + goto dstmap_fail; + write_sg_to_skb(skb, , reqctx->dst, crypt_len); } -- 2.1.4
[PATCH 17/22] mmc: sdhci: Make use of the new sg_map helper function
Straightforward conversion, except due to the lack of error path we have to WARN if the memory in the SGL is not mappable. Signed-off-by: Logan Gunthorpe--- drivers/mmc/host/sdhci.c | 35 ++- 1 file changed, 30 insertions(+), 5 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 63bc33a..af0c107 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -497,15 +497,34 @@ static int sdhci_pre_dma_transfer(struct sdhci_host *host, return sg_count; } +/* + * Note this function may return PTR_ERR and must be checked. + */ static char *sdhci_kmap_atomic(struct scatterlist *sg, unsigned long *flags) { + void *ret; + local_irq_save(*flags); - return kmap_atomic(sg_page(sg)) + sg->offset; + + ret = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(ret)) { + /* +* This should really never happen unless the code is changed +* to use memory that is not mappable in the sg. Seeing there +* doesn't seem to be any error path out of here, we can only +* WARN. +*/ + WARN(1, "Non-mappable memory used in sg!"); + local_irq_restore(*flags); + } + + return ret; } -static void sdhci_kunmap_atomic(void *buffer, unsigned long *flags) +static void sdhci_kunmap_atomic(struct scatterlist *sg, void *buffer, + unsigned long *flags) { - kunmap_atomic(buffer); + sg_unmap(sg, buffer, SG_KMAP_ATOMIC); local_irq_restore(*flags); } @@ -568,8 +587,11 @@ static void sdhci_adma_table_pre(struct sdhci_host *host, if (offset) { if (data->flags & MMC_DATA_WRITE) { buffer = sdhci_kmap_atomic(sg, ); + if (IS_ERR(buffer)) + return; + memcpy(align, buffer, offset); - sdhci_kunmap_atomic(buffer, ); + sdhci_kunmap_atomic(sg, buffer, ); } /* tran, valid */ @@ -646,8 +668,11 @@ static void sdhci_adma_table_post(struct sdhci_host *host, (sg_dma_address(sg) & SDHCI_ADMA2_MASK); buffer = sdhci_kmap_atomic(sg, ); + if (IS_ERR(buffer)) + return; + memcpy(buffer, align, size); - sdhci_kunmap_atomic(buffer, ); + sdhci_kunmap_atomic(sg, buffer, ); align += SDHCI_ADMA2_ALIGN; } -- 2.1.4
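The detail worth noting in the sdhci conversion is that the wrapper restores interrupts itself when the mapping fails, because the callers return early and never reach the unmap helper. A minimal userspace sketch of that balance follows; every name in it (`mock_irqs_disabled`, `mock_map_irqsafe`, `mock_unmap_irqsafe` and friends) is hypothetical, and a plain integer stands in for the CPU interrupt state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the CPU interrupt state: 1 means "interrupts disabled". */
static int mock_irqs_disabled;

static void mock_irq_save(unsigned long *flags)
{
	*flags = (unsigned long)mock_irqs_disabled;
	mock_irqs_disabled = 1;
}

static void mock_irq_restore(unsigned long flags)
{
	mock_irqs_disabled = (int)flags;
}

/* Imitation of IS_ERR for this sketch. */
static int mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-4095);
}

/* Stand-in for sg_map(): a NULL buffer simulates unmappable memory. */
static void *mock_map(unsigned char *buf)
{
	return buf ? (void *)buf : (void *)(intptr_t)(-EINVAL);
}

/*
 * Same shape as the patched sdhci_kmap_atomic(): disable interrupts,
 * try to map, and undo the interrupt change on failure because the
 * caller will not call the unmap helper in that case.
 */
static void *mock_map_irqsafe(unsigned char *buf, unsigned long *flags)
{
	void *ret;

	mock_irq_save(flags);

	ret = mock_map(buf);
	if (mock_is_err(ret))
		mock_irq_restore(*flags);

	return ret;
}

/* Only valid for addresses returned by a successful map. */
static void mock_unmap_irqsafe(void *addr, unsigned long *flags)
{
	assert(!mock_is_err(addr));
	mock_irq_restore(*flags);
}
```

With this shape, the interrupt state is the same before and after the pair of calls whether or not the mapping succeeded.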
[PATCH 21/22] mmc: tifm_sd: Make use of the new sg_map helper function
This conversion is a bit complicated. We modify the read_fifo,
write_fifo and copy_page functions to take a scatterlist instead of a
page. Thus we can use sg_map instead of kmap_atomic. There's a bit of
accounting that needed to be done for the offset for this to work.
(Seeing as sg_map takes care of the offset but it's already added and
used earlier in the code.) There's also no error path, so if unmappable
memory finds its way into the sgl we can only WARN.

Signed-off-by: Logan Gunthorpe
---
 drivers/mmc/host/tifm_sd.c | 88 +++---
 1 file changed, 67 insertions(+), 21 deletions(-)

diff --git a/drivers/mmc/host/tifm_sd.c b/drivers/mmc/host/tifm_sd.c
index 93c4b40..75b0d74 100644
--- a/drivers/mmc/host/tifm_sd.c
+++ b/drivers/mmc/host/tifm_sd.c
@@ -111,14 +111,26 @@ struct tifm_sd {
 };
 
 /* for some reason, host won't respond correctly to readw/writew */
-static void tifm_sd_read_fifo(struct tifm_sd *host, struct page *pg,
+static void tifm_sd_read_fifo(struct tifm_sd *host, struct scatterlist *sg,
 			      unsigned int off, unsigned int cnt)
 {
 	struct tifm_dev *sock = host->dev;
 	unsigned char *buf;
 	unsigned int pos = 0, val;
 
-	buf = kmap_atomic(pg) + off;
+	buf = sg_map_offset(sg, off - sg->offset, SG_KMAP_ATOMIC);
+	if (IS_ERR(buf)) {
+		/*
+		 * This should really never happen unless
+		 * the code is changed to use memory that is
+		 * not mappable in the sg. Seeing there doesn't
+		 * seem to be any error path out of here,
+		 * we can only WARN.
+		 */
+		WARN(1, "Non-mappable memory used in sg!");
+		return;
+	}
+
 	if (host->cmd_flags & DATA_CARRY) {
 		buf[pos++] = host->bounce_buf_data[0];
 		host->cmd_flags &= ~DATA_CARRY;
@@ -134,17 +146,29 @@ static void tifm_sd_read_fifo(struct tifm_sd *host, struct page *pg,
 		}
 		buf[pos++] = (val >> 8) & 0xff;
 	}
-	kunmap_atomic(buf - off);
+	sg_unmap_offset(sg, buf, off - sg->offset, SG_KMAP_ATOMIC);
 }
 
-static void tifm_sd_write_fifo(struct tifm_sd *host, struct page *pg,
+static void tifm_sd_write_fifo(struct tifm_sd *host, struct scatterlist *sg,
 			       unsigned int off, unsigned int cnt)
 {
 	struct tifm_dev *sock = host->dev;
 	unsigned char *buf;
 	unsigned int pos = 0, val;
 
-	buf = kmap_atomic(pg) + off;
+	buf = sg_map_offset(sg, off - sg->offset, SG_KMAP_ATOMIC);
+	if (IS_ERR(buf)) {
+		/*
+		 * This should really never happen unless
+		 * the code is changed to use memory that is
+		 * not mappable in the sg. Seeing there doesn't
+		 * seem to be any error path out of here,
+		 * we can only WARN.
+		 */
+		WARN(1, "Non-mappable memory used in sg!");
+		return;
+	}
+
 	if (host->cmd_flags & DATA_CARRY) {
 		val = host->bounce_buf_data[0] | ((buf[pos++] << 8) & 0xff00);
 		writel(val, sock->addr + SOCK_MMCSD_DATA);
@@ -161,7 +185,7 @@ static void tifm_sd_write_fifo(struct tifm_sd *host, struct page *pg,
 		val |= (buf[pos++] << 8) & 0xff00;
 		writel(val, sock->addr + SOCK_MMCSD_DATA);
 	}
-	kunmap_atomic(buf - off);
+	sg_unmap_offset(sg, buf, off - sg->offset, SG_KMAP_ATOMIC);
 }
 
 static void tifm_sd_transfer_data(struct tifm_sd *host)
@@ -170,7 +194,6 @@ static void tifm_sd_transfer_data(struct tifm_sd *host)
 	struct scatterlist *sg = r_data->sg;
 	unsigned int off, cnt, t_size = TIFM_MMCSD_FIFO_SIZE * 2;
 	unsigned int p_off, p_cnt;
-	struct page *pg;
 
 	if (host->sg_pos == host->sg_len)
 		return;
@@ -192,33 +215,57 @@ static void tifm_sd_transfer_data(struct tifm_sd *host)
 		}
 		off = sg[host->sg_pos].offset + host->block_pos;
 
-		pg = nth_page(sg_page([host->sg_pos]), off >> PAGE_SHIFT);
 		p_off = offset_in_page(off);
 		p_cnt = PAGE_SIZE - p_off;
 		p_cnt = min(p_cnt, cnt);
 		p_cnt = min(p_cnt, t_size);
 
 		if (r_data->flags & MMC_DATA_READ)
-			tifm_sd_read_fifo(host, pg, p_off, p_cnt);
+			tifm_sd_read_fifo(host, [host->sg_pos], p_off,
+					  p_cnt);
 		else if (r_data->flags & MMC_DATA_WRITE)
-			tifm_sd_write_fifo(host, pg, p_off, p_cnt);
+			tifm_sd_write_fifo(host, [host->sg_pos], p_off,
+					   p_cnt);
 
 		t_size -= p_cnt;
 		host->block_pos += p_cnt;
 	}
 }
 
-static void tifm_sd_copy_page(struct page
[PATCH 09/22] dm-crypt: Make use of the new sg_map helper in 4 call sites
Very straightforward conversion to the new function in all four spots. Signed-off-by: Logan Gunthorpe--- drivers/md/dm-crypt.c | 38 +- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 389a363..6bd0ffc 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -589,9 +589,12 @@ static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv, int r = 0; if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) { - src = kmap_atomic(sg_page(>sg_in)); - r = crypt_iv_lmk_one(cc, iv, dmreq, src + dmreq->sg_in.offset); - kunmap_atomic(src); + src = sg_map(>sg_in, SG_KMAP_ATOMIC); + if (IS_ERR(src)) + return PTR_ERR(src); + + r = crypt_iv_lmk_one(cc, iv, dmreq, src); + sg_unmap(>sg_in, src, SG_KMAP_ATOMIC); } else memset(iv, 0, cc->iv_size); @@ -607,14 +610,17 @@ static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv, if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) return 0; - dst = kmap_atomic(sg_page(>sg_out)); - r = crypt_iv_lmk_one(cc, iv, dmreq, dst + dmreq->sg_out.offset); + dst = sg_map(>sg_out, SG_KMAP_ATOMIC); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + r = crypt_iv_lmk_one(cc, iv, dmreq, dst); /* Tweak the first block of plaintext sector */ if (!r) - crypto_xor(dst + dmreq->sg_out.offset, iv, cc->iv_size); + crypto_xor(dst, iv, cc->iv_size); - kunmap_atomic(dst); + sg_unmap(>sg_out, dst, SG_KMAP_ATOMIC); return r; } @@ -731,9 +737,12 @@ static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv, /* Remove whitening from ciphertext */ if (bio_data_dir(dmreq->ctx->bio_in) != WRITE) { - src = kmap_atomic(sg_page(>sg_in)); - r = crypt_iv_tcw_whitening(cc, dmreq, src + dmreq->sg_in.offset); - kunmap_atomic(src); + src = sg_map(>sg_in, SG_KMAP_ATOMIC); + if (IS_ERR(src)) + return PTR_ERR(src); + + r = crypt_iv_tcw_whitening(cc, dmreq, src); + sg_unmap(>sg_in, src, SG_KMAP_ATOMIC); } /* Calculate IV */ @@ -755,9 +764,12 @@ static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv, return 0; /* Apply 
whitening on ciphertext */ - dst = kmap_atomic(sg_page(>sg_out)); - r = crypt_iv_tcw_whitening(cc, dmreq, dst + dmreq->sg_out.offset); - kunmap_atomic(dst); + dst = sg_map(>sg_out, SG_KMAP_ATOMIC); + if (IS_ERR(dst)) + return PTR_ERR(dst); + + r = crypt_iv_tcw_whitening(cc, dmreq, dst); + sg_unmap(>sg_out, dst, SG_KMAP_ATOMIC); return r; } -- 2.1.4
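The four dm-crypt call sites all follow the same shape: map, return the error if the mapping failed, use the pointer without adding the entry's offset again, then unmap. A userspace sketch of that shape is below. It is only an illustration under stated assumptions: `struct mock_sg`, `mock_sg_map`, `mock_sg_unmap`, `mock_iv_gen` and the `mock_active_maps` counter are hypothetical, and the "IV generation" is reduced to a memcpy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a single-page scatterlist entry. */
struct mock_sg {
	unsigned char *page;	/* start of the backing "page" */
	unsigned int offset;	/* offset of the data within it */
	int mappable;		/* 0 simulates unmappable memory */
};

/* Number of mappings currently outstanding; should return to zero. */
static int mock_active_maps;

/* Imitation of IS_ERR for this sketch. */
static int mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-4095);
}

/* Like sg_map(): the returned pointer already includes sg->offset. */
static void *mock_sg_map(struct mock_sg *sg)
{
	if (!sg->mappable)
		return (void *)(intptr_t)(-EINVAL);

	mock_active_maps++;
	return sg->page + sg->offset;
}

/* Like sg_unmap(): takes the same pointer that the map call returned. */
static void mock_sg_unmap(struct mock_sg *sg, void *addr)
{
	assert(addr == (void *)(sg->page + sg->offset));
	mock_active_maps--;
}

/* Caller pattern: map, check, use, unmap; no unmap on the error path. */
static int mock_iv_gen(struct mock_sg *sg_in, unsigned char *iv,
		       size_t iv_size)
{
	unsigned char *src = mock_sg_map(sg_in);

	if (mock_is_err(src))
		return (int)(intptr_t)src;

	memcpy(iv, src, iv_size);	/* src needs no "+ offset" here */
	mock_sg_unmap(sg_in, src);

	return 0;
}
```

Because the offset is applied inside the map helper, the caller no longer needs the `src + dmreq->sg_in.offset` arithmetic that the old code carried at each site.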
[PATCH 20/22] mmc: sdricoh_cs: Make use of the new sg_map helper function
This is a straightforward conversion to the new function. Signed-off-by: Logan Gunthorpe--- drivers/mmc/host/sdricoh_cs.c | 14 +- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/mmc/host/sdricoh_cs.c b/drivers/mmc/host/sdricoh_cs.c index 5ff26ab..7eeed23 100644 --- a/drivers/mmc/host/sdricoh_cs.c +++ b/drivers/mmc/host/sdricoh_cs.c @@ -319,16 +319,20 @@ static void sdricoh_request(struct mmc_host *mmc, struct mmc_request *mrq) for (i = 0; i < data->blocks; i++) { size_t len = data->blksz; u8 *buf; - struct page *page; int result; - page = sg_page(data->sg); - buf = kmap(page) + data->sg->offset + (len * i); + buf = sg_map_offset(data->sg, (len * i), SG_KMAP); + if (IS_ERR(buf)) { + cmd->error = PTR_ERR(buf); + break; + } + result = sdricoh_blockio(host, data->flags & MMC_DATA_READ, buf, len); - kunmap(page); - flush_dcache_page(page); + sg_unmap_offset(data->sg, buf, (len * i), SG_KMAP); + + flush_dcache_page(sg_page(data->sg)); if (result) { dev_err(dev, "sdricoh_request: cmd %i " "block transfer failed\n", cmd->opcode); -- 2.1.4
[PATCH 14/22] scsi: arcmsr, ips, megaraid: Make use of the new sg_map helper function
Very straightforward conversion of three scsi drivers Signed-off-by: Logan Gunthorpe--- drivers/scsi/arcmsr/arcmsr_hba.c | 16 drivers/scsi/ips.c | 8 drivers/scsi/megaraid.c | 9 +++-- 3 files changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c index af032c4..3cd485c 100644 --- a/drivers/scsi/arcmsr/arcmsr_hba.c +++ b/drivers/scsi/arcmsr/arcmsr_hba.c @@ -2306,7 +2306,10 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb, use_sg = scsi_sg_count(cmd); sg = scsi_sglist(cmd); - buffer = kmap_atomic(sg_page(sg)) + sg->offset; + buffer = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(buffer)) + return ARCMSR_MESSAGE_FAIL; + if (use_sg > 1) { retvalue = ARCMSR_MESSAGE_FAIL; goto message_out; @@ -2539,7 +2542,7 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb, message_out: if (use_sg) { struct scatterlist *sg = scsi_sglist(cmd); - kunmap_atomic(buffer - sg->offset); + sg_unmap(sg, buffer, SG_KMAP_ATOMIC); } return retvalue; } @@ -2590,11 +2593,16 @@ static void arcmsr_handle_virtual_command(struct AdapterControlBlock *acb, strncpy([32], "R001", 4); /* Product Revision */ sg = scsi_sglist(cmd); - buffer = kmap_atomic(sg_page(sg)) + sg->offset; + buffer = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(buffer)) { + cmd->result = (DID_ERROR << 16); + cmd->scsi_done(cmd); + return; + } memcpy(buffer, inqdata, sizeof(inqdata)); sg = scsi_sglist(cmd); - kunmap_atomic(buffer - sg->offset); + sg_unmap(sg, buffer, SG_KMAP_ATOMIC); cmd->scsi_done(cmd); } diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c index 3419e1b..a44291d 100644 --- a/drivers/scsi/ips.c +++ b/drivers/scsi/ips.c @@ -1506,14 +1506,14 @@ static int ips_is_passthru(struct scsi_cmnd *SC) /* kmap_atomic() ensures addressability of the user buffer.*/ /* local_irq_save() protects the KM_IRQ0 address slot. 
*/ local_irq_save(flags); -buffer = kmap_atomic(sg_page(sg)) + sg->offset; -if (buffer && buffer[0] == 'C' && buffer[1] == 'O' && +buffer = sg_map(sg, SG_KMAP_ATOMIC); +if (!IS_ERR(buffer) && buffer[0] == 'C' && buffer[1] == 'O' && buffer[2] == 'P' && buffer[3] == 'P') { -kunmap_atomic(buffer - sg->offset); +sg_unmap(sg, buffer, SG_KMAP_ATOMIC); local_irq_restore(flags); return 1; } -kunmap_atomic(buffer - sg->offset); +sg_unmap(sg, buffer, SG_KMAP_ATOMIC); local_irq_restore(flags); } return 0; diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 3c63c29..0b66e50 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -663,10 +663,15 @@ mega_build_cmd(adapter_t *adapter, Scsi_Cmnd *cmd, int *busy) struct scatterlist *sg; sg = scsi_sglist(cmd); - buf = kmap_atomic(sg_page(sg)) + sg->offset; + buf = sg_map(sg, SG_KMAP_ATOMIC); + if (IS_ERR(buf)) { +cmd->result = (DID_ERROR << 16); + cmd->scsi_done(cmd); + return NULL; + } memset(buf, 0, cmd->cmnd[4]); - kunmap_atomic(buf - sg->offset); + sg_unmap(sg, buf, SG_KMAP_ATOMIC); cmd->result = (DID_OK << 16); cmd->scsi_done(cmd); -- 2.1.4
[PATCH 02/22] nvmet: Make use of the new sg_map helper function
This is a straightforward conversion in two places. Should sg_map fail,
the code will return an NVME_SC_SGL_INVALID_DATA error in the
completion.

Signed-off-by: Logan Gunthorpe
---
 drivers/nvme/target/fabrics-cmd.c | 16
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 8bd022af..f62a634 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -122,7 +122,11 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 	struct nvmet_ctrl *ctrl = NULL;
 	u16 status = 0;
 
-	d = kmap(sg_page(req->sg)) + req->sg->offset;
+	d = sg_map(req->sg, SG_KMAP);
+	if (IS_ERR(d)) {
+		status = NVME_SC_SGL_INVALID_DATA;
+		goto out;
+	}
 
 	/* zero out initial completion result, assign values as needed */
 	req->rsp->result.u32 = 0;
@@ -158,7 +162,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 	req->rsp->result.u16 = cpu_to_le16(ctrl->cntlid);
 
 out:
-	kunmap(sg_page(req->sg));
+	sg_unmap(req->sg, d, SG_KMAP);
 	nvmet_req_complete(req, status);
 }
 
@@ -170,7 +174,11 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
 	u16 qid = le16_to_cpu(c->qid);
 	u16 status = 0;
 
-	d = kmap(sg_page(req->sg)) + req->sg->offset;
+	d = sg_map(req->sg, SG_KMAP);
+	if (IS_ERR(d)) {
+		status = NVME_SC_SGL_INVALID_DATA;
+		goto out;
+	}
 
 	/* zero out initial completion result, assign values as needed */
 	req->rsp->result.u32 = 0;
@@ -205,7 +213,7 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
 	pr_info("adding queue %d to ctrl %d.\n", qid, ctrl->cntlid);
 
 out:
-	kunmap(sg_page(req->sg));
+	sg_unmap(req->sg, d, SG_KMAP);
 	nvmet_req_complete(req, status);
 	return;
 
-- 
2.1.4
[PATCH 13/22] scsi: hisi_sas, mvsas, gdth: Make use of the new sg_map helper function
Very straightforward conversion of three scsi drivers. Signed-off-by: Logan Gunthorpe--- drivers/scsi/gdth.c| 9 +++-- drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 14 +- drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 13 + drivers/scsi/mvsas/mv_sas.c| 10 +- 4 files changed, 30 insertions(+), 16 deletions(-) diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c index d020a13..82c9fba 100644 --- a/drivers/scsi/gdth.c +++ b/drivers/scsi/gdth.c @@ -2301,10 +2301,15 @@ static void gdth_copy_internal_data(gdth_ha_str *ha, Scsi_Cmnd *scp, return; } local_irq_save(flags); -address = kmap_atomic(sg_page(sl)) + sl->offset; +address = sg_map(sl, SG_KMAP_ATOMIC); +if (IS_ERR(address)) { +scp->result = DID_ERROR << 16; +return; + } + memcpy(address, buffer, cpnow); flush_dcache_page(sg_page(sl)); -kunmap_atomic(address); +sg_unmap(sl, address, SG_KMAP_ATOMIC); local_irq_restore(flags); if (cpsum == cpcount) break; diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c index 854fbea..30408f8 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c @@ -1377,18 +1377,22 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba, void *to; struct scatterlist *sg_resp = >smp_task.smp_resp; - ts->stat = SAM_STAT_GOOD; - to = kmap_atomic(sg_page(sg_resp)); + to = sg_map(sg_resp, SG_KMAP_ATOMIC); + if (IS_ERR(to)) { + dev_err(dev, "slot complete: error mapping memory"); + ts->stat = SAS_SG_ERR; + break; + } + ts->stat = SAM_STAT_GOOD; dma_unmap_sg(dev, >smp_task.smp_resp, 1, DMA_FROM_DEVICE); dma_unmap_sg(dev, >smp_task.smp_req, 1, DMA_TO_DEVICE); - memcpy(to + sg_resp->offset, - slot->status_buffer + + memcpy(to, slot->status_buffer + sizeof(struct hisi_sas_err_record), sg_dma_len(sg_resp)); - kunmap_atomic(to); + sg_unmap(sg_resp, to, SG_KMAP_ATOMIC); break; } case SAS_PROTOCOL_SATA: diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c index 1b21445..0907947 100644 --- 
a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c @@ -1796,18 +1796,23 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot, struct scatterlist *sg_resp = >smp_task.smp_resp; void *to; + to = sg_map(sg_resp, SG_KMAP_ATOMIC); + if (IS_ERR(to)) { + dev_err(dev, "slot complete: error mapping memory"); + ts->stat = SAS_SG_ERR; + break; + } + ts->stat = SAM_STAT_GOOD; - to = kmap_atomic(sg_page(sg_resp)); dma_unmap_sg(dev, >smp_task.smp_resp, 1, DMA_FROM_DEVICE); dma_unmap_sg(dev, >smp_task.smp_req, 1, DMA_TO_DEVICE); - memcpy(to + sg_resp->offset, - slot->status_buffer + + memcpy(to, slot->status_buffer + sizeof(struct hisi_sas_err_record), sg_dma_len(sg_resp)); - kunmap_atomic(to); + sg_unmap(sg_resp, to, SG_KMAP_ATOMIC); break; } case SAS_PROTOCOL_SATA: diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c index c7cc803..374d0e0 100644 --- a/drivers/scsi/mvsas/mv_sas.c +++ b/drivers/scsi/mvsas/mv_sas.c @@ -1798,11 +1798,11 @@ int mvs_slot_complete(struct mvs_info *mvi, u32 rx_desc, u32 flags) case SAS_PROTOCOL_SMP: { struct scatterlist *sg_resp = >smp_task.smp_resp; tstat->stat = SAM_STAT_GOOD; - to = kmap_atomic(sg_page(sg_resp)); - memcpy(to + sg_resp->offset, - slot->response + sizeof(struct mvs_err_info), - sg_dma_len(sg_resp)); - kunmap_atomic(to); + to = sg_map(sg_resp, SG_KMAP_ATOMIC); + memcpy(to, + slot->response + sizeof(struct mvs_err_info), + sg_dma_len(sg_resp)); + sg_unmap(sg_resp, to, SG_KMAP_ATOMIC);
[PATCH 11/22] RDS: Make use of the new sg_map helper function
Straightforward conversion except there's no error path, so we WARN if the sg_map fails. Signed-off-by: Logan Gunthorpe--- net/rds/ib_recv.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c index e10624a..7f8fa99 100644 --- a/net/rds/ib_recv.c +++ b/net/rds/ib_recv.c @@ -801,9 +801,20 @@ static void rds_ib_cong_recv(struct rds_connection *conn, to_copy = min(RDS_FRAG_SIZE - frag_off, PAGE_SIZE - map_off); BUG_ON(to_copy & 7); /* Must be 64bit aligned. */ - addr = kmap_atomic(sg_page(>f_sg)); + addr = sg_map(>f_sg, SG_KMAP_ATOMIC); + if (IS_ERR(addr)) { + /* +* This should really never happen unless +* the code is changed to use memory that is +* not mappable in the sg. Seeing there doesn't +* seem to be any error path out of here, +* we can only WARN. +*/ + WARN(1, "Non-mappable memory used in sg!"); + return; + } - src = addr + frag->f_sg.offset + frag_off; + src = addr + frag_off; dst = (void *)map->m_page_addrs[map_page] + map_off; for (k = 0; k < to_copy; k += 8) { /* Record ports that became uncongested, ie @@ -811,7 +822,7 @@ static void rds_ib_cong_recv(struct rds_connection *conn, uncongested |= ~(*src) & *dst; *dst++ = *src++; } - kunmap_atomic(addr); + sg_unmap(>f_sg, addr, SG_KMAP_ATOMIC); copied += to_copy; -- 2.1.4
[PATCH 06/22] crypto: hifn_795x: Make use of the new sg_map helper function
Conversion of a couple of kmap_atomic instances to the sg_map helper
function.

However, it looks like there was a bug in the original code: the source
scatter list's offset (t->offset) was passed to ablkcipher_get, which
added it to the destination address. This doesn't make a lot of sense,
but t->offset is likely always zero anyway. So, this patch cleans that
brokenness up.

Also, a change to the error path: if ablkcipher_get failed, everything
seemed to proceed as if it hadn't. Setting 'error' should hopefully
clear that up.

Signed-off-by: Logan Gunthorpe
---
 drivers/crypto/hifn_795x.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/hifn_795x.c b/drivers/crypto/hifn_795x.c
index e09d405..8e2c6a9 100644
--- a/drivers/crypto/hifn_795x.c
+++ b/drivers/crypto/hifn_795x.c
@@ -1619,7 +1619,7 @@ static int hifn_start_device(struct hifn_device *dev)
 	return 0;
 }
 
-static int ablkcipher_get(void *saddr, unsigned int *srestp, unsigned int offset,
+static int ablkcipher_get(void *saddr, unsigned int *srestp,
 		struct scatterlist *dst, unsigned int size, unsigned int *nbytesp)
 {
 	unsigned int srest = *srestp, nbytes = *nbytesp, copy;
@@ -1632,15 +1632,17 @@ static int ablkcipher_get(void *saddr, unsigned int *srestp, unsigned int offset
 	while (size) {
 		copy = min3(srest, dst->length, size);
 
-		daddr = kmap_atomic(sg_page(dst));
-		memcpy(daddr + dst->offset + offset, saddr, copy);
-		kunmap_atomic(daddr);
+		daddr = sg_map(dst, SG_KMAP_ATOMIC);
+		if (IS_ERR(daddr))
+			return PTR_ERR(daddr);
+
+		memcpy(daddr, saddr, copy);
+		sg_unmap(dst, daddr, SG_KMAP_ATOMIC);
 
 		nbytes -= copy;
 		size -= copy;
 		srest -= copy;
 		saddr += copy;
-		offset = 0;
 
 		pr_debug("%s: copy: %u, size: %u, srest: %u, nbytes: %u.\n",
 			 __func__, copy, size, srest, nbytes);
@@ -1671,11 +1673,12 @@ static inline void hifn_complete_sa(struct hifn_device *dev, int i)
 
 static void hifn_process_ready(struct ablkcipher_request *req, int error)
 {
+	int err;
 	struct hifn_request_context *rctx = ablkcipher_request_ctx(req);
 
 	if (rctx->walk.flags & ASYNC_FLAGS_MISALIGNED) {
 		unsigned int nbytes = req->nbytes;
-		int idx = 0, err;
+		int idx = 0;
 		struct scatterlist *dst, *t;
 		void *saddr;
 
@@ -1695,17 +1698,24 @@ static void hifn_process_ready(struct ablkcipher_request *req, int error)
 				continue;
 			}
 
-			saddr = kmap_atomic(sg_page(t));
+			saddr = sg_map(t, SG_KMAP_ATOMIC);
+			if (IS_ERR(saddr)) {
+				if (!error)
+					error = PTR_ERR(saddr);
+				break;
+			}
+
+			err = ablkcipher_get(saddr, >length,
+					     dst, nbytes, );
+			sg_unmap(t, saddr, SG_KMAP_ATOMIC);
 
-			err = ablkcipher_get(saddr, >length, t->offset,
-					dst, nbytes, );
 			if (err < 0) {
-				kunmap_atomic(saddr);
+				if (!error)
+					error = err;
 				break;
 			}
 
 			idx += err;
-			kunmap_atomic(saddr);
 		}
 
 		hifn_cipher_walk_exit(>walk);
-- 
2.1.4
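The error-path change described in this commit message amounts to a "first error wins" rule: a failure inside the loop stops processing and is recorded in 'error' only if no earlier error was already passed in. A minimal userspace sketch of that rule is below; `mock_process_segments` and its array of per-step results are hypothetical and only model the control flow, not the hifn driver itself.

```c
#include <assert.h>
#include <stddef.h>

/*
 * results[i] models the return value of one per-segment step:
 * a value >= 0 is progress to add to the index, a value < 0 is a
 * negative errno. 'error' is the status the caller already holds.
 * Returns the status to report; *idx_out receives the accumulated
 * progress made before the loop stopped.
 */
static int mock_process_segments(const int *results, size_t n, int error,
				 int *idx_out)
{
	int idx = 0;
	size_t i;

	assert(results != NULL && idx_out != NULL);

	for (i = 0; i < n; i++) {
		int err = results[i];

		if (err < 0) {
			/* Keep an earlier error; otherwise record this one. */
			if (!error)
				error = err;
			break;
		}

		idx += err;
	}

	*idx_out = idx;
	return error;
}
```

Before the patch, the equivalent of the `if (!error) error = err;` step was missing, so a failed step ended the loop without changing the reported status.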
Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers
> The DT binding is in tree and provides an example of how the switch
> looks like, below is the example, but I am also adding the MDIO bus and
> the PHYs just so you can see how things wind up:
>
> switch_top@f0b0 {
>	compatible = "simple-bus";
>	#size-cells = <1>;
>	#address-cells = <1>;
>	ranges = <0 0xf0b0 0x40804>;
>
>	ethernet_switch@0 {
>		compatible = "brcm,bcm7445-switch-v4.0";
>		#size-cells = <0>;
>		#address-cells = <1>;
>		reg = <0x0 0x4
>			0x4 0x110
>			0x40340 0x30
>			0x40380 0x30
>			0x40400 0x34
>			0x40600 0x208>;
>		reg-names = "core", "reg", "intrl2_0", "intrl2_1",
>			"fcb", "acb";
>		interrupts = <0 0x18 0
>			0 0x19 0>;
>		brcm,num-gphy = <1>;
>		brcm,num-rgmii-ports = <2>;
>		brcm,fcb-pause-override;
>		brcm,acb-packets-inflight;
>
>		ports {
>			#address-cells = <1>;
>			#size-cells = <0>;
>
>			port@0 {
>				label = "gphy";
>				reg = <0>;
>				phy-handle = <>;
>			};
>
>			sw0port1: port@1 {
>				label = "rgmii_1";
>				reg = <1>;
>				phy-mode = "rgmii";
>				fixed-link {
>					speed = <1000>;
>					full-duplex;
>				};
>			}
>		};
>	};
>
>	mdio@403c0 {
>		reg = <0x403c0 0x8 0x40300 0x18>;
>		#address-cells = <0x1>;
>		#size-cells = <0x0>;
>		compatible = "brcm,unimac-mdio";
>		reg-names = "mdio", "mdio_indir_rw";
>
>		switch: switch@0 {
>			broken-turn-around;
>			reg = <0x0>;
>			compatible = "brcm,bcm53125";
>			#address-cells = <1>;
>			#size-cells = <0>;
>
>			ports {
>				..
>				port@8 {
>					ethernet = <>;
>				};
>				...
>			};
>		};
>
>		phy5: ethernet-phy@5 {
>			reg = <0x5>;
>			compatible = "ethernet-phy-ieee802.3-c22";
>		};
>	};
> };

So phy5 is connected to the internal switch with a phy-handle. But
because of your double usage of this node, it also can be mapped into
the external switch's port 5? Is that your problem?

It seems like you should add an mdio node inside your switch node, and
list your external switch internal/external phys there if needed.

	Andrew
Re: [PATCH v4 net-next RFC] net: Generic XDP
From: Michael Chan Date: Thu, 13 Apr 2017 13:16:43 -0700 > On Thu, Apr 13, 2017 at 9:09 AM, David Miller wrote: >> >> --- >> >> v4: >> - Fix MAC header adjustment before calling prog (David Ahern) >> - Disable LRO when generic XDP is installed (Michael Chan) > > I don't see where you are disabling LRO in the patch. Ugh, I posted the wrong patch, here is the correct one. Sorry about that: Subject: [PATCH] net: Generic XDP This provides a generic SKB-based non-optimized XDP path which is used if either the driver lacks a specific XDP implementation, or the user requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE. It is arguable that perhaps I should have required something like this as part of the initial XDP feature merge. I believe this is critical for two reasons: 1) Accessibility. More people can play with XDP with fewer dependencies. Yes I know we have XDP support in virtio_net, but that just creates another dependency for learning how to use this facility. I wrote this to make life easier for the XDP newbies. 2) As a model for what the expected semantics are. If there is a pure generic core implementation, it serves as a semantic example for driver folks adding XDP support. This is just a rough draft and is untested. One thing I have not tried to address here is the issue of XDP_PACKET_HEADROOM, thanks to Daniel for spotting that. It seems incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or whatever even if the XDP program doesn't try to push headers at all. I think we really need the verifier to somehow propagate whether certain XDP helpers are used or not. Signed-off-by: David S. Miller --- v4: - Fix MAC header adjustment before calling prog (David Ahern) - Disable LRO when generic XDP is installed (Michael Chan) - Bypass qdisc et al. on XDP_TX and record the event (Alexei) - Do not perform generic XDP on reinjected packets (DaveM) v3: - Make sure XDP program sees packet at MAC header, push back MAC header if we do XDP_TX. 
(Alexei) - Elide GRO when generic XDP is in use. (Alexei) - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic XDP even if the driver has an XDP implementation. (Alexei) - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS attribute. (Daniel) v2: - Add some "fall through" comments in switch statements based upon feedback from Andrew Lunn - Use RCU for generic xdp_prog, thanks to Johannes Berg. --- include/linux/netdevice.h| 8 +++ include/uapi/linux/if_link.h | 4 +- net/core/dev.c | 153 +-- net/core/gro_cells.c | 2 +- net/core/rtnetlink.c | 40 ++- 5 files changed, 185 insertions(+), 22 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b0aa089..071a58b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1891,9 +1891,17 @@ struct net_device { struct lock_class_key *qdisc_tx_busylock; struct lock_class_key *qdisc_running_key; boolproto_down; + struct bpf_prog __rcu *xdp_prog; }; #define to_net_dev(d) container_of(d, struct net_device, dev) +static inline bool netif_elide_gro(const struct net_device *dev) +{ + if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog) + return true; + return false; +} + #defineNETDEV_ALIGN32 static inline diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8b405af..633aa02 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -887,7 +887,9 @@ enum { /* XDP section */ #define XDP_FLAGS_UPDATE_IF_NOEXIST(1U << 0) -#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST) +#define XDP_FLAGS_SKB_MODE (2U << 0) +#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \ +XDP_FLAGS_SKB_MODE) enum { IFLA_XDP_UNSPEC, diff --git a/net/core/dev.c b/net/core/dev.c index ef9fe60e..b3d3a6e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -95,6 +95,7 @@ #include #include #include +#include #include #include #include @@ -4247,6 +4248,123 @@ static int __netif_receive_skb(struct sk_buff *skb) return ret; } +static 
struct static_key generic_xdp_needed __read_mostly; + +static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp) +{ + struct bpf_prog *new = xdp->prog; + int ret = 0; + + switch (xdp->command) { + case XDP_SETUP_PROG: { + struct bpf_prog *old = rtnl_dereference(dev->xdp_prog); + + rcu_assign_pointer(dev->xdp_prog, new); + if (old) + bpf_prog_put(old); + + if (old && !new) { +
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, Apr 13, 2017 at 9:09 AM, David Miller wrote: > > --- > > v4: > - Fix MAC header adjustment before calling prog (David Ahern) > - Disable LRO when generic XDP is installed (Michael Chan) I don't see where you are disabling LRO in the patch. > - Bypass qdisc et al. on XDP_TX and record the event (Alexei) > - Do not perform generic XDP on reinjected packets (DaveM) >
Re: [PATCH v3 net-next RFC] Generic XDP
From: Johannes Berg Date: Thu, 13 Apr 2017 21:22:21 +0200 > OTOH, it might depend on the frame data itself, if the program does > something like > > xdp->data[xdp->data[0] & 0xf] > > (read or write, doesn't really matter) so then the verifier would have > to take the maximum possible value there into account. I am not well versed enough with the verifier to understand exactly how and to what extent SKB accesses are validated by the verifier. My, perhaps mistaken, impression is that access range validation is still at least partially done at run time.
Re: net/ipv4: use-after-free in ip_queue_xmit
On Thu, Apr 13, 2017 at 11:49 AM, Andrey Konovalovwrote: > On Mon, Apr 10, 2017 at 7:46 PM, Andrey Konovalov > wrote: >> On Mon, Apr 10, 2017 at 7:42 PM, Cong Wang wrote: >>> On Mon, Apr 10, 2017 at 7:40 AM, Andrey Konovalov >>> wrote: Hi, I've got the following error report while fuzzing the kernel with syzkaller. On commit 39da7c509acff13fc8cb12ec1bb20337c988ed36 (4.11-rc6). Unfortunately it's not reproducible. BUG: KASAN: use-after-free in ip_select_ttl include/net/dst.h:176 [inline] at addr 88006ab3602c BUG: KASAN: use-after-free in ip_queue_xmit+0x1817/0x1a30 net/ipv4/ip_output.c:485 at addr 88006ab3602c Read of size 4 by task syz-executor1/12627 CPU: 3 PID: 12627 Comm: syz-executor1 Not tainted 4.11.0-rc6+ #206 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x292/0x398 lib/dump_stack.c:52 kasan_object_err+0x1c/0x70 mm/kasan/report.c:164 print_address_description mm/kasan/report.c:202 [inline] kasan_report_error mm/kasan/report.c:291 [inline] kasan_report+0x252/0x510 mm/kasan/report.c:347 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367 ip_select_ttl include/net/dst.h:176 [inline] >>> >>> Probably same as the one you reported on ipv4_mtu(), it would >>> be nice if you could test the patch I proposed: >>> >>> https://patchwork.ozlabs.org/patch/747556/ >> >> Applied your patch. > > Oops, apparently your patch doesn't compile: > Weird, it compiles fine here. Either you have a different config or the following piece is missing for some reason? @@ -69,6 +69,7 @@ struct rtable { struct list_head rt_uncached; struct uncached_list *rt_uncached_list; + struct fib_info *fi; /* for refcnt to shared metrics */ };
[PATCH block-tree] net: off by one in inet6_pton()
If "scope_len" is sizeof(scope_id) then we would put the NUL terminator one space beyond the end of the buffer. Fixes: b1a951fe469e ("net/utils: generic inet_pton_with_scope helper") Signed-off-by: Dan Carpenter --- This one goes through Jens' tree not through net-dev. diff --git a/net/core/utils.c b/net/core/utils.c index da1089ea5389..93066bd0305a 100644 --- a/net/core/utils.c +++ b/net/core/utils.c @@ -339,7 +339,7 @@ static int inet6_pton(struct net *net, const char *src, u16 port_num, src + srclen != scope_delim && *scope_delim == '%') { struct net_device *dev; char scope_id[16]; - size_t scope_len = min_t(size_t, sizeof(scope_id), + size_t scope_len = min_t(size_t, sizeof(scope_id) - 1, src + srclen - scope_delim - 1); memcpy(scope_id, scope_delim + 1, scope_len);
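The off-by-one described above can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not the kernel code: copy_scope_id() and SCOPE_ID_SIZE are hypothetical names introduced here, and the assumption (taken from the patch description) is that the copied bytes are followed by a NUL written at index scope_len.

```c
#include <stddef.h>
#include <string.h>

#define SCOPE_ID_SIZE 16	/* mirrors "char scope_id[16]" in the patch */

/*
 * Hypothetical helper, not kernel code: copy at most SCOPE_ID_SIZE - 1
 * bytes so that the terminating NUL always lands inside the buffer.
 * Clamping to SCOPE_ID_SIZE instead would allow len == 16 and a write
 * to dst[16], one byte past the end.
 */
static size_t copy_scope_id(char dst[SCOPE_ID_SIZE], const char *src,
			    size_t srclen)
{
	size_t len = srclen < SCOPE_ID_SIZE - 1 ? srclen : SCOPE_ID_SIZE - 1;

	memcpy(dst, src, len);
	dst[len] = '\0';	/* len <= 15, so this index is in bounds */
	return len;
}
```

With this clamp an over-long scope id is silently truncated to 15 characters; whether truncation or an error return is the better policy is a separate question that the patch does not address.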
Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
On 04/13/2017 12:11 PM, Grygorii Strashko wrote: > Now the command: > ethtool --phy-statistics eth0 > will cause system crash with message "Unable to handle kernel NULL pointer > dereference at virtual address 0010" from: > > (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) > (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) > (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) > (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) > (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) > (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) > (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) > > The reason: phy_driver structure for KSZ9031 phy has no .probe() callback > defined. As a result, struct phy_device *phydev->priv pointer will not be > initialized (null). > This issue will also affect the following phys: > KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 > > Fix it by: > - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 > phys. The kszphy_probe() can be re-used as it doesn't do any phy specific > settings. > - removing statistic callbacks from other phys (KSZ8795, KSZ886X, > KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding > statistic counters. > > Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") > Signed-off-by: Grygorii Strashko > Reviewed-by: Andrew Lunn Reviewed-by: Florian Fainelli -- Florian
Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
On Thu, Apr 13, 2017 at 10:28:09PM +0300, Sergei Shtylyov wrote: > On 04/13/2017 10:11 PM, Grygorii Strashko wrote: > > >Now the command: > > ethtool --phy-statistics eth0 > >will cause system crash with message "Unable to handle kernel NULL pointer > >dereference at virtual address 0010" from: > > > > (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) > > (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) > > (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) > > (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) > > (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) > > (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) > > (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) > > > >The reason: phy_driver structure for KSZ9031 phy has no .probe() callback > >defined. As a result, struct phy_device *phydev->priv pointer will not be > >initialized (null). > >This issue will also affect the following phys: > > KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 > > > >Fix it by: > >- adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 > >phys. The kszphy_probe() can be re-used as it doesn't do any phy specific > >settings. > >- removing statistic callbacks from other phys (KSZ8795, KSZ886X, > >KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding > >statistic counters. > >Not sure how the 2nd change fixes the reported issue. It looks > like material for a separate patch... There are two different cases here: 1) The hardware supports the stats. So a probe function is needed, but is missing. 2) The hardware does not support the stats, so there should not be stats ops. The same crash will happen, independent of which one of the above is true. You need to fix them both, to stop it crashing. Andrew
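The two cases Andrew describes can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: every toy_* name below is hypothetical and none of these types are the kernel's phylib structures. The one property carried over from the thread is that the stats callback dereferences a private pointer which only probe() sets up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's struct phy_device/phy_driver. */
struct toy_priv {
	unsigned long long stats[2];
};

struct toy_phy {
	void *priv;	/* stays NULL unless probe() ran */
};

struct toy_driver {
	int (*probe)(struct toy_phy *phy);
	int (*get_stats)(struct toy_phy *phy, unsigned long long *out);
};

/* Allocates the private data that the stats callback relies on. */
static int toy_probe(struct toy_phy *phy)
{
	phy->priv = calloc(1, sizeof(struct toy_priv));
	return phy->priv ? 0 : -1;
}

/* Assumes phy->priv is valid, so it must never run without probe(). */
static int toy_get_stats(struct toy_phy *phy, unsigned long long *out)
{
	struct toy_priv *priv = phy->priv;

	out[0] = priv->stats[0];
	out[1] = priv->stats[1];
	return 0;
}

/*
 * A driver entry is consistent if it either provides probe() together
 * with the stats callback (case 1) or provides no stats callback at
 * all (case 2). Stats without probe is the crashing combination.
 */
static int toy_driver_is_consistent(const struct toy_driver *drv)
{
	return drv->get_stats == NULL || drv->probe != NULL;
}
```

In this model, both halves of the patch move an entry from the inconsistent state into one of the two consistent ones, which is why neither half alone is enough.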
Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
On 04/13/2017 10:11 PM, Grygorii Strashko wrote: Now the command: ethtool --phy-statistics eth0 will cause system crash with message "Unable to handle kernel NULL pointer dereference at virtual address 0010" from: (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) The reason: phy_driver structure for KSZ9031 phy has no .probe() callback defined. As a result, struct phy_device *phydev->priv pointer will not be initialized (null). This issue will also affect the following phys: KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 Fix it by: - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 phys. The kszphy_probe() can be re-used as it doesn't do any phy specific settings. - removing statistic callbacks from other phys (KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding statistic counters. Not sure how the 2nd change fixes the reported issue. It looks like material for a separate patch... Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii Strashko Reviewed-by: Andrew Lunn [...] MBR, Sergei
Mac80211 - 802.11s
How do I get the vif of mesh stations, and does the vif have a separate queue structure apart from the PHY queues? I want to get the queue statistics for every associated mesh station.
Re: [PATCH 1/5] netlink: extended ACK reporting
On Thu, 2017-04-13 at 16:05 +0200, Nicolas Dichtel wrote: > Sure. It was just to mention that attribute 0 exists somewhere. > The other 0 attribute is OVS_TUNNEL_KEY_ATTR_ID. That looks like some really awkward hand-grown parsing - with all these "struct ovs_len_tbl" looking almost like a policy, but not using that code? Seems like something somebody should take a hard look at and see if it can't use more standard infrastructure. johannes
Re: [PATCH v3 net-next RFC] Generic XDP
On Thu, 2017-04-13 at 11:37 -0400, David Miller wrote: > If the capability is variable, it must be communicated to the user > somehow at program load time. > > We are consistently finding that there is this real need to > communicate XDP capabilities, or somehow verify that the needs > of an XDP program can be satisfied by a given implementation. Technically, once you know the capability of the *driver*, the verifier should be able to check if the *program* is compatible. So if the driver can guarantee "you always get 2k accessible", the verifier can check that you don't access more than xdp->data + 2047, similar to how it verifies that you don't access beyond xdp->data_end. > And eth_get_headlen() only pulls protocol headers, which precludes > XDP inspecting anything below TCP/UDP/etc. This is also not > reasonable. > > Right now, as it stands, we have to assume the program can > potentially be interested in the entire packet. I agree with this though, it's not reasonable to have wildly varying implementations here that may or may not be able to access almost anything. The totally degenerate case would be having no skb header at all, which is also still entirely valid from the network stack's POV. > We can only optimize this and elide things when we have a facility in > the future for the program to express its needs precisely. I think > we will have to add some control structure to XDP programs that can > be filled in for this purpose. Like I said above, I think this is something that you can possibly determine in the verifier. So if, for example, the verifier notices that the program never accesses anything but the first few bytes, then it would seem valid to run with only that much pulled into the skb header. OTOH, it might depend on the frame data itself, if the program does something like xdp->data[xdp->data[0] & 0xf] (read or write, doesn't really matter) so then the verifier would have to take the maximum possible value there into account. johannes
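The worst-case reasoning for the data-dependent access above can be written down as a toy calculation. This is not how the BPF verifier is implemented; both function names are hypothetical and the sketch only assumes what the mail states: the index is a byte masked with a constant, and the driver guarantees some number of accessible bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper: for an access of the form data[data[0] & mask],
 * the masked value can never exceed the mask itself, so the largest
 * index that can be touched is "mask" (index 0 is always read too).
 */
static size_t max_index_masked_load(unsigned char mask)
{
	return mask;
}

/*
 * Hypothetical check: the access pattern is acceptable if its largest
 * possible index is still inside the length the driver guarantees.
 */
static int access_fits(size_t max_index, size_t guaranteed_len)
{
	return max_index < guaranteed_len;
}
```

For the expression in the mail the mask is 0xf, so the worst case is index 15; with the "2k accessible" guarantee mentioned earlier, the last valid index would be 2047.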
[PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
Now the command: ethtool --phy-statistics eth0 will cause system crash with message "Unable to handle kernel NULL pointer dereference at virtual address 0010" from: (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) The reason: phy_driver structure for KSZ9031 phy has no .probe() callback defined. As a result, struct phy_device *phydev->priv pointer will not be initialized (null). This issue will also affect the following phys: KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 Fix it by: - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 phys. The kszphy_probe() can be re-used as it doesn't do any phy specific settings. - removing statistic callbacks from other phys (KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding statistic counters. 
Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii StrashkoReviewed-by: Andrew Lunn --- changes in v3: - occasional whitespace change removed changes in v2: - probe callback added to KSZ9031, KSZ9021 - statistic callback removed from KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737 Links v2: https://patchwork.ozlabs.org/patch/750194/ v1: https://lkml.org/lkml/2017/4/10/1183 drivers/net/phy/micrel.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 6742070..1326d99 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -798,9 +798,6 @@ static struct phy_driver ksphy_driver[] = { .read_status= genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr= kszphy_config_intr, - .get_sset_count = kszphy_get_sset_count, - .get_strings= kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend= genphy_suspend, .resume = genphy_resume, }, { @@ -940,9 +937,6 @@ static struct phy_driver ksphy_driver[] = { .read_status= genphy_read_status, .ack_interrupt = kszphy_ack_interrupt, .config_intr= kszphy_config_intr, - .get_sset_count = kszphy_get_sset_count, - .get_strings= kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend= genphy_suspend, .resume = genphy_resume, }, { @@ -952,6 +946,7 @@ static struct phy_driver ksphy_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT, .driver_data= _type, + .probe = kszphy_probe, .config_init= ksz9021_config_init, .config_aneg= genphy_config_aneg, .read_status= genphy_read_status, @@ -971,6 +966,7 @@ static struct phy_driver ksphy_driver[] = { .features = PHY_GBIT_FEATURES, .flags = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT, .driver_data= _type, + .probe = kszphy_probe, .config_init= ksz9031_config_init, .config_aneg= genphy_config_aneg, .read_status= ksz9031_read_status, @@ -989,9 +985,6 @@ static struct phy_driver ksphy_driver[] = { .config_init= 
kszphy_config_init, .config_aneg= ksz8873mll_config_aneg, .read_status= ksz8873mll_read_status, - .get_sset_count = kszphy_get_sset_count, - .get_strings= kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend= genphy_suspend, .resume = genphy_resume, }, { @@ -1003,9 +996,6 @@ static struct phy_driver ksphy_driver[] = { .config_init= kszphy_config_init, .config_aneg= genphy_config_aneg, .read_status= genphy_read_status, - .get_sset_count = kszphy_get_sset_count, - .get_strings= kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend= genphy_suspend, .resume = genphy_resume, }, { @@ -1017,9 +1007,6 @@ static struct phy_driver ksphy_driver[] = { .config_init= kszphy_config_init, .config_aneg= ksz8873mll_config_aneg, .read_status= ksz8873mll_read_status, - .get_sset_count = kszphy_get_sset_count, - .get_strings= kszphy_get_strings, - .get_stats = kszphy_get_stats, .suspend= genphy_suspend, .resume = genphy_resume, } }; -- 2.10.1
Re: [PATCH v2] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
On 04/13/2017 01:51 PM, Andrew Lunn wrote: On Wed, Apr 12, 2017 at 05:55:10PM -0500, Grygorii Strashko wrote: Now the command: ethtool --phy-statistics eth0 will cause system crash with meassage "Unable to handle kernel NULL pointer dereference at virtual address 0010" from: (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) The reason: phy_driver structure for KSZ9031 phy has no .probe() callback defined. As result, struct phy_device *phydev->priv pointer will not be initializes (null). This issue will affect also following phys: KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 Fix it by: - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 phys. The kszphy_probe() can be re-used as it doesn't do any phy specific settings. - removing statistic callbacks from other phys (KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding statistic counters. 
Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii Strashko--- changes in v2: - probe callback added to KSZ9031, KSZ9021 - statistic callback removed from KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737 Link on v1: https://lkml.org/lkml/2017/4/10/1183 drivers/net/phy/micrel.c | 18 ++ 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 6742070..6f207e6 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -574,7 +574,6 @@ static int ksz9031_config_init(struct phy_device *phydev) MII_KSZ9031RN_TX_DATA_PAD_SKEW, 4, tx_data_skews, 4); } - return ksz9031_center_flp_timing(phydev); } Hi Grygorii Whitespace changed like this should be in a separate patch, or not made at all. Oh. sry i've missed it. Will resend Otherwise, thanks for looking at the datasheets and fixing this up. Reviewed-by: Andrew Lunn -- regards, -grygorii
Re: [PATCH v2] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
On Wed, Apr 12, 2017 at 05:55:10PM -0500, Grygorii Strashko wrote: > Now the command: > ethtool --phy-statistics eth0 > will cause system crash with meassage "Unable to handle kernel NULL pointer > dereference at virtual address 0010" from: > > (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210) > (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c) > (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964) > (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0) > (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c) > (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64) > (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44) > > The reason: phy_driver structure for KSZ9031 phy has no .probe() callback > defined. As result, struct phy_device *phydev->priv pointer will not be > initializes (null). > This issue will affect also following phys: > KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737 > > Fix it by: > - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021 > phys. The kszphy_probe() can be re-used as it doesn't do any phy specific > settings. > - removing statistic callbacks from other phys (KSZ8795, KSZ886X, > KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding > statistic counters. 
> > Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") > Signed-off-by: Grygorii Strashko> --- > changes in v2: > - probe callback added to KSZ9031, KSZ9021 > - statistic callback removed from KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, > KS8737 > > Link on v1: > https://lkml.org/lkml/2017/4/10/1183 > > drivers/net/phy/micrel.c | 18 ++ > 1 file changed, 2 insertions(+), 16 deletions(-) > > diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c > index 6742070..6f207e6 100644 > --- a/drivers/net/phy/micrel.c > +++ b/drivers/net/phy/micrel.c > @@ -574,7 +574,6 @@ static int ksz9031_config_init(struct phy_device *phydev) > MII_KSZ9031RN_TX_DATA_PAD_SKEW, 4, > tx_data_skews, 4); > } > - > return ksz9031_center_flp_timing(phydev); > } Hi Grygorii Whitespace changed like this should be in a separate patch, or not made at all. Otherwise, thanks for looking at the datasheets and fixing this up. Reviewed-by: Andrew Lunn Andrew
Re: net/ipv4: use-after-free in ip_queue_xmit
On Mon, Apr 10, 2017 at 7:46 PM, Andrey Konovalovwrote: > On Mon, Apr 10, 2017 at 7:42 PM, Cong Wang wrote: >> On Mon, Apr 10, 2017 at 7:40 AM, Andrey Konovalov >> wrote: >>> Hi, >>> >>> I've got the following error report while fuzzing the kernel with syzkaller. >>> >>> On commit 39da7c509acff13fc8cb12ec1bb20337c988ed36 (4.11-rc6). >>> >>> Unfortunately it's not reproducible. >>> >>> BUG: KASAN: use-after-free in ip_select_ttl include/net/dst.h:176 >>> [inline] at addr 88006ab3602c >>> BUG: KASAN: use-after-free in ip_queue_xmit+0x1817/0x1a30 >>> net/ipv4/ip_output.c:485 at addr 88006ab3602c >>> Read of size 4 by task syz-executor1/12627 >>> CPU: 3 PID: 12627 Comm: syz-executor1 Not tainted 4.11.0-rc6+ #206 >>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >>> Call Trace: >>> __dump_stack lib/dump_stack.c:16 [inline] >>> dump_stack+0x292/0x398 lib/dump_stack.c:52 >>> kasan_object_err+0x1c/0x70 mm/kasan/report.c:164 >>> print_address_description mm/kasan/report.c:202 [inline] >>> kasan_report_error mm/kasan/report.c:291 [inline] >>> kasan_report+0x252/0x510 mm/kasan/report.c:347 >>> __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367 >>> ip_select_ttl include/net/dst.h:176 [inline] >> >> Probably same as the one you reported on ipv4_mtu(), it would >> be nice if you could test the patch I proposed: >> >> https://patchwork.ozlabs.org/patch/747556/ > > Applied your patch. 
Oops, apparently your patch doesn't compile: CC net/ipv4/route.o net/ipv4/route.c: In function ‘ipv4_dst_destroy’: net/ipv4/route.c:1394:8: error: ‘struct rtable’ has no member named ‘fi’ if (rt->fi) { ^~ net/ipv4/route.c:1395:18: error: ‘struct rtable’ has no member named ‘fi’ fib_info_put(rt->fi); ^~ net/ipv4/route.c:1396:5: error: ‘struct rtable’ has no member named ‘fi’ rt->fi = NULL; ^~ net/ipv4/route.c: In function ‘rt_init_metrics’: net/ipv4/route.c:1440:5: error: ‘struct rtable’ has no member named ‘fi’ rt->fi = fi; ^~ net/ipv4/route.c: In function ‘rt_dst_alloc’: net/ipv4/route.c:1512:5: error: ‘struct rtable’ has no member named ‘fi’ rt->fi = NULL; ^~ make[2]: *** [net/ipv4/route.o] Error 1 make[1]: *** [net/ipv4] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [net] Error 2 > > The bug gets triggered very rarely (only twice so far), but I'll let > you know if I see it again. > > Thanks! > >> >> >> Thanks! >> >>> ip_queue_xmit+0x1817/0x1a30 net/ipv4/ip_output.c:485 >>> sctp_v4_xmit+0x10d/0x140 net/sctp/protocol.c:994 >>> sctp_packet_transmit+0x215c/0x3560 net/sctp/output.c:637 >>> sctp_outq_flush+0xade/0x3f90 net/sctp/outqueue.c:885 >>> sctp_outq_uncork+0x5a/0x70 net/sctp/outqueue.c:750 >>> sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1773 [inline] >>> sctp_side_effects net/sctp/sm_sideeffect.c:1175 [inline] >>> sctp_do_sm+0x5a0/0x6a50 net/sctp/sm_sideeffect.c:1147 >>> sctp_primitive_ASSOCIATE+0x9d/0xd0 net/sctp/primitive.c:88 >>> sctp_sendmsg+0x270d/0x3b50 net/sctp/socket.c:1954 >>> inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762 >>> sock_sendmsg_nosec net/socket.c:633 [inline] >>> sock_sendmsg+0xca/0x110 net/socket.c:643 >>> SYSC_sendto+0x660/0x810 net/socket.c:1696 >>> SyS_sendto+0x40/0x50 net/socket.c:1664 >>> entry_SYSCALL_64_fastpath+0x1f/0xc2 >>> RIP: 0033:0x4458d9 >>> RSP: 002b:7fdceca85b58 EFLAGS: 0282 ORIG_RAX: 002c >>> RAX: ffda RBX: 0016 RCX: 004458d9 >>> RDX: 0087 RSI: 20003000 RDI: 0016 >>> RBP: 006e2fe0 R08: 20003000 R09: 0010 
>>> R10: 00040841 R11: 0282 R12: 007080a8 >>> R13: 000a R14: 0005 R15: 0084 >>> Object at 88006ab36008, in cache kmalloc-64 size: 64 >>> Allocated: >>> PID = 7243 >>> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 >>> save_stack+0x43/0xd0 mm/kasan/kasan.c:513 >>> set_track mm/kasan/kasan.c:525 [inline] >>> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616 >>> kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745 >>> kmalloc include/linux/slab.h:490 [inline] >>> kzalloc include/linux/slab.h:663 [inline] >>> fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040 >>> fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221 >>> ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597 >>> inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882 >>> sock_do_ioctl+0x65/0xb0 net/socket.c:906 >>> sock_ioctl+0x28f/0x440 net/socket.c:1004 >>> vfs_ioctl fs/ioctl.c:45 [inline] >>> do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685 >>> SYSC_ioctl fs/ioctl.c:700 [inline] >>> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691 >>> entry_SYSCALL_64_fastpath+0x1f/0xc2 >>> Freed: >>> PID = 12622 >>> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 >>> save_stack+0x43/0xd0 mm/kasan/kasan.c:513 >>> set_track
Re: [PATCH v4 3/3] VSOCK: Add virtio vsock vsockmon hooks
On Thu, Apr 13, 2017 at 05:18:11PM +0100, Stefan Hajnoczi wrote: > From: Gerard Garcia> > The virtio drivers deal with struct virtio_vsock_pkt. Add > virtio_transport_deliver_tap_pkt(pkt) for handing packets to the > vsockmon device. > > We call virtio_transport_deliver_tap_pkt(pkt) from > net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of > common code. This is because the drivers may drop packets before > handing them to common code - we still want to capture them. > > Signed-off-by: Gerard Garcia > Signed-off-by: Stefan Hajnoczi > --- > v3: > * Hook virtio_transport.c (guest driver), not just >drivers/vhost/vsock.c (host driver) > --- > include/linux/virtio_vsock.h| 1 + > drivers/vhost/vsock.c | 8 + > net/vmw_vsock/virtio_transport.c| 3 ++ > net/vmw_vsock/virtio_transport_common.c | 58 > + > 4 files changed, 70 insertions(+) > > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h > index 584f9a6..ab13f07 100644 > --- a/include/linux/virtio_vsock.h > +++ b/include/linux/virtio_vsock.h > @@ -153,5 +153,6 @@ void virtio_transport_free_pkt(struct virtio_vsock_pkt > *pkt); > void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct > virtio_vsock_pkt *pkt); > u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); > void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); > +void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt); > > #endif /* _LINUX_VIRTIO_VSOCK_H */ > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c > index 44eed8e..d939ac1 100644 > --- a/drivers/vhost/vsock.c > +++ b/drivers/vhost/vsock.c > @@ -176,6 +176,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, > restart_tx = true; > } > > + /* Deliver to monitoring devices all correctly transmitted > + * packets. 
> +		 */
> +		virtio_transport_deliver_tap_pkt(pkt);
> +
> 		virtio_transport_free_pkt(pkt);
> 	}
> 	if (added)
> @@ -383,6 +388,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>
> 		len = pkt->len;
>
> +		/* Deliver to monitoring devices all received packets */
> +		virtio_transport_deliver_tap_pkt(pkt);
> +
> 		/* Only accept correctly addressed packets */
> 		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
> 			virtio_transport_recv_pkt(pkt);
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 68675a1..9dffe02 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -144,6 +144,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> 		list_del_init(&pkt->list);
> 		spin_unlock_bh(&vsock->send_pkt_list_lock);
>
> +		virtio_transport_deliver_tap_pkt(pkt);
> +
> 		reply = pkt->reply;
>
> 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> @@ -370,6 +372,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
> 			}
>
> 			pkt->len = len - sizeof(pkt->hdr);
> +			virtio_transport_deliver_tap_pkt(pkt);
> 			virtio_transport_recv_pkt(pkt);
> 		}
> 	} while (!virtqueue_enable_cb(vq));
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index af087b4..aae60c1 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -16,6 +16,7 @@
> #include
> #include
> #include
> +#include
>
> #include
> #include
> @@ -85,6 +86,63 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
> 	return NULL;
> }
>
> +/* Packet capture */
> +void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
> +{
> +	struct sk_buff *skb;
> +	struct af_vsockmon_hdr *hdr;
> +	unsigned char *t_hdr, *payload;
> +
> +	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
> +			GFP_ATOMIC);
> +	if (!skb)
> +		return; /* nevermind if we cannot capture the packet */
> +
> +	hdr = (struct af_vsockmon_hdr *)skb_put(skb, sizeof(*hdr));
> +
> +	/* pkt->hdr is little-endian so no need to byteswap here */

Comment does not seem to make sense. Drop it?

> +	hdr->src_cid = pkt->hdr.src_cid;
> +	hdr->src_port = pkt->hdr.src_port;
> +	hdr->dst_cid = pkt->hdr.dst_cid;
> +	hdr->dst_port = pkt->hdr.dst_port;
> +
> +	hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO);
> +	hdr->len = cpu_to_le16(sizeof(pkt->hdr));
> +	hdr->reserved[0] = hdr->reserved[1] = 0;
> +
> +	switch(cpu_to_le16(pkt->hdr.op)) {

I'd
[Patch net-next] kcm: remove a useless copy_from_user()
struct kcm_clone only contains fd, and kcm_clone() only
writes this struct, so there is no need to copy it from user.

Cc: Tom Herbert
Signed-off-by: Cong Wang
---
 net/kcm/kcmsock.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 31762f7..deca20f 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1707,11 +1707,7 @@ static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		struct kcm_clone info;
 		struct socket *newsock = NULL;
 
-		if (copy_from_user(&info, (void __user *)arg, sizeof(info)))
-			return -EFAULT;
-
 		err = kcm_clone(sock, &info, &newsock);
-
 		if (!err) {
 			if (copy_to_user((void __user *)arg, &info,
 					 sizeof(info))) {
-- 
2.5.5
How to debug DMAR errors?
Hello,

I have been seeing a regular occurrence of DMAR errors, looking something
like this when testing my ath10k driver/firmware under some specific loads
(maximum receive of 512 byte frames in AP mode):

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason 06] PTE Read access is not set
ath10k_pci :05:00.0: firmware crashed! (uuid 594b1393-ae35-42b5-9dec-74ff0c6791ff)

So, I am wondering if there is any way I can get more information about
what this fd99f000 address is?

Once this problem hits, the entire OS locks hard (not even sysrq-boot will
do anything), so I guess I would need the DMAR logic to print out more info
on that address somehow.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com
Re: [PATCH linux 2/2] net sched actions: fix refcount decrement on error
On Thu, Apr 13, 2017 at 1:06 AM, Wolfgang Bumiller wrote:
> On Wed, Apr 12, 2017 at 09:27:31PM -0700, Cong Wang wrote:
>> Instead of duplicating code, you can add the check
>> to the module_put() next to err_mod label? I mean:
>
> I just realized that with module_put() happening in both error and
> success cases if `err != ACT_P_CREATED`, we could just move that code up
> to above the TCA_ACT_COOKIE handling?

Yes, even better.

> Btw., the comment confused me a little at first as I thought it's about
> what happens in ->init(). But reading the code I then noticed the module
> count is increased in tc_lookup_action_n() (which calls try_module_get)
> in this function and it's about how this function itself is supposed
> to affect the count - if I'm not mistaken.
> => so I think it makes sense to deal with this earlier.

Yes, the module reference count is not increased inside ->init(). The
rollback is needed because of the semantics of ->init(): it can either
create a new action or modify an existing one, and in the latter case we
need to roll back the refcount.

Please feel free to update that comment to make it clearer, since you
are already on it. ;)

>
> Otherwise I'd have to save `err != ACT_P_CREATED` in an additional
> variable for the err_mod case since the cookie handling modifies `err`.
>
> What about this? (Since it's a separate issue not directly related to
> patch 1 of the series I can send it as separate mail based on master if
> you prefer - the diff below is based on master+patch1 for now.)
>

Looks good, this could also address Roman's comment.

Please remove the RFC tag and resend the whole series.

You can also add my:

Acked-by: Cong Wang

Thanks.
Re: [PATCH next] bonding: handle link transition from FAIL to UP correctly
From: Mahesh Bandewar
Date: Tue, 11 Apr 2017 22:36:00 -0700

> From: Mahesh Bandewar
>
> When link transitions from LINK_FAIL to LINK_UP, the commit phase is
> not called. This leads to an erroneous state causing slave-link state to
> get stuck in "going down" state while its speed and duplex are perfectly
> fine. This issue is a side-effect of splitting link-set into propose and
> commit phases introduced by de77ecd4ef02 ("bonding: improve link-status
> update in mii-monitoring")
>
> This patch fixes these issues by calling commit phase whenever link
> state change is proposed.
>
> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
> Signed-off-by: Mahesh Bandewar

Applied, thanks.
Re: [PATCH v2 net-next] net: dwc-xlgmac: add the initial ethtool support
From: Jie Deng
Date: Wed, 12 Apr 2017 13:10:06 +0800

> It is necessary to provide ethtool support for displaying and
> modifying parameters of dwc-xlgmac.
>
> Signed-off-by: Jie Deng
> ---
> v1->v2:
> - remove begin() method which is unnecessary

Applied, thank you.
Re: [PATCH v3 0/4] TI Bluetooth serdev support
Hi Rob,

> This series adds serdev support to the HCI LL protocol used on TI BT
> modules and enables support on HiKey board with the WL1835 module.
> With this the custom TI UIM daemon and btattach are no longer needed.
>
> The series is available on this git branch[1]. This version is rebased on
> bluetooth-next tree containing its dependencies.
>
> Rob
>
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git ti-bluetooth
>
> Rob Herring (4):
>   dt-bindings: net: Add TI WiLink shared transport binding
>   bluetooth: hci_uart: remove unused hci_uart_init_tty
>   bluetooth: hci_uart: add LL protocol serdev driver support
>   arm64: dts: hikey: add WL1835 Bluetooth device node
>
>  .../devicetree/bindings/net/ti,wilink-st.txt   |  35 +++
>  arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts |   5 +
>  drivers/bluetooth/hci_ldisc.c                  |  19 --
>  drivers/bluetooth/hci_ll.c                     | 262 -
>  drivers/bluetooth/hci_uart.h                   |   1 -
>  5 files changed, 301 insertions(+), 21 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/ti,wilink-st.txt

all 4 patches have been applied to bluetooth-next tree.

Regards

Marcel
Re: [PATCH net-next v3 1/1] net: ipv4: Refine the ipv4_default_advmss
From: gfree.w...@foxmail.com
Date: Wed, 12 Apr 2017 12:34:03 +0800

> From: Gao Feng
>
> 1. Don't get the metric RTAX_ADVMSS of dst.
> There are two reasons.
> 1) Its caller dst_metric_advmss has already invoked dst_metric_advmss
> before invoking default_advmss.
> 2) The ipv4_default_advmss is used to get the default mss, it should
> not try to get the metric like ip6_default_advmss.
>
> 2. Use sizeof(tcphdr)+sizeof(iphdr) instead of literal 40.
>
> 3. Define one new macro IPV4_MAX_PMTU instead of 65535 according to
> RFC 2675, section 5.1.
>
> Signed-off-by: Gao Feng
> ---
> v3: Simplify the codes again, per Joe
> v2: Use min instead of unnecessary min_t, per Joe
> v1: initial version

Applied, thanks.
Re: [PATCH net-next v2 0/8] rtnetlink: Cleanup user notifications for netdev events
From: David Ahern
Date: Tue, 11 Apr 2017 17:02:39 -0700

> Vlad's recent patch to add the event type to rtnetlink notifications
> points out a number of redundant or unnecessary notifications sent to
> userspace for events that are essentially internal to the kernel. Trim
> the list to put a dent in the notification storm.
>
> v2
> - rebased to top of net-next with IFLA_EVENT patch reverted
> - dropped removal NETDEV_CHANGEINFODATA since it is intentionally
>   only to send a message to userspace
> - dropped NOTIFY_PEERS since Vlad says it is needed for macvlans
> - add patches to remove NETDEV_CHANGEUPPER and NETDEV_CHANGE_TX_QUEUE_LEN
>   from the event list

Series applied, thanks David.
[PATCH v2] cfg80211: Fix array-bounds warning in fragment copy
__ieee80211_amsdu_copy_frag intentionally initializes a pointer to
array[-1] to increment it later to valid values. clang rightfully
generates an array-bounds warning on the initialization statement.

Initialize the pointer to array[0] and change the algorithm from
increment before to increment after consume.

Signed-off-by: Matthias Kaehlcke
---
Note: Resent to include linux-wireless in cc

 net/wireless/util.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 68e5f2ecee1a..52795ae5337f 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -659,7 +659,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
 			    int offset, int len)
 {
 	struct skb_shared_info *sh = skb_shinfo(skb);
-	const skb_frag_t *frag = &sh->frags[-1];
+	const skb_frag_t *frag = &sh->frags[0];
 	struct page *frag_page;
 	void *frag_ptr;
 	int frag_len, frag_size;
@@ -672,10 +672,10 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
 
 	while (offset >= frag_size) {
 		offset -= frag_size;
-		frag++;
 		frag_page = skb_frag_page(frag);
 		frag_ptr = skb_frag_address(frag);
 		frag_size = skb_frag_size(frag);
+		frag++;
 	}
 
 	frag_ptr += offset;
@@ -687,12 +687,12 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
 	len -= cur_len;
 
 	while (len > 0) {
-		frag++;
 		frag_len = skb_frag_size(frag);
 		cur_len = min(len, frag_len);
 		__frame_add_frag(frame, skb_frag_page(frag),
 				 skb_frag_address(frag), cur_len, frag_len);
 		len -= cur_len;
+		frag++;
 	}
 }
 
-- 
2.12.2.715.g7642488e1d-goog
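
The switch from "increment before" to "increment after consume" can be
tried outside the kernel. The sketch below is a simplified stand-in and not
the cfg80211 code: fragments are reduced to plain integer sizes, and
frag_pos, locate_offset and frag_sizes are hypothetical names chosen for
illustration. The cursor starts at element 0 and always points at the next
fragment that has not been consumed, so a pointer to element -1 is never
formed.

```c
#include <stddef.h>

/* Hypothetical result type: which buffer holds the offset, and where. */
struct frag_pos {
	int index;	/* -1 means the linear head, otherwise a fragment index */
	int offset;	/* offset inside that buffer */
};

/*
 * Walk a head buffer followed by nfrags fragments and locate 'offset'.
 * 'next' always points at the next fragment that has not been consumed,
 * so it starts at &frag_sizes[0] rather than &frag_sizes[-1].
 * The caller is assumed to pass an offset smaller than the total length.
 */
static struct frag_pos locate_offset(int head_len, const int *frag_sizes,
				     size_t nfrags, int offset)
{
	const int *next = &frag_sizes[0];
	const int *end = frag_sizes + nfrags;
	struct frag_pos pos = { .index = -1, .offset = 0 };
	int cur_size = head_len;

	while (offset >= cur_size && next < end) {
		offset -= cur_size;
		/* consume the fragment 'next' points at ... */
		cur_size = *next;
		pos.index = (int)(next - frag_sizes);
		/* ... and only then advance to the following one */
		next++;
	}

	pos.offset = offset;
	return pos;
}
```

With a 10-byte head and fragments of 5 and 20 bytes, offset 12 lands in
fragment 0 at offset 2, and offset 15 lands at the start of fragment 1.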
Re: [PATCH] tools: bpf_jit_disasm: Add option to dump JIT image to a file.
From: David Daney
Date: Tue, 11 Apr 2017 14:30:52 -0700

> When debugging the JIT on an embedded platform or cross build
> environment, libbfd may not be available, making it impossible to run
> bpf_jit_disasm natively.
>
> Add an option to emit a binary image of the JIT code to a file. This
> file can then be disassembled off line. Typical usage in this case
> might be (pasting mips64 dmesg output to cat command):
>
>   $ cat > jit.raw
>   $ bpf_jit_disasm -f jit.raw -O jit.bin
>   $ mips64-linux-gnu-objdump -D -b binary -m mips:isa64r2 -EB jit.bin
>
> Signed-off-by: David Daney

Applied, thanks.
[PATCH net] net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule
Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr.

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Signed-off-by: David Ahern
---
 drivers/net/vrf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 22379da63400..6a6e7f2fee29 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1125,7 +1125,7 @@ static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it)
 		goto nla_put_failure;
 
 	/* rule only needs to appear once */
-	nlh->nlmsg_flags &= NLM_F_EXCL;
+	nlh->nlmsg_flags |= NLM_F_EXCL;
 
 	frh = nlmsg_data(nlh);
 	memset(frh, 0, sizeof(*frh));
-- 
2.11.0 (Apple Git-81)
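
The effect of the one-character change is easiest to see with concrete
flag values. This is a minimal user-space sketch, not the vrf driver code:
it assumes the flag values from the netlink uapi header (NLM_F_REQUEST
0x01, NLM_F_EXCL 0x200, NLM_F_CREATE 0x400) and redefines them under
hypothetical DEMO_ names so that it builds without kernel headers. The
helper names are likewise made up for illustration.

```c
#include <stdint.h>

/* Local copies of the netlink flag values (assumed, see lead-in). */
#define DEMO_NLM_F_REQUEST 0x001
#define DEMO_NLM_F_EXCL    0x200
#define DEMO_NLM_F_CREATE  0x400

/*
 * Old behaviour: AND can only clear bits. If EXCL was not already set,
 * it stays unset, and every other flag is wiped as well.
 */
static uint16_t flags_and_excl(uint16_t flags)
{
	flags &= DEMO_NLM_F_EXCL;
	return flags;
}

/* Fixed behaviour: OR sets EXCL and leaves the other flags untouched. */
static uint16_t flags_or_excl(uint16_t flags)
{
	flags |= DEMO_NLM_F_EXCL;
	return flags;
}
```

Starting from REQUEST | CREATE (0x401), the AND variant yields 0 while the
OR variant yields 0x601, which is what "set NLM_F_EXCL" is meant to do.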
Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers
From: Florian Fainelli
Date: Mon, 10 Apr 2017 14:42:58 -0700

> A MDIO bus driver can set phy_mask to indicate which PHYs should be
> probed and which should not. Right now, of_mdiobus_register() always
> sets mdio->phy_mask to ~0 which means: don't probe anything yourself,
> and let the Device Tree scanning do it based on the availability of
> child nodes.
>
> When MDIO buses are stacked together (on purpose, as is done by DSA), we
> run into possible double probing which is, at best unnecessary, and at
> worst, can cause problems if that's not expected (e.g: during probe
> deferral).
>
> Fix this by remembering the original mdio->phy_mask, and make sure that if
> it was set to all 0xF, we set it to zero internally in order not to
> influence how the child PHY/MDIO device registration is going to behave.
> When the original mdio->phy_mask is set to something non-zero, we honor
> this value and utilize it as a hint to register only the child nodes
> that we have both found, and indicated to be necessary.
>
> Signed-off-by: Florian Fainelli

I don't think it's valid to have a unique OF node appear twice in the
device tree hierarchy.

Even if you can somehow hack this situation into working, you are asking
for all kinds of problems in the long run by doing things that way.

If you have to, instantiate a new dummy device (perhaps a platform_device,
which thus can have private attributes you can store in a structure whose
layout you control) to act as the placeholder for operation interception
and property duplication.
Re: [PATCH net-next] net: ipv6: send unsolicited NA on admin up
On 4/13/17 5:45 AM, Hannes Frederic Sowa wrote:
>
>
> On Wed, Apr 12, 2017, at 20:49, David Ahern wrote:
>> ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is
>> set to 1, gratuitous arp requests are sent when the device is brought up.
>> The same is expected when ndisc_notify is set to 1 (per ndisc_notify in
>> Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP
>> event; add it.
>>
>> Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer
>> address change")
>> Signed-off-by: David Ahern
>
> Acked-by: Hannes Frederic Sowa
>
> In future we might be able to make this a bit more robust when DAD is
> happening at the same time.

agreed.
Re: [PATCH net-next] net: stmmac: set total length of the packet to be transmitted in TDES3
From: Niklas Cassel
Date: Mon, 10 Apr 2017 20:33:29 +0200

> From: Niklas Cassel
>
> Field FL/TPL in register TDES3 is not correctly set on GMAC4.
> TX appears to be functional on GMAC 4.10a even if this field is not set,
> however, to avoid relying on undefined behavior, set the length in TDES3.
>
> The field has a different meaning depending on if the TSE bit in TDES3
> is set or not (TSO). However, regardless of the TSE bit, the field is
> not optional. The field is already set correctly when the TSE bit is set.
>
> Since there is no limit for the number of descriptors that can be
> used for a single packet, the field should be set to the sum of
> the buffers contained in:
> [ ... ...
> ], which should be equal to skb->len.
>
> Signed-off-by: Niklas Cassel

Applied, thanks.
Re: [PATCH net-next] cxgb4: save tid while creating server filter
From: Ganesh Goudar
Date: Mon, 10 Apr 2017 21:26:18 +0530

> Save the filter tid while creating the server filter, which is used
> later to retrieve the corresponding filter instance while handling
> the filter reply.
>
> Signed-off-by: Ganesh Goudar

Applied.
Re: [PATCH 1/1] drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201
From: Daniele Palmas
Date: Mon, 10 Apr 2017 17:34:23 +0200

> Telit LE920A4 uses the same pid 0x1201 of LE920, but modem
> implementation is different, since it requires DTR to be set for
> answering to qmi messages.
>
> This patch replaces QMI_FIXED_INTF with QMI_QUIRK_SET_DTR: tests on
> LE920 have been performed in order to verify backward compatibility.
>
> Signed-off-by: Daniele Palmas

Applied, thank you.
Re: IGMP on IPv6
On 03/22/2017 11:04 AM, Murali Karicheri wrote: > Hi Liu, > > I saw that you have sent patches to the list for IGMP and have a question on > IGMP on IPv6. > Hope you can clarify. I have posted the question already to the list and is > reproduced > below. Let me know if you have an answer. > > = See email with subject "IPv6 IGMP issue in v4.4.44 ?? > > Cut-n-paste from that email > > I see an issue with IGMP for IPv6 when I test HSR redundancy network > interface. As soon as I set up an HSR interface, I see some IGMP messages > (destination mac address: 33 33 00 00 00 02 going over HSR interface to > slave interfaces, at the egress where as for IPv6, I see similar messages > going directly over the Ethernet interfaces that are attached to > HSR master. It appears that the NETDEV_CHANGEUPPER is not handled properly > and the mcast snoop sends the packets over the old interfaces at timer > expiry. > > A dump of the message at the slave Ethernet interface looks like below. > > IPv4 > > [ 64.643842] 33 33 00 00 00 02 70 ff 76 1c 0f 8d 89 2f 10 3e fc > [ 64.649910] 18 86 dd 60 00 00 00 00 10 3a ff fe 80 00 00 00 > [ 64.655705] 00 00 00 72 ff 76 ff fe 1c 0f 8d ff 02 00 00 00 > [ 64.661503] 00 00 00 00 00 00 00 00 00 00 02 85 00 8d dc > > > You can see this is tagged with HSR. > > IPv6 > > [ 65.559130] 33 33 00 00 00 02 70 ff 76 1c 0f 8d 86 dd 60 00 00 > [ 65.565205] 00 00 10 3a ff fe 80 00 00 00 00 00 00 72 ff 76 > [ 65.571011] ff fe 1c 0f 8d ff 02 00 00 00 00 00 00 00 00 00 > [ 65.576806] 00 00 00 00 02 85 00 8d dc 00 00 00 00 01 01 > > This is going directly to the slave Ethernet interface. > > When I put a WARN_ONCE, I found this is coming directly from > mld_ifc_timer_expire() -> mld_sendpack() -> ip6_output() > > Do you think this is fixed in latest kernel at master? If so, could > you point me to some commits. > > Ping... I see this behavior is also seen on v4.9.x Kernel. Any clue if this is fixed by some commit or I need to debug? 
I see IGMPv6 has some fixes on the list to make it similar to IGMPv4. So
can someone clarify whether this is a bug in the IGMPv6 code or whether I
need to look into the HSR driver code? Since IGMPv4 is going over the HSR
interface, I am assuming this is a bug in the IGMPv6 code. But since I have
no experience with this code, can some expert comment please?

Murali

-- 
Murali Karicheri
Linux Kernel, Keystone
[PATCH v4 2/3] VSOCK: Add vsockmon device
From: Gerard GarciaAdd vsockmon virtual network device that receives packets from the vsock transports and exposes them to user space. Based on the nlmon device. Signed-off-by: Gerard Garcia Signed-off-by: Stefan Hajnoczi --- v4: * Add explicit reserved padding field to struct af_vsockmon_hdr and drop __attribute__((packed)) [Michael, DaveM] v3: * Fix DEFAULT_MTU macro definition [Zhu Yanjun] * Rename af_vsockmon_hdr->t field ->transport for clarity * Update .ndo_get_stats64() return type since it has changed --- drivers/net/Makefile | 1 + include/uapi/linux/vsockmon.h | 58 +++ drivers/net/vsockmon.c| 167 ++ drivers/net/Kconfig | 8 ++ include/uapi/linux/Kbuild | 1 + 5 files changed, 235 insertions(+) create mode 100644 include/uapi/linux/vsockmon.h create mode 100644 drivers/net/vsockmon.c diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 98ed4d9..2d54930 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -30,6 +30,7 @@ obj-$(CONFIG_GENEVE) += geneve.o obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o +obj-$(CONFIG_VSOCKMON) += vsockmon.o # # Networking Drivers diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h new file mode 100644 index 000..5fce3991 --- /dev/null +++ b/include/uapi/linux/vsockmon.h @@ -0,0 +1,58 @@ +#ifndef _UAPI_VSOCKMON_H +#define _UAPI_VSOCKMON_H + +#include + +/* + * vsockmon is the AF_VSOCK packet capture device. Packets captured have the + * following layout: + * + * +---+ + * | vsockmon header | + * | (struct af_vsockmon_hdr) | + * +---+ + * | transport header | + * | (af_vsockmon_hdr->len bytes long) | + * +---+ + * | payload | + * | (until end of packet) | + * +---+ + * + * The vsockmon header is a transport-independent description of the packet. + * It duplicates some of the information from the transport header so that + * no transport-specific knowledge is necessary to process packets. 
+ * + * The transport header is useful for low-level transport-specific packet + * analysis. Transport type is given in af_vsockmon_hdr->transport and + * transport header length is given in af_vsockmon_hdr->len. + * + * If af_vsockmon_hdr->op is AF_VSOCK_OP_PAYLOAD then the payload follows the + * transport header. Other ops do not have a payload. + */ + +struct af_vsockmon_hdr { + __le64 src_cid; + __le64 dst_cid; + __le32 src_port; + __le32 dst_port; + __le16 op; /* enum af_vsockmon_op */ + __le16 transport; /* enum af_vsockmon_transport */ + __le16 len; /* Transport header length */ + __u8 reserved[2]; +}; + +enum af_vsockmon_op { + AF_VSOCK_OP_UNKNOWN = 0, + AF_VSOCK_OP_CONNECT = 1, + AF_VSOCK_OP_DISCONNECT = 2, + AF_VSOCK_OP_CONTROL = 3, + AF_VSOCK_OP_PAYLOAD = 4, +}; + +enum af_vsockmon_transport { + AF_VSOCK_TRANSPORT_UNKNOWN = 0, + AF_VSOCK_TRANSPORT_NO_INFO = 1, /* No transport information */ + AF_VSOCK_TRANSPORT_VIRTIO = 2, /* Virtio transport header (struct virtio_vsock_hdr) */ +}; + +#endif diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c new file mode 100644 index 000..0bff1e9 --- /dev/null +++ b/drivers/net/vsockmon.c @@ -0,0 +1,167 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +/* Virtio transport max packet size plus header */ +#define DEFAULT_MTU (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + \ +sizeof(struct af_vsockmon_hdr)) + +struct pcpu_lstats { + u64 rx_packets; + u64 rx_bytes; + struct u64_stats_sync syncp; +}; + +static int vsockmon_dev_init(struct net_device *dev) +{ + dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats); + return dev->lstats == NULL ? 
-ENOMEM : 0;
+}
+
+static void vsockmon_dev_uninit(struct net_device *dev)
+{
+	free_percpu(dev->lstats);
+}
+
+struct vsockmon {
+	struct vsock_tap vt;
+};
+
+static int vsockmon_open(struct net_device *dev)
+{
+	struct vsockmon *vsockmon = netdev_priv(dev);
+
+	vsockmon->vt.dev = dev;
+	vsockmon->vt.module = THIS_MODULE;
+	return vsock_add_tap(&vsockmon->vt);
+}
+
+static int vsockmon_close(struct net_device *dev) {
+	struct vsockmon *vsockmon = netdev_priv(dev);
+
+	return vsock_remove_tap(&vsockmon->vt);
+}
+
+static netdev_tx_t vsockmon_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	int len = skb->len;
+	struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
+
+	u64_stats_update_begin(&stats->syncp);
+
[PATCH v4 3/3] VSOCK: Add virtio vsock vsockmon hooks
From: Gerard GarciaThe virtio drivers deal with struct virtio_vsock_pkt. Add virtio_transport_deliver_tap_pkt(pkt) for handing packets to the vsockmon device. We call virtio_transport_deliver_tap_pkt(pkt) from net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of common code. This is because the drivers may drop packets before handing them to common code - we still want to capture them. Signed-off-by: Gerard Garcia Signed-off-by: Stefan Hajnoczi --- v3: * Hook virtio_transport.c (guest driver), not just drivers/vhost/vsock.c (host driver) --- include/linux/virtio_vsock.h| 1 + drivers/vhost/vsock.c | 8 + net/vmw_vsock/virtio_transport.c| 3 ++ net/vmw_vsock/virtio_transport_common.c | 58 + 4 files changed, 70 insertions(+) diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 584f9a6..ab13f07 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -153,5 +153,6 @@ void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt); void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); +void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt); #endif /* _LINUX_VIRTIO_VSOCK_H */ diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 44eed8e..d939ac1 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -176,6 +176,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, restart_tx = true; } + /* Deliver to monitoring devices all correctly transmitted +* packets. 
+		 */
+		virtio_transport_deliver_tap_pkt(pkt);
+
 		virtio_transport_free_pkt(pkt);
 	}
 	if (added)
@@ -383,6 +388,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 
 		len = pkt->len;
 
+		/* Deliver to monitoring devices all received packets */
+		virtio_transport_deliver_tap_pkt(pkt);
+
 		/* Only accept correctly addressed packets */
 		if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
 			virtio_transport_recv_pkt(pkt);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 68675a1..9dffe02 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -144,6 +144,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
 		list_del_init(&pkt->list);
 		spin_unlock_bh(&vsock->send_pkt_list_lock);
 
+		virtio_transport_deliver_tap_pkt(pkt);
+
 		reply = pkt->reply;
 
 		sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
@@ -370,6 +372,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
 			}
 
 			pkt->len = len - sizeof(pkt->hdr);
+			virtio_transport_deliver_tap_pkt(pkt);
 			virtio_transport_recv_pkt(pkt);
 		}
 	} while (!virtqueue_enable_cb(vq));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index af087b4..aae60c1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -85,6 +86,63 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
 	return NULL;
 }
 
+/* Packet capture */
+void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
+{
+	struct sk_buff *skb;
+	struct af_vsockmon_hdr *hdr;
+	unsigned char *t_hdr, *payload;
+
+	skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+			GFP_ATOMIC);
+	if (!skb)
+		return; /* nevermind if we cannot capture the packet */
+
+	hdr = (struct af_vsockmon_hdr *)skb_put(skb, sizeof(*hdr));
+
+	/* pkt->hdr is little-endian so no need to byteswap here */
+	hdr->src_cid = pkt->hdr.src_cid;
+	hdr->src_port = pkt->hdr.src_port;
+	hdr->dst_cid = pkt->hdr.dst_cid;
+	hdr->dst_port = pkt->hdr.dst_port;
+
+	hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO);
+	hdr->len = cpu_to_le16(sizeof(pkt->hdr));
+	hdr->reserved[0] = hdr->reserved[1] = 0;
+
+	switch(cpu_to_le16(pkt->hdr.op)) {
+	case VIRTIO_VSOCK_OP_REQUEST:
+	case VIRTIO_VSOCK_OP_RESPONSE:
+		hdr->op = cpu_to_le16(AF_VSOCK_OP_CONNECT);
+		break;
+	case VIRTIO_VSOCK_OP_RST:
+	case VIRTIO_VSOCK_OP_SHUTDOWN:
+		hdr->op = cpu_to_le16(AF_VSOCK_OP_DISCONNECT);
[PATCH v4 1/3] VSOCK: Add vsockmon tap functions
From: Gerard GarciaAdd tap functions that can be used by the vsock transports to deliver packets to vsockmon virtual network devices. Signed-off-by: Gerard Garcia Signed-off-by: Stefan Hajnoczi --- v4: * Call synchronize_net() before module_put() [Michael] v3: * Include missing header in af_vsock_tap.c --- net/vmw_vsock/Makefile | 2 +- include/net/af_vsock.h | 13 ++ include/uapi/linux/if_arp.h | 1 + net/vmw_vsock/af_vsock_tap.c | 107 +++ 4 files changed, 122 insertions(+), 1 deletion(-) create mode 100644 net/vmw_vsock/af_vsock_tap.c diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile index bc27c70..09fc2eb 100644 --- a/net/vmw_vsock/Makefile +++ b/net/vmw_vsock/Makefile @@ -3,7 +3,7 @@ obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o -vsock-y += af_vsock.o vsock_addr.o +vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \ vmci_transport_notify_qstate.o diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index f32ed9a..c526d4f 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -188,4 +188,17 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, void vsock_remove_sock(struct vsock_sock *vsk); void vsock_for_each_connected_socket(void (*fn)(struct sock *sk)); +/ TAP / + +struct vsock_tap { + struct net_device *dev; + struct module *module; + struct list_head list; +}; + +int vsock_init_tap(void); +int vsock_add_tap(struct vsock_tap *vt); +int vsock_remove_tap(struct vsock_tap *vt); +void vsock_deliver_tap(struct sk_buff *skb); + #endif /* __AF_VSOCK_H__ */ diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h index 4d024d7..cf73510 100644 --- a/include/uapi/linux/if_arp.h +++ b/include/uapi/linux/if_arp.h @@ -95,6 +95,7 @@ #define ARPHRD_IP6GRE 823 /* GRE over IPv6*/ #define 
ARPHRD_NETLINK	824		/* Netlink header		*/
 #define ARPHRD_6LOWPAN	825		/* IPv6 over LoWPAN		*/
+#define ARPHRD_VSOCKMON	826		/* Vsock monitor header		*/
 
 #define ARPHRD_VOID	  0x/* Void type, nothing is known */
 #define ARPHRD_NONE	  0xFFFE/* zero header length */
diff --git a/net/vmw_vsock/af_vsock_tap.c b/net/vmw_vsock/af_vsock_tap.c
new file mode 100644
index 000..db0c4e7
--- /dev/null
+++ b/net/vmw_vsock/af_vsock_tap.c
@@ -0,0 +1,107 @@
+/*
+ * Tap functions for AF_VSOCK sockets.
+ *
+ * Code based on net/netlink/af_netlink.c tap functions.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include
+#include
+#include
+#include
+
+static DEFINE_SPINLOCK(vsock_tap_lock);
+static struct list_head vsock_tap_all __read_mostly =
+				LIST_HEAD_INIT(vsock_tap_all);
+
+int vsock_add_tap(struct vsock_tap *vt) {
+	if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
+		return -EINVAL;
+
+	__module_get(vt->module);
+
+	spin_lock(&vsock_tap_lock);
+	list_add_rcu(&vt->list, &vsock_tap_all);
+	spin_unlock(&vsock_tap_lock);
+
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_add_tap);
+
+int vsock_remove_tap(struct vsock_tap *vt)
+{
+	struct vsock_tap *tmp;
+	bool found = false;
+
+	spin_lock(&vsock_tap_lock);
+
+	list_for_each_entry(tmp, &vsock_tap_all, list) {
+		if (vt == tmp) {
+			list_del_rcu(&vt->list);
+			found = true;
+			goto out;
+		}
+	}
+
+	pr_warn("vsock_remove_tap: %p not found\n", vt);
+out:
+	spin_unlock(&vsock_tap_lock);
+
+	synchronize_net();
+
+	if (found)
+		module_put(vt->module);
+
+	return found ? 0 : -ENODEV;
+}
+EXPORT_SYMBOL_GPL(vsock_remove_tap);
+
+static int __vsock_deliver_tap_skb(struct sk_buff *skb,
+				   struct net_device *dev)
+{
+	int ret = 0;
+	struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+
+	if (nskb) {
+		dev_hold(dev);
+
+		nskb->dev = dev;
+		ret = dev_queue_xmit(nskb);
+		if (unlikely(ret > 0))
+			ret = net_xmit_errno(ret);
+
+		dev_put(dev);
+	}
+
+	return ret;
+}
+
+static void __vsock_deliver_tap(struct sk_buff *skb)
+{
+	int ret;
+	struct vsock_tap *tmp;
+
+	list_for_each_entry_rcu(tmp, &vsock_tap_all, list) {
+		ret =
Re: [RFC PATCH 6/7] net: allow simultaneous SW and HW transmit timestamping
On Thu, Apr 13, 2017 at 11:24 AM, Keller, Jacob E wrote:
>
>
>> -Original Message-
>> From: Miroslav Lichvar [mailto:mlich...@redhat.com]
>> Sent: Thursday, April 13, 2017 8:00 AM
>>
>> Oh, I see. I was struggling to find a good name for this option.
>>
>> > The name for this option is therefore not very descriptive. Perhaps
>> > SOF_TIMESTAMPING_OPT_BOTH_SW_HW.
>>
>> Simultaneous SW/HW timestamping was already possible for incoming
>> packets. Maybe _OPT_TX_SWHW would be better?
>>
>
> This sounds more accurate to me.

Agreed.
[PATCH v4 0/3] VSOCK: vsockmon virtual device to monitor AF_VSOCK sockets.
v4:
 * Add explicit reserved padding field to struct af_vsockmon_hdr and drop
   __attribute__((packed)) [Michael, DaveM]
 * Call synchronize_net() before module_put() [Michael]

v3:
 * Hook virtio_transport.c (guest driver), not just drivers/vhost/vsock.c
   (host driver)
 * Fix DEFAULT_MTU macro definition [Zhu Yanjun]
 * Rename af_vsockmon_hdr->t field ->transport for clarity
 * Update .ndo_get_stats64() return type since it has changed
 * Include missing header in af_vsock_tap.c

This is a continuation of Gerard Garcia's work on the vsockmon packet capture
interface for AF_VSOCK. Packet capture is an essential feature for network
communication. Gerard began addressing this feature gap in his Google Summer
of Code 2016 project. I have cleaned up, rebased, and retested the v2 series
he posted previously.

The design follows the nlmon packet capture interface closely. This is because
vsock has the same problem as netlink: there is no netdev on which packets can
be captured. The nlmon driver is a synthetic netdev purely for the purpose of
enabling packet capture. We follow the same approach here with vsockmon.

See include/uapi/linux/vsockmon.h in this series for details on the packet
layout.

How to try it:

1. Build tcpdump with vsockmon patches:

  $ git clone -b vsock https://github.com/stefanha/libpcap
  $ (cd libpcap && ./configure && make)
  $ git clone -b vsock https://github.com/stefanha/tcpdump
  $ (cd tcpdump && ./configure && make)

2. Build nc-vsock (a netcat-like tool):

  $ git clone https://github.com/stefanha/nc-vsock
  $ (cd nc-vsock && make)

3. Launch a virtual machine:

  # modprobe vhost_vsock
  # qemu-system-x86_64 -M accel=kvm -m 1024 -cpu host \
      -drive if=virtio,file=test.img,format=raw \
      -device vhost-vsock-pci,guest-cid=3

  (Assumes guest is running a kernel with this patch)

4. Capture AF_VSOCK traffic in guest and/or host:

  # modprobe vsockmon
  # ip link add type vsockmon
  # ip link set vsockmon0 up
  # tcpdump -i vsockmon0 -vvv

5. Communicate!
(host)$ nc-vsock -l 1234 (guest)$ nc-vsock 2 1234 Gerard Garcia (3): VSOCK: Add vsockmon tap functions VSOCK: Add vsockmon device VSOCK: Add virtio vsock vsockmon hooks drivers/net/Makefile| 1 + net/vmw_vsock/Makefile | 2 +- include/linux/virtio_vsock.h| 1 + include/net/af_vsock.h | 13 +++ include/uapi/linux/if_arp.h | 1 + include/uapi/linux/vsockmon.h | 58 +++ drivers/net/vsockmon.c | 167 drivers/vhost/vsock.c | 8 ++ net/vmw_vsock/af_vsock_tap.c| 107 net/vmw_vsock/virtio_transport.c| 3 + net/vmw_vsock/virtio_transport_common.c | 58 +++ drivers/net/Kconfig | 8 ++ include/uapi/linux/Kbuild | 1 + 13 files changed, 427 insertions(+), 1 deletion(-) create mode 100644 include/uapi/linux/vsockmon.h create mode 100644 drivers/net/vsockmon.c create mode 100644 net/vmw_vsock/af_vsock_tap.c -- 2.9.3
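For readers who want to generate test traffic without nc-vsock, the guest-side command in step 5 ("nc-vsock 2 1234") amounts to connecting an AF_VSOCK stream socket to CID 2 (the host) on port 1234. The following is only a minimal sketch of that connect step, not code from nc-vsock or from this series; the helper names vsock_addr() and vsock_connect() are made up for illustration, and it assumes the userspace header linux/vm_sockets.h is installed.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Hypothetical helper: build an AF_VSOCK address for a context ID and port. */
struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;

	memset(&addr, 0, sizeof(addr));
	addr.svm_family = AF_VSOCK;
	addr.svm_cid = cid;
	addr.svm_port = port;
	return addr;
}

/*
 * Hypothetical helper: connect a stream socket to cid:port.
 * Returns the connected fd, or -1 on failure (for example when no
 * vsock transport is available on this machine).
 */
int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr = vsock_addr(cid, port);
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

In the guest, vsock_connect(VMADDR_CID_HOST, 1234) would correspond to the nc-vsock invocation above, and the resulting packets should then be visible on vsockmon0.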
Re: [RFC PATCH 3/7] net: add option to get information about timestamped packets
On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar wrote:
> On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
>> On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar wrote:
>> > Extend the skb_shared_hwtstamps structure with the index of the
>> > real interface which received or transmitted the packet and the length
>> > of the packet at layer 2.
>>
>> The original packet is received along with the timestamp.
>
> But only outgoing packets, right?

Timestamps for incoming packets are also passed alongside the
original packet.

>> Why is this L2 length needed?
>
> It's needed for incoming packets to allow converting of preamble
> timestamps to trailer timestamps.

Receiving the mac length of a packet sounds like a feature independent
from timestamping. Either an ioctl similar to SIOCGIFMTU or, if it may
vary due to existence of vlan headers, a new independent cmsg at the
SOL_SOCKET layer.

>> > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to
>> > the SO_TIMESTAMPING option to allow applications to get this information
>> > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message.
>>
>> This patch saves skb->dev->ifindex, which is the same as existing
>> SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that
>> feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/
>
> The main point is that it provides the index of the device which
> received the packet. It does duplicate the functionality of OPT_CMSG +
> IP_PKTINFO for outgoing packets, but I thought it might be useful with
> the TSONLY option.

Agreed. I'd prefer to reuse the existing option for that and just extend
it to work together with TSONLY. We will have to set serr->header.h4.iif
from something other than skb->dev if the skb was allocated fresh in
__skb_tstamp_tx without the device association.
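To make the option names in this exchange concrete, here is a rough userspace sketch of requesting software transmit timestamps with both SOF_TIMESTAMPING_OPT_CMSG and SOF_TIMESTAMPING_OPT_TSONLY set. The helper names tx_tstamp_flags() and enable_tx_tstamp() are invented for illustration. The sketch only shows how the flags are passed to SO_TIMESTAMPING; whether the interface index is actually delivered in the TSONLY case is exactly the extension being discussed above, so it should not be read as documenting that behaviour.

```c
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Hypothetical helper: flag set for software TX timestamps that are
 * looped back without the payload (TSONLY) and with OPT_CMSG requested.
 */
unsigned int tx_tstamp_flags(void)
{
	return SOF_TIMESTAMPING_TX_SOFTWARE |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_CMSG |
	       SOF_TIMESTAMPING_OPT_TSONLY;
}

/* Hypothetical helper: apply the flags above to an existing socket. */
int enable_tx_tstamp(int fd)
{
	unsigned int flags = tx_tstamp_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

The timestamps themselves would then be read from the socket error queue with recvmsg() and MSG_ERRQUEUE, which is where the serr header mentioned in the reply is filled in.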
[patch net-next] MAINTAINERS: rename TC entry and add couple of header files
From: Jiri Pirko

The section is not specific only to "TC classifiers", but applies to the
whole TC subsystem. Also, add a couple of forgotten headers.

Signed-off-by: Jiri Pirko
---
 MAINTAINERS | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5397f54..549e8e1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12180,12 +12180,17 @@ F:	Documentation/accounting/taskstats*
 F:	include/linux/taskstats*
 F:	kernel/taskstats.c
 
-TC CLASSIFIER
+TC subsystem
 M:	Jamal Hadi Salim
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	include/net/pkt_cls.h
+F:	include/net/pkt_sched.h
+F:	include/net/tc_act/
 F:	include/uapi/linux/pkt_cls.h
+F:	include/uapi/linux/pkt_sched.h
+F:	include/uapi/linux/tc_act/
+F:	include/uapi/linux/tc_ematch/
 F:	net/sched/
 
 TCP LOW PRIORITY MODULE
-- 
2.7.4
Re: [Patch net-next v2] net_sched: move the empty tp check from ->destroy() to ->delete()
On Thu, Apr 13, 2017 at 12:28 AM, kbuild test robot <l...@intel.com> wrote:
> Hi Cong,
>
> [auto build test WARNING on net-next/master]
>
> url:
> https://github.com/0day-ci/linux/commits/Cong-Wang/net_sched-move-the-empty-tp-check-from-destroy-to-delete/20170413-145318
> config: x86_64-randconfig-x004-201715 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64
>
> All warnings (new ones prefixed by >>):
>
>    net/sched/cls_matchall.c: In function 'mall_destroy':
>>> net/sched/cls_matchall.c:99:10: warning: 'return' with a value, in function returning void
>       return true;
>              ^~~~
>    net/sched/cls_matchall.c:93:13: note: declared here
>     static void mall_destroy(struct tcf_proto *tp)
>                 ^~~~
>    net/sched/cls_matchall.c:105:9: warning: 'return' with a value, in function returning void
>      return true;
>             ^~~~
>    net/sched/cls_matchall.c:93:13: note: declared here
>     static void mall_destroy(struct tcf_proto *tp)
>                 ^~~~

Ah, I must have missed it while compiling...

Will send v3 after waiting for other comments.
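The warning above is what gcc reports when a function's return type is changed to void but a "return true;" is left behind. As a self-contained illustration of the shape of the change named in the subject line (the emptiness check moving out of ->destroy() and into ->delete()), here is a toy sketch. struct toy_proto, toy_destroy() and toy_delete() are hypothetical stand-ins and are not the kernel's struct tcf_proto or the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a classifier instance holding some filters. */
struct toy_proto {
	size_t nfilters;
};

/*
 * After the change, destroy() is unconditional and returns void, so a
 * leftover "return true;" in its body would trigger the warning quoted
 * above. It has to be dropped or turned into a plain "return;".
 */
void toy_destroy(struct toy_proto *tp)
{
	tp->nfilters = 0;
}

/*
 * delete() removes one filter and tells the caller, via *last, whether
 * the instance is now empty. Returns 0 on success, -1 if nothing was
 * left to delete.
 */
int toy_delete(struct toy_proto *tp, bool *last)
{
	if (tp->nfilters == 0)
		return -1;
	tp->nfilters--;
	*last = (tp->nfilters == 0);
	return 0;
}
```

With this split, the caller decides whether to call toy_destroy() based on the flag reported by toy_delete(), instead of asking destroy() to refuse.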
[PATCH v4 net-next RFC] net: Generic XDP
This provides a generic SKB based non-optimized XDP path which is used
if either the driver lacks a specific XDP implementation, or the user
requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.

It is arguable that perhaps I should have required something like
this as part of the initial XDP feature merge.

I believe this is critical for two reasons:

1) Accessibility. More people can play with XDP with fewer
   dependencies. Yes I know we have XDP support in virtio_net, but
   that just creates another dependency for learning how to use this
   facility.

   I wrote this to make life easier for the XDP newbies.

2) As a model for what the expected semantics are. If there is a pure
   generic core implementation, it serves as a semantic example for
   driver folks adding XDP support.

This is just a rough draft and is untested.

One thing I have not tried to address here is the issue of
XDP_PACKET_HEADROOM, thanks to Daniel for spotting that. It seems
incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
whatever even if the XDP program doesn't try to push headers at all.
I think we really need the verifier to somehow propagate whether
certain XDP helpers are used or not.

Signed-off-by: David S. Miller
---

v4:
 - Fix MAC header adjustment before calling prog (David Ahern)
 - Disable LRO when generic XDP is installed (Michael Chan)
 - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
 - Do not perform generic XDP on reinjected packets (DaveM)

v3:
 - Make sure XDP program sees packet at MAC header, push back MAC
   header if we do XDP_TX. (Alexei)
 - Elide GRO when generic XDP is in use. (Alexei)
 - Add XDP_FLAGS_SKB_MODE flag which the user can use to request generic
   XDP even if the driver has an XDP implementation. (Alexei)
 - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
   attribute. (Daniel)

v2:
 - Add some "fall through" comments in switch statements based
   upon feedback from Andrew Lunn
 - Use RCU for generic xdp_prog, thanks to Johannes Berg.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0aa089..071a58b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1891,9 +1891,17 @@ struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
+	struct bpf_prog __rcu	*xdp_prog;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline bool netif_elide_gro(const struct net_device *dev)
+{
+	if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
+		return true;
+	return false;
+}
+
 #define	NETDEV_ALIGN		32
 
 static inline
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8b405af..633aa02 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -887,7 +887,9 @@ enum {
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
-#define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST)
+#define XDP_FLAGS_SKB_MODE		(2U << 0)
+#define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
+					 XDP_FLAGS_SKB_MODE)
 
 enum {
 	IFLA_XDP_UNSPEC,
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..9ed4569 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -95,6 +95,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -4247,6 +4248,88 @@ static int __netif_receive_skb(struct sk_buff *skb)
 	return ret;
 }
 
+static struct static_key generic_xdp_needed __read_mostly;
+
+static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
+{
+	struct bpf_prog *new = xdp->prog;
+	int ret = 0;
+
+	switch (xdp->command) {
+	case XDP_SETUP_PROG: {
+		struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
+
+		rcu_assign_pointer(dev->xdp_prog, new);
+		if (old)
+			bpf_prog_put(old);
+
+		if (old && !new)
+			static_key_slow_dec(&generic_xdp_needed);
+		else if (new && !old)
+			static_key_slow_inc(&generic_xdp_needed);
+		break;
+	}
+
+	case XDP_QUERY_PROG:
+		xdp->prog_attached = !!rcu_access_pointer(dev->xdp_prog);
+		break;
+
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+static u32 netif_receive_generic_xdp(struct sk_buff *skb,
+				     struct bpf_prog *xdp_prog)
+{
+	struct xdp_buff xdp;
+	u32 act = XDP_DROP;
+	void *orig_data;
+	int hlen, off;
+
+	if (skb_linearize(skb))
+		goto do_drop;
+
+	/* The XDP program wants to see the packet starting at the MAC
+	 * header.
+	 */
+
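Two small pieces of the patch can be checked in isolation: the new flag value and mask in if_link.h, and the condition in netif_elide_gro(). The sketch below copies the three XDP_FLAGS_* definitions from the diff and models the rest in plain userspace C. xdp_flags_valid(), struct toy_dev, TOY_F_GRO and toy_elide_gro() are hypothetical names introduced for illustration and are not part of the patch or the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the if_link.h hunk above. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
#define XDP_FLAGS_SKB_MODE		(2U << 0)	/* same value as (1U << 1) */
#define XDP_FLAGS_MASK			(XDP_FLAGS_UPDATE_IF_NOEXIST | \
					 XDP_FLAGS_SKB_MODE)

/* Hypothetical helper: accept only flag bits covered by the mask. */
bool xdp_flags_valid(uint32_t flags)
{
	return (flags & ~XDP_FLAGS_MASK) == 0;
}

/* Hypothetical stand-ins for NETIF_F_GRO and struct net_device. */
#define TOY_F_GRO	(1U << 0)

struct toy_dev {
	uint32_t features;
	const void *xdp_prog;	/* non-NULL when a generic XDP program is set */
};

/*
 * Same condition as netif_elide_gro() in the patch: skip GRO when the
 * device has GRO disabled or when a generic XDP program is attached.
 */
bool toy_elide_gro(const struct toy_dev *dev)
{
	return !(dev->features & TOY_F_GRO) || dev->xdp_prog != NULL;
}
```

Writing the new flag as (2U << 0) yields the value 2, so it occupies bit 1 and does not collide with XDP_FLAGS_UPDATE_IF_NOEXIST.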
Re: [PATCH v3 net-next RFC] Generic XDP
From: Daniel Borkmann
Date: Thu, 13 Apr 2017 17:57:06 +0200

> On 04/12/2017 08:54 PM, David Miller wrote:
> [...]
>> +static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>> +				     struct bpf_prog *xdp_prog)
>> +{
>> +	struct xdp_buff xdp;
>> +	u32 act = XDP_DROP;
>> +	void *orig_data;
>> +	int hlen, off;
>> +
>> +	if (skb_linearize(skb))
>
> Btw, given the skb can come from all kinds of points in the stack,
> it could also be a clone at this point. One example is act_mirred
> which in fact does skb_clone() and can push the skb back to
> ingress path through netif_receive_skb() and thus could then go
> into generic xdp processing, where skb can be mangled.
>
> Instead of skb_linearize() we would therefore need to use something
> like skb_ensure_writable(skb, skb->len) as equivalent, which also
> makes sure that we unclone whenever needed.

We could use skb_cow() for this purpose, which deals with cloning as
well as enforcing headroom.

However, thinking further about this, the goal is to make generic XDP
match precisely how in-driver-XDP behaves. Therefore, such redirects
from act_mirred would never flow through the XDP path. No other
possibility can cause us to see a cloned packet here; we are before
network taps are processed, etc.

So in my opinion the thing to do is to elide generic XDP if the SKB
is cloned.
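The conclusion of this reply, eliding generic XDP for cloned SKBs rather than uncloning them, can be pictured as a simple gate in front of the program. The sketch below is a userspace toy under stated assumptions and not the code that was eventually posted: struct toy_skb, enum toy_verdict and toy_generic_xdp_gate() are hypothetical, and the cloned field merely stands in for whatever test the kernel would apply (skb_cloned() being the obvious candidate).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one sk_buff property that matters here. */
struct toy_skb {
	bool cloned;	/* true if the data is shared with another skb */
};

enum toy_verdict {
	TOY_SKIP_XDP,	/* hand the packet to the normal receive path */
	TOY_RUN_XDP,	/* run the generic XDP program on it */
};

/*
 * Decide whether generic XDP should see this packet. Cloned packets
 * (for example ones reinjected by act_mirred) are passed through
 * untouched, matching what an in-driver XDP implementation would see.
 */
enum toy_verdict toy_generic_xdp_gate(const struct toy_skb *skb,
				      bool prog_attached)
{
	if (!prog_attached)
		return TOY_SKIP_XDP;
	if (skb->cloned)
		return TOY_SKIP_XDP;
	return TOY_RUN_XDP;
}
```

The design point is that the program never has to modify shared data, so neither skb_cow() nor skb_ensure_writable() is needed just to handle clones.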