date:20170413

[PATCH] rtl_bt: Update firmware for BT part of rtl8822be

2017-04-13 Thread Larry Finger

These files were supplied by Realtek.

Signed-off-by: Larry Finger 
---
 WHENCE |   3 ++-
 rtl_bt/rtl8822b_config.bin | Bin 32 -> 14 bytes
 rtl_bt/rtl8822b_fw.bin | Bin 51756 -> 51176 bytes
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/WHENCE b/WHENCE
index dcd5011..1ac6fd0 100644
--- a/WHENCE
+++ b/WHENCE
@@ -2859,7 +2859,8 @@ Licence: Redistributable. See LICENCE.rtlwifi_firmware.txt for details.
 
 Found in vendor driver, linux_bt_usb_2.11.20140423_8723be.rar
 From https://github.com/troy-tan/driver_store
-Files rtl_bt/rtl8822b_* came directly from Realtek.
+Files rtl_bt/rtl8822b_* came directly from Realtek. These files are
+updated on April 14, 2017.
 
 --
 
diff --git a/rtl_bt/rtl8822b_config.bin b/rtl_bt/rtl8822b_config.bin
index a691e7ca258b0e7dc4ff2bdbdc1d13f2a613526b..b00270edca74c0bcd0234ceb8fe313a61ee28416 100644
GIT binary patch
literal 14

Re: [PATCH 02/22] nvmet: Make use of the new sg_map helper function

2017-04-13 Thread Christoph Hellwig

On Thu, Apr 13, 2017 at 11:06:16PM -0600, Logan Gunthorpe wrote:
> Or maybe I'll just send a patch for that
> separately seeing it doesn't depend on anything and is pretty simple. I
> can do that next week.

Yes, please just send that patch to linux-nvme; we should be able to get
it into 4.12.

Re: [PATCH 02/22] nvmet: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

On 13/04/17 10:59 PM, Christoph Hellwig wrote:
> On Thu, Apr 13, 2017 at 04:05:15PM -0600, Logan Gunthorpe wrote:
>> This is a straight forward conversion in two places. Should kmap fail,
>> the code will return an INVALD_DATA error in the completion.
> 
> It really should be using nvmet_copy_from_sgl to make things safer,
> as we don't want to rely on any particular SG list layout.  In fact
> I'm pretty sure I did the conversion at some point, but it must never
> have made it upstream.

Ha, I did the conversion a couple of times too, for my RFC series. I can
change this patch to do that. Or maybe I'll just send a patch for that
separately, seeing as it doesn't depend on anything and is pretty simple. I
can do that next week.

Thanks,

Logan

[PATCH net-next 1/1 v3] drivers: net: rmnet: Initial implementation

2017-04-13 Thread Subash Abhinov Kasiviswanathan

The RmNet driver provides transport-agnostic support for MAP (Multiplexing
and Aggregation Protocol) in an embedded module. The module provides
virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 Documentation/networking/rmnet.txt|  83 +
 drivers/net/Kconfig   |   2 +
 drivers/net/Makefile  |   1 +
 drivers/net/rmnet/Kconfig |  23 ++
 drivers/net/rmnet/Makefile|  14 +
 drivers/net/rmnet/rmnet_config.c  | 592 ++
 drivers/net/rmnet/rmnet_config.h  |  79 +
 drivers/net/rmnet/rmnet_handlers.c| 517 +
 drivers/net/rmnet/rmnet_handlers.h|  24 ++
 drivers/net/rmnet/rmnet_main.c|  52 +++
 drivers/net/rmnet/rmnet_map.h | 100 ++
 drivers/net/rmnet/rmnet_map_command.c | 180 +++
 drivers/net/rmnet/rmnet_map_data.c| 145 +
 drivers/net/rmnet/rmnet_private.h |  76 +
 drivers/net/rmnet/rmnet_stats.c   |  86 +
 drivers/net/rmnet/rmnet_stats.h   |  61 
 drivers/net/rmnet/rmnet_vnd.c | 353 
 drivers/net/rmnet/rmnet_vnd.h |  34 ++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/if_arp.h   |   1 +
 include/uapi/linux/if_ether.h |   4 +-
 include/uapi/linux/rmnet.h|  34 ++
 22 files changed, 2461 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/rmnet/Kconfig
 create mode 100644 drivers/net/rmnet/Makefile
 create mode 100644 drivers/net/rmnet/rmnet_config.c
 create mode 100644 drivers/net/rmnet/rmnet_config.h
 create mode 100644 drivers/net/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/rmnet/rmnet_main.c
 create mode 100644 drivers/net/rmnet/rmnet_map.h
 create mode 100644 drivers/net/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/rmnet/rmnet_private.h
 create mode 100644 drivers/net/rmnet/rmnet_stats.c
 create mode 100644 drivers/net/rmnet/rmnet_stats.h
 create mode 100644 drivers/net/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/rmnet/rmnet_vnd.h
 create mode 100644 include/uapi/linux/rmnet.h

diff --git a/Documentation/networking/rmnet.txt b/Documentation/networking/rmnet.txt
new file mode 100644
index 000..58d3ea2
--- /dev/null
+++ b/Documentation/networking/rmnet.txt
@@ -0,0 +1,83 @@
+1. Introduction
+
+The rmnet driver is used to support the Multiplexing and Aggregation
+Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm
+Technologies, Inc. modems.
+
+This driver can be used to register onto any physical network device in
+IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.
+
+Multiplexing allows for the creation of logical netdevices (rmnet devices) to
+handle multiple private data networks (PDNs) such as the default internet PDN,
+tethering, multimedia messaging service (MMS) or IP multimedia subsystem (IMS).
+Hardware sends packets with MAP headers to rmnet. Based on the multiplexer ID,
+rmnet routes each packet to the appropriate PDN after removing the MAP header.
+
+Aggregation is required to achieve high data rates. This involves hardware
+sending an aggregated bunch of MAP frames. The rmnet driver de-aggregates
+these MAP frames and sends them to the appropriate PDNs.
+
+2. Packet format
+
+a. MAP packet (data / control)
+
+The MAP header has the same endianness as the IP packet.
+
+Packet format -
+
+Bit        0               1         2-7   8-15            16-31
+Function   Command / Data  Reserved  Pad   Multiplexer ID  Payload length
+
+Bit        32 - x
+Function   Raw bytes
+
+The Command (1) / Data (0) bit indicates whether the packet is a MAP command
+or a data packet. Command packets are used for transport-level flow control.
+Data packets are standard IP packets.
+
+Reserved bits are usually zeroed out and are to be ignored by the receiver.
+
+Padding is the number of bytes added for 4-byte alignment, if required by
+the hardware.
+
+The Multiplexer ID indicates the PDN on which data is to be sent.
+
+The payload length includes the padding length but does not include the MAP
+header length.
+
+b. MAP packet (command specific)
+
+Bit        0        1         2-7   8-15            16-31
+Function   Command  Reserved  Pad   Multiplexer ID  Payload length
+
+Bit        32-39         40-45     46-47         48-63
+Function   Command name  Reserved  Command type  Reserved
+
+Bit        64-95
+Function   Transaction ID
+
+Bit        96-127
+Function   Command data
+
+Command 1 indicates disabling flow, while command 2 indicates enabling flow.
+
+Command types -
+0 for MAP command request
+1 is to
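
The header layouts in the rmnet.txt patch above can be decoded roughly as follows. This is a hedged userspace sketch, not the driver's code: it assumes that "bit 0" is the most significant bit of the first header byte (network bit order) and that the 16-bit payload length and 32-bit transaction ID are big-endian, in line with the note that the MAP header shares the IP packet's endianness. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the 32-bit MAP header (bits 0-31). */
struct map_hdr {
	uint8_t  cd_bit;    /* bit 0: 1 = command, 0 = data */
	uint8_t  reserved;  /* bit 1: ignored by the receiver */
	uint8_t  pad_len;   /* bits 2-7: padding bytes for 4-byte alignment */
	uint8_t  mux_id;    /* bits 8-15: selects the PDN */
	uint16_t pkt_len;   /* bits 16-31: payload length, padding included */
};

/* Hypothetical decoded view of the command-specific fields. */
struct map_cmd {
	uint8_t  cmd_name;  /* bits 32-39: 1 = disable flow, 2 = enable flow */
	uint8_t  cmd_type;  /* bits 46-47: 0 = MAP command request */
	uint32_t trans_id;  /* bits 64-95 */
};

/* Assumes bit 0 is the MSB of byte 0 and multi-byte fields are big-endian. */
static int map_parse_hdr(const uint8_t *buf, size_t len, struct map_hdr *h)
{
	assert(buf != NULL && h != NULL);
	if (len < 4)
		return -1;
	h->cd_bit   = (buf[0] >> 7) & 0x1;
	h->reserved = (buf[0] >> 6) & 0x1;
	h->pad_len  = buf[0] & 0x3f;
	h->mux_id   = buf[1];
	h->pkt_len  = (uint16_t)((buf[2] << 8) | buf[3]);
	return 0;
}

/* Needs the 4-byte MAP header plus 8 bytes up to the transaction ID. */
static int map_parse_cmd(const uint8_t *buf, size_t len, struct map_cmd *c)
{
	assert(buf != NULL && c != NULL);
	if (len < 12)
		return -1;
	c->cmd_name = buf[4];          /* bits 32-39 */
	c->cmd_type = buf[5] & 0x3;    /* bits 46-47: low two bits of byte 5 */
	c->trans_id = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
		      ((uint32_t)buf[10] << 8) | (uint32_t)buf[11];
	return 0;
}
```

Under the same bit-order assumption, a command header with 2 bytes of padding on mux ID 5 starts with the bytes 0x82 0x05.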

[PATCH net-next 0/1 v3] drivers: net: Add support for rmnet driver

2017-04-13 Thread Subash Abhinov Kasiviswanathan

This patch adds support for the rmnet_data driver which is required to
support recent chipsets using Qualcomm Technologies, Inc. modems. The data
from hardware follows the multiplexing and aggregation protocol (MAP).

This driver can be used to register onto any physical network device in
IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator.

The rmnet_data driver decodes these packets and queues them to the network
stack (and encodes packets and transmits them to the physical device).
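
Decoding an aggregated buffer essentially means walking back-to-back MAP frames. The following is a minimal userspace sketch, not the driver's implementation; it assumes the header layout from rmnet.txt with bit 0 as the MSB of the first byte and a big-endian payload length, and map_deaggregate() and map_deliver_fn are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_HDR_LEN 4

/* Hypothetical per-frame callback: the PDN's mux ID and the IP packet. */
typedef void (*map_deliver_fn)(uint8_t mux_id, const uint8_t *ip,
			       size_t ip_len, void *ctx);

/*
 * Walk an aggregated buffer of back-to-back MAP frames, each a 4-byte MAP
 * header followed by pkt_len bytes (IP packet plus pad_len bytes of padding).
 * Data frames are handed to deliver() with the padding stripped; command
 * frames are skipped. Returns the number of data frames, or -1 if a header
 * is inconsistent with the bytes that remain.
 */
static int map_deaggregate(const uint8_t *buf, size_t len,
			   map_deliver_fn deliver, void *ctx)
{
	size_t off = 0;
	int frames = 0;

	assert(buf != NULL || len == 0);
	while (len - off >= MAP_HDR_LEN) {
		const uint8_t *hdr = buf + off;
		uint8_t cd = hdr[0] >> 7;          /* assumed: bit 0 = MSB */
		uint8_t pad = hdr[0] & 0x3f;
		uint8_t mux_id = hdr[1];
		size_t pkt_len = ((size_t)hdr[2] << 8) | hdr[3];

		if (pkt_len > len - off - MAP_HDR_LEN || pad > pkt_len)
			return -1;
		if (!cd) {
			if (deliver)
				deliver(mux_id, hdr + MAP_HDR_LEN, pkt_len - pad, ctx);
			frames++;
		}
		off += MAP_HDR_LEN + pkt_len;
	}
	return frames;
}
```

Passing NULL for deliver simply counts data frames; a real consumer would use the mux ID to pick the rmnet device for each packet.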

--
v1: Same as the RFC patch with some minor fixes for issues reported by
kbuild test robot.

v1->v2: Change datatypes and remove config IOCTL as mentioned by David.
Also fix checkpatch issues and remove some unused code.

v2->v3: Move location to drivers/net and rename to rmnet. Change the
userspace - netlink communication from custom netlink to rtnl_link_ops.
Refactor some code. Use a fixed config for ingress and egress.

Subash Abhinov Kasiviswanathan (1):
  drivers: net: rmnet: Initial implementation

 Documentation/networking/rmnet.txt|  83 +
 drivers/net/Kconfig   |   2 +
 drivers/net/Makefile  |   1 +
 drivers/net/rmnet/Kconfig |  23 ++
 drivers/net/rmnet/Makefile|  14 +
 drivers/net/rmnet/rmnet_config.c  | 592 ++
 drivers/net/rmnet/rmnet_config.h  |  79 +
 drivers/net/rmnet/rmnet_handlers.c| 517 +
 drivers/net/rmnet/rmnet_handlers.h|  24 ++
 drivers/net/rmnet/rmnet_main.c|  52 +++
 drivers/net/rmnet/rmnet_map.h | 100 ++
 drivers/net/rmnet/rmnet_map_command.c | 180 +++
 drivers/net/rmnet/rmnet_map_data.c| 145 +
 drivers/net/rmnet/rmnet_private.h |  76 +
 drivers/net/rmnet/rmnet_stats.c   |  86 +
 drivers/net/rmnet/rmnet_stats.h   |  61 
 drivers/net/rmnet/rmnet_vnd.c | 353 
 drivers/net/rmnet/rmnet_vnd.h |  34 ++
 include/uapi/linux/Kbuild |   1 +
 include/uapi/linux/if_arp.h   |   1 +
 include/uapi/linux/if_ether.h |   4 +-
 include/uapi/linux/rmnet.h|  34 ++
 22 files changed, 2461 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/rmnet.txt
 create mode 100644 drivers/net/rmnet/Kconfig
 create mode 100644 drivers/net/rmnet/Makefile
 create mode 100644 drivers/net/rmnet/rmnet_config.c
 create mode 100644 drivers/net/rmnet/rmnet_config.h
 create mode 100644 drivers/net/rmnet/rmnet_handlers.c
 create mode 100644 drivers/net/rmnet/rmnet_handlers.h
 create mode 100644 drivers/net/rmnet/rmnet_main.c
 create mode 100644 drivers/net/rmnet/rmnet_map.h
 create mode 100644 drivers/net/rmnet/rmnet_map_command.c
 create mode 100644 drivers/net/rmnet/rmnet_map_data.c
 create mode 100644 drivers/net/rmnet/rmnet_private.h
 create mode 100644 drivers/net/rmnet/rmnet_stats.c
 create mode 100644 drivers/net/rmnet/rmnet_stats.h
 create mode 100644 drivers/net/rmnet/rmnet_vnd.c
 create mode 100644 drivers/net/rmnet/rmnet_vnd.h
 create mode 100644 include/uapi/linux/rmnet.h

-- 
1.9.1

Re: [PATCH 02/22] nvmet: Make use of the new sg_map helper function

2017-04-13 Thread Christoph Hellwig

On Thu, Apr 13, 2017 at 04:05:15PM -0600, Logan Gunthorpe wrote:
> This is a straight forward conversion in two places. Should kmap fail,
> the code will return an INVALD_DATA error in the completion.

It really should be using nvmet_copy_from_sgl to make things safer,
as we don't want to rely on any particular SG list layout.  In fact
I'm pretty sure I did the conversion at some point, but it must never
have made it upstream.
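
To illustrate the SG layout concern: a helper in the spirit of nvmet_copy_from_sgl() walks every entry instead of assuming the data sits in one contiguous first segment. The userspace sketch below uses a hypothetical struct seg in place of struct scatterlist and is not the nvmet implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scatterlist entry: a mapped buffer and length. */
struct seg {
	const unsigned char *addr;
	size_t len;
};

/*
 * Copy len bytes starting at byte offset off of the logical buffer described
 * by segs[0..nsegs) into dst, crossing segment boundaries as needed. Returns
 * 0 on success or -1 if the list is too short, so the caller can complete
 * the request with an error instead of reading past a segment.
 */
static int copy_from_segs(const struct seg *segs, size_t nsegs, size_t off,
			  void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t i;

	assert(segs != NULL || nsegs == 0);
	for (i = 0; i < nsegs && len > 0; i++) {
		size_t n;

		if (off >= segs[i].len) {   /* skip whole segments before off */
			off -= segs[i].len;
			continue;
		}
		n = segs[i].len - off;
		if (n > len)
			n = len;
		memcpy(out, segs[i].addr + off, n);
		out += n;
		len -= n;
		off = 0;
	}
	return len == 0 ? 0 : -1;
}
```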

[PATCH v2 net 2/2] net: ethernet: mediatek: fix inconsistency of port number carried in TXD

2017-04-13 Thread sean.wang

From: Sean Wang 

Fix a port inconsistency on the TXD caused by a hardware bug: a different
port number can be carried on the same TXD between tx_map() and
tx_unmap() during the iperf test. This confuses the BQL logic, which
leads to a kernel panic when both GMACs run concurrently.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 14 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 12 +---
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 48ba617..6313c53 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -648,6 +648,8 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 
WRITE_ONCE(itxd->txd1, mapped_addr);
itx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
+   itx_buf->flags |= (!mac->id) ? MTK_TX_FLAGS_FPORT0 :
+ MTK_TX_FLAGS_FPORT1;
dma_unmap_addr_set(itx_buf, dma_addr0, mapped_addr);
dma_unmap_len_set(itx_buf, dma_len0, skb_headlen(skb));
 
@@ -689,6 +691,9 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
memset(tx_buf, 0, sizeof(*tx_buf));
tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
tx_buf->flags |= MTK_TX_FLAGS_PAGE0;
+   tx_buf->flags |= (!mac->id) ? MTK_TX_FLAGS_FPORT0 :
+MTK_TX_FLAGS_FPORT1;
+
dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
dma_unmap_len_set(tx_buf, dma_len0, frag_map_size);
frag_size -= frag_map_size;
@@ -1011,17 +1016,16 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget)
 
while ((cpu != dma) && budget) {
u32 next_cpu = desc->txd2;
-   int mac;
+   int mac = 0;
 
desc = mtk_qdma_phys_to_virt(ring, desc->txd2);
if ((desc->txd3 & TX_DMA_OWNER_CPU) == 0)
break;
 
-   mac = (desc->txd4 >> TX_DMA_FPORT_SHIFT) &
-  TX_DMA_FPORT_MASK;
-   mac--;
-
tx_buf = mtk_desc_to_tx_buf(ring, desc);
+   if (tx_buf->flags & MTK_TX_FLAGS_FPORT1)
+   mac = 1;
+
skb = tx_buf->skb;
if (!skb) {
condition = 1;
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 996024d..3c46a3b 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -410,12 +410,18 @@ struct mtk_hw_stats {
struct u64_stats_sync   syncp;
 };
 
-/* PDMA descriptor can point at 1-2 segments. This enum allows us to track how
- * memory was allocated so that it can be freed properly
- */
 enum mtk_tx_flags {
+   /* PDMA descriptor can point at 1-2 segments. This enum allows us to
+* track how memory was allocated so that it can be freed properly.
+*/
MTK_TX_FLAGS_SINGLE0= 0x01,
MTK_TX_FLAGS_PAGE0  = 0x02,
+
+   /* MTK_TX_FLAGS_FPORTx allows tracking which port the SKB is sent
+* out of, instead of looking it up in the hardware TX descriptor.
+*/
+   MTK_TX_FLAGS_FPORT0 = 0x04,
+   MTK_TX_FLAGS_FPORT1 = 0x08,
 };
 
 /* This enum allows us to identify how the clock is defined on the array of the
-- 
1.9.1
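
Patch 2/2 records the egress port in software when a buffer is mapped and reads it back from that same software state in mtk_poll_tx(), instead of decoding the FPORT field from a TXD the hardware may have changed. The sketch below models only that bookkeeping: the flag values match the enum added by the patch, while struct tx_buf_sketch and the two helpers are hypothetical, not driver code.

```c
#include <assert.h>

/* Flag values as defined in enum mtk_tx_flags by the patch. */
enum {
	MTK_TX_FLAGS_SINGLE0 = 0x01,
	MTK_TX_FLAGS_PAGE0   = 0x02,
	MTK_TX_FLAGS_FPORT0  = 0x04,
	MTK_TX_FLAGS_FPORT1  = 0x08,
};

/* Hypothetical, reduced version of struct mtk_tx_buf. */
struct tx_buf_sketch {
	unsigned int flags;
};

/* At map time: remember which GMAC (0 or 1) the buffer is sent from. */
static void tx_buf_set_port(struct tx_buf_sketch *buf, int mac_id)
{
	assert(mac_id == 0 || mac_id == 1);
	buf->flags |= (!mac_id) ? MTK_TX_FLAGS_FPORT0 : MTK_TX_FLAGS_FPORT1;
}

/* At completion time: recover the port from software state only. */
static int tx_buf_get_port(const struct tx_buf_sketch *buf)
{
	return (buf->flags & MTK_TX_FLAGS_FPORT1) ? 1 : 0;
}
```

Because the port is stored next to the other per-buffer flags, it cannot drift between map and unmap, whatever the hardware writes into the descriptor.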

[PATCH v2 net 1/2] net: ethernet: mediatek: fix inconsistency between TXD and the used buffer

2017-04-13 Thread sean.wang

From: Sean Wang 

Fix an inconsistency between the TXD descriptor and the buffer it uses,
which would cause unexpected behavior in mtk_tx_unmap() during skb housekeeping.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 14e1bd1..48ba617 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -613,7 +613,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
struct mtk_tx_dma *itxd, *txd;
-   struct mtk_tx_buf *tx_buf;
+   struct mtk_tx_buf *itx_buf, *tx_buf;
dma_addr_t mapped_addr;
unsigned int nr_frags;
int i, n_desc = 1;
@@ -627,8 +627,8 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
fport = (mac->id + 1) << TX_DMA_FPORT_SHIFT;
txd4 |= fport;
 
-   tx_buf = mtk_desc_to_tx_buf(ring, itxd);
-   memset(tx_buf, 0, sizeof(*tx_buf));
+   itx_buf = mtk_desc_to_tx_buf(ring, itxd);
+   memset(itx_buf, 0, sizeof(*itx_buf));
 
if (gso)
txd4 |= TX_DMA_TSO;
@@ -647,9 +647,9 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
return -ENOMEM;
 
WRITE_ONCE(itxd->txd1, mapped_addr);
-   tx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
-   dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
-   dma_unmap_len_set(tx_buf, dma_len0, skb_headlen(skb));
+   itx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
+   dma_unmap_addr_set(itx_buf, dma_addr0, mapped_addr);
+   dma_unmap_len_set(itx_buf, dma_len0, skb_headlen(skb));
 
/* TX SG offload */
txd = itxd;
@@ -685,10 +685,9 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
   last_frag * TX_DMA_LS0));
WRITE_ONCE(txd->txd4, fport);
 
-   tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
tx_buf = mtk_desc_to_tx_buf(ring, txd);
memset(tx_buf, 0, sizeof(*tx_buf));
-
+   tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
tx_buf->flags |= MTK_TX_FLAGS_PAGE0;
dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
dma_unmap_len_set(tx_buf, dma_len0, frag_map_size);
@@ -698,7 +697,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
}
 
/* store skb to cleanup */
-   tx_buf->skb = skb;
+   itx_buf->skb = skb;
 
WRITE_ONCE(itxd->txd4, txd4);
WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
-- 
1.9.1
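
In the pre-patch loop of mtk_tx_map(), the dummy marker was written to the previous tx_buf before moving on to the next one, so the first descriptor's buffer ended up tagged as a dummy and the real skb was stored on the last descriptor's buffer. The userspace model below follows the corrected ordering; struct tx_buf_model and DUMMY_SKB are hypothetical stand-ins for struct mtk_tx_buf and MTK_DMA_DUMMY_DESC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker standing in for MTK_DMA_DUMMY_DESC. */
#define DUMMY_SKB ((void *)-1)

struct tx_buf_model {
	void *skb;
	unsigned int flags;
};

/*
 * Model of the corrected ordering: the first descriptor's buffer (itx_buf)
 * keeps the real skb, and every following buffer is cleared first and only
 * then tagged as a dummy entry, so cleanup finds the skb where it expects.
 */
static void model_tx_map(struct tx_buf_model *bufs, size_t n_extra, void *skb)
{
	struct tx_buf_model *itx_buf = &bufs[0];
	size_t i;

	assert(skb != NULL && skb != DUMMY_SKB);
	memset(itx_buf, 0, sizeof(*itx_buf));
	for (i = 1; i <= n_extra; i++) {
		struct tx_buf_model *tx_buf = &bufs[i];

		memset(tx_buf, 0, sizeof(*tx_buf));
		tx_buf->skb = DUMMY_SKB;
	}
	itx_buf->skb = skb;   /* store skb to clean up on completion */
}
```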

[PATCH v2 net 0/2] Fix crash caused by reporting inconsistent skb->len to BQL

2017-04-13 Thread sean.wang

From: Sean Wang 

Changes since v1:
- fix the inconsistent enumeration, which could easily lead to a bug

The series fixes a kernel BUG caused by an inconsistent SKB length being
reported to BQL. The inconsistent length comes from a hardware bug that
results in a different port number being carried on the TXD within the
lifecycle of an SKB. Patch 2) therefore tracks which port an SKB is sent
on in software instead of relying on the hardware descriptor. Patch 1)
fixes another issue I found, which causes an inconsistency between the TXD
and the SKB that the original logic did not expect, so it is also
corrected in this series.
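
The BUG in the log below comes from the byte queue limits sanity check: as far as I know, dql_completed() in lib/dynamic_queue_limits.c BUG()s when asked to complete more bytes than are outstanding on a queue. Since completions are accounted per GMAC, attributing a finished skb to the wrong port can trip that check. The model below uses hypothetical names and returns false instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced byte-queue-limit counters for one TX queue. */
struct dql_model {
	unsigned int num_queued;     /* bytes handed to hardware */
	unsigned int num_completed;  /* bytes reported as sent */
};

static void dql_model_queued(struct dql_model *q, unsigned int bytes)
{
	q->num_queued += bytes;
}

/*
 * Models the check in dql_completed(): a queue cannot complete more bytes
 * than it has outstanding. The kernel BUG()s here; the model returns false.
 */
static bool dql_model_completed(struct dql_model *q, unsigned int bytes)
{
	assert(q->num_completed <= q->num_queued);
	if (bytes > q->num_queued - q->num_completed)
		return false;
	q->num_completed += bytes;
	return true;
}
```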

The kernel BUG log caused by the issue is shown below.

[  120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26!
[  120.837684] Internal error: Oops - BUG: 0 [#1] SMP ARM
[  120.842778] Modules linked in:
[  120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 #35
[  120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree)
[  120.859012] task: c1007480 task.stack: c100
[  120.863510] PC is at dql_completed+0x108/0x17c
[  120.867915] LR is at 0x46
[  120.870512] pc : []lr : [<0046>]psr: 8113
[  120.870512] sp : c1001d58  ip : c1001d80  fp : c1001d7c
[  120.881895] r10: 003e  r9 : df6b3400  r8 : 0ed86506
[  120.887075] r7 : 0001  r6 : 0001  r5 : 0ed8654c  r4 : df0135d8
[  120.893546] r3 : 0001  r2 : df016800  r1 : fece  r0 : df6b3480
[  120.900018] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  120.907093] Control: 10c5387d  Table: 9e27806a  DAC: 0051
[  120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218)
[  120.918744] Stack: (0xc1001d58 to 0xc1002000)



[  121.085331] 1fc0:  c0a52a28  c10855d4 c1003c58 c0a52a24 c100885c 8000406a
[  121.093444] 1fe0: 410fc073   c1001ff8 8000807c c0a009cc
[  121.101575] [] (dql_completed) from [] (mtk_napi_tx+0x1d0/0x37c)
[  121.109263] [] (mtk_napi_tx) from [] (net_rx_action+0x24c/0x3b8)
[  121.116951] [] (net_rx_action) from [] (__do_softirq+0xe4/0x35c)
[  121.124638] [] (__do_softirq) from [] (irq_exit+0xe8/0x150)
[  121.131895] [] (irq_exit) from [] (__handle_domain_irq+0x70/0xc4)
[  121.139666] [] (__handle_domain_irq) from [] (gic_handle_irq+0x58/0x9c)
[  121.147953] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[  121.155373] Exception stack(0xc1001ef8 to 0xc1001f40)

Sean Wang (2):
  net: ethernet: mediatek: fix inconsistency between TXD and the used
buffer
  net: ethernet: mediatek: fix inconsistency of port number carried in
TXD

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 31 -
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 12 ---
 2 files changed, 26 insertions(+), 17 deletions(-)

-- 
1.9.1

Re: [PATCH net] xfrm: calculate L4 checksums also for GSO case before encrypting packets

2017-04-13 Thread Ansis Atteka

On 13 April 2017 at 19:45, Ansis Atteka  wrote:
>
>
>
> On 11 April 2017 at 00:07, Steffen Klassert  wrote:
>>
>> On Mon, Apr 10, 2017 at 11:42:07AM -0700, Ansis Atteka wrote:
>> > Otherwise, if L4 checksum calculation is done after encryption,
>> > then all ESP packets end up being corrupted at the location
>> > where pre-encryption L4 checksum field resides.
>> >
>> > One of the ways to reproduce this bug is to have a VM with virtio_net
>> > driver (UFO set to ON in the guest VM); and then encapsulate all guest's
>> > Ethernet frames in GENEVE; and then further encrypt GENEVE with IPsec.
>> > In this case following symptoms are observed:
>> > 1. If using ixgbe NIC, then the driver will also emit following
>> >warning message:
>> >ixgbe :01:00.1: partial checksum but l4 proto=32!
>> > 2. Receiving VM will drop all the corrupted ESP packets, hence UDP iperf test
>> >with large packets will fail completely or TCP iperf will get ridiculously
>> >low performance because TCP window will never grow above MTU.
>> >
>> > Signed-off-by: Ansis Atteka 
>> > ---
>> >  net/xfrm/xfrm_output.c | 19 +--
>> >  1 file changed, 13 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
>> > index 8ba29fe..7ad7e5f 100644
>> > --- a/net/xfrm/xfrm_output.c
>> > +++ b/net/xfrm/xfrm_output.c
>> > @@ -168,7 +168,8 @@ static int xfrm_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
>> >
>> >  static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb)
>> >  {
>> > - struct sk_buff *segs;
>> > + struct sk_buff *segs, *nskb;
>> > + int err;
>> >
>> >   BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
>> >   BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET);
>> > @@ -180,21 +181,27 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
>> >   return -EINVAL;
>> >
>> >   do {
>> > - struct sk_buff *nskb = segs->next;
>> > - int err;
>> > + nskb = segs->next;
>> >
>> >   segs->next = NULL;
>> > - err = xfrm_output2(net, sk, segs);
>> > + err = skb_checksum_help(segs);
>>
>> What's wrong with the checksum provided by the GSO layer and
>> why we have to do this unconditionally here?

I believe by "GSO layer" you meant the skb_gso_segment() function
invocation from xfrm_output_gso()?

If so, then the problem is that the list of skbs returned by that
function could be in CHECKSUM_PARTIAL state if the skbs came from a
UDP tunnel such as Geneve:

   xfrm_output() {
 __skb_gso_segment() {
   skb_mac_gso_segment() {
 skb_network_protocol();
 inet_gso_segment() {
   udp4_ufo_fragment() {
 skb_udp_tunnel_segment() {
   skb_mac_gso_segment() {
 skb_network_protocol();
 inet_gso_segment() {
   udp4_ufo_fragment() {
 skb_checksum() {
   __skb_checksum() {
 csum_partial() {
   do_csum();
 }
 csum_partial() {
   do_csum();
 }
   }


Since those skbs could remain in CHECKSUM_PARTIAL state even after
IPsec encryption, ixgbe then tries to calculate L4 checksums on an
already encrypted skb whose L4 layer is already protected by
IPsec integrity checks. Hence, ESP packets end up being corrupted and
dropped on the receive side by XFRM. I clearly see this ESP packet
corruption happening by observing:
1. in Wireshark, that the same ESP packet differs at the offset where
the UDP checksum field should reside; AND
2. in dmesg, that the ixgbe driver complains on the send side with "partial
checksum but L4 proto is 0x32 (ESP)"; AND
3. in /proc/net/xfrm_stat, that the XfrmInStateProtoError counter is
incremented on the receive side each time it receives a corrupted packet.
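
To make the ordering problem concrete: resolving CHECKSUM_PARTIAL means summing the bytes from csum_start to the end of the packet and writing the folded result at csum_start + csum_offset, which is what skb_checksum_help() does. If that write happens after encryption, it lands in ciphertext. The following is a userspace model with hypothetical helper names, not the kernel function; it assumes the checksum field was already seeded with the pseudo-header sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 ones'-complement sum over 16-bit big-endian words. */
static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)((p[0] << 8) | p[1]);
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)(p[0] << 8);   /* pad an odd trailing byte */
	return sum;
}

/* Fold carries into 16 bits and complement. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Model of resolving CHECKSUM_PARTIAL in software: sum from csum_start to the
 * end of the packet (the checksum field is assumed to hold the pseudo-header
 * sum already) and store the result at csum_start + csum_offset.
 */
static void resolve_partial_csum(uint8_t *pkt, size_t len,
				 size_t csum_start, size_t csum_offset)
{
	uint16_t c;

	assert(csum_start + csum_offset + 2 <= len);
	c = csum_fold16(csum_add_bytes(0, pkt + csum_start, len - csum_start));
	if (c == 0)
		c = 0xffff;   /* UDP sends a computed zero checksum as all ones */
	pkt[csum_start + csum_offset] = (uint8_t)(c >> 8);
	pkt[csum_start + csum_offset + 1] = (uint8_t)c;
}
```

After resolution, re-summing the whole region folds to zero, which is the receiver's check; running the same write over already encrypted bytes breaks both the ESP integrity check and the inner checksum.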


>>
>>
>> We don't announce any checksum capabilities, so the GSO
>> layer should provide the checksum. If this is not the case,
>> something along the path is taking wrong assumptions.
>
>
The same explicit checksum calculation is done in xfrm_output() for the
non-GSO case, so it was tempting for me to simply add a similar
skb_checksum_help() call for the GSO case as well.
>
>> Btw. all GSO packets on a standard IPv4 xfrm tunnel are getting
>> dropped with your patch applied.
>>
I think I just noticed a possible issue with the patch I sent out.
In your setup, were packets getting dropped on the receive side due to UDP
checksum failure (and not IPsec integrity check failure)? If so, then I
wonder whether, with my patch applied, skb_checksum_help() was called twice
under the conditions you tested, so that the skbs ended up with
wrong checksums.

So, would you mind

repost: af_packet vs virtio (was packed ring layout proposal v2)

2017-04-13 Thread Michael S. Tsirkin

On Fri, Apr 14, 2017 at 05:42:58AM +0300, Michael S. Tsirkin wrote:
> Hi all, I wanted to raise the question of similarities between virtio
> and new zero copy af_packet interfaces.
> 
> First I would like to mention that virtio device development isn't spec
> limited - spec is there to help interoperability and add peace of mind
> for people worried about IPR.
> 
> So I tend to accept patches without requiring people to write them up in
> the spec, as work on the spec proceeds at its own pace. All I ask is that
> the virtio mailing list is copied; this requires the contributor to
> subscribe, and in the process the contributor promises that it's OK for us
> to add this to the spec in the future.
> 
> There shouldn't thus be a fundamental problem preventing use of virtio
> format or reusing some of the code for af_packet, but it still might or
> might not make sense - it was designed for CPU to CPU communication so
> it seems to make sense though.  So I would like that discussion to
> happen even if we decide against.
> 
> And even if people decide against, the problem space is very similar.  You
> can look up packed ring layout proposal v2 - should I repost here?  Our
> prototyping shows significant performance improvements from using it as
> compared to head/tail layout.
> 
> To start this discussion, I'm going to reply to this email reposting a
> copy of the simplified virtio layout that might be appropriate for
> af_packet as well.

Here's the repost (slightly cut down) sorry about the duplicates.

The idea is to have a r/w descriptor in a ring structure,
replacing the used and available ring, index and descriptor
buffer.

* Descriptor ring:

Guest adds descriptors with unique index values and DESC_HW set in flags.
Host overwrites used descriptors with correct len, index, and DESC_HW
clear.  Flags are always set/cleared last.

#define DESC_HW 0x0080

struct desc {
__le64 addr;
__le32 len;
__le16 index;
__le16 flags;
};

When DESC_HW is set, the descriptor belongs to the device. When it is
clear, it belongs to the driver.
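
The ownership rule above, together with "flags are always set/cleared last", maps naturally onto release/acquire ordering. Here is a minimal C11 sketch. It assumes a little-endian CPU, so that host-endian fields can stand in for the proposal's __le types. desc_model and the helper names are hypothetical and are not part of any virtio header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_HW 0x0080

/* Host-endian model of struct desc; flags is atomic for the hand-off. */
struct desc_model {
	uint64_t addr;
	uint32_t len;
	uint16_t index;
	_Atomic uint16_t flags;
};

static void desc_init(struct desc_model *d)
{
	d->addr = 0;
	d->len = 0;
	d->index = 0;
	atomic_init(&d->flags, 0);
}

/* Driver side: fill in the descriptor, then publish it by setting DESC_HW last. */
static void driver_post(struct desc_model *d, uint64_t addr, uint32_t len,
			uint16_t index, uint16_t flags)
{
	assert(!(flags & DESC_HW));
	d->addr = addr;
	d->len = len;
	d->index = index;
	atomic_store_explicit(&d->flags, (uint16_t)(flags | DESC_HW),
			      memory_order_release);
}

/* Device side: only read the other fields after observing DESC_HW. */
static bool device_owns(struct desc_model *d)
{
	return atomic_load_explicit(&d->flags, memory_order_acquire) & DESC_HW;
}

/* Device side: write back the used length, then clear DESC_HW last. */
static void device_complete(struct desc_model *d, uint32_t used_len)
{
	uint16_t f = atomic_load_explicit(&d->flags, memory_order_relaxed);

	d->len = used_len;
	atomic_store_explicit(&d->flags, (uint16_t)(f & ~DESC_HW),
			      memory_order_release);
}
```

The release store on flags makes the earlier field writes visible to whoever observes the flag change with an acquire load, which is the guarantee the "flags last" rule is after.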

We can use 1 bit to set the direction:
/* This marks a buffer as write-only (otherwise read-only). */
#define VRING_DESC_F_WRITE  2

* Scatter/gather support

We can use 1 bit to chain s/g entries in a request, same as virtio 1.0:

/* This marks a buffer as continuing via the next field. */
#define VRING_DESC_F_NEXT   1

Unlike virtio 1.0, all descriptors must have distinct ID values.

Also unlike virtio 1.0, use of this flag will be an optional feature
(e.g. VIRTIO_F_DESC_NEXT) so both devices and drivers can opt out of it.

* Indirect buffers

Can be marked like in virtio 1.0:

/* This means the buffer contains a table of buffer descriptors. */
#define VRING_DESC_F_INDIRECT   4

Unlike virtio 1.0, this is a table, not a list:
struct indirect_descriptor_table {
/* The actual descriptors (16 bytes each) */
struct virtq_desc desc[len / 16];
};

The first descriptor is located at start of the indirect descriptor
table, additional indirect descriptors come immediately afterwards.
VRING_DESC_F_WRITE is the only valid flag for descriptors in the indirect
table. Others should be set to 0 and are ignored. The id (index field) is
also set to 0 and should be ignored.
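
Here is one way a device might walk such an indirect table. It assumes host-endian fields and a table length taken from the parent descriptor's len. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_WRITE 2

/* Host-endian model of the 16-byte descriptor used in the proposal. */
struct vdesc {
	uint64_t addr;
	uint32_t len;
	uint16_t index;   /* set to 0 and ignored inside an indirect table */
	uint16_t flags;   /* only VRING_DESC_F_WRITE is meaningful here */
};

_Static_assert(sizeof(struct vdesc) == 16, "descriptor must be 16 bytes");

/*
 * Walk an indirect table of table_len bytes. Entries are contiguous, so the
 * count is table_len / 16. Returns -1 if the length is not a whole number of
 * descriptors; otherwise totals device-readable and device-writable bytes
 * and returns the number of entries.
 */
static int walk_indirect(const struct vdesc *table, uint32_t table_len,
			 uint64_t *readable, uint64_t *writable)
{
	uint32_t i, n;

	assert(readable != NULL && writable != NULL);
	if (table_len % 16 != 0)
		return -1;
	n = table_len / 16;
	*readable = 0;
	*writable = 0;
	for (i = 0; i < n; i++) {
		if (table[i].flags & VRING_DESC_F_WRITE)  /* other bits ignored */
			*writable += table[i].len;
		else
			*readable += table[i].len;
	}
	return (int)n;
}
```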

virtio 1.0 seems to allow a s/g entry followed by
an indirect descriptor. This does not seem useful,
so we do not allow that anymore.

This support would be an optional feature, same as in virtio 1.0.

* Batching descriptors:

virtio 1.0 allows passing a batch of descriptors in both directions, by
incrementing the used/avail index by values > 1.  We can support this by
chaining a list of descriptors through a bit in the flags field.
To allow use together with s/g, a different bit will be used.

#define VRING_DESC_F_BATCH_NEXT 0x0010

Batching works for both driver and device descriptors.
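
If batch members occupy consecutive ring slots, then finding the end of a batch is a scan for the first descriptor without VRING_DESC_F_BATCH_NEXT. The proposal does not spell out the consecutive-slot layout, so that is an assumption here. The helper below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_BATCH_NEXT 0x0010

/*
 * Return how many ring slots the batch starting at head occupies, assuming
 * batch members sit in consecutive (wrapping) slots and every member except
 * the last has VRING_DESC_F_BATCH_NEXT set. Only the flags of each slot are
 * examined. Returns 0 if no slot in the ring terminates the chain.
 */
static size_t batch_length(const uint16_t *flags, size_t ring_size, size_t head)
{
	size_t n;

	assert(ring_size > 0 && head < ring_size);
	for (n = 1; n <= ring_size; n++) {
		if (!(flags[(head + n - 1) % ring_size] & VRING_DESC_F_BATCH_NEXT))
			return n;
	}
	return 0;
}
```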

* Processing descriptors in and out of order

A device that processes all descriptors in order can simply flip
the DESC_HW bit as it finishes with each descriptor.

Device can write descriptors out in order as they are used, overwriting
descriptors that are there.

Device must not use a descriptor until DESC_HW is set.
It is only required to look at the first descriptor
submitted.

Driver must not overwrite a descriptor until DESC_HW is clear.
It is only required to look at the first descriptor
submitted.
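
In-order processing on the device side can be modeled as follows. It is single-threaded and uses hypothetical names. A real implementation would also need the ordering around the flags update that the "flags last" rule implies. The model stops at the first slot the driver has not handed over. It then writes back the used length and clears DESC_HW in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_HW 0x0080

/* Host-endian model of the ring descriptor. */
struct ring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t index;
	uint16_t flags;
};

/* Hypothetical device state: the ring and the next slot to examine. */
struct device_ring {
	struct ring_desc *desc;
	size_t size;
	size_t next;
};

/*
 * Consume descriptors in order until one is not owned by the device.
 * Each consumed descriptor is overwritten in place: the used length is
 * written first and DESC_HW is cleared last. Returns the count consumed.
 */
static size_t device_process_in_order(struct device_ring *r, uint32_t used_len)
{
	size_t done = 0;

	assert(r->size > 0 && r->next < r->size);
	while (done < r->size && (r->desc[r->next].flags & DESC_HW)) {
		struct ring_desc *d = &r->desc[r->next];

		d->len = used_len;
		d->flags &= (uint16_t)~DESC_HW;   /* hand back to the driver */
		r->next = (r->next + 1) % r->size;
		done++;
	}
	return done;
}
```

For a TX queue of a network device, used_len would typically be 0, since nothing is written.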

* Device specific descriptor flags
We have a lot of unused space in the descriptor.  This can be put to
good use by reserving some flag bits for device use.
For example, network device can set a bit to request
that header in the descriptor is suppressed
(in case it's all 0s anyway). This reduces cache utilization.

Note: this feature can be supported in virtio 1.0 as well,
as we have unused bits in both descriptor and used ring there.

* Descriptor length in device descriptors

virtio 1.0 places strict requirements on descriptor length. For example
it must be 0 in used ring of TX VQ of a network device since nothing is
written.  In practice guests do not

af_packet vs virtio

2017-04-13 Thread Michael S. Tsirkin

Hi all, I wanted to raise the question of similarities between virtio
and new zero copy af_packet interfaces.

First I would like to mention that virtio device development isn't spec
limited - spec is there to help interoperability and add peace of mind
for people worried about IPR.

So I tend to accept patches without requiring people to write them up in
the spec, as work on the spec proceeds at its own pace. All I ask is that
the virtio mailing list is copied; this requires the contributor to
subscribe, and in the process the contributor promises that it's OK for us
to add this to the spec in the future.

There shouldn't thus be a fundamental problem preventing use of virtio
format or reusing some of the code for af_packet, but it still might or
might not make sense - it was designed for CPU to CPU communication so
it seems to make sense though.  So I would like that discussion to
happen even if we decide against.

And even if people decide against, the problem space is very similar.  You
can look up packed ring layout proposal v2 - should I repost here?  Our
prototyping shows significant performance improvements from using it as
compared to head/tail layout.

To start this discission I'm going to reply to this email reposting a
copy of the simplified virtio layout that might be appropriate for
af_packet as well.

-- 
MST

Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics

2017-04-13 Thread Joe Perches

On Thu, 2017-04-13 at 17:48 +1000, Gavin Shan wrote:
> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
> packets sent and received over all packages and channels. It's useful
> to diagnose NCSI problems, especially when NCSI packages and channels
> aren't probed properly. The statistics can be gained from debugfs file
> as below:
> 
>  # cat /sys/kernel/debug/ncsi/eth0/stats
> 
>  CMD  OK   TIMEOUT  ERROR
>  ===
>  CIS  32   29       0
>  SP   10   7        0
>  DP   17   14       0
>  EC   1    0        0
>  ECNT 1    0        0
>  AE   1    0        0
>  GLS  11   0        0
>  SMA  1    0        0
>  EBF  1    0        0
>  GVI  2    0        0
>  GC   2    0        0

more trivia:

> diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
[]
> @@ -23,6 +23,235 @@
>  #include "ncsi-pkt.h"
>  
>  static struct dentry *ncsi_dentry;
> +static struct ncsi_pkt_handler {
> + unsigned char   type;
> + const char  *name;
> +} ncsi_pkt_handlers[] = {
> + { NCSI_PKT_CMD_CIS,"CIS"},
> + { NCSI_PKT_CMD_SP, "SP" },
> + { NCSI_PKT_CMD_DP, "DP" },
> + { NCSI_PKT_CMD_EC, "EC" },
> + { NCSI_PKT_CMD_DC, "DC" },
> + { NCSI_PKT_CMD_RC, "RC" },
> + { NCSI_PKT_CMD_ECNT,   "ECNT"   },
> + { NCSI_PKT_CMD_DCNT,   "DCNT"   },
> + { NCSI_PKT_CMD_AE, "AE" },
> + { NCSI_PKT_CMD_SL, "SL" },
> + { NCSI_PKT_CMD_GLS,"GLS"},
> + { NCSI_PKT_CMD_SVF,"SVF"},
> + { NCSI_PKT_CMD_EV, "EV" },
> + { NCSI_PKT_CMD_DV, "DV" },
> + { NCSI_PKT_CMD_SMA,"SMA"},
> + { NCSI_PKT_CMD_EBF,"EBF"},
> + { NCSI_PKT_CMD_DBF,"DBF"},
> + { NCSI_PKT_CMD_EGMF,   "EGMF"   },
> + { NCSI_PKT_CMD_DGMF,   "DGMF"   },
> + { NCSI_PKT_CMD_SNFC,   "SNFC"   },
> + { NCSI_PKT_CMD_GVI,"GVI"},
> + { NCSI_PKT_CMD_GC, "GC" },
> + { NCSI_PKT_CMD_GP, "GP" },
> + { NCSI_PKT_CMD_GCPS,   "GCPS"   },
> + { NCSI_PKT_CMD_GNS,"GNS"},
> + { NCSI_PKT_CMD_GNPTS,  "GNPTS"  },
> + { NCSI_PKT_CMD_GPS,"GPS"},
> + { NCSI_PKT_CMD_OEM,"OEM"},
> + { NCSI_PKT_CMD_PLDM,   "PLDM"   },
> + { NCSI_PKT_CMD_GPUUID, "GPUUID" },

I don't know how common these are or how
intelligible these acronyms are to knowledgeable
developers/users, but maybe it'd be better to
spell out what these are instead of making people
look up what the acronyms stand for.

CIS - Clear Initial State
SP - Select Package
etc...

Maybe copy the descriptions from the ncsi-pkt.h file:

#define NCSI_PKT_CMD_CIS    0x00 /* Clear Initial State */
#define NCSI_PKT_CMD_SP     0x01 /* Select Package */
#define NCSI_PKT_CMD_DP     0x02 /* Deselect Package */
#define NCSI_PKT_CMD_EC     0x03 /* Enable Channel */
#define NCSI_PKT_CMD_DC     0x04 /* Disable Channel */
#define NCSI_PKT_CMD_RC     0x05 /* Reset Channel */
#define NCSI_PKT_CMD_ECNT   0x06 /* Enable Channel Network Tx */
#define NCSI_PKT_CMD_DCNT   0x07 /* Disable Channel Network Tx */
#define NCSI_PKT_CMD_AE     0x08 /* AEN Enable */
#define NCSI_PKT_CMD_SL     0x09 /* Set Link */
#define NCSI_PKT_CMD_GLS    0x0a /* Get Link */
#define NCSI_PKT_CMD_SVF    0x0b /* Set VLAN Filter */
#define NCSI_PKT_CMD_EV     0x0c /* Enable VLAN */
#define NCSI_PKT_CMD_DV     0x0d /* Disable VLAN */
#define NCSI_PKT_CMD_SMA    0x0e /* Set MAC address */
#define NCSI_PKT_CMD_EBF    0x10 /* Enable Broadcast Filter */
#define NCSI_PKT_CMD_DBF    0x11 /* Disable Broadcast Filter */
#define NCSI_PKT_CMD_EGMF   0x12 /* Enable Global Multicast Filter */
#define NCSI_PKT_CMD_DGMF   0x13 /* Disable Global Multicast Filter */
#define NCSI_PKT_CMD_SNFC   0x14 /* Set NCSI Flow Control */
#define NCSI_PKT_CMD_GVI    0x15 /* Get Version ID */
#define NCSI_PKT_CMD_GC     0x16 /* Get Capabilities */
#define NCSI_PKT_CMD_GP     0x17 /* Get Parameters */
#define NCSI_PKT_CMD_GCPS   0x18 /* Get Controller Packet Statistics */
#define NCSI_PKT_CMD_GNS    0x19 /* Get NCSI Statistics */
#define NCSI_PKT_CMD_GNPTS  0x1a /* Get NCSI Pass-throu Statistics */
#define NCSI_PKT_CMD_GPS    0x1b /* Get package status */
#define NCSI_PKT_CMD_OEM    0x50 /* OEM */
#define NCSI_PKT_CMD_PLDM   0x51 /* PLDM request over NCSI
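One way to act on this suggestion is to carry the long description next to the short name in the handler table. A minimal sketch, assuming a hypothetical struct ncsi_pkt_desc that extends the patch's ncsi_pkt_handler with a desc field; the command values and descriptions are a subset copied from the ncsi-pkt.h excerpt quoted above.

```c
#include <stddef.h>

/* Values copied from the ncsi-pkt.h excerpt quoted above (subset). */
#define NCSI_PKT_CMD_CIS 0x00
#define NCSI_PKT_CMD_SP  0x01
#define NCSI_PKT_CMD_DP  0x02
#define NCSI_PKT_CMD_EC  0x03
#define NCSI_PKT_CMD_SMA 0x0e
#define NCSI_PKT_CMD_GVI 0x15

/* Hypothetical: the patch's ncsi_pkt_handler with a description added. */
static const struct ncsi_pkt_desc {
	unsigned char type;
	const char *name;
	const char *desc;
} ncsi_pkt_descs[] = {
	{ NCSI_PKT_CMD_CIS, "CIS", "Clear Initial State" },
	{ NCSI_PKT_CMD_SP,  "SP",  "Select Package"      },
	{ NCSI_PKT_CMD_DP,  "DP",  "Deselect Package"    },
	{ NCSI_PKT_CMD_EC,  "EC",  "Enable Channel"      },
	{ NCSI_PKT_CMD_SMA, "SMA", "Set MAC address"     },
	{ NCSI_PKT_CMD_GVI, "GVI", "Get Version ID"      },
};

/* Return the long description for a command type, or NULL if unknown. */
static const char *ncsi_pkt_describe(unsigned char type)
{
	for (size_t i = 0; i < sizeof(ncsi_pkt_descs) / sizeof(ncsi_pkt_descs[0]); i++)
		if (ncsi_pkt_descs[i].type == type)
			return ncsi_pkt_descs[i].desc;
	return NULL;
}
```

The debugfs output could then print either column, or both, without a second lookup table.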

Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics

2017-04-13 Thread Jakub Kicinski

Hi!

On Thu, 13 Apr 2017 17:48:18 +1000, Gavin Shan wrote:
> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
> packets sent and received over all packages and channels. It's useful
> to diagnose NCSI problems, especially when NCSI packages and channels
> aren't probed properly. The statistics can be gained from debugfs file
> as below:
> 
>  # cat /sys/kernel/debug/ncsi/eth0/stats
> 
>  CMD  OK   TIMEOUT  ERROR
>  ===
>  CIS  32   29       0
>  SP   10   7        0
>  DP   17   14       0
>  EC   1    0        0
>  ECNT 1    0        0
>  AE   1    0        0
>  GLS  11   0        0
>  SMA  1    0        0
>  EBF  1    0        0
>  GVI  2    0        0
>  GC   2    0        0
> 
>  RSP  OK   TIMEOUT  ERROR
>  ===
>  CIS  3    0        0
>  SP   3    0        0
>  DP   2    0        1
>  EC   1    0        0
>  ECNT 1    0        0
>  AE   1    0        0
>  GLS  11   0        0
>  SMA  1    0        0
>  EBF  1    0        0
>  GVI  0    0        2
>  GC   2    0        0
> 
>  AEN  OK   TIMEOUT  ERROR
>  ===
> 
> Signed-off-by: Gavin Shan 

I'm not familiar with NC-SI but these look like some standard stats.
Would it make sense to provide a proper netlink API for them?

[...]
> +#ifdef CONFIG_NET_NCSI_DEBUG
> + ndp->stats.aen[h->type][NCSI_PKT_STAT_ERROR]++;
> +#endif

In any case, did you consider creating a macro or inline helper to
limit the number of #ifdefs?
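A sketch of the inline-helper pattern being suggested: the #ifdef moves into one helper and every call site stays unconditional. The trimmed struct ncsi_dev_priv, the helper name and the ordering of NCSI_PKT_STAT_OK/TIMEOUT are assumptions; only ndp->stats.aen[...] and NCSI_PKT_STAT_ERROR come from the quoted hunk. The config symbol is defined locally purely so the sketch exercises the counting path.

```c
#include <stdint.h>

/* Defined here only for the sketch; in the kernel this comes from Kconfig. */
#define CONFIG_NET_NCSI_DEBUG 1

enum {                                 /* assumed ordering */
	NCSI_PKT_STAT_OK,
	NCSI_PKT_STAT_TIMEOUT,
	NCSI_PKT_STAT_ERROR,
	NCSI_PKT_STAT_MAX
};

/* Hypothetical, trimmed stand-in for struct ncsi_dev_priv. */
struct ncsi_dev_priv {
#ifdef CONFIG_NET_NCSI_DEBUG
	struct {
		uint32_t aen[256][NCSI_PKT_STAT_MAX];
	} stats;
#endif
	int unused;
};

/* The only #ifdef lives here; callers just write
 *   ncsi_aen_stat_inc(ndp, h->type, NCSI_PKT_STAT_ERROR);
 * and the call compiles to nothing when debugging is disabled. */
static inline void ncsi_aen_stat_inc(struct ncsi_dev_priv *ndp,
				     unsigned char type, int stat)
{
#ifdef CONFIG_NET_NCSI_DEBUG
	ndp->stats.aen[type][stat]++;
#else
	(void)ndp;
	(void)type;
	(void)stat;
#endif
}
```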

[PATCH 3/9] netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find

2017-04-13 Thread Pablo Neira Ayuso

From: Gao Feng 

When invoking __nf_conntrack_helper_find, the caller needs to hold the rcu
lock to prevent the helper module from being unloaded.

Currently there are two callers, nf_conntrack_helper_try_module_get and
ctnetlink_create_expect, which don't hold the rcu lock. The remaining
callers, ctnetlink_change_helper, ctnetlink_create_conntrack and
ctnetlink_glue_attach_expect, already hold the rcu lock or spin_lock_bh.

Remove the rcu lock from nf_ct_helper_expectfn_find_by_name and
nf_ct_helper_expectfn_find_by_symbol. They return a pointer that is only
safe to use under the rcu lock, so their callers should hold the rcu lock
instead of these two functions taking it internally.

Signed-off-by: Gao Feng 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_helper.c  | 17 -
 net/netfilter/nf_conntrack_netlink.c | 10 --
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 6dc44d9b4190..4eeb3418366a 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -158,16 +158,25 @@ nf_conntrack_helper_try_module_get(const char *name, u16 l3num, u8 protonum)
 {
struct nf_conntrack_helper *h;
 
+   rcu_read_lock();
+
h = __nf_conntrack_helper_find(name, l3num, protonum);
 #ifdef CONFIG_MODULES
if (h == NULL) {
-   if (request_module("nfct-helper-%s", name) == 0)
+   rcu_read_unlock();
+   if (request_module("nfct-helper-%s", name) == 0) {
+   rcu_read_lock();
h = __nf_conntrack_helper_find(name, l3num, protonum);
+   } else {
+   return h;
+   }
}
 #endif
if (h != NULL && !try_module_get(h->me))
h = NULL;
 
+   rcu_read_unlock();
+
return h;
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_helper_try_module_get);
@@ -311,38 +320,36 @@ void nf_ct_helper_expectfn_unregister(struct nf_ct_helper_expectfn *n)
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_unregister);
 
+/* Caller should hold the rcu lock */
 struct nf_ct_helper_expectfn *
 nf_ct_helper_expectfn_find_by_name(const char *name)
 {
struct nf_ct_helper_expectfn *cur;
bool found = false;
 
-   rcu_read_lock();
list_for_each_entry_rcu(cur, _ct_helper_expectfn_list, head) {
if (!strcmp(cur->name, name)) {
found = true;
break;
}
}
-   rcu_read_unlock();
return found ? cur : NULL;
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_find_by_name);
 
+/* Caller should hold the rcu lock */
 struct nf_ct_helper_expectfn *
 nf_ct_helper_expectfn_find_by_symbol(const void *symbol)
 {
struct nf_ct_helper_expectfn *cur;
bool found = false;
 
-   rcu_read_lock();
list_for_each_entry_rcu(cur, _ct_helper_expectfn_list, head) {
if (cur->expectfn == symbol) {
found = true;
break;
}
}
-   rcu_read_unlock();
return found ? cur : NULL;
 }
 EXPORT_SYMBOL_GPL(nf_ct_helper_expectfn_find_by_symbol);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 59ee27deb9a0..06d28ac663df 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -3133,23 +3133,27 @@ ctnetlink_create_expect(struct net *net,
return -ENOENT;
ct = nf_ct_tuplehash_to_ctrack(h);
 
+   rcu_read_lock();
if (cda[CTA_EXPECT_HELP_NAME]) {
const char *helpname = nla_data(cda[CTA_EXPECT_HELP_NAME]);
 
helper = __nf_conntrack_helper_find(helpname, u3,
nf_ct_protonum(ct));
if (helper == NULL) {
+   rcu_read_unlock();
 #ifdef CONFIG_MODULES
if (request_module("nfct-helper-%s", helpname) < 0) {
err = -EOPNOTSUPP;
goto err_ct;
}
+   rcu_read_lock();
helper = __nf_conntrack_helper_find(helpname, u3,
nf_ct_protonum(ct));
if (helper) {
err = -EAGAIN;
-   goto err_ct;
+   goto err_rcu;
}
+   rcu_read_unlock();
 #endif
err = -EOPNOTSUPP;
goto err_ct;
@@ -3159,11 +3163,13 @@ ctnetlink_create_expect(struct net *net,
exp = ctnetlink_alloc_expect(cda, ct, helper, , );
if (IS_ERR(exp)) {
err = PTR_ERR(exp);
-   goto err_ct;
+   goto

[PATCH 7/9] netfilter: nf_ct_expect: use proper RCU list traversal/update APIs

2017-04-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

We should use proper RCU list APIs to manipulate help->expectations,
as we can dump the conntrack's expectations via nfnetlink, i.e. in
ctnetlink_exp_ct_dump_table(), where only rcu_read_lock is acquired.

So for list traversal, use hlist_for_each_entry_rcu; for list add/del,
use hlist_add_head_rcu and hlist_del_rcu.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_expect.c  | 4 ++--
 net/netfilter/nf_conntrack_netlink.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 4b2e1fb28bb4..d80073037856 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -57,7 +57,7 @@ void nf_ct_unlink_expect_report(struct nf_conntrack_expect *exp,
hlist_del_rcu(>hnode);
net->ct.expect_count--;
 
-   hlist_del(>lnode);
+   hlist_del_rcu(>lnode);
master_help->expecting[exp->class]--;
 
nf_ct_expect_event_report(IPEXP_DESTROY, exp, portid, report);
@@ -363,7 +363,7 @@ static void nf_ct_expect_insert(struct nf_conntrack_expect *exp)
/* two references : one for hash insert, one for the timer */
atomic_add(2, >use);
 
-   hlist_add_head(>lnode, _help->expectations);
+   hlist_add_head_rcu(>lnode, _help->expectations);
master_help->expecting[exp->class]++;
 
hlist_add_head_rcu(>hnode, _ct_expect_hash[h]);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index f78eadba343d..dc7dfd68fafe 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2680,8 +2680,8 @@ ctnetlink_exp_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
last = (struct nf_conntrack_expect *)cb->args[1];
for (; cb->args[0] < nf_ct_expect_hsize; cb->args[0]++) {
 restart:
-   hlist_for_each_entry(exp, _ct_expect_hash[cb->args[0]],
-hnode) {
+   hlist_for_each_entry_rcu(exp, _ct_expect_hash[cb->args[0]],
+hnode) {
if (l3proto && exp->tuple.src.l3num != l3proto)
continue;
 
@@ -2732,7 +2732,7 @@ ctnetlink_exp_ct_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
rcu_read_lock();
last = (struct nf_conntrack_expect *)cb->args[1];
 restart:
-   hlist_for_each_entry(exp, >expectations, lnode) {
+   hlist_for_each_entry_rcu(exp, >expectations, lnode) {
if (l3proto && exp->tuple.src.l3num != l3proto)
continue;
if (cb->args[1]) {
-- 
2.1.4
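The reason hlist_add_head_rcu()/hlist_del_rcu() are needed is publication order: a lockless reader must never see a node before its contents are visible, and a deleted node must still lead a reader standing on it to the rest of the list. A userspace analogue, assuming C11 release stores and acquire loads as stand-ins for rcu_assign_pointer() and rcu_dereference() (the kernel uses weaker consume-style ordering on the read side). Grace periods are not modelled, so removed nodes are never freed here; the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	int val;
	struct node *_Atomic next;
};

struct list {
	struct node *_Atomic first;
};

/* Writer (serialized by the caller, like the expectation list under its lock):
 * initialise the node fully, then publish it with a release store. */
static void list_add_head_pub(struct list *l, struct node *n, int val)
{
	n->val = val;
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&l->first, memory_order_relaxed),
			      memory_order_relaxed);
	atomic_store_explicit(&l->first, n, memory_order_release);
}

/* Writer: unlink the head but leave n->next intact, as hlist_del_rcu() does,
 * so a reader already standing on n can still walk to the rest of the list. */
static struct node *list_del_head_pub(struct list *l)
{
	struct node *n = atomic_load_explicit(&l->first, memory_order_relaxed);

	if (n)
		atomic_store_explicit(&l->first,
				      atomic_load_explicit(&n->next, memory_order_relaxed),
				      memory_order_release);
	return n;
}

/* Reader: every pointer load is an acquire, pairing with the release above. */
static int list_sum(struct list *l)
{
	int sum = 0;

	for (struct node *n = atomic_load_explicit(&l->first, memory_order_acquire);
	     n != NULL;
	     n = atomic_load_explicit(&n->next, memory_order_acquire))
		sum += n->val;
	return sum;
}
```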

[PATCH 9/9] netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage

2017-04-13 Thread Pablo Neira Ayuso

From: Gao Feng 

The current code wrongly invokes nf_ct_netns_get in the destroy routine;
it should use nf_ct_netns_put instead. The extra reference can prevent
some modules from being unloaded.

Fixes: ecb2421b5ddf ("netfilter: add and use nf_ct_netns_get/put")
Signed-off-by: Gao Feng 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 52f26459efc3..9b8841316e7b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -461,7 +461,7 @@ static void clusterip_tg_destroy(const struct xt_tgdtor_param *par)
 
clusterip_config_put(cipinfo->config);
 
-   nf_ct_netns_get(par->net, par->family);
+   nf_ct_netns_put(par->net, par->family);
 }
 
 #ifdef CONFIG_COMPAT
-- 
2.1.4
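The imbalance can be seen with a plain counter. A minimal sketch, assuming (as the get/put pairing implies) that the target's check routine takes the reference that destroy is supposed to drop; struct netns_model and all functions below are hypothetical stand-ins, not the netfilter API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for nf_ct_netns_get()/nf_ct_netns_put(). */
struct netns_model {
	int ct_users;          /* references keeping conntrack in use */
};

static void ct_netns_get(struct netns_model *n) { n->ct_users++; }
static void ct_netns_put(struct netns_model *n) { n->ct_users--; }

/* checkentry takes the reference... */
static void tg_check(struct netns_model *n)         { ct_netns_get(n); }
/* ...so destroy must drop it. The buggy version took a second one. */
static void tg_destroy_buggy(struct netns_model *n) { ct_netns_get(n); }
static void tg_destroy_fixed(struct netns_model *n) { ct_netns_put(n); }

static bool can_unload(const struct netns_model *n) { return n->ct_users == 0; }
```

After one rule is added and removed, the buggy path leaves two references behind instead of zero, which is why dependent modules could not be unloaded.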

[PATCH 1/9] netfilter: xt_TCPMSS: add more sanity tests on tcph->doff

2017-04-13 Thread Pablo Neira Ayuso

From: Eric Dumazet 

Denys provided an awesome KASAN report pointing to a use-after-free
in xt_TCPMSS.

I have provided three patches to fix this issue, either in xt_TCPMSS or
in xt_tcpudp.c. It seems the xt_TCPMSS patch has the smallest possible
impact.

Signed-off-by: Eric Dumazet 
Reported-by: Denys Fedoryshchenko 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_TCPMSS.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index 27241a767f17..c64aca611ac5 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -104,7 +104,7 @@ tcpmss_mangle_packet(struct sk_buff *skb,
tcph = (struct tcphdr *)(skb_network_header(skb) + tcphoff);
tcp_hdrlen = tcph->doff * 4;
 
-   if (len < tcp_hdrlen)
+   if (len < tcp_hdrlen || tcp_hdrlen < sizeof(struct tcphdr))
return -1;
 
if (info->mss == XT_TCPMSS_CLAMP_PMTU) {
@@ -152,6 +152,10 @@ tcpmss_mangle_packet(struct sk_buff *skb,
if (len > tcp_hdrlen)
return 0;
 
+   /* tcph->doff has 4 bits, do not wrap it to 0 */
+   if (tcp_hdrlen >= 15 * 4)
+   return 0;
+
/*
 * MSS Option not found ?! add it..
 */
-- 
2.1.4
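Both checks added by the patch are simple arithmetic on the 4-bit data offset (doff counts 32-bit words). A minimal sketch with hypothetical function names; TCP_MIN_HDRLEN mirrors sizeof(struct tcphdr), and TCPOLEN_MSS is the 4-byte MSS option length.

```c
#include <stdbool.h>

#define TCP_MIN_HDRLEN 20u         /* sizeof(struct tcphdr) */
#define TCP_MAX_HDRLEN (15u * 4u)  /* doff is a 4-bit count of 32-bit words */
#define TCPOLEN_MSS    4u

/* First check: reject a doff that claims more bytes than the segment has,
 * or fewer than a bare TCP header. */
static bool tcp_doff_sane(unsigned int doff, unsigned int len)
{
	unsigned int tcp_hdrlen = doff * 4u;

	return tcp_hdrlen <= len && tcp_hdrlen >= TCP_MIN_HDRLEN;
}

/* Second check: appending a 4-byte MSS option raises doff by one word, which
 * must still fit in 4 bits (the patch bails out when tcp_hdrlen >= 60). */
static bool can_append_mss(unsigned int doff)
{
	return doff * 4u + TCPOLEN_MSS <= TCP_MAX_HDRLEN;
}
```

Since tcp_hdrlen is always a multiple of 4, "tcp_hdrlen + 4 <= 60" is the same condition as the patch's "tcp_hdrlen < 60": with doff == 15 the new value would wrap to 0.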

[PATCH 4/9] netfilter: ctnetlink: make it safer when checking the ct helper name

2017-04-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

If one CPU is running ctnetlink_change_helper() while another CPU is
running unhelp() at the same time, then even if help->helper is not NULL
at first, the later statement strcmp(help->helper->name, ...) may still
dereference a NULL pointer.

So we must use rcu_read_lock and rcu_dereference to avoid such a _bad_
thing happening.

Fixes: f95d7a46bc57 ("netfilter: ctnetlink: Fix regression in CTA_HELP processing")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 06d28ac663df..f9c643bc1a8e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1488,11 +1488,16 @@ static int ctnetlink_change_helper(struct nf_conn *ct,
 * treat the second attempt as a no-op instead of returning
 * an error.
 */
-   if (help && help->helper &&
-   !strcmp(help->helper->name, helpname))
-   return 0;
-   else
-   return -EBUSY;
+   err = -EBUSY;
+   if (help) {
+   rcu_read_lock();
+   helper = rcu_dereference(help->helper);
+   if (helper && !strcmp(helper->name, helpname))
+   err = 0;
+   rcu_read_unlock();
+   }
+
+   return err;
}
 
if (!strcmp(helpname, "")) {
-- 
2.1.4
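The core of the fix is "read the shared pointer once, then test and use the same local". A sketch of that shape, assuming a C11 acquire load as a stand-in for rcu_dereference(); struct conn_help and helper_name_matches() are hypothetical. In the kernel, rcu_read_lock() additionally keeps the helper object alive for the duration of the comparison, which this sketch does not model (its objects are static).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct helper {
	const char *name;
};

/* Hypothetical stand-in for struct nf_conn_help: the helper pointer can be
 * cleared concurrently (cf. unhelp()). */
struct conn_help {
	struct helper *_Atomic helper;
};

/* Fixed shape: one load into a local, then test and use that local. The old
 * code read help->helper twice, so the second read could observe NULL even
 * though the first did not. */
static bool helper_name_matches(struct conn_help *help, const char *name)
{
	struct helper *h;

	if (help == NULL)
		return false;
	h = atomic_load_explicit(&help->helper, memory_order_acquire); /* ~ rcu_dereference() */
	return h != NULL && strcmp(h->name, name) == 0;
}
```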

[PATCH 0/9] Netfilter fixes for net

2017-04-13 Thread Pablo Neira Ayuso

Hi David,

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Missing TCP header sanity check in TCPMSS target, from Eric Dumazet.

2) Incorrect event message type for related conntracks created via
   ctnetlink, from Liping Zhang.

3) Fix incorrect rcu locking when handling helpers from ctnetlink,
   from Gao Feng.

4) Fix missing rcu locking when updating helper, from Liping Zhang.

5) Fix missing read_lock_bh when iterating over the list of device
   addresses from TPROXY and redirect, also from Liping.

6) Fix crash when trying to dump expectations from conntrack with no
   helper via ctnetlink, from Liping.

7) Missing RCU protection for expectation list updates, given that
   ctnetlink iterates over the list on the rcu read side, from Liping too.

8) Don't dump autogenerated seed in nft_hash to userspace; this is
   very confusing to the user, again from Liping.

9) Fix wrong conntrack netns module refcount in ipt_CLUSTERIP,
   from Gao Feng.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 0b9aefea860063bb39e36bd7fe6c7087fed0ba87:

  tcp: minimize false-positives on TCP/GRO check (2017-04-03 18:43:41 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to fe50543c194e2e1aee2f3eba41fcafd187b3dbde:

  netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage (2017-04-13 23:21:40 +0200)


Eric Dumazet (1):
  netfilter: xt_TCPMSS: add more sanity tests on tcph->doff

Gao Feng (2):
  netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
  netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage

Liping Zhang (6):
  netfilter: ctnetlink: using bit to represent the ct event
  netfilter: ctnetlink: make it safer when checking the ct helper name
  netfilter: make it safer during the inet6_dev->addr_list traversal
  netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
  netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
  netfilter: nft_hash: do not dump the auto generated seed

 net/ipv4/netfilter/ipt_CLUSTERIP.c   |  2 +-
 net/netfilter/nf_conntrack_expect.c  |  4 ++--
 net/netfilter/nf_conntrack_helper.c  | 17 ++-
 net/netfilter/nf_conntrack_netlink.c | 41 +---
 net/netfilter/nf_nat_redirect.c      |  2 ++
 net/netfilter/nft_hash.c             | 10 ++---
 net/netfilter/xt_TCPMSS.c            |  6 +-
 net/netfilter/xt_TPROXY.c            |  5 -
 8 files changed, 62 insertions(+), 25 deletions(-)

[PATCH 8/9] netfilter: nft_hash: do not dump the auto generated seed

2017-04-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

This prevents the nft utility from printing out the auto-generated
seed to the user, which is unnecessary and confusing.

Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression")
Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_hash.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index eb2721af898d..c4dad1254ead 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -21,6 +21,7 @@ struct nft_hash {
enum nft_registers  sreg:8;
enum nft_registers  dreg:8;
u8  len;
+   bool                autogen_seed:1;
u32 modulus;
u32 seed;
u32 offset;
@@ -82,10 +83,12 @@ static int nft_hash_init(const struct nft_ctx *ctx,
if (priv->offset + priv->modulus - 1 < priv->offset)
return -EOVERFLOW;
 
-   if (tb[NFTA_HASH_SEED])
+   if (tb[NFTA_HASH_SEED]) {
priv->seed = ntohl(nla_get_be32(tb[NFTA_HASH_SEED]));
-   else
+   } else {
+   priv->autogen_seed = true;
get_random_bytes(>seed, sizeof(priv->seed));
+   }
 
return nft_validate_register_load(priv->sreg, len) &&
   nft_validate_register_store(ctx, priv->dreg, NULL,
@@ -105,7 +108,8 @@ static int nft_hash_dump(struct sk_buff *skb,
goto nla_put_failure;
if (nla_put_be32(skb, NFTA_HASH_MODULUS, htonl(priv->modulus)))
goto nla_put_failure;
-   if (nla_put_be32(skb, NFTA_HASH_SEED, htonl(priv->seed)))
+   if (!priv->autogen_seed &&
+   nla_put_be32(skb, NFTA_HASH_SEED, htonl(priv->seed)))
goto nla_put_failure;
if (priv->offset != 0)
if (nla_put_be32(skb, NFTA_HASH_OFFSET, htonl(priv->offset)))
-- 
2.1.4
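The patch remembers where the seed came from and consults that flag at dump time. A minimal sketch of the same idea, assuming a hypothetical trimmed struct hash_cfg; rand() is only a stand-in for get_random_bytes() and is not suitable for real hash seeding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed stand-in for struct nft_hash. */
struct hash_cfg {
	uint32_t seed;
	bool autogen_seed;
};

/* user_seed == NULL means "no NFTA_HASH_SEED attribute was given". */
static void hash_init(struct hash_cfg *c, const uint32_t *user_seed)
{
	if (user_seed) {
		c->seed = *user_seed;
		c->autogen_seed = false;
	} else {
		c->autogen_seed = true;
		c->seed = (uint32_t)rand();  /* stand-in for get_random_bytes() */
	}
}

/* Returns true and fills *out only for a user-supplied seed, so a dump
 * reproduces exactly what the user configured. */
static bool hash_dump_seed(const struct hash_cfg *c, uint32_t *out)
{
	if (c->autogen_seed)
		return false;
	*out = c->seed;
	return true;
}
```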

[PATCH 6/9] netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL

2017-04-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

For IPCTNL_MSG_EXP_GET, if the CTA_EXPECT_MASTER attr is specified, then
the NLM_F_DUMP request will dump the expectations related to this
conntrack entry.

But we forgot to check whether the conntrack has nf_conn_help or not,
so if nfct_help(ct) is NULL, an oops will happen:

 BUG: unable to handle kernel NULL pointer dereference at 0008
 IP: ctnetlink_exp_ct_dump_table+0xf9/0x1e0 [nf_conntrack_netlink]
 Call Trace:
  ? ctnetlink_exp_ct_dump_table+0x75/0x1e0 [nf_conntrack_netlink]
  netlink_dump+0x124/0x2a0
  __netlink_dump_start+0x161/0x190
  ctnetlink_dump_exp_ct+0x16c/0x1bc [nf_conntrack_netlink]
  ? ctnetlink_exp_fill_info.constprop.33+0xf0/0xf0 [nf_conntrack_netlink]
  ? ctnetlink_glue_seqadj+0x20/0x20 [nf_conntrack_netlink]
  ctnetlink_get_expect+0x32e/0x370 [nf_conntrack_netlink]
  ? debug_lockdep_rcu_enabled+0x1d/0x20
  nfnetlink_rcv_msg+0x60a/0x6a9 [nfnetlink]
  ? nfnetlink_rcv_msg+0x1b9/0x6a9 [nfnetlink]
  [...]

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index f9c643bc1a8e..f78eadba343d 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -2794,6 +2794,12 @@ static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
return -ENOENT;
 
ct = nf_ct_tuplehash_to_ctrack(h);
+   /* No expectation linked to this connection tracking. */
+   if (!nfct_help(ct)) {
+   nf_ct_put(ct);
+   return 0;
+   }
+
c.data = ct;
 
err = netlink_dump_start(ctnl, skb, nlh, );
-- 
2.1.4
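The fix is an early return when the optional helper extension is absent, still dropping the reference taken by the lookup. A sketch with hypothetical structures; it simplifies the real lifecycle, where the conntrack reference is handed over to the netlink dump and released when the dump completes, by dropping it at the end in both paths.

```c
#include <stddef.h>

struct expect {
	struct expect *next;
};

/* Hypothetical stand-ins: help is an optional extension, NULL when the
 * connection has no helper attached (the nfct_help(ct) == NULL case). */
struct conn_help {
	struct expect *expectations;
};

struct conn {
	int refcnt;
	struct conn_help *help;
};

static void conn_put(struct conn *ct) { ct->refcnt--; }

/* Returns how many expectations would be dumped. */
static int dump_exp_ct(struct conn *ct)
{
	int n = 0;

	if (ct->help == NULL) {        /* the early return added by the patch */
		conn_put(ct);              /* still drop the lookup reference */
		return 0;
	}
	for (struct expect *e = ct->help->expectations; e != NULL; e = e->next)
		n++;
	conn_put(ct);
	return n;
}
```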

[PATCH 5/9] netfilter: make it safer during the inet6_dev->addr_list traversal

2017-04-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

inet6_dev->addr_list is protected by inet6_dev->lock, so holding only
rcu_read_lock is not enough; we should acquire read_lock_bh(>lock)
before traversing inet6_dev->addr_list.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_nat_redirect.c | 2 ++
 net/netfilter/xt_TPROXY.c   | 5 -
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_redirect.c b/net/netfilter/nf_nat_redirect.c
index d43869879fcf..86067560a318 100644
--- a/net/netfilter/nf_nat_redirect.c
+++ b/net/netfilter/nf_nat_redirect.c
@@ -101,11 +101,13 @@ nf_nat_redirect_ipv6(struct sk_buff *skb, const struct nf_nat_range *range,
rcu_read_lock();
idev = __in6_dev_get(skb->dev);
if (idev != NULL) {
+   read_lock_bh(>lock);
list_for_each_entry(ifa, >addr_list, if_list) {
newdst = ifa->addr;
addr = true;
break;
}
+   read_unlock_bh(>lock);
}
rcu_read_unlock();
 
diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index 80cb7babeb64..df7f1df00330 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -393,7 +393,8 @@ tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr,
 
rcu_read_lock();
indev = __in6_dev_get(skb->dev);
-   if (indev)
+   if (indev) {
+   read_lock_bh(>lock);
list_for_each_entry(ifa, >addr_list, if_list) {
if (ifa->flags & (IFA_F_TENTATIVE | IFA_F_DEPRECATED))
continue;
@@ -401,6 +402,8 @@ tproxy_laddr6(struct sk_buff *skb, const struct in6_addr *user_laddr,
laddr = >addr;
break;
}
+   read_unlock_bh(>lock);
+   }
rcu_read_unlock();
 
return laddr ? laddr : daddr;
-- 
2.1.4

[PATCH 2/9] netfilter: ctnetlink: using bit to represent the ct event

2017-04-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

Otherwise, creating a new conntrack via nfnetlink:
  # conntrack -I -p udp -s 1.1.1.1 -d 2.2.2.2 -t 10 --sport 10 --dport 20

will emit the wrong ct events (where UPDATE should be NEW):
  # conntrack -E
  [UPDATE] udp  17 10 src=1.1.1.1 dst=2.2.2.2 sport=10 dport=20
  [UNREPLIED] src=2.2.2.2 dst=1.1.1.1 sport=20 dport=10 mark=0

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 908d858034e4..59ee27deb9a0 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1929,9 +1929,9 @@ static int ctnetlink_new_conntrack(struct net *net, struct sock *ctnl,
 
err = 0;
if (test_bit(IPS_EXPECTED_BIT, >status))
-   events = IPCT_RELATED;
+   events = 1 << IPCT_RELATED;
else
-   events = IPCT_NEW;
+   events = 1 << IPCT_NEW;
 
if (cda[CTA_LABELS] &&
ctnetlink_attach_labels(ct, cda) == 0)
-- 
2.1.4
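IPCT_NEW and IPCT_RELATED are bit numbers, not masks, so using them directly as an event mask is wrong. A minimal sketch, assuming the first four entries of enum ip_conntrack_events as in nf_conntrack_common.h, and a simplified, hypothetical classifier loosely modelled on how ctnetlink chooses between a NEW and an UPDATE message (the real event path has more cases). It also assumes, as the report suggests, that other change bits such as IPCT_REPLY are ORed into the mask. With the old code, IPCT_NEW has the value 0 and contributes no bit at all, so the mask reads as a plain update.

```c
/* First entries of enum ip_conntrack_events (nf_conntrack_common.h). */
enum ip_conntrack_events {
	IPCT_NEW,        /* 0 */
	IPCT_RELATED,    /* 1 */
	IPCT_DESTROY,    /* 2 */
	IPCT_REPLY,      /* 3 */
};

enum msg_kind { MSG_NONE, MSG_NEW, MSG_UPDATE, MSG_DESTROY };

/* Simplified, hypothetical classifier: which message a mask turns into. */
static enum msg_kind classify(unsigned int events)
{
	if (events & (1u << IPCT_DESTROY))
		return MSG_DESTROY;
	if (events & ((1u << IPCT_NEW) | (1u << IPCT_RELATED)))
		return MSG_NEW;
	if (events)
		return MSG_UPDATE;
	return MSG_NONE;
}
```

Passing IPCT_NEW | (1u << IPCT_REPLY) yields MSG_UPDATE, while the fixed (1u << IPCT_NEW) | (1u << IPCT_REPLY) yields MSG_NEW, matching the [UPDATE] vs NEW symptom shown above.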

Re: [PATCH nf-next] ipvs: remove unused function ip_vs_set_state_timeout

2017-04-13 Thread Pablo Neira Ayuso

On Mon, Apr 10, 2017 at 03:50:44PM -0400, Aaron Conole wrote:
> There are no in-tree callers of this function and it isn't exported.

Simon, let me know if you want to take this, or just add your
Signed-off-by.

Thanks!

> Signed-off-by: Aaron Conole 
> ---
>  include/net/ip_vs.h  |  2 --
>  net/netfilter/ipvs/ip_vs_proto.c | 22 --
>  2 files changed, 24 deletions(-)
> 
> diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
> index 8a4a57b8..c76fedb 100644
> --- a/include/net/ip_vs.h
> +++ b/include/net/ip_vs.h
> @@ -1349,8 +1349,6 @@ int ip_vs_protocol_init(void);
>  void ip_vs_protocol_cleanup(void);
>  void ip_vs_protocol_timeout_change(struct netns_ipvs *ipvs, int flags);
>  int *ip_vs_create_timeout_table(int *table, int size);
> -int ip_vs_set_state_timeout(int *table, int num, const char *const *names,
> - const char *name, int to);
>  void ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
>  const struct sk_buff *skb, int offset,
>  const char *msg);
> diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
> index 8ae4807..ca880a3 100644
> --- a/net/netfilter/ipvs/ip_vs_proto.c
> +++ b/net/netfilter/ipvs/ip_vs_proto.c
> @@ -193,28 +193,6 @@ ip_vs_create_timeout_table(int *table, int size)
>  }
>  
>  
> -/*
> - *   Set timeout value for state specified by name
> - */
> -int
> -ip_vs_set_state_timeout(int *table, int num, const char *const *names,
> - const char *name, int to)
> -{
> - int i;
> -
> - if (!table || !name || !to)
> - return -EINVAL;
> -
> - for (i = 0; i < num; i++) {
> - if (strcmp(names[i], name))
> - continue;
> - table[i] = to * HZ;
> - return 0;
> - }
> - return -ENOENT;
> -}
> -
> -
>  const char * ip_vs_state_name(__u16 proto, int state)
>  {
>   struct ip_vs_protocol *pp = ip_vs_proto_get(proto);
> -- 
> 2.9.3
>

Re: [GIT 0/3] Second Round of IPVS Updates for v4.12

2017-04-13 Thread Pablo Neira Ayuso

On Fri, Apr 14, 2017 at 08:51:19AM +0900, Simon Horman wrote:
> On Fri, Apr 14, 2017 at 01:01:34AM +0200, Pablo Neira Ayuso wrote:
> > Hi Simon,
> > 
> > On Mon, Apr 10, 2017 at 09:58:32AM -0700, Simon Horman wrote:
> > > Hi Pablo,
> > > 
> > > please consider these clean-ups and enhancements to IPVS for v4.12.
> > > 
> > > * Removal of unused variable
> > > * Use kzalloc where appropriate
> > > * More efficient detection of presence of NAT extension
> > > 
> > > 
> > > The following changes since commit 592d42ac7fd36408979e09bf2f170f2595dab7b8:
> > > 
> > >   Merge branch 'qed-IOV-cleanups' (2017-03-21 19:02:38 -0700)
> > > 
> > > are available in the git repository at:
> > > 
> > >   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> > 
> > This says:
> > 
> > $ git pull https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> > fatal: Couldn't find remote ref ipvs2-for-v4.12
> > 
> > I don't see any tag for this name in:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git/refs/tags
> 
> Sorry about that, it looks like I forgot to push the tag.
> It should be there now.

I'm hitting a conflict between this and what I have in nf-next.git.

Please have a look if you can; otherwise I will check tomorrow with a
fresher mind.

Re: [GIT 0/3] Second Round of IPVS Updates for v4.12

2017-04-13 Thread Simon Horman

On Fri, Apr 14, 2017 at 01:01:34AM +0200, Pablo Neira Ayuso wrote:
> Hi Simon,
> 
> On Mon, Apr 10, 2017 at 09:58:32AM -0700, Simon Horman wrote:
> > Hi Pablo,
> > 
> > please consider these clean-ups and enhancements to IPVS for v4.12.
> > 
> > * Removal of unused variable
> > * Use kzalloc where appropriate
> > * More efficient detection of presence of NAT extension
> > 
> > 
> > The following changes since commit 592d42ac7fd36408979e09bf2f170f2595dab7b8:
> > 
> >   Merge branch 'qed-IOV-cleanups' (2017-03-21 19:02:38 -0700)
> > 
> > are available in the git repository at:
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> 
> This says:
> 
> $ git pull https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
> fatal: Couldn't find remote ref ipvs2-for-v4.12
> 
> I don't see any tag for this name in:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git/refs/tags

Sorry about that, it looks like I forgot to push the tag.
It should be there now.

Re: [PATCH nf-next] ipset: remove unused function __ip_set_get_netlink

2017-04-13 Thread Pablo Neira Ayuso

On Mon, Apr 10, 2017 at 03:52:37PM -0400, Aaron Conole wrote:
> There are no in-tree callers.

@Jozsef, let me know if I should just take this to save you a pull
request.

Thanks.

> Signed-off-by: Aaron Conole 
> ---
>  net/netfilter/ipset/ip_set_core.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> index c296f9b..68ba531 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -501,14 +501,6 @@ __ip_set_put(struct ip_set *set)
>   * a separate reference counter
>   */
>  static inline void
> -__ip_set_get_netlink(struct ip_set *set)
> -{
> - write_lock_bh(_set_ref_lock);
> - set->ref_netlink++;
> - write_unlock_bh(_set_ref_lock);
> -}
> -
> -static inline void
>  __ip_set_put_netlink(struct ip_set *set)
>  {
>   write_lock_bh(_set_ref_lock);
> -- 
> 2.9.3

Re: [PATCH nf-next] nf_conntrack: remove double assignment

2017-04-13 Thread Pablo Neira Ayuso

On Wed, Apr 12, 2017 at 04:32:54PM -0400, Aaron Conole wrote:
> The protonet pointer will unconditionally be rewritten, so just do the
> needed assignment first.

Also applied, thanks.

Re: [PATCH nf-next] nf_tables: remove double return statement

2017-04-13 Thread Pablo Neira Ayuso

Applied, thanks.

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-13 Thread Florian Fainelli

On 04/13/2017 02:51 PM, Andrew Lunn wrote:
>> The DT binding is in tree and provides an example of what the switch
>> looks like. Below is the example, but I am also adding the MDIO bus and
>> the PHYs just so you can see how things wind up:
>>
>> switch_top@f0b0 {
>> compatible = "simple-bus";
>> #size-cells = <1>;
>> #address-cells = <1>;
>> ranges = <0 0xf0b0 0x40804>;
>>
>> ethernet_switch@0 {
>> compatible = "brcm,bcm7445-switch-v4.0";
>> #size-cells = <0>;
>> #address-cells = <1>;
>> reg = <0x0 0x4
>> 0x4 0x110
>> 0x40340 0x30
>> 0x40380 0x30
>> 0x40400 0x34
>> 0x40600 0x208>;
>> reg-names = "core", "reg", "intrl2_0", "intrl2_1",
>> "fcb", "acb";
>> interrupts = <0 0x18 0
>> 0 0x19 0>;
>> brcm,num-gphy = <1>;
>> brcm,num-rgmii-ports = <2>;
>> brcm,fcb-pause-override;
>> brcm,acb-packets-inflight;
>>
>> ports {
>> #address-cells = <1>;
>> #size-cells = <0>;
>>
>> port@0 {
>> label = "gphy";
>> reg = <0>;
>>  phy-handle = <>;
>> };
>>
>>  sw0port1: port@1 {
>>  label = "rgmii_1";
>>  reg = <1>;
>>  phy-mode = "rgmii";
>>  fixed-link {
>>  speed = <1000>;
>>  full-duplex;
>>  };
>>  };
>> };
>> };
>>
>>  mdio@403c0 {
>>  reg = <0x403c0 0x8 0x40300 0x18>;
>>  #address-cells = <0x1>;
>>  #size-cells = <0x0>;
>>  compatible = "brcm,unimac-mdio";
>>  reg-names = "mdio", "mdio_indir_rw";
>>
>>  switch: switch@0 {
>>  broken-turn-around;
>>  reg = <0x0>;
>>  compatible = "brcm,bcm53125";
>>  #address-cells = <1>;
>>  #size-cells = <0>;
>>
>>  ports {
>>  ..
>>  port@8 {
>>  ethernet = <>;
>>  };
>>  ...
>>  };
>>  };
>>
>>  phy5: ethernet-phy@5 {
>>  reg = <0x5>;
>>  compatible = "ethernet-phy-ieee802.3-c22";
>>  };
>>  };
>> };
> 
> So phy5 is connected to the internal switch with a phy-handle. But
> because of your double usage of this node, it also can be mapped into
> the external switch's port 5?
> 
> Is that your problem?

Kind of: it does translate into an invalid mapping, because the PHY
ends up in a bad state; see below.

The mapping per se is not the problem; the original problem I have is
that the PHY driver is probed twice. The double probing comes from the
switch driver being probed first (drivers/net/dsa/ comes before
drivers/net/ethernet) while it requires the master netdev to be running.

We need to turn on the Gigabit PHY clock in order to be able to read its
PHY OUI and map it to a driver (yes, a workaround could be to put its
exact compatible string in DT; that way there is no need for
get_phy_id()). We have a local change in mdio-bcm-unimac.c which does
exactly that (using the clock framework). Then, to avoid artificially
bumping the clock reference count, the BCM7xxx PHY driver checks in its
->probe() function whether the clock is enabled (yes, using
__clk_is_enabled, which it probably should not) and keeps the clock
turned on for the MDIO layer to successfully read/write from the PHY.
The BCM7xxx PHY driver does properly manage the clock, though, and turns
it off upon ->remove(). So we got probed and removed once, and because
of that first probe deferral, the clock is no longer enabled.

The second time around, when the slave MII bus probes us again, we go
through the BCM7xxx ->probe() and ->remove() callbacks again, but the
clock was already turned off due to the first probe that got deferred.

When the bcm_sf2 driver finally gets initialized, we try to attach to
this Gigabit PHY. The driver is there, good, but the clock is already
turned off, so the PHY no longer responds correctly at all and we
end up reading garbage.
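
The failure sequence described above can be modelled as a small state machine. This is a userspace sketch under stated assumptions: struct clk_model and the mdio_enable_clock(), phy_probe() and phy_remove() helpers are hypothetical stand-ins, not the kernel clock framework or the BCM7xxx driver. It only shows how a ->remove() that drops a reference ->probe() never took leaves the clock off for the next probe.

```c
#include <stdbool.h>

/* Hypothetical model of a gate clock with an enable count (not the real clk API). */
struct clk_model {
	int enable_count;
};

static bool clk_model_is_enabled(const struct clk_model *clk)
{
	return clk->enable_count > 0;
}

/* The MDIO bus driver turns the clock on once so the PHY ID can be read. */
static void mdio_enable_clock(struct clk_model *clk)
{
	clk->enable_count++;
}

/*
 * Modelled PHY ->probe(): takes no reference of its own, it only checks
 * that the clock is already on and "adopts" the MDIO layer's reference.
 * Returns true if the PHY is usable (clock running).
 */
static bool phy_probe(const struct clk_model *clk)
{
	return clk_model_is_enabled(clk);
}

/* Modelled PHY ->remove(): drops the adopted reference, turning the clock off. */
static void phy_remove(struct clk_model *clk)
{
	if (clk->enable_count > 0)
		clk->enable_count--;
}

/*
 * Replay the sequence from the mail: the first probe is deferred
 * (probe + remove), then the slave MII bus probes again.
 * Returns whether the second probe finds the clock running.
 */
static bool simulate_double_probe(void)
{
	struct clk_model clk = { .enable_count = 0 };

	mdio_enable_clock(&clk);  /* MDIO driver enables the clock once */
	phy_probe(&clk);          /* first probe, later deferred */
	phy_remove(&clk);         /* remove turns the clock off */
	return phy_probe(&clk);   /* second probe: clock is already off */
}
```

In this model the count drops to zero because remove releases a reference that probe never took; that mismatch, rather than the double probe by itself, is what leaves the PHY unclocked.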

> 
> It seems like you should add an mdio node inside your switch node, and
> list your external switch internal/external phys there if needed.

I think I am going to keep this hack.

Re: [GIT 0/3] Second Round of IPVS Updates for v4.12

2017-04-13 Thread Pablo Neira Ayuso

Hi Simon,

On Mon, Apr 10, 2017 at 09:58:32AM -0700, Simon Horman wrote:
> Hi Pablo,
> 
> please consider these clean-ups and enhancements to IPVS for v4.12.
> 
> * Removal of unused variable
> * Use kzalloc where appropriate
> * More efficient detection of presence of NAT extension
> 
> 
> The following changes since commit 592d42ac7fd36408979e09bf2f170f2595dab7b8:
> 
>   Merge branch 'qed-IOV-cleanups' (2017-03-21 19:02:38 -0700)
> 
> are available in the git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12

This says:

$ git pull https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs2-for-v4.12
fatal: Couldn't find remote ref ipvs2-for-v4.12

I don't see any tag with this name in:

https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git/refs/tags

RE: [PATCH v4] smsc95xx: Add comments to the registers definition

2017-04-13 Thread Woojung.Huh

> This chip is used by a lot of embedded devices and also by the Raspberry
> Pi 1, 2 & 3 which were created to promote the study of computer
> science. Students wanting to learn kernel / network device driver
> programming through those devices can only rely on the Linux kernel
> driver source to make their own.
> 
> This commit adds a lot of comments to the register definitions to expand
> the register names.
> 
> Cc: Steve Glendinning 
> Cc: Microchip Linux Driver Support 
> CC: David Miller 
> Signed-off-by: Martin Wetterwald 
> Reviewed-by: Andrew Lunn 
> Acked-by: Steve Glendinning 

Acked-by: Woojung Huh 

Woojung

[PATCH 15/22] scsi: libfc, csiostor: Change to sg_copy_buffer in two drivers

2017-04-13 Thread Logan Gunthorpe

These two drivers appear to duplicate the functionality of
sg_copy_buffer, so we clean them up to use the common code.

This helps us remove a couple of instances that would otherwise be
slightly tricky sg_map usages.

Signed-off-by: Logan Gunthorpe 
---
 drivers/scsi/csiostor/csio_scsi.c | 54 +++
 drivers/scsi/libfc/fc_libfc.c | 49 ---
 2 files changed, 14 insertions(+), 89 deletions(-)

diff --git a/drivers/scsi/csiostor/csio_scsi.c b/drivers/scsi/csiostor/csio_scsi.c
index a1ff75f..bd9d062 100644
--- a/drivers/scsi/csiostor/csio_scsi.c
+++ b/drivers/scsi/csiostor/csio_scsi.c
@@ -1489,60 +1489,14 @@ static inline uint32_t
 csio_scsi_copy_to_sgl(struct csio_hw *hw, struct csio_ioreq *req)
 {
struct scsi_cmnd *scmnd  = (struct scsi_cmnd *)csio_scsi_cmnd(req);
-   struct scatterlist *sg;
-   uint32_t bytes_left;
-   uint32_t bytes_copy;
-   uint32_t buf_off = 0;
-   uint32_t start_off = 0;
-   uint32_t sg_off = 0;
-   void *sg_addr;
-   void *buf_addr;
struct csio_dma_buf *dma_buf;
+   size_t copied;
 
-   bytes_left = scsi_bufflen(scmnd);
-   sg = scsi_sglist(scmnd);
dma_buf = (struct csio_dma_buf *)csio_list_next(>gen_list);
+   copied = sg_copy_from_buffer(scsi_sglist(scmnd), scsi_sg_count(scmnd),
+dma_buf->vaddr, scsi_bufflen(scmnd));
 
-   /* Copy data from driver buffer to SGs of SCSI CMD */
-   while (bytes_left > 0 && sg && dma_buf) {
-   if (buf_off >= dma_buf->len) {
-   buf_off = 0;
-   dma_buf = (struct csio_dma_buf *)
-   csio_list_next(dma_buf);
-   continue;
-   }
-
-   if (start_off >= sg->length) {
-   start_off -= sg->length;
-   sg = sg_next(sg);
-   continue;
-   }
-
-   buf_addr = dma_buf->vaddr + buf_off;
-   sg_off = sg->offset + start_off;
-   bytes_copy = min((dma_buf->len - buf_off),
-   sg->length - start_off);
-   bytes_copy = min((uint32_t)(PAGE_SIZE - (sg_off & ~PAGE_MASK)),
-bytes_copy);
-
-   sg_addr = kmap_atomic(sg_page(sg) + (sg_off >> PAGE_SHIFT));
-   if (!sg_addr) {
-   csio_err(hw, "failed to kmap sg:%p of ioreq:%p\n",
-   sg, req);
-   break;
-   }
-
-   csio_dbg(hw, "copy_to_sgl:sg_addr %p sg_off %d buf %p len %d\n",
-   sg_addr, sg_off, buf_addr, bytes_copy);
-   memcpy(sg_addr + (sg_off & ~PAGE_MASK), buf_addr, bytes_copy);
-   kunmap_atomic(sg_addr);
-
-   start_off +=  bytes_copy;
-   buf_off += bytes_copy;
-   bytes_left -= bytes_copy;
-   }
-
-   if (bytes_left > 0)
+   if (copied != scsi_bufflen(scmnd))
return DID_ERROR;
else
return DID_OK;
diff --git a/drivers/scsi/libfc/fc_libfc.c b/drivers/scsi/libfc/fc_libfc.c
index d623d08..ce0805a 100644
--- a/drivers/scsi/libfc/fc_libfc.c
+++ b/drivers/scsi/libfc/fc_libfc.c
@@ -113,45 +113,16 @@ u32 fc_copy_buffer_to_sglist(void *buf, size_t len,
 u32 *nents, size_t *offset,
 u32 *crc)
 {
-   size_t remaining = len;
-   u32 copy_len = 0;
-
-   while (remaining > 0 && sg) {
-   size_t off, sg_bytes;
-   void *page_addr;
-
-   if (*offset >= sg->length) {
-   /*
-* Check for end and drop resources
-* from the last iteration.
-*/
-   if (!(*nents))
-   break;
-   --(*nents);
-   *offset -= sg->length;
-   sg = sg_next(sg);
-   continue;
-   }
-   sg_bytes = min(remaining, sg->length - *offset);
-
-   /*
-* The scatterlist item may be bigger than PAGE_SIZE,
-* but we are limited to mapping PAGE_SIZE at a time.
-*/
-   off = *offset + sg->offset;
-   sg_bytes = min(sg_bytes,
-  (size_t)(PAGE_SIZE - (off & ~PAGE_MASK)));
-   page_addr = kmap_atomic(sg_page(sg) + (off >> PAGE_SHIFT));
-   if (crc)
-   *crc = crc32(*crc, buf, sg_bytes);
-   memcpy((char *)page_addr + (off & ~PAGE_MASK), buf, sg_bytes);
-   kunmap_atomic(page_addr);
-   buf += sg_bytes;
-   *offset += sg_bytes;
-   remaining -= sg_bytes;
-
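
To make the helper's contract concrete, here is a userspace sketch of what an sg_copy_from_buffer()-style copy does: walk the entries in order, copy from one linear buffer until either the buffer or the list runs out, and return the number of bytes copied. struct seg and copy_from_buffer_to_segs() are hypothetical simplifications (plain pointers instead of pages, no kmap), not the kernel implementation; the byte count is what lets csio_scsi_copy_to_sgl() compare copied against scsi_bufflen().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified scatterlist entry: a plain pointer and a length. */
struct seg {
	unsigned char *addr;
	size_t length;
};

/*
 * Copy up to buflen bytes from buf into the nents segments, in order.
 * Returns the number of bytes actually copied, which is smaller than
 * buflen when the segments cannot hold the whole buffer.
 */
static size_t copy_from_buffer_to_segs(struct seg *segs, size_t nents,
				       const void *buf, size_t buflen)
{
	const unsigned char *src = buf;
	size_t copied = 0;

	for (size_t i = 0; i < nents && copied < buflen; i++) {
		size_t n = segs[i].length;

		if (n > buflen - copied)
			n = buflen - copied;
		memcpy(segs[i].addr, src + copied, n);
		copied += n;
	}
	return copied;
}
```

A short copy (return value below buflen) is the condition the converted driver maps to DID_ERROR.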

[PATCH 12/22] scsi: ipr, pmcraid, isci: Make use of the new sg_map helper in 4 call sites

2017-04-13 Thread Logan Gunthorpe

Very straightforward conversion of three SCSI drivers.

Signed-off-by: Logan Gunthorpe 
---
 drivers/scsi/ipr.c  | 27 ++-
 drivers/scsi/isci/request.c | 42 +-
 drivers/scsi/pmcraid.c  | 19 ---
 3 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
index b29afaf..f98f251 100644
--- a/drivers/scsi/ipr.c
+++ b/drivers/scsi/ipr.c
@@ -3853,7 +3853,7 @@ static void ipr_free_ucode_buffer(struct ipr_sglist *sglist)
 static int ipr_copy_ucode_buffer(struct ipr_sglist *sglist,
 u8 *buffer, u32 len)
 {
-   int bsize_elem, i, result = 0;
+   int bsize_elem, i;
struct scatterlist *scatterlist;
void *kaddr;
 
@@ -3863,32 +3863,33 @@ static int ipr_copy_ucode_buffer(struct ipr_sglist *sglist,
scatterlist = sglist->scatterlist;
 
for (i = 0; i < (len / bsize_elem); i++, buffer += bsize_elem) {
-   struct page *page = sg_page([i]);
+   kaddr = sg_map([i], SG_KMAP);
+   if (IS_ERR(kaddr)) {
+   ipr_trace;
+   return PTR_ERR(kaddr);
+   }
 
-   kaddr = kmap(page);
memcpy(kaddr, buffer, bsize_elem);
-   kunmap(page);
+   sg_unmap([i], kaddr, SG_KMAP);
 
scatterlist[i].length = bsize_elem;
-
-   if (result != 0) {
-   ipr_trace;
-   return result;
-   }
}
 
if (len % bsize_elem) {
-   struct page *page = sg_page([i]);
+   kaddr = sg_map([i], SG_KMAP);
+   if (IS_ERR(kaddr)) {
+   ipr_trace;
+   return PTR_ERR(kaddr);
+   }
 
-   kaddr = kmap(page);
memcpy(kaddr, buffer, len % bsize_elem);
-   kunmap(page);
+   sg_unmap([i], kaddr, SG_KMAP);
 
scatterlist[i].length = len % bsize_elem;
}
 
sglist->buffer_len = len;
-   return result;
+   return 0;
 }
 
 /**
diff --git a/drivers/scsi/isci/request.c b/drivers/scsi/isci/request.c
index 47f66e9..66d6596 100644
--- a/drivers/scsi/isci/request.c
+++ b/drivers/scsi/isci/request.c
@@ -1424,12 +1424,14 @@ sci_stp_request_pio_data_in_copy_data_buffer(struct isci_stp_request *stp_req,
sg = task->scatter;
 
while (total_len > 0) {
-   struct page *page = sg_page(sg);
-
copy_len = min_t(int, total_len, sg_dma_len(sg));
-   kaddr = kmap_atomic(page);
-   memcpy(kaddr + sg->offset, src_addr, copy_len);
-   kunmap_atomic(kaddr);
+   kaddr = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(kaddr))
+   return SCI_FAILURE;
+
+   memcpy(kaddr, src_addr, copy_len);
+   sg_unmap(sg, kaddr, SG_KMAP_ATOMIC);
+
total_len -= copy_len;
src_addr += copy_len;
sg = sg_next(sg);
@@ -1771,14 +1773,16 @@ sci_io_request_frame_handler(struct isci_request *ireq,
case SCI_REQ_SMP_WAIT_RESP: {
struct sas_task *task = isci_request_access_task(ireq);
struct scatterlist *sg = >smp_task.smp_resp;
-   void *frame_header, *kaddr;
+   void *frame_header;
u8 *rsp;
 
sci_unsolicited_frame_control_get_header(>uf_control,
 frame_index,
 _header);
-   kaddr = kmap_atomic(sg_page(sg));
-   rsp = kaddr + sg->offset;
+   rsp = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(rsp))
+   return SCI_FAILURE;
+
sci_swab32_cpy(rsp, frame_header, 1);
 
if (rsp[0] == SMP_RESPONSE) {
@@ -1814,7 +1818,7 @@ sci_io_request_frame_handler(struct isci_request *ireq,
ireq->sci_status = SCI_FAILURE_CONTROLLER_SPECIFIC_IO_ERR;
sci_change_state(>sm, SCI_REQ_COMPLETED);
}
-   kunmap_atomic(kaddr);
+   sg_unmap(sg, rsp, SG_KMAP_ATOMIC);
 
sci_controller_release_frame(ihost, frame_index);
 
@@ -2919,15 +2923,18 @@ static void isci_request_io_request_complete(struct isci_host *ihost,
case SAS_PROTOCOL_SMP: {
struct scatterlist *sg = >smp_task.smp_req;
struct smp_req *smp_req;
-   void *kaddr;
 
dma_unmap_sg(>pdev->dev, sg, 1, DMA_TO_DEVICE);
 
/* need to swab it back in case the command buffer is re-used */
-
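
The conversions in this series rely on the kernel's error-pointer convention: sg_map() returns either a usable address or an errno encoded in the pointer, tested with IS_ERR() and decoded with PTR_ERR(). The sketch below re-creates that idiom in userspace, with MAX_ERRNO set to 4095 as in include/linux/err.h; fake_sg_map() is a hypothetical stand-in for sg_map(), not its real implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the include/linux/err.h idiom. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* The top MAX_ERRNO addresses are reserved for encoded error values. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical stand-in for sg_map(): returns base + offset on success,
 * or ERR_PTR(-EINVAL) when there is no usable mapping.
 */
static void *fake_sg_map(unsigned char *base, size_t offset)
{
	if (base == NULL)
		return ERR_PTR(-EINVAL);
	return base + offset;
}
```

Callers then follow the shape used in the ipr hunk: if (IS_ERR(kaddr)) return PTR_ERR(kaddr);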

[PATCH 10/22] staging: unisys: visorbus: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Straightforward conversion to the new function.

Signed-off-by: Logan Gunthorpe 
---
 drivers/staging/unisys/visorhba/visorhba_main.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/unisys/visorhba/visorhba_main.c b/drivers/staging/unisys/visorhba/visorhba_main.c
index 0ce92c8..2d8c8bc 100644
--- a/drivers/staging/unisys/visorhba/visorhba_main.c
+++ b/drivers/staging/unisys/visorhba/visorhba_main.c
@@ -842,7 +842,6 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct scsi_cmnd *scsicmd)
struct scatterlist *sg;
unsigned int i;
char *this_page;
-   char *this_page_orig;
int bufind = 0;
struct visordisk_info *vdisk;
struct visorhba_devdata *devdata;
@@ -869,11 +868,14 @@ do_scsi_nolinuxstat(struct uiscmdrsp *cmdrsp, struct scsi_cmnd *scsicmd)
 
sg = scsi_sglist(scsicmd);
for (i = 0; i < scsi_sg_count(scsicmd); i++) {
-   this_page_orig = kmap_atomic(sg_page(sg + i));
-   this_page = (void *)((unsigned long)this_page_orig |
-sg[i].offset);
+   this_page = sg_map(sg + i, SG_KMAP_ATOMIC);
+   if (IS_ERR(this_page)) {
+   scsicmd->result = DID_ERROR << 16;
+   return;
+   }
+
memcpy(this_page, buf + bufind, sg[i].length);
-   kunmap_atomic(this_page_orig);
+   sg_unmap(sg + i, this_page, SG_KMAP_ATOMIC);
}
} else {
devdata = (struct visorhba_devdata *)scsidev->host->hostdata;
-- 
2.1.4
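
The removed visorhba code built the data pointer with (unsigned long)this_page_orig | sg[i].offset instead of an addition. That only works because kmap_atomic() returns a page-aligned address and the offset is smaller than the page size, so the low bits being OR-ed in are all zero; since sg_map() returns the already-offset address, the trick is no longer needed. A small check of that equivalence, assuming 4096-byte pages (page size is architecture-dependent) and hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

static bool is_page_aligned(uintptr_t addr)
{
	return (addr & (ASSUMED_PAGE_SIZE - 1)) == 0;
}

/*
 * Returns true when OR-ing the offset into the base gives the same
 * address as adding it. This holds for a page-aligned base and an
 * in-page offset, because the base's low bits are all zero.
 */
static bool or_equals_add(uintptr_t page_base, uintptr_t offset)
{
	return (page_base | offset) == (page_base + offset);
}
```

With an unaligned base the two results diverge, which is why the OR form silently depended on the kmap alignment.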

[PATCH 03/22] libiscsi: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Convert the kmap and kmap_atomic uses to the sg_map function. We now
store the flags for the kmap instead of a boolean to indicate
atomicity. We also propagate a possible kmap error down and create
a new ISCSI_TCP_INTERNAL_ERR error type for this.

Signed-off-by: Logan Gunthorpe 
---
 drivers/scsi/cxgbi/libcxgbi.c |  5 +
 drivers/scsi/libiscsi_tcp.c   | 32 
 include/scsi/libiscsi_tcp.h   |  3 ++-
 3 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index bd7d39e..e38d0c1 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -1556,6 +1556,11 @@ static inline int read_pdu_skb(struct iscsi_conn *conn,
 */
iscsi_conn_printk(KERN_ERR, conn, "Invalid pdu or skb.");
return -EFAULT;
+   case ISCSI_TCP_INTERNAL_ERR:
+   pr_info("skb 0x%p, off %u, %d, TCP_INTERNAL_ERR.\n",
+   skb, offset, offloaded);
+   iscsi_conn_printk(KERN_ERR, conn, "Internal error.");
+   return -EFAULT;
case ISCSI_TCP_SEGMENT_DONE:
log_debug(1 << CXGBI_DBG_PDU_RX,
"skb 0x%p, off %u, %d, TCP_SEG_DONE, rc %d.\n",
diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index 63a1d69..a2427699 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -133,25 +133,23 @@ static void iscsi_tcp_segment_map(struct iscsi_segment *segment, int recv)
if (page_count(sg_page(sg)) >= 1 && !recv)
return;
 
-   if (recv) {
-   segment->atomic_mapped = true;
-   segment->sg_mapped = kmap_atomic(sg_page(sg));
-   } else {
-   segment->atomic_mapped = false;
-   /* the xmit path can sleep with the page mapped so use kmap */
-   segment->sg_mapped = kmap(sg_page(sg));
+   /* the xmit path can sleep with the page mapped so don't use atomic */
+   segment->sg_map_flags = recv ? SG_KMAP_ATOMIC : SG_KMAP;
+   segment->sg_mapped = sg_map(sg, segment->sg_map_flags);
+
+   if (IS_ERR(segment->sg_mapped)) {
+   segment->sg_mapped = NULL;
+   return;
}
 
-   segment->data = segment->sg_mapped + sg->offset + segment->sg_offset;
+   segment->data = segment->sg_mapped + segment->sg_offset;
 }
 
 void iscsi_tcp_segment_unmap(struct iscsi_segment *segment)
 {
if (segment->sg_mapped) {
-   if (segment->atomic_mapped)
-   kunmap_atomic(segment->sg_mapped);
-   else
-   kunmap(sg_page(segment->sg));
+   sg_unmap(segment->sg, segment->sg_mapped,
+ segment->sg_map_flags);
segment->sg_mapped = NULL;
segment->data = NULL;
}
@@ -304,6 +302,9 @@ iscsi_tcp_segment_recv(struct iscsi_tcp_conn *tcp_conn,
break;
}
 
+   if (!segment->data)
+   return -EFAULT;
+
copy = min(len - copied, segment->size - segment->copied);
ISCSI_DBG_TCP(tcp_conn->iscsi_conn, "copying %d\n", copy);
memcpy(segment->data + segment->copied, ptr + copied, copy);
@@ -927,6 +928,13 @@ int iscsi_tcp_recv_skb(struct iscsi_conn *conn, struct sk_buff *skb,
  avail);
rc = iscsi_tcp_segment_recv(tcp_conn, segment, ptr, avail);
BUG_ON(rc == 0);
+   if (rc < 0) {
+   ISCSI_DBG_TCP(conn, "memory fault. Consumed %d\n",
+ consumed);
+   *status = ISCSI_TCP_INTERNAL_ERR;
+   goto skb_done;
+   }
+
consumed += rc;
 
if (segment->total_copied >= segment->total_size) {
diff --git a/include/scsi/libiscsi_tcp.h b/include/scsi/libiscsi_tcp.h
index 30520d5..58c79af 100644
--- a/include/scsi/libiscsi_tcp.h
+++ b/include/scsi/libiscsi_tcp.h
@@ -47,7 +47,7 @@ struct iscsi_segment {
struct scatterlist  *sg;
void*sg_mapped;
unsigned intsg_offset;
-   boolatomic_mapped;
+   int sg_map_flags;
 
iscsi_segment_done_fn_t *done;
 };
@@ -92,6 +92,7 @@ enum {
ISCSI_TCP_SKB_DONE, /* skb is out of data */
ISCSI_TCP_CONN_ERR, /* iscsi layer has fired a conn err */
ISCSI_TCP_SUSPENDED,/* conn is suspended */
+   ISCSI_TCP_INTERNAL_ERR, /* an internal error occurred */
 };
 
 extern void iscsi_tcp_hdr_recv_prep(struct iscsi_tcp_conn *tcp_conn);
-- 
2.1.4
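
Recording the mapping mode, rather than an atomic_mapped boolean, lets the unmap side replay whatever mode the map side chose. The sketch below is a hypothetical userspace model of that pattern: the enum values only mirror the patch's SG_KMAP/SG_KMAP_ATOMIC names, and two counters stand in for the real kmap and kmap_atomic calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mapping modes mirroring SG_KMAP / SG_KMAP_ATOMIC in the patch. */
enum map_mode { MODE_KMAP = 1, MODE_KMAP_ATOMIC = 2 };

struct segment_model {
	void *mapped;   /* non-NULL while mapped */
	int map_flags;  /* mode recorded at map time */
};

/* Counters standing in for the two kinds of live mappings. */
static int live_kmap, live_kmap_atomic;

static void segment_map(struct segment_model *seg, void *page, bool recv)
{
	/* Per the patch comment, xmit may sleep while mapped, so only recv is atomic. */
	seg->map_flags = recv ? MODE_KMAP_ATOMIC : MODE_KMAP;
	if (seg->map_flags == MODE_KMAP_ATOMIC)
		live_kmap_atomic++;
	else
		live_kmap++;
	seg->mapped = page;
}

static void segment_unmap(struct segment_model *seg)
{
	if (!seg->mapped)
		return;
	/* Undo exactly the mode chosen in segment_map(). */
	if (seg->map_flags == MODE_KMAP_ATOMIC)
		live_kmap_atomic--;
	else
		live_kmap--;
	seg->mapped = NULL;
}
```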

[PATCH 07/22] crypto: shash, caam: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Very straightforward conversion to the new function in two crypto
drivers.

Signed-off-by: Logan Gunthorpe 
---
 crypto/shash.c| 9 ++---
 drivers/crypto/caam/caamalg.c | 8 +++-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/crypto/shash.c b/crypto/shash.c
index 5e31c8d..2b7de94 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -283,10 +283,13 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc)
if (nbytes < min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset)) {
void *data;
 
-   data = kmap_atomic(sg_page(sg));
-   err = crypto_shash_digest(desc, data + offset, nbytes,
+   data = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(data))
+   return PTR_ERR(data);
+
+   err = crypto_shash_digest(desc, data, nbytes,
  req->result);
-   kunmap_atomic(data);
+   sg_unmap(sg, data, SG_KMAP_ATOMIC);
crypto_yield(desc->flags);
} else
err = crypto_shash_init(desc) ?:
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 9bc80eb..76b97de 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -89,7 +89,6 @@ static void dbg_dump_sg(const char *level, const char *prefix_str,
struct scatterlist *sg, size_t tlen, bool ascii)
 {
struct scatterlist *it;
-   void *it_page;
size_t len;
void *buf;
 
@@ -98,19 +97,18 @@ static void dbg_dump_sg(const char *level, const char *prefix_str,
 * make sure the scatterlist's page
 * has a valid virtual memory mapping
 */
-   it_page = kmap_atomic(sg_page(it));
-   if (unlikely(!it_page)) {
+   buf = sg_map(it, SG_KMAP_ATOMIC);
+   if (IS_ERR(buf)) {
printk(KERN_ERR "dbg_dump_sg: kmap failed\n");
return;
}
 
-   buf = it_page + it->offset;
len = min_t(size_t, tlen, it->length);
print_hex_dump(level, prefix_str, prefix_type, rowsize,
   groupsize, buf, len, ascii);
tlen -= len;
 
-   kunmap_atomic(it_page);
+   sg_unmap(it, buf, SG_KMAP_ATOMIC);
}
 }
 #endif
-- 
2.1.4
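
In shash_ahash_digest() the single-mapping fast path is taken only when the request lies inside the first scatterlist entry and does not run past the end of its page, which is what the nbytes < min(sg->length, PAGE_SIZE - offset) test checks; after the conversion the mapped pointer already includes the offset, so data is passed directly. A userspace restatement of the predicate, assuming 4096-byte pages and a hypothetical helper name:

```c
#include <stdbool.h>

#define ASSUMED_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Hypothetical restatement of the fast-path test in shash_ahash_digest():
 * nbytes must fit both inside the first sg entry and inside the rest of
 * the page that starts at 'offset' (the entry's in-page offset).
 */
static bool digest_fits_one_mapping(unsigned int nbytes, unsigned int sg_length,
				    unsigned int offset)
{
	unsigned int room_in_page = ASSUMED_PAGE_SIZE - offset;
	unsigned int limit = sg_length < room_in_page ? sg_length : room_in_page;

	return nbytes < limit;
}
```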

[PATCH 04/22] target: Make use of the new sg_map function at 16 call sites

2017-04-13 Thread Logan Gunthorpe

Fairly straightforward conversions in all spots. In a couple of cases,
any error gets propagated up should sg_map fail. In other cases, a
warning is issued if the kmap fails, since there's no clear error path.
This should not be an issue until someone tries to use unmappable
memory in the sgl with this driver.

Signed-off-by: Logan Gunthorpe 
---
 drivers/target/iscsi/iscsi_target.c|  27 +---
 drivers/target/target_core_rd.c|   3 +-
 drivers/target/target_core_sbc.c   | 122 +++--
 drivers/target/target_core_transport.c |  18 +++--
 drivers/target/target_core_user.c  |  43 
 include/target/target_core_backend.h   |   4 +-
 6 files changed, 149 insertions(+), 68 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index a918024..e3e0d8f 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -579,7 +579,7 @@ iscsit_xmit_nondatain_pdu(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 }
 
 static int iscsit_map_iovec(struct iscsi_cmd *, struct kvec *, u32, u32);
-static void iscsit_unmap_iovec(struct iscsi_cmd *);
+static void iscsit_unmap_iovec(struct iscsi_cmd *, struct kvec *);
 static u32 iscsit_do_crypto_hash_sg(struct ahash_request *, struct iscsi_cmd *,
u32, u32, u32, u8 *);
 static int
@@ -646,7 +646,7 @@ iscsit_xmit_datain_pdu(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
ret = iscsit_fe_sendpage_sg(cmd, conn);
 
-   iscsit_unmap_iovec(cmd);
+   iscsit_unmap_iovec(cmd, >iov_data[1]);
 
if (ret < 0) {
iscsit_tx_thread_wait_for_tcp(conn);
@@ -925,7 +925,10 @@ static int iscsit_map_iovec(
while (data_length) {
u32 cur_len = min_t(u32, data_length, sg->length - page_off);
 
-   iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off;
+   iov[i].iov_base = sg_map_offset(sg, page_off, SG_KMAP);
+   if (IS_ERR(iov[i].iov_base))
+   goto map_err;
+
iov[i].iov_len = cur_len;
 
data_length -= cur_len;
@@ -937,17 +940,25 @@ static int iscsit_map_iovec(
cmd->kmapped_nents = i;
 
return i;
+
+map_err:
+   cmd->kmapped_nents = i;
+   iscsit_unmap_iovec(cmd, iov);
+   return -1;
 }
 
-static void iscsit_unmap_iovec(struct iscsi_cmd *cmd)
+static void iscsit_unmap_iovec(struct iscsi_cmd *cmd, struct kvec *iov)
 {
u32 i;
struct scatterlist *sg;
+   unsigned int page_off = cmd->first_data_sg_off;
 
sg = cmd->first_data_sg;
 
-   for (i = 0; i < cmd->kmapped_nents; i++)
-   kunmap(sg_page([i]));
+   for (i = 0; i < cmd->kmapped_nents; i++) {
+   sg_unmap_offset([i], iov[i].iov_base, page_off, SG_KMAP);
+   page_off = 0;
+   }
 }
 
 static void iscsit_ack_from_expstatsn(struct iscsi_conn *conn, u32 exp_statsn)
@@ -1610,7 +1621,7 @@ iscsit_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
rx_got = rx_data(conn, >iov_data[0], iov_count, rx_size);
 
-   iscsit_unmap_iovec(cmd);
+   iscsit_unmap_iovec(cmd, iov);
 
if (rx_got != rx_size)
return -1;
@@ -2626,7 +2637,7 @@ static int iscsit_handle_immediate_data(
 
rx_got = rx_data(conn, >iov_data[0], iov_count, rx_size);
 
-   iscsit_unmap_iovec(cmd);
+   iscsit_unmap_iovec(cmd, cmd->iov_data);
 
if (rx_got != rx_size) {
iscsit_rx_thread_wait_for_tcp(conn);
diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c
index ddc216c..22c5ad5 100644
--- a/drivers/target/target_core_rd.c
+++ b/drivers/target/target_core_rd.c
@@ -431,7 +431,8 @@ static sense_reason_t rd_do_prot_rw(struct se_cmd *cmd, bool is_read)
cmd->t_prot_sg, 0);
 
if (!rc)
-   sbc_dif_copy_prot(cmd, sectors, is_read, prot_sg, prot_offset);
+   rc = sbc_dif_copy_prot(cmd, sectors, is_read, prot_sg,
+  prot_offset);
 
return rc;
 }
diff --git a/drivers/target/target_core_sbc.c b/drivers/target/target_core_sbc.c
index c194063..67cb420 100644
--- a/drivers/target/target_core_sbc.c
+++ b/drivers/target/target_core_sbc.c
@@ -420,17 +420,17 @@ static sense_reason_t xdreadwrite_callback(struct se_cmd *cmd, bool success,
 
offset = 0;
for_each_sg(cmd->t_bidi_data_sg, sg, cmd->t_bidi_data_nents, count) {
-   addr = kmap_atomic(sg_page(sg));
-   if (!addr) {
+   addr = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(addr)) {
ret = TCM_OUT_OF_RESOURCES;
goto out;
}
 
for (i = 0; i < sg->length; i++)
-   *(addr + sg->offset + i) ^= *(buf + offset + i);
+
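
The new error path in iscsit_map_iovec() has to undo exactly the mappings that succeeded before the failure: when entry i fails, entries 0 through i - 1 are live, which is i mappings. A userspace sketch of that map-or-unwind pattern, with hypothetical try_map()/unmap_one() helpers that only count live mappings in place of the real sg_map_offset()/sg_unmap_offset() calls:

```c
#include <stdbool.h>
#include <stddef.h>

static int live_mappings;  /* number of currently mapped entries */

/* Hypothetical mapper: entry 'fail_at' cannot be mapped. */
static bool try_map(size_t idx, size_t fail_at)
{
	if (idx == fail_at)
		return false;
	live_mappings++;
	return true;
}

static void unmap_one(void)
{
	live_mappings--;
}

/*
 * Map n entries. On success return the number mapped; on failure unmap
 * every entry that was already mapped and return -1.
 */
static int map_all_or_unwind(size_t n, size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!try_map(i, fail_at))
			goto unwind;
	}
	return (int)i;

unwind:
	/* Entries 0 .. i-1 succeeded: release all i of them. */
	while (i-- > 0)
		unmap_one();
	return -1;
}
```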

[PATCH 18/22] mmc: spi: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

We use the sg_map helper, but it's slightly more complicated because
we only check for the error when the mapping actually gets used, so
that if the mapping failed but wasn't needed, no error occurs.

Signed-off-by: Logan Gunthorpe 
---
 drivers/mmc/host/mmc_spi.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/mmc/host/mmc_spi.c b/drivers/mmc/host/mmc_spi.c
index e77d79c..82f786d 100644
--- a/drivers/mmc/host/mmc_spi.c
+++ b/drivers/mmc/host/mmc_spi.c
@@ -676,9 +676,15 @@ mmc_spi_writeblock(struct mmc_spi_host *host, struct spi_transfer *t,
struct scratch  *scratch = host->data;
u32 pattern;
 
-   if (host->mmc->use_spi_crc)
+   if (host->mmc->use_spi_crc) {
+   if (IS_ERR(t->tx_buf))
+   return PTR_ERR(t->tx_buf);
+
scratch->crc_val = cpu_to_be16(
crc_itu_t(0, t->tx_buf, t->len));
+   t->tx_buf += t->len;
+   }
+
if (host->dma_dev)
dma_sync_single_for_device(host->dma_dev,
host->data_dma, sizeof(*scratch),
@@ -743,7 +749,6 @@ mmc_spi_writeblock(struct mmc_spi_host *host, struct spi_transfer *t,
return status;
}
 
-   t->tx_buf += t->len;
if (host->dma_dev)
t->tx_dma += t->len;
 
@@ -809,6 +814,11 @@ mmc_spi_readblock(struct mmc_spi_host *host, struct spi_transfer *t,
}
leftover = status << 1;
 
+   if (bitshift || host->mmc->use_spi_crc) {
+   if (IS_ERR(t->rx_buf))
+   return PTR_ERR(t->rx_buf);
+   }
+
if (host->dma_dev) {
dma_sync_single_for_device(host->dma_dev,
host->data_dma, sizeof(*scratch),
@@ -860,9 +870,10 @@ mmc_spi_readblock(struct mmc_spi_host *host, struct spi_transfer *t,
scratch->crc_val, crc, t->len);
return -EILSEQ;
}
+
+   t->rx_buf += t->len;
}
 
-   t->rx_buf += t->len;
if (host->dma_dev)
t->rx_dma += t->len;
 
@@ -936,11 +947,11 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd,
}
 
/* allow pio too; we don't allow highmem */
-   kmap_addr = kmap(sg_page(sg));
+   kmap_addr = sg_map(sg, SG_KMAP);
if (direction == DMA_TO_DEVICE)
-   t->tx_buf = kmap_addr + sg->offset;
+   t->tx_buf = kmap_addr;
else
-   t->rx_buf = kmap_addr + sg->offset;
+   t->rx_buf = kmap_addr;
 
/* transfer each block, and update request status */
while (length) {
@@ -970,7 +981,8 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd,
/* discard mappings */
if (direction == DMA_FROM_DEVICE)
flush_kernel_dcache_page(sg_page(sg));
-   kunmap(sg_page(sg));
+   if (!IS_ERR(kmap_addr))
+   sg_unmap(sg, kmap_addr, SG_KMAP);
if (dma_dev)
dma_unmap_page(dma_dev, dma_addr, PAGE_SIZE, dir);
 
-- 
2.1.4
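
The mmc_spi conversion keeps a possibly invalid mapping around and fails only on the paths where the CPU actually touches the data (CRC computation or bit-shifting), skipping the unmap when the map failed. A hypothetical userspace sketch of that "check on use" pattern, with a minimal IS_ERR()/ERR_PTR()/PTR_ERR() trio modelled on include/linux/err.h and a simple byte sum standing in for crc_itu_t():

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal error-pointer helpers modelled on include/linux/err.h. */
#define MAX_ERRNO 4095
#define ERR_PTR(e) ((void *)(intptr_t)(e))
#define PTR_ERR(p) ((long)(intptr_t)(p))
#define IS_ERR(p)  ((uintptr_t)(p) >= (uintptr_t)-MAX_ERRNO)

/*
 * Hypothetical transfer step: buf may hold an encoded error, but it is
 * only checked when a checksum is requested and the data must be read.
 * Returns 0 or a negative errno.
 */
static long write_block(const unsigned char *buf, size_t len,
			bool want_crc, unsigned int *crc_out)
{
	if (want_crc) {
		if (IS_ERR(buf))
			return PTR_ERR(buf);

		unsigned int sum = 0;
		for (size_t i = 0; i < len; i++)
			sum += buf[i];  /* stand-in for crc_itu_t() */
		*crc_out = sum;
	}
	/* In this model the no-CRC path never dereferences buf. */
	return 0;
}
```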

[PATCH 22/22] memstick: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Straightforward conversion, but we have to WARN if unmappable
memory finds its way into the sgl.

Signed-off-by: Logan Gunthorpe 
---
 drivers/memstick/host/jmb38x_ms.c | 23 ++-
 drivers/memstick/host/tifm_ms.c   | 22 +-
 2 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/memstick/host/jmb38x_ms.c b/drivers/memstick/host/jmb38x_ms.c
index 48db922..256cf41 100644
--- a/drivers/memstick/host/jmb38x_ms.c
+++ b/drivers/memstick/host/jmb38x_ms.c
@@ -303,7 +303,6 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host)
unsigned int off;
unsigned int t_size, p_cnt;
unsigned char *buf;
-   struct page *pg;
unsigned long flags = 0;
 
if (host->req->long_data) {
@@ -318,14 +317,26 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host)
unsigned int uninitialized_var(p_off);
 
if (host->req->long_data) {
-   pg = nth_page(sg_page(>req->sg),
- off >> PAGE_SHIFT);
p_off = offset_in_page(off);
p_cnt = PAGE_SIZE - p_off;
p_cnt = min(p_cnt, length);
 
local_irq_save(flags);
-   buf = kmap_atomic(pg) + p_off;
+   buf = sg_map_offset(>req->sg,
+off - host->req->sg.offset,
+SG_KMAP_ATOMIC);
+   if (IS_ERR(buf)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   break;
+   }
+
} else {
buf = host->req->data + host->block_pos;
p_cnt = host->req->data_len - host->block_pos;
@@ -341,7 +352,9 @@ static int jmb38x_ms_transfer_data(struct jmb38x_ms_host *host)
 : jmb38x_ms_read_reg_data(host, buf, p_cnt);
 
if (host->req->long_data) {
-   kunmap_atomic(buf - p_off);
+   sg_unmap_offset(>req->sg, buf,
+off - host->req->sg.offset,
+SG_KMAP_ATOMIC);
local_irq_restore(flags);
}
 
diff --git a/drivers/memstick/host/tifm_ms.c b/drivers/memstick/host/tifm_ms.c
index 7bafa72..c0bc40e 100644
--- a/drivers/memstick/host/tifm_ms.c
+++ b/drivers/memstick/host/tifm_ms.c
@@ -186,7 +186,6 @@ static unsigned int tifm_ms_transfer_data(struct tifm_ms *host)
unsigned int off;
unsigned int t_size, p_cnt;
unsigned char *buf;
-   struct page *pg;
unsigned long flags = 0;
 
if (host->req->long_data) {
@@ -203,14 +202,25 @@ static unsigned int tifm_ms_transfer_data(struct tifm_ms *host)
unsigned int uninitialized_var(p_off);
 
if (host->req->long_data) {
-   pg = nth_page(sg_page(>req->sg),
- off >> PAGE_SHIFT);
p_off = offset_in_page(off);
p_cnt = PAGE_SIZE - p_off;
p_cnt = min(p_cnt, length);
 
local_irq_save(flags);
-   buf = kmap_atomic(pg) + p_off;
+   buf = sg_map_offset(>req->sg,
+off - host->req->sg.offset,
+SG_KMAP_ATOMIC);
+   if (IS_ERR(buf)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   break;
+   }
} else {
buf = host->req->data + host->block_pos;
p_cnt = host->req->data_len - host->block_pos;
@@ -221,7 +231,9 @@ static unsigned int tifm_ms_transfer_data(struct tifm_ms *host)
 : tifm_ms_read_data(host, buf, p_cnt);
 
if
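
Before the conversion, the memstick drivers located the data by hand: nth_page(sg_page(sg), off >> PAGE_SHIFT) selects the page and offset_in_page(off) the position within it, where off already includes sg.offset. The new code passes off - host->req->sg.offset, the offset relative to the start of the entry, and leaves that split to sg_map_offset(). A userspace sketch of the arithmetic, assuming 4 KiB pages; ASSUMED_PAGE_SHIFT, struct page_pos and split_offset() are illustrative names only:

```c
#include <stddef.h>

#define ASSUMED_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */
#define ASSUMED_PAGE_SIZE  (1u << ASSUMED_PAGE_SHIFT)

struct page_pos {
	size_t page_index;  /* pages past the entry's first page */
	size_t in_page;     /* byte offset inside that page */
};

/*
 * Split an offset counted from the start of the entry's first page
 * (i.e. already including sg->offset) into a page index and an in-page
 * offset, as off >> PAGE_SHIFT and offset_in_page(off) do.
 */
static struct page_pos split_offset(size_t off)
{
	struct page_pos pos = {
		.page_index = off >> ASSUMED_PAGE_SHIFT,
		.in_page = off & (ASSUMED_PAGE_SIZE - 1),
	};
	return pos;
}
```

For example, an entry starting 200 bytes into its page with block_pos at 4000 gives off = 4200, which lands 104 bytes into the second page.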

[PATCH 16/22] xen-blkfront: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Straightforward conversion to the new helper, except that, due to
the lack of an error path, we have to warn if unmappable memory
is ever present in the sgl.

Signed-off-by: Logan Gunthorpe 
---
 drivers/block/xen-blkfront.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5067a0a..7dcf41d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -807,8 +807,19 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
if (setup.need_copy) {
-   setup.bvec_off = sg->offset;
-   setup.bvec_data = kmap_atomic(sg_page(sg));
+   setup.bvec_off = 0;
+   setup.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(setup.bvec_data)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there is a
+* questionable error path out of here,
+* we WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return 1;
+   }
}
 
gnttab_foreach_grant_in_range(sg_page(sg),
@@ -818,7 +829,7 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
  );
 
if (setup.need_copy)
-   kunmap_atomic(setup.bvec_data);
+   sg_unmap(sg, setup.bvec_data, SG_KMAP_ATOMIC);
}
if (setup.segments)
kunmap_atomic(setup.segments);
@@ -1468,8 +1479,18 @@ static bool blkif_completion(unsigned long *id,
for_each_sg(s->sg, sg, num_sg, i) {
BUG_ON(sg->offset + sg->length > PAGE_SIZE);
 
-   data.bvec_offset = sg->offset;
-   data.bvec_data = kmap_atomic(sg_page(sg));
+   data.bvec_offset = 0;
+   data.bvec_data = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(data.bvec_data)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there is no
+* clear error path, we WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return 1;
+   }
 
gnttab_foreach_grant_in_range(sg_page(sg),
  sg->offset,
@@ -1477,7 +1498,7 @@ static bool blkif_completion(unsigned long *id,
  blkif_copy_from_grant,
  );
 
-   kunmap_atomic(data.bvec_data);
+   sg_unmap(sg, data.bvec_data, SG_KMAP_ATOMIC);
}
}
/* Add the persistent grant into the list of free grants */
-- 
2.1.4
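
xen-blkfront can use one page-sized mapping per segment because of the BUG_ON(sg->offset + sg->length > PAGE_SIZE) invariant: no segment extends past the end of its page. Judging by the removed sg->offset terms, sg_map() returns an address that already includes sg->offset, which appears to be why the patch resets bvec_off/bvec_offset to 0. A userspace sketch of the invariant, assuming 4 KiB pages and a hypothetical segment_within_page() helper written to avoid unsigned overflow:

```c
#include <stdbool.h>
#include <stddef.h>

#define ASSUMED_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* True when [offset, offset + length) lies inside a single page. */
static bool segment_within_page(size_t offset, size_t length)
{
	return offset <= ASSUMED_PAGE_SIZE &&
	       length <= ASSUMED_PAGE_SIZE - offset;
}
```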

[PATCH 19/22] mmc: tmio: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Straightforward conversion to the sg_map helper. A couple of paths will
WARN if the memory does not end up being mappable.

Signed-off-by: Logan Gunthorpe 
---
 drivers/mmc/host/tmio_mmc.h | 12 ++--
 drivers/mmc/host/tmio_mmc_dma.c |  5 +
 drivers/mmc/host/tmio_mmc_pio.c | 24 
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h
index 2b349d4..ba68c9fed 100644
--- a/drivers/mmc/host/tmio_mmc.h
+++ b/drivers/mmc/host/tmio_mmc.h
@@ -198,17 +198,25 @@ void tmio_mmc_enable_mmc_irqs(struct tmio_mmc_host *host, u32 i);
 void tmio_mmc_disable_mmc_irqs(struct tmio_mmc_host *host, u32 i);
 irqreturn_t tmio_mmc_irq(int irq, void *devid);
 
+/* Note: this function may return an ERR_PTR and must be checked! */
 static inline char *tmio_mmc_kmap_atomic(struct scatterlist *sg,
 unsigned long *flags)
 {
+   void *ret;
+
local_irq_save(*flags);
-   return kmap_atomic(sg_page(sg)) + sg->offset;
+   ret = sg_map(sg, SG_KMAP_ATOMIC);
+
+   if (IS_ERR(ret))
+   local_irq_restore(*flags);
+
+   return ret;
 }
 
 static inline void tmio_mmc_kunmap_atomic(struct scatterlist *sg,
  unsigned long *flags, void *virt)
 {
-   kunmap_atomic(virt - sg->offset);
+   sg_unmap(sg, virt, SG_KMAP_ATOMIC);
local_irq_restore(*flags);
 }
 
diff --git a/drivers/mmc/host/tmio_mmc_dma.c b/drivers/mmc/host/tmio_mmc_dma.c
index fa8a936..07531f7 100644
--- a/drivers/mmc/host/tmio_mmc_dma.c
+++ b/drivers/mmc/host/tmio_mmc_dma.c
@@ -149,6 +149,11 @@ static void tmio_mmc_start_dma_tx(struct tmio_mmc_host *host)
if (!aligned) {
unsigned long flags;
void *sg_vaddr = tmio_mmc_kmap_atomic(sg, &flags);
+   if (IS_ERR(sg_vaddr)) {
+   ret = PTR_ERR(sg_vaddr);
+   goto pio;
+   }
+
sg_init_one(&host->bounce_sg, host->bounce_buf, sg->length);
memcpy(host->bounce_buf, sg_vaddr, host->bounce_sg.length);
tmio_mmc_kunmap_atomic(sg, &flags, sg_vaddr);
diff --git a/drivers/mmc/host/tmio_mmc_pio.c b/drivers/mmc/host/tmio_mmc_pio.c
index 6b789a7..d6fdbf6 100644
--- a/drivers/mmc/host/tmio_mmc_pio.c
+++ b/drivers/mmc/host/tmio_mmc_pio.c
@@ -479,6 +479,18 @@ static void tmio_mmc_pio_irq(struct tmio_mmc_host *host)
}
 
sg_virt = tmio_mmc_kmap_atomic(host->sg_ptr, &flags);
+   if (IS_ERR(sg_virt)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return;
+   }
+
buf = (unsigned short *)(sg_virt + host->sg_off);
 
count = host->sg_ptr->length - host->sg_off;
@@ -506,6 +518,18 @@ static void tmio_mmc_check_bounce_buffer(struct tmio_mmc_host *host)
if (host->sg_ptr == &host->bounce_sg) {
unsigned long flags;
void *sg_vaddr = tmio_mmc_kmap_atomic(host->sg_orig, &flags);
+   if (IS_ERR(sg_vaddr)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return;
+   }
+
memcpy(sg_vaddr, host->bounce_buf, host->bounce_sg.length);
tmio_mmc_kunmap_atomic(host->sg_orig, &flags, sg_vaddr);
}
-- 
2.1.4

[PATCH 05/22] drm/i915: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

This is a single straightforward conversion from kmap to sg_map.

Signed-off-by: Logan Gunthorpe 
---
 drivers/gpu/drm/i915/i915_gem.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 67b1fc5..1b1b91a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2188,6 +2188,15 @@ static void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj)
radix_tree_delete(&obj->mm.get_page.radix, iter.index);
 }
 
+static void i915_gem_object_unmap(const struct drm_i915_gem_object *obj,
+ void *ptr)
+{
+   if (is_vmalloc_addr(ptr))
+   vunmap(ptr);
+   else
+   sg_unmap(obj->mm.pages->sgl, ptr, SG_KMAP);
+}
+
 void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
 enum i915_mm_subclass subclass)
 {
@@ -2215,10 +2224,7 @@ void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
void *ptr;
 
ptr = ptr_mask_bits(obj->mm.mapping);
-   if (is_vmalloc_addr(ptr))
-   vunmap(ptr);
-   else
-   kunmap(kmap_to_page(ptr));
+   i915_gem_object_unmap(obj, ptr);
 
obj->mm.mapping = NULL;
}
@@ -2475,8 +2481,11 @@ static void *i915_gem_object_map(const struct drm_i915_gem_object *obj,
void *addr;
 
/* A single page can always be kmapped */
-   if (n_pages == 1 && type == I915_MAP_WB)
-   return kmap(sg_page(sgt->sgl));
+   if (n_pages == 1 && type == I915_MAP_WB) {
+   addr = sg_map(sgt->sgl, SG_KMAP);
+   if (IS_ERR(addr))
+   return NULL;
+   }
 
if (n_pages > ARRAY_SIZE(stack_pages)) {
/* Too big for stack -- allocate temporary array instead */
@@ -2543,11 +2552,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
goto err_unpin;
}
 
-   if (is_vmalloc_addr(ptr))
-   vunmap(ptr);
-   else
-   kunmap(kmap_to_page(ptr));
-
+   i915_gem_object_unmap(obj, ptr);
ptr = obj->mm.mapping = NULL;
}
 
-- 
2.1.4

[PATCH 01/22] scatterlist: Introduce sg_map helper functions

2017-04-13 Thread Logan Gunthorpe

This patch introduces functions which kmap the pages inside an sgl. Two
variants are provided: one if an offset is required and one if the
offset is zero. These functions replace a common pattern of
kmap(sg_page(sg)) that is used in about 50 places within the kernel.

The motivation for this work is to eventually safely support sgls that
contain io memory. In order for that to work, any access to the contents
of an iomem SGL will need to be done with iomemcpy or trigger some warning.
(The exact details of how this will work have yet to be worked out.)
Having all the kmaps in one place is just a first step in that
direction. Additionally, since this helps cut down the users of sg_page,
it should make any effort to move to struct-page-less DMAs a little
easier (should that idea ever swing back into favour again).

A flags option is added to select between a regular or atomic mapping so
these functions can replace both kmap(sg_page(...)) and kmap_atomic(sg_page(...)).
Future work may expand this to have flags for using page_address or
vmap. Much further in the future, there may be a flag to allocate memory
and copy the data from/to iomem.

We also add the semantic that sg_map can fail to create a mapping,
even though the code it replaces is assumed never to fail and the
current version of these functions cannot fail. This is to support
iomem, which will either have to fail to create the mapping or allocate
a bounce buffer, which itself can fail.
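Because sg_map may now fail, every caller has to test the returned pointer before dereferencing it. The userspace sketch below models the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention with local stand-ins. `struct fake_sg`, `fake_sg_map()`, the `iomem` flag and the choice of -EFAULT are hypothetical and only illustrate the calling pattern; they are not the real sg_map().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(): small negative
 * errno values are encoded in the top page of the address space, which
 * no valid mapping can occupy.
 */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical SG entry; "iomem" marks memory the CPU cannot map. */
struct fake_sg {
	unsigned char *base;
	size_t offset;
	bool iomem;
};

/* Hypothetical model of sg_map(): report failure instead of a bad pointer. */
static void *fake_sg_map(struct fake_sg *sg)
{
	assert(sg != NULL);
	if (sg->iomem)
		return ERR_PTR(-EFAULT);
	return sg->base + sg->offset;
}

/* A caller following the new semantics: check before dereferencing. */
static int fill_first_byte(struct fake_sg *sg, unsigned char val)
{
	unsigned char *p = fake_sg_map(sg);

	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	p[0] = val;
	return 0;
}
```

The error value is propagated unchanged, so a caller with a real error path can hand it up the stack, while callers without one are left to WARN as several of the driver patches do.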

Also, in terms of cleanup, a few of the existing kmap(sg_page) users
are a bit loose about whether they apply sg->offset, so using these
helper functions should help avoid such issues.
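The offset pitfall can be seen with a plain array standing in for a kmapped page. In this sketch `struct fake_sg`, `map_page_base()` and `map_entry()` are hypothetical; map_page_base() models the old kmap(sg_page(sg)) result, and map_entry() models the documented sg_map() behaviour of returning a pointer with sg->offset already applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SG entry: "page" stands in for the kmapped page. */
struct fake_sg {
	unsigned char *page;
	size_t offset;
};

/* Old style: the mapping is the page base; callers must add sg->offset. */
static unsigned char *map_page_base(const struct fake_sg *sg)
{
	assert(sg->page != NULL);
	return sg->page;
}

/* sg_map() style (modelled): the helper adds sg->offset exactly once. */
static unsigned char *map_entry(const struct fake_sg *sg)
{
	assert(sg->page != NULL);
	return sg->page + sg->offset;
}
```

A caller converted to sg_map() that still adds `sg->offset` by hand would point at page + 2 * offset rather than at the entry's data, which is why the converted call sites drop their explicit `+ sg->offset`.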

Signed-off-by: Logan Gunthorpe 
---
 drivers/dma-buf/dma-buf.c   |  3 ++
 include/linux/scatterlist.h | 97 +
 2 files changed, 100 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0007b79..b95934b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,9 @@
 
 #include 
 
+/* Prevent the highmem.h macro from aliasing ops->kunmap_atomic */
+#undef kunmap_atomic
+
 static inline int is_dma_buf_file(struct file *);
 
 struct dma_buf_list {
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..acd4d73 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include <linux/highmem.h>
 #include 
 
 struct scatterlist {
@@ -126,6 +127,102 @@ static inline struct page *sg_page(struct scatterlist *sg)
return (struct page *)((sg)->page_link & ~0x3);
 }
 
+#define SG_KMAP         (1 << 0) /* create a mapping with kmap */
+#define SG_KMAP_ATOMIC  (1 << 1) /* create a mapping with kmap_atomic */
+
+/**
+ * sg_map_offset - kmap a page inside an sgl
+ * @sg:     SG entry
+ * @offset: Offset into entry
+ * @flags:  Flags for creating the mapping
+ *
+ * Description:
+ *   Use this function to map a page in the scatterlist at the specified
+ *   offset. sg->offset is already added for you. Note: the semantics of
+ *   this function are that it may fail. Thus, its output should be checked
+ *   with IS_ERR and PTR_ERR. Otherwise, a pointer to the specified offset
+ *   in the mapped page is returned.
+ *
+ *   Flags can be any of:
+ *     * SG_KMAP        - Use kmap to create the mapping
+ *     * SG_KMAP_ATOMIC - Use kmap_atomic to map the page atomically.
+ *                        Thus, the rules of that function apply: the
+ *                        caller may not sleep until the page is unmapped.
+ *
+ *   Also, consider carefully whether this function is appropriate. It is
+ *   largely not recommended for new code and if the sgl came from another
+ *   subsystem and you don't know what kind of memory might be in the list
+ *   then you definitely should not call it. Non-mappable memory may be in
+ *   the sgl and thus this function may fail unexpectedly.
+ **/
+static inline void *sg_map_offset(struct scatterlist *sg, size_t offset,
+  int flags)
+{
+   struct page *pg;
+   unsigned int pg_off;
+
+   offset += sg->offset;
+   pg = nth_page(sg_page(sg), offset >> PAGE_SHIFT);
+   pg_off = offset_in_page(offset);
+
+   if (flags & SG_KMAP_ATOMIC)
+   return kmap_atomic(pg) + pg_off;
+   else
+   return kmap(pg) + pg_off;
+}
+
+/**
+ * sg_unmap_offset - unmap a page that was mapped with sg_map_offset
+ * @sg:     SG entry
+ * @addr:   address returned by sg_map_offset
+ * @offset: Offset into entry (same as specified for sg_map_offset)
+ * @flags:  Flags (the same as those specified for sg_map_offset)
+ *
+ * Description:
+ *   Unmap the page that was mapped with sg_map_offset
+ *
+ **/
+static inline void sg_unmap_offset(struct scatterlist *sg, void *addr,
+   size_t offset, int flags)
+{
+
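The offset handling at the top of sg_map_offset() can be checked with plain arithmetic: add sg->offset, select the page with offset >> PAGE_SHIFT (the argument given to nth_page()), and keep the remainder as the in-page offset. This sketch assumes 4 KiB pages (PAGE_SHIFT of 12); `sg_locate()` and `struct page_loc` are hypothetical names used only to mirror those three statements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; PAGE_SHIFT differs on some architectures. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Where a byte of an SG entry lives relative to sg_page(sg). */
struct page_loc {
	size_t page_index;  /* the n passed to nth_page() */
	size_t page_offset; /* what offset_in_page() would return */
};

/* Hypothetical helper mirroring the arithmetic in sg_map_offset(). */
static struct page_loc sg_locate(size_t sg_offset, size_t offset)
{
	struct page_loc loc;

	assert(offset <= SIZE_MAX - sg_offset);     /* no wrap-around */
	offset += sg_offset;                        /* offset += sg->offset */
	loc.page_index = offset >> PAGE_SHIFT;      /* nth_page(pg, offset >> PAGE_SHIFT) */
	loc.page_offset = offset & (PAGE_SIZE - 1); /* offset_in_page(offset) */
	return loc;
}
```

For an entry with sg->offset of 100, a requested offset of 4000 lands 4 bytes into the second page (page index 1), and that page is the one kmapped. Only that single page is mapped, so a caller must not access past its end through the returned pointer.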

[PATCH 00/22] Introduce common scatterlist map function

2017-04-13 Thread Logan Gunthorpe

Hi Everyone,

As part of my effort to enable P2P DMA transactions with PCI cards,
we've identified the need to be able to safely put IO memory into
scatterlists (and eventually other spots). This probably involves a
conversion from struct page to pfn_t but that migration is a ways off
and those decisions are yet to be made.

As an initial step in that direction, I've started cleaning up some of the
scatterlist code by trying to carve out a better defined layer between it
and its users. The longer term goal would be to remove sg_page or replace
it with something that can potentially fail.

This patchset is the first step in that effort. I've introduced
a common function to map scatterlist memory and converted all the common
kmap(sg_page()) cases. This removes about 66 sg_page calls (of ~331).

Since this is a fairly large cleanup set that touches a wide swath of
the kernel, I have limited the people I've sent this to. I'd suggest we look
toward merging the first patch and then I can send the individual subsystem
patches on to their respective maintainers and get them merged
independently. (This is to avoid the conflicts I created with my last
cleanup set... Sorry) Though, I'm certainly open to other suggestions to get
it merged.

The patchset is based on v4.11-rc6 and can be found in the sg_map
branch from this git tree:

https://github.com/sbates130272/linux-p2pmem.git

Thanks,

Logan


Logan Gunthorpe (22):
  scatterlist: Introduce sg_map helper functions
  nvmet: Make use of the new sg_map helper function
  libiscsi: Make use of the new sg_map helper function
  target: Make use of the new sg_map function at 16 call sites
  drm/i915: Make use of the new sg_map helper function
  crypto: hifn_795x: Make use of the new sg_map helper function
  crypto: shash, caam: Make use of the new sg_map helper function
  crypto: chcr: Make use of the new sg_map helper function
  dm-crypt: Make use of the new sg_map helper in 4 call sites
  staging: unisys: visorbus: Make use of the new sg_map helper function
  RDS: Make use of the new sg_map helper function
  scsi: ipr, pmcraid, isci: Make use of the new sg_map helper in 4 call
sites
  scsi: hisi_sas, mvsas, gdth: Make use of the new sg_map helper
function
  scsi: arcmsr, ips, megaraid: Make use of the new sg_map helper
function
  scsi: libfc, csiostor: Change to sg_copy_buffer in two drivers
  xen-blkfront: Make use of the new sg_map helper function
  mmc: sdhci: Make use of the new sg_map helper function
  mmc: spi: Make use of the new sg_map helper function
  mmc: tmio: Make use of the new sg_map helper function
  mmc: sdricoh_cs: Make use of the new sg_map helper function
  mmc: tifm_sd: Make use of the new sg_map helper function
  memstick: Make use of the new sg_map helper function

 crypto/shash.c  |   9 +-
 drivers/block/xen-blkfront.c|  33 +--
 drivers/crypto/caam/caamalg.c   |   8 +-
 drivers/crypto/chelsio/chcr_algo.c  |  28 +++---
 drivers/crypto/hifn_795x.c  |  32 ---
 drivers/dma-buf/dma-buf.c   |   3 +
 drivers/gpu/drm/i915/i915_gem.c |  27 +++---
 drivers/md/dm-crypt.c   |  38 +---
 drivers/memstick/host/jmb38x_ms.c   |  23 -
 drivers/memstick/host/tifm_ms.c |  22 -
 drivers/mmc/host/mmc_spi.c  |  26 +++--
 drivers/mmc/host/sdhci.c|  35 ++-
 drivers/mmc/host/sdricoh_cs.c   |  14 ++-
 drivers/mmc/host/tifm_sd.c  |  88 +
 drivers/mmc/host/tmio_mmc.h |  12 ++-
 drivers/mmc/host/tmio_mmc_dma.c |   5 +
 drivers/mmc/host/tmio_mmc_pio.c |  24 +
 drivers/nvme/target/fabrics-cmd.c   |  16 +++-
 drivers/scsi/arcmsr/arcmsr_hba.c|  16 +++-
 drivers/scsi/csiostor/csio_scsi.c   |  54 +--
 drivers/scsi/cxgbi/libcxgbi.c   |   5 +
 drivers/scsi/gdth.c |   9 +-
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c  |  14 ++-
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c  |  13 ++-
 drivers/scsi/ipr.c  |  27 +++---
 drivers/scsi/ips.c  |   8 +-
 drivers/scsi/isci/request.c |  42 
 drivers/scsi/libfc/fc_libfc.c   |  49 ++
 drivers/scsi/libiscsi_tcp.c |  32 ---
 drivers/scsi/megaraid.c |   9 +-
 drivers/scsi/mvsas/mv_sas.c |  10 +-
 drivers/scsi/pmcraid.c  |  19 ++--
 drivers/staging/unisys/visorhba/visorhba_main.c |  12 ++-
 drivers/target/iscsi/iscsi_target.c |  27 --
 drivers/target/target_core_rd.c |   3 +-
 drivers/target/target_core_sbc.c| 122 +---

[PATCH 08/22] crypto: chcr: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

The get_page in this area looks *highly* suspect due to there being no
corresponding put_page. However, I've left that as is to avoid breaking
things.

I've also removed the KMAP_ATOMIC_ARGS check as it appears to be dead
code that dates back to when it was first committed...

Signed-off-by: Logan Gunthorpe 
---
 drivers/crypto/chelsio/chcr_algo.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
index 41bc7f4..a993d1d 100644
--- a/drivers/crypto/chelsio/chcr_algo.c
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -1489,22 +1489,21 @@ static struct sk_buff *create_authenc_wr(struct aead_request *req,
return ERR_PTR(-EINVAL);
 }
 
-static void aes_gcm_empty_pld_pad(struct scatterlist *sg,
- unsigned short offset)
+static int aes_gcm_empty_pld_pad(struct scatterlist *sg,
+unsigned short offset)
 {
-   struct page *spage;
unsigned char *addr;
 
-   spage = sg_page(sg);
-   get_page(spage); /* so that it is not freed by NIC */
-#ifdef KMAP_ATOMIC_ARGS
-   addr = kmap_atomic(spage, KM_SOFTIRQ0);
-#else
-   addr = kmap_atomic(spage);
-#endif
-   memset(addr + sg->offset, 0, offset + 1);
+   get_page(sg_page(sg)); /* so that it is not freed by NIC */
+
+   addr = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(addr))
+   return PTR_ERR(addr);
+
+   memset(addr, 0, offset + 1);
+   sg_unmap(sg, addr, SG_KMAP_ATOMIC);
 
-   kunmap_atomic(addr);
+   return 0;
 }
 
 static int set_msg_len(u8 *block, unsigned int msglen, int csize)
@@ -1940,7 +1939,10 @@ static struct sk_buff *create_gcm_wr(struct aead_request *req,
if (req->cryptlen) {
write_sg_to_skb(skb, &frags, src, req->cryptlen);
} else {
-   aes_gcm_empty_pld_pad(req->dst, authsize - 1);
+   err = aes_gcm_empty_pld_pad(req->dst, authsize - 1);
+   if (err)
+   goto dstmap_fail;
+
write_sg_to_skb(skb, &frags, reqctx->dst, crypt_len);
 
}
-- 
2.1.4

[PATCH 17/22] mmc: sdhci: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Straightforward conversion, except that, due to the lack of an error
path, we have to WARN if the memory in the SGL is not mappable.

Signed-off-by: Logan Gunthorpe 
---
 drivers/mmc/host/sdhci.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 63bc33a..af0c107 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -497,15 +497,34 @@ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
return sg_count;
 }
 
+/*
+ * Note: this function may return an ERR_PTR, which must be checked.
+ */
 static char *sdhci_kmap_atomic(struct scatterlist *sg, unsigned long *flags)
 {
+   void *ret;
+
local_irq_save(*flags);
-   return kmap_atomic(sg_page(sg)) + sg->offset;
+
+   ret = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(ret)) {
+   /*
+* This should really never happen unless the code is changed
+* to use memory that is not mappable in the sg. Seeing there
+* doesn't seem to be any error path out of here, we can only
+* WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   local_irq_restore(*flags);
+   }
+
+   return ret;
 }
 
-static void sdhci_kunmap_atomic(void *buffer, unsigned long *flags)
+static void sdhci_kunmap_atomic(struct scatterlist *sg, void *buffer,
+   unsigned long *flags)
 {
-   kunmap_atomic(buffer);
+   sg_unmap(sg, buffer, SG_KMAP_ATOMIC);
local_irq_restore(*flags);
 }
 
@@ -568,8 +587,11 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
if (offset) {
if (data->flags & MMC_DATA_WRITE) {
buffer = sdhci_kmap_atomic(sg, &flags);
+   if (IS_ERR(buffer))
+   return;
+
memcpy(align, buffer, offset);
-   sdhci_kunmap_atomic(buffer, &flags);
+   sdhci_kunmap_atomic(sg, buffer, &flags);
}
 
/* tran, valid */
@@ -646,8 +668,11 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
  (sg_dma_address(sg) & SDHCI_ADMA2_MASK);
 
buffer = sdhci_kmap_atomic(sg, &flags);
+   if (IS_ERR(buffer))
+   return;
+
memcpy(buffer, align, size);
-   sdhci_kunmap_atomic(buffer, &flags);
+   sdhci_kunmap_atomic(sg, buffer, &flags);
 
align += SDHCI_ADMA2_ALIGN;
}
-- 
2.1.4

[PATCH 21/22] mmc: tifm_sd: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

This conversion is a bit complicated. We modify the read_fifo,
write_fifo and copy_page functions to take a scatterlist instead of a
page. Thus we can use sg_map instead of kmap_atomic. There's a bit of
accounting that needed to be done for the offset for this to work
(sg_map takes care of the offset, but it's already added and used
earlier in the code).

There's also no error path, so if unmappable memory finds its way into
the sgl we can only WARN.

Signed-off-by: Logan Gunthorpe 
---
 drivers/mmc/host/tifm_sd.c | 88 +++---
 1 file changed, 67 insertions(+), 21 deletions(-)

diff --git a/drivers/mmc/host/tifm_sd.c b/drivers/mmc/host/tifm_sd.c
index 93c4b40..75b0d74 100644
--- a/drivers/mmc/host/tifm_sd.c
+++ b/drivers/mmc/host/tifm_sd.c
@@ -111,14 +111,26 @@ struct tifm_sd {
 };
 
 /* for some reason, host won't respond correctly to readw/writew */
-static void tifm_sd_read_fifo(struct tifm_sd *host, struct page *pg,
+static void tifm_sd_read_fifo(struct tifm_sd *host, struct scatterlist *sg,
  unsigned int off, unsigned int cnt)
 {
struct tifm_dev *sock = host->dev;
unsigned char *buf;
unsigned int pos = 0, val;
 
-   buf = kmap_atomic(pg) + off;
+   buf = sg_map_offset(sg, off - sg->offset, SG_KMAP_ATOMIC);
+   if (IS_ERR(buf)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return;
+   }
+
if (host->cmd_flags & DATA_CARRY) {
buf[pos++] = host->bounce_buf_data[0];
host->cmd_flags &= ~DATA_CARRY;
@@ -134,17 +146,29 @@ static void tifm_sd_read_fifo(struct tifm_sd *host, struct page *pg,
}
buf[pos++] = (val >> 8) & 0xff;
}
-   kunmap_atomic(buf - off);
+   sg_unmap_offset(sg, buf, off - sg->offset, SG_KMAP_ATOMIC);
 }
 
-static void tifm_sd_write_fifo(struct tifm_sd *host, struct page *pg,
+static void tifm_sd_write_fifo(struct tifm_sd *host, struct scatterlist *sg,
   unsigned int off, unsigned int cnt)
 {
struct tifm_dev *sock = host->dev;
unsigned char *buf;
unsigned int pos = 0, val;
 
-   buf = kmap_atomic(pg) + off;
+   buf = sg_map_offset(sg, off - sg->offset, SG_KMAP_ATOMIC);
+   if (IS_ERR(buf)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return;
+   }
+
if (host->cmd_flags & DATA_CARRY) {
val = host->bounce_buf_data[0] | ((buf[pos++] << 8) & 0xff00);
writel(val, sock->addr + SOCK_MMCSD_DATA);
@@ -161,7 +185,7 @@ static void tifm_sd_write_fifo(struct tifm_sd *host, struct page *pg,
val |= (buf[pos++] << 8) & 0xff00;
writel(val, sock->addr + SOCK_MMCSD_DATA);
}
-   kunmap_atomic(buf - off);
+   sg_unmap_offset(sg, buf, off - sg->offset, SG_KMAP_ATOMIC);
 }
 
 static void tifm_sd_transfer_data(struct tifm_sd *host)
@@ -170,7 +194,6 @@ static void tifm_sd_transfer_data(struct tifm_sd *host)
struct scatterlist *sg = r_data->sg;
unsigned int off, cnt, t_size = TIFM_MMCSD_FIFO_SIZE * 2;
unsigned int p_off, p_cnt;
-   struct page *pg;
 
if (host->sg_pos == host->sg_len)
return;
@@ -192,33 +215,57 @@ static void tifm_sd_transfer_data(struct tifm_sd *host)
}
off = sg[host->sg_pos].offset + host->block_pos;
 
-   pg = nth_page(sg_page(&sg[host->sg_pos]), off >> PAGE_SHIFT);
p_off = offset_in_page(off);
p_cnt = PAGE_SIZE - p_off;
p_cnt = min(p_cnt, cnt);
p_cnt = min(p_cnt, t_size);
 
if (r_data->flags & MMC_DATA_READ)
-   tifm_sd_read_fifo(host, pg, p_off, p_cnt);
+   tifm_sd_read_fifo(host, &sg[host->sg_pos], p_off,
+ p_cnt);
else if (r_data->flags & MMC_DATA_WRITE)
-   tifm_sd_write_fifo(host, pg, p_off, p_cnt);
+   tifm_sd_write_fifo(host, &sg[host->sg_pos], p_off,
+  p_cnt);
 
t_size -= p_cnt;
host->block_pos += p_cnt;
}
 }
 
-static void tifm_sd_copy_page(struct page

[PATCH 09/22] dm-crypt: Make use of the new sg_map helper in 4 call sites

2017-04-13 Thread Logan Gunthorpe

Very straightforward conversion to the new function in all four spots.

Signed-off-by: Logan Gunthorpe 
---
 drivers/md/dm-crypt.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 389a363..6bd0ffc 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -589,9 +589,12 @@ static int crypt_iv_lmk_gen(struct crypt_config *cc, u8 *iv,
int r = 0;
 
if (bio_data_dir(dmreq->ctx->bio_in) == WRITE) {
-   src = kmap_atomic(sg_page(&dmreq->sg_in));
-   r = crypt_iv_lmk_one(cc, iv, dmreq, src + dmreq->sg_in.offset);
-   kunmap_atomic(src);
+   src = sg_map(&dmreq->sg_in, SG_KMAP_ATOMIC);
+   if (IS_ERR(src))
+   return PTR_ERR(src);
+
+   r = crypt_iv_lmk_one(cc, iv, dmreq, src);
+   sg_unmap(&dmreq->sg_in, src, SG_KMAP_ATOMIC);
} else
memset(iv, 0, cc->iv_size);
 
@@ -607,14 +610,17 @@ static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
if (bio_data_dir(dmreq->ctx->bio_in) == WRITE)
return 0;
 
-   dst = kmap_atomic(sg_page(&dmreq->sg_out));
-   r = crypt_iv_lmk_one(cc, iv, dmreq, dst + dmreq->sg_out.offset);
+   dst = sg_map(&dmreq->sg_out, SG_KMAP_ATOMIC);
+   if (IS_ERR(dst))
+   return PTR_ERR(dst);
+
+   r = crypt_iv_lmk_one(cc, iv, dmreq, dst);
 
/* Tweak the first block of plaintext sector */
if (!r)
-   crypto_xor(dst + dmreq->sg_out.offset, iv, cc->iv_size);
+   crypto_xor(dst, iv, cc->iv_size);
 
-   kunmap_atomic(dst);
+   sg_unmap(&dmreq->sg_out, dst, SG_KMAP_ATOMIC);
return r;
 }
 
@@ -731,9 +737,12 @@ static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
 
/* Remove whitening from ciphertext */
if (bio_data_dir(dmreq->ctx->bio_in) != WRITE) {
-   src = kmap_atomic(sg_page(&dmreq->sg_in));
-   r = crypt_iv_tcw_whitening(cc, dmreq, src + dmreq->sg_in.offset);
-   kunmap_atomic(src);
+   src = sg_map(&dmreq->sg_in, SG_KMAP_ATOMIC);
+   if (IS_ERR(src))
+   return PTR_ERR(src);
+
+   r = crypt_iv_tcw_whitening(cc, dmreq, src);
+   sg_unmap(&dmreq->sg_in, src, SG_KMAP_ATOMIC);
}
 
/* Calculate IV */
@@ -755,9 +764,12 @@ static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
return 0;
 
/* Apply whitening on ciphertext */
-   dst = kmap_atomic(sg_page(&dmreq->sg_out));
-   r = crypt_iv_tcw_whitening(cc, dmreq, dst + dmreq->sg_out.offset);
-   kunmap_atomic(dst);
+   dst = sg_map(&dmreq->sg_out, SG_KMAP_ATOMIC);
+   if (IS_ERR(dst))
+   return PTR_ERR(dst);
+
+   r = crypt_iv_tcw_whitening(cc, dmreq, dst);
+   sg_unmap(&dmreq->sg_out, dst, SG_KMAP_ATOMIC);
 
return r;
 }
-- 
2.1.4

[PATCH 20/22] mmc: sdricoh_cs: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

This is a straightforward conversion to the new function.

Signed-off-by: Logan Gunthorpe 
---
 drivers/mmc/host/sdricoh_cs.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/sdricoh_cs.c b/drivers/mmc/host/sdricoh_cs.c
index 5ff26ab..7eeed23 100644
--- a/drivers/mmc/host/sdricoh_cs.c
+++ b/drivers/mmc/host/sdricoh_cs.c
@@ -319,16 +319,20 @@ static void sdricoh_request(struct mmc_host *mmc, struct mmc_request *mrq)
for (i = 0; i < data->blocks; i++) {
size_t len = data->blksz;
u8 *buf;
-   struct page *page;
int result;
-   page = sg_page(data->sg);
 
-   buf = kmap(page) + data->sg->offset + (len * i);
+   buf = sg_map_offset(data->sg, (len * i), SG_KMAP);
+   if (IS_ERR(buf)) {
+   cmd->error = PTR_ERR(buf);
+   break;
+   }
+
result =
sdricoh_blockio(host,
data->flags & MMC_DATA_READ, buf, len);
-   kunmap(page);
-   flush_dcache_page(page);
+   sg_unmap_offset(data->sg, buf, (len * i), SG_KMAP);
+
+   flush_dcache_page(sg_page(data->sg));
if (result) {
dev_err(dev, "sdricoh_request: cmd %i "
"block transfer failed\n", cmd->opcode);
-- 
2.1.4

[PATCH 14/22] scsi: arcmsr, ips, megaraid: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Very straightforward conversion of three SCSI drivers.

Signed-off-by: Logan Gunthorpe 
---
 drivers/scsi/arcmsr/arcmsr_hba.c | 16 
 drivers/scsi/ips.c   |  8 
 drivers/scsi/megaraid.c  |  9 +++--
 3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index af032c4..3cd485c 100644
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -2306,7 +2306,10 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb,
 
use_sg = scsi_sg_count(cmd);
sg = scsi_sglist(cmd);
-   buffer = kmap_atomic(sg_page(sg)) + sg->offset;
+   buffer = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(buffer))
+   return ARCMSR_MESSAGE_FAIL;
+
if (use_sg > 1) {
retvalue = ARCMSR_MESSAGE_FAIL;
goto message_out;
@@ -2539,7 +2542,7 @@ static int arcmsr_iop_message_xfer(struct AdapterControlBlock *acb,
 message_out:
if (use_sg) {
struct scatterlist *sg = scsi_sglist(cmd);
-   kunmap_atomic(buffer - sg->offset);
+   sg_unmap(sg, buffer, SG_KMAP_ATOMIC);
}
return retvalue;
 }
@@ -2590,11 +2593,16 @@ static void arcmsr_handle_virtual_command(struct AdapterControlBlock *acb,
strncpy(&inqdata[32], "R001", 4); /* Product Revision */
 
sg = scsi_sglist(cmd);
-   buffer = kmap_atomic(sg_page(sg)) + sg->offset;
+   buffer = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(buffer)) {
+   cmd->result = (DID_ERROR << 16);
+   cmd->scsi_done(cmd);
+   return;
+   }
 
memcpy(buffer, inqdata, sizeof(inqdata));
sg = scsi_sglist(cmd);
-   kunmap_atomic(buffer - sg->offset);
+   sg_unmap(sg, buffer, SG_KMAP_ATOMIC);
 
cmd->scsi_done(cmd);
}
diff --git a/drivers/scsi/ips.c b/drivers/scsi/ips.c
index 3419e1b..a44291d 100644
--- a/drivers/scsi/ips.c
+++ b/drivers/scsi/ips.c
@@ -1506,14 +1506,14 @@ static int ips_is_passthru(struct scsi_cmnd *SC)
 /* kmap_atomic() ensures addressability of the user buffer.*/
 /* local_irq_save() protects the KM_IRQ0 address slot. */
 local_irq_save(flags);
-buffer = kmap_atomic(sg_page(sg)) + sg->offset;
-if (buffer && buffer[0] == 'C' && buffer[1] == 'O' &&
+buffer = sg_map(sg, SG_KMAP_ATOMIC);
+if (!IS_ERR(buffer) && buffer[0] == 'C' && buffer[1] == 'O' &&
 buffer[2] == 'P' && buffer[3] == 'P') {
-kunmap_atomic(buffer - sg->offset);
+sg_unmap(sg, buffer, SG_KMAP_ATOMIC);
 local_irq_restore(flags);
 return 1;
 }
-kunmap_atomic(buffer - sg->offset);
+sg_unmap(sg, buffer, SG_KMAP_ATOMIC);
 local_irq_restore(flags);
}
return 0;
diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c
index 3c63c29..0b66e50 100644
--- a/drivers/scsi/megaraid.c
+++ b/drivers/scsi/megaraid.c
@@ -663,10 +663,15 @@ mega_build_cmd(adapter_t *adapter, Scsi_Cmnd *cmd, int *busy)
struct scatterlist *sg;
 
sg = scsi_sglist(cmd);
-   buf = kmap_atomic(sg_page(sg)) + sg->offset;
+   buf = sg_map(sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(buf)) {
+cmd->result = (DID_ERROR << 16);
+   cmd->scsi_done(cmd);
+   return NULL;
+   }
 
memset(buf, 0, cmd->cmnd[4]);
-   kunmap_atomic(buf - sg->offset);
+   sg_unmap(sg, buf, SG_KMAP_ATOMIC);
 
cmd->result = (DID_OK << 16);
cmd->scsi_done(cmd);
-- 
2.1.4
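On a mapping failure, the arcmsr and megaraid hunks complete the command with `cmd->result = (DID_ERROR << 16)`, placing the error code in the host byte (bits 16 to 23) of the result word. A small sketch of that encoding follows; DID_OK (0x00) and DID_ERROR (0x07) use the values from include/scsi/scsi.h, while `with_host_byte()` and `get_host_byte()` are hypothetical userspace helpers, not the kernel's own accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Host-byte codes with the values used in include/scsi/scsi.h. */
#define DID_OK    0x00u
#define DID_ERROR 0x07u

/* Hypothetical helper: replace bits 16-23 (the host byte) of a result. */
static uint32_t with_host_byte(uint32_t result, uint32_t host)
{
	assert(host <= 0xffu);
	return (result & ~(0xffu << 16)) | (host << 16);
}

/* Hypothetical helper: extract the host byte from a result word. */
static uint32_t get_host_byte(uint32_t result)
{
	return (result >> 16) & 0xffu;
}
```

Assigning `DID_ERROR << 16` directly, as the patch does, also clears the status byte; the helper form instead keeps any status bits that were already set.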

[PATCH 02/22] nvmet: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

This is a straightforward conversion in two places. Should sg_map fail,
the code will return an NVME_SC_SGL_INVALID_DATA error in the completion.

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/target/fabrics-cmd.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 8bd022af..f62a634 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -122,7 +122,11 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
struct nvmet_ctrl *ctrl = NULL;
u16 status = 0;
 
-   d = kmap(sg_page(req->sg)) + req->sg->offset;
+   d = sg_map(req->sg, SG_KMAP);
+   if (IS_ERR(d)) {
+   status = NVME_SC_SGL_INVALID_DATA;
+   goto out;
+   }
 
/* zero out initial completion result, assign values as needed */
req->rsp->result.u32 = 0;
@@ -158,7 +162,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
req->rsp->result.u16 = cpu_to_le16(ctrl->cntlid);
 
 out:
-   kunmap(sg_page(req->sg));
+   sg_unmap(req->sg, d, SG_KMAP);
nvmet_req_complete(req, status);
 }
 
@@ -170,7 +174,11 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
u16 qid = le16_to_cpu(c->qid);
u16 status = 0;
 
-   d = kmap(sg_page(req->sg)) + req->sg->offset;
+   d = sg_map(req->sg, SG_KMAP);
+   if (IS_ERR(d)) {
+   status = NVME_SC_SGL_INVALID_DATA;
+   goto out;
+   }
 
/* zero out initial completion result, assign values as needed */
req->rsp->result.u32 = 0;
@@ -205,7 +213,7 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
pr_info("adding queue %d to ctrl %d.\n", qid, ctrl->cntlid);
 
 out:
-   kunmap(sg_page(req->sg));
+   sg_unmap(req->sg, d, SG_KMAP);
nvmet_req_complete(req, status);
return;
 
-- 
2.1.4

[PATCH 13/22] scsi: hisi_sas, mvsas, gdth: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Very straightforward conversion of three scsi drivers.

Signed-off-by: Logan Gunthorpe 
---
 drivers/scsi/gdth.c|  9 +++--
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 14 +-
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 13 +
 drivers/scsi/mvsas/mv_sas.c| 10 +-
 4 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/gdth.c b/drivers/scsi/gdth.c
index d020a13..82c9fba 100644
--- a/drivers/scsi/gdth.c
+++ b/drivers/scsi/gdth.c
@@ -2301,10 +2301,15 @@ static void gdth_copy_internal_data(gdth_ha_str *ha, Scsi_Cmnd *scp,
 return;
 }
 local_irq_save(flags);
-address = kmap_atomic(sg_page(sl)) + sl->offset;
+address = sg_map(sl, SG_KMAP_ATOMIC);
+if (IS_ERR(address)) {
+scp->result = DID_ERROR << 16;
+return;
+   }
+
 memcpy(address, buffer, cpnow);
 flush_dcache_page(sg_page(sl));
-kunmap_atomic(address);
+sg_unmap(sl, address, SG_KMAP_ATOMIC);
 local_irq_restore(flags);
 if (cpsum == cpcount)
 break;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index 854fbea..30408f8 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1377,18 +1377,22 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
void *to;
struct scatterlist *sg_resp = >smp_task.smp_resp;
 
-   ts->stat = SAM_STAT_GOOD;
-   to = kmap_atomic(sg_page(sg_resp));
+   to = sg_map(sg_resp, SG_KMAP_ATOMIC);
+   if (IS_ERR(to)) {
+   dev_err(dev, "slot complete: error mapping memory");
+   ts->stat = SAS_SG_ERR;
+   break;
+   }
 
+   ts->stat = SAM_STAT_GOOD;
dma_unmap_sg(dev, >smp_task.smp_resp, 1,
 DMA_FROM_DEVICE);
dma_unmap_sg(dev, >smp_task.smp_req, 1,
 DMA_TO_DEVICE);
-   memcpy(to + sg_resp->offset,
-  slot->status_buffer +
+   memcpy(to, slot->status_buffer +
   sizeof(struct hisi_sas_err_record),
   sg_dma_len(sg_resp));
-   kunmap_atomic(to);
+   sg_unmap(sg_resp, to, SG_KMAP_ATOMIC);
break;
}
case SAS_PROTOCOL_SATA:
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 1b21445..0907947 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1796,18 +1796,23 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
struct scatterlist *sg_resp = >smp_task.smp_resp;
void *to;
 
+   to = sg_map(sg_resp, SG_KMAP_ATOMIC);
+   if (IS_ERR(to)) {
+   dev_err(dev, "slot complete: error mapping memory");
+   ts->stat = SAS_SG_ERR;
+   break;
+   }
+
ts->stat = SAM_STAT_GOOD;
-   to = kmap_atomic(sg_page(sg_resp));
 
dma_unmap_sg(dev, >smp_task.smp_resp, 1,
 DMA_FROM_DEVICE);
dma_unmap_sg(dev, >smp_task.smp_req, 1,
 DMA_TO_DEVICE);
-   memcpy(to + sg_resp->offset,
-  slot->status_buffer +
+   memcpy(to, slot->status_buffer +
   sizeof(struct hisi_sas_err_record),
   sg_dma_len(sg_resp));
-   kunmap_atomic(to);
+   sg_unmap(sg_resp, to, SG_KMAP_ATOMIC);
break;
}
case SAS_PROTOCOL_SATA:
diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
index c7cc803..374d0e0 100644
--- a/drivers/scsi/mvsas/mv_sas.c
+++ b/drivers/scsi/mvsas/mv_sas.c
@@ -1798,11 +1798,11 @@ int mvs_slot_complete(struct mvs_info *mvi, u32 rx_desc, u32 flags)
case SAS_PROTOCOL_SMP: {
struct scatterlist *sg_resp = >smp_task.smp_resp;
tstat->stat = SAM_STAT_GOOD;
-   to = kmap_atomic(sg_page(sg_resp));
-   memcpy(to + sg_resp->offset,
-   slot->response + sizeof(struct mvs_err_info),
-   sg_dma_len(sg_resp));
-   kunmap_atomic(to);
+   to = sg_map(sg_resp, SG_KMAP_ATOMIC);
+   memcpy(to,
+  slot->response + sizeof(struct mvs_err_info),
+  sg_dma_len(sg_resp));
+   sg_unmap(sg_resp, to, SG_KMAP_ATOMIC);

[PATCH 11/22] RDS: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Straightforward conversion, except there's no error path, so we WARN if
the sg_map fails.

Signed-off-by: Logan Gunthorpe 
---
 net/rds/ib_recv.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index e10624a..7f8fa99 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -801,9 +801,20 @@ static void rds_ib_cong_recv(struct rds_connection *conn,
to_copy = min(RDS_FRAG_SIZE - frag_off, PAGE_SIZE - map_off);
BUG_ON(to_copy & 7); /* Must be 64bit aligned. */
 
-   addr = kmap_atomic(sg_page(>f_sg));
+   addr = sg_map(>f_sg, SG_KMAP_ATOMIC);
+   if (IS_ERR(addr)) {
+   /*
+* This should really never happen unless
+* the code is changed to use memory that is
+* not mappable in the sg. Seeing there doesn't
+* seem to be any error path out of here,
+* we can only WARN.
+*/
+   WARN(1, "Non-mappable memory used in sg!");
+   return;
+   }
 
-   src = addr + frag->f_sg.offset + frag_off;
+   src = addr + frag_off;
dst = (void *)map->m_page_addrs[map_page] + map_off;
for (k = 0; k < to_copy; k += 8) {
/* Record ports that became uncongested, ie
@@ -811,7 +822,7 @@ static void rds_ib_cong_recv(struct rds_connection *conn,
uncongested |= ~(*src) & *dst;
*dst++ = *src++;
}
-   kunmap_atomic(addr);
+   sg_unmap(>f_sg, addr, SG_KMAP_ATOMIC);
 
copied += to_copy;
 
-- 
2.1.4

[PATCH 06/22] crypto: hifn_795x: Make use of the new sg_map helper function

2017-04-13 Thread Logan Gunthorpe

Conversion of a couple of kmap_atomic instances to the sg_map helper
function.

However, it looks like there was a bug in the original code: the source
scatter list's offset (t->offset) was passed to ablkcipher_get, which
added it to the destination address. This doesn't make a lot of
sense, but t->offset is likely always zero anyway. So, this patch cleans
up that brokenness.

Also, a change to the error path: if ablkcipher_get failed, everything
seemed to proceed as if it hadn't. Setting 'error' should hopefully
clear that up.
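A userspace sketch of the corrected copy loop, assuming each destination segment's `base` already points at its data (as the mapping helper now provides), so no source offset is ever added to a destination address. `struct sketch_seg`, `sketch_copy_to_segs()` and `record_first_error()` are hypothetical names; the loop is a simplification in the spirit of ablkcipher_get(), not the driver code, and the last helper mirrors the `if (!error) error = err;` pattern in the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical destination segment: 'base' already points at the
 * segment's data, i.e. any offset was applied when it was mapped. */
struct sketch_seg {
	unsigned char *base;
	size_t length;
};

static size_t min3_size(size_t a, size_t b, size_t c)
{
	size_t m = a < b ? a : b;

	return m < c ? m : c;
}

/* Copy up to 'size' bytes from 'src' (*srest bytes remaining) into
 * consecutive destination segments. No source offset is ever added to
 * a destination address. Returns the number of segments written, or
 * -EINVAL if the destination segments run out first. */
static int sketch_copy_to_segs(const unsigned char *src, size_t *srest,
			       struct sketch_seg *dst, size_t nsegs,
			       size_t size)
{
	size_t idx = 0;

	while (size && *srest) {
		size_t copy;

		if (idx == nsegs)
			return -EINVAL;
		copy = min3_size(*srest, dst[idx].length, size);
		memcpy(dst[idx].base, src, copy);
		src += copy;
		*srest -= copy;
		size -= copy;
		idx++;
	}
	return (int)idx;
}

/* Keep the first error seen, mirroring "if (!error) error = err;". */
static void record_first_error(int *error, int err)
{
	if (!*error)
		*error = err;
}
```

Recording only the first error means a later failure in the walk cannot overwrite the original cause reported to the completion callback.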

Signed-off-by: Logan Gunthorpe 
---
 drivers/crypto/hifn_795x.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/hifn_795x.c b/drivers/crypto/hifn_795x.c
index e09d405..8e2c6a9 100644
--- a/drivers/crypto/hifn_795x.c
+++ b/drivers/crypto/hifn_795x.c
@@ -1619,7 +1619,7 @@ static int hifn_start_device(struct hifn_device *dev)
return 0;
 }
 
-static int ablkcipher_get(void *saddr, unsigned int *srestp, unsigned int offset,
+static int ablkcipher_get(void *saddr, unsigned int *srestp,
struct scatterlist *dst, unsigned int size, unsigned int *nbytesp)
 {
unsigned int srest = *srestp, nbytes = *nbytesp, copy;
@@ -1632,15 +1632,17 @@ static int ablkcipher_get(void *saddr, unsigned int *srestp, unsigned int offset
while (size) {
copy = min3(srest, dst->length, size);
 
-   daddr = kmap_atomic(sg_page(dst));
-   memcpy(daddr + dst->offset + offset, saddr, copy);
-   kunmap_atomic(daddr);
+   daddr = sg_map(dst, SG_KMAP_ATOMIC);
+   if (IS_ERR(daddr))
+   return PTR_ERR(daddr);
+
+   memcpy(daddr, saddr, copy);
+   sg_unmap(dst, daddr, SG_KMAP_ATOMIC);
 
nbytes -= copy;
size -= copy;
srest -= copy;
saddr += copy;
-   offset = 0;
 
pr_debug("%s: copy: %u, size: %u, srest: %u, nbytes: %u.\n",
 __func__, copy, size, srest, nbytes);
@@ -1671,11 +1673,12 @@ static inline void hifn_complete_sa(struct hifn_device *dev, int i)
 
 static void hifn_process_ready(struct ablkcipher_request *req, int error)
 {
+   int err;
struct hifn_request_context *rctx = ablkcipher_request_ctx(req);
 
if (rctx->walk.flags & ASYNC_FLAGS_MISALIGNED) {
unsigned int nbytes = req->nbytes;
-   int idx = 0, err;
+   int idx = 0;
struct scatterlist *dst, *t;
void *saddr;
 
@@ -1695,17 +1698,24 @@ static void hifn_process_ready(struct ablkcipher_request *req, int error)
continue;
}
 
-   saddr = kmap_atomic(sg_page(t));
+   saddr = sg_map(t, SG_KMAP_ATOMIC);
+   if (IS_ERR(saddr)) {
+   if (!error)
+   error = PTR_ERR(saddr);
+   break;
+   }
+
+   err = ablkcipher_get(saddr, >length,
+dst, nbytes, );
+   sg_unmap(t, saddr, SG_KMAP_ATOMIC);
 
-   err = ablkcipher_get(saddr, >length, t->offset,
-   dst, nbytes, );
if (err < 0) {
-   kunmap_atomic(saddr);
+   if (!error)
+   error = err;
break;
}
 
idx += err;
-   kunmap_atomic(saddr);
}
 
hifn_cipher_walk_exit(>walk);
-- 
2.1.4

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-13 Thread Andrew Lunn

> The DT binding is in tree and provides an example of how the switch
> looks like, below is the example, but I am also adding the MDIO bus and
> the PHYs just so you can see how things wind up:
> 
> switch_top@f0b0 {
> compatible = "simple-bus";
> #size-cells = <1>;
> #address-cells = <1>;
> ranges = <0 0xf0b0 0x40804>;
> 
> ethernet_switch@0 {
> compatible = "brcm,bcm7445-switch-v4.0";
> #size-cells = <0>;
> #address-cells = <1>;
> reg = <0x0 0x4
> 0x4 0x110
> 0x40340 0x30
> 0x40380 0x30
> 0x40400 0x34
> 0x40600 0x208>;
> reg-names = "core", "reg", "intrl2_0", "intrl2_1",
> "fcb", "acb";
> interrupts = <0 0x18 0
> 0 0x19 0>;
> brcm,num-gphy = <1>;
> brcm,num-rgmii-ports = <2>;
> brcm,fcb-pause-override;
> brcm,acb-packets-inflight;
> 
> ports {
> #address-cells = <1>;
> #size-cells = <0>;
> 
> port@0 {
> label = "gphy";
> reg = <0>;
>   phy-handle = <>;
> };
> 
>   sw0port1: port@1 {
>   label = "rgmii_1";
>   reg = <1>;
>   phy-mode = "rgmii";
>   fixed-link {
>   speed = <1000>;
>   full-duplex;
>   };
>   };
> };
> };
> 
>   mdio@403c0 {
>   reg = <0x403c0 0x8 0x40300 0x18>;
>   #address-cells = <0x1>;
>   #size-cells = <0x0>;
>   compatible = "brcm,unimac-mdio";
>   reg-names = "mdio", "mdio_indir_rw";
> 
>   switch: switch@0 {
>   broken-turn-around;
>   reg = <0x0>;
>   compatible = "brcm,bcm53125";
>   #address-cells = <1>;
>   #size-cells = <0>;
> 
>   ports {
>   ..
>   port@8 {
>   ethernet = <>;
>   };
>   ...
>   };
>   };
> 
>   phy5: ethernet-phy@5 {
>   reg = <0x5>;
>   compatible = "ethernet-phy-ieee802.3-c22";
>   };
>   };
> };

So phy5 is connected to the internal switch with a phy-handle. But
because of your double usage of this node, it also can be mapped into
the external switch's port 5?

Is that your problem?

It seems like you should add an mdio node inside your switch node, and
list your external switch internal/external phys there if needed.

   Andrew

Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-13 Thread David Miller

From: Michael Chan 
Date: Thu, 13 Apr 2017 13:16:43 -0700

> On Thu, Apr 13, 2017 at 9:09 AM, David Miller  wrote:
>>
>> ---
>>
>> v4:
>>  - Fix MAC header adjustment before calling prog (David Ahern)
>>  - Disable LRO when generic XDP is installed (Michael Chan)
> 
> I don't see where you are disabling LRO in the patch.

Ugh, I posted the wrong patch, here is the correct one.  Sorry about that:


Subject: [PATCH] net: Generic XDP

This provides a generic SKB based non-optimized XDP path which is used
if either the driver lacks a specific XDP implementation, or the user
requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.
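The selection rule described above can be sketched in userspace C using the flag values this patch adds to include/uapi/linux/if_link.h. `sketch_pick_xdp_path()` and its `driver_has_xdp` parameter are hypothetical; rejecting unknown flag bits with -EINVAL is an assumption about how XDP_FLAGS_MASK is used, not code taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined by this patch in include/uapi/linux/if_link.h. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
#define XDP_FLAGS_SKB_MODE          (2U << 0)
#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE)

/* Hypothetical helper: returns 1 for the generic SKB path, 0 for the
 * driver's own XDP path, or -EINVAL for unknown flag bits.
 * 'driver_has_xdp' stands in for the driver providing an XDP hook. */
static int sketch_pick_xdp_path(uint32_t flags, bool driver_has_xdp)
{
	if (flags & ~XDP_FLAGS_MASK)
		return -EINVAL;
	/* Generic mode if the user asks for it or the driver lacks XDP. */
	if ((flags & XDP_FLAGS_SKB_MODE) || !driver_has_xdp)
		return 1;
	return 0;
}
```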

It is arguable that perhaps I should have required something like
this as part of the initial XDP feature merge.

I believe this is critical for two reasons:

1) Accessibility.  More people can play with XDP with fewer
   dependencies.  Yes I know we have XDP support in virtio_net, but
   that just creates another dependency for learning how to use this
   facility.

   I wrote this to make life easier for the XDP newbies.

2) As a model for what the expected semantics are.  If there is a pure
   generic core implementation, it serves as a semantic example for
   driver folks adding XDP support.

This is just a rough draft and is untested.

One thing I have not tried to address here is the issue of
XDP_PACKET_HEADROOM, thanks to Daniel for spotting that.  It seems
incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
whatever even if the XDP program doesn't try to push headers at all.
I think we really need the verifier to somehow propagate whether
certain XDP helpers are used or not.

Signed-off-by: David S. Miller 
---

v4:
 - Fix MAC header adjustment before calling prog (David Ahern)
 - Disable LRO when generic XDP is installed (Michael Chan)
 - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
 - Do not perform generic XDP on reinjected packets (DaveM)

v3:
 - Make sure XDP program sees packet at MAC header, push back MAC
   header if we do XDP_TX.  (Alexei)
 - Elide GRO when generic XDP is in use.  (Alexei)
 - Add XDP_FLAGS_SKB_MODE flag which the user can use to request generic
   XDP even if the driver has an XDP implementation.  (Alexei)
 - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
   attribute.  (Daniel)

v2:
 - Add some "fall through" comments in switch statements based
   upon feedback from Andrew Lunn
 - Use RCU for generic xdp_prog, thanks to Johannes Berg.
---
 include/linux/netdevice.h|   8 +++
 include/uapi/linux/if_link.h |   4 +-
 net/core/dev.c   | 153 +--
 net/core/gro_cells.c |   2 +-
 net/core/rtnetlink.c |  40 ++-
 5 files changed, 185 insertions(+), 22 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0aa089..071a58b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1891,9 +1891,17 @@ struct net_device {
struct lock_class_key   *qdisc_tx_busylock;
struct lock_class_key   *qdisc_running_key;
bool                    proto_down;
+   struct bpf_prog __rcu   *xdp_prog;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline bool netif_elide_gro(const struct net_device *dev)
+{
+   if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
+   return true;
+   return false;
+}
+
 #define NETDEV_ALIGN 32
 
 static inline
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8b405af..633aa02 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -887,7 +887,9 @@ enum {
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
-#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST)
+#define XDP_FLAGS_SKB_MODE (2U << 0)
+#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \
+XDP_FLAGS_SKB_MODE)
 
 enum {
IFLA_XDP_UNSPEC,
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..b3d3a6e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -95,6 +95,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4247,6 +4248,123 @@ static int __netif_receive_skb(struct sk_buff *skb)
return ret;
 }
 
+static struct static_key generic_xdp_needed __read_mostly;
+
+static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   struct bpf_prog *new = xdp->prog;
+   int ret = 0;
+
+   switch (xdp->command) {
+   case XDP_SETUP_PROG: {
+   struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
+
+   rcu_assign_pointer(dev->xdp_prog, new);
+   if (old)
+   bpf_prog_put(old);
+
+   if (old && !new) {
+

Re: [PATCH v4 net-next RFC] net: Generic XDP

2017-04-13 Thread Michael Chan

On Thu, Apr 13, 2017 at 9:09 AM, David Miller  wrote:
>
> ---
>
> v4:
>  - Fix MAC header adjustment before calling prog (David Ahern)
>  - Disable LRO when generic XDP is installed (Michael Chan)

I don't see where you are disabling LRO in the patch.

>  - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
>  - Do not perform generic XDP on reinjected packets (DaveM)
>

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-13 Thread David Miller

From: Johannes Berg 
Date: Thu, 13 Apr 2017 21:22:21 +0200

> OTOH, it might depend on the frame data itself, if the program does
> something like
> 
> xdp->data[xdp->data[0] & 0xf]
> 
> (read or write, doesn't really matter) so then the verifier would have
> to take the maximum possible value there into account.

I am not well versed enough with the verifier to understand exactly
how and to what extent SKB accesses are validated by the verifier.

My, perhaps mistaken, impression is that access range validation is
still at least partially done at run time.

Re: net/ipv4: use-after-free in ip_queue_xmit

2017-04-13 Thread Cong Wang

On Thu, Apr 13, 2017 at 11:49 AM, Andrey Konovalov wrote:
> On Mon, Apr 10, 2017 at 7:46 PM, Andrey Konovalov wrote:
>> On Mon, Apr 10, 2017 at 7:42 PM, Cong Wang wrote:
>>> On Mon, Apr 10, 2017 at 7:40 AM, Andrey Konovalov wrote:
 Hi,

 I've got the following error report while fuzzing the kernel with 
 syzkaller.

 On commit 39da7c509acff13fc8cb12ec1bb20337c988ed36 (4.11-rc6).

 Unfortunately it's not reproducible.

 BUG: KASAN: use-after-free in ip_select_ttl include/net/dst.h:176 [inline] at addr 88006ab3602c
 BUG: KASAN: use-after-free in ip_queue_xmit+0x1817/0x1a30 net/ipv4/ip_output.c:485 at addr 88006ab3602c
 Read of size 4 by task syz-executor1/12627
 CPU: 3 PID: 12627 Comm: syz-executor1 Not tainted 4.11.0-rc6+ #206
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:16 [inline]
  dump_stack+0x292/0x398 lib/dump_stack.c:52
  kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
  print_address_description mm/kasan/report.c:202 [inline]
  kasan_report_error mm/kasan/report.c:291 [inline]
  kasan_report+0x252/0x510 mm/kasan/report.c:347
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
  ip_select_ttl include/net/dst.h:176 [inline]
>>>
>>> Probably same as the one you reported on ipv4_mtu(), it would
>>> be nice if you could test the patch I proposed:
>>>
>>> https://patchwork.ozlabs.org/patch/747556/
>>
>> Applied your patch.
>
> Oops, apparently your patch doesn't compile:
>

Weird, it compiles fine here. Either you have a different config
or the following piece is missing for some reason?

@@ -69,6 +69,7 @@  struct rtable {

  struct list_head rt_uncached;
  struct uncached_list *rt_uncached_list;
+ struct fib_info *fi; /* for refcnt to shared metrics */
 };

[PATCH block-tree] net: off by one in inet6_pton()

2017-04-13 Thread Dan Carpenter

If "scope_len" is sizeof(scope_id) then we would put the NUL terminator
one space beyond the end of the buffer.
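A userspace sketch of the corrected bound, assuming the copy is followed by a `scope_id[scope_len] = '\0'` store as the description implies. With the old bound, scope_len could reach sizeof(scope_id), i.e. 16, so the terminator landed at index 16, one byte past the array; capping at sizeof(scope_id) - 1 keeps it at index 15 at most. `sketch_copy_scope()` is an illustrative name, not a kernel function, and it assumes dst_size is at least 1.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size - 1 bytes so the NUL terminator always fits
 * inside dst. Returns the number of bytes copied. */
static size_t sketch_copy_scope(char *dst, size_t dst_size,
				const char *src, size_t src_len)
{
	size_t len = src_len < dst_size - 1 ? src_len : dst_size - 1;

	memcpy(dst, src, len);
	dst[len] = '\0';  /* index is at most dst_size - 1 */
	return len;
}
```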

Fixes: b1a951fe469e ("net/utils: generic inet_pton_with_scope helper")
Signed-off-by: Dan Carpenter 
---
This one goes through Jens' tree not through net-dev.

diff --git a/net/core/utils.c b/net/core/utils.c
index da1089ea5389..93066bd0305a 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -339,7 +339,7 @@ static int inet6_pton(struct net *net, const char *src, u16 port_num,
src + srclen != scope_delim && *scope_delim == '%') {
struct net_device *dev;
char scope_id[16];
-   size_t scope_len = min_t(size_t, sizeof(scope_id),
+   size_t scope_len = min_t(size_t, sizeof(scope_id) - 1,
 src + srclen - scope_delim - 1);
 
memcpy(scope_id, scope_delim + 1, scope_len);

Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-13 Thread Florian Fainelli

On 04/13/2017 12:11 PM, Grygorii Strashko wrote:
> Now the command:
>   ethtool --phy-statistics eth0
> will cause a system crash with the message "Unable to handle kernel NULL pointer
> dereference at virtual address 0010" from:
> 
>  (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
>  (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
>  (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
>  (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
>  (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
>  (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
>  (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)
> 
> The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
> defined. As a result, the struct phy_device *phydev->priv pointer will not be
> initialized (NULL).
> This issue also affects the following PHYs:
>  KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
> 
> Fix it by:
> - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
> phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
> settings.
> - removing statistic callbacks from other phys (KSZ8795, KSZ886X,
> KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
> statistic counters.
> 
> Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
> Signed-off-by: Grygorii Strashko 
> Reviewed-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-13 Thread Andrew Lunn

On Thu, Apr 13, 2017 at 10:28:09PM +0300, Sergei Shtylyov wrote:
> On 04/13/2017 10:11 PM, Grygorii Strashko wrote:
> 
> >Now the command:
> > ethtool --phy-statistics eth0
> >will cause a system crash with the message "Unable to handle kernel NULL pointer
> >dereference at virtual address 0010" from:
> >
> > (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
> > (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
> > (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
> > (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
> > (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
> > (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
> > (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)
> >
> >The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
> >defined. As a result, the struct phy_device *phydev->priv pointer will not be
> >initialized (NULL).
> >This issue also affects the following PHYs:
> > KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
> >
> >Fix it by:
> >- adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
> >phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
> >settings.
> >- removing statistic callbacks from other phys (KSZ8795, KSZ886X,
> >KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
> >statistic counters.
> 
>    Not sure how the 2nd change fixes the reported issue. It looks
> like material for a separate patch...

There are two different cases here:

1) The hardware supports the stats. So a probe function is needed, but
is missing.

2) The hardware does not support the stats, so there should not be
stats ops.

The same crash will happen, independent of which one of the above is
true. You need to fix them both, to stop it crashing.
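A heavily simplified userspace model of the two cases above. Every type and function in it is hypothetical (this is not the phylib API), and it assumes the core treats a missing stats op as "not supported". A driver that exposes stats must also have a probe that sets up priv (case 1); a driver without stats hardware simply leaves the ops out (case 2). The broken combination, a stats op with no probe, would dereference a NULL priv, which is why the model never exercises it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the phylib API. */
struct sketch_priv {
	unsigned long long counters[2];
};

struct sketch_phy;

struct sketch_phy_driver {
	int (*probe)(struct sketch_phy *phy);
	int (*get_stats)(struct sketch_phy *phy, unsigned long long *out);
};

struct sketch_phy {
	const struct sketch_phy_driver *drv;
	struct sketch_priv *priv;  /* only set up by probe() */
};

static struct sketch_priv sketch_shared_priv;

/* Case 1 needs this: it is what makes priv non-NULL. */
static int sketch_probe(struct sketch_phy *phy)
{
	phy->priv = &sketch_shared_priv;
	return 0;
}

/* Like the stats op in the report, this dereferences priv unconditionally. */
static int sketch_get_stats(struct sketch_phy *phy, unsigned long long *out)
{
	*out = phy->priv->counters[0];
	return 0;
}

/* Core side of the model: a missing op means "not supported". */
static int sketch_query_stats(struct sketch_phy *phy, unsigned long long *out)
{
	if (!phy->drv->get_stats)
		return -EOPNOTSUPP;
	return phy->drv->get_stats(phy, out);
}

static int sketch_attach(struct sketch_phy *phy,
			 const struct sketch_phy_driver *drv)
{
	phy->drv = drv;
	phy->priv = NULL;
	return drv->probe ? drv->probe(phy) : 0;
}
```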

  Andrew

Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-13 Thread Sergei Shtylyov


On 04/13/2017 10:11 PM, Grygorii Strashko wrote:

> Now the command:
> ethtool --phy-statistics eth0
> will cause a system crash with the message "Unable to handle kernel NULL pointer
> dereference at virtual address 0010" from:
>
>  (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
>  (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
>  (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
>  (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
>  (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
>  (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
>  (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)
>
> The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
> defined. As a result, the struct phy_device *phydev->priv pointer will not be
> initialized (NULL).
> This issue also affects the following PHYs:
>  KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
>
> Fix it by:
> - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
> phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
> settings.
> - removing statistic callbacks from other phys (KSZ8795, KSZ886X,
> KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
> statistic counters.

   Not sure how the 2nd change fixes the reported issue. It looks like
material for a separate patch...

> Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
> Signed-off-by: Grygorii Strashko 
> Reviewed-by: Andrew Lunn 

[...]

MBR, Sergei

Mac80211 - 802.11s

2017-04-13 Thread prabhu

How do I get the vif of mesh stations, and does the vif have a separate
queue structure apart from the PHY queues?

I want to get the queue statistics for every associated mesh station.

Re: [PATCH 1/5] netlink: extended ACK reporting

2017-04-13 Thread Johannes Berg

On Thu, 2017-04-13 at 16:05 +0200, Nicolas Dichtel wrote:

> Sure. It was just to mention that attribute 0 exists somewhere.
> The other 0 attribute is OVS_TUNNEL_KEY_ATTR_ID.

That looks like some really awkward hand-grown parsing - with all these
"struct ovs_len_tbl" looking almost like a policy, but not using that
code?

Seems like something somebody should take a hard look at and see if it
can't use more standard infrastructure.

johannes

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-13 Thread Johannes Berg

On Thu, 2017-04-13 at 11:37 -0400, David Miller wrote:

> If the capability is variable, it must be communicated to the user
> somehow at program load time.
> 
> We are consistently finding that there is this real need to
> communicate XDP capabilities, or somehow verify that the needs
> of an XDP program can be satisfied by a given implementation.

Technically, once you know the capability of the *driver*, the verifier
should be able to check if the *program* is compatible. So if the
driver can guarantee "you always get 2k accessible", the verifier can
check that you don't access more than xdp->data + 2047, similar to how
it verifies that you don't access beyond xdp->data_end.
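A userspace sketch of the distinction drawn above, under the assumption that a driver could advertise a fixed number of always-accessible bytes (`guaranteed`, a hypothetical parameter): an access at a constant offset can then be accepted statically, while anything else still needs the explicit data_end comparison. The function names are illustrative, and the static check is written so that it never computes `off + size`, which could overflow.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Static case: is [off, off + size) inside the guaranteed window? */
static bool sketch_statically_safe(size_t off, size_t size, size_t guaranteed)
{
	return size <= guaranteed && off <= guaranteed - size;
}

/* Dynamic case: compare against data_end at run time, the pattern
 * programs already use today. Assumes data_end >= data. */
static bool sketch_read_u8(const uint8_t *data, const uint8_t *data_end,
			   size_t off, uint8_t *out)
{
	if (off >= (size_t)(data_end - data))
		return false;
	*out = data[off];
	return true;
}
```

With a 2k guarantee, a one-byte read at offset 2047 passes the static check and offset 2048 does not, matching the `xdp->data + 2047` limit mentioned above.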

> And eth_get_headlen() only pulls protocol headers, which precludes
> XDP inspecting anything below TCP/UDP/etc.  This is also not
> reasonable.
> 
> Right now, as it stands, we have to assume the program can
> potentially be interested in the entire packet.

I agree with this though, it's not reasonable to have wildly varying
implementations here that may or may not be able to access almost
anything. The totally degenerate case would be having no skb header at
all, which is also still entirely valid from the network stack's POV.

> We can only optimize this and elide things when we have a facility in
> the future for the program to express its needs precisely.  I think
> we will have to add some control structure to XDP programs that can
> be filled in for this purpose.

Like I said above, I think this is something that you can possibly
determine in the verifier.

So if, for example, the verifier notices that the program never
accesses anything but the first few bytes, then it would seem valid to
run with only that much pulled into the skb header.

OTOH, it might depend on the frame data itself, if the program does
something like

xdp->data[xdp->data[0] & 0xf]

(read or write, doesn't really matter) so then the verifier would have
to take the maximum possible value there into account.
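For the masked-index example, the largest index `data[0] & 0xf` can produce is 15, so a conservative, verifier-style check has to assume 16 accessible bytes even when the actual index is small. The userspace sketch below (hypothetical names) shows that worst-case bound; for that reason it rejects packets shorter than 16 bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* For an index computed as data[0] & mask, the largest possible index
 * is mask itself, so a 1-byte read needs mask + 1 bytes worst case. */
static size_t sketch_worst_case_len(uint8_t mask)
{
	return (size_t)mask + 1;
}

static bool sketch_read_masked(const uint8_t *data, const uint8_t *data_end,
			       uint8_t *out)
{
	const uint8_t mask = 0xf;
	size_t avail = (size_t)(data_end - data);

	/* One check covers both data[0] and the worst-case data[idx]. */
	if (avail < sketch_worst_case_len(mask))
		return false;
	*out = data[data[0] & mask];
	return true;
}
```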

johannes

[PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-13 Thread Grygorii Strashko

Now the command:
ethtool --phy-statistics eth0
will cause a system crash with the message "Unable to handle kernel NULL pointer
dereference at virtual address 0010" from:

 (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
 (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
 (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
 (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
 (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
 (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
 (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)

The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
defined. As a result, the struct phy_device *phydev->priv pointer will not be
initialized (NULL).
This issue also affects the following PHYs:
 KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737

Fix it by:
- adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
settings.
- removing statistic callbacks from other phys (KSZ8795, KSZ886X,
KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
statistic counters.

Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
Signed-off-by: Grygorii Strashko 
Reviewed-by: Andrew Lunn 
---
changes in v3:
- accidental whitespace change removed

changes in v2:
 - probe callback added to KSZ9031, KSZ9021
 - statistic callback removed from KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737

Links
v2: https://patchwork.ozlabs.org/patch/750194/
v1: https://lkml.org/lkml/2017/4/10/1183

 drivers/net/phy/micrel.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 6742070..1326d99 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -798,9 +798,6 @@ static struct phy_driver ksphy_driver[] = {
.read_status= genphy_read_status,
.ack_interrupt  = kszphy_ack_interrupt,
.config_intr= kszphy_config_intr,
-   .get_sset_count = kszphy_get_sset_count,
-   .get_strings= kszphy_get_strings,
-   .get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -940,9 +937,6 @@ static struct phy_driver ksphy_driver[] = {
.read_status= genphy_read_status,
.ack_interrupt  = kszphy_ack_interrupt,
.config_intr= kszphy_config_intr,
-   .get_sset_count = kszphy_get_sset_count,
-   .get_strings= kszphy_get_strings,
-   .get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -952,6 +946,7 @@ static struct phy_driver ksphy_driver[] = {
.features   = PHY_GBIT_FEATURES,
.flags  = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
.driver_data= _type,
+   .probe  = kszphy_probe,
.config_init= ksz9021_config_init,
.config_aneg= genphy_config_aneg,
.read_status= genphy_read_status,
@@ -971,6 +966,7 @@ static struct phy_driver ksphy_driver[] = {
.features   = PHY_GBIT_FEATURES,
.flags  = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
.driver_data= _type,
+   .probe  = kszphy_probe,
.config_init= ksz9031_config_init,
.config_aneg= genphy_config_aneg,
.read_status= ksz9031_read_status,
@@ -989,9 +985,6 @@ static struct phy_driver ksphy_driver[] = {
.config_init= kszphy_config_init,
.config_aneg= ksz8873mll_config_aneg,
.read_status= ksz8873mll_read_status,
-   .get_sset_count = kszphy_get_sset_count,
-   .get_strings= kszphy_get_strings,
-   .get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -1003,9 +996,6 @@ static struct phy_driver ksphy_driver[] = {
.config_init= kszphy_config_init,
.config_aneg= genphy_config_aneg,
.read_status= genphy_read_status,
-   .get_sset_count = kszphy_get_sset_count,
-   .get_strings= kszphy_get_strings,
-   .get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
 }, {
@@ -1017,9 +1007,6 @@ static struct phy_driver ksphy_driver[] = {
.config_init= kszphy_config_init,
.config_aneg= ksz8873mll_config_aneg,
.read_status= ksz8873mll_read_status,
-   .get_sset_count = kszphy_get_sset_count,
-   .get_strings= kszphy_get_strings,
-   .get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
 } };
-- 
2.10.1

Re: [PATCH v2] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-13 Thread Grygorii Strashko

On 04/13/2017 01:51 PM, Andrew Lunn wrote:

> On Wed, Apr 12, 2017 at 05:55:10PM -0500, Grygorii Strashko wrote:
>
> > Now the command:
> > ethtool --phy-statistics eth0
> > will cause a system crash with the message "Unable to handle kernel NULL pointer
> > dereference at virtual address 0010" from:
> >
> >  (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
> >  (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
> >  (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
> >  (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
> >  (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
> >  (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
> >  (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)
> >
> > The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
> > defined. As a result, the struct phy_device *phydev->priv pointer will not be
> > initialized (NULL).
> > This issue also affects the following PHYs:
> >  KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
> >
> > Fix it by:
> > - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
> > phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
> > settings.
> > - removing statistic callbacks from other phys (KSZ8795, KSZ886X,
> > KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
> > statistic counters.
> >
> > Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
> > Signed-off-by: Grygorii Strashko 
> > ---
> > changes in v2:
> >  - probe callback added to KSZ9031, KSZ9021
> >  - statistic callback removed from KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737
> >
> > Link on v1:
> >  https://lkml.org/lkml/2017/4/10/1183
> >
> >  drivers/net/phy/micrel.c | 18 ++
> >  1 file changed, 2 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> > index 6742070..6f207e6 100644
> > --- a/drivers/net/phy/micrel.c
> > +++ b/drivers/net/phy/micrel.c
> > @@ -574,7 +574,6 @@ static int ksz9031_config_init(struct phy_device *phydev)
> > MII_KSZ9031RN_TX_DATA_PAD_SKEW, 4,
> > tx_data_skews, 4);
> > }
> > -
> > return ksz9031_center_flp_timing(phydev);
> >  }
>
> Hi Grygorii
>
> Whitespace changes like this should be in a separate patch, or not
> made at all.

Oh, sorry, I missed it. Will resend.

> Otherwise, thanks for looking at the datasheets and fixing this up.
>
> Reviewed-by: Andrew Lunn 



--
regards,
-grygorii

Re: [PATCH v2] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-13 Thread Andrew Lunn

On Wed, Apr 12, 2017 at 05:55:10PM -0500, Grygorii Strashko wrote:
> Now the command:
>   ethtool --phy-statistics eth0
> will cause a system crash with the message "Unable to handle kernel NULL pointer
> dereference at virtual address 0010" from:
> 
>  (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
>  (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
>  (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
>  (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
>  (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
>  (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
>  (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)
> 
> The reason: the phy_driver structure for the KSZ9031 phy has no .probe() callback
> defined. As a result, the struct phy_device *phydev->priv pointer will not be
> initialized (NULL).
> This issue also affects the following phys:
>  KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
> 
> Fix it by:
> - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
> phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
> settings.
> - removing statistic callbacks from other phys (KSZ8795, KSZ886X,
> KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
> statistic counters.
> 
> Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
> Signed-off-by: Grygorii Strashko 
> ---
> changes in v2:
>  - probe callback added to KSZ9031, KSZ9021
>  - statistic callback removed from KSZ8795, KSZ886X, KSZ8873MLL, KSZ8061, KS8737
> 
> Link on v1:
>  https://lkml.org/lkml/2017/4/10/1183
> 
>  drivers/net/phy/micrel.c | 18 ++----------------
>  1 file changed, 2 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index 6742070..6f207e6 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -574,7 +574,6 @@ static int ksz9031_config_init(struct phy_device *phydev)
>  				MII_KSZ9031RN_TX_DATA_PAD_SKEW, 4,
>  				tx_data_skews, 4);
>  	}
> -
>  	return ksz9031_center_flp_timing(phydev);
>  }

Hi Grygorii

Whitespace changes like this should be in a separate patch, or not
made at all.

Otherwise, thanks for looking at the datasheets and fixing this up.

Reviewed-by: Andrew Lunn 

Andrew
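The crash mechanism described in the patch (no .probe() callback, so phydev->priv is never set, and the statistics callback then dereferences it) can be reproduced in a small user-space C sketch. All sim_* types and functions are hypothetical stand-ins, not the phylib API. The -ENODEV check exists only so the sketch can show the NULL case without crashing; the actual fix instead adds kszphy_probe() to KSZ9031/KSZ9021 and removes the statistics callbacks from the PHYs that have no counters.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for struct phy_driver,
 * struct phy_device and the driver's private data. */
struct sim_phy_device;

struct sim_phy_driver {
	int (*probe)(struct sim_phy_device *phydev);	/* may be NULL */
};

struct sim_priv {
	uint64_t stats[2];		/* per-PHY statistic counters */
};

struct sim_phy_device {
	const struct sim_phy_driver *drv;
	struct sim_priv *priv;		/* only set by probe() */
};

/* Plays the role of kszphy_probe(): allocate zeroed private data. */
static int sim_kszphy_probe(struct sim_phy_device *phydev)
{
	phydev->priv = calloc(1, sizeof(*phydev->priv));
	return phydev->priv ? 0 : -ENOMEM;
}

/* Bind a driver; probe() runs only when the driver provides one. */
static int sim_phy_attach(struct sim_phy_device *phydev,
			  const struct sim_phy_driver *drv)
{
	phydev->drv = drv;
	phydev->priv = NULL;
	return drv->probe ? drv->probe(phydev) : 0;
}

/* Stats reader. The crashing callback dereferenced priv unconditionally;
 * this sketch returns -ENODEV instead so the failing case can be shown. */
static int sim_get_stat(const struct sim_phy_device *phydev, int i,
			uint64_t *out)
{
	assert(i >= 0 && i < 2);
	if (!phydev->priv)
		return -ENODEV;
	*out = phydev->priv->stats[i];
	return 0;
}
```

With a probe callback the private data exists before any statistics are read; without one, priv is still NULL when the stats callback runs, which is exactly the state the oops above came from.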

Re: net/ipv4: use-after-free in ip_queue_xmit

2017-04-13 Thread Andrey Konovalov

On Mon, Apr 10, 2017 at 7:46 PM, Andrey Konovalov  wrote:
> On Mon, Apr 10, 2017 at 7:42 PM, Cong Wang  wrote:
>> On Mon, Apr 10, 2017 at 7:40 AM, Andrey Konovalov wrote:
>>> Hi,
>>>
>>> I've got the following error report while fuzzing the kernel with syzkaller.
>>>
>>> On commit 39da7c509acff13fc8cb12ec1bb20337c988ed36 (4.11-rc6).
>>>
>>> Unfortunately it's not reproducible.
>>>
>>> BUG: KASAN: use-after-free in ip_select_ttl include/net/dst.h:176 [inline] at addr 88006ab3602c
>>> BUG: KASAN: use-after-free in ip_queue_xmit+0x1817/0x1a30 net/ipv4/ip_output.c:485 at addr 88006ab3602c
>>> Read of size 4 by task syz-executor1/12627
>>> CPU: 3 PID: 12627 Comm: syz-executor1 Not tainted 4.11.0-rc6+ #206
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> Call Trace:
>>>  __dump_stack lib/dump_stack.c:16 [inline]
>>>  dump_stack+0x292/0x398 lib/dump_stack.c:52
>>>  kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
>>>  print_address_description mm/kasan/report.c:202 [inline]
>>>  kasan_report_error mm/kasan/report.c:291 [inline]
>>>  kasan_report+0x252/0x510 mm/kasan/report.c:347
>>>  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
>>>  ip_select_ttl include/net/dst.h:176 [inline]
>>
>> Probably same as the one you reported on ipv4_mtu(), it would
>> be nice if you could test the patch I proposed:
>>
>> https://patchwork.ozlabs.org/patch/747556/
>
> Applied your patch.

Oops, apparently your patch doesn't compile:

  CC  net/ipv4/route.o
net/ipv4/route.c: In function ‘ipv4_dst_destroy’:
net/ipv4/route.c:1394:8: error: ‘struct rtable’ has no member named ‘fi’
  if (rt->fi) {
^~
net/ipv4/route.c:1395:18: error: ‘struct rtable’ has no member named ‘fi’
   fib_info_put(rt->fi);
  ^~
net/ipv4/route.c:1396:5: error: ‘struct rtable’ has no member named ‘fi’
   rt->fi = NULL;
 ^~
net/ipv4/route.c: In function ‘rt_init_metrics’:
net/ipv4/route.c:1440:5: error: ‘struct rtable’ has no member named ‘fi’
   rt->fi = fi;
 ^~
net/ipv4/route.c: In function ‘rt_dst_alloc’:
net/ipv4/route.c:1512:5: error: ‘struct rtable’ has no member named ‘fi’
   rt->fi = NULL;
 ^~
make[2]: *** [net/ipv4/route.o] Error 1
make[1]: *** [net/ipv4] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [net] Error 2
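The build errors show what the proposed patch relies on: a struct rtable member fi that rt_init_metrics() sets and ipv4_dst_destroy() releases with fib_info_put(), presumably so that a cached route keeps its fib_info (and the metrics read by ip_select_ttl()) alive. That reading is inferred from the error lines only, not from the patch itself. A minimal single-threaded sketch of the hold-while-cached pattern follows; all sim_* names are hypothetical, and the kernel uses atomic reference counts rather than a plain int.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared, refcounted object such as
 * struct fib_info; a plain int is enough for this single-threaded
 * sketch, whereas the kernel uses atomic counters. */
struct sim_info {
	int refcnt;
	int metric;		/* data a cached user may read later */
};

static struct sim_info *sim_info_new(int metric)
{
	struct sim_info *fi = malloc(sizeof(*fi));

	if (fi) {
		fi->refcnt = 1;	/* the creator's (table's) reference */
		fi->metric = metric;
	}
	return fi;
}

static void sim_info_hold(struct sim_info *fi)
{
	fi->refcnt++;
}

/* Drop one reference; returns 1 if that freed the object. */
static int sim_info_put(struct sim_info *fi)
{
	assert(fi->refcnt > 0);
	if (--fi->refcnt == 0) {
		free(fi);
		return 1;
	}
	return 0;
}

/* Hypothetical stand-in for struct rtable: whoever caches the pointer
 * also owns a reference, so the object cannot be freed under it. */
struct sim_route {
	struct sim_info *fi;
};

static void sim_route_init(struct sim_route *rt, struct sim_info *fi)
{
	sim_info_hold(fi);
	rt->fi = fi;
}

/* Same shape as the failing lines in ipv4_dst_destroy() above. */
static int sim_route_destroy(struct sim_route *rt)
{
	int freed = 0;

	if (rt->fi) {
		freed = sim_info_put(rt->fi);
		rt->fi = NULL;
	}
	return freed;
}
```

If the table drops its reference first, the route's reference still keeps the object valid; only the last put frees it.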


>
> The bug gets triggered very rarely (only twice so far), but I'll let
> you know if I see it again.
>
> Thanks!
>
>>
>>
>> Thanks!
>>
>>>  ip_queue_xmit+0x1817/0x1a30 net/ipv4/ip_output.c:485
>>>  sctp_v4_xmit+0x10d/0x140 net/sctp/protocol.c:994
>>>  sctp_packet_transmit+0x215c/0x3560 net/sctp/output.c:637
>>>  sctp_outq_flush+0xade/0x3f90 net/sctp/outqueue.c:885
>>>  sctp_outq_uncork+0x5a/0x70 net/sctp/outqueue.c:750
>>>  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1773 [inline]
>>>  sctp_side_effects net/sctp/sm_sideeffect.c:1175 [inline]
>>>  sctp_do_sm+0x5a0/0x6a50 net/sctp/sm_sideeffect.c:1147
>>>  sctp_primitive_ASSOCIATE+0x9d/0xd0 net/sctp/primitive.c:88
>>>  sctp_sendmsg+0x270d/0x3b50 net/sctp/socket.c:1954
>>>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
>>>  sock_sendmsg_nosec net/socket.c:633 [inline]
>>>  sock_sendmsg+0xca/0x110 net/socket.c:643
>>>  SYSC_sendto+0x660/0x810 net/socket.c:1696
>>>  SyS_sendto+0x40/0x50 net/socket.c:1664
>>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>>> RIP: 0033:0x4458d9
>>> RSP: 002b:7fdceca85b58 EFLAGS: 0282 ORIG_RAX: 002c
>>> RAX: ffda RBX: 0016 RCX: 004458d9
>>> RDX: 0087 RSI: 20003000 RDI: 0016
>>> RBP: 006e2fe0 R08: 20003000 R09: 0010
>>> R10: 00040841 R11: 0282 R12: 007080a8
>>> R13: 000a R14: 0005 R15: 0084
>>> Object at 88006ab36008, in cache kmalloc-64 size: 64
>>> Allocated:
>>> PID = 7243
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>>  set_track mm/kasan/kasan.c:525 [inline]
>>>  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
>>>  kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
>>>  kmalloc include/linux/slab.h:490 [inline]
>>>  kzalloc include/linux/slab.h:663 [inline]
>>>  fib_create_info+0x8e0/0x3a30 net/ipv4/fib_semantics.c:1040
>>>  fib_table_insert+0x1a5/0x1550 net/ipv4/fib_trie.c:1221
>>>  ip_rt_ioctl+0xddc/0x1590 net/ipv4/fib_frontend.c:597
>>>  inet_ioctl+0xf2/0x1c0 net/ipv4/af_inet.c:882
>>>  sock_do_ioctl+0x65/0xb0 net/socket.c:906
>>>  sock_ioctl+0x28f/0x440 net/socket.c:1004
>>>  vfs_ioctl fs/ioctl.c:45 [inline]
>>>  do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
>>>  SYSC_ioctl fs/ioctl.c:700 [inline]
>>>  SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
>>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>>> Freed:
>>> PID = 12622
>>>  save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
>>>  save_stack+0x43/0xd0 mm/kasan/kasan.c:513
>>>  set_track

Re: [PATCH v4 3/3] VSOCK: Add virtio vsock vsockmon hooks

2017-04-13 Thread Michael S. Tsirkin

On Thu, Apr 13, 2017 at 05:18:11PM +0100, Stefan Hajnoczi wrote:
> From: Gerard Garcia 
> 
> The virtio drivers deal with struct virtio_vsock_pkt.  Add
> virtio_transport_deliver_tap_pkt(pkt) for handing packets to the
> vsockmon device.
> 
> We call virtio_transport_deliver_tap_pkt(pkt) from
> net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of
> common code.  This is because the drivers may drop packets before
> handing them to common code - we still want to capture them.
> 
> Signed-off-by: Gerard Garcia 
> Signed-off-by: Stefan Hajnoczi 
> ---
> v3:
>  * Hook virtio_transport.c (guest driver), not just
>drivers/vhost/vsock.c (host driver)
> ---
>  include/linux/virtio_vsock.h|  1 +
>  drivers/vhost/vsock.c   |  8 +
>  net/vmw_vsock/virtio_transport.c|  3 ++
>  net/vmw_vsock/virtio_transport_common.c | 58 +
>  4 files changed, 70 insertions(+)
> 
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 584f9a6..ab13f07 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -153,5 +153,6 @@ void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt);
>  void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt);
>  u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
>  void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
> +void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt);
>  
>  #endif /* _LINUX_VIRTIO_VSOCK_H */
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 44eed8e..d939ac1 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -176,6 +176,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>   restart_tx = true;
>   }
>  
> + /* Deliver to monitoring devices all correctly transmitted
> +  * packets.
> +  */
> + virtio_transport_deliver_tap_pkt(pkt);
> +
>   virtio_transport_free_pkt(pkt);
>   }
>   if (added)
> @@ -383,6 +388,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
>  
>   len = pkt->len;
>  
> + /* Deliver to monitoring devices all received packets */
> + virtio_transport_deliver_tap_pkt(pkt);
> +
>   /* Only accept correctly addressed packets */
>   if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
>   virtio_transport_recv_pkt(pkt);
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 68675a1..9dffe02 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -144,6 +144,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
>   list_del_init(&pkt->list);
>   spin_unlock_bh(&vsock->send_pkt_list_lock);
>  
> + virtio_transport_deliver_tap_pkt(pkt);
> +
>   reply = pkt->reply;
>  
>   sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
> @@ -370,6 +372,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
>   }
>  
>   pkt->len = len - sizeof(pkt->hdr);
> + virtio_transport_deliver_tap_pkt(pkt);
>   virtio_transport_recv_pkt(pkt);
>   }
>   } while (!virtqueue_enable_cb(vq));
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index af087b4..aae60c1 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -85,6 +86,63 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
>   return NULL;
>  }
>  
> +/* Packet capture */
> +void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
> +{
> + struct sk_buff *skb;
> + struct af_vsockmon_hdr *hdr;
> + unsigned char *t_hdr, *payload;
> +
> + skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
> + GFP_ATOMIC);
> + if (!skb)
> + return; /* nevermind if we cannot capture the packet */
> +
> + hdr = (struct af_vsockmon_hdr *)skb_put(skb, sizeof(*hdr));
> +
> + /* pkt->hdr is little-endian so no need to byteswap here */

Comment does not seem to make sense. Drop it?

> + hdr->src_cid = pkt->hdr.src_cid;
> + hdr->src_port = pkt->hdr.src_port;
> + hdr->dst_cid = pkt->hdr.dst_cid;
> + hdr->dst_port = pkt->hdr.dst_port;
> +
> + hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO);
> + hdr->len = cpu_to_le16(sizeof(pkt->hdr));
> + hdr->reserved[0] = hdr->reserved[1] = 0;
> +
> + switch(cpu_to_le16(pkt->hdr.op)) {

I'd

[Patch net-next] kcm: remove a useless copy_from_user()

2017-04-13 Thread Cong Wang

struct kcm_clone only contains fd, and kcm_clone() only
writes this struct, so there is no need to copy it from user.

Cc: Tom Herbert 
Signed-off-by: Cong Wang 
---
 net/kcm/kcmsock.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index 31762f7..deca20f 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -1707,11 +1707,7 @@ static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 		struct kcm_clone info;
 		struct socket *newsock = NULL;
 
-		if (copy_from_user(&info, (void __user *)arg, sizeof(info)))
-			return -EFAULT;
-
 		err = kcm_clone(sock, &info, &newsock);
-
 		if (!err) {
 			if (copy_to_user((void __user *)arg, &info,
 					 sizeof(info))) {
-- 
2.5.5

How to debug DMAR errors?

2017-04-13 Thread Ben Greear


Hello,

I have been seeing DMAR errors like the following on a regular basis when
testing my ath10k driver/firmware under some specific loads (maximum receive
of 512-byte frames in AP mode):

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [05:00.0] fault addr fd99f000 [fault reason 06] PTE Read access is not set
ath10k_pci 0000:05:00.0: firmware crashed! (uuid 594b1393-ae35-42b5-9dec-74ff0c6791ff)

So, I am wondering if there is any way I can get more information about
what this fd99f000 address is?

Once this problem hits, the entire OS locks hard (not even sysrq-boot will
do anything), so I guess I would need the DMAR logic to print out more info
on that address somehow.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com

Re: [PATCH linux 2/2] net sched actions: fix refcount decrement on error

2017-04-13 Thread Cong Wang

On Thu, Apr 13, 2017 at 1:06 AM, Wolfgang Bumiller
 wrote:
> On Wed, Apr 12, 2017 at 09:27:31PM -0700, Cong Wang wrote:
>> Instead of duplicating code, you can add the check
>> to the module_put() next to err_mod label? I mean:
>
> I just realized that with module_put() happening in both error and
> success cases if `err != ACT_P_CREATED`, we could just move that code up
> to above the TCA_ACT_COOKIE handling?

Yes, even better.

> Btw., the comment confused me a little at first as I thought it's about
> what happens in ->init(). But reading the code I then noticed the module
> count is increased in tc_lookup_action_n() (which calls try_module_get)
> in this functions and it's about how this function itself is supposed
> to affect the count - if I'm not mistaken.
> => so I think it makes sense to deal with this earlier.

Yes, the module reference count is not increased inside ->init(). That is
because of the semantics of ->init(): it can create a new action or modify
an existing one, and in the latter case we need to roll back the refcount.
Please feel free to update that comment to make it clearer, since you are
already on it. ;)

>
> Otherwise I'd have to save `err != ACT_P_CREATED` in an additional
> variable for the err_mod case since the cookie handling modifies `err`.
>
> What about this? (Since it's a separate issue not directly related to
> patch 1 of the series I can send it as separate mail based on master if
> you prefer - the diff below is based on master+patch1 for now.)
>

Looks good, this could also address Roman's comment. Please remove
the RFC tag and resend the whole series.

You can also add my:

Acked-by: Cong Wang 


Thanks.
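The refcount rule discussed in this thread, where tc_lookup_action_n() takes a module reference via try_module_get() and that reference is kept only when ->init() returns ACT_P_CREATED, can be sketched in user space. Everything prefixed sim_ is hypothetical, and SIM_ACT_P_CREATED is a local stand-in constant, not the kernel's definition.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's ACT_P_CREATED return code. */
#define SIM_ACT_P_CREATED 1

/* Counts references the way try_module_get()/module_put() would. */
struct sim_module {
	int refcnt;
};

/* Lookup step: like tc_lookup_action_n(), takes a module reference. */
static void sim_lookup_action(struct sim_module *m)
{
	assert(m->refcnt >= 0);
	m->refcnt++;
}

/* ->init() stand-in: creates a new action, updates an existing one,
 * or fails. */
static int sim_init(int exists, int fail)
{
	if (fail)
		return -1;
	return exists ? 0 : SIM_ACT_P_CREATED;
}

/* Only a newly created action keeps the lookup's reference; on error,
 * or when an existing action was modified, the reference is dropped
 * right after ->init(), before later code can overwrite err. */
static int sim_action_init(struct sim_module *m, int exists, int fail)
{
	int err;

	sim_lookup_action(m);
	err = sim_init(exists, fail);
	if (err != SIM_ACT_P_CREATED)
		m->refcnt--;		/* module_put() */
	return err;
}
```

Making the decision immediately after ->init(), as proposed above, means the later TCA_ACT_COOKIE handling can reuse err without saving `err != ACT_P_CREATED` in an extra variable.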

Re: [PATCH next] bonding: handle link transition from FAIL to UP correctly

2017-04-13 Thread David Miller

From: Mahesh Bandewar 
Date: Tue, 11 Apr 2017 22:36:00 -0700

> From: Mahesh Bandewar 
> 
> When link transitions from LINK_FAIL to LINK_UP, the commit phase is
> not called. This leads to an erroneous state causing slave-link state to
> get stuck in "going down" state while its speed and duplex are perfectly
> fine. This issue is a side-effect of splitting link-set into propose and
> commit phases introduced by de77ecd4ef02 ("bonding: improve link-status
> update in mii-monitoring")
> 
> This patch fixes these issues by calling commit phase whenever link
> state change is proposed.
> 
> Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
> Signed-off-by: Mahesh Bandewar 

Applied, thanks.
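The propose/commit split described in the commit message can be sketched as follows: a proposed state stays invisible until a commit publishes it, so a LINK_FAIL to LINK_UP transition that is proposed but never committed leaves the slave stuck in its failing state. The states and names below are hypothetical simplifications, not the bonding driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical link states loosely modelled on the bonding driver's. */
enum sim_link { SIM_LINK_UP, SIM_LINK_FAIL, SIM_LINK_DOWN, SIM_LINK_BACK };

struct sim_slave {
	enum sim_link link;	/* committed state, visible to the rest */
	enum sim_link new_link;	/* proposed state, not yet visible */
	bool changed;		/* set when a commit changed the state */
};

/* Propose phase: record the new state only. */
static void sim_propose_link_state(struct sim_slave *s, enum sim_link state)
{
	assert(state <= SIM_LINK_BACK);
	s->new_link = state;
}

/* Commit phase: publish whatever was proposed. */
static void sim_commit_link_state(struct sim_slave *s)
{
	if (s->link != s->new_link) {
		s->link = s->new_link;
		s->changed = true;
	}
}

/* The rule the fix enforces: every proposal is followed by a commit,
 * including the FAIL -> UP recovery path. */
static void sim_set_link_state(struct sim_slave *s, enum sim_link state)
{
	sim_propose_link_state(s, state);
	sim_commit_link_state(s);
}
```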

Re: [PATCH v2 net-next] net: dwc-xlgmac: add the initial ethtool support

2017-04-13 Thread David Miller

From: Jie Deng 
Date: Wed, 12 Apr 2017 13:10:06 +0800

> It is necessary to provide ethtool support for displaying and
> modifying parameters of dwc-xlgmac.
> 
> Signed-off-by: Jie Deng 
> ---
> v1->v2:
>   - remove begin() method which is unnecessary

Applied, thank you.

Re: [PATCH v3 0/4] TI Bluetooth serdev support

2017-04-13 Thread Marcel Holtmann

Hi Rob,

> This series adds serdev support to the HCI LL protocol used on TI BT
> modules and enables support on the HiKey board with the WL1835 module.
> With this the custom TI UIM daemon and btattach are no longer needed.
> 
> The series is available on this git branch[1]. This version is rebased on
> the bluetooth-next tree containing its dependencies.
> 
> Rob
> 
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git ti-bluetooth
> 
> Rob Herring (4):
>  dt-bindings: net: Add TI WiLink shared transport binding
>  bluetooth: hci_uart: remove unused hci_uart_init_tty
>  bluetooth: hci_uart: add LL protocol serdev driver support
>  arm64: dts: hikey: add WL1835 Bluetooth device node
> 
> .../devicetree/bindings/net/ti,wilink-st.txt   |  35 +++
> arch/arm64/boot/dts/hisilicon/hi6220-hikey.dts |   5 +
> drivers/bluetooth/hci_ldisc.c  |  19 --
> drivers/bluetooth/hci_ll.c | 262 -
> drivers/bluetooth/hci_uart.h   |   1 -
> 5 files changed, 301 insertions(+), 21 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/net/ti,wilink-st.txt

all 4 patches have been applied to bluetooth-next tree.

Regards

Marcel

Re: [PATCH net-next v3 1/1] net: ipv4: Refine the ipv4_default_advmss

2017-04-13 Thread David Miller

From: gfree.w...@foxmail.com
Date: Wed, 12 Apr 2017 12:34:03 +0800

> From: Gao Feng 
> 
> 1. Don't get the RTAX_ADVMSS metric of dst.
> There are two reasons:
> 1) Its caller dst_metric_advmss has already invoked dst_metric_raw
> before invoking default_advmss.
> 2) ipv4_default_advmss is used to get the default MSS; like
> ip6_default_advmss, it should not try to get the metric.
> 
> 2. Use sizeof(tcphdr)+sizeof(iphdr) instead of the literal 40.
> 
> 3. Define a new macro IPV4_MAX_PMTU to replace the literal 65535, according
> to RFC 2675, section 5.1.
> 
> Signed-off-by: Gao Feng 
> ---
>  v3: Simplify the codes again, per Joe
>  v2: Use min instead of unnecessary min_t, per Joe
>  v1: initial version

Applied, thanks.
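As the commit message describes it, the refined ipv4_default_advmss derives the default advertised MSS from the device MTU minus the TCP and IPv4 header sizes, capped at IPV4_MAX_PMTU minus those same headers; the existing code also keeps a floor (ip_rt_min_advmss). A user-space sketch of that computation, under stated assumptions: 20-byte option-less headers, a 256-byte floor matching the default of the net.ipv4.route.min_adv_mss sysctl, and hypothetical sim_* names.

```c
#include <assert.h>

/* Assumed values: 20-byte TCP and IPv4 headers without options, the
 * default 256-byte min_adv_mss floor, and IPV4_MAX_PMTU = 65535. */
#define SIM_TCP_HDR_LEN   20u
#define SIM_IPV4_HDR_LEN  20u
#define SIM_MIN_ADVMSS    256u
#define SIM_IPV4_MAX_PMTU 65535u

/* Default advertised MSS for a route over a device with this MTU:
 * MTU minus headers, no lower than the floor, no higher than the
 * largest IPv4 packet minus headers. */
static unsigned int sim_default_advmss(unsigned int mtu)
{
	const unsigned int header_size = SIM_TCP_HDR_LEN + SIM_IPV4_HDR_LEN;
	unsigned int advmss;

	assert(mtu >= header_size);	/* keep the subtraction from wrapping */
	advmss = mtu - header_size;
	if (advmss < SIM_MIN_ADVMSS)
		advmss = SIM_MIN_ADVMSS;
	if (advmss > SIM_IPV4_MAX_PMTU - header_size)
		advmss = SIM_IPV4_MAX_PMTU - header_size;
	return advmss;
}
```

For a standard 1500-byte Ethernet MTU this yields 1460; very large MTUs are clamped to 65495.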

Re: [PATCH net-next v2 0/8] rtnetlink: Cleanup user notifications for netdev events

2017-04-13 Thread David Miller

From: David Ahern 
Date: Tue, 11 Apr 2017 17:02:39 -0700

> Vlad's recent patch to add the event type to rtnetlink notifications
> points out a number of redundant or unnecessary notifications sent to
> userspace for events that are essentially internal to the kernel. Trim
> the list to put a dent in the notification storm.
> 
> v2
> - rebased to top of net-next with IFLA_EVENT patch reverted
> - dropped removal of NETDEV_CHANGEINFODATA since it intentionally
>   only sends a message to userspace
> - dropped NOTIFY_PEERS since Vlad says it is needed for macvlans
> - add patches to remove NETDEV_CHANGEUPPER and NETDEV_CHANGE_TX_QUEUE_LEN
>   from the event list

Series applied, thanks David.

[PATCH v2] cfg80211: Fix array-bounds warning in fragment copy

2017-04-13 Thread Matthias Kaehlcke

__ieee80211_amsdu_copy_frag intentionally initializes a pointer to
array[-1] to increment it later to valid values. clang rightfully
generates an array-bounds warning on the initialization statement.

Initialize the pointer to array[0] and change the algorithm from
increment before to increment after consume.

Signed-off-by: Matthias Kaehlcke 
---
Note: Resent to include linux-wireless in cc

 net/wireless/util.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 68e5f2ecee1a..52795ae5337f 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -659,7 +659,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
int offset, int len)
 {
struct skb_shared_info *sh = skb_shinfo(skb);
-   const skb_frag_t *frag = &sh->frags[-1];
+   const skb_frag_t *frag = &sh->frags[0];
struct page *frag_page;
void *frag_ptr;
int frag_len, frag_size;
@@ -672,10 +672,10 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
 
while (offset >= frag_size) {
offset -= frag_size;
-   frag++;
frag_page = skb_frag_page(frag);
frag_ptr = skb_frag_address(frag);
frag_size = skb_frag_size(frag);
+   frag++;
}
 
frag_ptr += offset;
@@ -687,12 +687,12 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame,
len -= cur_len;
 
while (len > 0) {
-   frag++;
frag_len = skb_frag_size(frag);
cur_len = min(len, frag_len);
__frame_add_frag(frame, skb_frag_page(frag),
 skb_frag_address(frag), cur_len, frag_len);
len -= cur_len;
+   frag++;
}
 }
 
-- 
2.12.2.715.g7642488e1d-goog
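The reworked loops keep the fragment cursor pointing at the next unconsumed fragment and advance it only after that fragment has been used, so &sh->frags[-1] is never formed. A user-space sketch of the same increment-after-consume iteration over a linear head plus an array of fragment sizes, using an index instead of a pointer; sim_split_range and struct sim_piece are hypothetical and only record which byte ranges would be attached to the frame.

```c
#include <assert.h>

#define SIM_HEAD (-1)	/* marks the linear head in the output */

struct sim_piece {
	int seg;	/* SIM_HEAD for the head, else fragment index */
	int off;	/* offset inside that segment */
	int len;	/* bytes taken from it */
};

/*
 * Split [offset, offset + len) across a buffer made of a linear head of
 * head_size bytes followed by nfrags fragments. 'next' always indexes
 * the next unconsumed fragment and starts at 0, so no index or pointer
 * before the array is ever formed; it is incremented only after a
 * fragment has been consumed. Returns the number of pieces in out.
 */
static int sim_split_range(int head_size, const int *frag_sizes, int nfrags,
			   int offset, int len, struct sim_piece *out)
{
	int seg = SIM_HEAD, seg_size = head_size, next = 0, n = 0, cur;

	while (offset >= seg_size) {	/* skip whole segments */
		assert(next < nfrags);
		offset -= seg_size;
		seg = next;
		seg_size = frag_sizes[next];
		next++;
	}

	cur = seg_size - offset < len ? seg_size - offset : len;
	out[n++] = (struct sim_piece){ seg, offset, cur };
	len -= cur;

	while (len > 0) {		/* later fragments, from their start */
		assert(next < nfrags);
		cur = frag_sizes[next] < len ? frag_sizes[next] : len;
		out[n++] = (struct sim_piece){ next, 0, cur };
		len -= cur;
		next++;
	}
	return n;
}
```

For a 10-byte head and fragments of 20 and 30 bytes, a 30-byte range starting at offset 5 splits into 5 bytes of head, all of fragment 0, and the first 5 bytes of fragment 1.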

Re: [PATCH] tools: bpf_jit_disasm: Add option to dump JIT image to a file.

2017-04-13 Thread David Miller

From: David Daney 
Date: Tue, 11 Apr 2017 14:30:52 -0700

> When debugging the JIT on an embedded platform or cross build
> environment, libbfd may not be available, making it impossible to run
> bpf_jit_disasm natively.
> 
> Add an option to emit a binary image of the JIT code to a file.  This
> file can then be disassembled off line.  Typical usage in this case
> might be (pasting mips64 dmesg output to cat command):
> 
>$ cat > jit.raw
>$ bpf_jit_disasm -f jit.raw -O jit.bin
>$ mips64-linux-gnu-objdump -D -b binary -m mips:isa64r2 -EB jit.bin
> 
> Signed-off-by: David Daney 

Applied, thanks.

[PATCH net] net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule

2017-04-13 Thread David Ahern

Only 1 l3mdev FIB rule is needed. Fix setting NLM_F_EXCL in the nlmsghdr:
the flag must be ORed into nlmsg_flags, not ANDed.

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 22379da63400..6a6e7f2fee29 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -1125,7 +1125,7 @@ static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it)
goto nla_put_failure;
 
/* rule only needs to appear once */
-   nlh->nlmsg_flags &= NLM_F_EXCL;
+   nlh->nlmsg_flags |= NLM_F_EXCL;
 
frh = nlmsg_data(nlh);
memset(frh, 0, sizeof(*frh));
-- 
2.11.0 (Apple Git-81)
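The one-character fix matters because `&=` with a single flag can only clear bits: it drops every other flag already in nlmsg_flags and leaves NLM_F_EXCL set only if it was already there, while `|=` adds it and keeps the rest. A minimal sketch; the SIM_ constants are local copies of the values in <linux/netlink.h> so it builds without kernel headers, and the input flag combination is only an example, not what vrf_fib_rule() necessarily sets.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the flag values from <linux/netlink.h>. */
#define SIM_NLM_F_REQUEST 0x001
#define SIM_NLM_F_ACK     0x004
#define SIM_NLM_F_EXCL    0x200
#define SIM_NLM_F_CREATE  0x400

/* The buggy statement: AND keeps only bits present in both operands,
 * so it clears every other flag and cannot add EXCL. */
static uint16_t sim_buggy(uint16_t flags)
{
	return flags & SIM_NLM_F_EXCL;
}

/* The fixed statement: OR adds EXCL and preserves the other flags. */
static uint16_t sim_fixed(uint16_t flags)
{
	return flags | SIM_NLM_F_EXCL;
}
```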

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-13 Thread David Miller

From: Florian Fainelli 
Date: Mon, 10 Apr 2017 14:42:58 -0700

> A MDIO bus driver can set phy_mask to indicate which PHYs should be
> probed and which should not. Right now, of_mdiobus_register() always
> sets mdio->phy_mask to ~0 which means: don't probe anything yourself,
> and let the Device Tree scanning do it based on the availability of
> child nodes.
> 
> When MDIO buses are stacked together (on purpose, as is done by DSA), we
> run into possible double probing which is, at best, unnecessary and, at
> worst, can cause problems if that's not expected (e.g. during probe
> deferral).
> 
> Fix this by remembering the original mdio->phy_mask, and making sure that
> if it was set to all ones (~0), we set it to zero internally in order not to
> influence how the child PHY/MDIO device registration is going to behave.
> When the original mdio->phy_mask is set to something non-zero, we honor
> this value and utilize it as a hint to register only the child nodes
> that we have both found, and indicated to be necessary.
> 
> Signed-off-by: Florian Fainelli 

I don't think it's valid to have a unique OF node appear twice in the
device tree hierarchy.

Even if you can somehow hack this situation into working, you are
asking for all kinds of problems in the long run by doing things that
way.

If you have to, instantiate a new dummy device (perhaps a
platform_device, which thus can have private attributes you can store
in a structure whose layout you control) to act as the placeholder for
operation interception and property duplication.

Re: [PATCH net-next] net: ipv6: send unsolicited NA on admin up

2017-04-13 Thread David Ahern

On 4/13/17 5:45 AM, Hannes Frederic Sowa wrote:
> 
> 
> On Wed, Apr 12, 2017, at 20:49, David Ahern wrote:
>> ndisc_notify is the ipv6 equivalent to arp_notify. When arp_notify is
>> set to 1, gratuitous arp requests are sent when the device is brought up.
>> The same is expected when ndisc_notify is set to 1 (per ndisc_notify in
>> Documentation/networking/ip-sysctl.txt). The NA is not sent on NETDEV_UP
>> event; add it.
>>
>> Fixes: 5cb04436eef6 ("ipv6: add knob to send unsolicited ND on link-layer
>> address change")
>> Signed-off-by: David Ahern 
> 
> Acked-by: Hannes Frederic Sowa 
> 
> In future we might be able to make this a bit more robust when DAD is
> happening at the same time.


agreed.

Re: [PATCH net-next] net: stmmac: set total length of the packet to be transmitted in TDES3

2017-04-13 Thread David Miller

From: Niklas Cassel 
Date: Mon, 10 Apr 2017 20:33:29 +0200

> From: Niklas Cassel 
> 
> Field FL/TPL in register TDES3 is not correctly set on GMAC4.
> TX appears to be functional on GMAC 4.10a even if this field is not set,
> however, to avoid relying on undefined behavior, set the length in TDES3.
> 
> The field has a different meaning depending on whether the TSE bit in TDES3
> is set or not (TSO). However, regardless of the TSE bit, the field is
> not optional. The field is already set correctly when the TSE bit is set.
> 
> Since there is no limit for the number of descriptors that can be
> used for a single packet, the field should be set to the sum of
> the buffers contained in:
> [ ...  ...
> ], which should be equal to skb->len.
> 
> Signed-off-by: Niklas Cassel 

Applied, thanks.

Re: [PATCH net-next] cxgb4: save tid while creating server filter

2017-04-13 Thread David Miller

From: Ganesh Goudar 
Date: Mon, 10 Apr 2017 21:26:18 +0530

> Save the filter tid while creating the server filter, which is used
> later to retrieve the corresponding filter instance while handling
> the filter reply.
> 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH 1/1] drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201

2017-04-13 Thread David Miller

From: Daniele Palmas 
Date: Mon, 10 Apr 2017 17:34:23 +0200

> Telit LE920A4 uses the same PID 0x1201 as LE920, but the modem
> implementation is different, since it requires DTR to be set to
> answer QMI messages.
> 
> This patch replaces QMI_FIXED_INTF with QMI_QUIRK_SET_DTR: tests on
> LE920 have been performed in order to verify backward compatibility.
> 
> Signed-off-by: Daniele Palmas 

Applied, thank you.

Re: IGMP on IPv6

2017-04-13 Thread Murali Karicheri

On 03/22/2017 11:04 AM, Murali Karicheri wrote:
> Hi Liu,
> 
> I saw that you have sent patches to the list for IGMP, and I have a
> question on IGMP on IPv6. Hope you can clarify. I have already posted the
> question to the list; it is reproduced below. Let me know if you have an
> answer.
> 
> = See email with subject "IPv6 IGMP issue in v4.4.44 ??"
> 
> Cut-n-paste from that email
> 
> I see an issue with IGMP for IPv6 when I test an HSR redundancy network
> interface. As soon as I set up an HSR interface, I see some IGMP messages
> (destination MAC address 33 33 00 00 00 02) going over the HSR interface
> to the slave interfaces at the egress, whereas for IPv6 I see similar
> messages going directly over the Ethernet interfaces that are attached to
> the HSR master. It appears that NETDEV_CHANGEUPPER is not handled properly
> and the mcast snoop sends the packets over the old interfaces at timer
> expiry.
> 
> A dump of the message at the slave Ethernet interface looks like below.
> 
> IPv4
> 
> [   64.643842] 33 33 00 00 00 02 70 ff 76 1c 0f 8d 89 2f 10 3e fc 
> [   64.649910] 18 86 dd 60 00 00 00 00 10 3a ff fe 80 00 00 00 
> [   64.655705] 00 00 00 72 ff 76 ff fe 1c 0f 8d ff 02 00 00 00 
> [   64.661503] 00 00 00 00 00 00 00 00 00 00 02 85 00 8d dc 
> 
> 
> You can see this is tagged with HSR.
> 
> IPv6
> 
> [   65.559130] 33 33 00 00 00 02 70 ff 76 1c 0f 8d 86 dd 60 00 00 
> [   65.565205] 00 00 10 3a ff fe 80 00 00 00 00 00 00 72 ff 76 
> [   65.571011] ff fe 1c 0f 8d ff 02 00 00 00 00 00 00 00 00 00 
> [   65.576806] 00 00 00 00 02 85 00 8d dc 00 00 00 00 01 01 
> 
> This is going directly to the slave Ethernet interface.
> 
> Using a WARN_ONCE, I found that this comes directly from
> mld_ifc_timer_expire() -> mld_sendpack() -> ip6_output()
> 
> Do you think this is fixed in the latest kernel (master)? If so, could
> you point me to the relevant commits?
> 
> 
Ping... I see the same behavior on the v4.9.x kernel. Any clue whether
this is fixed by some commit, or do I need to debug it? I see IGMPv6 has
some fixes on the list to make it similar to IGMPv4. So can someone clarify
whether this is a bug in the IGMPv6 code or whether I need to look into the
HSR driver code? Since IGMPv4 is going over the HSR interface, I am assuming
this is a bug in the IGMPv6 code. But since I have no experience with this
code, can some expert comment please?

Murali

-- 
Murali Karicheri
Linux Kernel, Keystone
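To compare the two frames, the hex dumps can be decoded field by field. The sketch below assumes Ethernet II framing, a 6-byte HSR tag (EtherType 0x892F, a path/LSDU-size word and a sequence number) in front of the encapsulated EtherType, and a plain IPv6 header without extension headers; all sim_* names are hypothetical. Decoded this way, both dumps carry EtherType 0x86DD (IPv6) and ICMPv6 type 133 (Router Solicitation) to 33:33:00:00:00:02, and only the first one carries an HSR tag, which may help pin down which transmit path each frame took.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_ETH_P_HSR      0x892F	/* HSR EtherType */
#define SIM_ETH_P_IPV6     0x86DD	/* IPv6 EtherType */
#define SIM_IPPROTO_ICMPV6 58

struct sim_frame_info {
	int hsr_tagged;		/* 1 if an HSR tag follows the MACs */
	uint16_t ethertype;	/* EtherType of the carried payload */
	int icmp6_type;		/* ICMPv6 type, or -1 if not ICMPv6 */
};

static uint16_t sim_be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

/* Decode Ethernet II, an optional 6-byte HSR tag and a fixed IPv6
 * header (no extension headers). Returns 0, or -1 if truncated. */
static int sim_parse_frame(const uint8_t *f, size_t len,
			   struct sim_frame_info *info)
{
	size_t off = 12;	/* skip destination and source MAC */

	if (len < off + 2)
		return -1;
	info->hsr_tagged = sim_be16(f + off) == SIM_ETH_P_HSR;
	if (info->hsr_tagged)
		off += 6;	/* HSR EtherType, path/LSDU size, sequence */
	if (len < off + 2)
		return -1;
	info->ethertype = sim_be16(f + off);
	off += 2;
	info->icmp6_type = -1;
	if (info->ethertype == SIM_ETH_P_IPV6) {
		if (len < off + 41)
			return -1;
		if (f[off + 6] == SIM_IPPROTO_ICMPV6)	/* next header */
			info->icmp6_type = f[off + 40];
	}
	return 0;
}

/* The 64 bytes of the first dump quoted above (labelled "IPv4"). */
static const uint8_t sim_dump_first[] = {
	0x33,0x33,0x00,0x00,0x00,0x02,0x70,0xff,0x76,0x1c,0x0f,0x8d,0x89,0x2f,0x10,0x3e,
	0xfc,0x18,0x86,0xdd,0x60,0x00,0x00,0x00,0x00,0x10,0x3a,0xff,0xfe,0x80,0x00,0x00,
	0x00,0x00,0x00,0x00,0x72,0xff,0x76,0xff,0xfe,0x1c,0x0f,0x8d,0xff,0x02,0x00,0x00,
	0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x85,0x00,0x8d,0xdc,
};

/* The 64 bytes of the second dump quoted above (labelled "IPv6"). */
static const uint8_t sim_dump_second[] = {
	0x33,0x33,0x00,0x00,0x00,0x02,0x70,0xff,0x76,0x1c,0x0f,0x8d,0x86,0xdd,0x60,0x00,
	0x00,0x00,0x00,0x10,0x3a,0xff,0xfe,0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x72,0xff,
	0x76,0xff,0xfe,0x1c,0x0f,0x8d,0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
	0x00,0x00,0x00,0x00,0x00,0x02,0x85,0x00,0x8d,0xdc,0x00,0x00,0x00,0x00,0x01,0x01,
};
```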

[PATCH v4 2/3] VSOCK: Add vsockmon device

2017-04-13 Thread Stefan Hajnoczi

From: Gerard Garcia 

Add vsockmon virtual network device that receives packets from the vsock
transports and exposes them to user space.

Based on the nlmon device.

Signed-off-by: Gerard Garcia 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Add explicit reserved padding field to struct af_vsockmon_hdr and
   drop __attribute__((packed)) [Michael, DaveM]
v3:
 * Fix DEFAULT_MTU macro definition [Zhu Yanjun]
 * Rename af_vsockmon_hdr->t field ->transport for clarity
 * Update .ndo_get_stats64() return type since it has changed
---
 drivers/net/Makefile  |   1 +
 include/uapi/linux/vsockmon.h |  58 +++
 drivers/net/vsockmon.c| 167 ++
 drivers/net/Kconfig   |   8 ++
 include/uapi/linux/Kbuild |   1 +
 5 files changed, 235 insertions(+)
 create mode 100644 include/uapi/linux/vsockmon.h
 create mode 100644 drivers/net/vsockmon.c

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 98ed4d9..2d54930 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_GENEVE) += geneve.o
 obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
+obj-$(CONFIG_VSOCKMON) += vsockmon.o
 
 #
 # Networking Drivers
diff --git a/include/uapi/linux/vsockmon.h b/include/uapi/linux/vsockmon.h
new file mode 100644
index 0000000..5fce3991
--- /dev/null
+++ b/include/uapi/linux/vsockmon.h
@@ -0,0 +1,58 @@
+#ifndef _UAPI_VSOCKMON_H
+#define _UAPI_VSOCKMON_H
+
+#include 
+
+/*
+ * vsockmon is the AF_VSOCK packet capture device.  Packets captured have the
+ * following layout:
+ *
+ *   +---+
+ *   |   vsockmon header |
+ *   |  (struct af_vsockmon_hdr) |
+ *   +---+
+ *   |  transport header |
+ *   | (af_vsockmon_hdr->len bytes long) |
+ *   +---+
+ *   |  payload  |
+ *   |   (until end of packet)   |
+ *   +---+
+ *
+ * The vsockmon header is a transport-independent description of the packet.
+ * It duplicates some of the information from the transport header so that
+ * no transport-specific knowledge is necessary to process packets.
+ *
+ * The transport header is useful for low-level transport-specific packet
+ * analysis.  Transport type is given in af_vsockmon_hdr->transport and
+ * transport header length is given in af_vsockmon_hdr->len.
+ *
+ * If af_vsockmon_hdr->op is AF_VSOCK_OP_PAYLOAD then the payload follows the
+ * transport header.  Other ops do not have a payload.
+ */
+
+struct af_vsockmon_hdr {
+   __le64 src_cid;
+   __le64 dst_cid;
+   __le32 src_port;
+   __le32 dst_port;
+   __le16 op;  /* enum af_vsockmon_op */
+   __le16 transport;   /* enum af_vsockmon_transport */
+   __le16 len; /* Transport header length */
+   __u8 reserved[2];
+};
+
+enum af_vsockmon_op {
+   AF_VSOCK_OP_UNKNOWN = 0,
+   AF_VSOCK_OP_CONNECT = 1,
+   AF_VSOCK_OP_DISCONNECT = 2,
+   AF_VSOCK_OP_CONTROL = 3,
+   AF_VSOCK_OP_PAYLOAD = 4,
+};
+
+enum af_vsockmon_transport {
+   AF_VSOCK_TRANSPORT_UNKNOWN = 0,
+   AF_VSOCK_TRANSPORT_NO_INFO = 1, /* No transport information */
+   AF_VSOCK_TRANSPORT_VIRTIO = 2,  /* Virtio transport header (struct virtio_vsock_hdr) */
+};
+
+#endif
diff --git a/drivers/net/vsockmon.c b/drivers/net/vsockmon.c
new file mode 100644
index 000..0bff1e9
--- /dev/null
+++ b/drivers/net/vsockmon.c
@@ -0,0 +1,167 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Virtio transport max packet size plus header */
+#define DEFAULT_MTU (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + \
+sizeof(struct af_vsockmon_hdr))
+
+struct pcpu_lstats {
+   u64 rx_packets;
+   u64 rx_bytes;
+   struct u64_stats_sync syncp;
+};
+
+static int vsockmon_dev_init(struct net_device *dev)
+{
+   dev->lstats = netdev_alloc_pcpu_stats(struct pcpu_lstats);
+   return dev->lstats == NULL ? -ENOMEM : 0;
+}
+
+static void vsockmon_dev_uninit(struct net_device *dev)
+{
+   free_percpu(dev->lstats);
+}
+
+struct vsockmon {
+   struct vsock_tap vt;
+};
+
+static int vsockmon_open(struct net_device *dev)
+{
+   struct vsockmon *vsockmon = netdev_priv(dev);
+
+   vsockmon->vt.dev = dev;
+   vsockmon->vt.module = THIS_MODULE;
+   return vsock_add_tap(&vsockmon->vt);
+}
+
+static int vsockmon_close(struct net_device *dev) {
+   struct vsockmon *vsockmon = netdev_priv(dev);
+
+   return vsock_remove_tap(&vsockmon->vt);
+}
+
+static netdev_tx_t vsockmon_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+   int len = skb->len;
+   struct pcpu_lstats *stats = this_cpu_ptr(dev->lstats);
+
+   u64_stats_update_begin(&stats->syncp);
+

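The packet layout described in include/uapi/linux/vsockmon.h can be decoded from userspace without any transport knowledge. The following is a minimal sketch, assuming a raw capture buffer that starts at struct af_vsockmon_hdr (for example one frame read from vsockmon0); `struct vsockmon_frame`, `le_load()` and `vsockmon_parse()` are hypothetical names, not part of the patch. It hard-codes the 32-byte header offsets implied by the struct (8+8+4+4+2+2+2+2 bytes including the explicit reserved padding) and reads every field as little-endian, as the `__le*` types require.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of struct af_vsockmon_hdr: 8+8+4+4+2+2+2+2 bytes, no implicit padding. */
enum { VSOCKMON_HDR_SIZE = 32 };

/* Hypothetical decoded view of one captured vsockmon frame. */
struct vsockmon_frame {
	uint64_t src_cid, dst_cid;
	uint32_t src_port, dst_port;
	uint16_t op, transport, t_hdr_len;
	const uint8_t *t_hdr;    /* transport header, t_hdr_len bytes */
	const uint8_t *payload;  /* only meaningful for AF_VSOCK_OP_PAYLOAD */
	size_t payload_len;
};

/* Little-endian load that works regardless of host byte order. */
static uint64_t le_load(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = nbytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 if the buffer is shorter than the declared layout. */
static int vsockmon_parse(const uint8_t *buf, size_t len,
			  struct vsockmon_frame *f)
{
	if (len < VSOCKMON_HDR_SIZE)
		return -1;
	f->src_cid   = le_load(buf + 0, 8);
	f->dst_cid   = le_load(buf + 8, 8);
	f->src_port  = (uint32_t)le_load(buf + 16, 4);
	f->dst_port  = (uint32_t)le_load(buf + 20, 4);
	f->op        = (uint16_t)le_load(buf + 24, 2);
	f->transport = (uint16_t)le_load(buf + 26, 2);
	f->t_hdr_len = (uint16_t)le_load(buf + 28, 2);
	/* bytes 30..31 are the reserved padding and are ignored */
	if (len - VSOCKMON_HDR_SIZE < f->t_hdr_len)
		return -1;
	f->t_hdr = buf + VSOCKMON_HDR_SIZE;
	f->payload = f->t_hdr + f->t_hdr_len;
	f->payload_len = len - VSOCKMON_HDR_SIZE - f->t_hdr_len;
	return 0;
}
```

Transport-specific decoding would then dispatch on `f.transport`; for AF_VSOCK_TRANSPORT_VIRTIO, `f.t_hdr` points at a struct virtio_vsock_hdr.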
[PATCH v4 3/3] VSOCK: Add virtio vsock vsockmon hooks

2017-04-13 Thread Stefan Hajnoczi

From: Gerard Garcia 

The virtio drivers deal with struct virtio_vsock_pkt.  Add
virtio_transport_deliver_tap_pkt(pkt) for handing packets to the
vsockmon device.

We call virtio_transport_deliver_tap_pkt(pkt) from
net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of
common code.  This is because the drivers may drop packets before
handing them to common code - we still want to capture them.

Signed-off-by: Gerard Garcia 
Signed-off-by: Stefan Hajnoczi 
---
v3:
 * Hook virtio_transport.c (guest driver), not just
   drivers/vhost/vsock.c (host driver)
---
 include/linux/virtio_vsock.h|  1 +
 drivers/vhost/vsock.c   |  8 +
 net/vmw_vsock/virtio_transport.c|  3 ++
 net/vmw_vsock/virtio_transport_common.c | 58 +
 4 files changed, 70 insertions(+)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 584f9a6..ab13f07 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -153,5 +153,6 @@ void virtio_transport_free_pkt(struct virtio_vsock_pkt *pkt);
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct virtio_vsock_pkt *pkt);
 u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
 void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
+void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt);
 
 #endif /* _LINUX_VIRTIO_VSOCK_H */
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 44eed8e..d939ac1 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -176,6 +176,11 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
restart_tx = true;
}
 
+   /* Deliver to monitoring devices all correctly transmitted
+* packets.
+*/
+   virtio_transport_deliver_tap_pkt(pkt);
+
virtio_transport_free_pkt(pkt);
}
if (added)
@@ -383,6 +388,9 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 
len = pkt->len;
 
+   /* Deliver to monitoring devices all received packets */
+   virtio_transport_deliver_tap_pkt(pkt);
+
/* Only accept correctly addressed packets */
if (le64_to_cpu(pkt->hdr.src_cid) == vsock->guest_cid)
virtio_transport_recv_pkt(pkt);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 68675a1..9dffe02 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -144,6 +144,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
list_del_init(&pkt->list);
spin_unlock_bh(&vsock->send_pkt_list_lock);
 
+   virtio_transport_deliver_tap_pkt(pkt);
+
reply = pkt->reply;
 
sg_init_one(&hdr, &pkt->hdr, sizeof(pkt->hdr));
@@ -370,6 +372,7 @@ static void virtio_transport_rx_work(struct work_struct *work)
}
 
pkt->len = len - sizeof(pkt->hdr);
+   virtio_transport_deliver_tap_pkt(pkt);
virtio_transport_recv_pkt(pkt);
}
} while (!virtqueue_enable_cb(vq));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index af087b4..aae60c1 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -85,6 +86,63 @@ virtio_transport_alloc_pkt(struct virtio_vsock_pkt_info *info,
return NULL;
 }
 
+/* Packet capture */
+void virtio_transport_deliver_tap_pkt(struct virtio_vsock_pkt *pkt)
+{
+   struct sk_buff *skb;
+   struct af_vsockmon_hdr *hdr;
+   unsigned char *t_hdr, *payload;
+
+   skb = alloc_skb(sizeof(*hdr) + sizeof(pkt->hdr) + pkt->len,
+   GFP_ATOMIC);
+   if (!skb)
+   return; /* nevermind if we cannot capture the packet */
+
+   hdr = (struct af_vsockmon_hdr *)skb_put(skb, sizeof(*hdr));
+
+   /* pkt->hdr is little-endian so no need to byteswap here */
+   hdr->src_cid = pkt->hdr.src_cid;
+   hdr->src_port = pkt->hdr.src_port;
+   hdr->dst_cid = pkt->hdr.dst_cid;
+   hdr->dst_port = pkt->hdr.dst_port;
+
+   hdr->transport = cpu_to_le16(AF_VSOCK_TRANSPORT_VIRTIO);
+   hdr->len = cpu_to_le16(sizeof(pkt->hdr));
+   hdr->reserved[0] = hdr->reserved[1] = 0;
+
+   switch (le16_to_cpu(pkt->hdr.op)) {
+   case VIRTIO_VSOCK_OP_REQUEST:
+   case VIRTIO_VSOCK_OP_RESPONSE:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_CONNECT);
+   break;
+   case VIRTIO_VSOCK_OP_RST:
+   case VIRTIO_VSOCK_OP_SHUTDOWN:
+   hdr->op = cpu_to_le16(AF_VSOCK_OP_DISCONNECT);

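virtio_transport_deliver_tap_pkt() translates each virtio op into a transport-independent af_vsockmon_op. The archive cuts the switch off after the disconnect case, so the sketch below completes it under an assumption: VIRTIO_VSOCK_OP_RW is treated as payload and the two credit ops as control traffic, which matches the op names in vsockmon.h but is not confirmed by the visible patch text. The helper name `vsockmon_op_from_virtio()` is hypothetical, and it expects the op in host byte order, i.e. after le16_to_cpu() on the wire value.

```c
#include <stdint.h>

/* Values of enum virtio_vsock_op from the virtio-vsock uapi header. */
enum virtio_vsock_op {
	VIRTIO_VSOCK_OP_INVALID = 0,
	VIRTIO_VSOCK_OP_REQUEST = 1,
	VIRTIO_VSOCK_OP_RESPONSE = 2,
	VIRTIO_VSOCK_OP_RST = 3,
	VIRTIO_VSOCK_OP_SHUTDOWN = 4,
	VIRTIO_VSOCK_OP_RW = 5,
	VIRTIO_VSOCK_OP_CREDIT_UPDATE = 6,
	VIRTIO_VSOCK_OP_CREDIT_REQUEST = 7,
};

/* Values of enum af_vsockmon_op from include/uapi/linux/vsockmon.h. */
enum af_vsockmon_op {
	AF_VSOCK_OP_UNKNOWN = 0,
	AF_VSOCK_OP_CONNECT = 1,
	AF_VSOCK_OP_DISCONNECT = 2,
	AF_VSOCK_OP_CONTROL = 3,
	AF_VSOCK_OP_PAYLOAD = 4,
};

/* Map a host-order virtio op to a vsockmon op. The RW and credit cases are
 * an assumption: the archived patch is truncated after the disconnect case. */
static uint16_t vsockmon_op_from_virtio(uint16_t virtio_op)
{
	switch (virtio_op) {
	case VIRTIO_VSOCK_OP_REQUEST:
	case VIRTIO_VSOCK_OP_RESPONSE:
		return AF_VSOCK_OP_CONNECT;
	case VIRTIO_VSOCK_OP_RST:
	case VIRTIO_VSOCK_OP_SHUTDOWN:
		return AF_VSOCK_OP_DISCONNECT;
	case VIRTIO_VSOCK_OP_RW:
		return AF_VSOCK_OP_PAYLOAD;   /* assumed */
	case VIRTIO_VSOCK_OP_CREDIT_UPDATE:
	case VIRTIO_VSOCK_OP_CREDIT_REQUEST:
		return AF_VSOCK_OP_CONTROL;   /* assumed */
	default:
		return AF_VSOCK_OP_UNKNOWN;
	}
}
```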
[PATCH v4 1/3] VSOCK: Add vsockmon tap functions

2017-04-13 Thread Stefan Hajnoczi

From: Gerard Garcia 

Add tap functions that can be used by the vsock transports to
deliver packets to vsockmon virtual network devices.

Signed-off-by: Gerard Garcia 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Call synchronize_net() before module_put() [Michael]
v3:
 * Include missing  header in af_vsock_tap.c
---
 net/vmw_vsock/Makefile   |   2 +-
 include/net/af_vsock.h   |  13 ++
 include/uapi/linux/if_arp.h  |   1 +
 net/vmw_vsock/af_vsock_tap.c | 107 +++
 4 files changed, 122 insertions(+), 1 deletion(-)
 create mode 100644 net/vmw_vsock/af_vsock_tap.c

diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index bc27c70..09fc2eb 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
 
-vsock-y += af_vsock.o vsock_addr.o
+vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
 vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
vmci_transport_notify_qstate.o
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index f32ed9a..c526d4f 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -188,4 +188,17 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
 
+/**** TAP ****/
+
+struct vsock_tap {
+   struct net_device *dev;
+   struct module *module;
+   struct list_head list;
+};
+
+int vsock_init_tap(void);
+int vsock_add_tap(struct vsock_tap *vt);
+int vsock_remove_tap(struct vsock_tap *vt);
+void vsock_deliver_tap(struct sk_buff *skb);
+
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h
index 4d024d7..cf73510 100644
--- a/include/uapi/linux/if_arp.h
+++ b/include/uapi/linux/if_arp.h
@@ -95,6 +95,7 @@
 #define ARPHRD_IP6GRE  823 /* GRE over IPv6*/
 #define ARPHRD_NETLINK 824 /* Netlink header   */
 #define ARPHRD_6LOWPAN 825 /* IPv6 over LoWPAN */
+#define ARPHRD_VSOCKMON 826 /* Vsock monitor header */
 
 #define ARPHRD_VOID  0xFFFF /* Void type, nothing is known */
 #define ARPHRD_NONE  0xFFFE /* zero header length */
diff --git a/net/vmw_vsock/af_vsock_tap.c b/net/vmw_vsock/af_vsock_tap.c
new file mode 100644
index 000..db0c4e7
--- /dev/null
+++ b/net/vmw_vsock/af_vsock_tap.c
@@ -0,0 +1,107 @@
+/*
+ * Tap functions for AF_VSOCK sockets.
+ *
+ * Code based on net/netlink/af_netlink.c tap functions.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+static DEFINE_SPINLOCK(vsock_tap_lock);
+static struct list_head vsock_tap_all __read_mostly =
+   LIST_HEAD_INIT(vsock_tap_all);
+
+int vsock_add_tap(struct vsock_tap *vt) {
+   if (unlikely(vt->dev->type != ARPHRD_VSOCKMON))
+   return -EINVAL;
+
+   __module_get(vt->module);
+
+   spin_lock(&vsock_tap_lock);
+   list_add_rcu(&vt->list, &vsock_tap_all);
+   spin_unlock(&vsock_tap_lock);
+
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_add_tap);
+
+int vsock_remove_tap(struct vsock_tap *vt)
+{
+   struct vsock_tap *tmp;
+   bool found = false;
+
+   spin_lock(&vsock_tap_lock);
+
+   list_for_each_entry(tmp, &vsock_tap_all, list) {
+   if (vt == tmp) {
+   list_del_rcu(&vt->list);
+   found = true;
+   goto out;
+   }
+   }
+
+   pr_warn("vsock_remove_tap: %p not found\n", vt);
+out:
+   spin_unlock(&vsock_tap_lock);
+
+   synchronize_net();
+
+   if (found)
+   module_put(vt->module);
+
+   return found ? 0 : -ENODEV;
+}
+EXPORT_SYMBOL_GPL(vsock_remove_tap);
+
+static int __vsock_deliver_tap_skb(struct sk_buff *skb,
+struct net_device *dev)
+{
+   int ret = 0;
+   struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+
+   if (nskb) {
+   dev_hold(dev);
+
+   nskb->dev = dev;
+   ret = dev_queue_xmit(nskb);
+   if (unlikely(ret > 0))
+   ret = net_xmit_errno(ret);
+
+   dev_put(dev);
+   }
+
+   return ret;
+}
+
+static void __vsock_deliver_tap(struct sk_buff *skb)
+{
+   int ret;
+   struct vsock_tap *tmp;
+
+   list_for_each_entry_rcu(tmp, &vsock_tap_all, list) {
+   ret =

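The tap functions keep a global list of monitor devices: vsock_add_tap() accepts only ARPHRD_VSOCKMON devices, vsock_remove_tap() returns -ENODEV for an entry it cannot find, and delivery hands every registered device its own clone of the packet. The sketch below is a single-threaded userspace model of that bookkeeping, not the kernel code: `struct tap`, `tap_add()`, `tap_remove()`, `tap_deliver()` and `count_rx()` are hypothetical names, a callback replaces dev_queue_xmit(), and the spinlock, RCU, module reference counting and synchronize_net() of the real implementation are left out (their places are marked in comments).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for dev->type; same value as ARPHRD_VSOCKMON. */
#define DEV_TYPE_VSOCKMON 826

struct tap {
	int dev_type;
	void (*rx)(struct tap *t, const void *pkt, size_t len);
	unsigned long delivered;
	struct tap *next;
};

static struct tap *tap_all;  /* kernel: RCU list guarded by vsock_tap_lock */

/* Example receive callback that only counts deliveries. */
static void count_rx(struct tap *t, const void *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	t->delivered++;
}

static int tap_add(struct tap *t)
{
	if (t->dev_type != DEV_TYPE_VSOCKMON)
		return -EINVAL;   /* only monitor devices may register */
	/* kernel: __module_get(vt->module) before publishing */
	t->next = tap_all;    /* list_add_rcu() also inserts at the head */
	tap_all = t;
	return 0;
}

static int tap_remove(struct tap *t)
{
	for (struct tap **pp = &tap_all; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			/* kernel: synchronize_net(), then module_put() */
			return 0;
		}
	}
	return -ENODEV;
}

static void tap_deliver(const void *pkt, size_t len)
{
	/* kernel: each tap gets its own skb_clone() of the packet */
	for (struct tap *t = tap_all; t; t = t->next)
		t->rx(t, pkt, len);
}
```

Because the kernel walks the list under RCU, removal has to wait for concurrent readers (synchronize_net()) before dropping the module reference; this model has no concurrent readers, so it can unlink immediately.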
Re: [RFC PATCH 6/7] net: allow simultaneous SW and HW transmit timestamping

2017-04-13 Thread Willem de Bruijn

On Thu, Apr 13, 2017 at 11:24 AM, Keller, Jacob E
 wrote:
>
>
>> -Original Message-
>> From: Miroslav Lichvar [mailto:mlich...@redhat.com]
>> Sent: Thursday, April 13, 2017 8:00 AM
>>
>> Oh, I see. I was struggling to find a good name for this option.
>>
>> > The name for this option is therefore not very descriptive. Perhaps
>> > SOF_TIMESTAMPING_OPT_BOTH_SW_HW.
>>
>> Simultaneous SW/HW timestamping was already possible for incoming
>> packets. Maybe _OPT_TX_SWHW would be better?
>>
>
> This sounds more accurate to me.

Agreed.

[PATCH v4 0/3] VSOCK: vsockmon virtual device to monitor AF_VSOCK sockets.

2017-04-13 Thread Stefan Hajnoczi

v4:
 * Add explicit reserved padding field to struct af_vsockmon_hdr and
   drop __attribute__((packed)) [Michael, DaveM]
 * Call synchronize_net() before module_put() [Michael]

v3:
 * Hook virtio_transport.c (guest driver), not just drivers/vhost/vsock.c (host
   driver)
 * Fix DEFAULT_MTU macro definition [Zhu Yanjun]
 * Rename af_vsockmon_hdr->t field ->transport for clarity
 * Update .ndo_get_stats64() return type since it has changed
 * Include missing  header in af_vsock_tap.c

This is a continuation of Gerard Garcia's work on the vsockmon packet capture
interface for AF_VSOCK.  Packet capture is an essential feature for network
communication.  Gerard began addressing this feature gap in his Google Summer
of Code 2016 project.  I have cleaned up, rebased, and retested the v2 series
he posted previously.

The design follows the nlmon packet capture interface closely.  This is because
vsock has the same problem as netlink: there is no netdev on which packets can
be captured.  The nlmon driver is a synthetic netdev purely for the purpose of
enabling packet capture.  We follow the same approach here with vsockmon.

See include/uapi/linux/vsockmon.h in this series for details on the packet
layout.

How to try it:

1. Build tcpdump with vsockmon patches:

  $ git clone -b vsock https://github.com/stefanha/libpcap
  $ (cd libpcap && ./configure && make)
  $ git clone -b vsock https://github.com/stefanha/tcpdump
  $ (cd tcpdump && ./configure && make)

2. Build nc-vsock (a netcat-like tool):

  $ git clone https://github.com/stefanha/nc-vsock
  $ (cd nc-vsock && make)

3. Launch a virtual machine:

  # modprobe vhost_vsock
  # qemu-system-x86_64 -M accel=kvm -m 1024 -cpu host \
  -drive if=virtio,file=test.img,format=raw \
  -device vhost-vsock-pci,guest-cid=3

  (Assumes guest is running a kernel with this patch)

4. Capture AF_VSOCK traffic in guest and/or host:

  # modprobe vsockmon
  # ip link add type vsockmon
  # ip link set vsockmon0 up
  # tcpdump -i vsockmon0 -vvv

5. Communicate!

  (host)$ nc-vsock -l 1234
  (guest)$ nc-vsock 2 1234

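Step 5 uses nc-vsock, but the guest side of the exchange needs only standard socket calls. A minimal sketch, assuming glibc's `<sys/socket.h>` provides AF_VSOCK and the kernel uapi header `<linux/vm_sockets.h>` provides struct sockaddr_vm and VMADDR_CID_HOST (CID 2, the host, as in `nc-vsock 2 1234`); `vsock_addr_init()` and `vsock_connect()` are hypothetical helpers, not nc-vsock's own code.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Fill a vsock address for the given context ID and port. */
static void vsock_addr_init(struct sockaddr_vm *addr, unsigned int cid,
			    unsigned int port)
{
	memset(addr, 0, sizeof(*addr));
	addr->svm_family = AF_VSOCK;
	addr->svm_cid = cid;
	addr->svm_port = port;
}

/* Connect a stream socket; returns the fd, or -1 with errno set. */
static int vsock_connect(unsigned int cid, unsigned int port)
{
	struct sockaddr_vm addr;
	int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	vsock_addr_init(&addr, cid, port);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

From the guest, `vsock_connect(VMADDR_CID_HOST, 1234)` should reach the listener started by `nc-vsock -l 1234` on the host, and the resulting traffic should appear in the tcpdump capture on vsockmon0.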
Gerard Garcia (3):
  VSOCK: Add vsockmon tap functions
  VSOCK: Add vsockmon device
  VSOCK: Add virtio vsock vsockmon hooks

 drivers/net/Makefile|   1 +
 net/vmw_vsock/Makefile  |   2 +-
 include/linux/virtio_vsock.h|   1 +
 include/net/af_vsock.h  |  13 +++
 include/uapi/linux/if_arp.h |   1 +
 include/uapi/linux/vsockmon.h   |  58 +++
 drivers/net/vsockmon.c  | 167 
 drivers/vhost/vsock.c   |   8 ++
 net/vmw_vsock/af_vsock_tap.c| 107 
 net/vmw_vsock/virtio_transport.c|   3 +
 net/vmw_vsock/virtio_transport_common.c |  58 +++
 drivers/net/Kconfig |   8 ++
 include/uapi/linux/Kbuild   |   1 +
 13 files changed, 427 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/vsockmon.h
 create mode 100644 drivers/net/vsockmon.c
 create mode 100644 net/vmw_vsock/af_vsock_tap.c

-- 
2.9.3

Re: [RFC PATCH 3/7] net: add option to get information about timestamped packets

2017-04-13 Thread Willem de Bruijn

On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar  wrote:
> On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
>> On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar  wrote:
>> > Extend the skb_shared_hwtstamps structure with the index of the
>> > real interface which received or transmitted the packet and the length
>> > of the packet at layer 2.
>>
>> The original packet is received along with the timestamp.
>
> But only outgoing packets, right?

Timestamps for incoming packets are also passed alongside the original packet.

>> Why is this L2 length needed?
>
> It's needed for incoming packets to allow converting of preamble
> timestamps to trailer timestamps.

Receiving the mac length of a packet sounds like a feature independent
from timestamping. Either an ioctl similar to SIOCGIFMTU or, if it may
vary due to the existence of vlan headers, a new independent cmsg at the
SOL_SOCKET layer.

>> > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to
>> > the SO_TIMESTAMPING option to allow applications to get this information
>> > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message.
>>
>> This patch saves skb->dev->ifindex, which is the same as existing
>> SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that
>> feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/
>
> The main point is that it provides the index of the device which
> received the packet. It does duplicate the functionality of OPT_CMSG +
> IP_PKTINFO for outgoing packets, but I thought it might be useful with
> the TSONLY option.

Agreed. I'd prefer to reuse the existing option for that and just extend it
to work together with TSONLY.

We will have to set serr->header.h4.iif from something other than skb->dev
if the skb was allocated fresh in __skb_tstamp_tx without the device
association.
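Miroslav's reason for reporting the L2 length is the conversion between preamble and trailer timestamps: a timestamp taken at the start of a frame and one taken at its end differ by the frame's transmission time on the wire, t_trailer = t_preamble + 8 * L / R for L bytes at R bit/s. For example, a 1500-byte frame at 1 Gb/s takes 12000 bits / 10^9 bit/s = 12 microseconds. A minimal sketch of that arithmetic, assuming L already counts every byte between the two timestamp points (whether the FCS or other framing overhead belongs in L is a hardware detail not settled in this thread); `preamble_to_trailer_ns()` is a hypothetical helper.

```c
#include <stdint.h>

/* Shift a preamble (start-of-frame) timestamp to the trailer (end-of-frame)
 * by adding the serialization time of l2_len bytes at link_bps bit/s.
 * Assumes l2_len covers all bytes between the two timestamp points. */
static uint64_t preamble_to_trailer_ns(uint64_t ts_ns, uint32_t l2_len,
				       uint64_t link_bps)
{
	/* bits * 1e9 / (bits per second) = nanoseconds, truncated */
	return ts_ns + (uint64_t)l2_len * 8u * 1000000000u / link_bps;
}
```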

[patch net-next] MAINTAINERS: rename TC entry and add a couple of header files

2017-04-13 Thread Jiri Pirko

From: Jiri Pirko 

The section is not specific only to "TC classifiers", but applies to the
whole TC subsystem. Also, add a couple of forgotten headers.

Signed-off-by: Jiri Pirko 
---
 MAINTAINERS | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5397f54..549e8e1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12180,12 +12180,17 @@ F:Documentation/accounting/taskstats*
 F: include/linux/taskstats*
 F: kernel/taskstats.c
 
-TC CLASSIFIER
+TC subsystem
 M: Jamal Hadi Salim 
 L: netdev@vger.kernel.org
 S: Maintained
 F: include/net/pkt_cls.h
+F: include/net/pkt_sched.h
+F: include/net/tc_act/
 F: include/uapi/linux/pkt_cls.h
+F: include/uapi/linux/pkt_sched.h
+F: include/uapi/linux/tc_act/
+F: include/uapi/linux/tc_ematch/
 F: net/sched/
 
 TCP LOW PRIORITY MODULE
-- 
2.7.4

Re: [Patch net-next v2] net_sched: move the empty tp check from ->destroy() to ->delete()

2017-04-13 Thread Cong Wang

On Thu, Apr 13, 2017 at 12:28 AM, kbuild test robot <l...@intel.com> wrote:
> Hi Cong,
>
> [auto build test WARNING on net-next/master]
>
> url:
> https://github.com/0day-ci/linux/commits/Cong-Wang/net_sched-move-the-empty-tp-check-from-destroy-to-delete/20170413-145318
> config: x86_64-randconfig-x004-201715 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All warnings (new ones prefixed by >>):
>
>net/sched/cls_matchall.c: In function 'mall_destroy':
>>> net/sched/cls_matchall.c:99:10: warning: 'return' with a value, in function returning void
>   return true;
>  ^~~~
>net/sched/cls_matchall.c:93:13: note: declared here
> static void mall_destroy(struct tcf_proto *tp)
> ^~~~
>net/sched/cls_matchall.c:105:9: warning: 'return' with a value, in function returning void
>  return true;
> ^~~~
>net/sched/cls_matchall.c:93:13: note: declared here
> static void mall_destroy(struct tcf_proto *tp)
> ^~~~

Ah, I must have missed it while compiling... Will send v3 after waiting for
other comments.

[PATCH v4 net-next RFC] net: Generic XDP

2017-04-13 Thread David Miller


This provides a generic SKB based non-optimized XDP path which is used
if either the driver lacks a specific XDP implementation, or the user
requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE.
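The selection rule above (generic XDP when the driver has no native implementation, or when the user asks for it with XDP_FLAGS_SKB_MODE) can be written as a small decision function. This is a sketch only: `choose_xdp_path()` and `enum xdp_path` are hypothetical, the flag values are copied from the if_link.h hunk of this patch, and rejecting unknown bits with XDP_FLAGS_MASK illustrates what the mask is for rather than reproducing a specific kernel function.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the if_link.h hunk of this patch. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
#define XDP_FLAGS_SKB_MODE          (2U << 0)
#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE)

enum xdp_path { XDP_PATH_DRIVER, XDP_PATH_GENERIC };

/* Pick the XDP path for a device. Returns -EINVAL for unknown flag bits,
 * otherwise an enum xdp_path value. */
static int choose_xdp_path(uint32_t flags, bool driver_has_xdp)
{
	if (flags & ~XDP_FLAGS_MASK)
		return -EINVAL;
	if ((flags & XDP_FLAGS_SKB_MODE) || !driver_has_xdp)
		return XDP_PATH_GENERIC;
	return XDP_PATH_DRIVER;
}
```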

It is arguable that perhaps I should have required something like
this as part of the initial XDP feature merge.

I believe this is critical for two reasons:

1) Accessibility.  More people can play with XDP with fewer
   dependencies.  Yes I know we have XDP support in virtio_net, but
   that just creates another dependency for learning how to use this
   facility.

   I wrote this to make life easier for the XDP newbies.

2) As a model for what the expected semantics are.  If there is a pure
   generic core implementation, it serves as a semantic example for
   driver folks adding XDP support.

This is just a rough draft and is untested.

One thing I have not tried to address here is the issue of
XDP_PACKET_HEADROOM, thanks to Daniel for spotting that.  It seems
incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or
whatever even if the XDP program doesn't try to push headers at all.
I think we really need the verifier to somehow propagate whether
certain XDP helpers are used or not.

Signed-off-by: David S. Miller 
---

v4:
 - Fix MAC header adjustment before calling prog (David Ahern)
 - Disable LRO when generic XDP is installed (Michael Chan)
 - Bypass qdisc et al. on XDP_TX and record the event (Alexei)
 - Do not perform generic XDP on reinjected packets (DaveM)

v3:
 - Make sure XDP program sees packet at MAC header, push back MAC
   header if we do XDP_TX.  (Alexei)
 - Elide GRO when generic XDP is in use.  (Alexei)
 - Add XDP_FLAGS_SKB_MODE flag which the user can use to request generic
   XDP even if the driver has an XDP implementation.  (Alexei)
 - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS
   attribute.  (Daniel)

v2:
 - Add some "fall through" comments in switch statements based
   upon feedback from Andrew Lunn
 - Use RCU for generic xdp_prog, thanks to Johannes Berg.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b0aa089..071a58b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1891,9 +1891,17 @@ struct net_device {
struct lock_class_key   *qdisc_tx_busylock;
struct lock_class_key   *qdisc_running_key;
boolproto_down;
+   struct bpf_prog __rcu   *xdp_prog;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
+static inline bool netif_elide_gro(const struct net_device *dev)
+{
+   if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
+   return true;
+   return false;
+}
+
 #define NETDEV_ALIGN 32
 
 static inline
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8b405af..633aa02 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -887,7 +887,9 @@ enum {
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
-#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST)
+#define XDP_FLAGS_SKB_MODE (2U << 0)
+#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \
+XDP_FLAGS_SKB_MODE)
 
 enum {
IFLA_XDP_UNSPEC,
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..9ed4569 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -95,6 +95,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4247,6 +4248,88 @@ static int __netif_receive_skb(struct sk_buff *skb)
return ret;
 }
 
+static struct static_key generic_xdp_needed __read_mostly;
+
+static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   struct bpf_prog *new = xdp->prog;
+   int ret = 0;
+
+   switch (xdp->command) {
+   case XDP_SETUP_PROG: {
+   struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
+
+   rcu_assign_pointer(dev->xdp_prog, new);
+   if (old)
+   bpf_prog_put(old);
+
+   if (old && !new)
+   static_key_slow_dec(&generic_xdp_needed);
+   else if (new && !old)
+   static_key_slow_inc(&generic_xdp_needed);
+   break;
+   }
+
+   case XDP_QUERY_PROG:
+   xdp->prog_attached = !!rcu_access_pointer(dev->xdp_prog);
+   break;
+
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   return ret;
+}
+
+static u32 netif_receive_generic_xdp(struct sk_buff *skb,
+struct bpf_prog *xdp_prog)
+{
+   struct xdp_buff xdp;
+   u32 act = XDP_DROP;
+   void *orig_data;
+   int hlen, off;
+
+   if (skb_linearize(skb))
+   goto do_drop;
+
+   /* The XDP program wants to see the packet starting at the MAC
+* header.
+*/
+

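generic_xdp_install() only touches the generic_xdp_needed static key when a device goes from no program to a program or back, so replacing one program with another leaves the key alone, and the key stays enabled while any device has a generic program attached. The following userspace model shows that counting; `struct prog`, `struct dev` and `model_install()` are hypothetical, an int stands in for the static key, and RCU publication and bpf_prog_put() are omitted.

```c
#include <stddef.h>

struct prog { int id; };
struct dev { struct prog *xdp_prog; };

/* Stand-in for the generic_xdp_needed static key's enable count. */
static int generic_xdp_needed;

static void model_install(struct dev *dev, struct prog *new)
{
	struct prog *old = dev->xdp_prog;

	dev->xdp_prog = new;            /* kernel: rcu_assign_pointer() */
	if (old && !new)
		generic_xdp_needed--;   /* kernel: static_key_slow_dec() */
	else if (new && !old)
		generic_xdp_needed++;   /* kernel: static_key_slow_inc() */
}
```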
Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-13 Thread David Miller

From: Daniel Borkmann 
Date: Thu, 13 Apr 2017 17:57:06 +0200

> On 04/12/2017 08:54 PM, David Miller wrote:
> [...]
>> +static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>> + struct bpf_prog *xdp_prog)
>> +{
>> +struct xdp_buff xdp;
>> +u32 act = XDP_DROP;
>> +void *orig_data;
>> +int hlen, off;
>> +
>> +if (skb_linearize(skb))
> 
> Btw, given the skb can come from all kinds of points in the stack,
> it could also be a clone at this point. One example is act_mirred
> which in fact does skb_clone() and can push the skb back to
> ingress path through netif_receive_skb() and thus could then go
> into generic xdp processing, where skb can be mangled.
> 
> Instead of skb_linearize() we would therefore need to use something
> like skb_ensure_writable(skb, skb->len) as equivalent, which also
> makes sure that we unclone whenever needed.

We could use skb_cow() for this purpose, which deals with cloning as
well as enforcing headroom.

However, thinking further about this, the goal is to make generic XDP
match precisely how in-driver-XDP behaves.  Therefore, such redirects
from act_mirred would never flow through the XDP path.

No other possibility can cause us to see a cloned packet here, we
are before network taps are processed, etc.  So in my opinion the
thing to do is to elide generic XDP if the SKB is cloned.
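Daniel's concern is that a clone shares its data buffer with the original skb, so an XDP program that rewrites packet bytes through one would also change what the other sees. Unsharing (skb_cow() or skb_ensure_writable()) gives the clone a private copy, while David's proposal simply skips generic XDP for cloned skbs. The sketch below is a userspace model of that copy-on-write step, not kernel code; `struct buf`, `struct pkt`, `buf_new()`, `pkt_clone()`, `pkt_is_cloned()`, `pkt_unshare()` and `pkt_free()` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Refcounted data buffer shared by clones, loosely like skb data. */
struct buf { int refs; size_t len; unsigned char data[]; };
struct pkt { struct buf *b; };

static struct buf *buf_new(const void *src, size_t len)
{
	struct buf *b = malloc(sizeof(*b) + len);

	if (!b)
		return NULL;
	b->refs = 1;
	b->len = len;
	memcpy(b->data, src, len);
	return b;
}

/* A clone gets a new header but shares the data buffer. */
static void pkt_clone(struct pkt *dst, const struct pkt *src)
{
	dst->b = src->b;
	dst->b->refs++;
}

static int pkt_is_cloned(const struct pkt *p)
{
	return p->b->refs > 1;
}

/* Copy-on-write: give p a private buffer before it is modified.
 * Returns 0 on success, -1 on allocation failure. */
static int pkt_unshare(struct pkt *p)
{
	struct buf *copy;

	if (!pkt_is_cloned(p))
		return 0;
	copy = buf_new(p->b->data, p->b->len);
	if (!copy)
		return -1;
	p->b->refs--;
	p->b = copy;
	return 0;
}

static void pkt_free(struct pkt *p)
{
	if (--p->b->refs == 0)
		free(p->b);
}
```

Skipping cloned packets avoids the copy entirely, which fits the goal of matching in-driver XDP, where such clones never reach the XDP hook.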
