[RFC 06/10] media: platform: Add Sunxi Cedrus decoder driver

2016-08-25 Thread Florent Revest
This patch adds a "sunxi-cedrus" v4l2 m2m decoder driver for
Allwinner's Video Processing Unit. This VPU has a low-level interface
which requires frame headers to be written manually into its registers.
Hence, the driver depends on the Request API to synchronize buffers
with controls.

Most of the reverse engineering on which I based my work comes from the
"Cedrus" project: http://linux-sunxi.org/Cedrus

The driver currently only runs on the A13 and this patch doesn't
include any codec.

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 drivers/media/platform/Kconfig |  13 +
 drivers/media/platform/Makefile|   1 +
 drivers/media/platform/sunxi-cedrus/Makefile   |   2 +
 drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c | 248 ++
 .../platform/sunxi-cedrus/sunxi_cedrus_common.h|  86 
 .../media/platform/sunxi-cedrus/sunxi_cedrus_dec.c | 544 +
 .../media/platform/sunxi-cedrus/sunxi_cedrus_dec.h |  33 ++
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.c  | 153 ++
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.h  |  32 ++
 .../platform/sunxi-cedrus/sunxi_cedrus_regs.h  | 170 +++
 10 files changed, 1282 insertions(+)
 create mode 100644 drivers/media/platform/sunxi-cedrus/Makefile
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_hw.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_hw.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_regs.h

diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index f25344b..92c92d3 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -315,6 +315,19 @@ config VIDEO_TI_VPE
  Support for the TI VPE(Video Processing Engine) block
  found on DRA7XX SoC.
 
+config VIDEO_SUNXI_CEDRUS
+   tristate "Sunxi CEDRUS VPU driver"
+   depends on VIDEO_DEV && VIDEO_V4L2
+   depends on ARCH_SUNXI
+   depends on HAS_DMA
+   select VIDEOBUF2_DMA_CONTIG
+   select V4L2_MEM2MEM_DEV
+   ---help---
+ Support for the VPU video codec found on Sunxi SoC.
+
+ To compile this driver as a module, choose M here: the module
+ will be called sunxi-cedrus.
+
 config VIDEO_TI_VPE_DEBUG
bool "VPE debug messages"
depends on VIDEO_TI_VPE
diff --git a/drivers/media/platform/Makefile b/drivers/media/platform/Makefile
index 21771c1..1419749 100644
--- a/drivers/media/platform/Makefile
+++ b/drivers/media/platform/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_VIDEO_RENESAS_VSP1)  += vsp1/
 obj-y  += omap/
 
 obj-$(CONFIG_VIDEO_AM437X_VPFE)+= am437x/
+obj-$(CONFIG_VIDEO_SUNXI_CEDRUS)   += sunxi-cedrus/
 
 obj-$(CONFIG_VIDEO_XILINX) += xilinx/
 
diff --git a/drivers/media/platform/sunxi-cedrus/Makefile 
b/drivers/media/platform/sunxi-cedrus/Makefile
new file mode 100644
index 000..14c2f7a
--- /dev/null
+++ b/drivers/media/platform/sunxi-cedrus/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi_cedrus.o sunxi_cedrus_hw.o \
+   sunxi_cedrus_dec.o
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
new file mode 100644
index 000..17af34c
--- /dev/null
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
@@ -0,0 +1,248 @@
+/*
+ * Sunxi Cedrus codec driver
+ *
+ * Copyright (C) 2016 Florent Revest
+ * Florent Revest <florent.rev...@free-electrons.com>
+ *
+ * Based on vim2m
+ *
+ * Copyright (c) 2009-2010 Samsung Electronics Co., Ltd.
+ * Pawel Osciak, <pa...@osciak.com>
+ * Marek Szyprowski, <m.szyprow...@samsung.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include "sunxi_cedrus_common.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sunxi_cedrus_dec.h"
+#include "sunxi_cedrus_hw.h"
+
+static int sunxi_cedrus_s_ctrl(struct v4l2_ctrl *ctrl)
+{
+   struct sunxi_cedrus_ctx *ctx =
+   container_of(ctrl->handler, struc

[RFC 01/10] clk: sunxi-ng: Add a couple of A13 clocks

2016-08-25 Thread Florent Revest
Add a new style driver for the clock control unit in Allwinner A13.

Only AVS and VE are supported since they weren't provided until now and are
needed for "sunxi-cedrus".

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 .../devicetree/bindings/clock/sunxi-ccu.txt|  1 +
 arch/arm/boot/dts/sun5i-a13.dtsi   | 11 +++
 drivers/clk/sunxi-ng/Kconfig   | 11 +++
 drivers/clk/sunxi-ng/Makefile  |  1 +
 drivers/clk/sunxi-ng/ccu-sun5i-a13.c   | 80 ++
 drivers/clk/sunxi-ng/ccu-sun5i-a13.h   | 25 +++
 include/dt-bindings/clock/sun5i-a13-ccu.h  | 49 +
 include/dt-bindings/reset/sun5i-a13-ccu.h  | 48 +
 8 files changed, 226 insertions(+)
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun5i-a13.c
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun5i-a13.h
 create mode 100644 include/dt-bindings/clock/sun5i-a13-ccu.h
 create mode 100644 include/dt-bindings/reset/sun5i-a13-ccu.h

diff --git a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt 
b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt
index cb91507..7bb7a6a 100644
--- a/Documentation/devicetree/bindings/clock/sunxi-ccu.txt
+++ b/Documentation/devicetree/bindings/clock/sunxi-ccu.txt
@@ -4,6 +4,7 @@ Allwinner Clock Control Unit Binding
 Required properties :
 - compatible: must contain one of the following compatible:
- "allwinner,sun8i-h3-ccu"
+   - "allwinner,sun5i-a13-ccu"
 
 - reg: Must contain the registers base address and length
 - clocks: phandle to the oscillators feeding the CCU. Two are needed:
diff --git a/arch/arm/boot/dts/sun5i-a13.dtsi b/arch/arm/boot/dts/sun5i-a13.dtsi
index e012890..2afe05fb 100644
--- a/arch/arm/boot/dts/sun5i-a13.dtsi
+++ b/arch/arm/boot/dts/sun5i-a13.dtsi
@@ -46,8 +46,10 @@
 
 #include "sun5i.dtsi"
 
+#include <dt-bindings/clock/sun5i-a13-ccu.h>
 #include 
 #include 
+#include <dt-bindings/reset/sun5i-a13-ccu.h>
 
 / {
interrupt-parent = <&intc>;
@@ -327,6 +329,15 @@
};
};
};
+
+   ccu: clock@01c20000 {
+   compatible = "allwinner,sun5i-a13-ccu";
+   reg = <0x01c20000 0x400>;
+   clocks = <&osc24M>, <&osc32k>;
+   clock-names = "hosc", "losc";
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+   };
};
 };
 
diff --git a/drivers/clk/sunxi-ng/Kconfig b/drivers/clk/sunxi-ng/Kconfig
index 2afcbd3..8faba4e 100644
--- a/drivers/clk/sunxi-ng/Kconfig
+++ b/drivers/clk/sunxi-ng/Kconfig
@@ -51,6 +51,17 @@ config SUNXI_CCU_MP
 
 # SoC Drivers
 
+config SUN5I_A13_CCU
+   bool "Support for the Allwinner A13 CCU"
+   select SUNXI_CCU_DIV
+   select SUNXI_CCU_NK
+   select SUNXI_CCU_NKM
+   select SUNXI_CCU_NKMP
+   select SUNXI_CCU_NM
+   select SUNXI_CCU_MP
+   select SUNXI_CCU_PHASE
+   default ARCH_SUN5I
+
 config SUN8I_H3_CCU
bool "Support for the Allwinner H3 CCU"
select SUNXI_CCU_DIV
diff --git a/drivers/clk/sunxi-ng/Makefile b/drivers/clk/sunxi-ng/Makefile
index 633ce64..1710745 100644
--- a/drivers/clk/sunxi-ng/Makefile
+++ b/drivers/clk/sunxi-ng/Makefile
@@ -17,4 +17,5 @@ obj-$(CONFIG_SUNXI_CCU_NM)+= ccu_nm.o
 obj-$(CONFIG_SUNXI_CCU_MP) += ccu_mp.o
 
 # SoC support
+obj-$(CONFIG_SUN5I_A13_CCU)+= ccu-sun5i-a13.o
 obj-$(CONFIG_SUN8I_H3_CCU) += ccu-sun8i-h3.o
diff --git a/drivers/clk/sunxi-ng/ccu-sun5i-a13.c 
b/drivers/clk/sunxi-ng/ccu-sun5i-a13.c
new file mode 100644
index 000..7f1da20
--- /dev/null
+++ b/drivers/clk/sunxi-ng/ccu-sun5i-a13.c
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2016 Maxime Ripard. All rights reserved.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+#include "ccu_common.h"
+#include "ccu_reset.h"
+
+#include "ccu_div.h"
+#include "ccu_gate.h"
+#include "ccu_mp.h"
+#include "ccu_mult.h"
+#include "ccu_nk.h"
+#include "ccu_nkm.h"
+#include "ccu_nkmp.h"
+#include "ccu_nm.h"
+#include "ccu_phase.h"
+
+#include "ccu-sun5i-a13.h"
+
+static SUNXI_CCU_GATE(ve_clk, "ve", "pll4",
+ 0x13c, BIT(31), CLK_SET_RATE_PARENT);
+
+static SUNXI_CCU_GATE(avs_clk, "avs",  "

[RFC 02/10] v4l: Add private compound control type.

2016-08-25 Thread Florent Revest
From: Pawel Osciak <posc...@chromium.org>

V4L2_CTRL_TYPE_PRIVATE is to be used for private driver compound
controls that use the "ptr" member of struct v4l2_ext_control.
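
For illustration, here is a hedged sketch of how a driver could declare
and register a private compound control of this type (all names are
placeholders, and the registration call is the existing
v4l2_ctrl_new_custom() helper):

static const struct v4l2_ctrl_config my_frame_hdr_ctrl = {
        .ops       = &my_ctrl_ops,
        .id        = V4L2_CID_MY_FRAME_HDR,   /* driver-private CID */
        .type      = V4L2_CTRL_TYPE_PRIVATE,
        .name      = "My Frame Header Parameters",
        .elem_size = sizeof(struct my_frame_hdr),
};

/* in the driver's open() handler, after v4l2_ctrl_handler_init(hdl, 1): */
ctrl = v4l2_ctrl_new_custom(hdl, &my_frame_hdr_ctrl, NULL);

A later patch in this series ("sunxi-cedrus: Add a MPEG 2 codec") uses
exactly this pattern for its MPEG2 frame header control.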

Signed-off-by: Pawel Osciak <posc...@chromium.org>
Signed-off-by: Jung Zhao <jung.z...@rock-chips.com>
Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 drivers/media/v4l2-core/v4l2-ctrls.c | 4 
 include/uapi/linux/videodev2.h   | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
b/drivers/media/v4l2-core/v4l2-ctrls.c
index f7333fe..60056b0 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -1543,6 +1543,10 @@ static int std_validate(const struct v4l2_ctrl *ctrl, 
u32 idx,
return -ERANGE;
return 0;
 
+   /* FIXME:just return 0 for now */
+   case V4L2_CTRL_TYPE_PRIVATE:
+   return 0;
+
default:
return -EINVAL;
}
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 3eafd3f..904c44c 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -1521,6 +1521,8 @@ enum v4l2_ctrl_type {
V4L2_CTRL_TYPE_U8= 0x0100,
V4L2_CTRL_TYPE_U16   = 0x0101,
V4L2_CTRL_TYPE_U32   = 0x0102,
+
+   V4L2_CTRL_TYPE_PRIVATE   = 0xffff,
 };
 
 /*  Used in the VIDIOC_QUERYCTRL ioctl for querying controls */
-- 
2.7.4



[RFC 05/10] v4l: Add MPEG4 low-level decoder API control

2016-08-25 Thread Florent Revest
This control is to be used with the new low-level decoder API for MPEG4
to provide additional parameters to hardware that cannot parse the
input stream by itself.

Some fields are still missing for this structure to be complete.
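
For illustration, a hedged userspace sketch of how the fields visible in
this patch could be filled from values parsed out of the MPEG4 VOL/VOP
headers (all variable names are placeholders; the embedded flags struct
is filled the same way from the parsed headers):

        struct v4l2_ctrl_mpeg4_frame_hdr hdr = {
                .slice_len   = vop_size,      /* bytes of bitstream for this VOP */
                .slice_pos   = vop_data_pos,  /* position of the VOP data */
                .quant_scale = vop_quant,
                .width       = vol_width,
                .height      = vol_height,
        };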

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 drivers/media/v4l2-core/v4l2-ctrls.c |  8 +++
 drivers/media/v4l2-core/v4l2-ioctl.c |  1 +
 include/uapi/linux/v4l2-controls.h   | 42 
 include/uapi/linux/videodev2.h   |  3 +++
 4 files changed, 54 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
b/drivers/media/v4l2-core/v4l2-ctrls.c
index 331d009..302c744 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -761,6 +761,7 @@ const char *v4l2_ctrl_get_name(u32 id)
case V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME:   return "Force 
Key Frame";
 
case V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR:   return "MPEG2 
Frame Header";
+   case V4L2_CID_MPEG_VIDEO_MPEG4_FRAME_HDR:   return "MPEG4 
Frame Header";
 
/* VPX controls */
case V4L2_CID_MPEG_VIDEO_VPX_NUM_PARTITIONS:return "VPX 
Number of Partitions";
@@ -1148,6 +1149,9 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum 
v4l2_ctrl_type *type,
case V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR:
*type = V4L2_CTRL_TYPE_MPEG2_FRAME_HDR;
break;
+   case V4L2_CID_MPEG_VIDEO_MPEG4_FRAME_HDR:
+   *type = V4L2_CTRL_TYPE_MPEG4_FRAME_HDR;
+   break;
default:
*type = V4L2_CTRL_TYPE_INTEGER;
break;
@@ -1549,6 +1553,7 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 
idx,
return 0;
 
case V4L2_CTRL_TYPE_MPEG2_FRAME_HDR:
+   case V4L2_CTRL_TYPE_MPEG4_FRAME_HDR:
return 0;
 
/* FIXME:just return 0 for now */
@@ -2107,6 +2112,9 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct 
v4l2_ctrl_handler *hdl,
case V4L2_CTRL_TYPE_MPEG2_FRAME_HDR:
elem_size = sizeof(struct v4l2_ctrl_mpeg2_frame_hdr);
break;
+   case V4L2_CTRL_TYPE_MPEG4_FRAME_HDR:
+   elem_size = sizeof(struct v4l2_ctrl_mpeg4_frame_hdr);
+   break;
default:
if (type < V4L2_CTRL_COMPOUND_TYPES)
elem_size = sizeof(s32);
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index de382a1..be7973e 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1274,6 +1274,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_PIX_FMT_VC1_ANNEX_L:  descr = "VC-1 (SMPTE 412M Annex 
L)"; break;
case V4L2_PIX_FMT_VP8:  descr = "VP8"; break;
case V4L2_PIX_FMT_MPEG2_FRAME:  descr = "MPEG2 FRAME"; break;
+   case V4L2_PIX_FMT_MPEG4_FRAME:  descr = "MPEG4 FRAME"; break;
case V4L2_PIX_FMT_CPIA1:descr = "GSPCA CPiA YUV"; break;
case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
case V4L2_PIX_FMT_SN9C10X:  descr = "GSPCA SN9C10X"; break;
diff --git a/include/uapi/linux/v4l2-controls.h 
b/include/uapi/linux/v4l2-controls.h
index cdf9497..af466ca 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -548,6 +548,7 @@ enum v4l2_mpeg_video_mpeg4_profile {
 #define V4L2_CID_MPEG_VIDEO_MPEG4_QPEL (V4L2_CID_MPEG_BASE+407)
 
 #define V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR (V4L2_CID_MPEG_BASE+450)
+#define V4L2_CID_MPEG_VIDEO_MPEG4_FRAME_HDR (V4L2_CID_MPEG_BASE+451)
 
 /*  Control IDs for VP8 streams
  *  Although VP8 is not part of MPEG we add these controls to the MPEG class
@@ -1000,4 +1001,45 @@ struct v4l2_ctrl_mpeg2_frame_hdr {
__u8 forward_index;
 };
 
+struct v4l2_ctrl_mpeg4_frame_hdr {
+   __u32 slice_len;
+   __u32 slice_pos;
+   unsigned char quant_scale;
+
+   __u16 width;
+   __u16 height;
+
+   struct {
+   unsigned int short_video_header : 1;
+   unsigned int chroma_format  : 2;
+   unsigned int interlaced : 1;
+   unsigned int obmc_disable   : 1;
+   unsigned int sprite_enable  : 2;
+   unsigned int sprite_warping_accuracy: 2;
+   unsigned int quant_type : 1;
+   unsigned int quarter_sample : 1;
+   unsigned int data_partitioned   : 1;
+   unsigned int reversible_vlc : 1;
+   unsigned int resync_marker_disable  : 1;
+   } v

[RFC 09/10] ARM: dts: sun5i: Use video-engine node

2016-08-25 Thread Florent Revest
Now that we have a driver matching "allwinner,sun5i-a13-video-engine" we
can load it.

The "video-engine" node depends on the new sunxi-ng's CCU clock and
reset bindings. This patch also includes a ve_reserved DMA pool for
videobuf2 buffer allocations in sunxi-cedrus.

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 arch/arm/boot/dts/sun5i-a13.dtsi | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm/boot/dts/sun5i-a13.dtsi b/arch/arm/boot/dts/sun5i-a13.dtsi
index 2afe05fb..384b645 100644
--- a/arch/arm/boot/dts/sun5i-a13.dtsi
+++ b/arch/arm/boot/dts/sun5i-a13.dtsi
@@ -69,6 +69,19 @@
};
};
 
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   ve_reserved: cma {
+   compatible = "shared-dma-pool";
+   reg = <0x43d0 0x900>;
+   no-map;
+   linux,cma-default;
+   };
+   };
+
thermal-zones {
cpu_thermal {
/* milliseconds */
@@ -330,6 +343,24 @@
};
};
 
+   video-engine {
+   compatible = "allwinner,sun5i-a13-video-engine";
+   memory-region = <&ve_reserved>;
+
+   clocks = <&ahb_gates 32>, <&ccu CLK_VE>,
+<&dram_gates 0>;
+   clock-names = "ahb", "mod", "ram";
+
+   assigned-clocks = <&ccu CLK_VE>;
+   assigned-clock-rates = <320000000>;
+
+   resets = <&ccu RST_VE>;
+
+   interrupts = <53>;
+
+   reg = <0x01c0e000 4096>;
+   };
+
ccu: clock@01c20000 {
compatible = "allwinner,sun5i-a13-ccu";
reg = <0x01c20000 0x400>;
-- 
2.7.4



[RFC 04/10] v4l: Add MPEG2 low-level decoder API control

2016-08-25 Thread Florent Revest
This control is to be used with the new low-level decoder API for
MPEG2 to provide additional parameters to hardware that cannot parse
the input stream by itself.
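
For illustration, a hedged userspace sketch of how this compound control
could be handed to a decoder through VIDIOC_S_EXT_CTRLS. The control ID
and structure come from this patch and are not in mainline headers, and
the association of the control with a specific buffer through the
Request API is omitted here:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* pass an MPEG2 frame header to the decoder (sketch) */
static int set_mpeg2_frame_hdr(int fd, struct v4l2_ctrl_mpeg2_frame_hdr *hdr)
{
        struct v4l2_ext_control ctrl;
        struct v4l2_ext_controls ctrls;

        memset(&ctrl, 0, sizeof(ctrl));
        memset(&ctrls, 0, sizeof(ctrls));

        ctrl.id = V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR;
        ctrl.size = sizeof(*hdr);
        ctrl.ptr = hdr;

        ctrls.ctrl_class = V4L2_CTRL_CLASS_MPEG;
        ctrls.count = 1;
        ctrls.controls = &ctrl;

        return ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls);
}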

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 drivers/media/v4l2-core/v4l2-ctrls.c | 11 +++
 drivers/media/v4l2-core/v4l2-ioctl.c |  1 +
 include/uapi/linux/v4l2-controls.h   | 26 ++
 include/uapi/linux/videodev2.h   |  3 +++
 4 files changed, 41 insertions(+)

diff --git a/drivers/media/v4l2-core/v4l2-ctrls.c 
b/drivers/media/v4l2-core/v4l2-ctrls.c
index 60056b0..331d009 100644
--- a/drivers/media/v4l2-core/v4l2-ctrls.c
+++ b/drivers/media/v4l2-core/v4l2-ctrls.c
@@ -760,6 +760,8 @@ const char *v4l2_ctrl_get_name(u32 id)
case V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER: return "Repeat 
Sequence Header";
case V4L2_CID_MPEG_VIDEO_FORCE_KEY_FRAME:   return "Force 
Key Frame";
 
+   case V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR:   return "MPEG2 
Frame Header";
+
/* VPX controls */
case V4L2_CID_MPEG_VIDEO_VPX_NUM_PARTITIONS:return "VPX 
Number of Partitions";
case V4L2_CID_MPEG_VIDEO_VPX_IMD_DISABLE_4X4:   return "VPX 
Intra Mode Decision Disable";
@@ -1143,6 +1145,9 @@ void v4l2_ctrl_fill(u32 id, const char **name, enum 
v4l2_ctrl_type *type,
case V4L2_CID_RDS_TX_ALT_FREQS:
*type = V4L2_CTRL_TYPE_U32;
break;
+   case V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR:
+   *type = V4L2_CTRL_TYPE_MPEG2_FRAME_HDR;
+   break;
default:
*type = V4L2_CTRL_TYPE_INTEGER;
break;
@@ -1543,6 +1548,9 @@ static int std_validate(const struct v4l2_ctrl *ctrl, u32 
idx,
return -ERANGE;
return 0;
 
+   case V4L2_CTRL_TYPE_MPEG2_FRAME_HDR:
+   return 0;
+
/* FIXME:just return 0 for now */
case V4L2_CTRL_TYPE_PRIVATE:
return 0;
@@ -2096,6 +2104,9 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct 
v4l2_ctrl_handler *hdl,
case V4L2_CTRL_TYPE_U32:
elem_size = sizeof(u32);
break;
+   case V4L2_CTRL_TYPE_MPEG2_FRAME_HDR:
+   elem_size = sizeof(struct v4l2_ctrl_mpeg2_frame_hdr);
+   break;
default:
if (type < V4L2_CTRL_COMPOUND_TYPES)
elem_size = sizeof(s32);
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c 
b/drivers/media/v4l2-core/v4l2-ioctl.c
index f19b666..de382a1 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -1273,6 +1273,7 @@ static void v4l_fill_fmtdesc(struct v4l2_fmtdesc *fmt)
case V4L2_PIX_FMT_VC1_ANNEX_G:  descr = "VC-1 (SMPTE 412M Annex 
G)"; break;
case V4L2_PIX_FMT_VC1_ANNEX_L:  descr = "VC-1 (SMPTE 412M Annex 
L)"; break;
case V4L2_PIX_FMT_VP8:  descr = "VP8"; break;
+   case V4L2_PIX_FMT_MPEG2_FRAME:  descr = "MPEG2 FRAME"; break;
case V4L2_PIX_FMT_CPIA1:descr = "GSPCA CPiA YUV"; break;
case V4L2_PIX_FMT_WNVA: descr = "WNVA"; break;
case V4L2_PIX_FMT_SN9C10X:  descr = "GSPCA SN9C10X"; break;
diff --git a/include/uapi/linux/v4l2-controls.h 
b/include/uapi/linux/v4l2-controls.h
index b6a357a..cdf9497 100644
--- a/include/uapi/linux/v4l2-controls.h
+++ b/include/uapi/linux/v4l2-controls.h
@@ -547,6 +547,8 @@ enum v4l2_mpeg_video_mpeg4_profile {
 };
 #define V4L2_CID_MPEG_VIDEO_MPEG4_QPEL (V4L2_CID_MPEG_BASE+407)
 
+#define V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR (V4L2_CID_MPEG_BASE+450)
+
 /*  Control IDs for VP8 streams
  *  Although VP8 is not part of MPEG we add these controls to the MPEG class
  *  as that class is already handling other video compression standards
@@ -974,4 +976,28 @@ enum v4l2_detect_md_mode {
 #define V4L2_CID_DETECT_MD_THRESHOLD_GRID  (V4L2_CID_DETECT_CLASS_BASE + 3)
 #define V4L2_CID_DETECT_MD_REGION_GRID (V4L2_CID_DETECT_CLASS_BASE + 4)
 
+struct v4l2_ctrl_mpeg2_frame_hdr {
+   __u32 slice_len;
+   __u32 slice_pos;
+   enum { MPEG1, MPEG2 } type;
+
+   __u16 width;
+   __u16 height;
+
+   enum { PCT_I = 1, PCT_P, PCT_B, PCT_D } picture_coding_type;
+   __u8 f_code[2][2];
+
+   __u8 intra_dc_precision;
+   __u8 picture_structure;
+   __u8 top_field_first;
+   __u8 frame_pred_frame_dct;
+   __u8 concealment_motion_vectors;
+   __u8 q_scale_type;
+   __u8 intra_vlc_format;
+   __u8 alternate_scan;
+
+   __u8 backward_index;
+   __u8 forward_index;
+};
+
 #endif
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 96e034d..feff200 100644
--- a/include/uapi/linux/videodev2.h
+

[RFC 10/10] sunxi-cedrus: Add device tree binding document

2016-08-25 Thread Florent Revest
Device Tree bindings for the Allwinner's video engine

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 .../devicetree/bindings/media/sunxi-cedrus.txt | 44 ++
 1 file changed, 44 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/sunxi-cedrus.txt

diff --git a/Documentation/devicetree/bindings/media/sunxi-cedrus.txt 
b/Documentation/devicetree/bindings/media/sunxi-cedrus.txt
new file mode 100644
index 000..26f2e09
--- /dev/null
+++ b/Documentation/devicetree/bindings/media/sunxi-cedrus.txt
@@ -0,0 +1,44 @@
+Device-Tree bindings for SUNXI video engine found in sunXi SoC family
+
+Required properties:
+- compatible   : "allwinner,sun5i-a13-video-engine";
+- memory-region : DMA pool for buffers allocation;
+- clocks   : list of clock specifiers, corresponding to
+ entries in clock-names property;
+- clock-names  : should contain "ahb", "mod" and "ram" entries;
+- resets   : phandle for reset;
+- interrupts   : should contain VE interrupt number;
+- reg  : should contain register base and length of VE.
+
+Example:
+
+reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   ve_reserved: cma {
+   compatible = "shared-dma-pool";
+   reg = <0x43d0 0x900>;
+   no-map;
+   linux,cma-default;
+   };
+};
+
+video-engine {
+   compatible = "allwinner,sun5i-a13-video-engine";
+   memory-region = <&ve_reserved>;
+
+   clocks = <&ahb_gates 32>, <&ccu CLK_VE>,
+<&dram_gates 0>;
+   clock-names = "ahb", "mod", "ram";
+
+   assigned-clocks = <&ccu CLK_VE>;
+   assigned-clock-rates = <320000000>;
+
+   resets = <&ccu RST_VE>;
+
+   interrupts = <53>;
+
+   reg = <0x01c0e000 4096>;
+};
-- 
2.7.4



[RFC 07/10] sunxi-cedrus: Add a MPEG 2 codec

2016-08-25 Thread Florent Revest
This patch introduces support for MPEG2 video decoding in the
sunxi-cedrus video decoder driver.

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 drivers/media/platform/sunxi-cedrus/Makefile   |   2 +-
 drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c |  26 +++-
 .../platform/sunxi-cedrus/sunxi_cedrus_common.h|   2 +
 .../media/platform/sunxi-cedrus/sunxi_cedrus_dec.c |  15 +-
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.c  |  17 ++-
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.h  |   4 +
 .../platform/sunxi-cedrus/sunxi_cedrus_mpeg2.c | 152 +
 7 files changed, 211 insertions(+), 7 deletions(-)
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_mpeg2.c

diff --git a/drivers/media/platform/sunxi-cedrus/Makefile 
b/drivers/media/platform/sunxi-cedrus/Makefile
index 14c2f7a..2d495a2 100644
--- a/drivers/media/platform/sunxi-cedrus/Makefile
+++ b/drivers/media/platform/sunxi-cedrus/Makefile
@@ -1,2 +1,2 @@
 obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi_cedrus.o sunxi_cedrus_hw.o \
-   sunxi_cedrus_dec.o
+   sunxi_cedrus_dec.o sunxi_cedrus_mpeg2.o
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
index 17af34c..d1c957a 100644
--- a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
@@ -46,14 +46,31 @@ static int sunxi_cedrus_s_ctrl(struct v4l2_ctrl *ctrl)
struct sunxi_cedrus_ctx *ctx =
container_of(ctrl->handler, struct sunxi_cedrus_ctx, hdl);
 
-   v4l2_err(&ctx->dev->v4l2_dev, "Invalid control\n");
-   return -EINVAL;
+   switch (ctrl->id) {
+   case V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR:
+   /* This is kept in memory and used directly. */
+   break;
+   default:
+   v4l2_err(&ctx->dev->v4l2_dev, "Invalid control\n");
+   return -EINVAL;
+   }
+
+   return 0;
 }
 
 static const struct v4l2_ctrl_ops sunxi_cedrus_ctrl_ops = {
.s_ctrl = sunxi_cedrus_s_ctrl,
 };
 
+static const struct v4l2_ctrl_config sunxi_cedrus_ctrl_mpeg2_frame_hdr = {
+   .ops = &sunxi_cedrus_ctrl_ops,
+   .id = V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR,
+   .type = V4L2_CTRL_TYPE_PRIVATE,
+   .name = "MPEG2 Frame Header Parameters",
+   .max_reqs = VIDEO_MAX_FRAME,
+   .elem_size = sizeof(struct v4l2_ctrl_mpeg2_frame_hdr),
+};
+
 /*
  * File operations
  */
@@ -78,6 +95,10 @@ static int sunxi_cedrus_open(struct file *file)
hdl = &ctx->hdl;
v4l2_ctrl_handler_init(hdl, 1);
 
+   ctx->mpeg2_frame_hdr_ctrl = v4l2_ctrl_new_custom(hdl,
+   &sunxi_cedrus_ctrl_mpeg2_frame_hdr, NULL);
+   ctx->mpeg2_frame_hdr_ctrl->flags |= V4L2_CTRL_FLAG_REQ_KEEP;
+
if (hdl->error) {
rc = hdl->error;
v4l2_ctrl_handler_free(hdl);
@@ -117,6 +138,7 @@ static int sunxi_cedrus_release(struct file *file)
v4l2_fh_del(&ctx->fh);
v4l2_fh_exit(&ctx->fh);
v4l2_ctrl_handler_free(&ctx->hdl);
+   ctx->mpeg2_frame_hdr_ctrl = NULL;
mutex_lock(&dev->dev_mutex);
v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
mutex_unlock(&dev->dev_mutex);
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
index 6b8d87a..e715184 100644
--- a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
@@ -70,6 +70,8 @@ struct sunxi_cedrus_ctx {
struct v4l2_ctrl_handler hdl;
 
struct vb2_buffer *dst_bufs[VIDEO_MAX_FRAME];
+
+   struct v4l2_ctrl *mpeg2_frame_hdr_ctrl;
 };
 
 static inline void sunxi_cedrus_write(struct sunxi_cedrus_dev *vpu,
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
index 71ef34b..38e8a3a 100644
--- a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
@@ -48,6 +48,11 @@ static struct sunxi_cedrus_fmt formats[] = {
.depth = 8,
.num_planes = 2,
},
+   {
+   .fourcc = V4L2_PIX_FMT_MPEG2_FRAME,
+   .types  = SUNXI_CEDRUS_OUTPUT,
+   .num_planes = 1,
+   },
 };
 
 #define NUM_FORMATS ARRAY_SIZE(formats)
@@ -120,8 +125,14 @@ void device_run(void *priv)
 V4L2_BUF_FLAG_KEYFRAME | V4L2_BUF_FLAG_PFRAME |
 V4L2_BUF_FLAG_BFRAME   | V4L2_BUF_FLAG_TSTAMP_SRC_MASK);
 
-   v4l2_m2m_buf_done(in_vb, VB2_BUF_STATE_ERROR);
-   v4l2_m2m_buf_done(out_vb, VB2_BUF_STATE_ERROR);
+   if (ctx->vpu_src_fmt->fourcc == V4L2_PIX_FMT_MPEG2_FRAME) {
+   struct v4l2_ctrl_mpeg2_frame_hdr *frame_hdr =
+ 

[RFC 00/10] Add Sunxi Cedrus Video Decoder Driver

2016-08-25 Thread Florent Revest
Hi,

This series adds a v4l2 memory2memory decoder driver for Allwinner's
VPU found in the A13 SoC. It follows the reverse engineering effort
of the Cedrus [1] project.

The VPU is able to decode a bunch of formats, but currently only MPEG2
and a subset of MPEG4 are supported by the driver; more will come
eventually.

This VPU needs frame-by-frame register programming and implements
the idea of the "Frame API" [2] proposed by Pawel Osciak (i.e. binding
standard frame headers to frames via the "Request API"). The patchset
includes new controls for both MPEG2 and MPEG4 frame headers. The
MPEG2 control should be generic enough for other drivers, but the
MPEG4 control sticks to the bare minimum needed by sunxi-cedrus.

The "Frame API" relies on the controls features of the "Request API".
[3] Since the latest Request API RFCs don't support controls and
given the time I had to work on this driver, I chose to use an older
RFC from Hans. [4] Of course, this is definitely not meant to be kept
and as soon as a newer Request API will support controls I'll stick
to the newest code base.

If you are interested in testing this driver, you can find a recently
(v4.8-rc3) rebased version of this Request API alongside my patchset in
this repository. [5] I also developed a libVA backend interfacing
with my proposal of the MPEG2 and MPEG4 Frame API. [6] It's called
"sunxi-cedrus-drv-video" but the only sunxi-cedrus-specific part is
the format conversion code. Overall it should be generic enough for
any other V4L driver using the Frame API, and once DRM plane support
for this pixel format is added it could be renamed to something more
generic.

[1] http://linux-sunxi.org/Cedrus
[2] 
https://docs.google.com/presentation/d/1RLkH3QxdmrcW_t41KllEvUmVsrHMbMOgd6CqAgzR7U4/pub?slide=id.p
[3] https://lwn.net/Articles/688585/
[4] https://lwn.net/Articles/641204/
[5] https://github.com/FlorentRevest/linux-sunxi-cedrus
[6] https://github.com/FlorentRevest/sunxi-cedrus-drv-video

Florent Revest (9):
  clk: sunxi-ng: Add a couple of A13 clocks
  v4l: Add sunxi Video Engine pixel format
  v4l: Add MPEG2 low-level decoder API control
  v4l: Add MPEG4 low-level decoder API control
  media: platform: Add Sunxi Cedrus decoder driver
  sunxi-cedrus: Add a MPEG 2 codec
  sunxi-cedrus: Add a MPEG 4 codec
  ARM: dts: sun5i: Use video-engine node
  sunxi-cedrus: Add device tree binding document

Pawel Osciak (1):
  v4l: Add private compound control type.

 .../devicetree/bindings/clock/sunxi-ccu.txt|   1 +
 .../devicetree/bindings/media/sunxi-cedrus.txt |  44 ++
 arch/arm/boot/dts/sun5i-a13.dtsi   |  42 ++
 drivers/clk/sunxi-ng/Kconfig   |  11 +
 drivers/clk/sunxi-ng/Makefile  |   1 +
 drivers/clk/sunxi-ng/ccu-sun5i-a13.c   |  80 +++
 drivers/clk/sunxi-ng/ccu-sun5i-a13.h   |  25 +
 drivers/media/platform/Kconfig |  13 +
 drivers/media/platform/Makefile|   1 +
 drivers/media/platform/sunxi-cedrus/Makefile   |   3 +
 drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c | 285 ++
 .../platform/sunxi-cedrus/sunxi_cedrus_common.h| 101 
 .../media/platform/sunxi-cedrus/sunxi_cedrus_dec.c | 588 +
 .../media/platform/sunxi-cedrus/sunxi_cedrus_dec.h |  33 ++
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.c  | 166 ++
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.h  |  39 ++
 .../platform/sunxi-cedrus/sunxi_cedrus_mpeg2.c | 152 ++
 .../platform/sunxi-cedrus/sunxi_cedrus_mpeg4.c | 140 +
 .../platform/sunxi-cedrus/sunxi_cedrus_regs.h  | 170 ++
 drivers/media/v4l2-core/v4l2-ctrls.c   |  23 +
 drivers/media/v4l2-core/v4l2-ioctl.c   |   2 +
 include/dt-bindings/clock/sun5i-a13-ccu.h  |  49 ++
 include/dt-bindings/reset/sun5i-a13-ccu.h  |  48 ++
 include/uapi/linux/v4l2-controls.h |  68 +++
 include/uapi/linux/videodev2.h |   9 +
 25 files changed, 2094 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/media/sunxi-cedrus.txt
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun5i-a13.c
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun5i-a13.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/Makefile
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_hw.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_hw.h
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_mpeg2.c
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_mpeg4.c
 crea

[RFC 03/10] v4l: Add sunxi Video Engine pixel format

2016-08-25 Thread Florent Revest
Add support for Allwinner's proprietary pixel format, described in
detail here: http://linux-sunxi.org/File:Ve_tile_format_v1.pdf

This format is similar to V4L2_PIX_FMT_NV12M but the planes are divided
into tiles of 32x32 pixels.
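
As an illustration of the layout (not part of the patch), a hedged sketch
of how a sample coordinate could be mapped to its byte offset inside one
tiled plane, assuming one byte per sample, a stride already padded to a
multiple of 32 and tiles stored in raster order:

#define TILE_W 32
#define TILE_H 32

/* byte offset of sample (x, y) inside a 32x32-tiled plane */
static unsigned long tiled_offset(unsigned int x, unsigned int y,
                                  unsigned int stride)
{
        unsigned int tiles_per_row = stride / TILE_W;
        unsigned int tile = (y / TILE_H) * tiles_per_row + (x / TILE_W);

        return (unsigned long)tile * TILE_W * TILE_H +
               (y % TILE_H) * TILE_W + (x % TILE_W);
}

For the chroma plane of this NV12-like layout, x would be a byte offset
covering the interleaved Cb/Cr samples.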

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 include/uapi/linux/videodev2.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
index 904c44c..96e034d 100644
--- a/include/uapi/linux/videodev2.h
+++ b/include/uapi/linux/videodev2.h
@@ -627,6 +627,7 @@ struct v4l2_pix_format {
 #define V4L2_PIX_FMT_Y8I  v4l2_fourcc('Y', '8', 'I', ' ') /* Greyscale 
8-bit L/R interleaved */
 #define V4L2_PIX_FMT_Y12I v4l2_fourcc('Y', '1', '2', 'I') /* Greyscale 
12-bit L/R interleaved */
 #define V4L2_PIX_FMT_Z16  v4l2_fourcc('Z', '1', '6', ' ') /* Depth data 
16-bit */
+#define V4L2_PIX_FMT_SUNXI v4l2_fourcc('S', 'X', 'I', 'Y') /* Sunxi VE's 
32x32 tiled NV12 */
 
 /* SDR formats - used only for Software Defined Radio devices */
 #define V4L2_SDR_FMT_CU8  v4l2_fourcc('C', 'U', '0', '8') /* IQ u8 */
-- 
2.7.4



[RFC 08/10] sunxi-cedrus: Add a MPEG 4 codec

2016-08-25 Thread Florent Revest
This patch introduces support for MPEG4 video decoding in the
sunxi-cedrus video decoder driver.

Signed-off-by: Florent Revest <florent.rev...@free-electrons.com>
---
 drivers/media/platform/sunxi-cedrus/Makefile   |   3 +-
 drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c |  15 +++
 .../platform/sunxi-cedrus/sunxi_cedrus_common.h|  13 ++
 .../media/platform/sunxi-cedrus/sunxi_cedrus_dec.c |  33 +
 .../media/platform/sunxi-cedrus/sunxi_cedrus_hw.h  |   3 +
 .../platform/sunxi-cedrus/sunxi_cedrus_mpeg4.c | 140 +
 6 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 drivers/media/platform/sunxi-cedrus/sunxi_cedrus_mpeg4.c

diff --git a/drivers/media/platform/sunxi-cedrus/Makefile 
b/drivers/media/platform/sunxi-cedrus/Makefile
index 2d495a2..823d611 100644
--- a/drivers/media/platform/sunxi-cedrus/Makefile
+++ b/drivers/media/platform/sunxi-cedrus/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_VIDEO_SUNXI_CEDRUS) += sunxi_cedrus.o sunxi_cedrus_hw.o \
-   sunxi_cedrus_dec.o sunxi_cedrus_mpeg2.o
+   sunxi_cedrus_dec.o sunxi_cedrus_mpeg2.o \
+   sunxi_cedrus_mpeg4.o
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
index d1c957a..3001440 100644
--- a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus.c
@@ -47,6 +47,7 @@ static int sunxi_cedrus_s_ctrl(struct v4l2_ctrl *ctrl)
container_of(ctrl->handler, struct sunxi_cedrus_ctx, hdl);
 
switch (ctrl->id) {
+   case V4L2_CID_MPEG_VIDEO_MPEG4_FRAME_HDR:
case V4L2_CID_MPEG_VIDEO_MPEG2_FRAME_HDR:
/* This is kept in memory and used directly. */
break;
@@ -71,6 +72,15 @@ static const struct v4l2_ctrl_config 
sunxi_cedrus_ctrl_mpeg2_frame_hdr = {
.elem_size = sizeof(struct v4l2_ctrl_mpeg2_frame_hdr),
 };
 
+static const struct v4l2_ctrl_config sunxi_cedrus_ctrl_mpeg4_frame_hdr = {
+   .ops = &sunxi_cedrus_ctrl_ops,
+   .id = V4L2_CID_MPEG_VIDEO_MPEG4_FRAME_HDR,
+   .type = V4L2_CTRL_TYPE_PRIVATE,
+   .name = "MPEG4 Frame Header Parameters",
+   .max_reqs = VIDEO_MAX_FRAME,
+   .elem_size = sizeof(struct v4l2_ctrl_mpeg4_frame_hdr),
+};
+
 /*
  * File operations
  */
@@ -99,6 +109,10 @@ static int sunxi_cedrus_open(struct file *file)
&sunxi_cedrus_ctrl_mpeg2_frame_hdr, NULL);
ctx->mpeg2_frame_hdr_ctrl->flags |= V4L2_CTRL_FLAG_REQ_KEEP;
 
+   ctx->mpeg4_frame_hdr_ctrl = v4l2_ctrl_new_custom(hdl,
+   &sunxi_cedrus_ctrl_mpeg4_frame_hdr, NULL);
+   ctx->mpeg4_frame_hdr_ctrl->flags |= V4L2_CTRL_FLAG_REQ_KEEP;
+
if (hdl->error) {
rc = hdl->error;
v4l2_ctrl_handler_free(hdl);
@@ -139,6 +153,7 @@ static int sunxi_cedrus_release(struct file *file)
v4l2_fh_exit(&ctx->fh);
v4l2_ctrl_handler_free(&ctx->hdl);
ctx->mpeg2_frame_hdr_ctrl = NULL;
+   ctx->mpeg4_frame_hdr_ctrl = NULL;
mutex_lock(&dev->dev_mutex);
v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);
mutex_unlock(&dev->dev_mutex);
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
index e715184..33fa891 100644
--- a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_common.h
@@ -49,6 +49,18 @@ struct sunxi_cedrus_dev {
struct reset_control *rstc;
 
char *base;
+
+   unsigned int mbh_buf;
+   unsigned int dcac_buf;
+   unsigned int ncf_buf;
+
+   void *mbh_buf_virt;
+   void *dcac_buf_virt;
+   void *ncf_buf_virt;
+
+   unsigned int mbh_buf_size;
+   unsigned int dcac_buf_size;
+   unsigned int ncf_buf_size;
 };
 
 struct sunxi_cedrus_fmt {
@@ -72,6 +84,7 @@ struct sunxi_cedrus_ctx {
struct vb2_buffer *dst_bufs[VIDEO_MAX_FRAME];
 
struct v4l2_ctrl *mpeg2_frame_hdr_ctrl;
+   struct v4l2_ctrl *mpeg4_frame_hdr_ctrl;
 };
 
 static inline void sunxi_cedrus_write(struct sunxi_cedrus_dev *vpu,
diff --git a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c 
b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
index 38e8a3a..8ce635d 100644
--- a/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
+++ b/drivers/media/platform/sunxi-cedrus/sunxi_cedrus_dec.c
@@ -53,6 +53,11 @@ static struct sunxi_cedrus_fmt formats[] = {
.types  = SUNXI_CEDRUS_OUTPUT,
.num_planes = 1,
},
+   {
+   .fourcc = V4L2_PIX_FMT_MPEG4_FRAME,
+   .types  = SUNXI_CEDRUS_OUTPUT,
+   .num_planes = 1,
+   },
 };
 
 #define NUM_FORMATS ARRAY_SIZE(formats)
@@ -129,6 +134,10 @@ void device_run(void *priv)
struct v4l2_ctrl_

Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

2017-09-26 Thread Florent Revest
On Thu, 2017-08-31 at 11:23 +0200, Christoffer Dall wrote:
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index 2ea21da..1d2d3df 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> > @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm
> > *kvm,
> > phys_addr_t size = PAGE_SIZE * memslot->npages;
> > hva_t reg_end = hva + size;
> > 
> > +   if (unlikely(!kvm->mm)) {
> I think you should consider using a predicate so that it's clear that
> this is for in-kernel VMs and not just some random situation where mm
> can be NULL.

Internal VMs should be the only case where kvm->mm is NULL. However,
if you'd prefer it otherwise, I'll make sure this condition is made
clearer, for instance with a small predicate like the sketch below.
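
A minimal sketch, assuming kvm->mm == NULL remains the defining property
of internal VMs (the helper name is hypothetical):

static inline bool kvm_is_internal_vm(struct kvm *kvm)
{
        return unlikely(!kvm->mm);
}

stage2_unmap_memslot() and friends would then test
kvm_is_internal_vm(kvm) instead of checking kvm->mm directly.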

> So it's unclear to me why we don't need any special casing in
> kvm_handle_guest_abort, related to MMIO exits etc.  You probably
> assume that we will never do emulation, but that should be described
> and addressed somewhere before I can critically review this patch.

This is indeed what I was assuming. This RFC does not allow MMIO with
internal VMs. I cannot think of a use case where this would be useful.
I'll make sure this is documented in an eventual later RFC.

> > +static int internal_vm_prep_mem(struct kvm *kvm,
> > +   const struct
> > kvm_userspace_memory_region *mem)
> > +{
> > +   phys_addr_t addr, end;
> > +   unsigned long pfn;
> > +   int ret;
> > +   struct kvm_mmu_memory_cache cache = { 0 };
> > +
> > +   end = mem->guest_phys_addr + mem->memory_size;
> > +   pfn = __phys_to_pfn(mem->guest_phys_addr);
> > +   addr = mem->guest_phys_addr;
> My main concern here is that we don't do any checks on this region
> and we could be mapping device memory here as well.  Are we intending
> that to be ok, and are we then relying on the guest to use proper
> memory attributes ?

Indeed, being able to map device memory is intended. It is needed for
Runtime Services sandboxing. It also relies on the guest being
correctly configured.

> > +
> > +   for (; addr < end; addr += PAGE_SIZE) {
> > +   pte_t pte = pfn_pte(pfn, PAGE_S2);
> > +
> > +   pte = kvm_s2pte_mkwrite(pte);
> > +
> > +   ret = mmu_topup_memory_cache(&cache,
> > +KVM_MMU_CACHE_MIN_PAGES,
> > +KVM_NR_MEM_OBJS);
> You should be able to allocate all you need up front instead of doing
> it in sequences.

Ok.

> > 
> > +   if (ret) {
> > +   mmu_free_memory_cache(&cache);
> > +   return ret;
> > +   }
> > +   spin_lock(&kvm->mmu_lock);
> > +   ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> > +   spin_unlock(&kvm->mmu_lock);
> Since you're likely to allocate some large contiguous chunks here,
> can you have a look at using section mappings?

Will do.

Thank you very much,
    Florent



Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

2017-09-26 Thread Florent Revest
On Thu, 2017-08-31 at 11:26 +0200, Christoffer Dall wrote:
> I wonder if this should be split into two series; one that sets up
> anything you may need from KVM, and another one that uses that for
> UEFI.
> 
> There's a lot KVM and UEFI intertwined logic and assumptions in patch
> 10, which makes this series a bit hard to read.

The way hypercalls are currently handled in handle_hvc required this
mixed patch. Would some kind of HVC subscription mechanism be suitable
to have in KVM (e.g. a function to register a callback for a given HVC
function ID)? This would allow patch 10/11 to leave the KVM code
intact; a rough sketch of the idea follows.
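
Names and signatures below are hypothetical, this is not an existing
KVM API, just a sketch of the subscription idea:

typedef int (*kvm_hvc_handler_fn)(struct kvm_vcpu *vcpu);

/* register/unregister a handler for one SMCCC function ID */
int kvm_register_hvc_handler(u32 fn_id, kvm_hvc_handler_fn handler);
void kvm_unregister_hvc_handler(u32 fn_id);

/* handle_hvc() would then become a simple lookup: */
static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
        kvm_hvc_handler_fn fn;

        fn = kvm_lookup_hvc_handler(vcpu_get_reg(vcpu, 0));
        if (fn)
                return fn(vcpu);

        vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);
        return 1;
}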

> I'd like some documentation (in the series and in
> Documentation/virtual/kvm) of how this works, and which hidden
> assumptions there are. For example, how do you ensure you never
> attempt to return to userspace?

I don't think my code ensured this. I'd need to give it a second look.

>  How many VCPUs do you support?

You can create as many VCPUs as you would in a "normal" VM. Also, each
VCPU can be run in a kthread.

>  Do you support any form of virtual interrupts? How about timers?

No support for virtual interrupts or timers indeed. The EFI Runtime
Services sandboxing wouldn't require that.

> Can a VM access physical devices?

The very idea of Runtime Services sandboxing requires Internal VMs to
have access to some of the physical devices.

>  How do you debug and trace something like this? Can the VM be
> monitored from userspace?

There is nothing ready for that.

> These feel like fundamental questions to me that needs addressing
> before I can competently review the code.
> 
> I think a slightly more concrete motivation and outlining the example
> of the broken UEFI on Seattle would help paving the way for these
> patches.

As far as I can remember, EFI Runtime Services on this platform have
already been reported to sometimes disable or enable interrupts. Maybe
someone at ARM has more details about the problem ?

Thanks a lot for your review,
    Florent


Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

2017-09-26 Thread Florent Revest
On Fri, 2017-09-22 at 14:44 -0700, Ard Biesheuvel wrote:
> From the EFI side, there are some minor concerns on my part regarding
> the calling convention, and the fact that we can no longer invoke
> runtime services from a kernel running at EL1, but those all seem
> fixable. I will respond to the patches in question in greater detail
> at a later time.

Indeed, this RFC currently breaks EFI Runtime Services at EL1. This
would need to be fixed in a new patchset.

The patch 10/11 also underlines that the current argument passing
method does not respect alignment. The way arguments are currently
pushed and pulled makes it quite hard to fix the issue. Any suggestion
would be welcome.

> In the mean time, Christoffer has raised a number for valid concerns,
> and those need to be addressed first before it makes sense to talk
> about EFI specifics. I hope you will find more time to invest in
> this: I would really love to have this feature upstream.

Unfortunately, I'm no longer working at ARM and my other projects keep
me very busy. I would also love to invest more time in this patchset to
have it upstream but I'm really unsure when I will be able to find the
time for this.

Best,
    Florent


Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

2017-08-25 Thread Florent Revest
Hi,

I just realised that my email client was not configured correctly; the
confidential disclaimer at the bottom of my emails obviously doesn't
apply. Sorry about that.

Florent



[RFC 05/11] KVM: Expose VM/VCPU creation functions

2017-08-25 Thread Florent Revest
Now that KVM is capable of creating internal virtual machines, the rest of
the kernel needs an API to access this capability.

This patch exposes two functions for VMs and VCPUs creation in kvm_host.h:
 - kvm_create_internal_vm: ensures that kvm->mm is kept NULL at VM creation
 - kvm_vm_create_vcpu: simple alias of kvm_vm_ioctl_create_vcpu for clarity
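
For illustration, a hedged sketch of how another kernel subsystem could
consume these two functions (error handling trimmed; the return
convention of kvm_vm_create_vcpu is assumed to follow
kvm_vm_ioctl_create_vcpu, i.e. negative on error):

#include <linux/err.h>
#include <linux/kvm_host.h>

static struct kvm *sandbox_vm;

static int sandbox_init(void)
{
        int ret;

        sandbox_vm = kvm_create_internal_vm(0); /* type 0, no userspace mm */
        if (IS_ERR(sandbox_vm))
                return PTR_ERR(sandbox_vm);

        ret = kvm_vm_create_vcpu(sandbox_vm, 0);        /* VCPU id 0 */
        if (ret < 0) {
                kvm_put_kvm(sandbox_vm);
                return ret;
        }

        return 0;
}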

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 include/linux/kvm_host.h |  3 +++
 virt/kvm/kvm_main.c  | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 21a6fd6..dd10d3b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -565,6 +565,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
  struct module *module);
 void kvm_exit(void);

+struct kvm *kvm_create_internal_vm(unsigned long type);
+int kvm_vm_create_vcpu(struct kvm *kvm, u32 id);
+
 void kvm_get_kvm(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e7af1a..c1c8bb6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -733,6 +733,11 @@ static struct kvm *kvm_create_vm(unsigned long type, 
struct mm_struct *mm)
return ERR_PTR(r);
 }

+struct kvm *kvm_create_internal_vm(unsigned long type)
+{
+   return kvm_create_vm(type, NULL);
+}
+
 static void kvm_destroy_devices(struct kvm *kvm)
 {
struct kvm_device *dev, *tmp;
@@ -2549,6 +2554,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 
id)
return r;
 }

+int kvm_vm_create_vcpu(struct kvm *kvm, u32 id)
+{
+   return kvm_vm_ioctl_create_vcpu(kvm, id);
+}
+
 static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 {
if (sigset) {
--
1.9.1



[RFC 09/11] EFI, arm, arm64: Enable EFI Runtime Services later

2017-08-25 Thread Florent Revest
EFI Runtime Services on ARM are enabled very early in the boot process
although they aren't used until substantially later. This patch modifies
the efi initialization sequence on ARM to enable runtime services just
before they are effectively needed (in a subsys target instead of early).

The reason behind this change is that eventually, a late Runtime Services
initialization could take advantage of KVM's internal virtual machines to
sandbox firmware code execution. Since KVM's core is only available
starting from the subsys target, this reordering would be compulsory.

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm/include/asm/efi.h | 2 ++
 arch/arm64/include/asm/efi.h   | 2 ++
 arch/x86/include/asm/efi.h | 2 ++
 drivers/firmware/efi/arm-runtime.c | 3 +--
 drivers/firmware/efi/efi.c | 3 +++
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 17f1f1a..ed575ae 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -35,6 +35,8 @@
__f(args);  \
 })

+int efi_arch_late_enable_runtime_services(void);
+
 #define ARCH_EFI_IRQ_FLAGS_MASK \
(PSR_J_BIT | PSR_E_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | \
 PSR_T_BIT | MODE_MASK)
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 8f3043a..373d94d 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -37,6 +37,8 @@
kernel_neon_end();  \
 })

+int efi_arch_late_enable_runtime_services(void);
+
 #define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

 /* arch specific definitions used by the stub code */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 796ff6c..869efbb 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -233,6 +233,8 @@ static inline bool efi_is_64bit(void)

 extern bool efi_reboot_required(void);

+static inline int efi_arch_late_enable_runtime_services(void) { return 0; }
+
 #else
 static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
 static inline bool efi_reboot_required(void)
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3..d94d240 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -115,7 +115,7 @@ static bool __init efi_virtmap_init(void)
  * non-early mapping of the UEFI system table and virtual mappings for all
  * EFI_MEMORY_RUNTIME regions.
  */
-static int __init arm_enable_runtime_services(void)
+int __init efi_arch_late_enable_runtime_services(void)
 {
u64 mapsize;

@@ -154,7 +154,6 @@ static int __init arm_enable_runtime_services(void)

return 0;
 }
-early_initcall(arm_enable_runtime_services);

 void efi_virtmap_load(void)
 {
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 045d6d3..2b447b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -33,6 +33,7 @@
 #include 

 #include 
+#include <asm/efi.h>

 struct efi __read_mostly efi = {
.mps= EFI_INVALID_TABLE_ADDR,
@@ -304,6 +305,8 @@ static int __init efisubsys_init(void)
 {
int error;

+   efi_arch_late_enable_runtime_services();
+
if (!efi_enabled(EFI_BOOT))
return 0;

--
1.9.1



[RFC 02/11] KVM: arm64: Return an Unknown ID on unhandled HVC

2017-08-25 Thread Florent Revest
So far, when the KVM hypervisor received an HVC from a guest, it only
routed the hypercall to the PSCI calls handler. If the function ID of the
hypercall was not supported by the PSCI code, a PSCI_RET_NOT_SUPPORTED
error code was returned in x0.

This patch introduces a kvm_psci_is_call() check which is performed before
entering the PSCI calls handling code. The HVC is now only routed to the
PSCI code if its function ID is in the ranges of PSCI functions defined by
SMCCC (0x84000000-0x8400001f and 0xc4000000-0xc400001f).

If the function ID is not in those ranges, an Unknown Function Identifier
is returned in x0. This implements the behavior defined by SMCCC and paves
the way for other hvc handlers.
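
For reference, the check described above boils down to something like
this sketch, written with the PSCI_0_2_FN*_END limits this patch adds to
include/uapi/linux/psci.h:

static bool fn_id_is_psci(unsigned long fn)
{
        return (fn >= PSCI_0_2_FN_BASE && fn <= PSCI_0_2_FN_END) ||
               (fn >= PSCI_0_2_FN64_BASE && fn <= PSCI_0_2_FN64_END);
}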

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm/include/asm/kvm_psci.h   |  1 +
 arch/arm64/include/asm/kvm_psci.h |  1 +
 arch/arm64/kvm/handle_exit.c  | 24 ++--
 include/uapi/linux/psci.h |  2 ++
 virt/kvm/arm/psci.c   | 21 +
 5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_psci.h b/arch/arm/include/asm/kvm_psci.h
index 6bda945..8dcd642 100644
--- a/arch/arm/include/asm/kvm_psci.h
+++ b/arch/arm/include/asm/kvm_psci.h
@@ -22,6 +22,7 @@
 #define KVM_ARM_PSCI_0_2   2

 int kvm_psci_version(struct kvm_vcpu *vcpu);
+bool kvm_psci_is_call(struct kvm_vcpu *vcpu);
 int kvm_psci_call(struct kvm_vcpu *vcpu);

 #endif /* __ARM_KVM_PSCI_H__ */
diff --git a/arch/arm64/include/asm/kvm_psci.h 
b/arch/arm64/include/asm/kvm_psci.h
index bc39e55..1a28809 100644
--- a/arch/arm64/include/asm/kvm_psci.h
+++ b/arch/arm64/include/asm/kvm_psci.h
@@ -22,6 +22,7 @@
 #define KVM_ARM_PSCI_0_2   2

 int kvm_psci_version(struct kvm_vcpu *vcpu);
+bool kvm_psci_is_call(struct kvm_vcpu *vcpu);
 int kvm_psci_call(struct kvm_vcpu *vcpu);

 #endif /* __ARM64_KVM_PSCI_H__ */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..bc7ade5 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -21,6 +21,7 @@

 #include 
 #include 
+#include <linux/smccc_fn.h>

 #include 
 #include 
@@ -34,19 +35,30 @@

 typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);

+/*
+ * handle_hvc - handle a guest hypercall
+ *
+ * @vcpu:  the vcpu pointer
+ * @run:   access to the kvm_run structure for results
+ *
+ * Route a given hypercall to its right HVC handler thanks to its function ID.
+ * If no corresponding handler is found, write an Unknown ID in x0 (cf. SMCCC).
+ *
+ * This function returns: > 0 (success), 0 (success but exit to user
+ * space), and < 0 (errors)
+ */
 static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-   int ret;
+   int ret = 1;

trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;

-   ret = kvm_psci_call(vcpu);
-   if (ret < 0) {
-   kvm_inject_undefined(vcpu);
-   return 1;
-   }
+   if (kvm_psci_is_call(vcpu))
+   ret = kvm_psci_call(vcpu);
+   else
+   vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);

return ret;
 }
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 3d7a0fc..79704fe 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -24,10 +24,12 @@
 /* PSCI v0.2 interface */
 #define PSCI_0_2_FN_BASE   0x84000000
 #define PSCI_0_2_FN(n) (PSCI_0_2_FN_BASE + (n))
+#define PSCI_0_2_FN_END    PSCI_0_2_FN(0x1F)
 #define PSCI_0_2_64BIT 0x40000000
 #define PSCI_0_2_FN64_BASE \
(PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
 #define PSCI_0_2_FN64(n)   (PSCI_0_2_FN64_BASE + (n))
+#define PSCI_0_2_FN64_END  PSCI_0_2_FN64(0x1F)

 #define PSCI_0_2_FN_PSCI_VERSION   PSCI_0_2_FN(0)
 #define PSCI_0_2_FN_CPU_SUSPENDPSCI_0_2_FN(1)
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index f1e363b..9602894 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -332,3 +332,24 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
return -EINVAL;
};
 }
+
+/**
+ * kvm_psci_is_call - checks if a HVC function ID is in a PSCI range
+ * @vcpu: Pointer to the VCPU struct
+ *
+ * When a hypercall is received from a guest. The SMCCC defines a function ID
+ * as a value to be put in x0 to identify the destination of the call. The same
+ * document defines ranges of function IDs to be used by PSCI. This function
+ * checks whether a given vcpu is requesting a PSCI related handler.
+ *
+ * This function returns:
+ *  - true if this HVC should be handled by kvm_psci_call
+ *  - false if it shouldn't
+ */
+inline bool kvm_psci_is_call(str

[RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

2017-08-25 Thread Florent Revest
Hi,

This series implements a mechanism to sandbox EFI Runtime Services on arm64.
It can be enabled with CONFIG_EFI_SANDBOX. At boot it spawns an internal KVM
virtual machine that is run every time an EFI Runtime Service is called. This
limits the possible security and stability impact of the EFI runtime on the
kernel.

The patch set is split as follow:
 - Patches 1 and 2: Give more control over HVC handling to KVM
 - Patches 3 to 6: Introduce the concept of KVM "internal VMs"
 - Patches 7 to 9: Reorder KVM and EFI initialization on ARM
 - Patch 10: Introduces the EFI sandboxing VM and wrappers
 - Patch 11: Workarounds some EFI Runtime Services relying on EL3

The sandboxing has been tested to work reliably (rtc and efivars) on a
SoftIron OverDrive 1000 box and on an ARMv8.3 model with VHE enabled. Normal
userspace KVM instances have also been tested to still work correctly.

These patches apply cleanly on Linus' v4.13-rc6 tag and have no other
dependencies.

Florent Revest (11):
  arm64: Add an SMCCC function IDs header
  KVM: arm64: Return an Unknown ID on unhandled HVC
  KVM: Allow VM lifecycle management without userspace
  KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
  KVM: Expose VM/VCPU creation functions
  KVM, arm64: Expose a VCPU initialization function
  KVM: Allow initialization before the module target
  KVM, arm, arm64: Initialize KVM's core earlier
  EFI, arm, arm64: Enable EFI Runtime Services later
  efi, arm64: Sandbox Runtime Services in a VM
  KVM, arm64: Don't trap internal VMs SMC calls

 arch/arm/include/asm/efi.h |   7 +
 arch/arm/include/asm/kvm_coproc.h  |   3 +
 arch/arm/include/asm/kvm_host.h|   1 +
 arch/arm/include/asm/kvm_psci.h|   1 +
 arch/arm/kvm/coproc.c  |   6 +
 arch/arm/kvm/coproc_a15.c  |   3 +-
 arch/arm/kvm/coproc_a7.c   |   3 +-
 arch/arm64/include/asm/efi.h   |  71 
 arch/arm64/include/asm/kvm_emulate.h   |   3 +
 arch/arm64/include/asm/kvm_host.h  |   4 +
 arch/arm64/include/asm/kvm_psci.h  |   1 +
 arch/arm64/kernel/asm-offsets.c|   3 +
 arch/arm64/kvm/handle_exit.c   |  27 +-
 arch/arm64/kvm/sys_regs_generic_v8.c   |   8 +-
 arch/x86/include/asm/efi.h |   2 +
 drivers/firmware/efi/Kconfig   |  10 +
 drivers/firmware/efi/Makefile  |   1 +
 drivers/firmware/efi/arm-runtime.c |   5 +-
 drivers/firmware/efi/arm-sandbox-payload.S |  96 +
 drivers/firmware/efi/arm-sandbox.c | 569 +
 drivers/firmware/efi/efi.c |   3 +
 include/linux/kvm_host.h   |   4 +
 include/linux/smccc_fn.h   |  53 +++
 include/uapi/linux/psci.h  |   2 +
 virt/kvm/arm/arm.c |  18 +-
 virt/kvm/arm/mmu.c |  76 +++-
 virt/kvm/arm/psci.c|  21 ++
 virt/kvm/kvm_main.c| 102 --
 28 files changed, 1050 insertions(+), 53 deletions(-)
 create mode 100644 drivers/firmware/efi/arm-sandbox-payload.S
 create mode 100644 drivers/firmware/efi/arm-sandbox.c
 create mode 100644 include/linux/smccc_fn.h

--
1.9.1



[RFC 06/11] KVM, arm64: Expose a VCPU initialization function

2017-08-25 Thread Florent Revest
KVM's core now offers internal virtual machine capabilities; however, on ARM
the KVM_ARM_VCPU_INIT ioctl also has to be used to initialize a virtual CPU.

This patch exposes a kvm_arm_vcpu_init() function to the rest of the kernel
on arm64 so that it can be used for arm64 internal VM initialization.

This function used to be named kvm_arch_vcpu_ioctl_vcpu_init(), but the
"ioctl" part of the name wasn't consistent with the rest of the KVM ARM
ioctl handlers and isn't relevant to internal VM usage. The function has
therefore been renamed to avoid being misleading.
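
For illustration only, an internal VM user could initialize a freshly created
VCPU along these lines (the target choice and the error handling are
hypothetical and not part of this patch):

	struct kvm_vcpu_init init = {
		.target = KVM_ARM_TARGET_GENERIC_V8,	/* illustrative target */
	};
	int ret;

	ret = kvm_arm_vcpu_init(vcpu, &init);
	if (ret)
		pr_err("internal vcpu init failed: %d\n", ret);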

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 2 ++
 virt/kvm/arm/arm.c| 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 65aab35..07b7460 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -372,6 +372,8 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
+
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..aa29a5d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -888,8 +888,7 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 }


-static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
-struct kvm_vcpu_init *init)
+int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 {
int ret;

@@ -973,7 +972,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (copy_from_user(, argp, sizeof(init)))
return -EFAULT;

-   return kvm_arch_vcpu_ioctl_vcpu_init(vcpu, );
+   return kvm_arm_vcpu_init(vcpu, );
}
case KVM_SET_ONE_REG:
case KVM_GET_ONE_REG: {
--
1.9.1



[RFC 07/11] KVM: Allow initialization before the module target

2017-08-25 Thread Florent Revest
The kvm_init function has been designed to be executed during the
module_init target. It requires a struct module pointer to be used as
the owner of the /dev/* files and also tries to register /dev/kvm with a
function (misc_register) that can only be used late in the boot process.

This patch modifies kvm_init to execute this late initialization code
conditionally, only in the context of a module_init. It also offers a
kvm_set_module function to be used for /dev/kvm registration and device
file ownership once the module target is reached.

On its own, this patch does not change any behavior. However, it could be used by
certain architectures to initialize the core of kvm earlier in the boot
(e.g: in a subsys_initcall) and then initialize the userspace facing files
in a module_init target. This can be useful to create internal VMs before
being able to offer the userspace APIs.
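
As an illustration of the intended usage (the initcall names below are made up
for this sketch and are not part of the patch), an architecture could
initialize KVM's core early with a NULL module and register /dev/kvm later:

	static int __init arch_kvm_core_init(void)
	{
		/* NULL module: skip /dev/kvm registration for now */
		return kvm_init(NULL, sizeof(struct kvm_vcpu),
				__alignof__(struct kvm_vcpu), NULL);
	}
	subsys_initcall(arch_kvm_core_init);

	static int __init arch_kvm_module_init(void)
	{
		/* misc_register() is usable now: expose /dev/kvm */
		return kvm_set_module(THIS_MODULE);
	}
	module_init(arch_kvm_module_init);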

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 28 
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dd10d3b..15a0a8d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -563,6 +563,7 @@ static inline void kvm_irqfd_exit(void)
 #endif
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module);
+int kvm_set_module(struct module *module);
 void kvm_exit(void);

 struct kvm *kvm_create_internal_vm(unsigned long type);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c1c8bb6..3c9cb00 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4086,14 +4086,10 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
if (r)
goto out_free;

-   kvm_chardev_ops.owner = module;
-   kvm_vm_fops.owner = module;
-   kvm_vcpu_fops.owner = module;
-
-   r = misc_register(_dev);
-   if (r) {
-   pr_err("kvm: misc device register failed\n");
-   goto out_unreg;
+   if (module) {
+   r = kvm_set_module(module);
+   if (r)
+   goto out_unreg;
}

register_syscore_ops(_syscore_ops);
@@ -4136,6 +4132,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
 }
 EXPORT_SYMBOL_GPL(kvm_init);

+int kvm_set_module(struct module *module)
+{
+   int r;
+
+   kvm_chardev_ops.owner = module;
+   kvm_vm_fops.owner = module;
+   kvm_vcpu_fops.owner = module;
+
+   r = misc_register(_dev);
+   if (r)
+   pr_err("kvm: misc device register failed\n");
+
+   return r;
+}
+EXPORT_SYMBOL_GPL(kvm_set_module);
+
 void kvm_exit(void)
 {
debugfs_remove_recursive(kvm_debugfs_dir);
--
1.9.1



[RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

2017-08-25 Thread Florent Revest
Usual KVM virtual machines map a guest's physical addresses from a process's
userspace memory. However, with the new concept of internal VMs, a virtual
machine can be created from the kernel, without any link to a userspace
context. Hence, some of KVM's architecture-specific code needs to be
modified to take this kind of VM into account.

The approach chosen with this patch is to let internal VMs idmap physical
addresses into intermediate physical addresses (IPAs) by calling
kvm_set_memory_region with a kvm_userspace_memory_region where the
guest_phys_addr field points both to the original PAs and to the IPAs. The
userspace_addr field of this struct is therefore ignored with internal VMs.

This patch extends the capabilities of the arm and arm64 stage2 MMU code
to handle internal VMs. Three things are changed:

- Various parts of the MMU code which are related to a userspace context
are now only executed if kvm->mm is present.

- When this pointer is NULL, struct kvm_userspace_memory_regions are
treated by internal_vm_prep_mem as idmaps of physical memory.

- A set of 256 additional private memslots is now reserved on arm64 for
internal VM memory idmapping.

Note: this patch should have pretty much no performance impact on the
critical path of traditional VMs since only one unlikely branch had to be
added to the page fault handler.
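
For illustration, registering an idmapped region for an internal VM could look
like this (phys_base and size are placeholders, not values used by this patch):

	struct kvm_userspace_memory_region region = {
		.slot            = 0,
		.flags           = 0,
		.guest_phys_addr = phys_base,	/* IPA == PA */
		.memory_size     = size,
		.userspace_addr  = 0,		/* ignored for internal VMs */
	};

	ret = kvm_set_memory_region(kvm, &region);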

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 virt/kvm/arm/mmu.c| 76 +--
 2 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index d686300..65aab35 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED

 #define KVM_USER_MEM_SLOTS 512
+#define KVM_PRIVATE_MEM_SLOTS 256
 #define KVM_HALT_POLL_NS_DEFAULT 500000

 #include 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2ea21da..1d2d3df 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
phys_addr_t size = PAGE_SIZE * memslot->npages;
hva_t reg_end = hva + size;

+   if (unlikely(!kvm->mm)) {
+   unmap_stage2_range(kvm, addr, size);
+   return;
+   }
+
/*
 * A memory region could potentially cover multiple VMAs, and any holes
 * between them, so iterate over all of them to find out if we should
@@ -819,7 +824,8 @@ void stage2_unmap_vm(struct kvm *kvm)
int idx;

idx = srcu_read_lock(>srcu);
-   down_read(>mm->mmap_sem);
+   if (likely(kvm->mm))
+   down_read(>mm->mmap_sem);
spin_lock(>mmu_lock);

slots = kvm_memslots(kvm);
@@ -827,7 +833,8 @@ void stage2_unmap_vm(struct kvm *kvm)
stage2_unmap_memslot(kvm, memslot);

spin_unlock(>mmu_lock);
-   up_read(>mm->mmap_sem);
+   if (likely(kvm->mm))
+   up_read(>mm->mmap_sem);
srcu_read_unlock(>srcu, idx);
 }

@@ -1303,6 +1310,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
return -EFAULT;
}

+   if (unlikely(!kvm->mm)) {
+   kvm_err("Unexpected internal VM page fault\n");
+   kvm_inject_vabt(vcpu);
+   return 0;
+   }
+
/* Let's check if we will get back a huge page backed by hugetlbfs */
down_read(>mm->mmap_sem);
vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1850,6 +1863,54 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
kvm_mmu_wp_memory_region(kvm, mem->slot);
 }

+/*
+ * internal_vm_prep_mem - maps a range of hpa to gpa at stage2
+ *
+ * While userspace VMs manage gpas using hvas, internal virtual machines need a
+ * way to map physical addresses to a guest. In order to avoid code 
duplication,
+ * the kvm_set_memory_region call is kept for internal VMs, however it usually
+ * expects a struct kvm_userspace_memory_region with a userspace_addr field.
+ * With internal VMs, this field is ignored and the physical memory pointed to
+ * by guest_phys_addr can only be idmapped.
+ */
+static int internal_vm_prep_mem(struct kvm *kvm,
+   const struct kvm_userspace_memory_region *mem)
+{
+   phys_addr_t addr, end;
+   unsigned long pfn;
+   int ret;
+   struct kvm_mmu_memory_cache cache = { 0 };
+
+   end = mem->guest_phys_addr + mem->memory_size;
+   pfn = __phys_to_pfn(mem->guest_phys_addr);
+   addr = mem->guest_phys_addr;
+
+   for (; addr < end; addr += PAGE_SIZE) {
+   pte_t pte = pfn_pte(pfn, PAGE_S2);
+
+   pte = kvm_s2pte_mkwrite(pte);
+
+   re

[RFC 03/11] KVM: Allow VM lifecycle management without userspace

2017-08-25 Thread Florent Revest
The current codebase of KVM makes many assumptions regarding the origin of
the virtual machine being executed or configured. Indeed, the KVM API
implementation has been written with userspace usage in mind and lots of
userspace-specific code is used (namely preempt_notifiers, eventfd, mmu
notifiers, current->mm, ...).

The aim of this patch is to make the KVM API (create_vm, create_vcpu, etc.)
usable from a kernel context. A simple trick is used to distinguish
userspace VMs (coming from QEMU or LKVM...) from internal VMs (coming
from other subsystems, for example for sandboxing purposes):
  - When a VM is created from an ioctl, kvm->mm is set to current->mm
  - When a VM is created from the kernel, kvm->mm must be set to NULL

This ensures that no userspace program can create internal VMs and makes it
easy to check whether a given VM is attached to a process or is internal.

This patch simply encloses the userspace-specific pieces of code of
kvm_main in conditions checking whether kvm->mm is present and modifies the
prototype of kvm_create_vm to allow a NULL mm.
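
In short, after this patch the two creation paths differ only in the mm they
pass down (a sketch of the idea, not a literal excerpt of the resulting code):

	/* userspace VM, from the KVM_CREATE_VM ioctl */
	kvm = kvm_create_vm(type, current->mm);

	/* internal VM, created from kernel code (see a later patch) */
	kvm = kvm_create_vm(type, NULL);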

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 virt/kvm/kvm_main.c | 64 ++---
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 15252d7..2e7af1a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -154,7 +154,8 @@ int vcpu_load(struct kvm_vcpu *vcpu)
if (mutex_lock_killable(>mutex))
return -EINTR;
cpu = get_cpu();
-   preempt_notifier_register(>preempt_notifier);
+   if (vcpu->kvm->mm)
+   preempt_notifier_register(>preempt_notifier);
kvm_arch_vcpu_load(vcpu, cpu);
put_cpu();
return 0;
@@ -165,7 +166,8 @@ void vcpu_put(struct kvm_vcpu *vcpu)
 {
preempt_disable();
kvm_arch_vcpu_put(vcpu);
-   preempt_notifier_unregister(>preempt_notifier);
+   if (vcpu->kvm->mm)
+   preempt_notifier_unregister(>preempt_notifier);
preempt_enable();
mutex_unlock(>mutex);
 }
@@ -640,7 +642,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
return 0;
 }

-static struct kvm *kvm_create_vm(unsigned long type)
+static struct kvm *kvm_create_vm(unsigned long type, struct mm_struct *mm)
 {
int r, i;
struct kvm *kvm = kvm_arch_alloc_vm();
@@ -649,9 +651,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
return ERR_PTR(-ENOMEM);

spin_lock_init(>mmu_lock);
-   mmgrab(current->mm);
-   kvm->mm = current->mm;
-   kvm_eventfd_init(kvm);
+   kvm->mm = mm;
+   if (mm) {
+   mmgrab(current->mm);
+   kvm_eventfd_init(kvm);
+   }
mutex_init(>lock);
mutex_init(>irq_lock);
mutex_init(>slots_lock);
@@ -697,15 +701,18 @@ static struct kvm *kvm_create_vm(unsigned long type)
goto out_err;
}

-   r = kvm_init_mmu_notifier(kvm);
-   if (r)
-   goto out_err;
+   if (mm) {
+   r = kvm_init_mmu_notifier(kvm);
+   if (r)
+   goto out_err;
+   }

spin_lock(_lock);
list_add(>vm_list, _list);
spin_unlock(_lock);

-   preempt_notifier_inc();
+   if (mm)
+   preempt_notifier_inc();

return kvm;

@@ -721,7 +728,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
kvm_arch_free_vm(kvm);
-   mmdrop(current->mm);
+   if (mm)
+   mmdrop(mm);
return ERR_PTR(r);
 }

@@ -772,9 +780,11 @@ static void kvm_destroy_vm(struct kvm *kvm)
cleanup_srcu_struct(>irq_srcu);
cleanup_srcu_struct(>srcu);
kvm_arch_free_vm(kvm);
-   preempt_notifier_dec();
+   if (mm)
+   preempt_notifier_dec();
hardware_disable_all();
-   mmdrop(mm);
+   if (mm)
+   mmdrop(mm);
 }

 void kvm_get_kvm(struct kvm *kvm)
@@ -1269,6 +1279,9 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t 
gfn)
if (kvm_is_error_hva(addr))
return PAGE_SIZE;

+   if (!kvm->mm)
+   return PAGE_SIZE;
+
down_read(>mm->mmap_sem);
vma = find_vma(current->mm, addr);
if (!vma)
@@ -2486,9 +2499,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 
id)
if (r)
goto vcpu_destroy;

-   r = kvm_create_vcpu_debugfs(vcpu);
-   if (r)
-   goto vcpu_destroy;
+   if (kvm->mm) {
+   r = kvm_create_vcpu_debugfs(vcpu);
+   if (r)
+   goto vcpu_destroy;
+   }

mutex_lock(>lock);
if (kvm_get_vcpu_by_id(kvm, id)) {

[RFC 08/11] KVM, arm, arm64: Initialize KVM's core earlier

2017-08-25 Thread Florent Revest
In order to use internal VMs early in the boot process, the arm_init
function, in charge of initializing KVM, is split into two parts:
 - A subsys_initcall target initializing KVM's core only
 - A module_init target initializing KVM's userspace facing files

The initialization of the KVM system registers, an implicit dependency of
VM execution on arm and arm64, is also rescheduled so that it takes effect
as soon as KVM's core is initialized.

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm/include/asm/kvm_coproc.h|  3 +++
 arch/arm/include/asm/kvm_host.h  |  1 +
 arch/arm/kvm/coproc.c|  6 ++
 arch/arm/kvm/coproc_a15.c|  3 +--
 arch/arm/kvm/coproc_a7.c |  3 +--
 arch/arm64/include/asm/kvm_host.h|  1 +
 arch/arm64/kvm/sys_regs_generic_v8.c |  8 ++--
 virt/kvm/arm/arm.c   | 13 +++--
 8 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_coproc.h 
b/arch/arm/include/asm/kvm_coproc.h
index e74ab0f..1502723 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -45,4 +45,7 @@ struct kvm_coproc_target_table {
 int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
+
+int coproc_a15_init(void);
+int coproc_a7_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd..fb94666 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -287,6 +287,7 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+void kvm_arm_init_sys_reg(void);
 static inline void kvm_arm_init_debug(void) {}
 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 6d1d2e2..28bc397 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -1369,3 +1369,9 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
if (vcpu_cp15(vcpu, num) == 0x42424242)
panic("Didn't reset vcpu_cp15(vcpu, %zi)", num);
 }
+
+void kvm_arm_init_sys_reg(void)
+{
+   coproc_a7_init();
+   coproc_a15_init();
+}
diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
index a713675..83102a3 100644
--- a/arch/arm/kvm/coproc_a15.c
+++ b/arch/arm/kvm/coproc_a15.c
@@ -43,9 +43,8 @@
.num = ARRAY_SIZE(a15_regs),
 };

-static int __init coproc_a15_init(void)
+int coproc_a15_init(void)
 {
kvm_register_target_coproc_table(_target_table);
return 0;
 }
-late_initcall(coproc_a15_init);
diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
index b19e46d..b365ac0 100644
--- a/arch/arm/kvm/coproc_a7.c
+++ b/arch/arm/kvm/coproc_a7.c
@@ -46,9 +46,8 @@
.num = ARRAY_SIZE(a7_regs),
 };

-static int __init coproc_a7_init(void)
+int coproc_a7_init(void)
 {
kvm_register_target_coproc_table(_target_table);
return 0;
 }
-late_initcall(coproc_a7_init);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 07b7460..e360bb3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -374,6 +374,7 @@ static inline void kvm_arch_vcpu_block_finish(struct 
kvm_vcpu *vcpu) {}

 int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);

+void kvm_arm_init_sys_reg(void);
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/sys_regs_generic_v8.c 
b/arch/arm64/kvm/sys_regs_generic_v8.c
index 969ade1..0fe755d 100644
--- a/arch/arm64/kvm/sys_regs_generic_v8.c
+++ b/arch/arm64/kvm/sys_regs_generic_v8.c
@@ -72,7 +72,7 @@ static void reset_actlr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *r)
},
 };

-static int __init sys_reg_genericv8_init(void)
+static int sys_reg_genericv8_init(void)
 {
unsigned int i;

@@ -95,4 +95,8 @@ static int __init sys_reg_genericv8_init(void)

return 0;
 }
-late_initcall(sys_reg_genericv8_init);
+
+void kvm_arm_init_sys_reg(void)
+{
+   sys_reg_genericv8_init();
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index aa29a5d..7d0aa4f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1451,6 +1451,8 @@ int kvm_arch_init(void *opaque)
int err;
int ret, cpu;

+   kvm_arm_init_sys_reg();
+
if (!is_hyp_mode_available()) {
kvm_err("HYP mode not available\n");
return -ENODEV;
@@ -1496,8 +1498,15 @@ void kvm_arch_exit(void)

[RFC 10/11] efi, arm64: Sandbox Runtime Services in a VM

2017-08-25 Thread Florent Revest
EFI Runtime Services are binary blobs currently executed in a special
memory context but with the privileges of the kernel. This can potentially
cause security or stability issues (register corruption, for example).

This patch adds a CONFIG_EFI_SANDBOX option that can be used on arm64 to
enclose the Runtime Services in a virtual machine and limit the impact they
can potentially have on the kernel. This sandboxing can also be useful for
debugging as exceptions caused by the firmware code can be recovered and
examined.

When booting the machine, an internal KVM virtual machine is created with
physical and virtual addresses mirroring the host's EFI context.

One page of code and at least 16K of data pages are kept in low memory for
use by an internal (in-VM) assembly function call wrapper
(efi_sandbox_wrapper). This internal wrapper is called from external
C function wrappers (e.g. efi_sandbox_get_next_variable), which first fill
the VCPU registers with arguments and the data page with copies of memory
buffers.

When a Runtime Service returns, the internal wrapper issues an HVC to let
the host know the efi status return value in x1. In case of exception,
an internal handler also sends an HVC with an EFI_ABORTED error code.

Details of the VCPU initialization, VM memory mapping and service call/ret
are extensively documented in arm-sandbox.c and arm-sandbox-payload.S.

Note: The current version of this patch could potentially cause argument
alignment problems when calling an EFI Runtime Service. Indeed, the buffer
arguments are just pushed onto a stack in a data page without any regard
to the ARM calling convention's alignment rules.

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm/include/asm/efi.h |   5 +
 arch/arm64/include/asm/efi.h   |  69 
 arch/arm64/kernel/asm-offsets.c|   3 +
 arch/arm64/kvm/handle_exit.c   |   3 +
 drivers/firmware/efi/Kconfig   |  10 +
 drivers/firmware/efi/Makefile  |   1 +
 drivers/firmware/efi/arm-runtime.c |   2 +
 drivers/firmware/efi/arm-sandbox-payload.S |  96 +
 drivers/firmware/efi/arm-sandbox.c | 569 +
 include/linux/smccc_fn.h   |   3 +
 10 files changed, 761 insertions(+)
 create mode 100644 drivers/firmware/efi/arm-sandbox-payload.S
 create mode 100644 drivers/firmware/efi/arm-sandbox.c

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index ed575ae..524f0dd 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -35,6 +35,11 @@
__f(args);  \
 })

+struct kvm_vcpu;
+static inline void efi_arm_sandbox_init(struct mm_struct *efi_mm) { }
+static inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu) { return false; }
+static inline int efi_sandbox_exit(struct kvm_vcpu *vcpu) { return -1; }
+
 int efi_arch_late_enable_runtime_services(void);

 #define ARCH_EFI_IRQ_FLAGS_MASK \
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 373d94d..f1c33cd 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_EFI_H
 #define _ASM_EFI_H

+#include 
+
 #include 
 #include 
 #include 
@@ -18,6 +20,10 @@
 int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md);
 int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md);

+struct kvm_vcpu;
+
+#ifndef CONFIG_EFI_SANDBOX
+
 #define arch_efi_call_virt_setup() \
 ({ \
kernel_neon_begin();\
@@ -37,6 +43,69 @@
kernel_neon_end();  \
 })

+static inline void efi_arm_sandbox_init(struct mm_struct *efi_mm) { }
+static inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu) { return false; }
+static inline int efi_sandbox_exit(struct kvm_vcpu *vcpu) { return -1; }
+
+#else
+
+void efi_arm_sandbox_init(struct mm_struct *efi_mm);
+bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu);
+int efi_sandbox_exit(struct kvm_vcpu *vcpu);
+
+void efi_sandbox_excpt(void);
+void efi_sandbox_wrapper(void);
+void efi_sandbox_vectors(void);
+
+#define arch_efi_call_virt_setup() ({})
+#define arch_efi_call_virt(p, f, args...)  efi_sandbox_##f(args)
+#define arch_efi_call_virt_teardown() ({})
+
+/*
+ * The following function wrappers are needed in order to serialize the
+ * variadic macro's arguments (arch_efi_call_virt(p, f, args...)) in the
+ * vcpu's registers. p is also ignored since it is available in the context
+ * of the virtual machine.
+ */
+
+efi_status_t efi_sandbox_get_time(efi_time_t *tm,
+  efi_time_cap_t *tc);
+efi_status_t efi_sandbox_set_time(efi_time_t *tm);
+efi_status_t efi_sandbox_get_wakeup_time(efi_bool_t *e

[RFC 11/11] KVM, arm64: Don't trap internal VMs SMC calls

2017-08-25 Thread Florent Revest
Internal virtual machines can be used to sandbox code such as EFI Runtime
Services. However, some implementations of those Runtime Services rely on
handlers placed in the Secure World (e.g. SoftIron OverDrive 1000) and need
access to SMC calls.

This patch modifies the Hypervisor Configuration Register to avoid trapping
SMC calls of internal virtual machines. Normal userspace VMs are not
affected by this patch.

Note: Letting Runtime Services VMs access EL3 without control can
potentially be a security threat on its own. An alternative would be to
forward SMC calls selectively from inside handle_smc. However, this would
require some level of knowledge of the SMC call arguments and EFI
Runtime Services implementations.

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index fe39e68..4b46cd0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -49,6 +49,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_E2H;
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
+
+   if (!vcpu->kvm->mm)
+   vcpu->arch.hcr_el2 &= ~HCR_TSC;
 }

 static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
--
1.9.1



[RFC 01/11] arm64: Add an SMCCC function IDs header

2017-08-25 Thread Florent Revest
The ARM SMC Calling Convention (DEN 0028B) introduces function IDs for
hypercalls, given in the x0 register during an SMC or HVC from a guest.

The document defines ranges of function IDs targeting different kinds of
hypervisors or supervisors.

Two ID ranges are of particular interest for the kernel:
- Standard hypervisor service calls
- Vendor specific hypervisor service calls

This patch introduces a couple of useful macros for working with the SMCCC.
They provide defines of those ID ranges to be used when handling HVCs (KVM)
or issuing them (e.g. to leverage paravirtualized services).

The document also defines standard return values to be written into x0
after a hypercall has been handled. Once again, those macros can be used
by both the hypervisor and the guest.
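
A hypothetical user of these defines, checking whether an incoming function
ID matches a vendor-specific SMC64 hypervisor service (the service ID and the
surrounding code are illustrative only, not part of this patch):

	unsigned long fn = vcpu_get_reg(vcpu, 0);

	if (fn == SMCCC64_VDR_HYP_FN(0x10))	/* hypothetical service ID */
		vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_SUCCESS);
	else
		vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);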

Signed-off-by: Florent Revest <florent.rev...@arm.com>
---
 include/linux/smccc_fn.h | 50 
 1 file changed, 50 insertions(+)
 create mode 100644 include/linux/smccc_fn.h

diff --git a/include/linux/smccc_fn.h b/include/linux/smccc_fn.h
new file mode 100644
index 000..f08145d
--- /dev/null
+++ b/include/linux/smccc_fn.h
@@ -0,0 +1,50 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2017 ARM Limited
+ */
+
+#ifndef __LINUX_SMCCC_FN_H
+#define __LINUX_SMCCC_FN_H
+
+/*
+ * Standard return values
+ */
+
+#define SMCCC_STD_RET_SUCCESS  0
+#define SMCCC_STD_RET_UNKNOWN_ID   -1
+
+
+/*
+ * SMC32
+ */
+
+/* Standard hypervisor services interface */
+#define SMCCC32_STD_HYP_FN_BASE    0x85000000
+#define SMCCC32_STD_HYP_FN(n)  (SMCCC32_STD_HYP_FN_BASE + (n))
+
+/* Vendor specific hypervisor services interface */
+#define SMCCC32_VDR_HYP_FN_BASE    0x86000000
+#define SMCCC32_VDR_HYP_FN(n)  (SMCCC32_VDR_HYP_FN_BASE + (n))
+
+
+/*
+ * SMC64
+ */
+
+/* Standard hypervisor services interface */
+#define SMCCC64_STD_HYP_FN_BASE    0xc5000000
+#define SMCCC64_STD_HYP_FN(n)  (SMCCC64_STD_HYP_FN_BASE + (n))
+
+/* Vendor specific hypervisor services interface */
+#define SMCCC64_VDR_HYP_FN_BASE    0xc6000000
+#define SMCCC64_VDR_HYP_FN(n)  (SMCCC64_VDR_HYP_FN_BASE + (n))
+
+#endif /* __LINUX_SMCCC_FN_H */
--
1.9.1



[RFC 02/11] KVM: arm64: Return an Unknown ID on unhandled HVC

2017-08-25 Thread Florent Revest
So far, when the KVM hypervisor received an HVC from a guest, it only
routed the hypercall to the PSCI call handler. If the function ID of the
hypercall wasn't supported by the PSCI code, a PSCI_RET_NOT_SUPPORTED
error code was returned in x0.

This patch introduces a kvm_psci_is_call() check which is verified before
entering the PSCI call handling code. The HVC is now only routed to the
PSCI code if its function ID is in the ranges of PSCI functions defined by
the SMCCC (0x84000000-0x8400001f and 0xc4000000-0xc400001f).

If the function ID is not in those ranges, an Unknown Function Identifier
is returned in x0. This implements the behavior defined by SMCCC and paves
the way for other HVC handlers.
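
The check itself boils down to a range comparison on x0 (an illustration of
the behaviour described above, not a verbatim copy of the patch's
implementation):

	unsigned long fn = vcpu_get_reg(vcpu, 0);
	bool is_psci = (fn >= PSCI_0_2_FN_BASE && fn <= PSCI_0_2_FN_END) ||
		       (fn >= PSCI_0_2_FN64_BASE && fn <= PSCI_0_2_FN64_END);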

Signed-off-by: Florent Revest 
---
 arch/arm/include/asm/kvm_psci.h   |  1 +
 arch/arm64/include/asm/kvm_psci.h |  1 +
 arch/arm64/kvm/handle_exit.c  | 24 ++--
 include/uapi/linux/psci.h |  2 ++
 virt/kvm/arm/psci.c   | 21 +
 5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_psci.h b/arch/arm/include/asm/kvm_psci.h
index 6bda945..8dcd642 100644
--- a/arch/arm/include/asm/kvm_psci.h
+++ b/arch/arm/include/asm/kvm_psci.h
@@ -22,6 +22,7 @@
 #define KVM_ARM_PSCI_0_2   2

 int kvm_psci_version(struct kvm_vcpu *vcpu);
+bool kvm_psci_is_call(struct kvm_vcpu *vcpu);
 int kvm_psci_call(struct kvm_vcpu *vcpu);

 #endif /* __ARM_KVM_PSCI_H__ */
diff --git a/arch/arm64/include/asm/kvm_psci.h 
b/arch/arm64/include/asm/kvm_psci.h
index bc39e55..1a28809 100644
--- a/arch/arm64/include/asm/kvm_psci.h
+++ b/arch/arm64/include/asm/kvm_psci.h
@@ -22,6 +22,7 @@
 #define KVM_ARM_PSCI_0_2   2

 int kvm_psci_version(struct kvm_vcpu *vcpu);
+bool kvm_psci_is_call(struct kvm_vcpu *vcpu);
 int kvm_psci_call(struct kvm_vcpu *vcpu);

 #endif /* __ARM64_KVM_PSCI_H__ */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..bc7ade5 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -21,6 +21,7 @@

 #include 
 #include 
+#include 

 #include 
 #include 
@@ -34,19 +35,30 @@

 typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);

+/*
+ * handle_hvc - handle a guest hypercall
+ *
+ * @vcpu:  the vcpu pointer
+ * @run:   access to the kvm_run structure for results
+ *
+ * Route a given hypercall to its right HVC handler thanks to its function ID.
+ * If no corresponding handler is found, write an Unknown ID in x0 (cf. SMCCC).
+ *
+ * This function returns: > 0 (success), 0 (success but exit to user
+ * space), and < 0 (errors)
+ */
 static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-   int ret;
+   int ret = 1;

trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;

-   ret = kvm_psci_call(vcpu);
-   if (ret < 0) {
-   kvm_inject_undefined(vcpu);
-   return 1;
-   }
+   if (kvm_psci_is_call(vcpu))
+   ret = kvm_psci_call(vcpu);
+   else
+   vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);

return ret;
 }
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 3d7a0fc..79704fe 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -24,10 +24,12 @@
 /* PSCI v0.2 interface */
 #define PSCI_0_2_FN_BASE   0x84000000
 #define PSCI_0_2_FN(n) (PSCI_0_2_FN_BASE + (n))
+#define PSCI_0_2_FN_END    PSCI_0_2_FN(0x1F)
 #define PSCI_0_2_64BIT 0x40000000
 #define PSCI_0_2_FN64_BASE \
(PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
 #define PSCI_0_2_FN64(n)   (PSCI_0_2_FN64_BASE + (n))
+#define PSCI_0_2_FN64_END  PSCI_0_2_FN64(0x1F)

 #define PSCI_0_2_FN_PSCI_VERSION   PSCI_0_2_FN(0)
 #define PSCI_0_2_FN_CPU_SUSPENDPSCI_0_2_FN(1)
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index f1e363b..9602894 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -332,3 +332,24 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
return -EINVAL;
};
 }
+
+/**
+ * kvm_psci_is_call - checks if a HVC function ID is in a PSCI range
+ * @vcpu: Pointer to the VCPU struct
+ *
+ * When a hypercall is received from a guest, the SMCCC defines a function ID
+ * as a value to be put in x0 to identify the destination of the call. The same
+ * document defines ranges of function IDs to be used by PSCI. This function
+ * checks whether a given vcpu is requesting a PSCI related handler.
+ *
+ * This function returns:
+ *  - true if this HVC should be handled by kvm_psci_call
+ *  - false if it shouldn't
+ */
+inline bool kvm_psci_is_call(struct kvm_vcpu *vcpu)
+{
+   unsigned long fn = 


[RFC 05/11] KVM: Expose VM/VCPU creation functions

2017-08-25 Thread Florent Revest
Now that KVM is capable of creating internal virtual machines, the rest of
the kernel needs an API to access this capability.

This patch exposes two functions for VM and VCPU creation in kvm_host.h
(a usage sketch follows the list):
 - kvm_create_internal_vm: ensures that kvm->mm is kept NULL at VM creation
 - kvm_vm_create_vcpu: simple alias of kvm_vm_ioctl_create_vcpu for clarity
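
A minimal usage sketch (the type and id values, and the error handling, are
illustrative only):

	struct kvm *kvm;
	int ret;

	kvm = kvm_create_internal_vm(0);	/* 0: default machine type */
	if (IS_ERR(kvm))
		return PTR_ERR(kvm);

	ret = kvm_vm_create_vcpu(kvm, 0);	/* VCPU id 0 */
	if (ret < 0)
		kvm_put_kvm(kvm);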

Signed-off-by: Florent Revest 
---
 include/linux/kvm_host.h |  3 +++
 virt/kvm/kvm_main.c  | 10 ++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 21a6fd6..dd10d3b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -565,6 +565,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
  struct module *module);
 void kvm_exit(void);

+struct kvm *kvm_create_internal_vm(unsigned long type);
+int kvm_vm_create_vcpu(struct kvm *kvm, u32 id);
+
 void kvm_get_kvm(struct kvm *kvm);
 void kvm_put_kvm(struct kvm *kvm);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e7af1a..c1c8bb6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -733,6 +733,11 @@ static struct kvm *kvm_create_vm(unsigned long type, 
struct mm_struct *mm)
return ERR_PTR(r);
 }

+struct kvm *kvm_create_internal_vm(unsigned long type)
+{
+   return kvm_create_vm(type, NULL);
+}
+
 static void kvm_destroy_devices(struct kvm *kvm)
 {
struct kvm_device *dev, *tmp;
@@ -2549,6 +2554,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 
id)
return r;
 }

+int kvm_vm_create_vcpu(struct kvm *kvm, u32 id)
+{
+   return kvm_vm_ioctl_create_vcpu(kvm, id);
+}
+
 static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
 {
if (sigset) {
--
1.9.1



[RFC 09/11] EFI, arm, arm64: Enable EFI Runtime Services later

2017-08-25 Thread Florent Revest
EFI Runtime Services on ARM are enabled very early in the boot process
although they aren't used until substantially later. This patch modifies
the EFI initialization sequence on ARM to enable runtime services just
before they are effectively needed (in a subsys_initcall instead of an
early_initcall).

The reason behind this change is that eventually, a late Runtime Services
initialization could take advantage of KVM's internal virtual machines to
sandbox firmware code execution. Since KVM's core is only available
starting from the subsys target, this reordering would be compulsory.

Signed-off-by: Florent Revest 
---
 arch/arm/include/asm/efi.h | 2 ++
 arch/arm64/include/asm/efi.h   | 2 ++
 arch/x86/include/asm/efi.h | 2 ++
 drivers/firmware/efi/arm-runtime.c | 3 +--
 drivers/firmware/efi/efi.c | 3 +++
 5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 17f1f1a..ed575ae 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -35,6 +35,8 @@
__f(args);  \
 })

+int efi_arch_late_enable_runtime_services(void);
+
 #define ARCH_EFI_IRQ_FLAGS_MASK \
(PSR_J_BIT | PSR_E_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | \
 PSR_T_BIT | MODE_MASK)
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 8f3043a..373d94d 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -37,6 +37,8 @@
kernel_neon_end();  \
 })

+int efi_arch_late_enable_runtime_services(void);
+
 #define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

 /* arch specific definitions used by the stub code */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 796ff6c..869efbb 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -233,6 +233,8 @@ static inline bool efi_is_64bit(void)

 extern bool efi_reboot_required(void);

+static inline int efi_arch_late_enable_runtime_services(void) { return 0; }
+
 #else
 static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
 static inline bool efi_reboot_required(void)
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3..d94d240 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -115,7 +115,7 @@ static bool __init efi_virtmap_init(void)
  * non-early mapping of the UEFI system table and virtual mappings for all
  * EFI_MEMORY_RUNTIME regions.
  */
-static int __init arm_enable_runtime_services(void)
+int __init efi_arch_late_enable_runtime_services(void)
 {
u64 mapsize;

@@ -154,7 +154,6 @@ static int __init arm_enable_runtime_services(void)

return 0;
 }
-early_initcall(arm_enable_runtime_services);

 void efi_virtmap_load(void)
 {
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 045d6d3..2b447b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -33,6 +33,7 @@
 #include 

 #include 
+#include 

 struct efi __read_mostly efi = {
.mps= EFI_INVALID_TABLE_ADDR,
@@ -304,6 +305,8 @@ static int __init efisubsys_init(void)
 {
int error;

+   efi_arch_late_enable_runtime_services();
+
if (!efi_enabled(EFI_BOOT))
return 0;

--
1.9.1



[RFC 11/11] KVM, arm64: Don't trap internal VMs SMC calls

2017-08-25 Thread Florent Revest
Internal virtual machines can be used to sandbox code such as EFI Runtime
Services. However, some implementations of those Runtime Services rely on
handlers placed in the Secure World (e.g: SoftIron Overdrive 1000) and need
access to SMC calls.

This patch modifies the Hypervisor Configuration Register to avoid trapping
SMC calls of internal virtual machines. Normal userspace VMs are not
affected by this patch.

Note: Letting Runtime Services VMs access EL3 without control can
potentially be a security threat on its own. An alternative would be to
forward SMC calls selectively from inside handle_smc. However, this would
require some level of knowledge of the SMC calls arguments and EFI
Runtime Services implementations.

Signed-off-by: Florent Revest 
---
 arch/arm64/include/asm/kvm_emulate.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index fe39e68..4b46cd0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -49,6 +49,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_E2H;
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
+
+   if (!vcpu->kvm->mm)
+   vcpu->arch.hcr_el2 &= ~HCR_TSC;
 }

 static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
--
1.9.1

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


[RFC 10/11] efi, arm64: Sandbox Runtime Services in a VM

2017-08-25 Thread Florent Revest
EFI Runtime Services are binary blobs currently executed in a special
memory context but with the privileges of the kernel. This can potentially
cause security or stability issues (registers corruption for example).

This patch adds a CONFIG_EFI_SANDBOX option that can be used on arm64 to
enclose the Runtime Services in a virtual machine and limit the impact they
can potentially have on the kernel. This sandboxing can also be useful for
debugging as exceptions caused by the firmware code can be recovered and
examined.

When booting the machine, an internal KVM virtual machine is created with
physical and virtual addresses mirroring the host's EFI context.

One page of code and at least 16K of data pages are kept in low memory for
the usage of an internal (in the VM) assembly function call wrapper
(efi_sandbox_wrapper). Calling this internal wrapper is done from external
C function wrappers (e.g: efi_sandbox_get_next_variable) filling the VCPU
registers with arguments and the data page with copies of memory buffers
first.

When a Runtime Service returns, the internal wrapper issues an HVC to let
the host know the efi status return value in x1. In case of exception,
an internal handler also sends an HVC with an EFI_ABORTED error code.

Details of the VCPU initialization, VM memory mapping and service call/ret
are extensively documented in arm-sandbox.c and arm-sandbox-payload.S.

Note: The current version of this patch could potentially cause problems of
arguments alignment when calling an EFI Runtime Service. Indeed, the
buffers arguments are just pushed onto a stack in a data page without any
regards to the ARM Calling Convention alignments.

Signed-off-by: Florent Revest 
---
 arch/arm/include/asm/efi.h |   5 +
 arch/arm64/include/asm/efi.h   |  69 
 arch/arm64/kernel/asm-offsets.c|   3 +
 arch/arm64/kvm/handle_exit.c   |   3 +
 drivers/firmware/efi/Kconfig   |  10 +
 drivers/firmware/efi/Makefile  |   1 +
 drivers/firmware/efi/arm-runtime.c |   2 +
 drivers/firmware/efi/arm-sandbox-payload.S |  96 +
 drivers/firmware/efi/arm-sandbox.c | 569 +
 include/linux/smccc_fn.h   |   3 +
 10 files changed, 761 insertions(+)
 create mode 100644 drivers/firmware/efi/arm-sandbox-payload.S
 create mode 100644 drivers/firmware/efi/arm-sandbox.c

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index ed575ae..524f0dd 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -35,6 +35,11 @@
__f(args);  \
 })

+struct kvm_vcpu;
+static inline void efi_arm_sandbox_init(struct mm_struct *efi_mm) { }
+static inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu) { return false; }
+static inline int efi_sandbox_exit(struct kvm_vcpu *vcpu) { return -1; }
+
 int efi_arch_late_enable_runtime_services(void);

 #define ARCH_EFI_IRQ_FLAGS_MASK \
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 373d94d..f1c33cd 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_EFI_H
 #define _ASM_EFI_H

+#include 
+
 #include 
 #include 
 #include 
@@ -18,6 +20,10 @@
 int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md);
 int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md);

+struct kvm_vcpu;
+
+#ifndef CONFIG_EFI_SANDBOX
+
 #define arch_efi_call_virt_setup() \
 ({ \
kernel_neon_begin();\
@@ -37,6 +43,69 @@
kernel_neon_end();  \
 })

+static inline void efi_arm_sandbox_init(struct mm_struct *efi_mm) { }
+static inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu) { return false; }
+static inline int efi_sandbox_exit(struct kvm_vcpu *vcpu) { return -1; }
+
+#else
+
+void efi_arm_sandbox_init(struct mm_struct *efi_mm);
+bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu);
+int efi_sandbox_exit(struct kvm_vcpu *vcpu);
+
+void efi_sandbox_excpt(void);
+void efi_sandbox_wrapper(void);
+void efi_sandbox_vectors(void);
+
+#define arch_efi_call_virt_setup() ({})
+#define arch_efi_call_virt(p, f, args...)  efi_sandbox_##f(args)
+#define arch_efi_call_virt_teardown() ({})
+
+/*
+ * The following function wrappers are needed in order to serialize the 
variadic
+ * macro's arguments (arch_efi_call_virt(p, f, args...)) in the vcpu's 
registers
+ * p is also ignored since it is available in the context of the virtual 
machine
+ */
+
+efi_status_t efi_sandbox_get_time(efi_time_t *tm,
+  efi_time_cap_t *tc);
+efi_status_t efi_sandbox_set_time(efi_time_t *tm);
+efi_status_t efi_sandbox_get_wakeup_time(efi_bool_t *enabled

[RFC 08/11] KVM, arm, arm64: Initialize KVM's core earlier

2017-08-25 Thread Florent Revest
In order to use internal VMs early in the boot process, the arm_init
function, in charge of initializing KVM, is split in two parts:
 - A subsys_initcall target initializing KVM's core only
 - A module_init target initializing KVM's userspace facing files

An implicit dependency of VM execution on arm and arm64, the
initialization of KVM system registers, is also rescheduled to be
effective as soon as KVM's core is initialized.

Signed-off-by: Florent Revest 
---
 arch/arm/include/asm/kvm_coproc.h|  3 +++
 arch/arm/include/asm/kvm_host.h  |  1 +
 arch/arm/kvm/coproc.c|  6 ++
 arch/arm/kvm/coproc_a15.c|  3 +--
 arch/arm/kvm/coproc_a7.c |  3 +--
 arch/arm64/include/asm/kvm_host.h|  1 +
 arch/arm64/kvm/sys_regs_generic_v8.c |  8 ++--
 virt/kvm/arm/arm.c   | 13 +++--
 8 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_coproc.h 
b/arch/arm/include/asm/kvm_coproc.h
index e74ab0f..1502723 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -45,4 +45,7 @@ struct kvm_coproc_target_table {
 int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
 unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
+
+int coproc_a15_init(void);
+int coproc_a7_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd..fb94666 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -287,6 +287,7 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+void kvm_arm_init_sys_reg(void);
 static inline void kvm_arm_init_debug(void) {}
 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 6d1d2e2..28bc397 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -1369,3 +1369,9 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
if (vcpu_cp15(vcpu, num) == 0x42424242)
panic("Didn't reset vcpu_cp15(vcpu, %zi)", num);
 }
+
+void kvm_arm_init_sys_reg(void)
+{
+   coproc_a7_init();
+   coproc_a15_init();
+}
diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
index a713675..83102a3 100644
--- a/arch/arm/kvm/coproc_a15.c
+++ b/arch/arm/kvm/coproc_a15.c
@@ -43,9 +43,8 @@
.num = ARRAY_SIZE(a15_regs),
 };

-static int __init coproc_a15_init(void)
+int coproc_a15_init(void)
 {
kvm_register_target_coproc_table(_target_table);
return 0;
 }
-late_initcall(coproc_a15_init);
diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
index b19e46d..b365ac0 100644
--- a/arch/arm/kvm/coproc_a7.c
+++ b/arch/arm/kvm/coproc_a7.c
@@ -46,9 +46,8 @@
.num = ARRAY_SIZE(a7_regs),
 };

-static int __init coproc_a7_init(void)
+int coproc_a7_init(void)
 {
kvm_register_target_coproc_table(_target_table);
return 0;
 }
-late_initcall(coproc_a7_init);
diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 07b7460..e360bb3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -374,6 +374,7 @@ static inline void kvm_arch_vcpu_block_finish(struct 
kvm_vcpu *vcpu) {}

 int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);

+void kvm_arm_init_sys_reg(void);
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/sys_regs_generic_v8.c 
b/arch/arm64/kvm/sys_regs_generic_v8.c
index 969ade1..0fe755d 100644
--- a/arch/arm64/kvm/sys_regs_generic_v8.c
+++ b/arch/arm64/kvm/sys_regs_generic_v8.c
@@ -72,7 +72,7 @@ static void reset_actlr(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *r)
},
 };

-static int __init sys_reg_genericv8_init(void)
+static int sys_reg_genericv8_init(void)
 {
unsigned int i;

@@ -95,4 +95,8 @@ static int __init sys_reg_genericv8_init(void)

return 0;
 }
-late_initcall(sys_reg_genericv8_init);
+
+void kvm_arm_init_sys_reg(void)
+{
+   sys_reg_genericv8_init();
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index aa29a5d..7d0aa4f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1451,6 +1451,8 @@ int kvm_arch_init(void *opaque)
int err;
int ret, cpu;

+   kvm_arm_init_sys_reg();
+
if (!is_hyp_mode_available()) {
kvm_err("HYP mode not available\n");
return -ENODEV;
@@ -1496,8 +1498,15 @@ void kvm_arch_exit(void)

 static int arm_init(void)
 {
-

[RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

2017-08-25 Thread Florent Revest
Usual KVM virtual machines map guest's physical addresses from a process
userspace memory. However, with the new concept of internal VMs, a virtual
machine can be created from the kernel, without any link to a userspace
context. Hence, some of the KVM's architecture-specific code needs to be
modified to take this kind of VMs into account.

The approach chosen with this patch is to let internal VMs idmap physical
addresses into intermediary physical addresses by calling
kvm_set_memory_region with a kvm_userspace_memory_region where the
guest_phys_addr field points both to the original PAs and to the IPAs. The
userspace_addr field of this struct is therefore ignored with internal VMs.

This patch extends the capabilities of the arm and arm64 stage2 MMU code
to handle internal VMs. Three things are changed:

- Various parts of the MMU code which are related to a userspace context
are now only executed if kvm->mm is present.

- When this pointer is NULL, struct kvm_userspace_memory_regions are
treated by internal_vm_prep_mem as idmaps of physical memory.

- A set of 256 additional private memslots is now reserved on arm64 for the
usage of internal VMs memory idmapping.

Note: this patch should have pretty much no performance impact on the
critical path of traditional VMs since only one unlikely branch had to be
added to the page fault handler.

Signed-off-by: Florent Revest 
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 virt/kvm/arm/mmu.c| 76 +--
 2 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index d686300..65aab35 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED

 #define KVM_USER_MEM_SLOTS 512
+#define KVM_PRIVATE_MEM_SLOTS 256
 #define KVM_HALT_POLL_NS_DEFAULT 50

 #include 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2ea21da..1d2d3df 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
phys_addr_t size = PAGE_SIZE * memslot->npages;
hva_t reg_end = hva + size;

+   if (unlikely(!kvm->mm)) {
+   unmap_stage2_range(kvm, addr, size);
+   return;
+   }
+
/*
 * A memory region could potentially cover multiple VMAs, and any holes
 * between them, so iterate over all of them to find out if we should
@@ -819,7 +824,8 @@ void stage2_unmap_vm(struct kvm *kvm)
int idx;

idx = srcu_read_lock(>srcu);
-   down_read(>mm->mmap_sem);
+   if (likely(kvm->mm))
+   down_read(>mm->mmap_sem);
spin_lock(>mmu_lock);

slots = kvm_memslots(kvm);
@@ -827,7 +833,8 @@ void stage2_unmap_vm(struct kvm *kvm)
stage2_unmap_memslot(kvm, memslot);

spin_unlock(>mmu_lock);
-   up_read(>mm->mmap_sem);
+   if (likely(kvm->mm))
+   up_read(>mm->mmap_sem);
srcu_read_unlock(>srcu, idx);
 }

@@ -1303,6 +1310,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
return -EFAULT;
}

+   if (unlikely(!kvm->mm)) {
+   kvm_err("Unexpected internal VM page fault\n");
+   kvm_inject_vabt(vcpu);
+   return 0;
+   }
+
/* Let's check if we will get back a huge page backed by hugetlbfs */
down_read(&current->mm->mmap_sem);
vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1850,6 +1863,54 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
kvm_mmu_wp_memory_region(kvm, mem->slot);
 }

+/*
+ * internal_vm_prep_mem - maps a range of hpa to gpa at stage2
+ *
+ * While userspace VMs manage gpas using hvas, internal virtual machines need a
+ * way to map physical addresses to a guest. In order to avoid code
+ * duplication, the kvm_set_memory_region call is kept for internal VMs,
+ * however it usually expects a struct kvm_userspace_memory_region with a
+ * userspace_addr field. With internal VMs, this field is ignored and the
+ * physical memory pointed to by guest_phys_addr can only be idmapped.
+ */
+static int internal_vm_prep_mem(struct kvm *kvm,
+   const struct kvm_userspace_memory_region *mem)
+{
+   phys_addr_t addr, end;
+   unsigned long pfn;
+   int ret;
+   struct kvm_mmu_memory_cache cache = { 0 };
+
+   end = mem->guest_phys_addr + mem->memory_size;
+   pfn = __phys_to_pfn(mem->guest_phys_addr);
+   addr = mem->guest_phys_addr;
+
+   for (; addr < end; addr += PAGE_SIZE) {
+   pte_t pte = pfn_pte(pfn, PAGE_S2);
+
+   pte = kvm_s2pte_mkwrite(pte);
+
+   ret = mmu_topup_memory_cache(,
+  

[RFC 07/11] KVM: Allow initialization before the module target

2017-08-25 Thread Florent Revest
The kvm_init function has been designed to be executed during the
module_init target. It requires a struct module pointer to be used as
the owner of the /dev/* files and also tries to register /dev/kvm with a
function (misc_register) that can only be used late in the boot process.

This patch modifies kvm_init to execute this late initialization code
conditionally, only in the context of a module_init. It also offers a
kvm_set_module function to be used for /dev/kvm registration and device
file ownership once the module target is reached.

As-is, this patch does not change anything. However, it could be used by
certain architectures to initialize the core of KVM earlier in the boot
(e.g. in a subsys_initcall) and then initialize the userspace-facing files
in a module_init target. This can be useful to create internal VMs before
being able to offer the userspace APIs.
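
A hedged sketch of what such a split could look like on an architecture;
the function names and initcall levels below are assumptions, not part of
this patch:

/* Early: core KVM init without /dev/kvm, which kvm_init() now tolerates
 * when it is passed a NULL module.
 */
static int __init example_kvm_core_init(void)
{
        return kvm_init(NULL, sizeof(struct kvm_vcpu),
                        __alignof__(struct kvm_vcpu), NULL);
}
subsys_initcall(example_kvm_core_init);

/* Late: register /dev/kvm and set the chardev owners once misc_register()
 * can be used.
 */
static int __init example_kvm_late_init(void)
{
        return kvm_set_module(THIS_MODULE);
}
module_init(example_kvm_late_init);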

Signed-off-by: Florent Revest 
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/kvm_main.c  | 28 
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dd10d3b..15a0a8d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -563,6 +563,7 @@ static inline void kvm_irqfd_exit(void)
 #endif
 int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
  struct module *module);
+int kvm_set_module(struct module *module);
 void kvm_exit(void);

 struct kvm *kvm_create_internal_vm(unsigned long type);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c1c8bb6..3c9cb00 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4086,14 +4086,10 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
if (r)
goto out_free;

-   kvm_chardev_ops.owner = module;
-   kvm_vm_fops.owner = module;
-   kvm_vcpu_fops.owner = module;
-
-   r = misc_register(&kvm_dev);
-   if (r) {
-   pr_err("kvm: misc device register failed\n");
-   goto out_unreg;
+   if (module) {
+   r = kvm_set_module(module);
+   if (r)
+   goto out_unreg;
}

register_syscore_ops(&kvm_syscore_ops);
@@ -4136,6 +4132,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned 
vcpu_align,
 }
 EXPORT_SYMBOL_GPL(kvm_init);

+int kvm_set_module(struct module *module)
+{
+   int r;
+
+   kvm_chardev_ops.owner = module;
+   kvm_vm_fops.owner = module;
+   kvm_vcpu_fops.owner = module;
+
+   r = misc_register(&kvm_dev);
+   if (r)
+   pr_err("kvm: misc device register failed\n");
+
+   return r;
+}
+EXPORT_SYMBOL_GPL(kvm_set_module);
+
 void kvm_exit(void)
 {
debugfs_remove_recursive(kvm_debugfs_dir);
--
1.9.1



[RFC 03/11] KVM: Allow VM lifecycle management without userspace

2017-08-25 Thread Florent Revest
The current codebase of KVM makes many assumptions regarding the origin of
the virtual machine being executed or configured. Indeed, the KVM API
implementation has been written with userspace usage in mind and lots of
userspace-specific code is used (namely preempt_notifiers, eventfd, mmu
notifiers, current->mm...)

The aim of this patch is to make the KVM API (create_vm, create_vcpu, etc.)
usable from a kernel context. A simple trick is used to distinguish
userspace VMs (coming from QEMU or LKVM...) from internal VMs (coming
from other subsystems, for example for sandboxing purposes):
  - When a VM is created from an ioctl, kvm->mm is set to current->mm
  - When a VM is created from the kernel, kvm->mm must be set to NULL

This ensures that no userspace program can create internal VMs and makes it
easy to check whether a given VM is attached to a process or is internal.

This patch simply encloses the userspace-specific pieces of code of
kvm_main in checks on whether kvm->mm is present and modifies the
prototype of kvm_create_vm to accept a NULL mm.
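
In other words (purely illustrative helper, not part of the patch):

/* The kvm->mm convention introduced by this patch:
 *   ioctl path:  kvm_create_vm(type, current->mm) -> kvm->mm != NULL
 *   kernel path: kvm_create_vm(type, NULL)        -> kvm->mm == NULL
 */
static bool example_vm_is_internal(struct kvm *kvm)
{
        return !kvm->mm;
}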

Signed-off-by: Florent Revest 
---
 virt/kvm/kvm_main.c | 64 ++---
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 15252d7..2e7af1a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -154,7 +154,8 @@ int vcpu_load(struct kvm_vcpu *vcpu)
if (mutex_lock_killable(&vcpu->mutex))
return -EINTR;
cpu = get_cpu();
-   preempt_notifier_register(&vcpu->preempt_notifier);
+   if (vcpu->kvm->mm)
+   preempt_notifier_register(&vcpu->preempt_notifier);
kvm_arch_vcpu_load(vcpu, cpu);
put_cpu();
return 0;
@@ -165,7 +166,8 @@ void vcpu_put(struct kvm_vcpu *vcpu)
 {
preempt_disable();
kvm_arch_vcpu_put(vcpu);
-   preempt_notifier_unregister(&vcpu->preempt_notifier);
+   if (vcpu->kvm->mm)
+   preempt_notifier_unregister(&vcpu->preempt_notifier);
preempt_enable();
mutex_unlock(&vcpu->mutex);
 }
@@ -640,7 +642,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
return 0;
 }

-static struct kvm *kvm_create_vm(unsigned long type)
+static struct kvm *kvm_create_vm(unsigned long type, struct mm_struct *mm)
 {
int r, i;
struct kvm *kvm = kvm_arch_alloc_vm();
@@ -649,9 +651,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
return ERR_PTR(-ENOMEM);

spin_lock_init(&kvm->mmu_lock);
-   mmgrab(current->mm);
-   kvm->mm = current->mm;
-   kvm_eventfd_init(kvm);
+   kvm->mm = mm;
+   if (mm) {
+   mmgrab(current->mm);
+   kvm_eventfd_init(kvm);
+   }
mutex_init(&kvm->lock);
mutex_init(&kvm->irq_lock);
mutex_init(&kvm->slots_lock);
@@ -697,15 +701,18 @@ static struct kvm *kvm_create_vm(unsigned long type)
goto out_err;
}

-   r = kvm_init_mmu_notifier(kvm);
-   if (r)
-   goto out_err;
+   if (mm) {
+   r = kvm_init_mmu_notifier(kvm);
+   if (r)
+   goto out_err;
+   }

spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);

-   preempt_notifier_inc();
+   if (mm)
+   preempt_notifier_inc();

return kvm;

@@ -721,7 +728,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
kvm_arch_free_vm(kvm);
-   mmdrop(current->mm);
+   if (mm)
+   mmdrop(mm);
return ERR_PTR(r);
 }

@@ -772,9 +780,11 @@ static void kvm_destroy_vm(struct kvm *kvm)
cleanup_srcu_struct(&kvm->irq_srcu);
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);
-   preempt_notifier_dec();
+   if (mm)
+   preempt_notifier_dec();
hardware_disable_all();
-   mmdrop(mm);
+   if (mm)
+   mmdrop(mm);
 }

 void kvm_get_kvm(struct kvm *kvm)
@@ -1269,6 +1279,9 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t 
gfn)
if (kvm_is_error_hva(addr))
return PAGE_SIZE;

+   if (!kvm->mm)
+   return PAGE_SIZE;
+
down_read(&current->mm->mmap_sem);
vma = find_vma(current->mm, addr);
if (!vma)
@@ -2486,9 +2499,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 
id)
if (r)
goto vcpu_destroy;

-   r = kvm_create_vcpu_debugfs(vcpu);
-   if (r)
-   goto vcpu_destroy;
+   if (kvm->mm) {
+   r = kvm_create_vcpu_debugfs(vcpu);
+   if (r)
+   goto vcpu_destroy;
+   }

mutex_lock(&kvm->lock);
if (kvm_get_vcpu_by_id(kvm, id)) {
@@ -2499,11 +2514,13 @@ static i

[RFC 06/11] KVM, arm64: Expose a VCPU initialization function

2017-08-25 Thread Florent Revest
KVM's core now offers internal virtual machine capabilities. However, on ARM,
the KVM_ARM_VCPU_INIT ioctl also has to be used to initialize a virtual CPU.

This patch exposes a kvm_arm_vcpu_init() function to the rest of the kernel
on arm64 so that it can be used for arm64 internal VM initialization.

This function used to be named kvm_arch_vcpu_ioctl_vcpu_init(), but the
"ioctl" part of the name wasn't consistent with the rest of the KVM arm
ioctl handlers and isn't relevant when the function is used for internal
VMs. The function has therefore been renamed to make it less misleading.

Signed-off-by: Florent Revest 
---
 arch/arm64/include/asm/kvm_host.h | 2 ++
 virt/kvm/arm/arm.c| 5 ++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 65aab35..07b7460 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -372,6 +372,8 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
+
 void kvm_arm_init_debug(void);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..aa29a5d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -888,8 +888,7 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 }


-static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
-struct kvm_vcpu_init *init)
+int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
 {
int ret;

@@ -973,7 +972,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (copy_from_user(&init, argp, sizeof(init)))
return -EFAULT;

-   return kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
+   return kvm_arm_vcpu_init(vcpu, &init);
}
case KVM_SET_ONE_REG:
case KVM_GET_ONE_REG: {
--
1.9.1



[RFC 01/11] arm64: Add an SMCCC function IDs header

2017-08-25 Thread Florent Revest
The ARM SMC Calling Convention (DEN 0028B) introduces function IDs for
hypercalls, given in the x0 register during an SMC or HVC from a guest.

The document defines ranges of function IDs targeting different kinds of
hypervisors or supervisors.

Two ID ranges are of particular interest for the kernel:
- Standard hypervisor service calls
- Vendor specific hypervisor service calls

This patch introduces a couple of macros that are useful when working with
SMCCC. They provide defines of those ID ranges, to be used for HVC handling
(in KVM) or for issuing calls (e.g. to leverage paravirtualized services).

The document also defines standard return values to be written into x0
after hypercall handling. Once again, those macros can potentially be
used by both the hypervisor and the guest.
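
As an illustration, a minimal sketch of how an HVC handler could use these
defines; the handler shape and the function number are assumptions:

/* Illustrative sketch: dispatch a guest hypercall based on its SMC64
 * standard hypervisor service function ID and report the result in x0.
 */
static int example_handle_hvc(struct kvm_vcpu *vcpu)
{
        u32 fn = vcpu_get_reg(vcpu, 0);

        switch (fn) {
        case SMCCC64_STD_HYP_FN(0):     /* hypothetical service 0 */
                vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_SUCCESS);
                break;
        default:
                vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);
                break;
        }

        return 1;       /* resume the guest */
}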

Signed-off-by: Florent Revest 
---
 include/linux/smccc_fn.h | 50 
 1 file changed, 50 insertions(+)
 create mode 100644 include/linux/smccc_fn.h

diff --git a/include/linux/smccc_fn.h b/include/linux/smccc_fn.h
new file mode 100644
index 000..f08145d
--- /dev/null
+++ b/include/linux/smccc_fn.h
@@ -0,0 +1,50 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2017 ARM Limited
+ */
+
+#ifndef __LINUX_SMCCC_FN_H
+#define __LINUX_SMCCC_FN_H
+
+/*
+ * Standard return values
+ */
+
+#define SMCCC_STD_RET_SUCCESS  0
+#define SMCCC_STD_RET_UNKNOWN_ID   -1
+
+
+/*
+ * SMC32
+ */
+
+/* Standard hypervisor services interface */
+#define SMCCC32_STD_HYP_FN_BASE        0x85000000
+#define SMCCC32_STD_HYP_FN(n)  (SMCCC32_STD_HYP_FN_BASE + (n))
+
+/* Vendor specific hypervisor services interface */
+#define SMCCC32_VDR_HYP_FN_BASE        0x86000000
+#define SMCCC32_VDR_HYP_FN(n)  (SMCCC32_VDR_HYP_FN_BASE + (n))
+
+
+/*
+ * SMC64
+ */
+
+/* Standard hypervisor services interface */
+#define SMCCC64_STD_HYP_FN_BASE        0xc5000000
+#define SMCCC64_STD_HYP_FN(n)  (SMCCC64_STD_HYP_FN_BASE + (n))
+
+/* Vendor specific hypervisor services interface */
+#define SMCCC64_VDR_HYP_FN_BASE        0xc6000000
+#define SMCCC64_VDR_HYP_FN(n)  (SMCCC64_VDR_HYP_FN_BASE + (n))
+
+#endif /* __LINUX_SMCCC_FN_H */
--
1.9.1



Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

2017-08-25 Thread Florent Revest
Hi,

I just realised that my email client was not configured correctly and
the confidential disclaimer at the bottom of my emails obviously doesn't
apply. Sorry about that.

Florent



Re: [PATCH bpf-next v5 4/4] selftests/bpf: Add a selftest for the tracing bpf_get_socket_cookie

2021-01-20 Thread Florent Revest
On Wed, Jan 20, 2021 at 8:04 PM Alexei Starovoitov
 wrote:
>
> On Wed, Jan 20, 2021 at 9:08 AM KP Singh  wrote:
> >
> > On Tue, Jan 19, 2021 at 5:00 PM Florent Revest  wrote:
> > >
> > > This builds up on the existing socket cookie test which checks whether
> > > the bpf_get_socket_cookie helpers provide the same value in
> > > cgroup/connect6 and sockops programs for a socket created by the
> > > userspace part of the test.
> > >
> > > Adding a tracing program to the existing objects requires a different
> > > attachment strategy and different headers.
> > >
> > > Signed-off-by: Florent Revest 
> >
> > Acked-by: KP Singh 
> >
> > (one minor note, doesn't really need fixing as a part of this though)
> >
> > > ---
> > >  .../selftests/bpf/prog_tests/socket_cookie.c  | 24 +++
> > >  .../selftests/bpf/progs/socket_cookie_prog.c  | 41 ---
> > >  2 files changed, 52 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
> > > b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> > > index 53d0c44e7907..e5c5e2ea1deb 100644
> > > --- a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> > > +++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> > > @@ -15,8 +15,8 @@ struct socket_cookie {
> > >
> > >  void test_socket_cookie(void)
> > >  {
> > > +   struct bpf_link *set_link, *update_sockops_link, 
> > > *update_tracing_link;
> > > socklen_t addr_len = sizeof(struct sockaddr_in6);
> > > -   struct bpf_link *set_link, *update_link;
> > > int server_fd, client_fd, cgroup_fd;
> > > struct socket_cookie_prog *skel;
> > > __u32 cookie_expected_value;
> > > @@ -39,15 +39,21 @@ void test_socket_cookie(void)
> > >   PTR_ERR(set_link)))
> > > goto close_cgroup_fd;
> > >
> > > -   update_link = 
> > > bpf_program__attach_cgroup(skel->progs.update_cookie,
> > > -cgroup_fd);
> > > -   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err 
> > > %ld\n",
> > > - PTR_ERR(update_link)))
> > > +   update_sockops_link = bpf_program__attach_cgroup(
> > > +   skel->progs.update_cookie_sockops, cgroup_fd);
> > > +   if (CHECK(IS_ERR(update_sockops_link), 
> > > "update-sockops-link-cg-attach",
> > > + "err %ld\n", PTR_ERR(update_sockops_link)))
> > > goto free_set_link;
> > >
> > > +   update_tracing_link = bpf_program__attach(
> > > +   skel->progs.update_cookie_tracing);
> > > +   if (CHECK(IS_ERR(update_tracing_link), 
> > > "update-tracing-link-attach",
> > > + "err %ld\n", PTR_ERR(update_tracing_link)))
> > > +   goto free_update_sockops_link;
> > > +
> > > server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
> > > if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
> > > -   goto free_update_link;
> > > +   goto free_update_tracing_link;
> > >
> > > client_fd = connect_to_fd(server_fd, 0);
> > > if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
> > > @@ -71,8 +77,10 @@ void test_socket_cookie(void)
> > > close(client_fd);
> > >  close_server_fd:
> > > close(server_fd);
> > > -free_update_link:
> > > -   bpf_link__destroy(update_link);
> > > +free_update_tracing_link:
> > > +   bpf_link__destroy(update_tracing_link);
> >
> > I don't think this need to block submission unless there are other
> > issues but the
> > bpf_link__destroy can just be called in a single cleanup label because
> > it handles null or
> > erroneous inputs:
> >
> > int bpf_link__destroy(struct bpf_link *link)
> > {
> > int err = 0;
> >
> > if (IS_ERR_OR_NULL(link))
> >  return 0;
> > [...]
>
> +1 to KP's point.
>
> Also Florent, how did you test it?
> This test fails in CI and in my manual run:
> ./test_progs -t cook
> libbpf: load bpf program failed: Permission denied
> libbpf: -- BEGIN DUMP 

Re: [PATCH bpf-next v4 2/4] bpf: Expose bpf_get_socket_cookie to tracing programs

2021-01-19 Thread Florent Revest
On Wed, Dec 9, 2020 at 5:35 PM Daniel Borkmann  wrote:
>
> On 12/9/20 2:26 PM, Florent Revest wrote:
> > This needs two new helpers, one that works in a sleepable context (using
> > sock_gen_cookie which disables/enables preemption) and one that does not
> > (for performance reasons). Both take a struct sock pointer and need to
> > check it for NULLness.
> >
> > This helper could also be useful to other BPF program types such as LSM.
>
> Looks like this commit description is now stale and needs to be updated
> since we only really add one helper?
>
> > Signed-off-by: Florent Revest 
> > ---
> >   include/linux/bpf.h|  1 +
> >   include/uapi/linux/bpf.h   |  7 +++
> >   kernel/trace/bpf_trace.c   |  2 ++
> >   net/core/filter.c  | 12 
> >   tools/include/uapi/linux/bpf.h |  7 +++
> >   5 files changed, 29 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 07cb5d15e743..5a858e8c3f1a 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1860,6 +1860,7 @@ extern const struct bpf_func_proto 
> > bpf_per_cpu_ptr_proto;
> >   extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
> >   extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
> >   extern const struct bpf_func_proto bpf_sock_from_file_proto;
> > +extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
> >
> >   const struct bpf_func_proto *bpf_tracing_func_proto(
> >   enum bpf_func_id func_id, const struct bpf_prog *prog);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ba59309f4d18..9ac66cf25959 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1667,6 +1667,13 @@ union bpf_attr {
> >*  Return
> >*  A 8-byte long unique number.
> >*
> > + * u64 bpf_get_socket_cookie(void *sk)
> > + *   Description
> > + *   Equivalent to **bpf_get_socket_cookie**\ () helper that 
> > accepts
> > + *   *sk*, but gets socket from a BTF **struct sock**.
>
> Maybe add a small comment that this one also works for sleepable [tracing] 
> progs?
>
> > + *   Return
> > + *   A 8-byte long unique number.
>
> ... or 0 if *sk* is NULL.

Argh, I somehow missed this email during my holidays; I'm sending a
v5. Thank you, Daniel!


[PATCH bpf-next v5 4/4] selftests/bpf: Add a selftest for the tracing bpf_get_socket_cookie

2021-01-19 Thread Florent Revest
This builds up on the existing socket cookie test which checks whether
the bpf_get_socket_cookie helpers provide the same value in
cgroup/connect6 and sockops programs for a socket created by the
userspace part of the test.

Adding a tracing program to the existing objects requires a different
attachment strategy and different headers.

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/socket_cookie.c  | 24 +++
 .../selftests/bpf/progs/socket_cookie_prog.c  | 41 ---
 2 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
index 53d0c44e7907..e5c5e2ea1deb 100644
--- a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -15,8 +15,8 @@ struct socket_cookie {
 
 void test_socket_cookie(void)
 {
+   struct bpf_link *set_link, *update_sockops_link, *update_tracing_link;
socklen_t addr_len = sizeof(struct sockaddr_in6);
-   struct bpf_link *set_link, *update_link;
int server_fd, client_fd, cgroup_fd;
struct socket_cookie_prog *skel;
__u32 cookie_expected_value;
@@ -39,15 +39,21 @@ void test_socket_cookie(void)
  PTR_ERR(set_link)))
goto close_cgroup_fd;
 
-   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
-cgroup_fd);
-   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
- PTR_ERR(update_link)))
+   update_sockops_link = bpf_program__attach_cgroup(
+   skel->progs.update_cookie_sockops, cgroup_fd);
+   if (CHECK(IS_ERR(update_sockops_link), "update-sockops-link-cg-attach",
+ "err %ld\n", PTR_ERR(update_sockops_link)))
goto free_set_link;
 
+   update_tracing_link = bpf_program__attach(
+   skel->progs.update_cookie_tracing);
+   if (CHECK(IS_ERR(update_tracing_link), "update-tracing-link-attach",
+ "err %ld\n", PTR_ERR(update_tracing_link)))
+   goto free_update_sockops_link;
+
server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
-   goto free_update_link;
+   goto free_update_tracing_link;
 
client_fd = connect_to_fd(server_fd, 0);
if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
@@ -71,8 +77,10 @@ void test_socket_cookie(void)
close(client_fd);
 close_server_fd:
close(server_fd);
-free_update_link:
-   bpf_link__destroy(update_link);
+free_update_tracing_link:
+   bpf_link__destroy(update_tracing_link);
+free_update_sockops_link:
+   bpf_link__destroy(update_sockops_link);
 free_set_link:
bpf_link__destroy(set_link);
 close_cgroup_fd:
diff --git a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c 
b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
index 81e84be6f86d..1f770b732cb1 100644
--- a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
+++ b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
-#include 
-#include 
+#include "vmlinux.h"
 
 #include 
 #include 
+#include 
+
+#define AF_INET6 10
 
 struct socket_cookie {
__u64 cookie_key;
@@ -19,6 +21,14 @@ struct {
__type(value, struct socket_cookie);
 } socket_cookies SEC(".maps");
 
+/*
+ * These three programs get executed in a row on connect() syscalls. The
+ * userspace side of the test creates a client socket, issues a connect() on it
+ * and then checks that the local storage associated with this socket has:
+ * cookie_value == local_port << 8 | 0xFF
+ * The different parts of this cookie_value are appended by those hooks if they
+ * all agree on the output of bpf_get_socket_cookie().
+ */
 SEC("cgroup/connect6")
 int set_cookie(struct bpf_sock_addr *ctx)
 {
@@ -32,14 +42,14 @@ int set_cookie(struct bpf_sock_addr *ctx)
if (!p)
return 1;
 
-   p->cookie_value = 0xFF;
+   p->cookie_value = 0xF;
p->cookie_key = bpf_get_socket_cookie(ctx);
 
return 1;
 }
 
 SEC("sockops")
-int update_cookie(struct bpf_sock_ops *ctx)
+int update_cookie_sockops(struct bpf_sock_ops *ctx)
 {
struct bpf_sock *sk;
struct socket_cookie *p;
@@ -60,9 +70,30 @@ int update_cookie(struct bpf_sock_ops *ctx)
if (p->cookie_key != bpf_get_socket_cookie(ctx))
return 1;
 
-   p->cookie_value = (ctx->local_port << 8) | p->cookie_value;
+   p->cookie_value |= (ctx->local_port << 8);
 
 

[PATCH bpf-next v5 1/4] bpf: Be less specific about socket cookies guarantees

2021-01-19 Thread Florent Revest
Since "92acdc58ab11 bpf, net: Rework cookie generator as per-cpu one"
socket cookies are not guaranteed to be non-decreasing. The
bpf_get_socket_cookie helper descriptions are currently specifying that
cookies are non-decreasing but we don't want users to rely on that.

Reported-by: Daniel Borkmann 
Signed-off-by: Florent Revest 
---
 include/uapi/linux/bpf.h   | 8 
 tools/include/uapi/linux/bpf.h | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c001766adcbc..0b735c2729b2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1656,22 +1656,22 @@ union bpf_attr {
  * networking traffic statistics as it provides a global socket
  * identifier that can be assumed unique.
  * Return
- * A 8-byte long non-decreasing number on success, or 0 if the
- * socket field is missing inside *skb*.
+ * A 8-byte long unique number on success, or 0 if the socket
+ * field is missing inside *skb*.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
  * Description
  * Equivalent to bpf_get_socket_cookie() helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_addr** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
  * Description
  * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_ops** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c001766adcbc..0b735c2729b2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1656,22 +1656,22 @@ union bpf_attr {
  * networking traffic statistics as it provides a global socket
  * identifier that can be assumed unique.
  * Return
- * A 8-byte long non-decreasing number on success, or 0 if the
- * socket field is missing inside *skb*.
+ * A 8-byte long unique number on success, or 0 if the socket
+ * field is missing inside *skb*.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
  * Description
  * Equivalent to bpf_get_socket_cookie() helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_addr** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
  * Description
  * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_ops** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
-- 
2.30.0.284.gd98b1dd5eaa7-goog



[PATCH bpf-next v5 3/4] selftests/bpf: Integrate the socket_cookie test to test_progs

2021-01-19 Thread Florent Revest
Currently, the selftest for the BPF socket_cookie helpers is built and
run independently from test_progs. It's easy to forget and hard to
maintain.

This patch moves the socket cookies test into prog_tests/ and vastly
simplifies its logic by:
- rewriting the loading code with BPF skeletons
- rewriting the server/client code with network helpers
- rewriting the cgroup code with test__join_cgroup
- rewriting the error handling code with CHECKs

Signed-off-by: Florent Revest 
---
 tools/testing/selftests/bpf/Makefile  |   3 +-
 .../selftests/bpf/prog_tests/socket_cookie.c  |  82 +++
 .../selftests/bpf/progs/socket_cookie_prog.c  |   2 -
 .../selftests/bpf/test_socket_cookie.c| 208 --
 4 files changed, 83 insertions(+), 212 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/socket_cookie.c
 delete mode 100644 tools/testing/selftests/bpf/test_socket_cookie.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 63d6288e419c..af00fe3b7fb9 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -33,7 +33,7 @@ LDLIBS += -lcap -lelf -lz -lrt -lpthread
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_verifier_log test_dev_cgroup \
-   test_sock test_sockmap get_cgroup_id_user test_socket_cookie \
+   test_sock test_sockmap get_cgroup_id_user \
test_cgroup_storage \
test_netcnt test_tcpnotify_user test_sysctl \
test_progs-no_alu32
@@ -187,7 +187,6 @@ $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
 $(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c
 $(OUTPUT)/test_sock: cgroup_helpers.c
 $(OUTPUT)/test_sock_addr: cgroup_helpers.c
-$(OUTPUT)/test_socket_cookie: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
 $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
new file mode 100644
index ..53d0c44e7907
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Google LLC.
+// Copyright (c) 2018 Facebook
+
+#include 
+#include "socket_cookie_prog.skel.h"
+#include "network_helpers.h"
+
+static int duration;
+
+struct socket_cookie {
+   __u64 cookie_key;
+   __u32 cookie_value;
+};
+
+void test_socket_cookie(void)
+{
+   socklen_t addr_len = sizeof(struct sockaddr_in6);
+   struct bpf_link *set_link, *update_link;
+   int server_fd, client_fd, cgroup_fd;
+   struct socket_cookie_prog *skel;
+   __u32 cookie_expected_value;
+   struct sockaddr_in6 addr;
+   struct socket_cookie val;
+   int err = 0;
+
+   skel = socket_cookie_prog__open_and_load();
+   if (CHECK(!skel, "socket_cookie_prog__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   cgroup_fd = test__join_cgroup("/socket_cookie");
+   if (CHECK(cgroup_fd < 0, "join_cgroup", "cgroup creation failed\n"))
+   goto destroy_skel;
+
+   set_link = bpf_program__attach_cgroup(skel->progs.set_cookie,
+ cgroup_fd);
+   if (CHECK(IS_ERR(set_link), "set-link-cg-attach", "err %ld\n",
+ PTR_ERR(set_link)))
+   goto close_cgroup_fd;
+
+   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
+cgroup_fd);
+   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
+ PTR_ERR(update_link)))
+   goto free_set_link;
+
+   server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+   if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
+   goto free_update_link;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
+   goto close_server_fd;
+
+   err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.socket_cookies),
+ &client_fd, &val);
+   if (CHECK(err, "map_lookup", "err %d errno %d\n", err, errno))
+   goto close_client_fd;
+
+   err = getsockname(client_fd, (struct sockaddr *)&addr, &addr_len);
+   if (CHECK(err, "getsockname", "Can't get client local addr\n"))
+   goto close_client_fd;
+
+   cookie_expected_value = (ntohs(addr.sin6_port) << 8) | 0xFF;
+   CHECK(val.cookie_value != cookie_expected_value, "",
+

[PATCH bpf-next v5 2/4] bpf: Expose bpf_get_socket_cookie to tracing programs

2021-01-19 Thread Florent Revest
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL
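
For context, a minimal sketch of how a tracing program could use the helper;
the hook and section name below are only an example (close to what the
selftest in patch 4/4 does), not part of this patch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("fexit/inet_stream_connect")
int BPF_PROG(example_trace_connect, struct socket *sock,
             struct sockaddr *uaddr, int addr_len, int flags)
{
        /* sock->sk is a BTF pointer, which the new helper accepts. */
        __u64 cookie = bpf_get_socket_cookie(sock->sk);

        if (cookie)
                bpf_printk("socket cookie: %llu", cookie);
        return 0;
}

char _license[] SEC("license") = "GPL";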

Signed-off-by: Florent Revest 
---
 include/linux/bpf.h|  1 +
 include/uapi/linux/bpf.h   |  8 
 kernel/trace/bpf_trace.c   |  2 ++
 net/core/filter.c  | 12 
 tools/include/uapi/linux/bpf.h |  8 
 5 files changed, 31 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1aac2af12fed..26219465e1f7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1874,6 +1874,7 @@ extern const struct bpf_func_proto bpf_per_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
 extern const struct bpf_func_proto bpf_sock_from_file_proto;
+extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b735c2729b2..5855c398d685 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1673,6 +1673,14 @@ union bpf_attr {
  * Return
  * A 8-byte long unique number.
  *
+ * u64 bpf_get_socket_cookie(void *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**. This helper
+ * also works for sleepable programs.
+ * Return
+ * A 8-byte long unique number or 0 if *sk* is NULL.
+ *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
  * The owner UID of the socket associated to *skb*. If the socket
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6c0018abe68a..845b2168e006 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1760,6 +1760,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_sk_storage_delete_tracing_proto;
case BPF_FUNC_sock_from_file:
return &bpf_sock_from_file_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_ptr_cookie_proto;
 #endif
case BPF_FUNC_seq_printf:
return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index 9ab94e90d660..606e2b6115ed 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4631,6 +4631,18 @@ static const struct bpf_func_proto 
bpf_get_socket_cookie_sock_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_get_socket_ptr_cookie, struct sock *, sk)
+{
+   return sk ? sock_gen_cookie(sk) : 0;
+}
+
+const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto = {
+   .func   = bpf_get_socket_ptr_cookie,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+};
+
 BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
 {
return __sock_gen_cookie(ctx->sk);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0b735c2729b2..5855c398d685 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1673,6 +1673,14 @@ union bpf_attr {
  * Return
  * A 8-byte long unique number.
  *
+ * u64 bpf_get_socket_cookie(void *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**. This helper
+ * also works for sleepable programs.
+ * Return
+ * A 8-byte long unique number or 0 if *sk* is NULL.
+ *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
  * The owner UID of the socket associated to *skb*. If the socket
-- 
2.30.0.284.gd98b1dd5eaa7-goog



Re: [PATCH bpf-next v5 4/4] selftests/bpf: Add a selftest for the tracing bpf_get_socket_cookie

2021-01-22 Thread Florent Revest
On Wed, Jan 20, 2021 at 8:06 PM Florent Revest  wrote:
>
> On Wed, Jan 20, 2021 at 8:04 PM Alexei Starovoitov
>  wrote:
> >
> > On Wed, Jan 20, 2021 at 9:08 AM KP Singh  wrote:
> > >
> > > On Tue, Jan 19, 2021 at 5:00 PM Florent Revest  
> > > wrote:
> > > >
> > > > This builds up on the existing socket cookie test which checks whether
> > > > the bpf_get_socket_cookie helpers provide the same value in
> > > > cgroup/connect6 and sockops programs for a socket created by the
> > > > userspace part of the test.
> > > >
> > > > Adding a tracing program to the existing objects requires a different
> > > > attachment strategy and different headers.
> > > >
> > > > Signed-off-by: Florent Revest 
> > >
> > > Acked-by: KP Singh 
> > >
> > > (one minor note, doesn't really need fixing as a part of this though)
> > >
> > > > ---
> > > >  .../selftests/bpf/prog_tests/socket_cookie.c  | 24 +++
> > > >  .../selftests/bpf/progs/socket_cookie_prog.c  | 41 ---
> > > >  2 files changed, 52 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
> > > > b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> > > > index 53d0c44e7907..e5c5e2ea1deb 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> > > > @@ -15,8 +15,8 @@ struct socket_cookie {
> > > >
> > > >  void test_socket_cookie(void)
> > > >  {
> > > > +   struct bpf_link *set_link, *update_sockops_link, 
> > > > *update_tracing_link;
> > > > socklen_t addr_len = sizeof(struct sockaddr_in6);
> > > > -   struct bpf_link *set_link, *update_link;
> > > > int server_fd, client_fd, cgroup_fd;
> > > > struct socket_cookie_prog *skel;
> > > > __u32 cookie_expected_value;
> > > > @@ -39,15 +39,21 @@ void test_socket_cookie(void)
> > > >   PTR_ERR(set_link)))
> > > > goto close_cgroup_fd;
> > > >
> > > > -   update_link = 
> > > > bpf_program__attach_cgroup(skel->progs.update_cookie,
> > > > -cgroup_fd);
> > > > -   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err 
> > > > %ld\n",
> > > > - PTR_ERR(update_link)))
> > > > +   update_sockops_link = bpf_program__attach_cgroup(
> > > > +   skel->progs.update_cookie_sockops, cgroup_fd);
> > > > +   if (CHECK(IS_ERR(update_sockops_link), 
> > > > "update-sockops-link-cg-attach",
> > > > + "err %ld\n", PTR_ERR(update_sockops_link)))
> > > > goto free_set_link;
> > > >
> > > > +   update_tracing_link = bpf_program__attach(
> > > > +   skel->progs.update_cookie_tracing);
> > > > +   if (CHECK(IS_ERR(update_tracing_link), 
> > > > "update-tracing-link-attach",
> > > > + "err %ld\n", PTR_ERR(update_tracing_link)))
> > > > +   goto free_update_sockops_link;
> > > > +
> > > > server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
> > > > if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
> > > > -   goto free_update_link;
> > > > +   goto free_update_tracing_link;
> > > >
> > > > client_fd = connect_to_fd(server_fd, 0);
> > > > if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
> > > > @@ -71,8 +77,10 @@ void test_socket_cookie(void)
> > > > close(client_fd);
> > > >  close_server_fd:
> > > > close(server_fd);
> > > > -free_update_link:
> > > > -   bpf_link__destroy(update_link);
> > > > +free_update_tracing_link:
> > > > +   bpf_link__destroy(update_tracing_link);
> > >
> > > I don't think this need to block submission unless there are other
> > > issues but the
> > > bpf_link__destroy can just be called in a single clean

Re: [PATCH bpf-next v5 3/4] selftests/bpf: Integrate the socket_cookie test to test_progs

2021-01-22 Thread Florent Revest
On Thu, Jan 21, 2021 at 8:55 AM Andrii Nakryiko
 wrote:
>
> On Tue, Jan 19, 2021 at 8:00 AM Florent Revest  wrote:
> >
> > Currently, the selftest for the BPF socket_cookie helpers is built and
> > run independently from test_progs. It's easy to forget and hard to
> > maintain.
> >
> > This patch moves the socket cookies test into prog_tests/ and vastly
> > simplifies its logic by:
> > - rewriting the loading code with BPF skeletons
> > - rewriting the server/client code with network helpers
> > - rewriting the cgroup code with test__join_cgroup
> > - rewriting the error handling code with CHECKs
> >
> > Signed-off-by: Florent Revest 
> > ---
>
> Few nits below regarding skeleton and ASSERT_xxx usage.
>
> >  tools/testing/selftests/bpf/Makefile  |   3 +-
> >  .../selftests/bpf/prog_tests/socket_cookie.c  |  82 +++
> >  .../selftests/bpf/progs/socket_cookie_prog.c  |   2 -
> >  .../selftests/bpf/test_socket_cookie.c| 208 --
>
> please also update .gitignore

Good catch!

> >  4 files changed, 83 insertions(+), 212 deletions(-)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/socket_cookie.c
> >  delete mode 100644 tools/testing/selftests/bpf/test_socket_cookie.c
> >
>
> [...]
>
> > +
> > +   skel = socket_cookie_prog__open_and_load();
> > +   if (CHECK(!skel, "socket_cookie_prog__open_and_load",
> > + "skeleton open_and_load failed\n"))
>
> nit: ASSERT_PTR_OK

Ah great, I find the ASSERT semantic much easier to follow than CHECKs.

> > +   return;
> > +
> > +   cgroup_fd = test__join_cgroup("/socket_cookie");
> > +   if (CHECK(cgroup_fd < 0, "join_cgroup", "cgroup creation failed\n"))
> > +   goto destroy_skel;
> > +
> > +   set_link = bpf_program__attach_cgroup(skel->progs.set_cookie,
> > + cgroup_fd);
>
> you can use skel->links->set_cookie here and it will be auto-destroyed
> when the whole skeleton is destroyed. More simplification.

Sick. :)

> > +   if (CHECK(IS_ERR(set_link), "set-link-cg-attach", "err %ld\n",
> > + PTR_ERR(set_link)))
> > +   goto close_cgroup_fd;
> > +
> > +   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
> > +cgroup_fd);
>
> same as above, no need to maintain your link outside of skeleton
>
>
> > +   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
> > + PTR_ERR(update_link)))
> > +   goto free_set_link;
> > +
> > +   server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
> > +   if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
> > +   goto free_update_link;
> > +
> > +   client_fd = connect_to_fd(server_fd, 0);
> > +   if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
> > +   goto close_server_fd;
>
> nit: ASSERT_OK is nicer (here and in few other places)

Did you mean ASSERT_OK for the two following err checks?

ASSERT_OK does not seem right for a fd check where we want fd to be
positive. ASSERT_OK does: "bool ___ok = ___res == 0;"

I will keep my "CHECK(fd < 0" but maybe there could be an
ASSERT_POSITIVE that does "bool ___ok = ___res >= 0;"
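
A sketch of what that could look like, following the existing ASSERT_*
pattern in test_progs.h (illustrative only):

#define ASSERT_POSITIVE(res, name) ({                                   \
        static int duration = 0;                                        \
        long long ___res = (res);                                       \
        bool ___ok = ___res >= 0;                                       \
        CHECK(!___ok, (name), "unexpected negative value: %lld\n",      \
              ___res);                                                  \
        ___ok;                                                          \
})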

> > +
> > +   err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.socket_cookies),
> > + &client_fd, &val);
> > +   if (CHECK(err, "map_lookup", "err %d errno %d\n", err, errno))
> > +   goto close_client_fd;
> > +
> > +   err = getsockname(client_fd, (struct sockaddr *)&addr, &addr_len);
> > +   if (CHECK(err, "getsockname", "Can't get client local addr\n"))
> > +   goto close_client_fd;
> > +
> > +   cookie_expected_value = (ntohs(addr.sin6_port) << 8) | 0xFF;
> > +   CHECK(val.cookie_value != cookie_expected_value, "",
> > + "Unexpected value in map: %x != %x\n", val.cookie_value,
> > + cookie_expected_value);
>
> nit: ASSERT_NEQ is nicer

Indeed.

> > +
> > +close_client_fd:
> > +   close(client_fd);
> > +close_server_fd:
> > +   close(server_fd);
> > +free_update_link:
> > +   bpf_link__destroy(update_link);
> > +free_set_link:
> > +   bpf_link__destroy(set_link);
> > +close_cgroup_fd:
> > +   close(cgroup_fd);
> > +destroy_skel:
> > +   socket_cookie_prog__destroy(skel);
> > +}
>
> [...]


Re: [PATCH] ima: Fix NULL pointer dereference in ima_file_hash

2020-09-16 Thread Florent Revest
Reviewed-by: Florent Revest 

On Wed, 2020-09-16 at 12:05 +, KP Singh wrote:
> From: KP Singh 
> 
> ima_file_hash can be called when there is no iint->ima_hash available
> even though the inode exists in the integrity cache.
> 
> An example where this can happen (suggested by Jann Horn):
> 
> Process A does:
> 
>   while(1) {
>   unlink("/tmp/imafoo");
>   fd = open("/tmp/imafoo", O_RDWR|O_CREAT|O_TRUNC, 0700);
>   if (fd == -1) {
>   perror("open");
>   continue;
>   }
>   write(fd, "A", 1);
>   close(fd);
>   }
> 
> and Process B does:
> 
>   while (1) {
>   int fd = open("/tmp/imafoo", O_RDONLY);
>   if (fd == -1)
>   continue;
>   char *mapping = mmap(NULL, 0x1000, PROT_READ|PROT_EXEC,
>MAP_PRIVATE, fd, 0);
>   if (mapping != MAP_FAILED)
>   munmap(mapping, 0x1000);
>   close(fd);
>   }
> 
> Due to the race to get the iint->mutex between ima_file_hash and
> process_measurement iint->ima_hash could still be NULL.
> 
> Fixes: 6beea7afcc72 ("ima: add the ability to query the cached hash
> of a given file")
> Signed-off-by: KP Singh 
> ---
>  security/integrity/ima/ima_main.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/security/integrity/ima/ima_main.c
> b/security/integrity/ima/ima_main.c
> index 8a91711ca79b..4c86cd4eece0 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -531,6 +531,16 @@ int ima_file_hash(struct file *file, char *buf,
> size_t buf_size)
>   return -EOPNOTSUPP;
>  
>   mutex_lock(&iint->mutex);
> +
> + /*
> +  * ima_file_hash can be called when ima_collect_measurement has
> still
> +  * not been called, we might not always have a hash.
> +  */
> + if (!iint->ima_hash) {
> + mutex_unlock(&iint->mutex);
> + return -EOPNOTSUPP;
> + }
> +
>   if (buf) {
>   size_t copied_size;
>  



Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-11 Thread Florent Revest
On Wed, Dec 2, 2020 at 10:18 PM Alexei Starovoitov
 wrote:
> I still think that adopting printk/vsnprintf for this instead of
> reinventing the wheel
> is more flexible and easier to maintain long term.
> Almost the same layout can be done with vsnprintf
> with exception of \0 char.
> More meaningful names, etc.
> See Documentation/core-api/printk-formats.rst

I agree this would be nice. I finally got a bit of time to experiment
with this and I noticed a few things:

First of all, because helpers only have 5 arguments, if we use two for
the output buffer and its size and two for the format string and its
size, we are only left with one argument for a modifier. This is still
enough for our use case (where we'd only use "%ps" for example) but it
does not, strictly speaking, allow for the same layout that Andrii
proposed.
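
(A purely hypothetical prototype, not an existing or proposed helper, just
to illustrate this five-argument constraint:

    long bpf_kallsyms_snprintf(char *out, u32 out_size,
                               const char *fmt, u32 fmt_size, u64 arg);

Two arguments go to the output buffer, two to the format string, which
leaves a single u64 for the value to format, e.g. with "%ps".)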

> If we force fmt to come from readonly map then bpf_trace_printk()-like
> run-time check of fmt string can be moved into load time check
> and performance won't suffer.

Regarding this bit, I have the impression that this would not be
possible, but maybe I'm missing something? :)

The iteration that bpf_trace_printk does over the format string
argument is not only used for validation. It is also used to remember
what extra operations need to be done based on the modifier types. For
example, it remembers whether an arg should be interpreted as 32bits or
64bits. In the case of string printing, it also remembers whether it is
a kernel-space or user-space pointer so that bpf_trace_copy_string can
be called with the right arg. If we were to run the iteration over the format
string in the verifier, how would you recommend that we
"remember" the modifier type until the helper gets called ?


Re: [PATCH bpf-next v3 2/4] bpf: Expose bpf_get_socket_cookie to tracing programs

2020-12-09 Thread Florent Revest
On Tue, 2020-12-08 at 23:08 +0100, KP Singh wrote:
> My understanding is you can simply always call sock_gen_cookie and
> not have two protos.
> 
> This will disable preemption in sleepable programs and not have any
> effect in non-sleepable programs since preemption will already be
> disabled.

Sure, that works. I thought that providing two helper implementations would
slightly improve performance on non-sleepable programs, but I can send
a v4 with only one helper that calls sock_gen_cookie.



[PATCH bpf-next v4 3/4] selftests/bpf: Integrate the socket_cookie test to test_progs

2020-12-09 Thread Florent Revest
Currently, the selftest for the BPF socket_cookie helpers is built and
run independently from test_progs. It's easy to forget and hard to
maintain.

This patch moves the socket cookies test into prog_tests/ and vastly
simplifies its logic by:
- rewriting the loading code with BPF skeletons
- rewriting the server/client code with network helpers
- rewriting the cgroup code with test__join_cgroup
- rewriting the error handling code with CHECKs

Signed-off-by: Florent Revest 
---
 tools/testing/selftests/bpf/Makefile  |   3 +-
 .../selftests/bpf/prog_tests/socket_cookie.c  |  82 +++
 .../selftests/bpf/progs/socket_cookie_prog.c  |   2 -
 .../selftests/bpf/test_socket_cookie.c| 208 --
 4 files changed, 83 insertions(+), 212 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/socket_cookie.c
 delete mode 100644 tools/testing/selftests/bpf/test_socket_cookie.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index ac25ba5d0d6c..c21960d5f286 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -33,7 +33,7 @@ LDLIBS += -lcap -lelf -lz -lrt -lpthread
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_verifier_log test_dev_cgroup \
-   test_sock test_sockmap get_cgroup_id_user test_socket_cookie \
+   test_sock test_sockmap get_cgroup_id_user \
test_cgroup_storage \
test_netcnt test_tcpnotify_user test_sysctl \
test_progs-no_alu32 \
@@ -167,7 +167,6 @@ $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
 $(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c
 $(OUTPUT)/test_sock: cgroup_helpers.c
 $(OUTPUT)/test_sock_addr: cgroup_helpers.c
-$(OUTPUT)/test_socket_cookie: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
 $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
new file mode 100644
index ..53d0c44e7907
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Google LLC.
+// Copyright (c) 2018 Facebook
+
+#include 
+#include "socket_cookie_prog.skel.h"
+#include "network_helpers.h"
+
+static int duration;
+
+struct socket_cookie {
+   __u64 cookie_key;
+   __u32 cookie_value;
+};
+
+void test_socket_cookie(void)
+{
+   socklen_t addr_len = sizeof(struct sockaddr_in6);
+   struct bpf_link *set_link, *update_link;
+   int server_fd, client_fd, cgroup_fd;
+   struct socket_cookie_prog *skel;
+   __u32 cookie_expected_value;
+   struct sockaddr_in6 addr;
+   struct socket_cookie val;
+   int err = 0;
+
+   skel = socket_cookie_prog__open_and_load();
+   if (CHECK(!skel, "socket_cookie_prog__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   cgroup_fd = test__join_cgroup("/socket_cookie");
+   if (CHECK(cgroup_fd < 0, "join_cgroup", "cgroup creation failed\n"))
+   goto destroy_skel;
+
+   set_link = bpf_program__attach_cgroup(skel->progs.set_cookie,
+ cgroup_fd);
+   if (CHECK(IS_ERR(set_link), "set-link-cg-attach", "err %ld\n",
+ PTR_ERR(set_link)))
+   goto close_cgroup_fd;
+
+   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
+cgroup_fd);
+   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
+ PTR_ERR(update_link)))
+   goto free_set_link;
+
+   server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+   if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
+   goto free_update_link;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
+   goto close_server_fd;
+
+   err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.socket_cookies),
+ &client_fd, &val);
+   if (CHECK(err, "map_lookup", "err %d errno %d\n", err, errno))
+   goto close_client_fd;
+
+   err = getsockname(client_fd, (struct sockaddr *)&addr, &addr_len);
+   if (CHECK(err, "getsockname", "Can't get client local addr\n"))
+   goto close_client_fd;
+
+   cookie_expected_value = (ntohs(addr.sin6_port) << 8) | 0xFF;
+   CHECK(val.cookie_value != cookie_expected_value, "",
+

[PATCH bpf-next v4 2/4] bpf: Expose bpf_get_socket_cookie to tracing programs

2020-12-09 Thread Florent Revest
This needs two new helpers, one that works in a sleepable context (using
sock_gen_cookie which disables/enables preemption) and one that does not
(for performance reasons). Both take a struct sock pointer and need to
check it for NULLness.

This helper could also be useful to other BPF program types such as LSM.

Signed-off-by: Florent Revest 
---
 include/linux/bpf.h|  1 +
 include/uapi/linux/bpf.h   |  7 +++
 kernel/trace/bpf_trace.c   |  2 ++
 net/core/filter.c  | 12 
 tools/include/uapi/linux/bpf.h |  7 +++
 5 files changed, 29 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 07cb5d15e743..5a858e8c3f1a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1860,6 +1860,7 @@ extern const struct bpf_func_proto bpf_per_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
 extern const struct bpf_func_proto bpf_sock_from_file_proto;
+extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba59309f4d18..9ac66cf25959 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1667,6 +1667,13 @@ union bpf_attr {
  * Return
  * A 8-byte long unique number.
  *
+ * u64 bpf_get_socket_cookie(void *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**.
+ * Return
+ * A 8-byte long unique number.
+ *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
  * The owner UID of the socket associated to *skb*. If the socket
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 52ddd217d6a1..be5e96de306d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1760,6 +1760,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_sk_storage_delete_tracing_proto;
case BPF_FUNC_sock_from_file:
return &bpf_sock_from_file_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_ptr_cookie_proto;
 #endif
case BPF_FUNC_seq_printf:
return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index 255aeee72402..13ad9a64f04f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4631,6 +4631,18 @@ static const struct bpf_func_proto 
bpf_get_socket_cookie_sock_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_get_socket_ptr_cookie, struct sock *, sk)
+{
+   return sk ? sock_gen_cookie(sk) : 0;
+}
+
+const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto = {
+   .func   = bpf_get_socket_ptr_cookie,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+};
+
 BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
 {
return __sock_gen_cookie(ctx->sk);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ba59309f4d18..9ac66cf25959 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1667,6 +1667,13 @@ union bpf_attr {
  * Return
  * A 8-byte long unique number.
  *
+ * u64 bpf_get_socket_cookie(void *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**.
+ * Return
+ * A 8-byte long unique number.
+ *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
  * The owner UID of the socket associated to *skb*. If the socket
-- 
2.29.2.576.ga3fc446d84-goog



[PATCH bpf-next v4 4/4] selftests/bpf: Add a selftest for the tracing bpf_get_socket_cookie

2020-12-09 Thread Florent Revest
This builds on the existing socket cookie test, which checks whether
the bpf_get_socket_cookie helpers provide the same value in
cgroup/connect6 and sockops programs for a socket created by the
userspace part of the test.

Adding a tracing program to the existing objects requires a different
attachment strategy and different headers.

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/socket_cookie.c  | 24 +++
 .../selftests/bpf/progs/socket_cookie_prog.c  | 41 ---
 2 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
index 53d0c44e7907..e5c5e2ea1deb 100644
--- a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -15,8 +15,8 @@ struct socket_cookie {
 
 void test_socket_cookie(void)
 {
+   struct bpf_link *set_link, *update_sockops_link, *update_tracing_link;
socklen_t addr_len = sizeof(struct sockaddr_in6);
-   struct bpf_link *set_link, *update_link;
int server_fd, client_fd, cgroup_fd;
struct socket_cookie_prog *skel;
__u32 cookie_expected_value;
@@ -39,15 +39,21 @@ void test_socket_cookie(void)
  PTR_ERR(set_link)))
goto close_cgroup_fd;
 
-   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
-cgroup_fd);
-   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
- PTR_ERR(update_link)))
+   update_sockops_link = bpf_program__attach_cgroup(
+   skel->progs.update_cookie_sockops, cgroup_fd);
+   if (CHECK(IS_ERR(update_sockops_link), "update-sockops-link-cg-attach",
+ "err %ld\n", PTR_ERR(update_sockops_link)))
goto free_set_link;
 
+   update_tracing_link = bpf_program__attach(
+   skel->progs.update_cookie_tracing);
+   if (CHECK(IS_ERR(update_tracing_link), "update-tracing-link-attach",
+ "err %ld\n", PTR_ERR(update_tracing_link)))
+   goto free_update_sockops_link;
+
server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
-   goto free_update_link;
+   goto free_update_tracing_link;
 
client_fd = connect_to_fd(server_fd, 0);
if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
@@ -71,8 +77,10 @@ void test_socket_cookie(void)
close(client_fd);
 close_server_fd:
close(server_fd);
-free_update_link:
-   bpf_link__destroy(update_link);
+free_update_tracing_link:
+   bpf_link__destroy(update_tracing_link);
+free_update_sockops_link:
+   bpf_link__destroy(update_sockops_link);
 free_set_link:
bpf_link__destroy(set_link);
 close_cgroup_fd:
diff --git a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c 
b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
index 81e84be6f86d..1f770b732cb1 100644
--- a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
+++ b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
-#include 
-#include 
+#include "vmlinux.h"
 
 #include 
 #include 
+#include 
+
+#define AF_INET6 10
 
 struct socket_cookie {
__u64 cookie_key;
@@ -19,6 +21,14 @@ struct {
__type(value, struct socket_cookie);
 } socket_cookies SEC(".maps");
 
+/*
+ * These three programs get executed in a row on connect() syscalls. The
+ * userspace side of the test creates a client socket, issues a connect() on it
+ * and then checks that the local storage associated with this socket has:
+ * cookie_value == local_port << 8 | 0xFF
+ * The different parts of this cookie_value are appended by those hooks if they
+ * all agree on the output of bpf_get_socket_cookie().
+ */
 SEC("cgroup/connect6")
 int set_cookie(struct bpf_sock_addr *ctx)
 {
@@ -32,14 +42,14 @@ int set_cookie(struct bpf_sock_addr *ctx)
if (!p)
return 1;
 
-   p->cookie_value = 0xFF;
+   p->cookie_value = 0xF;
p->cookie_key = bpf_get_socket_cookie(ctx);
 
return 1;
 }
 
 SEC("sockops")
-int update_cookie(struct bpf_sock_ops *ctx)
+int update_cookie_sockops(struct bpf_sock_ops *ctx)
 {
struct bpf_sock *sk;
struct socket_cookie *p;
@@ -60,9 +70,30 @@ int update_cookie(struct bpf_sock_ops *ctx)
if (p->cookie_key != bpf_get_socket_cookie(ctx))
return 1;
 
-   p->cookie_value = (ctx->local_port << 8) | p->cookie_value;
+   p->cookie_value |= (ctx->local_port << 8);
 
 

[PATCH bpf-next v4 1/4] bpf: Be less specific about socket cookies guarantees

2020-12-09 Thread Florent Revest
Since "92acdc58ab11 bpf, net: Rework cookie generator as per-cpu one"
socket cookies are not guaranteed to be non-decreasing. The
bpf_get_socket_cookie helper descriptions are currently specifying that
cookies are non-decreasing but we don't want users to rely on that.

Reported-by: Daniel Borkmann 
Signed-off-by: Florent Revest 
---
 include/uapi/linux/bpf.h   | 8 
 tools/include/uapi/linux/bpf.h | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 30b477a26482..ba59309f4d18 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1650,22 +1650,22 @@ union bpf_attr {
  * networking traffic statistics as it provides a global socket
  * identifier that can be assumed unique.
  * Return
- * A 8-byte long non-decreasing number on success, or 0 if the
- * socket field is missing inside *skb*.
+ * A 8-byte long unique number on success, or 0 if the socket
+ * field is missing inside *skb*.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
  * Description
  * Equivalent to bpf_get_socket_cookie() helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_addr** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
  * Description
  * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_ops** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 30b477a26482..ba59309f4d18 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1650,22 +1650,22 @@ union bpf_attr {
  * networking traffic statistics as it provides a global socket
  * identifier that can be assumed unique.
  * Return
- * A 8-byte long non-decreasing number on success, or 0 if the
- * socket field is missing inside *skb*.
+ * A 8-byte long unique number on success, or 0 if the socket
+ * field is missing inside *skb*.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_addr *ctx)
  * Description
  * Equivalent to bpf_get_socket_cookie() helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_addr** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u64 bpf_get_socket_cookie(struct bpf_sock_ops *ctx)
  * Description
  * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
  * *skb*, but gets socket from **struct bpf_sock_ops** context.
  * Return
- * A 8-byte long non-decreasing number.
+ * A 8-byte long unique number.
  *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
-- 
2.29.2.576.ga3fc446d84-goog



Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-22 Thread Florent Revest
On Tue, Dec 22, 2020 at 3:18 PM Christoph Hellwig  wrote:
>
> FYI, there is a reason why kallsyms_lookup is not exported any more.
> I don't think adding that back through a backdoor is a good idea.

Did you maybe mean kallsyms_lookup_name (the one that looks an address
up based on a symbol name)? It is indeed no longer exported.
However, this is not what we're trying to do. As far as I can tell,
kallsyms_lookup (the one that looks a symbol name up based on an
address) has never been exported, but its close cousins sprint_symbol
and sprint_symbol_no_offset (which only call kallsyms_lookup and
pretty-print the result) are still exported; they are also used by
vsprintf. Is this an issue?
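
For reference, a simplified sketch of the relationship described above
(modeled loosely on kernel/kallsyms.c, error handling omitted): the
exported sprint_symbol() is essentially a pretty-printing wrapper
around the unexported kallsyms_lookup().

#include <linux/kallsyms.h>
#include <linux/kernel.h>

/* Roughly what sprint_symbol() does internally; illustrative only. */
static int sprint_symbol_sketch(char *buffer, unsigned long address)
{
	char namebuf[KSYM_NAME_LEN];
	unsigned long size, offset;
	char *modname;
	const char *name;

	name = kallsyms_lookup(address, &size, &offset, &modname, namebuf);
	if (!name)
		return sprintf(buffer, "0x%lx", address);

	if (modname)
		return sprintf(buffer, "%s+%#lx/%#lx [%s]",
			       name, offset, size, modname);
	return sprintf(buffer, "%s+%#lx/%#lx", name, offset, size);
}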


Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-22 Thread Florent Revest
On Fri, Dec 18, 2020 at 9:47 PM Andrii Nakryiko
 wrote:
>
> On Fri, Dec 18, 2020 at 12:36 PM Alexei Starovoitov
>  wrote:
> >
> > On Fri, Dec 18, 2020 at 10:53:57AM -0800, Andrii Nakryiko wrote:
> > > On Thu, Dec 17, 2020 at 7:20 PM Alexei Starovoitov
> > >  wrote:
> > > >
> > > > On Thu, Dec 17, 2020 at 09:26:09AM -0800, Yonghong Song wrote:
> > > > >
> > > > >
> > > > > On 12/17/20 7:31 AM, Florent Revest wrote:
> > > > > > On Mon, Dec 14, 2020 at 7:47 AM Yonghong Song  wrote:
> > > > > > > On 12/11/20 6:40 AM, Florent Revest wrote:
> > > > > > > > On Wed, Dec 2, 2020 at 10:18 PM Alexei Starovoitov
> > > > > > > >  wrote:
> > > > > > > > > I still think that adopting printk/vsnprintf for this instead 
> > > > > > > > > of
> > > > > > > > > reinventing the wheel
> > > > > > > > > is more flexible and easier to maintain long term.
> > > > > > > > > Almost the same layout can be done with vsnprintf
> > > > > > > > > with exception of \0 char.
> > > > > > > > > More meaningful names, etc.
> > > > > > > > > See Documentation/core-api/printk-formats.rst
> > > > > > > >
> > > > > > > > I agree this would be nice. I finally got a bit of time to 
> > > > > > > > experiment
> > > > > > > > with this and I noticed a few things:
> > > > > > > >
> > > > > > > > First of all, because helpers only have 5 arguments, if we use 
> > > > > > > > two for
> > > > > > > > the output buffer and its size and two for the format string 
> > > > > > > > and its
> > > > > > > > size, we are only left with one argument for a modifier. This 
> > > > > > > > is still
> > > > > > > > enough for our usecase (where we'd only use "%ps" for example) 
> > > > > > > > but it
> > > > > > > > does not strictly-speaking allow for the same layout that Andrii
> > > > > > > > proposed.
> > > > > > >
> > > > > > > See helper bpf_seq_printf. It packs all arguments for format 
> > > > > > > string and
> > > > > > > puts them into an array. bpf_seq_printf will unpack them as it 
> > > > > > > parsed
> > > > > > > through the format string. So it should be doable to have more 
> > > > > > > than
> > > > > > > "%ps" in format string.
> > > > > >
> > > > > > This could be a nice trick, thank you for the suggestion Yonghong :)
> > > > > >
> > > > > > My understanding is that this would also require two extra args (one
> > > > > > for the array of arguments and one for the size of this array) so it
> > > > > > would still not fit the 5 arguments limit I described in my previous
> > > > > > email.
> > > > > > eg: this would not be possible:
> > > > > > long bpf_snprintf(const char *out, u32 out_size,
> > > > > >const char *fmt, u32 fmt_size,
> > > > > >   const void *data, u32 data_len)
> > > > >
> > > > > Right. bpf allows only up to 5 parameters.
> > > > > >
> > > > > > Would you then suggest that we also put the format string and its
> > > > > > length in the first and second cells of this array and have 
> > > > > > something
> > > > > > along the line of:
> > > > > > long bpf_snprintf(const char *out, u32 out_size,
> > > > > >const void *args, u32 args_len) ?
> > > > > > This seems like a fairly opaque signature to me and harder to 
> > > > > > verify.
> > > > >
> > > > > One way is to define an explicit type for args, something like
> > > > >struct bpf_fmt_str_data {
> > > > >   char *fmt;
> > > > >   u64 fmt_len;
> > > > >   u64 data[];
> > > > >};
> > > >
> > > > that feels a bit convoluted.
> > > >
> &

Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-22 Thread Florent Revest
On Fri, Dec 18, 2020 at 4:20 AM Alexei Starovoitov
 wrote:
> As far as 6 arg issue:
> long bpf_snprintf(const char *out, u32 out_size,
>   const char *fmt, u32 fmt_size,
>   const void *data, u32 data_len);
> Yeah. It won't work as-is, but fmt_size is unnecessary nowadays.
> The verifier understands read-only data.
> Hence the helper can be:
> long bpf_snprintf(const char *out, u32 out_size,
>   const char *fmt,
>   const void *data, u32 data_len);
> The 3rd arg cannot be ARG_PTR_TO_MEM.
> Instead we can introduce ARG_PTR_TO_CONST_STR in the verifier.
> See check_mem_access() where it's doing bpf_map_direct_read().
> That 'fmt' string will be accessed through the same bpf_map_direct_read().
> The verifier would need to check that it's NUL-terminated valid string.

Ok, this works for me.

> It should probably do % specifier checks at the same time.

However, I'm still not sure whether that would work. Did you maybe
miss my comment in a previous email? Let me put it back here:

> The iteration that bpf_trace_printk does over the format string
> argument is not only used for validation. It is also used to remember
> what extra operations need to be done based on the modifier types. For
> example, it remembers whether an arg should be interpreted as 32bits or
> 64bits. In the case of string printing, it also remembers whether it is
> a kernel-space or user-space pointer so that bpf_trace_copy_string can
> be called with the right arg. If we were to run the iteration over the format
> string in the verifier, how would you recommend that we
> "remember" the modifier type until the helper gets called ?

The best solution I can think of would be to iterate over the format
string in the helper. In that case, the format string verification in
the verifier would be redundant and the format string wouldn't have to
be constant. Do you have any suggestions ?

> At the end bpf_snprintf() will have 5 args and when wrapped with
> BPF_SNPRINTF() macro it will accept arbitrary number of arguments to print.
> It also will be generally useful to do all other kinds of pretty printing.

Yep this macro is a good idea, I like that. :)
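
For the record, a hedged sketch of what such a macro could look like,
modeled on the existing BPF_SEQ_PRINTF macro used by iterator programs.
The 5-argument bpf_snprintf() signature used below is the one discussed
in this thread and had not been merged at that point, so this is only
an illustration:

/* Pack a variable number of arguments into a u64 array so the helper
 * itself only needs a (data, data_len) pair, like bpf_seq_printf().
 */
#define BPF_SNPRINTF_SKETCH(out, out_size, fmt, args...)		\
({									\
	static const char ___fmt[] = fmt;				\
	unsigned long long ___param[] = { args };			\
									\
	bpf_snprintf(out, out_size, ___fmt,				\
		     ___param, sizeof(___param));			\
})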


Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-17 Thread Florent Revest
On Mon, Dec 14, 2020 at 7:47 AM Yonghong Song  wrote:
> On 12/11/20 6:40 AM, Florent Revest wrote:
> > On Wed, Dec 2, 2020 at 10:18 PM Alexei Starovoitov
> >  wrote:
> >> I still think that adopting printk/vsnprintf for this instead of
> >> reinventing the wheel
> >> is more flexible and easier to maintain long term.
> >> Almost the same layout can be done with vsnprintf
> >> with exception of \0 char.
> >> More meaningful names, etc.
> >> See Documentation/core-api/printk-formats.rst
> >
> > I agree this would be nice. I finally got a bit of time to experiment
> > with this and I noticed a few things:
> >
> > First of all, because helpers only have 5 arguments, if we use two for
> > the output buffer and its size and two for the format string and its
> > size, we are only left with one argument for a modifier. This is still
> > enough for our usecase (where we'd only use "%ps" for example) but it
> > does not strictly-speaking allow for the same layout that Andrii
> > proposed.
>
> See helper bpf_seq_printf. It packs all arguments for format string and
> puts them into an array. bpf_seq_printf will unpack them as it parsed
> through the format string. So it should be doable to have more than
> "%ps" in format string.

This could be a nice trick, thank you for the suggestion Yonghong :)

My understanding is that this would also require two extra args (one
for the array of arguments and one for the size of this array), so it
would still not fit the five-argument limit I described in my previous
email.
e.g. this would not be possible:
long bpf_snprintf(const char *out, u32 out_size,
  const char *fmt, u32 fmt_size,
 const void *data, u32 data_len)

Would you then suggest that we also put the format string and its
length in the first and second cells of this array and have something
along the lines of:
long bpf_snprintf(const char *out, u32 out_size,
  const void *args, u32 args_len) ?
This seems like a fairly opaque signature to me, and harder to verify.


[PATCH] bpf: Expose bpf_sk_storage_* to iterator programs

2020-11-12 Thread Florent Revest
From: Florent Revest 

Iterators are currently used to expose kernel information to userspace
over fast procfs-like files but iterators could also be used to
initialize local storage. For example, the task_file iterator could be
used to store associations between processes and sockets.

This exposes the socket local storage helpers to all iterators. Martin
KaFai Lau checked that it is safe to call these helpers from the
sk_storage_map iterators.

Signed-off-by: Florent Revest 
---
 kernel/trace/bpf_trace.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e4515b0f62a8..3530120fa280 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 
+#include <net/bpf_sk_storage.h>
+
 #include 
 #include 
 
@@ -1750,6 +1752,14 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
   NULL;
case BPF_FUNC_d_path:
return &bpf_d_path_proto;
+   case BPF_FUNC_sk_storage_get:
+   return prog->expected_attach_type == BPF_TRACE_ITER ?
+  &bpf_sk_storage_get_proto :
+  NULL;
+   case BPF_FUNC_sk_storage_delete:
+   return prog->expected_attach_type == BPF_TRACE_ITER ?
+  &bpf_sk_storage_delete_proto :
+  NULL;
default:
return raw_tp_prog_func_proto(func_id, prog);
}
-- 
2.29.2.222.g5d2a92d10f8-goog



[PATCH] bpf: Expose a bpf_sock_from_file helper to tracing programs

2020-11-12 Thread Florent Revest
From: Florent Revest 

eBPF programs can already check whether a file is a socket using
file->f_op == &socket_file_ops but they cannot convert
file->private_data into a struct socket with BTF information. For
that, we need a new helper that is essentially just a wrapper for
sock_from_file.

sock_from_file can set an err value but it is only set to -ENOTSOCK
when the return value is NULL, so it is superfluous information.

Signed-off-by: Florent Revest 
---
 include/uapi/linux/bpf.h   |  7 +++
 kernel/trace/bpf_trace.c   | 22 ++
 scripts/bpf_helpers_doc.py |  4 
 tools/include/uapi/linux/bpf.h |  7 +++
 4 files changed, 40 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 162999b12790..6c96bf9c1f94 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3787,6 +3787,12 @@ union bpf_attr {
  * *ARG_PTR_TO_BTF_ID* of type *task_struct*.
  * Return
  * Pointer to the current task.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file contains a socket, returns the associated 
socket.
+ * Return
+ * A pointer to a struct socket on success, or NULL on failure.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3948,6 +3954,7 @@ union bpf_attr {
FN(task_storage_get),   \
FN(task_storage_delete),\
FN(get_current_task_btf),   \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3530120fa280..d040d3ec8313 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1255,6 +1255,26 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
.arg5_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_1(bpf_sock_from_file, struct file *, file)
+{
+   int err;
+
+   return (unsigned long) sock_from_file(file, &err);
+}
+
+BTF_ID_LIST(bpf_sock_from_file_btf_ids)
+BTF_ID(struct, socket)
+BTF_ID(struct, file)
+
+const struct bpf_func_proto bpf_sock_from_file_proto = {
+   .func   = bpf_sock_from_file,
+   .gpl_only   = true,
+   .ret_type   = RET_PTR_TO_BTF_ID_OR_NULL,
+   .ret_btf_id = &bpf_sock_from_file_btf_ids[0],
+   .arg1_type  = ARG_PTR_TO_BTF_ID,
+   .arg1_btf_id= &bpf_sock_from_file_btf_ids[1],
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1349,6 +1369,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_per_cpu_ptr_proto;
case BPF_FUNC_bpf_this_cpu_ptr:
return &bpf_this_cpu_ptr_proto;
+   case BPF_FUNC_sock_from_file:
+   return &bpf_sock_from_file_proto;
default:
return NULL;
}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 6769caae142f..99068ec40315 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -434,6 +434,8 @@ class PrinterHelpers(Printer):
 'struct xdp_md',
 'struct path',
 'struct btf_ptr',
+'struct socket',
+'struct file',
 ]
 known_types = {
 '...',
@@ -477,6 +479,8 @@ class PrinterHelpers(Printer):
 'struct task_struct',
 'struct path',
 'struct btf_ptr',
+'struct socket',
+'struct file',
 }
 mapped_types = {
 'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 162999b12790..6c96bf9c1f94 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3787,6 +3787,12 @@ union bpf_attr {
  * *ARG_PTR_TO_BTF_ID* of type *task_struct*.
  * Return
  * Pointer to the current task.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file contains a socket, returns the associated 
socket.
+ * Return
+ * A pointer to a struct socket on success, or NULL on failure.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3948,6 +3954,7 @@ union bpf_attr {
FN(task_storage_get),   \
FN(task_storage_delete),\
FN(get_current_task_btf),   \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.29.2.222.g5d2a92d10f8-goog



[PATCH v2 1/5] net: Remove the err argument from sock_from_file

2020-11-19 Thread Florent Revest
From: Florent Revest 

Currently, the sock_from_file prototype takes an "err" pointer that is
either not set or set to -ENOTSOCK IFF the returned socket is NULL. This
makes the error redundant and it is ignored by a few callers.

This patch simplifies the API by letting callers deduce the error based
on whether the returned socket is NULL or not.

Suggested-by: Al Viro 
Signed-off-by: Florent Revest 
---
 fs/eventpoll.c   |  3 +--
 fs/io_uring.c| 16 
 include/linux/net.h  |  2 +-
 net/core/netclassid_cgroup.c |  3 +--
 net/core/netprio_cgroup.c|  3 +--
 net/core/sock.c  |  8 +---
 net/socket.c | 27 ---
 7 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..c764d8d5a76a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -415,12 +415,11 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
unsigned int napi_id;
struct socket *sock;
struct sock *sk;
-   int err;
 
if (!net_busy_loop_on())
return;
 
-   sock = sock_from_file(epi->ffd.file, &err);
+   sock = sock_from_file(epi->ffd.file);
if (!sock)
return;
 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8018c7076b25..ace99b15cbd3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4341,9 +4341,9 @@ static int io_sendmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4390,9 +4390,9 @@ static int io_send(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
if (unlikely(ret))
@@ -4569,9 +4569,9 @@ static int io_recvmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4632,9 +4632,9 @@ static int io_recv(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->flags & REQ_F_BUFFER_SELECT) {
kbuf = io_recv_buffer_select(req, !force_nonblock);
diff --git a/include/linux/net.h b/include/linux/net.h
index 0dcd51feef02..9e2324efc26a 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -240,7 +240,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char 
*dname);
 struct socket *sockfd_lookup(int fd, int *err);
-struct socket *sock_from_file(struct file *file, int *err);
+struct socket *sock_from_file(struct file *file);
 #define sockfd_put(sock) fput(sock->file)
 int net_ratelimit(void);
 
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 41b24cd31562..b49c57d35a88 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -68,9 +68,8 @@ struct update_classid_context {
 
 static int update_classid_sock(const void *v, struct file *file, unsigned n)
 {
-   int err;
struct update_classid_context *ctx = (void *)v;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);

if (sock) {
spin_lock(&cgroup_sk_update_lock);
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9bd4cab7d510..99a431c56f23 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -220,8 +220,7 @@ static ssize_t write_priomap(struct kernfs_open_file *of,
 
 static int update_netprio(const void *v, struct file *file, unsigned n)
 {
-   int err;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);
if (sock) {
spin_lock(&cgroup_sk_update_lock);
sock_cgroup_set_prioidx(&sock->sk->sk_cgrp_data,
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..dd0598d831ef 100644
--- a/net/core/sock.c
+++ b/net/core/sock.

[PATCH v2 5/5] bpf: Add an iterator selftest for bpf_sk_storage_get

2020-11-19 Thread Florent Revest
From: Florent Revest 

The eBPF program iterates over all files and tasks. For all socket
files, it stores the tgid of the last task it encountered with a handle
to that socket. This is a heuristic for finding the "owner" of a socket
similar to what's done by lsof, ss, netstat or fuser. Potentially, this
information could be used from a cgroup_skb/*gress hook to try to
associate network traffic with processes.

The test makes sure that a socket it created is tagged with prog_tests's
pid.

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 35 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 26 ++
 2 files changed, 61 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index bb4a638f2e6f..4d0626003c03 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -975,6 +975,39 @@ static void test_bpf_sk_storage_delete(void)
bpf_iter_bpf_sk_storage_helpers__destroy(skel);
 }
 
+/* The BPF program stores in every socket the tgid of a task owning a handle to
+ * it. The test verifies that a locally-created socket is tagged with its pid
+ */
+static void test_bpf_sk_storage_get(void)
+{
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   int err, map_fd, val = -1;
+   int sock_fd = -1;
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+
+   do_dummy_read(skel->progs.fill_socket_owners);
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != getpid(), "bpf_map_lookup_elem",
+ "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+ getpid(), val, err);
+
+   if (sock_fd >= 0)
+   close(sock_fd);
+out:
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1131,6 +1164,8 @@ void test_bpf_iter(void)
test_bpf_sk_storage_map();
if (test__start_subtest("bpf_sk_storage_delete"))
test_bpf_sk_storage_delete();
+   if (test__start_subtest("bpf_sk_storage_get"))
+   test_bpf_sk_storage_get();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index 01ff3235e413..7206fd6f09ab 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -21,3 +21,29 @@ int delete_bpf_sk_storage_map(struct 
bpf_iter__bpf_sk_storage_map *ctx)
 
return 0;
 }
+
+SEC("iter/task_file")
+int fill_socket_owners(struct bpf_iter__task_file *ctx)
+{
+   struct task_struct *task = ctx->task;
+   struct file *file = ctx->file;
+   struct socket *sock;
+   int *sock_tgid;
+
+   if (!task || !file || task->tgid != task->pid)
+   return 0;
+
+   sock = bpf_sock_from_file(file);
+   if (!sock)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sock->sk, 0,
+  BPF_SK_STORAGE_GET_F_CREATE);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = task->tgid;
+
+   return 0;
+}
+
-- 
2.29.2.299.gdc1121823c-goog
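
As a hedged illustration of the cgroup_skb idea mentioned in the commit
message above (not part of this series; map and section names are
assumptions), an egress program could look up the tgid left in socket
local storage by the iterator and attribute traffic to it:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, int);
} sk_stg_map SEC(".maps");

SEC("cgroup_skb/egress")
int attribute_egress(struct __sk_buff *skb)
{
	struct bpf_sock *sk = skb->sk;
	int *tgid;

	if (!sk)
		return 1; /* always let the packet through */

	/* Socket local storage needs a full socket. */
	sk = bpf_sk_fullsock(sk);
	if (!sk)
		return 1;

	/* Read-only lookup of the tgid stored by the task_file iterator. */
	tgid = bpf_sk_storage_get(&sk_stg_map, sk, 0, 0);
	if (tgid)
		bpf_printk("egress %u bytes for tgid %d", skb->len, *tgid);

	return 1;
}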



[PATCH v2 4/5] bpf: Add an iterator selftest for bpf_sk_storage_delete

2020-11-19 Thread Florent Revest
From: Florent Revest 

The eBPF program iterates over all entries (well, only one) of a socket
local storage map and deletes them all. The test makes sure that the
entry is indeed deleted.

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 64 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 23 +++
 2 files changed, 87 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 448885b95eed..bb4a638f2e6f 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -20,6 +20,7 @@
 #include "bpf_iter_bpf_percpu_hash_map.skel.h"
 #include "bpf_iter_bpf_array_map.skel.h"
 #include "bpf_iter_bpf_percpu_array_map.skel.h"
+#include "bpf_iter_bpf_sk_storage_helpers.skel.h"
 #include "bpf_iter_bpf_sk_storage_map.skel.h"
 #include "bpf_iter_test_kern5.skel.h"
 #include "bpf_iter_test_kern6.skel.h"
@@ -913,6 +914,67 @@ static void test_bpf_percpu_array_map(void)
bpf_iter_bpf_percpu_array_map__destroy(skel);
 }
 
+/* An iterator program deletes all local storage in a map. */
+static void test_bpf_sk_storage_delete(void)
+{
+   DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   union bpf_iter_link_info linfo;
+   int err, len, map_fd, iter_fd;
+   struct bpf_link *link;
+   int sock_fd = -1;
+   __u32 val = 42;
+   char buf[64];
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+   err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "map_update", "map_update failed\n"))
+   goto out;
+
+   memset(&linfo, 0, sizeof(linfo));
+   linfo.map.map_fd = map_fd;
+   opts.link_info = &linfo;
+   opts.link_info_len = sizeof(linfo);
+   link = bpf_program__attach_iter(skel->progs.delete_bpf_sk_storage_map,
+   &opts);
+   if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+   goto out;
+
+   iter_fd = bpf_iter_create(bpf_link__fd(link));
+   if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+   goto free_link;
+
+   /* do some tests */
+   while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+   ;
+   if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+   goto close_iter;
+
+   /* test results */
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   if (CHECK(!err || errno != ENOENT, "bpf_map_lookup_elem",
+ "map value wasn't deleted (err=%d, errno=%d)\n", err, errno))
+   goto close_iter;
+
+close_iter:
+   close(iter_fd);
+free_link:
+   bpf_link__destroy(link);
+out:
+   if (sock_fd >= 0)
+   close(sock_fd);
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1067,6 +1129,8 @@ void test_bpf_iter(void)
test_bpf_percpu_array_map();
if (test__start_subtest("bpf_sk_storage_map"))
test_bpf_sk_storage_map();
+   if (test__start_subtest("bpf_sk_storage_delete"))
+   test_bpf_sk_storage_delete();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
new file mode 100644
index ..01ff3235e413
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google LLC. */
+#include "bpf_iter.h"
+#include 
+#include 
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+   __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+   __uint(map_flags, BPF_F_NO_PREALLOC);
+   __type(key, int);
+   __type(value, int);
+} sk_stg_map SEC(".maps");
+
+SEC("iter/bpf_sk_storage_map")
+int delete_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx)
+{
+   if (ctx->sk)
+   bpf_sk_storage_delete(&sk_stg_map, ctx->sk);
+
+   return 0;
+}
-- 
2.29.2.299.gdc1121823c-goog



[PATCH v2 2/5] bpf: Add a bpf_sock_from_file helper

2020-11-19 Thread Florent Revest
From: Florent Revest 

While eBPF programs can check whether a file is a socket by file->f_op
== &socket_file_ops, they cannot convert the void private_data pointer
to a struct socket BTF pointer. In order to do this, a new helper
wrapping sock_from_file is added.

This is useful to tracing programs but also to other program types that
inherit this set of helpers, such as iterators or LSM programs.

Signed-off-by: Florent Revest 
---
 include/uapi/linux/bpf.h   |  7 +++
 kernel/trace/bpf_trace.c   | 20 
 scripts/bpf_helpers_doc.py |  4 
 tools/include/uapi/linux/bpf.h |  7 +++
 4 files changed, 38 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 162999b12790..7d598f161dc0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3787,6 +3787,12 @@ union bpf_attr {
  * *ARG_PTR_TO_BTF_ID* of type *task_struct*.
  * Return
  * Pointer to the current task.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file contains a socket, returns the associated 
socket.
+ * Return
+ * A pointer to a struct socket on success or NULL on failure.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3948,6 +3954,7 @@ union bpf_attr {
FN(task_storage_get),   \
FN(task_storage_delete),\
FN(get_current_task_btf),   \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 02986c7b90eb..d87ca6f93c58 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1260,6 +1260,24 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
.arg5_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_1(bpf_sock_from_file, struct file *, file)
+{
+   return (unsigned long) sock_from_file(file);
+}
+
+BTF_ID_LIST(bpf_sock_from_file_btf_ids)
+BTF_ID(struct, socket)
+BTF_ID(struct, file)
+
+static const struct bpf_func_proto bpf_sock_from_file_proto = {
+   .func   = bpf_sock_from_file,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_BTF_ID_OR_NULL,
+   .ret_btf_id = &bpf_sock_from_file_btf_ids[0],
+   .arg1_type  = ARG_PTR_TO_BTF_ID,
+   .arg1_btf_id= &bpf_sock_from_file_btf_ids[1],
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1354,6 +1372,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_per_cpu_ptr_proto;
case BPF_FUNC_bpf_this_cpu_ptr:
return &bpf_this_cpu_ptr_proto;
+   case BPF_FUNC_sock_from_file:
+   return &bpf_sock_from_file_proto;
default:
return NULL;
}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 31484377b8b1..d609f20e8360 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -435,6 +435,8 @@ class PrinterHelpers(Printer):
 'struct xdp_md',
 'struct path',
 'struct btf_ptr',
+'struct socket',
+'struct file',
 ]
 known_types = {
 '...',
@@ -478,6 +480,8 @@ class PrinterHelpers(Printer):
 'struct task_struct',
 'struct path',
 'struct btf_ptr',
+'struct socket',
+'struct file',
 }
 mapped_types = {
 'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 162999b12790..7d598f161dc0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3787,6 +3787,12 @@ union bpf_attr {
  * *ARG_PTR_TO_BTF_ID* of type *task_struct*.
  * Return
  * Pointer to the current task.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file contains a socket, returns the associated 
socket.
+ * Return
+ * A pointer to a struct socket on success or NULL on failure.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3948,6 +3954,7 @@ union bpf_attr {
FN(task_storage_get),   \
FN(task_storage_delete),\
FN(get_current_task_btf),   \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.29.2.299.gdc1121823c-goog



[PATCH v2 3/5] bpf: Expose bpf_sk_storage_* to iterator programs

2020-11-19 Thread Florent Revest
From: Florent Revest 

Iterators are currently used to expose kernel information to userspace
over fast procfs-like files but iterators could also be used to
manipulate local storage. For example, the task_file iterator could be
used to initialize a socket local storage with associations between
processes and sockets or to selectively delete local storage values.

This exposes both socket local storage helpers to all iterators.
Alternatively, we could expose them to only certain iterators with
strcmps on prog->aux->attach_func_name.

Signed-off-by: Florent Revest 
---
 net/core/bpf_sk_storage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index a32037daa933..4edd033e899c 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -394,6 +394,7 @@ static bool bpf_sk_storage_tracing_allowed(const struct 
bpf_prog *prog)
 * use the bpf_sk_storage_(get|delete) helper.
 */
switch (prog->expected_attach_type) {
+   case BPF_TRACE_ITER:
case BPF_TRACE_RAW_TP:
/* bpf_sk_storage has no trace point */
return true;
-- 
2.29.2.299.gdc1121823c-goog



Re: saner sock_from_file() calling conventions (was Re: [PATCH] bpf: Expose a bpf_sock_from_file helper to tracing programs)

2020-11-13 Thread Florent Revest
On Thu, 2020-11-12 at 20:28 +, Al Viro wrote:
> On Thu, Nov 12, 2020 at 09:09:44PM +0100, Florent Revest wrote:
> > From: Florent Revest 
> > 
> > eBPF programs can already check whether a file is a socket using
> > file->f_op == _file_ops but they can not convert file-
> > >private_data into a struct socket with BTF information. For that,
> > we need a new helper that is essentially just a wrapper for
> > sock_from_file.
> > 
> > sock_from_file can set an err value but this is only set to
> > -ENOTSOCK when the return value is NULL so it's useless superfluous
> > information.
> 
> That's a wrong way to handle that kind of stuff.  *IF*
> sock_from_file() really has no need to return an error, its calling
> conventions ought to be changed. OTOH, if that is not the case, your
> API is a landmine.
> 
> That needs to be dealt with by netdev folks, rather than quietly
> papered over in BPF code.

Sounds good to me. :) What do netdev folks think of this ?

> It does appear that there's no realistic cause to ever need other
> errors there (well, short of some clown attaching a hook, pardon the
> obscenity), so I would recommend something like the patch below
> (completely untested):

Thanks for taking the time but is this the patch you meant to send?

> sanitize sock_from_file() calling conventions
> 
> deal with error value (always -ENOTSOCK) in the callers
> 
> Signed-off-by: Al Viro 
> ---
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index 3b20e21604e7..07b33c1f34a9 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -168,7 +168,6 @@ EXPORT_SYMBOL(seq_read);
>  ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  {
>   struct seq_file *m = iocb->ki_filp->private_data;
> - size_t size = iov_iter_count(iter);
>   size_t copied = 0;
>   size_t n;
>   void *p;
> @@ -208,14 +207,11 @@ ssize_t seq_read_iter(struct kiocb *iocb,
> struct iov_iter *iter)
>   }
>   /* if not empty - flush it first */
>   if (m->count) {
> - n = min(m->count, size);
> - if (copy_to_iter(m->buf + m->from, n, iter) != n)
> - goto Efault;
> + n = copy_to_iter(m->buf + m->from, m->count, iter);
>   m->count -= n;
>   m->from += n;
> - size -= n;
>   copied += n;
> - if (!size)
> + if (!iov_iter_count(iter) || m->count)
>   goto Done;
>   }
>   /* we need at least one record in buffer */
> @@ -249,6 +245,7 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct
> iov_iter *iter)
>   goto Done;
>  Fill:
>   /* they want more? let's try to get some more */
> + /* m->count is positive and there's space left in iter */
>   while (1) {
>   size_t offs = m->count;
>   loff_t pos = m->index;
> @@ -263,7 +260,7 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct
> iov_iter *iter)
>   err = PTR_ERR(p);
>   break;
>   }
> - if (m->count >= size)
> + if (m->count >= iov_iter_count(iter))
>   break;
>   err = m->op->show(m, p);
>   if (seq_has_overflowed(m) || err) {
> @@ -273,16 +270,14 @@ ssize_t seq_read_iter(struct kiocb *iocb,
> struct iov_iter *iter)
>   }
>   }
>   m->op->stop(m, p);
> - n = min(m->count, size);
> - if (copy_to_iter(m->buf, n, iter) != n)
> - goto Efault;
> + n = copy_to_iter(m->buf, m->count, iter);
>   copied += n;
>   m->count -= n;
>   m->from = n;
>  Done:
> - if (!copied)
> - copied = err;
> - else {
> + if (unlikely(!copied)) {
> + copied = m->count ? -EFAULT : err;
> + } else {
>   iocb->ki_pos += copied;
>   m->read_pos += copied;
>   }
> @@ -291,9 +286,6 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct
> iov_iter *iter)
>  Enomem:
>   err = -ENOMEM;
>   goto Done;
> -Efault:
> - err = -EFAULT;
> - goto Done;
>  }
>  EXPORT_SYMBOL(seq_read_iter);



Re: [PATCH] bpf: Expose bpf_sk_storage_* to iterator programs

2020-11-13 Thread Florent Revest
On Thu, 2020-11-12 at 13:57 -0800, Martin KaFai Lau wrote:
> Test(s) is needed.  e.g. iterating a bpf_sk_storage_map and also
> calling bpf_sk_storage_get/delete.
> 
> I would expect to see another test/example showing how it works end-
> to-end to solve the problem you have in hand.
> This patch probably belongs to a longer series.

Fair point, I'll get that done, thank you!

> BTW, I am also enabling bpf_sk_storage_(get|delete) for
> FENTRY/FEXIT/RAW_TP but I think the conflict should be manageable.
> https://patchwork.ozlabs.org/project/netdev/patch/20201112211313.2587383-1-ka...@fb.com/

Thanks for the heads up, should be no problem :) 



Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-11-30 Thread Florent Revest
On Sat, 2020-11-28 at 17:07 -0800, Alexei Starovoitov wrote:
> On Thu, Nov 26, 2020 at 05:57:47PM +0100, Florent Revest wrote:
> > This helper exposes the kallsyms_lookup function to eBPF tracing
> > programs. This can be used to retrieve the name of the symbol at an
> > address. For example, when hooking into nf_register_net_hook, one
> > can
> > audit the name of the registered netfilter hook and potentially
> > also
> > the name of the module in which the symbol is located.
> > 
> > Signed-off-by: Florent Revest 
> > ---
> >  include/uapi/linux/bpf.h   | 16 +
> >  kernel/trace/bpf_trace.c   | 41
> > ++
> >  tools/include/uapi/linux/bpf.h | 16 +
> >  3 files changed, 73 insertions(+)
> > 
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index c3458ec1f30a..670998635eac 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -3817,6 +3817,21 @@ union bpf_attr {
> >   * The **hash_algo** is returned on success,
> >   * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
> >   * invalid arguments are passed.
> > + *
> > + * long bpf_kallsyms_lookup(u64 address, char *symbol, u32
> > symbol_size, char *module, u32 module_size)
> > + * Description
> > + * Uses kallsyms to write the name of the symbol at
> > *address*
> > + * into *symbol* of size *symbol_sz*. This is guaranteed
> > to be
> > + * zero terminated.
> > + * If the symbol is in a module, up to *module_size* bytes
> > of
> > + * the module name is written in *module*. This is also
> > + * guaranteed to be zero-terminated. Note: a module name
> > + * is always shorter than 64 bytes.
> > + * Return
> > + * On success, the strictly positive length of the full
> > symbol
> > + * name, If this is greater than *symbol_size*, the
> > written
> > + * symbol is truncated.
> > + * On error, a negative value.
> 
> Looks like debug-only helper.
> I cannot think of a way to use in production code.
> What program suppose to do with that string?
> Do string compare? BPF side doesn't have a good way to do string
> manipulations.
> If you really need to print a symbolic name for a given address
> I'd rather extend bpf_trace_printk() to support %pS

We actually use this helper for auditing, not debugging.
We don't want to parse /proc/kallsyms from userspace because we have no
guarantee that the module will still be loaded by the time the event
reaches userspace (this is also faster in kernelspace).



[PATCH bpf-next v5 3/6] bpf: Expose bpf_sk_storage_* to iterator programs

2020-12-04 Thread Florent Revest
Iterators are currently used to expose kernel information to userspace
over fast procfs-like files but iterators could also be used to
manipulate local storage. For example, the task_file iterator could be
used to initialize a socket local storage with associations between
processes and sockets or to selectively delete local storage values.

Signed-off-by: Florent Revest 
Acked-by: Martin KaFai Lau 
Acked-by: KP Singh 
---
 net/core/bpf_sk_storage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index a32037daa933..4edd033e899c 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -394,6 +394,7 @@ static bool bpf_sk_storage_tracing_allowed(const struct 
bpf_prog *prog)
 * use the bpf_sk_storage_(get|delete) helper.
 */
switch (prog->expected_attach_type) {
+   case BPF_TRACE_ITER:
case BPF_TRACE_RAW_TP:
/* bpf_sk_storage has no trace point */
return true;
-- 
2.29.2.576.ga3fc446d84-goog



[PATCH bpf-next v5 5/6] selftests/bpf: Add an iterator selftest for bpf_sk_storage_get

2020-12-04 Thread Florent Revest
The eBPF program iterates over all files and tasks. For all socket
files, it stores the tgid of the last task it encountered with a handle
to that socket. This is a heuristic for finding the "owner" of a socket
similar to what's done by lsof, ss, netstat or fuser. Potentially, this
information could be used from a cgroup_skb/*gress hook to try to
associate network traffic with processes.

The test makes sure that a socket it created is tagged with prog_tests's
pid.

Signed-off-by: Florent Revest 
Acked-by: Yonghong Song 
Acked-by: Martin KaFai Lau 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 40 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 24 +++
 2 files changed, 64 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index bb4a638f2e6f..9336d0f18331 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -975,6 +975,44 @@ static void test_bpf_sk_storage_delete(void)
bpf_iter_bpf_sk_storage_helpers__destroy(skel);
 }
 
+/* This creates a socket and its local storage. It then runs a task_iter BPF
+ * program that replaces the existing socket local storage with the tgid of the
+ * only task owning a file descriptor to this socket, this process, prog_tests.
+ */
+static void test_bpf_sk_storage_get(void)
+{
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   int err, map_fd, val = -1;
+   int sock_fd = -1;
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "bpf_map_update_elem", "map_update_failed\n"))
+   goto close_socket;
+
+   do_dummy_read(skel->progs.fill_socket_owner);
+
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != getpid(), "bpf_map_lookup_elem",
+ "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+ getpid(), val, err);
+
+close_socket:
+   close(sock_fd);
+out:
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1131,6 +1169,8 @@ void test_bpf_iter(void)
test_bpf_sk_storage_map();
if (test__start_subtest("bpf_sk_storage_delete"))
test_bpf_sk_storage_delete();
+   if (test__start_subtest("bpf_sk_storage_get"))
+   test_bpf_sk_storage_get();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index 01ff3235e413..dde53df37de8 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -21,3 +21,27 @@ int delete_bpf_sk_storage_map(struct 
bpf_iter__bpf_sk_storage_map *ctx)
 
return 0;
 }
+
+SEC("iter/task_file")
+int fill_socket_owner(struct bpf_iter__task_file *ctx)
+{
+   struct task_struct *task = ctx->task;
+   struct file *file = ctx->file;
+   struct socket *sock;
+   int *sock_tgid;
+
+   if (!task || !file)
+   return 0;
+
+   sock = bpf_sock_from_file(file);
+   if (!sock)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sock->sk, 0, 0);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = task->tgid;
+
+   return 0;
+}
-- 
2.29.2.576.ga3fc446d84-goog



[PATCH bpf-next v5 1/6] net: Remove the err argument from sock_from_file

2020-12-04 Thread Florent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is
either not set or set to -ENOTSOCK IFF the returned socket is NULL. This
makes the error redundant and it is ignored by a few callers.

This patch simplifies the API by letting callers deduce the error based
on whether the returned socket is NULL or not.

Suggested-by: Al Viro 
Signed-off-by: Florent Revest 
Reviewed-by: KP Singh 
---
 fs/eventpoll.c   |  3 +--
 fs/io_uring.c| 16 
 include/linux/net.h  |  2 +-
 net/core/netclassid_cgroup.c |  3 +--
 net/core/netprio_cgroup.c|  3 +--
 net/core/sock.c  |  8 +---
 net/socket.c | 27 ---
 7 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 73c346e503d7..19499b7bb82c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -416,12 +416,11 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
unsigned int napi_id;
struct socket *sock;
struct sock *sk;
-   int err;
 
if (!net_busy_loop_on())
return;
 
-   sock = sock_from_file(epi->ffd.file, &err);
+   sock = sock_from_file(epi->ffd.file);
if (!sock)
return;
 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8018c7076b25..ace99b15cbd3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4341,9 +4341,9 @@ static int io_sendmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4390,9 +4390,9 @@ static int io_send(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
if (unlikely(ret))
@@ -4569,9 +4569,9 @@ static int io_recvmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4632,9 +4632,9 @@ static int io_recv(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->flags & REQ_F_BUFFER_SELECT) {
kbuf = io_recv_buffer_select(req, !force_nonblock);
diff --git a/include/linux/net.h b/include/linux/net.h
index 0dcd51feef02..9e2324efc26a 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -240,7 +240,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char 
*dname);
 struct socket *sockfd_lookup(int fd, int *err);
-struct socket *sock_from_file(struct file *file, int *err);
+struct socket *sock_from_file(struct file *file);
 #define sockfd_put(sock) fput(sock->file)
 int net_ratelimit(void);
 
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 41b24cd31562..b49c57d35a88 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -68,9 +68,8 @@ struct update_classid_context {
 
 static int update_classid_sock(const void *v, struct file *file, unsigned n)
 {
-   int err;
struct update_classid_context *ctx = (void *)v;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);

if (sock) {
spin_lock(&cgroup_sk_update_lock);
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9bd4cab7d510..99a431c56f23 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -220,8 +220,7 @@ static ssize_t write_priomap(struct kernfs_open_file *of,
 
 static int update_netprio(const void *v, struct file *file, unsigned n)
 {
-   int err;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);
if (sock) {
spin_lock(&cgroup_sk_update_lock);
sock_cgroup_set_prioidx(&sock->sk->sk_cgrp_data,
diff --git a/net/core/sock.c b/net/core/sock.c
index d422a6808405..eb55cf79bb24 100644
--- a/net/core/sock.c
+++ b/net/core/sock.

[PATCH bpf-next v5 6/6] selftests/bpf: Test bpf_sk_storage_get in tcp iterators

2020-12-04 Thread Florent Revest
This extends the existing bpf_sk_storage_get test where a socket is
created and tagged with its creator's pid by a task_file iterator.

A TCP iterator is now also used at the end of the test to negate the
values already stored in the local storage. The test therefore expects
-getpid() to be stored in the local storage.

Signed-off-by: Florent Revest 
Acked-by: Yonghong Song 
Acked-by: Martin KaFai Lau 
---
 .../selftests/bpf/prog_tests/bpf_iter.c| 18 --
 .../progs/bpf_iter_bpf_sk_storage_helpers.c| 18 ++
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 9336d0f18331..0e586368948d 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -978,6 +978,8 @@ static void test_bpf_sk_storage_delete(void)
 /* This creates a socket and its local storage. It then runs a task_iter BPF
  * program that replaces the existing socket local storage with the tgid of the
  * only task owning a file descriptor to this socket, this process, prog_tests.
+ * It then runs a tcp socket iterator that negates the value in the existing
+ * socket local storage, the test verifies that the resulting value is -pid.
  */
 static void test_bpf_sk_storage_get(void)
 {
@@ -994,6 +996,10 @@ static void test_bpf_sk_storage_get(void)
if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
goto out;
 
+   err = listen(sock_fd, 1);
+   if (CHECK(err != 0, "listen", "errno: %d\n", errno))
+   goto close_socket;
+
map_fd = bpf_map__fd(skel->maps.sk_stg_map);
 
err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
@@ -1003,9 +1009,17 @@ static void test_bpf_sk_storage_get(void)
do_dummy_read(skel->progs.fill_socket_owner);
 
err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
-   CHECK(err || val != getpid(), "bpf_map_lookup_elem",
+   if (CHECK(err || val != getpid(), "bpf_map_lookup_elem",
+   "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+   getpid(), val, err))
+   goto close_socket;
+
+   do_dummy_read(skel->progs.negate_socket_local_storage);
+
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
  "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
- getpid(), val, err);
+ -getpid(), val, err);
 
 close_socket:
close(sock_fd);
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index dde53df37de8..6cecab2b32ba 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -45,3 +45,21 @@ int fill_socket_owner(struct bpf_iter__task_file *ctx)
 
return 0;
 }
+
+SEC("iter/tcp")
+int negate_socket_local_storage(struct bpf_iter__tcp *ctx)
+{
+   struct sock_common *sk_common = ctx->sk_common;
+   int *sock_tgid;
+
+   if (!sk_common)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sk_common, 0, 0);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = -*sock_tgid;
+
+   return 0;
+}
-- 
2.29.2.576.ga3fc446d84-goog



[PATCH bpf-next v5 4/6] selftests/bpf: Add an iterator selftest for bpf_sk_storage_delete

2020-12-04 Thread Florent Revest
The eBPF program iterates over all entries (well, only one) of a socket
local storage map and deletes them all. The test makes sure that the
entry is indeed deleted.

Signed-off-by: Florent Revest 
Acked-by: Martin KaFai Lau 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 64 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 23 +++
 2 files changed, 87 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 448885b95eed..bb4a638f2e6f 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -20,6 +20,7 @@
 #include "bpf_iter_bpf_percpu_hash_map.skel.h"
 #include "bpf_iter_bpf_array_map.skel.h"
 #include "bpf_iter_bpf_percpu_array_map.skel.h"
+#include "bpf_iter_bpf_sk_storage_helpers.skel.h"
 #include "bpf_iter_bpf_sk_storage_map.skel.h"
 #include "bpf_iter_test_kern5.skel.h"
 #include "bpf_iter_test_kern6.skel.h"
@@ -913,6 +914,67 @@ static void test_bpf_percpu_array_map(void)
bpf_iter_bpf_percpu_array_map__destroy(skel);
 }
 
+/* An iterator program deletes all local storage in a map. */
+static void test_bpf_sk_storage_delete(void)
+{
+   DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   union bpf_iter_link_info linfo;
+   int err, len, map_fd, iter_fd;
+   struct bpf_link *link;
+   int sock_fd = -1;
+   __u32 val = 42;
+   char buf[64];
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+   err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "map_update", "map_update failed\n"))
+   goto out;
+
+   memset(&linfo, 0, sizeof(linfo));
+   linfo.map.map_fd = map_fd;
+   opts.link_info = &linfo;
+   opts.link_info_len = sizeof(linfo);
+   link = bpf_program__attach_iter(skel->progs.delete_bpf_sk_storage_map,
+   &opts);
+   if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+   goto out;
+
+   iter_fd = bpf_iter_create(bpf_link__fd(link));
+   if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+   goto free_link;
+
+   /* do some tests */
+   while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+   ;
+   if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+   goto close_iter;
+
+   /* test results */
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   if (CHECK(!err || errno != ENOENT, "bpf_map_lookup_elem",
+ "map value wasn't deleted (err=%d, errno=%d)\n", err, errno))
+   goto close_iter;
+
+close_iter:
+   close(iter_fd);
+free_link:
+   bpf_link__destroy(link);
+out:
+   if (sock_fd >= 0)
+   close(sock_fd);
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1067,6 +1129,8 @@ void test_bpf_iter(void)
test_bpf_percpu_array_map();
if (test__start_subtest("bpf_sk_storage_map"))
test_bpf_sk_storage_map();
+   if (test__start_subtest("bpf_sk_storage_delete"))
+   test_bpf_sk_storage_delete();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
new file mode 100644
index ..01ff3235e413
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google LLC. */
+#include "bpf_iter.h"
+#include 
+#include 
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+   __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+   __uint(map_flags, BPF_F_NO_PREALLOC);
+   __type(key, int);
+   __type(value, int);
+} sk_stg_map SEC(".maps");
+
+SEC("iter/bpf_sk_storage_map")
+int delete_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx)
+{
+   if (ctx->sk)
+   bpf_sk_storage_delete(&sk_stg_map, ctx->sk);
+
+   return 0;
+}
-- 
2.29.2.576.ga3fc446d84-goog



[PATCH bpf-next v5 2/6] bpf: Add a bpf_sock_from_file helper

2020-12-04 Thread Florent Revest
While eBPF programs can check whether a file is a socket by file->f_op
== &socket_file_ops, they cannot convert the void private_data pointer
to a struct socket BTF pointer. In order to do this a new helper
wrapping sock_from_file is added.

This is useful to tracing programs but also other program types
inheriting this set of helpers such as iterators or LSM programs.
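
As an illustration only (this sketch is not part of the patch, and the
fd_install attach point is just an arbitrary example of a kernel function
taking a struct file argument), a tracing program could use the new helper
roughly like this:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    SEC("fentry/fd_install")
    int BPF_PROG(trace_fd_install, unsigned int fd, struct file *file)
    {
            struct socket *sock;

            /* Comparing file->f_op with &socket_file_ops only tells us
             * whether this is a socket; bpf_sock_from_file() gives us a
             * BTF pointer to the struct socket itself.
             */
            sock = bpf_sock_from_file(file);
            if (!sock)
                    return 0;

            /* sock->sk can now be passed to BTF-aware helpers such as
             * bpf_sk_storage_get().
             */
            return 0;
    }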

Signed-off-by: Florent Revest 
Acked-by: KP Singh 
Acked-by: Martin KaFai Lau 
---
 include/uapi/linux/bpf.h   |  9 +
 kernel/trace/bpf_trace.c   | 20 
 scripts/bpf_helpers_doc.py |  4 
 tools/include/uapi/linux/bpf.h |  9 +
 4 files changed, 42 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1233f14f659f..30b477a26482 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3822,6 +3822,14 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file represents a socket, returns the associated
+ * socket.
+ * Return
+ * A pointer to a struct socket on success or NULL if the file is
+ * not a socket.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3986,6 +3994,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 23a390aac524..acbe76790996 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1260,6 +1260,24 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
.arg5_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_1(bpf_sock_from_file, struct file *, file)
+{
+   return (unsigned long) sock_from_file(file);
+}
+
+BTF_ID_LIST(bpf_sock_from_file_btf_ids)
+BTF_ID(struct, socket)
+BTF_ID(struct, file)
+
+static const struct bpf_func_proto bpf_sock_from_file_proto = {
+   .func   = bpf_sock_from_file,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_BTF_ID_OR_NULL,
.ret_btf_id = &bpf_sock_from_file_btf_ids[0],
+   .arg1_type  = ARG_PTR_TO_BTF_ID,
.arg1_btf_id= &bpf_sock_from_file_btf_ids[1],
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1356,6 +1374,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_per_cpu_ptr_proto;
case BPF_FUNC_bpf_this_cpu_ptr:
return &bpf_this_cpu_ptr_proto;
+   case BPF_FUNC_sock_from_file:
+   return &bpf_sock_from_file_proto;
default:
return NULL;
}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 8b829748d488..867ada23281c 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -437,6 +437,8 @@ class PrinterHelpers(Printer):
 'struct path',
 'struct btf_ptr',
 'struct inode',
+'struct socket',
+'struct file',
 ]
 known_types = {
 '...',
@@ -482,6 +484,8 @@ class PrinterHelpers(Printer):
 'struct path',
 'struct btf_ptr',
 'struct inode',
+'struct socket',
+'struct file',
 }
 mapped_types = {
 'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1233f14f659f..30b477a26482 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3822,6 +3822,14 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file represents a socket, returns the associated
+ * socket.
+ * Return
+ * A pointer to a struct socket on success or NULL if the file is
+ * not a socket.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3986,6 +3994,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.29.2.576.ga3fc446d84-goog



Re: [PATCH bpf-next v4 6/6] bpf: Test bpf_sk_storage_get in tcp iterators

2020-12-04 Thread Florent Revest
On Thu, 2020-12-03 at 18:05 -0800, Martin KaFai Lau wrote:
> On Wed, Dec 02, 2020 at 09:55:27PM +0100, Florent Revest wrote:
> > This extends the existing bpf_sk_storage_get test where a socket is
> > created and tagged with its creator's pid by a task_file iterator.
> > 
> > A TCP iterator is now also used at the end of the test to negate
> > the
> > values already stored in the local storage. The test therefore
> > expects
> > -getpid() to be stored in the local storage.
> > 
> > Signed-off-by: Florent Revest 
> > Acked-by: Yonghong Song 
> > ---
> >  .../selftests/bpf/prog_tests/bpf_iter.c| 13 +
> >  .../progs/bpf_iter_bpf_sk_storage_helpers.c| 18
> > ++
> >  2 files changed, 31 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > index 9336d0f18331..b8362147c9e3 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
> > @@ -978,6 +978,8 @@ static void test_bpf_sk_storage_delete(void)
> >  /* This creates a socket and its local storage. It then runs a
> > task_iter BPF
> >   * program that replaces the existing socket local storage with
> > the tgid of the
> >   * only task owning a file descriptor to this socket, this
> > process, prog_tests.
> > + * It then runs a tcp socket iterator that negates the value in
> > the existing
> > + * socket local storage, the test verifies that the resulting
> > value is -pid.
> >   */
> >  static void test_bpf_sk_storage_get(void)
> >  {
> > @@ -994,6 +996,10 @@ static void test_bpf_sk_storage_get(void)
> > if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
> > goto out;
> >  
> > +   err = listen(sock_fd, 1);
> > +   if (CHECK(err != 0, "listen", "errno: %d\n", errno))
> > +   goto out;
> 
>   goto close_socket;
> 
> > +
> > map_fd = bpf_map__fd(skel->maps.sk_stg_map);
> >  
> > err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
> > @@ -1007,6 +1013,13 @@ static void test_bpf_sk_storage_get(void)
> >   "map value wasn't set correctly (expected %d, got %d,
> > err=%d)\n",
> >   getpid(), val, err);
> The failure of this CHECK here should "goto close_socket;" now.
> 
> Others LGTM.
> 
> Acked-by: Martin KaFai Lau 

Ah good points, thanks! Fixed in v5 :)



Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-01 Thread Florent Revest
On Mon, 2020-11-30 at 18:41 -0800, Alexei Starovoitov wrote:
> On Mon, Nov 30, 2020 at 05:23:22PM +0100, Florent Revest wrote:
> > On Sat, 2020-11-28 at 17:07 -0800, Alexei Starovoitov wrote:
> > > Looks like debug-only helper.
> > > I cannot think of a way to use in production code.
> > > What program suppose to do with that string?
> > > Do string compare? BPF side doesn't have a good way to do string
> > > manipulations.
> > > If you really need to print a symbolic name for a given address
> > > I'd rather extend bpf_trace_printk() to support %pS
> > 
> > We actually use this helper for auditing, not debugging.
> > We don't want to parse /proc/kallsyms from userspace because we
> > have no guarantee that the module will still be loaded by the time
> > the event reaches userspace (this is also faster in kernelspace).
> 
> so what are you going to do with that string?
> print it? send to user space via ring buffer?

We send our auditing events down to the userspace via a ring buffer and
then events are aggregated and looked at by security analysts. Having
the symbol and module names instead of a hex address makes these events
more meaningful.

> Where are you getting that $pc ?

I give an example in the commit description: we hook into callback
registration functions (for example, nf_register_net_hook), get the
callback address from the function arguments and log audit information
about the registered callback. For example, we want to know the name of
the module in which the callback belongs and the symbol name also helps
enrich the event.
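
For context, a stripped-down sketch of that kind of auditing hook (the map
and program names here are made up, and it deliberately leaves out the
proposed bpf_kallsyms_lookup call since its final form is still under
discussion) could look like:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    struct {
            __uint(type, BPF_MAP_TYPE_RINGBUF);
            __uint(max_entries, 4096);
    } audit_events SEC(".maps");

    SEC("fentry/nf_register_net_hook")
    int BPF_PROG(audit_nf_hook, struct net *net, const struct nf_hook_ops *reg)
    {
            __u64 *event;

            event = bpf_ringbuf_reserve(&audit_events, sizeof(*event), 0);
            if (!event)
                    return 0;

            /* The callback address taken from the function arguments;
             * the proposed helper would resolve it to symbol and module
             * names in kernel space rather than leaving that to userspace.
             */
            *event = (__u64)reg->hook;
            bpf_ringbuf_submit(event, 0);
            return 0;
    }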



Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-02 Thread Florent Revest
On Tue, 2020-12-01 at 16:55 -0800, Andrii Nakryiko wrote:
> On Fri, Nov 27, 2020 at 8:09 AM Yonghong Song  wrote:
> > 
> > 
> > On 11/27/20 3:20 AM, KP Singh wrote:
> > > On Fri, Nov 27, 2020 at 8:35 AM Yonghong Song  wrote:
> > > > 
> > > > In this case, module name may be truncated and user did not get
> > > > any indication from return value. In the helper description, it
> > > > is mentioned that module name currently is most 64 bytes. But
> > > > from UAPI perspective, it may be still good to return something
> > > > to let user know the name is truncated.
> > > > 
> > > > I do not know what is the best way to do this. One suggestion
> > > > is to break it into two helpers, one for symbol name and
> > > > another
> > > 
> > > I think it would be slightly preferable to have one helper
> > > though. maybe something like bpf_get_symbol_info (better names
> > > anyone? :)) with flags to get the module name or the symbol name
> > > depending
> > > on the flag?
> > 
> > This works even better. Previously I am thinking if we have two
> > helpers,
> > we can add flags for each of them for future extension. But we
> > can certainly have just one helper with flags to indicate
> > whether this is for module name or for symbol name or something
> > else.
> > 
> > The buffer can be something like
> > union bpf_ksymbol_info {
> >char   module_name[];
> >char   symbol_name[];
> >...
> > }
> > and flags will indicate what information user wants.
> 
> one more thing that might be useful to resolve to the symbol's "base
> address". E.g., if we have IP inside the function, this would resolve
> to the start of the function, sort of "canonical" symbol address.
> Type of ksym is another "characteristic" which could be returned (as
> a single char?)
> 
> I wouldn't define bpf_ksymbol_info, though. Just depending on the
> flag, specify what kind of memory layout (e.g., for strings -
> zero-terminated string, for address - 8 byte numbers, etc). That way
> we can also allow fetching multiple things together, they would just
> be laid out one after another in memory.
> 
> E.g.:
> 
> char buf[256];
> int err = bpf_ksym_resolve(, BPF_KSYM_NAME | BPF_KSYM_MODNAME |
> BPF_KSYM_BASE_ADDR, buf, sizeof(buf));
> 
> if (err == -E2BIG)
>   /* need bigger buffer, but all the data up to truncation point is
> filled in */
> else
>   /* err has exact number of bytes used, including zero terminator(s)
> */
>   /* data is laid out as
> "cpufreq_gov_powersave_init\0cpufreq_powersave\0\x12\x23\x45\x56\x12\
> x23\x45\x56"
> */

Great idea! I like that, thanks for the suggestion :) 



[PATCH bpf-next v4 4/6] bpf: Add an iterator selftest for bpf_sk_storage_delete

2020-12-02 Thread Florent Revest
The eBPF program iterates over all entries (well, only one) of a socket
local storage map and deletes them all. The test makes sure that the
entry is indeed deleted.

Signed-off-by: Florent Revest 
Acked-by: Martin KaFai Lau 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 64 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 23 +++
 2 files changed, 87 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 448885b95eed..bb4a638f2e6f 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -20,6 +20,7 @@
 #include "bpf_iter_bpf_percpu_hash_map.skel.h"
 #include "bpf_iter_bpf_array_map.skel.h"
 #include "bpf_iter_bpf_percpu_array_map.skel.h"
+#include "bpf_iter_bpf_sk_storage_helpers.skel.h"
 #include "bpf_iter_bpf_sk_storage_map.skel.h"
 #include "bpf_iter_test_kern5.skel.h"
 #include "bpf_iter_test_kern6.skel.h"
@@ -913,6 +914,67 @@ static void test_bpf_percpu_array_map(void)
bpf_iter_bpf_percpu_array_map__destroy(skel);
 }
 
+/* An iterator program deletes all local storage in a map. */
+static void test_bpf_sk_storage_delete(void)
+{
+   DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   union bpf_iter_link_info linfo;
+   int err, len, map_fd, iter_fd;
+   struct bpf_link *link;
+   int sock_fd = -1;
+   __u32 val = 42;
+   char buf[64];
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+   err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "map_update", "map_update failed\n"))
+   goto out;
+
+   memset(&linfo, 0, sizeof(linfo));
+   linfo.map.map_fd = map_fd;
+   opts.link_info = &linfo;
+   opts.link_info_len = sizeof(linfo);
+   link = bpf_program__attach_iter(skel->progs.delete_bpf_sk_storage_map,
+   &opts);
+   if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+   goto out;
+
+   iter_fd = bpf_iter_create(bpf_link__fd(link));
+   if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+   goto free_link;
+
+   /* do some tests */
+   while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+   ;
+   if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+   goto close_iter;
+
+   /* test results */
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   if (CHECK(!err || errno != ENOENT, "bpf_map_lookup_elem",
+ "map value wasn't deleted (err=%d, errno=%d)\n", err, errno))
+   goto close_iter;
+
+close_iter:
+   close(iter_fd);
+free_link:
+   bpf_link__destroy(link);
+out:
+   if (sock_fd >= 0)
+   close(sock_fd);
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1067,6 +1129,8 @@ void test_bpf_iter(void)
test_bpf_percpu_array_map();
if (test__start_subtest("bpf_sk_storage_map"))
test_bpf_sk_storage_map();
+   if (test__start_subtest("bpf_sk_storage_delete"))
+   test_bpf_sk_storage_delete();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
new file mode 100644
index ..01ff3235e413
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google LLC. */
+#include "bpf_iter.h"
+#include 
+#include 
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+   __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+   __uint(map_flags, BPF_F_NO_PREALLOC);
+   __type(key, int);
+   __type(value, int);
+} sk_stg_map SEC(".maps");
+
+SEC("iter/bpf_sk_storage_map")
+int delete_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx)
+{
+   if (ctx->sk)
+   bpf_sk_storage_delete(&sk_stg_map, ctx->sk);
+
+   return 0;
+}
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v4 6/6] bpf: Test bpf_sk_storage_get in tcp iterators

2020-12-02 Thread Florent Revest
This extends the existing bpf_sk_storage_get test where a socket is
created and tagged with its creator's pid by a task_file iterator.

A TCP iterator is now also used at the end of the test to negate the
values already stored in the local storage. The test therefore expects
-getpid() to be stored in the local storage.

Signed-off-by: Florent Revest 
Acked-by: Yonghong Song 
---
 .../selftests/bpf/prog_tests/bpf_iter.c| 13 +
 .../progs/bpf_iter_bpf_sk_storage_helpers.c| 18 ++
 2 files changed, 31 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 9336d0f18331..b8362147c9e3 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -978,6 +978,8 @@ static void test_bpf_sk_storage_delete(void)
 /* This creates a socket and its local storage. It then runs a task_iter BPF
  * program that replaces the existing socket local storage with the tgid of the
  * only task owning a file descriptor to this socket, this process, prog_tests.
+ * It then runs a tcp socket iterator that negates the value in the existing
+ * socket local storage, the test verifies that the resulting value is -pid.
  */
 static void test_bpf_sk_storage_get(void)
 {
@@ -994,6 +996,10 @@ static void test_bpf_sk_storage_get(void)
if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
goto out;
 
+   err = listen(sock_fd, 1);
+   if (CHECK(err != 0, "listen", "errno: %d\n", errno))
+   goto out;
+
map_fd = bpf_map__fd(skel->maps.sk_stg_map);
 
err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
@@ -1007,6 +1013,13 @@ static void test_bpf_sk_storage_get(void)
  "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
  getpid(), val, err);
 
+   do_dummy_read(skel->progs.negate_socket_local_storage);
+
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
+ "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+ -getpid(), val, err);
+
 close_socket:
close(sock_fd);
 out:
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index dde53df37de8..6cecab2b32ba 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -45,3 +45,21 @@ int fill_socket_owner(struct bpf_iter__task_file *ctx)
 
return 0;
 }
+
+SEC("iter/tcp")
+int negate_socket_local_storage(struct bpf_iter__tcp *ctx)
+{
+   struct sock_common *sk_common = ctx->sk_common;
+   int *sock_tgid;
+
+   if (!sk_common)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sk_common, 0, 0);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = -*sock_tgid;
+
+   return 0;
+}
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v4 5/6] bpf: Add an iterator selftest for bpf_sk_storage_get

2020-12-02 Thread Florent Revest
The eBPF program iterates over all files and tasks. For all socket
files, it stores the tgid of the last task it encountered with a handle
to that socket. This is a heuristic for finding the "owner" of a socket
similar to what's done by lsof, ss, netstat or fuser. Potentially, this
information could be used from a cgroup_skb/*gress hook to try to
associate network traffic with processes.
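
As a rough, untested sketch of that last idea (reusing the sk_stg_map
definition from this test's BPF object; the program name is made up and
this is not part of the patch), a cgroup_skb program could look along
these lines:

    SEC("cgroup_skb/egress")
    int account_egress_by_owner(struct __sk_buff *skb)
    {
            struct bpf_sock *sk = skb->sk;
            int *owner_tgid;

            if (!sk)
                    return 1; /* always let the packet through */

            sk = bpf_sk_fullsock(sk);
            if (!sk)
                    return 1;

            owner_tgid = bpf_sk_storage_get(&sk_stg_map, sk, 0, 0);
            if (!owner_tgid)
                    return 1;

            /* e.g. account skb->len against *owner_tgid here */
            return 1;
    }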

The test makes sure that a socket it created is tagged with prog_tests's
pid.

Signed-off-by: Florent Revest 
Acked-by: Yonghong Song 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 40 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 24 +++
 2 files changed, 64 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index bb4a638f2e6f..9336d0f18331 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -975,6 +975,44 @@ static void test_bpf_sk_storage_delete(void)
bpf_iter_bpf_sk_storage_helpers__destroy(skel);
 }
 
+/* This creates a socket and its local storage. It then runs a task_iter BPF
+ * program that replaces the existing socket local storage with the tgid of the
+ * only task owning a file descriptor to this socket, this process, prog_tests.
+ */
+static void test_bpf_sk_storage_get(void)
+{
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   int err, map_fd, val = -1;
+   int sock_fd = -1;
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "bpf_map_update_elem", "map_update_failed\n"))
+   goto close_socket;
+
+   do_dummy_read(skel->progs.fill_socket_owner);
+
err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != getpid(), "bpf_map_lookup_elem",
+ "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+ getpid(), val, err);
+
+close_socket:
+   close(sock_fd);
+out:
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1131,6 +1169,8 @@ void test_bpf_iter(void)
test_bpf_sk_storage_map();
if (test__start_subtest("bpf_sk_storage_delete"))
test_bpf_sk_storage_delete();
+   if (test__start_subtest("bpf_sk_storage_get"))
+   test_bpf_sk_storage_get();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index 01ff3235e413..dde53df37de8 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -21,3 +21,27 @@ int delete_bpf_sk_storage_map(struct 
bpf_iter__bpf_sk_storage_map *ctx)
 
return 0;
 }
+
+SEC("iter/task_file")
+int fill_socket_owner(struct bpf_iter__task_file *ctx)
+{
+   struct task_struct *task = ctx->task;
+   struct file *file = ctx->file;
+   struct socket *sock;
+   int *sock_tgid;
+
+   if (!task || !file)
+   return 0;
+
+   sock = bpf_sock_from_file(file);
+   if (!sock)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sock->sk, 0, 0);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = task->tgid;
+
+   return 0;
+}
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v4 3/6] bpf: Expose bpf_sk_storage_* to iterator programs

2020-12-02 Thread Florent Revest
Iterators are currently used to expose kernel information to userspace
over fast procfs-like files but iterators could also be used to
manipulate local storage. For example, the task_file iterator could be
used to initialize a socket local storage with associations between
processes and sockets or to selectively delete local storage values.
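
As an example of the "selectively delete" case (a sketch only, assuming a
sk_stg_map definition and bpf_iter.h context like the ones used by the
selftests in this series), a tcp iterator could drop the storage of
listening sockets:

    SEC("iter/tcp")
    int forget_listeners(struct bpf_iter__tcp *ctx)
    {
            struct sock_common *sk_common = ctx->sk_common;

            if (!sk_common)
                    return 0;

            /* only touch sockets in TCP_LISTEN state */
            if (sk_common->skc_state == TCP_LISTEN)
                    bpf_sk_storage_delete(&sk_stg_map, sk_common);

            return 0;
    }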

Signed-off-by: Florent Revest 
Acked-by: Martin KaFai Lau 
Acked-by: KP Singh 
---
 net/core/bpf_sk_storage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index a32037daa933..4edd033e899c 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -394,6 +394,7 @@ static bool bpf_sk_storage_tracing_allowed(const struct 
bpf_prog *prog)
 * use the bpf_sk_storage_(get|delete) helper.
 */
switch (prog->expected_attach_type) {
+   case BPF_TRACE_ITER:
case BPF_TRACE_RAW_TP:
/* bpf_sk_storage has no trace point */
return true;
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v4 1/6] net: Remove the err argument from sock_from_file

2020-12-02 Thread Florent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is
either not set or set to -ENOTSOCK IFF the returned socket is NULL. This
makes the error redundant and it is ignored by a few callers.

This patch simplifies the API by letting callers deduce the error based
on whether the returned socket is NULL or not.

Suggested-by: Al Viro 
Signed-off-by: Florent Revest 
Reviewed-by: KP Singh 
---
 fs/eventpoll.c   |  3 +--
 fs/io_uring.c| 16 
 include/linux/net.h  |  2 +-
 net/core/netclassid_cgroup.c |  3 +--
 net/core/netprio_cgroup.c|  3 +--
 net/core/sock.c  |  8 +---
 net/socket.c | 27 ---
 7 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 73c346e503d7..19499b7bb82c 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -416,12 +416,11 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
unsigned int napi_id;
struct socket *sock;
struct sock *sk;
-   int err;
 
if (!net_busy_loop_on())
return;
 
-   sock = sock_from_file(epi->ffd.file, &err);
+   sock = sock_from_file(epi->ffd.file);
if (!sock)
return;
 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8018c7076b25..ace99b15cbd3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4341,9 +4341,9 @@ static int io_sendmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4390,9 +4390,9 @@ static int io_send(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
if (unlikely(ret))
@@ -4569,9 +4569,9 @@ static int io_recvmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4632,9 +4632,9 @@ static int io_recv(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->flags & REQ_F_BUFFER_SELECT) {
kbuf = io_recv_buffer_select(req, !force_nonblock);
diff --git a/include/linux/net.h b/include/linux/net.h
index 0dcd51feef02..9e2324efc26a 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -240,7 +240,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char 
*dname);
 struct socket *sockfd_lookup(int fd, int *err);
-struct socket *sock_from_file(struct file *file, int *err);
+struct socket *sock_from_file(struct file *file);
 #define sockfd_put(sock) fput(sock->file)
 int net_ratelimit(void);
 
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 41b24cd31562..b49c57d35a88 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -68,9 +68,8 @@ struct update_classid_context {
 
 static int update_classid_sock(const void *v, struct file *file, unsigned n)
 {
-   int err;
struct update_classid_context *ctx = (void *)v;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);
 
if (sock) {
spin_lock(&cgroup_sk_update_lock);
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9bd4cab7d510..99a431c56f23 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -220,8 +220,7 @@ static ssize_t write_priomap(struct kernfs_open_file *of,
 
 static int update_netprio(const void *v, struct file *file, unsigned n)
 {
-   int err;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);
if (sock) {
spin_lock(&cgroup_sk_update_lock);
sock_cgroup_set_prioidx(&sock->sk->sk_cgrp_data,
diff --git a/net/core/sock.c b/net/core/sock.c
index d422a6808405..eb55cf79bb24 100644
--- a/net/core/sock.c
+++ b/net/core/sock.

[PATCH bpf-next v4 2/6] bpf: Add a bpf_sock_from_file helper

2020-12-02 Thread Florent Revest
While eBPF programs can check whether a file is a socket by file->f_op
== &socket_file_ops, they cannot convert the void private_data pointer
to a struct socket BTF pointer. In order to do this a new helper
wrapping sock_from_file is added.

This is useful to tracing programs but also other program types
inheriting this set of helpers such as iterators or LSM programs.

Signed-off-by: Florent Revest 
Acked-by: KP Singh 
Acked-by: Martin KaFai Lau 
---
 include/uapi/linux/bpf.h   |  9 +
 kernel/trace/bpf_trace.c   | 20 
 scripts/bpf_helpers_doc.py |  4 
 tools/include/uapi/linux/bpf.h |  9 +
 4 files changed, 42 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c3458ec1f30a..a92b2b7d331b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3817,6 +3817,14 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file represents a socket, returns the associated
+ * socket.
+ * Return
+ * A pointer to a struct socket on success or NULL if the file is
+ * not a socket.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3981,6 +3989,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d255bc9b2bfa..d0aac9eac2d8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1260,6 +1260,24 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
.arg5_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_1(bpf_sock_from_file, struct file *, file)
+{
+   return (unsigned long) sock_from_file(file);
+}
+
+BTF_ID_LIST(bpf_sock_from_file_btf_ids)
+BTF_ID(struct, socket)
+BTF_ID(struct, file)
+
+static const struct bpf_func_proto bpf_sock_from_file_proto = {
+   .func   = bpf_sock_from_file,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_BTF_ID_OR_NULL,
.ret_btf_id = &bpf_sock_from_file_btf_ids[0],
+   .arg1_type  = ARG_PTR_TO_BTF_ID,
.arg1_btf_id= &bpf_sock_from_file_btf_ids[1],
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1356,6 +1374,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_per_cpu_ptr_proto;
case BPF_FUNC_bpf_this_cpu_ptr:
return &bpf_this_cpu_ptr_proto;
+   case BPF_FUNC_sock_from_file:
+   return &bpf_sock_from_file_proto;
default:
return NULL;
}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 8b829748d488..867ada23281c 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -437,6 +437,8 @@ class PrinterHelpers(Printer):
 'struct path',
 'struct btf_ptr',
 'struct inode',
+'struct socket',
+'struct file',
 ]
 known_types = {
 '...',
@@ -482,6 +484,8 @@ class PrinterHelpers(Printer):
 'struct path',
 'struct btf_ptr',
 'struct inode',
+'struct socket',
+'struct file',
 }
 mapped_types = {
 'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c3458ec1f30a..a92b2b7d331b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3817,6 +3817,14 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file represents a socket, returns the associated
+ * socket.
+ * Return
+ * A pointer to a struct socket on success or NULL if the file is
+ * not a socket.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3981,6 +3989,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v2 3/3] bpf: Add a selftest for the tracing bpf_get_socket_cookie

2020-12-03 Thread Florent Revest
This builds up on the existing socket cookie test which checks whether
the bpf_get_socket_cookie helpers provide the same value in
cgroup/connect6 and sockops programs for a socket created by the
userspace part of the test.

Adding a tracing program to the existing objects requires a different
attachment strategy and different headers.

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/socket_cookie.c  | 24 +++
 .../selftests/bpf/progs/socket_cookie_prog.c  | 41 ---
 2 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
index 53d0c44e7907..e5c5e2ea1deb 100644
--- a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -15,8 +15,8 @@ struct socket_cookie {
 
 void test_socket_cookie(void)
 {
+   struct bpf_link *set_link, *update_sockops_link, *update_tracing_link;
socklen_t addr_len = sizeof(struct sockaddr_in6);
-   struct bpf_link *set_link, *update_link;
int server_fd, client_fd, cgroup_fd;
struct socket_cookie_prog *skel;
__u32 cookie_expected_value;
@@ -39,15 +39,21 @@ void test_socket_cookie(void)
  PTR_ERR(set_link)))
goto close_cgroup_fd;
 
-   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
-cgroup_fd);
-   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
- PTR_ERR(update_link)))
+   update_sockops_link = bpf_program__attach_cgroup(
+   skel->progs.update_cookie_sockops, cgroup_fd);
+   if (CHECK(IS_ERR(update_sockops_link), "update-sockops-link-cg-attach",
+ "err %ld\n", PTR_ERR(update_sockops_link)))
goto free_set_link;
 
+   update_tracing_link = bpf_program__attach(
+   skel->progs.update_cookie_tracing);
+   if (CHECK(IS_ERR(update_tracing_link), "update-tracing-link-attach",
+ "err %ld\n", PTR_ERR(update_tracing_link)))
+   goto free_update_sockops_link;
+
server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
-   goto free_update_link;
+   goto free_update_tracing_link;
 
client_fd = connect_to_fd(server_fd, 0);
if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
@@ -71,8 +77,10 @@ void test_socket_cookie(void)
close(client_fd);
 close_server_fd:
close(server_fd);
-free_update_link:
-   bpf_link__destroy(update_link);
+free_update_tracing_link:
+   bpf_link__destroy(update_tracing_link);
+free_update_sockops_link:
+   bpf_link__destroy(update_sockops_link);
 free_set_link:
bpf_link__destroy(set_link);
 close_cgroup_fd:
diff --git a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c 
b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
index 81e84be6f86d..1f770b732cb1 100644
--- a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
+++ b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
-#include 
-#include 
+#include "vmlinux.h"
 
 #include 
 #include 
+#include 
+
+#define AF_INET6 10
 
 struct socket_cookie {
__u64 cookie_key;
@@ -19,6 +21,14 @@ struct {
__type(value, struct socket_cookie);
 } socket_cookies SEC(".maps");
 
+/*
+ * These three programs get executed in a row on connect() syscalls. The
+ * userspace side of the test creates a client socket, issues a connect() on it
+ * and then checks that the local storage associated with this socket has:
+ * cookie_value == local_port << 8 | 0xFF
+ * The different parts of this cookie_value are appended by those hooks if they
+ * all agree on the output of bpf_get_socket_cookie().
+ */
 SEC("cgroup/connect6")
 int set_cookie(struct bpf_sock_addr *ctx)
 {
@@ -32,14 +42,14 @@ int set_cookie(struct bpf_sock_addr *ctx)
if (!p)
return 1;
 
-   p->cookie_value = 0xFF;
+   p->cookie_value = 0xF;
p->cookie_key = bpf_get_socket_cookie(ctx);
 
return 1;
 }
 
 SEC("sockops")
-int update_cookie(struct bpf_sock_ops *ctx)
+int update_cookie_sockops(struct bpf_sock_ops *ctx)
 {
struct bpf_sock *sk;
struct socket_cookie *p;
@@ -60,9 +70,30 @@ int update_cookie(struct bpf_sock_ops *ctx)
if (p->cookie_key != bpf_get_socket_cookie(ctx))
return 1;
 
-   p->cookie_value = (ctx->local_port << 8) | p->cookie_value;
+   p->cookie_value |= (ctx->local_port << 8);
 
 

[PATCH bpf-next v2 1/3] bpf: Expose bpf_get_socket_cookie to tracing programs

2020-12-03 Thread Florent Revest
This creates a new helper proto because the existing
bpf_get_socket_cookie_sock_proto has a ARG_PTR_TO_CTX argument and only
works for BPF programs where the context is a sock.

This helper could also be useful to other BPF program types such as LSM.
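
For illustration (a hypothetical sketch, not part of this series; the
tcp_connect attach point is only an example), a tracing program could
then do:

    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    char _license[] SEC("license") = "GPL";

    SEC("fentry/tcp_connect")
    int BPF_PROG(trace_tcp_connect, struct sock *sk)
    {
            /* sk is a BTF pointer rather than a program context, which
             * is why the existing ARG_PTR_TO_CTX proto cannot be used.
             */
            __u64 cookie = bpf_get_socket_cookie(sk);

            bpf_printk("tcp_connect cookie=%llu", cookie);
            return 0;
    }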

Signed-off-by: Florent Revest 
---
 include/uapi/linux/bpf.h   | 7 +++
 kernel/trace/bpf_trace.c   | 4 
 net/core/filter.c  | 7 +++
 tools/include/uapi/linux/bpf.h | 7 +++
 4 files changed, 25 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c3458ec1f30a..3e0e33c43998 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1662,6 +1662,13 @@ union bpf_attr {
  * Return
  * A 8-byte long non-decreasing number.
  *
+ * u64 bpf_get_socket_cookie(void *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**.
+ * Return
+ * A 8-byte long non-decreasing number.
+ *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
  * The owner UID of the socket associated to *skb*. If the socket
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d255bc9b2bfa..14ad96579813 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1725,6 +1725,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
}
 }
 
+extern const struct bpf_func_proto bpf_get_socket_cookie_sock_tracing_proto;
+
 const struct bpf_func_proto *
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1748,6 +1750,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_sk_storage_get_tracing_proto;
case BPF_FUNC_sk_storage_delete:
return &bpf_sk_storage_delete_tracing_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_cookie_sock_tracing_proto;
 #endif
case BPF_FUNC_seq_printf:
return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ca5eecebacf..177c4e5e529d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4631,6 +4631,13 @@ static const struct bpf_func_proto 
bpf_get_socket_cookie_sock_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+const struct bpf_func_proto bpf_get_socket_cookie_sock_tracing_proto = {
+   .func   = bpf_get_socket_cookie_sock,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+};
+
 BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
 {
return __sock_gen_cookie(ctx->sk);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c3458ec1f30a..3e0e33c43998 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1662,6 +1662,13 @@ union bpf_attr {
  * Return
  * A 8-byte long non-decreasing number.
  *
+ * u64 bpf_get_socket_cookie(void *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**.
+ * Return
+ * A 8-byte long non-decreasing number.
+ *
  * u32 bpf_get_socket_uid(struct sk_buff *skb)
  * Return
  * The owner UID of the socket associated to *skb*. If the socket
-- 
2.29.2.576.ga3fc446d84-goog



[PATCH bpf-next v2 2/3] selftests/bpf: Integrate the socket_cookie test to test_progs

2020-12-03 Thread Florent Revest
Currently, the selftest for the BPF socket_cookie helpers is built and
run independently from test_progs. It's easy to forget and hard to
maintain.

This patch moves the socket cookies test into prog_tests/ and vastly
simplifies its logic by:
- rewriting the loading code with BPF skeletons
- rewriting the server/client code with network helpers
- rewriting the cgroup code with test__join_cgroup
- rewriting the error handling code with CHECKs

Signed-off-by: Florent Revest 
---
 tools/testing/selftests/bpf/Makefile  |   3 +-
 .../selftests/bpf/prog_tests/socket_cookie.c  |  82 +++
 .../selftests/bpf/progs/socket_cookie_prog.c  |   2 -
 .../selftests/bpf/test_socket_cookie.c| 208 --
 4 files changed, 83 insertions(+), 212 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/socket_cookie.c
 delete mode 100644 tools/testing/selftests/bpf/test_socket_cookie.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index 894192c319fb..8c7ff88f0eb3 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -33,7 +33,7 @@ LDLIBS += -lcap -lelf -lz -lrt -lpthread
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_verifier_log test_dev_cgroup \
-   test_sock test_sockmap get_cgroup_id_user test_socket_cookie \
+   test_sock test_sockmap get_cgroup_id_user \
test_cgroup_storage \
test_netcnt test_tcpnotify_user test_sysctl \
test_progs-no_alu32 \
@@ -161,7 +161,6 @@ $(OUTPUT)/test_dev_cgroup: cgroup_helpers.c
 $(OUTPUT)/test_skb_cgroup_id_user: cgroup_helpers.c
 $(OUTPUT)/test_sock: cgroup_helpers.c
 $(OUTPUT)/test_sock_addr: cgroup_helpers.c
-$(OUTPUT)/test_socket_cookie: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_tcpnotify_user: cgroup_helpers.c trace_helpers.c
 $(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
diff --git a/tools/testing/selftests/bpf/prog_tests/socket_cookie.c 
b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
new file mode 100644
index ..53d0c44e7907
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/socket_cookie.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Google LLC.
+// Copyright (c) 2018 Facebook
+
+#include <test_progs.h>
+#include "socket_cookie_prog.skel.h"
+#include "network_helpers.h"
+
+static int duration;
+
+struct socket_cookie {
+   __u64 cookie_key;
+   __u32 cookie_value;
+};
+
+void test_socket_cookie(void)
+{
+   socklen_t addr_len = sizeof(struct sockaddr_in6);
+   struct bpf_link *set_link, *update_link;
+   int server_fd, client_fd, cgroup_fd;
+   struct socket_cookie_prog *skel;
+   __u32 cookie_expected_value;
+   struct sockaddr_in6 addr;
+   struct socket_cookie val;
+   int err = 0;
+
+   skel = socket_cookie_prog__open_and_load();
+   if (CHECK(!skel, "socket_cookie_prog__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   cgroup_fd = test__join_cgroup("/socket_cookie");
+   if (CHECK(cgroup_fd < 0, "join_cgroup", "cgroup creation failed\n"))
+   goto destroy_skel;
+
+   set_link = bpf_program__attach_cgroup(skel->progs.set_cookie,
+ cgroup_fd);
+   if (CHECK(IS_ERR(set_link), "set-link-cg-attach", "err %ld\n",
+ PTR_ERR(set_link)))
+   goto close_cgroup_fd;
+
+   update_link = bpf_program__attach_cgroup(skel->progs.update_cookie,
+cgroup_fd);
+   if (CHECK(IS_ERR(update_link), "update-link-cg-attach", "err %ld\n",
+ PTR_ERR(update_link)))
+   goto free_set_link;
+
+   server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+   if (CHECK(server_fd < 0, "start_server", "errno %d\n", errno))
+   goto free_update_link;
+
+   client_fd = connect_to_fd(server_fd, 0);
+   if (CHECK(client_fd < 0, "connect_to_fd", "errno %d\n", errno))
+   goto close_server_fd;
+
+   err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.socket_cookies),
+ &client_fd, &val);
+   if (CHECK(err, "map_lookup", "err %d errno %d\n", err, errno))
+   goto close_client_fd;
+
+   err = getsockname(client_fd, (struct sockaddr *)&addr, &addr_len);
+   if (CHECK(err, "getsockname", "Can't get client local addr\n"))
+   goto close_client_fd;
+
+   cookie_expected_value = (ntohs(addr.sin6_port) << 8) | 0xFF;
+   CHECK(val.cookie_value != cookie_expected_value, "",
+

Re: [PATCH v2 5/5] bpf: Add an iterator selftest for bpf_sk_storage_get

2020-11-26 Thread Florent Revest
On Thu, 2020-11-19 at 16:32 -0800, Martin KaFai Lau wrote:
> Does it affect all sk(s) in the system?  Can it be limited to
> the sk that the test is testing?

Oh I just realized I haven't answered you here yet! Thanks for the
reviews. :D I'm sending a v3 addressing your comments



[PATCH bpf-next v3 2/6] bpf: Add a bpf_sock_from_file helper

2020-11-26 Thread Florent Revest
While eBPF programs can check whether a file is a socket by file->f_op
== &socket_file_ops, they cannot convert the void private_data pointer
to a struct socket BTF pointer. In order to do this a new helper
wrapping sock_from_file is added.

This is useful to tracing programs but also other program types
inheriting this set of helpers such as iterators or LSM programs.

Signed-off-by: Florent Revest 
Acked-by: KP Singh 
Acked-by: Martin KaFai Lau 
---
 include/uapi/linux/bpf.h   |  9 +
 kernel/trace/bpf_trace.c   | 20 
 scripts/bpf_helpers_doc.py |  4 
 tools/include/uapi/linux/bpf.h |  9 +
 4 files changed, 42 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c3458ec1f30a..a92b2b7d331b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3817,6 +3817,14 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file represents a socket, returns the associated
+ * socket.
+ * Return
+ * A pointer to a struct socket on success or NULL if the file is
+ * not a socket.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3981,6 +3989,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d255bc9b2bfa..d0aac9eac2d8 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1260,6 +1260,24 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
.arg5_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_1(bpf_sock_from_file, struct file *, file)
+{
+   return (unsigned long) sock_from_file(file);
+}
+
+BTF_ID_LIST(bpf_sock_from_file_btf_ids)
+BTF_ID(struct, socket)
+BTF_ID(struct, file)
+
+static const struct bpf_func_proto bpf_sock_from_file_proto = {
+   .func   = bpf_sock_from_file,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_BTF_ID_OR_NULL,
.ret_btf_id = &bpf_sock_from_file_btf_ids[0],
+   .arg1_type  = ARG_PTR_TO_BTF_ID,
.arg1_btf_id= &bpf_sock_from_file_btf_ids[1],
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1356,6 +1374,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_per_cpu_ptr_proto;
case BPF_FUNC_bpf_this_cpu_ptr:
return &bpf_this_cpu_ptr_proto;
+   case BPF_FUNC_sock_from_file:
+   return &bpf_sock_from_file_proto;
default:
return NULL;
}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 8b829748d488..867ada23281c 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -437,6 +437,8 @@ class PrinterHelpers(Printer):
 'struct path',
 'struct btf_ptr',
 'struct inode',
+'struct socket',
+'struct file',
 ]
 known_types = {
 '...',
@@ -482,6 +484,8 @@ class PrinterHelpers(Printer):
 'struct path',
 'struct btf_ptr',
 'struct inode',
+'struct socket',
+'struct file',
 }
 mapped_types = {
 'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c3458ec1f30a..a92b2b7d331b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3817,6 +3817,14 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * struct socket *bpf_sock_from_file(struct file *file)
+ * Description
+ * If the given file represents a socket, returns the associated
+ * socket.
+ * Return
+ * A pointer to a struct socket on success or NULL if the file is
+ * not a socket.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3981,6 +3989,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(sock_from_file), \
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v3 1/6] net: Remove the err argument from sock_from_file

2020-11-26 Thread Florent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is
either not set or set to -ENOTSOCK IFF the returned socket is NULL. This
makes the error redundant and it is ignored by a few callers.

This patch simplifies the API by letting callers deduce the error based
on whether the returned socket is NULL or not.

Suggested-by: Al Viro 
Signed-off-by: Florent Revest 
---
 fs/eventpoll.c   |  3 +--
 fs/io_uring.c| 16 
 include/linux/net.h  |  2 +-
 net/core/netclassid_cgroup.c |  3 +--
 net/core/netprio_cgroup.c|  3 +--
 net/core/sock.c  |  8 +---
 net/socket.c | 27 ---
 7 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4df61129566d..c764d8d5a76a 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -415,12 +415,11 @@ static inline void ep_set_busy_poll_napi_id(struct epitem 
*epi)
unsigned int napi_id;
struct socket *sock;
struct sock *sk;
-   int err;
 
if (!net_busy_loop_on())
return;
 
-   sock = sock_from_file(epi->ffd.file, &err);
+   sock = sock_from_file(epi->ffd.file);
if (!sock)
return;
 
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8018c7076b25..ace99b15cbd3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4341,9 +4341,9 @@ static int io_sendmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4390,9 +4390,9 @@ static int io_send(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
ret = import_single_range(WRITE, sr->buf, sr->len, &iov, &msg.msg_iter);
if (unlikely(ret))
@@ -4569,9 +4569,9 @@ static int io_recvmsg(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->async_data) {
kmsg = req->async_data;
@@ -4632,9 +4632,9 @@ static int io_recv(struct io_kiocb *req, bool 
force_nonblock,
unsigned flags;
int ret, cflags = 0;
 
-   sock = sock_from_file(req->file, &ret);
+   sock = sock_from_file(req->file);
if (unlikely(!sock))
-   return ret;
+   return -ENOTSOCK;
 
if (req->flags & REQ_F_BUFFER_SELECT) {
kbuf = io_recv_buffer_select(req, !force_nonblock);
diff --git a/include/linux/net.h b/include/linux/net.h
index 0dcd51feef02..9e2324efc26a 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -240,7 +240,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg);
 int sock_recvmsg(struct socket *sock, struct msghdr *msg, int flags);
 struct file *sock_alloc_file(struct socket *sock, int flags, const char 
*dname);
 struct socket *sockfd_lookup(int fd, int *err);
-struct socket *sock_from_file(struct file *file, int *err);
+struct socket *sock_from_file(struct file *file);
 #define sockfd_put(sock) fput(sock->file)
 int net_ratelimit(void);
 
diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 41b24cd31562..b49c57d35a88 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -68,9 +68,8 @@ struct update_classid_context {
 
 static int update_classid_sock(const void *v, struct file *file, unsigned n)
 {
-   int err;
struct update_classid_context *ctx = (void *)v;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);
 
if (sock) {
spin_lock(&cgroup_sk_update_lock);
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index 9bd4cab7d510..99a431c56f23 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -220,8 +220,7 @@ static ssize_t write_priomap(struct kernfs_open_file *of,
 
 static int update_netprio(const void *v, struct file *file, unsigned n)
 {
-   int err;
-   struct socket *sock = sock_from_file(file, &err);
+   struct socket *sock = sock_from_file(file);
if (sock) {
spin_lock(&cgroup_sk_update_lock);
sock_cgroup_set_prioidx(&sock->sk->sk_cgrp_data,
diff --git a/net/core/sock.c b/net/core/sock.c
index 727ea1cc633c..dd0598d831ef 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2808,14 +280

[PATCH bpf-next v3 3/6] bpf: Expose bpf_sk_storage_* to iterator programs

2020-11-26 Thread Florent Revest
Iterators are currently used to expose kernel information to userspace
over fast procfs-like files but iterators could also be used to
manipulate local storage. For example, the task_file iterator could be
used to initialize a socket local storage with associations between
processes and sockets or to selectively delete local storage values.
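
A rough sketch of the kind of iterator program this change enables
(illustrative only, not part of the patch; the map layout and program
name are made up, and it assumes the bpf_sock_from_file helper from this
series plus a vmlinux.h that carries the iterator context types):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, int);
} sk_stg_map SEC(".maps");

/* Illustrative task_file iterator: drop the local storage of every
 * socket it walks over, now that BPF_TRACE_ITER programs may call
 * bpf_sk_storage_delete().
 */
SEC("iter/task_file")
int drop_socket_storage(struct bpf_iter__task_file *ctx)
{
	struct socket *sock;

	if (!ctx->file)
		return 0;

	sock = bpf_sock_from_file(ctx->file);
	if (!sock)
		return 0;

	bpf_sk_storage_delete(&sk_stg_map, sock->sk);
	return 0;
}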

Signed-off-by: Florent Revest 
Acked-by: Martin KaFai Lau 
---
 net/core/bpf_sk_storage.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index a32037daa933..4edd033e899c 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -394,6 +394,7 @@ static bool bpf_sk_storage_tracing_allowed(const struct 
bpf_prog *prog)
 * use the bpf_sk_storage_(get|delete) helper.
 */
switch (prog->expected_attach_type) {
+   case BPF_TRACE_ITER:
case BPF_TRACE_RAW_TP:
/* bpf_sk_storage has no trace point */
return true;
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v3 6/6] bpf: Test bpf_sk_storage_get in tcp iterators

2020-11-26 Thread Florent Revest
This extends the existing bpf_sk_storage_get test where a socket is
created and tagged with its creator's pid by a task_file iterator.

A TCP iterator is now also used at the end of the test to negate the
values already stored in the local storage. The test therefore expects
-getpid() to be stored in the local storage.
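
For reference, running an iterator from the test boils down to a helper
shaped like the one below (a sketch of what the existing do_dummy_read()
in prog_tests/bpf_iter.c does; the name run_iter_prog is illustrative and
the libbpf and unistd declarations are assumed to come in via
test_progs.h):

static void run_iter_prog(struct bpf_program *prog)
{
	struct bpf_link *link;
	char buf[16];
	int iter_fd, len;

	/* Attach the iterator program, create an iterator fd and read its
	 * seq_file to the end so the program visits every element.
	 */
	link = bpf_program__attach_iter(prog, NULL);
	if (libbpf_get_error(link))
		return;

	iter_fd = bpf_iter_create(bpf_link__fd(link));
	if (iter_fd < 0)
		goto free_link;

	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
		;

	close(iter_fd);
free_link:
	bpf_link__destroy(link);
}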

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/bpf_iter.c| 13 +
 .../progs/bpf_iter_bpf_sk_storage_helpers.c| 18 ++
 2 files changed, 31 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 9336d0f18331..b8362147c9e3 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -978,6 +978,8 @@ static void test_bpf_sk_storage_delete(void)
 /* This creates a socket and its local storage. It then runs a task_iter BPF
  * program that replaces the existing socket local storage with the tgid of the
  * only task owning a file descriptor to this socket, this process, prog_tests.
+ * It then runs a tcp socket iterator that negates the value in the existing
+ * socket local storage, the test verifies that the resulting value is -pid.
  */
 static void test_bpf_sk_storage_get(void)
 {
@@ -994,6 +996,10 @@ static void test_bpf_sk_storage_get(void)
if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
goto out;
 
+   err = listen(sock_fd, 1);
+   if (CHECK(err != 0, "listen", "errno: %d\n", errno))
+   goto out;
+
map_fd = bpf_map__fd(skel->maps.sk_stg_map);
 
err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
@@ -1007,6 +1013,13 @@ static void test_bpf_sk_storage_get(void)
  "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
  getpid(), val, err);
 
+   do_dummy_read(skel->progs.negate_socket_local_storage);
+
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != -getpid(), "bpf_map_lookup_elem",
+ "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+ -getpid(), val, err);
+
 close_socket:
close(sock_fd);
 out:
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index d7a7a802d172..b3f0cb139c55 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -46,3 +46,21 @@ int fill_socket_owner(struct bpf_iter__task_file *ctx)
return 0;
 }
 
+SEC("iter/tcp")
+int negate_socket_local_storage(struct bpf_iter__tcp *ctx)
+{
+   struct sock_common *sk_common = ctx->sk_common;
+   int *sock_tgid;
+
+   if (!sk_common)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sk_common, 0, 0);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = -*sock_tgid;
+
+   return 0;
+}
+
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v3 4/6] bpf: Add an iterator selftest for bpf_sk_storage_delete

2020-11-26 Thread Florent Revest
The eBPF program iterates over all entries (well, only one) of a socket
local storage map and deletes them all. The test makes sure that the
entry is indeed deleted.

Signed-off-by: Florent Revest 
Acked-by: Martin KaFai Lau 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 64 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 23 +++
 2 files changed, 87 insertions(+)
 create mode 100644 
tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 448885b95eed..bb4a638f2e6f 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -20,6 +20,7 @@
 #include "bpf_iter_bpf_percpu_hash_map.skel.h"
 #include "bpf_iter_bpf_array_map.skel.h"
 #include "bpf_iter_bpf_percpu_array_map.skel.h"
+#include "bpf_iter_bpf_sk_storage_helpers.skel.h"
 #include "bpf_iter_bpf_sk_storage_map.skel.h"
 #include "bpf_iter_test_kern5.skel.h"
 #include "bpf_iter_test_kern6.skel.h"
@@ -913,6 +914,67 @@ static void test_bpf_percpu_array_map(void)
bpf_iter_bpf_percpu_array_map__destroy(skel);
 }
 
+/* An iterator program deletes all local storage in a map. */
+static void test_bpf_sk_storage_delete(void)
+{
+   DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   union bpf_iter_link_info linfo;
+   int err, len, map_fd, iter_fd;
+   struct bpf_link *link;
+   int sock_fd = -1;
+   __u32 val = 42;
+   char buf[64];
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+   err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "map_update", "map_update failed\n"))
+   goto out;
+
+   memset(&linfo, 0, sizeof(linfo));
+   linfo.map.map_fd = map_fd;
+   opts.link_info = &linfo;
+   opts.link_info_len = sizeof(linfo);
+   link = bpf_program__attach_iter(skel->progs.delete_bpf_sk_storage_map,
+   &opts);
+   if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+   goto out;
+
+   iter_fd = bpf_iter_create(bpf_link__fd(link));
+   if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+   goto free_link;
+
+   /* do some tests */
+   while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+   ;
+   if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+   goto close_iter;
+
+   /* test results */
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   if (CHECK(!err || errno != ENOENT, "bpf_map_lookup_elem",
+ "map value wasn't deleted (err=%d, errno=%d)\n", err, errno))
+   goto close_iter;
+
+close_iter:
+   close(iter_fd);
+free_link:
+   bpf_link__destroy(link);
+out:
+   if (sock_fd >= 0)
+   close(sock_fd);
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1067,6 +1129,8 @@ void test_bpf_iter(void)
test_bpf_percpu_array_map();
if (test__start_subtest("bpf_sk_storage_map"))
test_bpf_sk_storage_map();
+   if (test__start_subtest("bpf_sk_storage_delete"))
+   test_bpf_sk_storage_delete();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
new file mode 100644
index ..01ff3235e413
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google LLC. */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+   __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+   __uint(map_flags, BPF_F_NO_PREALLOC);
+   __type(key, int);
+   __type(value, int);
+} sk_stg_map SEC(".maps");
+
+SEC("iter/bpf_sk_storage_map")
+int delete_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx)
+{
+   if (ctx->sk)
+   bpf_sk_storage_delete(&sk_stg_map, ctx->sk);
+
+   return 0;
+}
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next v3 5/6] bpf: Add an iterator selftest for bpf_sk_storage_get

2020-11-26 Thread Florent Revest
The eBPF program iterates over all files and tasks. For all socket
files, it stores the tgid of the last task it encountered with a handle
to that socket. This is a heuristic for finding the "owner" of a socket
similar to what's done by lsof, ss, netstat or fuser. Potentially, this
information could be used from a cgroup_skb/*gress hook to try to
associate network traffic with processes.

The test makes sure that a socket it created is tagged with prog_tests's
pid.
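
The cgroup_skb idea mentioned above could be sketched as follows (purely
illustrative, not part of the patch; it assumes the same sk_stg_map
layout as the selftest and that the storage was previously filled by the
task_file iterator):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, int);
} sk_stg_map SEC(".maps");

/* Illustrative egress hook: report which tgid "owns" the socket a packet
 * is sent on, based on the value left by the iterator.
 */
SEC("cgroup_skb/egress")
int tag_egress(struct __sk_buff *skb)
{
	struct bpf_sock *sk = skb->sk;
	int *owner_tgid;

	if (!sk)
		return 1;

	owner_tgid = bpf_sk_storage_get(&sk_stg_map, sk, 0, 0);
	if (owner_tgid)
		bpf_printk("egress skb from tgid %d", *owner_tgid);

	return 1; /* keep the packet */
}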

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/prog_tests/bpf_iter.c   | 40 +++
 .../progs/bpf_iter_bpf_sk_storage_helpers.c   | 25 
 2 files changed, 65 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c 
b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index bb4a638f2e6f..9336d0f18331 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -975,6 +975,44 @@ static void test_bpf_sk_storage_delete(void)
bpf_iter_bpf_sk_storage_helpers__destroy(skel);
 }
 
+/* This creates a socket and its local storage. It then runs a task_iter BPF
+ * program that replaces the existing socket local storage with the tgid of the
+ * only task owning a file descriptor to this socket, this process, prog_tests.
+ */
+static void test_bpf_sk_storage_get(void)
+{
+   struct bpf_iter_bpf_sk_storage_helpers *skel;
+   int err, map_fd, val = -1;
+   int sock_fd = -1;
+
+   skel = bpf_iter_bpf_sk_storage_helpers__open_and_load();
+   if (CHECK(!skel, "bpf_iter_bpf_sk_storage_helpers__open_and_load",
+ "skeleton open_and_load failed\n"))
+   return;
+
+   sock_fd = socket(AF_INET6, SOCK_STREAM, 0);
+   if (CHECK(sock_fd < 0, "socket", "errno: %d\n", errno))
+   goto out;
+
+   map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+
+   err = bpf_map_update_elem(map_fd, &sock_fd, &val, BPF_NOEXIST);
+   if (CHECK(err, "bpf_map_update_elem", "map_update_failed\n"))
+   goto close_socket;
+
+   do_dummy_read(skel->progs.fill_socket_owner);
+
+   err = bpf_map_lookup_elem(map_fd, &sock_fd, &val);
+   CHECK(err || val != getpid(), "bpf_map_lookup_elem",
+ "map value wasn't set correctly (expected %d, got %d, err=%d)\n",
+ getpid(), val, err);
+
+close_socket:
+   close(sock_fd);
+out:
+   bpf_iter_bpf_sk_storage_helpers__destroy(skel);
+}
+
 static void test_bpf_sk_storage_map(void)
 {
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
@@ -1131,6 +1169,8 @@ void test_bpf_iter(void)
test_bpf_sk_storage_map();
if (test__start_subtest("bpf_sk_storage_delete"))
test_bpf_sk_storage_delete();
+   if (test__start_subtest("bpf_sk_storage_get"))
+   test_bpf_sk_storage_get();
if (test__start_subtest("rdonly-buf-out-of-bound"))
test_rdonly_buf_out_of_bound();
if (test__start_subtest("buf-neg-offset"))
diff --git 
a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c 
b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
index 01ff3235e413..d7a7a802d172 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_helpers.c
@@ -21,3 +21,28 @@ int delete_bpf_sk_storage_map(struct 
bpf_iter__bpf_sk_storage_map *ctx)
 
return 0;
 }
+
+SEC("iter/task_file")
+int fill_socket_owner(struct bpf_iter__task_file *ctx)
+{
+   struct task_struct *task = ctx->task;
+   struct file *file = ctx->file;
+   struct socket *sock;
+   int *sock_tgid;
+
+   if (!task || !file || task->tgid != task->pid)
+   return 0;
+
+   sock = bpf_sock_from_file(file);
+   if (!sock)
+   return 0;
+
+   sock_tgid = bpf_sk_storage_get(&sk_stg_map, sock->sk, 0, 0);
+   if (!sock_tgid)
+   return 0;
+
+   *sock_tgid = task->tgid;
+
+   return 0;
+}
+
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-11-26 Thread Florent Revest
This helper exposes the kallsyms_lookup function to eBPF tracing
programs. This can be used to retrieve the name of the symbol at an
address. For example, when hooking into nf_register_net_hook, one can
audit the name of the registered netfilter hook and potentially also
the name of the module in which the symbol is located.
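
The netfilter auditing idea could look roughly like this (illustrative
only; the attach point, buffer sizes and program name are assumptions,
and it presumes helper declarations regenerated from this patch's UAPI
update):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

#define SYM_LEN 128
#define MOD_LEN 64

char hook_sym[SYM_LEN];
char hook_mod[MOD_LEN];

/* Illustrative fentry program: resolve the address of the hook function
 * being registered into a symbol name and module name.
 */
SEC("fentry/nf_register_net_hook")
int BPF_PROG(audit_nf_hook, struct net *net, const struct nf_hook_ops *ops)
{
	bpf_kallsyms_lookup((__u64)ops->hook, hook_sym, SYM_LEN,
			    hook_mod, MOD_LEN);
	bpf_printk("netfilter hook registered: %s", hook_sym);
	return 0;
}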

Signed-off-by: Florent Revest 
---
 include/uapi/linux/bpf.h   | 16 +
 kernel/trace/bpf_trace.c   | 41 ++
 tools/include/uapi/linux/bpf.h | 16 +
 3 files changed, 73 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c3458ec1f30a..670998635eac 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3817,6 +3817,21 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * long bpf_kallsyms_lookup(u64 address, char *symbol, u32 symbol_size, char 
*module, u32 module_size)
+ * Description
+ * Uses kallsyms to write the name of the symbol at *address*
+ * into *symbol* of size *symbol_size*. This is guaranteed to be
+ * zero-terminated.
+ * If the symbol is in a module, up to *module_size* bytes of
+ * the module name are written in *module*. This is also
+ * guaranteed to be zero-terminated. Note: a module name
+ * is always shorter than 64 bytes.
+ * Return
+ * On success, the strictly positive length of the full symbol
+ * name. If this is greater than *symbol_size*, the written
+ * symbol is truncated.
+ * On error, a negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -3981,6 +3996,7 @@ union bpf_attr {
FN(bprm_opts_set),  \
FN(ktime_get_coarse_ns),\
FN(ima_inode_hash), \
+   FN(kallsyms_lookup),\
/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d255bc9b2bfa..9d86e20c2b13 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include <linux/kallsyms.h>
 
 #include 
 
@@ -1260,6 +1261,44 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
.arg5_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_5(bpf_kallsyms_lookup, u64, address, char *, symbol, u32, symbol_size,
+  char *, module, u32, module_size)
+{
+   char buffer[KSYM_SYMBOL_LEN];
+   unsigned long offset, size;
+   const char *name;
+   char *modname;
+   long ret;
+
+   name = kallsyms_lookup(address, &size, &offset, &modname, buffer);
+   if (!name)
+   return -EINVAL;
+
+   ret = strlen(name) + 1;
+   if (symbol_size) {
+   strncpy(symbol, name, symbol_size);
+   symbol[symbol_size - 1] = '\0';
+   }
+
+   if (modname && module_size) {
+   strncpy(module, modname, module_size);
+   module[module_size - 1] = '\0';
+   }
+
+   return ret;
+}
+
+const struct bpf_func_proto bpf_kallsyms_lookup_proto = {
+   .func   = bpf_kallsyms_lookup,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_ANYTHING,
+   .arg2_type  = ARG_PTR_TO_MEM,
+   .arg3_type  = ARG_CONST_SIZE,
+   .arg4_type  = ARG_PTR_TO_MEM,
+   .arg5_type  = ARG_CONST_SIZE,
+};
+
 const struct bpf_func_proto *
 bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1356,6 +1395,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_per_cpu_ptr_proto;
case BPF_FUNC_bpf_this_cpu_ptr:
return &bpf_this_cpu_ptr_proto;
+   case BPF_FUNC_kallsyms_lookup:
+   return &bpf_kallsyms_lookup_proto;
default:
return NULL;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c3458ec1f30a..670998635eac 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3817,6 +3817,21 @@ union bpf_attr {
  * The **hash_algo** is returned on success,
  * **-EOPNOTSUP** if IMA is disabled or **-EINVAL** if
  * invalid arguments are passed.
+ *
+ * long bpf_kallsyms_lookup(u64 address, char *symbol, u32 symbol_size, char 
*module, u32 module_size)
+ * Description
+ * Uses kallsyms to write the name of the symbol at *address*
+ * into *symbol* of size *symbol_size*. This is guaranteed to be
+ * zero-terminated.
+ * If the symbol is in a module, up to *module_size* bytes of
+ * the module name are written in *module*. This

[PATCH bpf-next 2/2] selftests/bpf: Add bpf_kallsyms_lookup test

2020-11-26 Thread Florent Revest
This piggybacks on the existing "ksyms" test because this test also
relies on a __ksym symbol and requires CONFIG_KALLSYMS.

Signed-off-by: Florent Revest 
---
 tools/testing/selftests/bpf/config|  1 +
 .../testing/selftests/bpf/prog_tests/ksyms.c  | 46 ++-
 .../bpf/progs/test_kallsyms_lookup.c  | 38 +++
 3 files changed, 84 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_kallsyms_lookup.c

diff --git a/tools/testing/selftests/bpf/config 
b/tools/testing/selftests/bpf/config
index 365bf9771b07..791a46e5d013 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -43,3 +43,4 @@ CONFIG_IMA=y
 CONFIG_SECURITYFS=y
 CONFIG_IMA_WRITE_POLICY=y
 CONFIG_IMA_READ_POLICY=y
+CONFIG_KALLSYMS=y
diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c 
b/tools/testing/selftests/bpf/prog_tests/ksyms.c
index b295969b263b..0478b67a92ae 100644
--- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
+++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
@@ -3,11 +3,12 @@
 
#include <test_progs.h>
 #include "test_ksyms.skel.h"
+#include "test_kallsyms_lookup.skel.h"
#include <sys/stat.h>
 
 static int duration;
 
-void test_ksyms(void)
+void test_ksyms_variables(void)
 {
const char *btf_path = "/sys/kernel/btf/vmlinux";
struct test_ksyms *skel;
@@ -59,3 +60,46 @@ void test_ksyms(void)
 cleanup:
test_ksyms__destroy(skel);
 }
+
+void test_kallsyms_lookup(void)
+{
+   struct test_kallsyms_lookup *skel;
+   int err;
+
+   skel = test_kallsyms_lookup__open_and_load();
+   if (CHECK(!skel, "skel_open", "failed to open and load skeleton\n"))
+   return;
+
+   err = test_kallsyms_lookup__attach(skel);
+   if (CHECK(err, "skel_attach", "skeleton attach failed: %d\n", err))
+   goto cleanup;
+
+   /* trigger tracepoint */
+   usleep(1);
+
+   CHECK(strcmp(skel->bss->name, "schedule"), "name",
+ "got \"%s\", exp \"schedule\"\n", skel->bss->name);
+   CHECK(strcmp(skel->bss->name_truncated, "sched"), "name_truncated",
+ "got \"%s\", exp \"sched\"\n", skel->bss->name_truncated);
+   CHECK(strcmp(skel->bss->name_invalid, ""), "name_invalid",
+ "got \"%s\", exp \"\"\n", skel->bss->name_invalid);
+   CHECK(strcmp(skel->bss->module_name, ""), "module_name",
+ "got \"%s\", exp \"\"\n", skel->bss->module_name);
+   CHECK(skel->bss->schedule_ret != 9, "schedule_ret",
+ "got %d, exp 9\n", skel->bss->schedule_ret);
+   CHECK(skel->bss->sched_ret != 9, "sched_ret",
+ "got %d, exp 9\n", skel->bss->sched_ret);
+   CHECK(skel->bss->invalid_ret != -EINVAL, "invalid_ret",
+ "got %d, exp %d\n", skel->bss->invalid_ret, -EINVAL);
+
+cleanup:
+   test_kallsyms_lookup__destroy(skel);
+}
+
+void test_ksyms(void)
+{
+   if (test__start_subtest("ksyms_variables"))
+   test_ksyms_variables();
+   if (test__start_subtest("kallsyms_lookup"))
+   test_kallsyms_lookup();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_kallsyms_lookup.c 
b/tools/testing/selftests/bpf/progs/test_kallsyms_lookup.c
new file mode 100644
index ..4f15f1527ab4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_kallsyms_lookup.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Google LLC. */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+extern const void schedule __ksym;
+
+#define SYMBOL_NAME_LEN 10
+char name[SYMBOL_NAME_LEN];
+char name_invalid[SYMBOL_NAME_LEN];
+
+#define SYMBOL_TRUNCATED_NAME_LEN  6
+char name_truncated[SYMBOL_TRUNCATED_NAME_LEN];
+
+#define MODULE_NAME_LEN 64
+char module_name[MODULE_NAME_LEN];
+
+long schedule_ret;
+long sched_ret;
+long invalid_ret;
+
+SEC("raw_tp/sys_enter")
+int handler(const void *ctx)
+{
+   schedule_ret = bpf_kallsyms_lookup((__u64)&schedule,
+  name, SYMBOL_NAME_LEN,
+  module_name, MODULE_NAME_LEN);
+   invalid_ret = bpf_kallsyms_lookup(0,
+ name_invalid, SYMBOL_NAME_LEN,
+ module_name, MODULE_NAME_LEN);
+   sched_ret = bpf_kallsyms_lookup((__u64)&schedule, name_truncated,
+   SYMBOL_TRUNCATED_NAME_LEN,
+   module_name, MODULE_NAME_LEN);
+   return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next 2/2] bpf: Add a selftest for the tracing bpf_get_socket_cookie

2020-11-26 Thread Florent Revest
This builds up on the existing socket cookie test which checks whether
the bpf_get_socket_cookie helpers provide the same value in
cgroup/connect6 and sockops programs for a socket created by the
userspace part of the test.

Adding a tracing program to the existing objects requires a different
attachment strategy and different headers.

Signed-off-by: Florent Revest 
---
 .../selftests/bpf/progs/socket_cookie_prog.c  | 41 ---
 .../selftests/bpf/test_socket_cookie.c| 18 +---
 2 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c 
b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
index 0cb5656a22b0..a11026aeaaf1 100644
--- a/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
+++ b/tools/testing/selftests/bpf/progs/socket_cookie_prog.c
@@ -1,11 +1,13 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2018 Facebook
 
-#include <linux/bpf.h>
-#include <sys/socket.h>
+#include "vmlinux.h"
 
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
+#include <bpf/bpf_tracing.h>
+
+#define AF_INET6 10
 
 struct socket_cookie {
__u64 cookie_key;
@@ -19,6 +21,14 @@ struct {
__type(value, struct socket_cookie);
 } socket_cookies SEC(".maps");
 
+/*
+ * These three programs get executed in a row on connect() syscalls. The
+ * userspace side of the test creates a client socket, issues a connect() on it
+ * and then checks that the local storage associated with this socket has:
+ * cookie_value == local_port << 8 | 0xFF
+ * The different parts of this cookie_value are appended by those hooks if they
+ * all agree on the output of bpf_get_socket_cookie().
+ */
 SEC("cgroup/connect6")
 int set_cookie(struct bpf_sock_addr *ctx)
 {
@@ -32,14 +42,14 @@ int set_cookie(struct bpf_sock_addr *ctx)
if (!p)
return 1;
 
-   p->cookie_value = 0xFF;
+   p->cookie_value = 0xF;
p->cookie_key = bpf_get_socket_cookie(ctx);
 
return 1;
 }
 
 SEC("sockops")
-int update_cookie(struct bpf_sock_ops *ctx)
+int update_cookie_sockops(struct bpf_sock_ops *ctx)
 {
struct bpf_sock *sk;
struct socket_cookie *p;
@@ -60,11 +70,32 @@ int update_cookie(struct bpf_sock_ops *ctx)
if (p->cookie_key != bpf_get_socket_cookie(ctx))
return 1;
 
-   p->cookie_value = (ctx->local_port << 8) | p->cookie_value;
+   p->cookie_value |= (ctx->local_port << 8);
 
return 1;
 }
 
+SEC("fexit/inet_stream_connect")
+int BPF_PROG(update_cookie_tracing, struct socket *sock,
+struct sockaddr *uaddr, int addr_len, int flags)
+{
+   struct socket_cookie *p;
+
+   if (uaddr->sa_family != AF_INET6)
+   return 0;
+
+   p = bpf_sk_storage_get(&socket_cookies, sock->sk, 0, 0);
+   if (!p)
+   return 0;
+
+   if (p->cookie_key != bpf_get_socket_cookie(sock->sk))
+   return 0;
+
+   p->cookie_value |= 0xF0;
+
+   return 0;
+}
+
 int _version SEC("version") = 1;
 
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_socket_cookie.c 
b/tools/testing/selftests/bpf/test_socket_cookie.c
index ca7ca87e91aa..0d955c65a4f8 100644
--- a/tools/testing/selftests/bpf/test_socket_cookie.c
+++ b/tools/testing/selftests/bpf/test_socket_cookie.c
@@ -133,6 +133,7 @@ static int run_test(int cgfd)
struct bpf_prog_load_attr attr;
struct bpf_program *prog;
struct bpf_object *pobj;
+   struct bpf_link *link;
const char *prog_name;
int server_fd = -1;
int client_fd = -1;
@@ -153,11 +154,18 @@ static int run_test(int cgfd)
bpf_object__for_each_program(prog, pobj) {
prog_name = bpf_program__section_name(prog);
 
-   if (libbpf_attach_type_by_name(prog_name, &attach_type))
-   goto err;
-
-   err = bpf_prog_attach(bpf_program__fd(prog), cgfd, attach_type,
- BPF_F_ALLOW_OVERRIDE);
+   if (bpf_program__is_tracing(prog)) {
+   link = bpf_program__attach(prog);
+   err = !link;
+   continue;
+   } else {
+   if (libbpf_attach_type_by_name(prog_name, &attach_type))
+   goto err;
+
+   err = bpf_prog_attach(bpf_program__fd(prog), cgfd,
+ attach_type,
+ BPF_F_ALLOW_OVERRIDE);
+   }
if (err) {
log_err("Failed to attach prog %s", prog_name);
goto out;
-- 
2.29.2.454.gaff20da3a2-goog



[PATCH bpf-next 1/2] bpf: Expose bpf_get_socket_cookie to tracing programs

2020-11-26 Thread Florent Revest
This creates a new helper proto because the existing
bpf_get_socket_cookie_sock_proto has an ARG_PTR_TO_CTX argument and only
works for BPF programs where the context is a sock.

This helper could also be useful to other BPF program types such as LSM.
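
As an illustration of the LSM angle, a program could now read a socket
cookie from an LSM hook (a hedged sketch, not part of the patch; it
assumes CONFIG_BPF_LSM and the socket_listen hook):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* Illustrative LSM program: log the cookie of every socket that starts
 * listening, using the newly exposed helper.
 */
SEC("lsm/socket_listen")
int BPF_PROG(log_listen_cookie, struct socket *sock, int backlog)
{
	__u64 cookie = bpf_get_socket_cookie(sock->sk);

	bpf_printk("listen: socket cookie %llu", cookie);
	return 0;
}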

Signed-off-by: Florent Revest 
---
 kernel/trace/bpf_trace.c | 4 
 net/core/filter.c| 7 +++
 2 files changed, 11 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d255bc9b2bfa..14ad96579813 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1725,6 +1725,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
}
 }
 
+extern const struct bpf_func_proto bpf_get_socket_cookie_sock_tracing_proto;
+
 const struct bpf_func_proto *
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1748,6 +1750,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_sk_storage_get_tracing_proto;
case BPF_FUNC_sk_storage_delete:
return &bpf_sk_storage_delete_tracing_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_cookie_sock_tracing_proto;
 #endif
case BPF_FUNC_seq_printf:
return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index 2ca5eecebacf..177c4e5e529d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4631,6 +4631,13 @@ static const struct bpf_func_proto 
bpf_get_socket_cookie_sock_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+const struct bpf_func_proto bpf_get_socket_cookie_sock_tracing_proto = {
+   .func   = bpf_get_socket_cookie_sock,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+};
+
 BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
 {
return __sock_gen_cookie(ctx->sk);
-- 
2.29.2.454.gaff20da3a2-goog



Re: [PATCH bpf-next 2/2] bpf: Add a selftest for the tracing bpf_get_socket_cookie

2020-11-27 Thread Florent Revest
On Thu, 2020-11-26 at 23:56 -0800, Yonghong Song wrote:
> 
> On 11/26/20 9:02 AM, Florent Revest wrote:
> > This builds up on the existing socket cookie test which checks
> > whether
> > the bpf_get_socket_cookie helpers provide the same value in
> > cgroup/connect6 and sockops programs for a socket created by the
> > userspace part of the test.
> > 
> > Adding a tracing program to the existing objects requires a
> > different
> > attachment strategy and different headers.
> > 
> > Signed-off-by: Florent Revest 
> > ---
> >   .../selftests/bpf/progs/socket_cookie_prog.c  | 41
> > ---
> >   .../selftests/bpf/test_socket_cookie.c| 18 +---
> 
> Do you think it is possible to migrate test_socket_cookie.c to
> selftests/bpf/prog_tests so it can be part of test_progs so
> it will be regularly exercised?

I suppose it's possible, I can give it a try :)
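
A prog_tests version could be shaped roughly like this (hypothetical
skeleton and file names, only to sketch the migration):

/* tools/testing/selftests/bpf/prog_tests/socket_cookie.c (hypothetical) */
#include <test_progs.h>
#include "socket_cookie_prog.skel.h"

static int duration;

void test_socket_cookie(void)
{
	struct socket_cookie_prog *skel;
	int cgroup_fd;

	cgroup_fd = test__join_cgroup("/socket_cookie");
	if (CHECK(cgroup_fd < 0, "join_cgroup", "errno %d\n", errno))
		return;

	skel = socket_cookie_prog__open_and_load();
	if (CHECK(!skel, "skel_open", "open_and_load failed\n"))
		goto close_cgroup;

	/* attach the cgroup, sockops and tracing programs, connect a
	 * socket through the cgroup, then assert on the cookie value left
	 * in the socket local storage.
	 */

	socket_cookie_prog__destroy(skel);
close_cgroup:
	close(cgroup_fd);
}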



Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-11-27 Thread Florent Revest
On Thu, 2020-11-26 at 23:35 -0800, Yonghong Song wrote:
> On 11/26/20 8:57 AM, Florent Revest wrote:
> > +BPF_CALL_5(bpf_kallsyms_lookup, u64, address, char *, symbol, u32,
> > symbol_size,
> > +  char *, module, u32, module_size)
> > +{
> > +   char buffer[KSYM_SYMBOL_LEN];
> > +   unsigned long offset, size;
> > +   const char *name;
> > +   char *modname;
> > +   long ret;
> > +
> > +   name = kallsyms_lookup(address, &size, &offset, &modname,
> > buffer);
> > +   if (!name)
> > +   return -EINVAL;
> > +
> > +   ret = strlen(name) + 1;
> > +   if (symbol_size) {
> > +   strncpy(symbol, name, symbol_size);
> > +   symbol[symbol_size - 1] = '\0';
> > +   }
> > +
> > +   if (modname && module_size) {
> > +   strncpy(module, modname, module_size);
> > +   module[module_size - 1] = '\0';
> 
> In this case, module name may be truncated and user did not get any
> indication from return value. In the helper description, it is
> mentioned that module name currently is most 64 bytes. But from UAPI
> perspective, it may be still good to return something to let user
> know the name is truncated.
> 
> I do not know what is the best way to do this. One suggestion is
> to break it into two helpers, one for symbol name and another
> for module name. What is the use cases people want to get both
> symbol name and module name and is it common?

Fair, I can split this into two helpers :) The lookup would be done
twice but I don't think that's a big deal.
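
Split that way, the two prototypes might look something like the
following (purely hypothetical names, only to make the discussion
concrete; nothing here is part of the patch):

/* Hypothetical: resolve only the symbol name at *address*. */
long bpf_kallsyms_sym_name(u64 address, char *symbol, u32 symbol_size);

/* Hypothetical: resolve only the name of the module containing *address*.
 * Returning 0 for built-in symbols would keep "no module" and truncation
 * distinguishable at the call site.
 */
long bpf_kallsyms_mod_name(u64 address, char *module, u32 module_size);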


